DOI: 10.1145/1655008.1655013
research-article

Website fingerprinting: attacking popular privacy enhancing technologies with the multinomial naïve-bayes classifier

Published: 13 November 2009

ABSTRACT

Privacy enhancing technologies (PETs) such as OpenSSL, OpenVPN or Tor establish an encrypted tunnel that enables users to hide the content and addresses of requested websites from external observers. This protection is endangered by local traffic analysis attacks, which allow an external, passive attacker located between the PET system and the user to uncover the identity of the requested sites. However, existing proposals for such attacks are not yet practical.

We present a novel method that applies common text-mining techniques to the normalised frequency distribution of observable IP packet sizes. Our classifier correctly identifies up to 97% of requests on a sample of 775 sites and over 300,000 real-world traffic dumps recorded over a two-month period. It outperforms previously known methods such as Jaccard's classifier and Naïve Bayes, which respectively neglect packet frequencies altogether or rely on absolute frequency values. Our method is system-agnostic: it can be used against any PET without alteration. Closed-world results indicate that many popular single-hop and even multi-hop systems such as Tor and JonDonym are vulnerable to this general fingerprinting attack. Furthermore, we discuss important real-world issues, namely false alarms and the influence of the browser cache on accuracy.
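
The abstract does not spell out the exact feature transformation, so the following Python sketch fills it in with assumptions: packet sizes are signed by direction (positive for outgoing, negative for incoming), each raw count f is transformed to log(1 + f), and each vector is scaled to unit Euclidean length (cosine normalisation). These are standard text-mining choices, not necessarily the paper's exact pipeline. The names `features`, `MultinomialNB` and `jaccard`, as well as the toy traces, are hypothetical and do not come from the paper or from any library. The `jaccard` function is a simplified stand-in for a set-based baseline that ignores how often each packet size occurs.

```python
import math
from collections import Counter, defaultdict


def features(trace):
    """Turn a trace of signed IP packet sizes into a normalised frequency vector.

    Assumption: sizes are signed by direction (+ outgoing, - incoming), so each
    signed size plays the role of a 'word' in a text document.
    """
    counts = Counter(trace)
    # Sublinear term-frequency transform borrowed from text mining: log(1 + f).
    tf = {size: math.log1p(f) for size, f in counts.items()}
    # Cosine normalisation: scale to unit Euclidean length so that long and
    # short page loads become comparable.
    norm = math.sqrt(sum(v * v for v in tf.values()))
    return {size: v / norm for size, v in tf.items()}


class MultinomialNB:
    """Multinomial naive Bayes over (possibly fractional) term weights."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant

    def fit(self, vectors, labels):
        self.vocab = set()
        for v in vectors:
            self.vocab.update(v)
        self.log_prior, self.log_cond = {}, {}
        for c in sorted(set(labels)):
            docs = [v for v, y in zip(vectors, labels) if y == c]
            self.log_prior[c] = math.log(len(docs) / len(vectors))
            # Sum the weight of each packet size over all training traces of class c.
            totals = defaultdict(float)
            for v in docs:
                for size, w in v.items():
                    totals[size] += w
            denom = sum(totals.values()) + self.alpha * len(self.vocab)
            self.log_cond[c] = {
                size: math.log((totals[size] + self.alpha) / denom)
                for size in self.vocab
            }
        return self

    def scores(self, x):
        # log P(c) + sum_t x_t * log P(t | c); sizes unseen in training are ignored.
        return {
            c: self.log_prior[c]
            + sum(w * self.log_cond[c][t] for t, w in x.items() if t in self.vocab)
            for c in self.log_prior
        }

    def predict(self, x):
        s = self.scores(x)
        return max(s, key=s.get)


def jaccard(a, b):
    """Jaccard similarity of the *sets* of packet sizes; frequencies are ignored."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


# Hypothetical toy traces: both sites use the same two packet sizes,
# but in different proportions.
training = [
    ([300, -1500, -1500, -1500, -1500, 300], "site-a"),
    ([300, -1500, -1500, -1500, 300], "site-a"),
    ([300, 300, 300, 300, -1500], "site-b"),
    ([300, 300, 300, -1500, 300, 300], "site-b"),
]
test_trace = [300, -1500, -1500, -1500, -1500, 300, -1500]

model = MultinomialNB().fit([features(t) for t, _ in training], [y for _, y in training])
print(model.predict(features(test_trace)))            # site-a
print([jaccard(test_trace, t) for t, _ in training])  # [1.0, 1.0, 1.0, 1.0]
```

On these toy traces every training trace has a Jaccard similarity of 1.0 to the test trace, because all of them contain the same two packet sizes, so a purely set-based classifier cannot tell the sites apart. The multinomial model weights each size by its normalised frequency and selects `site-a`, which illustrates why the abstract stresses packet frequencies.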

            Published in

            CCSW '09: Proceedings of the 2009 ACM workshop on Cloud computing security
            November 2009
            144 pages
            ISBN:9781605587844
            DOI:10.1145/1655008
            Program Chairs: Radu Sion, Dawn Song

            Copyright © 2009 ACM

            Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

            Publisher

            Association for Computing Machinery

            New York, NY, United States

            Acceptance Rates

            Overall Acceptance Rate: 37 of 108 submissions, 34%
