ABSTRACT
Packet traces of operational Internet traffic are invaluable to network research, but public sharing of such traces is severely limited by the need to first remove all sensitive information. Current trace anonymization technology leaves only the packet headers intact, completely stripping the contents; to our knowledge, there are no publicly available traces of any significant size that contain packet payloads. We describe a new approach to transform and anonymize packet traces. Our tool provides high-level language support for packet transformation, allowing the user to write short policy scripts to express sophisticated trace transformations. The resulting scripts can anonymize both packet headers and payloads, and can perform application-level transformations such as editing HTTP or SMTP headers, replacing the content of Web items with MD5 hashes, or altering filenames or reply codes that match given patterns. We discuss the critical issue of verifying that anonymizations are both correctly applied and correctly specified, and report our experiences anonymizing FTP traces from the Lawrence Berkeley National Laboratory for public release.
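As a rough illustration of the application-level transformations listed above, the following Python sketch edits HTTP headers, replaces a Web item's content with its MD5 hash, and rewrites filenames that match a pattern. It is not the tool's policy-script language. The header set, the filename pattern, the placeholder strings, and all function names are assumptions made for this example.

```python
import hashlib
import re

# Hypothetical policy values, chosen only for illustration.
SENSITIVE_HEADERS = {"cookie", "set-cookie", "authorization"}
SENSITIVE_FILENAME = re.compile(r"(passwd|\.ssh|secret)", re.IGNORECASE)


def hash_body(body: bytes) -> bytes:
    """Replace content with the hex MD5 digest of that content."""
    return hashlib.md5(body).hexdigest().encode("ascii")


def scrub_filename(name: str) -> str:
    """Replace a filename matching the sensitive pattern with a placeholder."""
    if SENSITIVE_FILENAME.search(name):
        return "<sensitive-file>"
    return name


def anonymize_http_message(raw: bytes) -> bytes:
    """Blank sensitive headers and hash the body of one HTTP message."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    new_body = hash_body(body) if body else b""
    lines = head.split(b"\r\n")
    out = [lines[0]]  # keep the request or status line unchanged
    for line in lines[1:]:
        name, _, _ = line.partition(b":")
        key = name.strip().lower().decode("latin-1")
        if key in SENSITIVE_HEADERS:
            out.append(name + b": <removed>")
        elif key == "content-length":
            # Keep the header consistent with the rewritten body.
            out.append(b"Content-Length: " + str(len(new_body)).encode("ascii"))
        else:
            out.append(line)
    return b"\r\n".join(out) + sep + new_body
```

The sketch works on a single reassembled message. Rewriting an actual packet trace would also require keeping packet lengths, checksums, and TCP sequence numbers consistent with the changed payload, which this example does not attempt.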
Index Terms
- A high-level programming environment for packet trace anonymization and transformation