Abstract
Programmers hoping to achieve performance improvements often use custom memory allocators. This in-depth study examines eight applications that use custom allocators. Surprisingly, for six of these applications, a state-of-the-art general-purpose allocator (the Lea allocator) performs as well as or better than the custom allocators. The two exceptions use regions, which deliver higher performance (improvements of up to 44%). Regions also reduce programmer burden and eliminate a source of memory leaks. However, we show that the inability of programmers to free individual objects within regions can lead to a substantial increase in memory consumption. Worse, this limitation precludes the use of regions for common programming idioms, reducing their usefulness.

We present a generalization of general-purpose and region-based allocators that we call reaps. Reaps are a combination of regions and heaps, providing a full range of region semantics with the addition of individual object deletion. We show that our implementation of reaps provides high performance, outperforming other allocators with region-like semantics. We then use a case study to demonstrate the space advantages and software engineering benefits of reaps in practice. Our results indicate that programmers needing fast regions should use reaps, and that most programmers considering custom allocators should instead use the Lea allocator.
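The abstract describes reaps as regions extended with individual object deletion. The following is a minimal C sketch of those semantics only, assuming a single fixed-size chunk, bump-pointer allocation, and a first-fit free list for individually freed objects. Every name in it (`Reap`, `reap_init`, `reap_alloc`, `reap_free`, `reap_clear`, `reap_destroy`) is hypothetical, and it should not be read as the authors' implementation or as a reflection of its performance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Illustrative sketch with hypothetical names; not the paper's reap API. */

#define REAP_ALIGN ((size_t)16)
#define HEADER_SIZE ((sizeof(Header) + REAP_ALIGN - 1) & ~(REAP_ALIGN - 1))

typedef struct Header {
    size_t size;           /* payload size in bytes, rounded up */
    struct Header *next;   /* free-list link, meaningful only while freed */
} Header;

typedef struct Reap {
    unsigned char *base;   /* start of the backing chunk */
    size_t capacity;       /* chunk size in bytes */
    size_t used;           /* bump-pointer offset into the chunk */
    Header *free_list;     /* objects that were freed individually */
} Reap;

static size_t round_up(size_t n) {
    return (n + REAP_ALIGN - 1) & ~(REAP_ALIGN - 1);
}

/* Returns 1 on success, 0 if the backing chunk could not be obtained. */
int reap_init(Reap *r, size_t capacity) {
    assert(r != NULL);
    r->base = malloc(capacity);
    r->capacity = capacity;
    r->used = 0;
    r->free_list = NULL;
    return r->base != NULL;
}

void *reap_alloc(Reap *r, size_t n) {
    if (n > r->capacity) return NULL;      /* also avoids overflow below */
    size_t size = round_up(n ? n : 1);

    /* Heap-like path: reuse an individually freed object (first fit). */
    for (Header **p = &r->free_list; *p != NULL; p = &(*p)->next) {
        if ((*p)->size >= size) {
            Header *h = *p;
            *p = h->next;
            return (unsigned char *)h + HEADER_SIZE;
        }
    }

    /* Region-like path: bump the pointer. */
    size_t need = HEADER_SIZE + size;
    if (need > r->capacity - r->used) return NULL;
    Header *h = (Header *)(r->base + r->used);
    h->size = size;
    r->used += need;
    return (unsigned char *)h + HEADER_SIZE;
}

/* Individual object deletion: the operation plain regions lack. */
void reap_free(Reap *r, void *ptr) {
    if (ptr == NULL) return;
    Header *h = (Header *)((unsigned char *)ptr - HEADER_SIZE);
    h->next = r->free_list;
    r->free_list = h;
}

/* Region semantics: discard every object at once. */
void reap_clear(Reap *r) {
    r->used = 0;
    r->free_list = NULL;
}

void reap_destroy(Reap *r) {
    free(r->base);
    r->base = NULL;
    r->capacity = 0;
    r->used = 0;
    r->free_list = NULL;
}
```

In this sketch, a caller that never calls `reap_free` gets ordinary region behavior (allocate, then `reap_clear`), while a caller that does free objects sees that memory reused by later allocations instead of accumulating until the whole region is discarded.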
Reconsidering custom memory allocation
OOPSLA '02: Proceedings of the 17th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications