An Empirical Evaluation of Variable-length Record B+Trees on a Modern Graph Database System

Abstract

B+Trees are widely used as persistent index implementations for databases. They are often implemented in a way that allows the index to be in main memory while the indexed data remains on disk. Over the years, multiple optimization techniques have been proposed to improve the efficiency of B+Trees by accelerating the key search within a node or compressing data based on common prefixes. This paper describes our empirical research implementing such optimized B+Trees in Neo4j, a modern graph database management system (DBMS). We were able to confirm that the optimized versions lived up to their performance claims over plain B+Trees when benchmarked in isolation. However, we also found that incorporating them into a real DBMS yields marginal improvements only. This is partly because Neo4j is not index-heavy, typically only using indexes to find starting points for graph traversals. The other part is that integrating optimized indexes into the transactions and page-based storage components of Neo4j incurs a performance penalty (for reasons of crash-tolerance) compared to the standalone implementations. Given the additional implementation and maintenance complexity of optimized B+Trees, our research suggests that regular B+Trees remain the preferred general-purpose implementation.

Publication
2024 IEEE 40th International Conference on Data Engineering Workshops (ICDEW), Utrecht, Netherlands
Jim Webber
Jim Webber
Chief Scientist

I’m a computer scientist interested in fault-tolerance for graph databases.