I am Neo4j’s Chief Scientist and Visiting Professor at Newcastle University, UK. At Neo4j I lead the research group, working on a variety of database topics including query languages and runtimes, temporality, streaming, scale, and fault-tolerance. I have also co-authored several books on graph technology including Graph Databases - 1st and 2nd Editions (O’Reilly), Graph Databases for Dummies (Wiley), and Building Knowledge Graphs (O’Reilly).
Prior to Neo4j, I worked on fault-tolerant distributed systems, first at Newcastle University startup Arjuna and then for a variety of clients at global consulting firm ThoughtWorks. Along the way I co-authored the distributed systems books REST in Practice (O’Reilly) and Developing Enterprise Web Services - An Architect’s Guide (Prentice-Hall).
Visiting Professor of Practice, 2018-present
Newcastle University
Ph.D. in Programming Languages for High-Performance Computing, 2000
Newcastle University
B.Sc. (1st class Honours) Computing Science, 1996
Newcastle University
A practitioner’s guide to building Knowledge Graphs for the enterprise.
A practical and humane introduction to graph databases and Neo4j, Graph Databases For Dummies walks you through modeling, querying, and importing graph data, all the way through to your first production system.
The first book on graph databases, now in its second edition. Provides in-depth coverage of graph modeling and querying, as well as thorough explanations of the internal workings of Neo4j.
Why don’t typical enterprise projects go as smoothly as projects you develop for the Web? Does the REST architectural style really present a viable alternative for building distributed systems and enterprise-class applications?
In this insightful book, three SOA experts provide a down-to-earth explanation of REST and demonstrate how you can develop simple and elegant distributed hypermedia systems by applying the Web’s guiding principles to common enterprise computing problems. You’ll learn techniques for implementing specific Web technologies and patterns to solve the needs of a typical company as it grows from modest beginnings to become a global enterprise.
This was one of the first books to demonstrate how to build (WS-*) Web Services with enterprise-class reliability and performance. This book takes a no-nonsense view of architecting and constructing enterprise-class Web services and applications. The authors assess the state of the art of the Web services platform circa 2004, offering best practices and new architectural patterns for taking advantage of Web Services.
While the architectural patterns in this book generally remain worthwhile today, the protocols and standards covered are now looking somewhat out of date, especially since there is a strong groundswell towards building RESTful systems on the Web rather than tunnelling through HTTP with XML payloads.
I discovered Neo4j and graph databases while working at ThoughtWorks, but the way they work seemed so natural to me that I became involved initially as an open source contributor. As Neo4j gained ground commercially, I moved over to the company full time as Chief Scientist and executive manager. Initially I led the engineering team delivering the early versions of the database product, then worked for a long time building fault tolerance into clustering for the Neo4j database. Latterly I have split my time between working with academia on the next generation of transaction support for graphs, taking care of Neo4j’s customers and community, and authoring books and talks to help grow Neo4j’s market.
Responsibilities included:
I joined ThoughtWorks in Sydney as part of a small group of early employees. My initial responsibilities were to help drive sales of consultancy in finance, media, and telecoms and to deliver consulting and software delivery services to those clients. While at ThoughtWorks, I created a community of practice around SOA and developed a lightweight, iterative method of building service-oriented systems known as “Guerilla SOA.” After a move to London, I was promoted to Director of Professional Services, and continued to provide strategic technology advice (internally and externally) and sales and marketing support, as well as building interesting software systems for clients.
Responsibilities included:
I took a role as a Senior RA at Newcastle University (UK), working at Sydney University (Australia). My role involved developing example systems of Web Services that demonstrated the utility of the WS-* protocols for Grid computing, rather than needing to develop a new, competing suite of protocols for that domain.
While at the University of Sydney, I also lectured a Master’s degree course in Parallel Computing.
Responsibilities included:
I joined Bluestone Software’s Arjuna lab from my Ph.D., initially to work on transactional workflow middleware. As Web Services rose to prominence, I started a new team around transaction support for systems of Web Services. I led the development of this middleware through the acquisition by HP and the later spin-out as Arjuna again. Ultimately the Arjuna IP was sold to JBoss.
Responsibilities included:
In this paper, we describe No-Wait concurrency control mechanisms to address conflict resolution and then comprehensively evaluate their performance under Read-Committed and Serializability isolation levels using an in-memory database system in various configurations and contention scenarios. Key performance metrics are the percentage of transaction aborts and the average latency for those that do not abort. Our evaluations affirm that the No-Wait approach indeed offers a cost-effective, practical alternative to traditional conflict resolution mechanisms.
B+Trees are widely used as persistent index implementations for databases. They are often implemented in a way that allows the index to be in main memory while the indexed data remains on disk. Over the years, multiple optimization techniques have been proposed to improve the efficiency of B+Trees by accelerating the key search within a node or compressing data based on common prefixes. This paper describes our empirical research implementing such optimized B+Trees in Neo4j, a modern graph database management system (DBMS). We were able to confirm that the optimized versions lived up to their performance claims over plain B+Trees when benchmarked in isolation. However, we also found that incorporating them into a real DBMS yields marginal improvements only. This is partly because Neo4j is not index-heavy, typically only using indexes to find starting points for graph traversals. The other part is that integrating optimized indexes into the transactions and page-based storage components of Neo4j incurs a performance penalty (for reasons of crash-tolerance) compared to the standalone implementations. Given the additional implementation and maintenance complexity of optimized B+Trees, our research suggests that regular B+Trees remain the preferred general-purpose implementation.
BIFROST is a novel query engine for graph databases that supports high-fidelity data modeling on arbitrary and evolving graph topologies. It dynamically optimizes queries according to meta-level changes in the underlying graph (i.e. changes in topology) without the need for any explicit schema. This is possible by using state-of-the-art techniques from managed programming languages, such as self-optimizing ASTs and deoptimization, to combine query optimization and compilation. The approach provides high fidelity for even highly irregular labeled property graphs and gives good performance when compared to other systems that depend on fixed schemas for query planning and optimization.
Modern graph database management systems (DBMSs) can process highly dynamic labeled property graphs (LPGs) with many billions of relationships comfortably, but those systems often ignore the temporal dimension of data, how a graph evolved over time. Temporal analytics allow users to query and compute over the graph throughout its history so that valuable line-of-business data is always accessible and never lost. However, existing approaches tend to be ad-hoc and vary in performance depending on the size of the effective graph workload, such as local pattern matching or global graph algorithms. In this work, we describe Aion, a transactional temporal graph DBMS that generalizes previous approaches for LPGs. Aion extends Neo4j, a modern graph DBMS, incurring minimal performance overhead by decoupling the graph’s history from the latest graph version. To support efficient temporal analytics independently of workload characteristics, Aion adopts a hybrid temporal storage approach: (i) for fast full graph restoration at arbitrary time points, it uses TimeStore that indexes updates by time; (ii) for fine-grained graph history accesses, it uses LineageStore that indexes updates by entity identifiers. To enable incremental graph computations for improved latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments show that Aion achieves comparable or better performance versus existing non-transactional temporal systems and provides up to an order of magnitude speedup over classic Neo4j.
A policy that reduces communication overheads by committing together all transactions completed within an interval of time is examined. A model of the system involving two queues served alternatively with preemptions is analysed in the steady-state under Markovian assumptions. An exact and easily implementable solution is derived and is used in order to determine performance measures such as average occupancy or average latency. The optimal length of the operative interval is evaluated numerically. A non-preemptive policy is simulated and is shown to be considerably less efficient than the preemptive one analysed here. A generalization to non-Markovian operative intervals is outlined.
Social Media
Twitter and BlueSky
I have Twitter and BlueSky accounts which are a mixture of chatter with friends and colleagues, some computing science things, and a dash of left politics.
Following the example of Jonathan Dowland, my Twitter feed has a sliding window of 90 days’ worth of tweets. I like Twitter (somewhat) for conversations, but as a system of record much less so.
Facebook, Instagram, Snapchat etc.
I’m not on any other social media sites; I prefer email. If you meet a Jim Webber on any other platform, it’s not me.