Jim Webber

Chief Scientist

Neo4j

Biography

I am Neo4j’s Chief Scientist and Visiting Professor at Newcastle University, UK. At Neo4j I lead the research group, working on a variety of database topics including query languages and runtimes, temporality, streaming, scale, and fault-tolerance. I have also co-authored several books on graph technology including Graph Databases - 1st and 2nd Editions (O’Reilly), Graph Databases for Dummies (Wiley), and Building Knowledge Graphs (O’Reilly).

Prior to Neo4j, I worked on fault-tolerant distributed systems, first at Newcastle University startup Arjuna and then for a variety of clients at global consulting firm ThoughtWorks. Along the way I co-authored the distributed systems books REST in Practice (O’Reilly) and Developing Enterprise Web Services - An Architect’s Guide (Prentice-Hall).

Interests

Graph Theory
Databases
Distributed Systems
Fault Tolerance

Education

Visiting Professor of Practice, 2018-present

Newcastle University
Ph.D. in Programming Languages for High-Performance Computing, 2000

Newcastle University
B.Sc. (1st class Honours) Computing Science, 1996

Newcastle University

Recent & Upcoming Talks

The Pub-Time Parliament (2025)

Imagine a busy pub on a Friday night. It’s crowded, lots of people are talking at the same time. They’re all exchanging information with each other which makes tiny changes in their brains. Some folks are taking it easy on the drink, a few are a bit tipsy after one too many beers, and my mate Stevo is plastered, falling off his barstool. Classic Stevo. Now imagine trying to get this crowd to agree on something when they can’t even agree on which footy team is the worst this week and some of them can’t remember their own names.

In comparison, you’d think that getting a bunch of computers to agree on something, say a simple number, would be pretty easy. Computers can do way smarter things than agreeing upon a number, right? Sadly not. Computers are often in various states of being wrong or crashing. Much like a bunch of drunks they all want to talk at the same time, and they’re confident they have the best opinion.

How have we built such incredible systems on such a flaky foundation? In this talk we will visit classic consensus algorithms and see how they provide benefits of correctness and fault-tolerance for systems but at the price of reduced scalability. Then we’ll explore some new research which aims to provide both correctness and scalability for distributed systems. The talk will be interactive - you may need a drink yourself afterwards.

Yow! Australia 4 December 2025

Lies, Damn Lies, and AIs (2024)

Generative AI has taken the world by storm, but it’s not always a reliable helper. It makes up alternative facts and has difficulty with numbers and logical reasoning, all while exuding the confidence of a used car salesperson. In this talk we’ll see how to use Knowledge Graphs to improve accuracy. The audience will hear several technology patterns where deterministic knowledge graphs complement generative AI to create systems that are compelling and truthful.

Great International Developer Summit (GIDS) 24 April 2024

Distributed Consensus in 15 Minutes! (2024)

Getting two computers to agree on a value seems easy. One computer thinks of a value and tells the other. But in the real world where many values are shared, concurrency and race conditions are plentiful, and failures occur at inconvenient points, getting two (or more) computers to agree on a value is really hard work. In this talk we will discuss Raft, which is a humane protocol for dealing with all these challenges. Raft is designed for simplicity to disallow tricky edge cases, and with a simple extension we developed, can be made to work over the WAN.

Great International Developer Summit (GIDS) 23 April 2024

See all talks

Books

Jesús Barrasa, Jim Webber

June 2023

Building Knowledge Graphs - A Practitioner's Guide

A practitioner’s guide to building Knowledge Graphs for the enterprise.

PDF Buy at Amazon UK Buy at Amazon US

Jim Webber, Rik Van Bruggen

September 2020

Graph Databases for Dummies

A practical and humane introduction to graph databases and Neo4j, Graph Databases For Dummies walks you through modeling, querying, and importing graph data, all the way through to your first production system.

PDF

Ian Robinson, Jim Webber, Emil Eifrem

June 2015

Graph Databases

The first book on graph databases, now in its second edition. Provides in-depth coverage of graph modeling and querying, as well as thorough explanations of the internal workings of Neo4j.

PDF

Jim Webber, Savas Parastatidis, Ian Robinson

September 2010

REST in Practice

Why don’t typical enterprise projects go as smoothly as projects you develop for the Web? Does the REST architectural style really present a viable alternative for building distributed systems and enterprise-class applications?

In this insightful book, three SOA experts provide a down-to-earth explanation of REST and demonstrate how you can develop simple and elegant distributed hypermedia systems by applying the Web’s guiding principles to common enterprise computing problems. You’ll learn techniques for implementing specific Web technologies and patterns to solve the needs of a typical company as it grows from modest beginnings to become a global enterprise.

Code Project Buy at Amazon UK Buy at Amazon US

Sandeep Chatterjee, Jim Webber

November 2003

Developing Enterprise Web Services

This was one of the first books to demonstrate how to build (WS-*) Web Services with enterprise-class reliability and performance. This book takes a no-nonsense view of architecting and constructing enterprise-class Web services and applications. The authors assess the state of the art of the Web services platform circa 2004, offering best practices and new architectural patterns for taking advantage of Web Services.

While the architectural patterns in this book generally remain worthwhile today, the protocols and standards covered are now looking somewhat out of date, especially since there is a strong groundswell towards building RESTful systems on the Web rather than tunnelling through HTTP with XML payloads.

Career History

Chief Scientist

Neo4j

Oct 2010 – Present London

I discovered Neo4j and graph databases while working at ThoughtWorks, and the way they work seemed so natural to me that I became involved, initially as an open source contributor. As Neo4j gained ground commercially, I moved over to the company full time as Chief Scientist and executive manager. Initially I led the engineering team delivering the early versions of the database product, then worked for a long time building fault tolerance into clustering for the Neo4j database, and latterly I have split my time between working with academia on the next generation of transaction support for graphs, taking care of Neo4j’s customers and community, and authoring books and talks to help grow Neo4j’s market.

Responsibilities included:

Research manager
Executive manager
Research and development for fault-tolerant database clusters

Director of Professional Services

ThoughtWorks

Jan 2005 – Oct 2010 Sydney, London

I joined ThoughtWorks in Sydney as part of a small group of early employees. My initial responsibilities were to help drive sales of consultancy in finance, media, and telecoms and to deliver consulting and software delivery services to those clients. While at ThoughtWorks, I created a community of practice around SOA and developed a lightweight, iterative method of building service-oriented systems known as “Guerilla SOA.” After a move to London, I was promoted to Director of Professional Services, and continued to provide strategic technology advisory (internally and externally) and sales and marketing support, as well as building interesting software systems for clients.

Responsibilities included:

Leading technology delivery
Strategic technology advisory
Office of the CTO

Senior Research Associate

Newcastle University

Jan 2004 – Dec 2004 hosted by University of Sydney

I took a role as a Senior RA at Newcastle University (UK), working at the University of Sydney (Australia). My role involved the development of example systems of Web Services that demonstrated the utility of the WS-* protocols for Grid computing, rather than needing to develop a new, competing suite of protocols for that domain.

While at the University of Sydney, I also taught a Master’s degree course in Parallel Computing.

Responsibilities included:

Research on emerging Web services standards and Grid computing
WS-GAF protocol design and empirical validation
Co-author of SSDL
Outreach to Australian academia

Senior Developer

Bluestone/Hewlett-Packard/Arjuna

Oct 2000 – Oct 2003 Newcastle upon Tyne

I joined Bluestone Software’s Arjuna lab from my Ph.D., initially to work on transactional workflow middleware. As Web Services rose to prominence, I started a new team around transaction support for systems of Web Services. I led the development of this middleware through its acquisition by HP and its later spin-out as Arjuna again. Ultimately the Arjuna IP was sold to JBoss.

Responsibilities included:

Design and implementation of Web Services transaction protocols and platform-specific bindings (Java, .NET)
Web Services transaction protocols standardisation
Co-author of “Developing Enterprise Web Services”
Industry and partner outreach

Publications


Yingming Wang, Paul Ezhilchelvan, Jack Waudby, Jim Webber

June 2024 EPEW 2024 Transactions, Concurrency Control, Database Management Systems

Implementations Based Evaluation of No-Wait Approach for Resolving Conflicts in Databases

In this paper, we describe No-Wait concurrency control mechanisms to address conflict resolution and then comprehensively evaluate their performance under Read-Committed and Serializability isolation levels using an in-memory database system in various configurations and contention scenarios. Key performance metrics are percentage of transaction aborts and average latency for those who do not abort. Our evaluations affirm that the No-Wait approach indeed offers a cost-effective, practical alternative to traditional conflict resolution mechanisms.

PDF

Georgios Theodorakis, James Clarkson, Jim Webber

May 2024 SEAGraph 2024 Benchmark Testing, Data Engineering, Computer Crashes, Maintenance, Complexity Theory, Indexes, B+Trees, Graph Database Management Systems

An Empirical Evaluation of Variable-length Record B+Trees on a Modern Graph Database System

B+Trees are widely used as persistent index implementations for databases. They are often implemented in a way that allows the index to be in main memory while the indexed data remains on disk. Over the years, multiple optimization techniques have been proposed to improve the efficiency of B+Trees by accelerating the key search within a node or compressing data based on common prefixes. This paper describes our empirical research implementing such optimized B+Trees in Neo4j, a modern graph database management system (DBMS). We were able to confirm that the optimized versions lived up to their performance claims over plain B+Trees when benchmarked in isolation. However, we also found that incorporating them into a real DBMS yields marginal improvements only. This is partly because Neo4j is not index-heavy, typically only using indexes to find starting points for graph traversals. The other part is that integrating optimized indexes into the transactions and page-based storage components of Neo4j incurs a performance penalty (for reasons of crash-tolerance) compared to the standalone implementations. Given the additional implementation and maintenance complexity of optimized B+Trees, our research suggests that regular B+Trees remain the preferred general-purpose implementation.

PDF DOI

James Clarkson, Georgios Theodorakis, Jim Webber

May 2024 ICDE 2024 Graph Database, Query Language Runtime, Dynamic Optimization, JIT Compilation

BIFROST: A Future Graph Database Runtime

BIFROST is a novel query engine for graph databases that supports high-fidelity data modeling on arbitrary and evolving graph topologies. It dynamically optimizes queries according to meta-level changes in the underlying graph (i.e. changes in topology) without the need for any explicit schema. This is possible by using state-of-the-art techniques from managed programming languages, such as self-optimizing ASTs and deoptimization, to combine query optimization and compilation. The approach provides high fidelity for even highly irregular labeled property graphs and gives good performance when compared to other systems that depend on fixed schemas for query planning and optimization.

PDF DOI

Georgios Theodorakis, James Clarkson, Jim Webber

March 2024 EDBT 2024 Graph Databases, Temporal Graphs

Aion: Efficient Temporal Graph Data Management

Modern graph database management systems (DBMSs) can process highly dynamic labeled property graphs (LPGs) with many billions of relationships comfortably, but those systems often ignore the temporal dimension of data, how a graph evolved over time. Temporal analytics allow users to query and compute over the graph throughout its history so that valuable line-of-business data is always accessible and never lost. However, existing approaches tend to be ad-hoc and vary in performance depending on the size of the effective graph workload, such as local pattern matching or global graph algorithms. In this work, we describe Aion, a transactional temporal graph DBMS that generalizes previous approaches for LPGs. Aion extends Neo4j, a modern graph DBMS, incurring minimal performance overhead by decoupling the graph’s history from the latest graph version. To support efficient temporal analytics independently of workload characteristics, Aion adopts a hybrid temporal storage approach: (i) for fast full graph restoration at arbitrary time points, it uses TimeStore that indexes updates by time; (ii) for fine-grained graph history accesses, it uses LineageStore that indexes updates by entity identifiers. To enable incremental graph computations for improved latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments show that Aion achieves comparable or better performance versus existing non-transactional temporal systems and provides up to an order of magnitude speedup over classic Neo4j.

PDF DOI

Paul Ezhilchelvan, Isi Mitrani, Jim Webber

August 2023 Quantitative Evaluation of Systems (QEST) 2023 Databases, Concurrency Control, Weak Isolation

Analysis of an epoch commit protocol for distributed processing systems

A policy that reduces communication overheads by committing together all transactions completed within an interval of time is examined. A model of the system involving two queues served alternatively with preemptions is analysed in the steady-state under Markovian assumptions. An exact and easily implementable solution is derived and is used in order to determine performance measures such as average occupancy or average latency. The optimal length of the operative interval is evaluated numerically. A non-preemptive policy is simulated and is shown to be considerably less efficient than the preemptive one analysed here. A generalization to non-Markovian operative intervals is outlined.

PDF DOI

See all publications

Recent Academic Activity

2025

VLDB

2024

SEAGRAPH: Search, Exploration, and Analysis in Heterogeneous Datastores - Graph Edition

2019

Communications of the ACM

2018

Fifth International Workshop on Large-scale Graph Analysis, Management and Applications

2018

IEEE Software

See all academic activity

Social Media

Twitter and BlueSky

I have Twitter and BlueSky accounts which are a mixture of chatter with friends and colleagues, some computing science things, and a dash of left politics.

Following the example of Jonathan Dowland, my Twitter feed has a sliding window of 90 days’ worth of tweets. I like Twitter (somewhat) for conversations, but as a system of record much less so.

Facebook, Instagram, Snapchat etc.

I’m not on any other social media sites; I prefer email. If you meet a Jim Webber on any other platform, it’s not me.
