Jim Webber

Chief Scientist

Neo4j

Biography

I am Neo4j’s Chief Scientist and Visiting Professor at Newcastle University, UK. At Neo4j I lead the research group, working on a variety of database topics including query languages and runtimes, temporality, streaming, scale, and fault-tolerance. I have also co-authored several books on graph technology, including Graph Databases, 1st and 2nd Editions (O’Reilly), Graph Databases for Dummies (Wiley), and Building Knowledge Graphs (O’Reilly).

Prior to Neo4j, I worked on fault-tolerant distributed systems, first at Newcastle University startup Arjuna and then for a variety of clients at global consulting firm ThoughtWorks. Along the way I co-authored the distributed systems books REST in Practice (O’Reilly) and Developing Enterprise Web Services - An Architect’s Guide (Prentice-Hall).

Interests

  • Graph Theory
  • Databases
  • Distributed Systems
  • Fault Tolerance

Education

  • Visiting Professor of Practice, 2018-present

    Newcastle University

  • Ph.D. in Programming Languages for High-Performance Computing, 2000

    Newcastle University

  • B.Sc. (1st class Honours) Computing Science, 1996

    Newcastle University

Career History


Chief Scientist

Neo4j

Oct 2010 – Present London

I encountered Neo4j while working at ThoughtWorks, and the data model seemed so natural that I became involved as an open source contributor, building the first Neo4j Server implementation. As Neo4j gained ground commercially, I moved over to the company full time as Chief Scientist and executive manager. Initially I led the engineering team delivering the early versions of the database product, then worked for a long time building fault-tolerant clustering for the Neo4j database. I currently lead Neo4j Research, an empirical systems-focussed group that provides optionality for the long-term future of Neo4j. We work alongside our engineering team and academic researchers on the next generation of graph data systems.

Responsibilities included:

  • Research manager
  • Executive manager
  • Empirical research on scalable fault-tolerant methods

Director of Professional Services

ThoughtWorks

Jan 2005 – Oct 2010 Sydney, London

I joined ThoughtWorks in Sydney as part of a small group of early employees. My initial responsibilities were to help drive consultancy sales in finance, media, and telecoms, and to deliver consulting and software delivery services. While at ThoughtWorks, I created a community of practice around SOA and developed a lightweight, iterative method of building service-oriented systems known as “Guerrilla SOA.” After a move to London, I was promoted to Director of Professional Services, and continued to provide strategic technology advisory (internally and externally) and sales and marketing support, as well as building large-scale software systems for clients.

Responsibilities included:

  • Leading technology delivery
  • Strategic technology advisory
  • Office of the CTO

Senior Research Associate

Newcastle University

Jan 2004 – Dec 2004 hosted by University of Sydney

I took a role as a Senior RA at Newcastle University (UK), working at the University of Sydney (Australia). My role involved developing example systems of Web Services that demonstrated the utility of the WS-* protocols for Grid computing, rather than needing to develop a new, competing suite of protocols for that domain.

While at the University of Sydney, I also lectured on a Master’s degree course in Parallel Computing.

Responsibilities included:

  • Research on emerging Web services standards and Grid computing
  • WS-GAF protocol design and empirical validation
  • Co-author of SSDL
  • Outreach to Australian academia

Senior Developer

Bluestone/Hewlett-Packard/Arjuna

Oct 2000 – Oct 2003 Newcastle upon Tyne

I joined Bluestone Software’s Arjuna lab straight from my Ph.D., initially to work on transactional workflow middleware. As Web Services rose to prominence, I started a new team around transaction support for systems of Web Services. I led the development of this middleware through the acquisition by HP and the later spin-out as Arjuna again. Ultimately the Arjuna IP was sold to JBoss.

Responsibilities included:

  • Design and implementation of Web Services transaction protocols and platform-specific bindings (Java, .NET)
  • Web Services transaction protocols standardisation
  • Co-author of “Developing Enterprise Web Services”
  • Industry and partner outreach

Publications

RIOT: Replicated Independently-Ordered Transactions

Consensus protocols such as Raft and Paxos implement state machine replication through a single leader that enforces a totally ordered log. While this simplifies correctness, it introduces sequential bottlenecks that restrict scalability. We present RIOT, a generalized consensus protocol that eliminates centralized leadership and log replication in favor of decentralized coordination over a directed acyclic graph (DAG) of entries. RIOT guarantees that all servers maintain a logically identical DAG, preserving order where conflicts require it while allowing commutative operations to execute concurrently. RIOT is motivated by our work on distributed graph databases, which must guarantee reciprocal consistency for edges that span shards. Unlike specialized transaction protocols, RIOT makes no assumptions about concurrency control or transaction models. It provides a replicated state machine abstraction that integrates cleanly with transactional databases, treating DAG entries as transaction placeholders. Both single-phase and two-phase variants are supported, ensuring atomic agreement on entries and their ordering constraints. We integrate RIOT with Neo4j and evaluate it against Neo4j’s production Raft implementation. For common workloads, RIOT delivers up to 2.5× higher throughput and 2.3× lower tail latency while matching the strong consistency guarantees of log-based consensus. In doing so, RIOT demonstrates how consensus can be generalized to unlock scalability for transactional databases at scale.

TuskFlow: An Efficient Graph Database for Long-Running Transactions

Mammoth transactions, which involve long-running operations that access many items, are common in graph workloads. Graph analytics tasks, including pattern matching and graph algorithms, can generate large read-write operations that impact significant portions of data, which makes their execution challenging under strict isolation guarantees. Consequently, we face an apparent trade-off between ensuring high isolation and achieving high performance, forcing users to choose between the two. In this work, we present TuskFlow, an experimental graph database based on Neo4j, designed to efficiently handle mammoth transactions on graphs (the technique is applicable to other models such as relational) while maintaining existing transactional semantics. TuskFlow employs a deterministic protocol that safely reorders regular transactions around mammoths within an epoch. Our protocol supports parallel mammoth execution inspired by graph-parallel algorithms. To minimize conflicts with regular transactions, TuskFlow introduces query- and workload-aware optimizations, including graph entity tagging and partitioning. Our experiments demonstrate that, unlike traditional protocols like two-phase locking or MVCC, TuskFlow avoids blocking write transactions and improves tail latency by up to 45x.

Social Media

Twitter and BlueSky

I have Twitter and BlueSky accounts which are a mixture of chatter with friends and colleagues, some computing science things, and a dash of left politics.

Following the example of Jonathan Dowland, my Twitter feed has a sliding window of 90 days’ worth of tweets. I like Twitter (somewhat) for conversations, but as a system of record much less so.

Facebook, Instagram, Snapchat etc.

I’m not on any other social media sites; I prefer email. If you meet a Jim Webber on any other platform, it’s not me.