Dr. Jim Webber

Chief Scientist

Neo4j

Biography

I am Neo4j’s Chief Scientist and Visiting Professor at Newcastle University, UK. At Neo4j I lead the research group, working on a variety of database topics including query languages and runtimes, temporality, streaming, scale, and fault-tolerance. I have also co-authored several books on graph technology including Graph Databases - 1st and 2nd Editions (O’Reilly), Graph Databases for Dummies (Wiley), and Building Knowledge Graphs (O’Reilly).

Prior to Neo4j, I worked on fault-tolerant distributed systems, first at Newcastle University startup Arjuna and then for a variety of clients at global consulting firm ThoughtWorks. Along the way I co-authored the distributed systems books REST in Practice (O’Reilly) and Developing Enterprise Web Services - An Architect’s Guide (Prentice-Hall).

Interests

  • Graph Theory
  • Databases
  • Distributed Systems
  • Fault Tolerance

Education

  • Visiting Professor of Practice, 2018-present

    Newcastle University

  • Ph.D. in Programming Languages for High-Performance Computing, 2000

    Newcastle University

  • B.Sc. (1st class Honours) Computing Science, 1996

    Newcastle University

Recent & Upcoming Talks

Books

Career History

Chief Scientist

Neo4j

Oct 2010 – Present London

I discovered Neo4j and graph databases while working at ThoughtWorks, and the way they work seemed so natural to me that I became involved, initially as an open source contributor. As Neo4j gained ground commercially, I moved over to the company full time as Chief Scientist and executive manager. Initially I led the engineering team delivering the early versions of the database product, then worked for a long time building fault tolerance into clustering for the Neo4j database. Latterly I have split my time between working with academia on the next generation of transaction support for graphs, taking care of Neo4j’s customers and community, and authoring books and talks to help grow Neo4j’s market.

Responsibilities included:

  • Research manager
  • Executive manager
  • Research and development for fault-tolerant database clusters

Director of Professional Services

ThoughtWorks

Jan 2005 – Oct 2010 Sydney, London

I joined ThoughtWorks in Sydney as part of a small group of early employees. My initial responsibilities were to help drive sales of consultancy in finance, media, and telecoms and to deliver consulting and software delivery services to those clients. While at ThoughtWorks, I created a community of practice around SOA and developed a lightweight, iterative method of building service-oriented systems known as “Guerrilla SOA.” After a move to London, I was promoted to Director of Professional Services, and continued to provide strategic technology advice (internally and externally) and sales and marketing support, as well as building interesting software systems for clients.

Responsibilities included:

  • Leading technology delivery
  • Strategic technology advisory
  • Office of the CTO

Senior Research Associate

Newcastle University

Jan 2004 – Dec 2004 hosted by University of Sydney

I took a role as a Senior RA at Newcastle University (UK), working at the University of Sydney (Australia). My role involved developing example systems of Web Services that demonstrated the utility of the WS-* protocols for Grid computing, showing that there was no need to develop a new, competing suite of protocols for that domain.

While at the University of Sydney, I also lectured on a Master’s degree course in Parallel Computing.

Responsibilities included:

  • Research on emerging Web services standards and Grid computing
  • WS-GAF protocol design and empirical validation
  • Co-author of SSDL
  • Outreach to Australian academia

Senior Developer

Bluestone/Hewlett-Packard/Arjuna

Oct 2000 – Oct 2003 Newcastle upon Tyne

I joined Bluestone Software’s Arjuna lab from my Ph.D., initially to work on transactional workflow middleware. As Web Services rose to prominence, I started a new team around transaction support for systems of Web Services. I led the development of this middleware through the acquisition by HP and the later spin-out as Arjuna again. Ultimately the Arjuna IP was sold to JBoss.

Responsibilities included:

  • Design and implementation of Web Services transaction protocols and platform-specific bindings (Java, .NET)
  • Web Services transaction protocols standardisation
  • Co-author of “Developing Enterprise Web Services”
  • Industry and partner outreach

Publications

An Empirical Evaluation of Variable-length Record B+Trees on a Modern Graph Database System

B+Trees are widely used as persistent index implementations for databases. They are often implemented in a way that allows the index to be in main memory while the indexed data remains on disk. Over the years, multiple optimization techniques have been proposed to improve the efficiency of B+Trees by accelerating the key search within a node or compressing data based on common prefixes. This paper describes our empirical research implementing such optimized B+Trees in Neo4j, a modern graph database management system (DBMS). We were able to confirm that the optimized versions lived up to their performance claims over plain B+Trees when benchmarked in isolation. However, we also found that incorporating them into a real DBMS yields marginal improvements only. This is partly because Neo4j is not index-heavy, typically only using indexes to find starting points for graph traversals. The other part is that integrating optimized indexes into the transactions and page-based storage components of Neo4j incurs a performance penalty (for reasons of crash-tolerance) compared to the standalone implementations. Given the additional implementation and maintenance complexity of optimized B+Trees, our research suggests that regular B+Trees remain the preferred general-purpose implementation.

Aion: Efficient Temporal Graph Data Management

Modern graph database management systems (DBMSs) can process highly dynamic labeled property graphs (LPGs) with many billions of relationships comfortably, but those systems often ignore the temporal dimension of data, how a graph evolved over time. Temporal analytics allow users to query and compute over the graph throughout its history so that valuable line-of-business data is always accessible and never lost. However, existing approaches tend to be ad-hoc and vary in performance depending on the size of the effective graph workload, such as local pattern matching or global graph algorithms. In this work, we describe Aion, a transactional temporal graph DBMS that generalizes previous approaches for LPGs. Aion extends Neo4j, a modern graph DBMS, incurring minimal performance overhead by decoupling the graph’s history from the latest graph version. To support efficient temporal analytics independently of workload characteristics, Aion adopts a hybrid temporal storage approach: (i) for fast full graph restoration at arbitrary time points, it uses TimeStore that indexes updates by time; (ii) for fine-grained graph history accesses, it uses LineageStore that indexes updates by entity identifiers. To enable incremental graph computations for improved latency, Aion introduces a compute-efficient in-memory LPG representation. Our experiments show that Aion achieves comparable or better performance versus existing non-transactional temporal systems and provides up to an order of magnitude speedup over classic Neo4j.

Social Media

Twitter and BlueSky

I have Twitter and BlueSky accounts which are a mixture of chatter with friends and colleagues, some computing science things, and a dash of left politics.

Following the example of Jonathan Dowland, my Twitter feed has a sliding window of 90 days’ worth of tweets. I like Twitter (somewhat) for conversations, but as a system of record much less so.

Facebook, Instagram, Snapchat etc.

I’m not on any other social media sites; I prefer email. If you meet a Jim Webber on any other platform, it’s not me.