Bridges Between RDF and Property Graphs
Over the last few months, we have been working on a search system that enables exploratory search over a knowledge graph (i.e. an RDF graph with embedded ontologies and a reasoning engine). An essential part of the search system is the computation of centrality and similarity metrics for ranking and recommendations, respectively. However, SPARQL, the standard query language for RDF and triplestores, has some serious shortcomings in this regard.
Traversing an RDF Graph: SPARQL 1.1 [1] introduces property paths as a new feature. A property path is a possible route between two nodes of a graph, and it adds the ability to match the connectivity of two resources in an RDF graph. However, computing the distance between a pair of resources is not possible with property paths, because we can only check whether a path exists and do not learn what the path looks like. Hence, SPARQL is not well suited for tasks like computing the shortest path between a pair of resources or running graph algorithms such as PageRank.
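To make the limitation concrete, here is a minimal Python sketch over a toy list of triples. The resource names and the helper functions `neighbours`, `shortest_path` and `reachable` are made up for illustration and are not part of any SPARQL engine. `reachable` answers roughly the only question that a property path such as `ex:knows*` answers, namely whether some path exists. `shortest_path` is the breadth-first search that we would need for the distance, and which has to run outside of SPARQL.

```python
from collections import deque

# Toy RDF-like graph: (subject, predicate, object) triples with made-up names.
TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "dave"),
    ("alice", "knows", "erin"),
    ("erin", "knows", "dave"),
]


def neighbours(node, predicate):
    """Objects reachable from `node` over one edge labelled `predicate`."""
    return [o for s, p, o in TRIPLES if s == node and p == predicate]


def shortest_path(start, goal, predicate):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbours(path[-1], predicate):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def reachable(start, goal, predicate):
    """What a property path tells us: only whether a path exists."""
    return shortest_path(start, goal, predicate) is not None


if __name__ == "__main__":
    print(reachable("alice", "dave", "knows"))
    print(shortest_path("alice", "dave", "knows"))
```

The distance between two resources is then simply the number of edges on the returned path, i.e. `len(path) - 1`.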
Intractable Blank Nodes: Blank nodes are not stable identifiers. According to the SPARQL 1.1 specification [1], a blank node can be assigned an arbitrary label in the result of a query, which makes it impossible to track an entity referred to by a blank node across multiple SPARQL queries.
However, an RDF graph can also be represented as a property graph. A property graph is composed of vertices and directed, labeled edges, where both vertices and edges can have an arbitrary number of key/value pairs, also called properties, associated with them. While graph databases for property graphs have limited support for reasoning engines over OWL ontologies, they are widely used for modern data analytics applications.
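One possible way to picture this representation is sketched below in Python. It assumes a simple convention that is ours and not taken from any particular database: triples whose object is a literal become properties of the subject vertex, and all other triples become labeled edges. The classes `Vertex`, `Edge` and `Literal` and the function `to_property_graph` are hypothetical names for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Literal:
    """Marks an RDF literal value (assumed convention for this sketch)."""
    value: object


@dataclass
class Vertex:
    id: str
    properties: dict = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: dict = field(default_factory=dict)


def to_property_graph(triples):
    """Map (subject, predicate, object) triples to vertices and edges."""
    vertices, edges = {}, []
    for s, p, o in triples:
        subject = vertices.setdefault(s, Vertex(s))
        if isinstance(o, Literal):
            # Literal-valued triple: store it as a vertex property.
            # A list is used because RDF predicates can be multi-valued.
            subject.properties.setdefault(p, []).append(o.value)
        else:
            # Resource-valued triple: store it as a directed, labeled edge.
            vertices.setdefault(o, Vertex(o))
            edges.append(Edge(s, p, o))
    return vertices, edges


if __name__ == "__main__":
    triples = [
        ("ex:alice", "foaf:name", Literal("Alice")),
        ("ex:alice", "foaf:knows", "ex:bob"),
        ("ex:bob", "foaf:age", Literal(42)),
    ]
    vertices, edges = to_property_graph(triples)
    print(vertices)
    print(edges)
```

Other mappings are possible, for example keeping every literal as a vertex of its own, so this should be read as one design choice among several.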
But what if we could have a graph database that allows us to query RDF data with SPARQL, run complex data analytics with a query language for property graphs, and additionally supports reasoning for commonly used profiles of OWL? Unfortunately, there are some stumbling blocks that we have to overcome. At the time of writing, no query language is accepted as a standard for property graphs. GQL¹ is one of the emerging attempts to establish such a standard.
Nonetheless, a popular query language besides Cypher from Neo4J² is Gremlin. It is a functional graph traversal language for property graphs that can be executed on all TinkerPop-enabled systems. Apache TinkerPop is an open source, vendor-agnostic graph computing framework. As stated in the TinkerPop documentation³, “Every Gremlin traversal is composed of a sequence of (potentially nested) steps. A step performs an atomic operation on the data stream. Every step is either a map-step (transforming the objects in the stream), a filter-step (removing objects from the stream), or a sideEffect-step (computing statistics about the stream).”
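The three kinds of steps can be mimicked with plain Python generators. The sketch below is only an analogy and not the TinkerPop API: the helpers `map_step`, `filter_step`, `side_effect_step` and `traverse`, as well as the sample data, are invented for illustration. The filter and map part corresponds roughly to a Gremlin traversal such as `g.V().has('age', gt(30)).values('name')`.

```python
from collections import Counter


def map_step(fn):
    """Map-step: transform every object in the stream."""
    def step(stream):
        for obj in stream:
            yield fn(obj)
    return step


def filter_step(predicate):
    """Filter-step: remove objects that do not satisfy the predicate."""
    def step(stream):
        for obj in stream:
            if predicate(obj):
                yield obj
    return step


def side_effect_step(fn):
    """SideEffect-step: observe every object and pass it on unchanged."""
    def step(stream):
        for obj in stream:
            fn(obj)
            yield obj
    return step


def traverse(source, *steps):
    """Chain the steps lazily and collect the resulting stream."""
    stream = iter(source)
    for step in steps:
        stream = step(stream)
    return list(stream)


# Made-up vertices with a few properties.
PEOPLE = [
    {"name": "alice", "age": 34, "city": "Vienna"},
    {"name": "bob", "age": 27, "city": "Graz"},
    {"name": "carol", "age": 41, "city": "Vienna"},
]

city_counts = Counter()
names = traverse(
    PEOPLE,
    filter_step(lambda v: v["age"] > 30),                         # drop bob
    side_effect_step(lambda v: city_counts.update([v["city"]])),  # statistics
    map_step(lambda v: v["name"]),                                # vertex -> name
)

if __name__ == "__main__":
    print(names)
    print(city_counts)
```

The side effect only sees the objects that survived the preceding filter, which is why the order of steps in a traversal matters.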
Thakkar et al. propose in their recent publication [2] a translator from SPARQL queries to Gremlin graph traversals, which would allow us to execute SPARQL queries on any TinkerPop-enabled system, i.e. a graph database for property graphs such as JanusGraph⁴ or Amazon Neptune⁵. We would additionally have Gremlin at hand for computing the centrality and similarity metrics that our exploratory search system needs.
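To give an idea of the kind of metrics we have in mind, the following Python sketch computes PageRank by power iteration as one example of a centrality metric, and the Jaccard similarity of out-neighbourhoods as one example of a similarity metric. It only shows the arithmetic on a made-up edge list. The function names, the damping factor of 0.85 and the iteration count are our assumptions, and the code does not reflect how a TinkerPop-enabled system evaluates such metrics.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """PageRank by power iteration over a list of (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


def jaccard(edges, a, b):
    """Jaccard similarity of the out-neighbourhoods of `a` and `b`."""
    neighbours_a = {t for s, t in edges if s == a}
    neighbours_b = {t for s, t in edges if s == b}
    union = neighbours_a | neighbours_b
    return len(neighbours_a & neighbours_b) / len(union) if union else 0.0


# Made-up directed edges between resources.
EDGES = [
    ("alice", "carol"),
    ("bob", "carol"),
    ("dave", "carol"),
    ("carol", "alice"),
    ("alice", "bob"),
]

if __name__ == "__main__":
    ranks = pagerank(EDGES)
    for node in sorted(ranks, key=ranks.get, reverse=True):
        print(node, round(ranks[node], 3))
    print(jaccard(EDGES, "alice", "bob"))
```

In an exploratory search setting, the ranks could order search results, while the pairwise similarity could drive recommendations of related resources.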
Thakkar et al. [2] report promising results, and some of the deal-breaking limitations of the previous version [3] have been resolved. We would like to experiment with sparql-gremlin for our exploratory search system and report our experience and findings in a future blog post.
References
[1] Steve Harris, Andy Seaborne, and Eric Prud’hommeaux. SPARQL 1.1 query language. W3C recommendation, 21(10), 2013.
[2] Harsh Thakkar, Renzo Angles, Marko Rodriguez, Stephen Mallette, and Jens Lehmann. Let’s build Bridges, not Walls: SPARQL Querying of TinkerPop Graph Databases with Sparql-Gremlin. In 2020 IEEE 14th International Conference on Semantic Computing (ICSC), pages 408–415, San Diego, CA, USA, February 2020. IEEE.
[3] Harsh Thakkar, Dharmen Punjani, Jens Lehmann, and Sören Auer. Two for one: Querying property graph databases using sparql via gremlinator. In Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), pages 1–5, 2018.
- GQL Manifesto, https://gql.today/
- Cypher, Neo4J, https://neo4j.com/developer/cypher/
- Gremlin Documentation, Apache Tinkerpop, https://tinkerpop.apache.org/gremlin.html
- JanusGraph, https://janusgraph.org/
- Amazon Neptune, https://aws.amazon.com/en/neptune/