A graph database, in computer science, is a database that uses a graph structure with nodes, edges, and properties to represent and store data. Graph databases provide index-free adjacency, a concept that allows graphs to be stored and processed efficiently.[1] In such a system, every node is directly connected to its neighboring nodes, with no additional index needed to find them. In other words, the relationships between nodes are stored in the nodes themselves, so a related node can be reached quickly without a lookup through a separate index. This approach is particularly useful in applications that demand high performance, such as recommendations or social network analysis.
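Index-free adjacency can be illustrated with a minimal sketch. The `Node` class, the `connect` method, and the `edge_index` table below are hypothetical names chosen for illustration and do not correspond to the storage layer of any particular product. The sketch contrasts following references held by the node itself with looking neighbours up in a separate index.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Node:
    """A node that stores direct references to its neighbours (hypothetical model)."""
    name: str
    neighbours: list["Node"] = field(default_factory=list)

    def connect(self, other: "Node") -> None:
        # The relationship is recorded inside both nodes (undirected edge).
        self.neighbours.append(other)
        other.neighbours.append(self)


def neighbours_via_index(edge_index: dict[str, list[str]], name: str) -> list[str]:
    """Index-based alternative: neighbours are found through a separate lookup table."""
    return edge_index.get(name, [])


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
alice.connect(bob)
bob.connect(carol)

# Index-free adjacency: follow the references held by the node itself.
print([n.name for n in bob.neighbours])  # ['alice', 'carol']

# Index-based lookup: the same answer, but obtained through a separate structure.
edge_index = {"alice": ["bob"], "bob": ["alice", "carol"], "carol": ["bob"]}
print(neighbours_via_index(edge_index, "bob"))  # ['alice', 'carol']
```

In the first variant, the cost of reaching a neighbour does not depend on the size of a global index, which is the property the paragraph above describes.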
The structure of a graph database is as follows:
Graph database
Each node represents an entity such as a person, business, account, or any other item to be tracked.
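The structure of nodes, edges, and properties can be sketched as a small in-memory property graph. All class names, labels, and property values below (`PropertyGraph`, `WORKS_AT`, `Acme`, and so on) are assumptions made for illustration only. The edge list is scanned linearly for clarity, so this sketch shows the data model rather than an efficient storage layout.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Node:
    """An entity such as a person, business, or account."""
    node_id: int
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    """A directed, labelled relationship between two nodes."""
    source: int
    target: int
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    def __init__(self) -> None:
        self.nodes: dict[int, Node] = {}
        self.edges: list[Edge] = []

    def add_node(self, node_id: int, label: str, **properties: Any) -> Node:
        node = Node(node_id, label, properties)
        self.nodes[node_id] = node
        return node

    def add_edge(self, source: int, target: int, label: str, **properties: Any) -> Edge:
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        edge = Edge(source, target, label, properties)
        self.edges.append(edge)
        return edge

    def neighbours(self, node_id: int, label: Optional[str] = None) -> list[Node]:
        # Linear scan over all edges, kept simple on purpose.
        return [
            self.nodes[e.target]
            for e in self.edges
            if e.source == node_id and (label is None or e.label == label)
        ]


graph = PropertyGraph()
graph.add_node(1, "Person", name="Alice")
graph.add_node(2, "Business", name="Acme")
graph.add_node(3, "Account", number="0001")
graph.add_edge(1, 2, "WORKS_AT", since=2020)
graph.add_edge(1, 3, "OWNS")

print([n.properties for n in graph.neighbours(1, "WORKS_AT")])  # [{'name': 'Acme'}]
```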
Properties
Compared with relational databases, graph databases are often faster for associative data sets and map more directly to the structure of object-oriented applications. They can scale more naturally to large data sets because they typically do not require expensive join operations. Because they depend less on a rigid schema, they are better suited to managing ad hoc and changing data with evolving schemas. Conversely, relational databases are typically faster at performing the same operation on a large number of data elements.
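The difference between a join and a traversal can be sketched with a "friends of friends" query. The example uses Python's standard `sqlite3` module for the relational side and a plain dictionary of adjacency lists for the graph side. The table name `friend` and the sample names are hypothetical, and the sketch is not a benchmark.

```python
import sqlite3

# Directed "is a friend of" pairs (illustrative data).
friendships = [("alice", "bob"), ("bob", "carol"), ("bob", "dave"), ("carol", "erin")]

# Relational approach: one edge table, queried with a self-join.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE friend (person TEXT, friend TEXT)")
conn.executemany("INSERT INTO friend VALUES (?, ?)", friendships)
rows = conn.execute(
    """
    SELECT DISTINCT f2.friend
    FROM friend AS f1
    JOIN friend AS f2 ON f1.friend = f2.person
    WHERE f1.person = ?
    ORDER BY f2.friend
    """,
    ("alice",),
).fetchall()
via_join = [row[0] for row in rows]
conn.close()

# Graph approach: each person holds a list of friends, and the query walks two hops.
adjacency: dict[str, list[str]] = {}
for person, friend in friendships:
    adjacency.setdefault(person, []).append(friend)

via_traversal = sorted(
    {fof for f in adjacency.get("alice", []) for fof in adjacency.get(f, [])}
)

print(via_join)       # ['carol', 'dave']
print(via_traversal)  # ['carol', 'dave']
```

Each additional hop adds another self-join in the relational version, whereas the traversal version only follows the adjacency lists of the nodes it actually visits.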
Graph database projects
The following is a list of several well-known graph database projects:
A distributed multi-model document store and graph database. It is highly scalable, supports ACID with full transaction support, and includes a built-in graph explorer.
A high-performance and scalable graph database management system from Sparsity Technologies, a technology transition company from DAMA-UPC. Its main characteristic is its query performance for the retrieval and exploration of large networks. Sparksee 5 mobile is the first graph database for mobile devices.
A high performance graph store using natively implemented graph data structures and primitives for achieving superior efficiency. IBM System G Native Store can handle various simple graphs, property graphs, and RDF graphs, in terms of storage, analytics, and visualization. Native Store is accessible from most programming languages by providing APIs in C++, Java (Tinkerpop/Blueprints), and Python. Its gShell graph command collection and the Native Store REST APIs provide language-free interfaces.
MapGraph is Massively Parallel Graph processing on GPUs. The MapGraph API makes it easy to develop high performance graph analytics on GPUs. The API is based on the Gather-Apply-Scatter (GAS) model as used in GraphLab.[7] MapGraph is up to two orders of magnitude faster than parallel CPU implementations on up to 24 CPU cores and has performance comparable to a state-of-the-art manually optimized GPU implementation. New algorithms that fully exploit the data-level parallelism of the GPU can be implemented in a few hours and offer throughput of up to 3.3 billion traversed edges per second on a single GPU[8] and up to 30 billion traversed edges per second on a cluster with 64 GPUs.[9]
A highly scalable open source graph database that supports ACID, has high-availability clustering for enterprise deployments, and comes with a web-based administration tool that includes full transaction support and a visual node-link graph explorer.[11] Neo4j is accessible from most programming languages using its built-in REST web API interface. Neo4j is the most popular graph database in use today.[12]
A highly scalable open source graph database.[13] Orly is accessible from most programming languages using its built-in REST web API interface. Orly is a popular graph database in use today.
A hybrid database server handling RDF and other graph data, RDB/SQL data, XML data, filesystem documents/objects, and free text. May be deployed as a local embedded instance (as used in the Nepomuk Semantic Desktop), a single-instance network server, or a shared-nothing elastic-cluster multiple-instance networked server.[14]
1) RDF Semantic Graph: comprehensive W3C RDF graph management in Oracle Database with native reasoning and triple-level label security. 2) Network Data Model property graph: for physical/logical networks with persistent storage and a Java API for in-memory graph analytics.
OrientDB is a second-generation distributed graph database that combines the flexibility of documents in one product, under an open source, commercially friendly license (Apache 2 license). It has multi-master replication and sharding, and supports schema-less, schema-full, and schema-mixed modes. It has a strong security profiling system based on users and roles and supports SQL among its query languages. Thanks to the SQL layer, it is straightforward to use for those skilled in the relational database world.
A graph database engine based entirely on Semantic Web standards from W3C: RDF, RDFS, OWL, SPARQL. OWLIM Lite is an "in memory" engine. OWLIM SE is a robust standalone database engine. OWLIM Enterprise is a clustered version that offers horizontal scalability, failover support, and other enterprise features.
SPARQLCity produces SPARQLVerse: A standards and Hadoop based analytic graph engine for performing rich business analytics on structured and semi-structured data.
A high performance, multi-purpose, highly scalable and extensible MPP database incorporating patented engines supporting native SQL, MapReduce and Graph data storage and manipulation. An extensive set of analytical function libraries and data visualization capabilities are also provided.
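The Gather-Apply-Scatter (GAS) model mentioned in the MapGraph entry above can be illustrated with a small single-threaded sketch. This is not MapGraph's or GraphLab's API. The function name `pagerank_gas`, its parameters, and the choice of PageRank as the example algorithm are assumptions made for illustration, and the sketch assumes every vertex has at least one outgoing edge.

```python
def pagerank_gas(out_edges, damping=0.85, tolerance=1e-9, max_iterations=100):
    """PageRank written as gather, apply, and scatter phases (illustrative only)."""
    vertices = list(out_edges)
    n = len(vertices)

    # Precompute incoming edges so the gather phase can read from in-neighbours.
    in_edges = {v: [] for v in vertices}
    for src, targets in out_edges.items():
        for dst in targets:
            in_edges[dst].append(src)

    rank = {v: 1.0 / n for v in vertices}
    active = set(vertices)

    for _ in range(max_iterations):
        if not active:
            break
        # Gather: each active vertex sums the contributions of its in-neighbours.
        gathered = {
            v: sum(rank[u] / len(out_edges[u]) for u in in_edges[v])
            for v in active
        }
        # Apply: each active vertex computes its new value from the gathered sum.
        changed = set()
        new_rank = dict(rank)
        for v in active:
            value = (1.0 - damping) / n + damping * gathered[v]
            if abs(value - rank[v]) > tolerance:
                changed.add(v)
            new_rank[v] = value
        rank = new_rank
        # Scatter: vertices whose value changed activate their out-neighbours.
        active = {dst for v in changed for dst in out_edges[v]}

    return rank


cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
ranks = pagerank_gas(cycle)
print({v: round(r, 4) for v, r in ranks.items()})  # {'a': 0.3333, 'b': 0.3333, 'c': 0.3333}
```

On a GPU or a cluster, the three phases would run in parallel over vertices and edges. Here they run as ordinary loops so that the control flow stays visible.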
RDF graph: Triple & Quad (named graphs); expandable column store
SPARQL, XMLA, ODBC, JDBC, ADO.NET, OLE DB, Jena, Sesame, Virtuoso PL/SQL, Java, Python, Perl, PHP, HTTP, etc.
SPARQL 1.1; SPARQL web service endpoint; SQL; others
Pivot Viewer (Silverlight or HTML5); OpenLink Data Explorer; SPARQL-compliant tools; Apache Jena-based tools; XML & JSON-based tools; SQL based tools
ACID
Internal column-store or row-store (depending on licensure), hybrid RDF/SQL/RDB engine
Infinite via Commercial Edition's Cluster Module elastic cluster functionality; simple master-slave clustering of single-server instances also an option.
Parallel load, query, inference; Query controls; Scales from PC to Oracle Exadata; Supports Oracle Real Application Clusters and Oracle Database 8 exabytes
Apache Hama - a pure BSP (Bulk Synchronous Parallel) computing framework on top of HDFS (Hadoop Distributed File System) for massive scientific computations such as matrix, graph, and network algorithms.
Faunus - a Hadoop-based graph computing framework that uses Gremlin as its query language. Faunus provides connectivity to Titan, Rexster-fronted graph databases, and to text/binary graph formats stored in HDFS. Faunus is developed by Aurelius.
GraphBase - Enterprise Edition supports embedding of callable Java Agents within the vertices of a distributed graph.
HipG - a library for high-level parallel processing of large-scale graphs. HipG is implemented in Java and is designed for distributed-memory machines.
IBM System G Graph Analytics Toolkit - a comprehensive graph analytics library consisting of network topological analysis tools, graph matching and search tools, and graph path and flow tools. It has been applied to various use cases and industry solutions.
InfiniteGraph - a commercially available distributed graph database that supports parallel load and parallel queries.
OpenLink Virtuoso - the shared-nothing Cluster Edition supports distributed graph data processing.
Oracle Spatial and Graph - loading, inferencing, and querying workloads are automatically and transparently distributed across the nodes in an Oracle Real Application Cluster, Oracle Exadata Database Machine, and Oracle Database Appliance.
PowerLyra - a distributed graph analytics system based on GraphLab that uses differentiated graph computation and partitioning on skewed (e.g. power-law and bipartite) graphs, dynamically applying different computation and partition strategies for different vertices.
Cyclops - a computation- and communication-efficient graph processing system with significantly low communication cost.
Imitator - a reliable distributed graph processing system with replication-based fault tolerance.
Sedge - a framework for distributed large graph processing and graph partition management (including an open source version of Google's Pregel).
Mizan - an optimized Pregel clone that can be deployed easily on Amazon EC2, local clusters, stand-alone Linux systems, and supercomputers (IBM BlueGene/P). It utilizes runtime graph repartitioning between iterations to provide dynamic load balancing for better algorithm performance.[16]
Weaver - a fast and scalable graph store designed specifically for dynamically changing graphs.
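Several entries in the list above refer to Bulk Synchronous Parallel (BSP) computation or to Pregel-style systems. The following single-process sketch illustrates that vertex-centric model: computation proceeds in supersteps, vertices exchange messages that become visible only after a barrier, and a vertex that has nothing new to send goes quiet. It is not the API of Apache Hama, Sedge, Mizan, or Pregel. The function name `pregel_max_value` and the choice of maximum-value propagation as the example are assumptions made for illustration.

```python
def pregel_max_value(values, edges):
    """Propagate the largest vertex value to all reachable vertices in supersteps."""
    state = dict(values)

    # Superstep 0: every vertex sends its own value to its out-neighbours.
    inbox = {v: [] for v in state}
    for v, value in state.items():
        for dst in edges.get(v, []):
            inbox[dst].append(value)
    supersteps = 1

    # Continue while any message is in flight.
    while any(inbox.values()):
        outbox = {v: [] for v in state}
        for v, messages in inbox.items():
            if not messages:
                continue  # no incoming messages: the vertex stays inactive
            best = max(messages)
            if best > state[v]:
                # The value improved, so the vertex notifies its out-neighbours.
                state[v] = best
                for dst in edges.get(v, []):
                    outbox[dst].append(best)
            # Otherwise the vertex sends nothing, which acts as a vote to halt.
        inbox = outbox  # barrier: messages are delivered in the next superstep
        supersteps += 1

    return state, supersteps


values = {"a": 3, "b": 6, "c": 2, "d": 1}
edges = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
final_state, _ = pregel_max_value(values, edges)
print(final_state)  # {'a': 6, 'b': 6, 'c': 6, 'd': 6}
```

In a distributed system, the vertices would be partitioned across machines and the barrier would be a network-wide synchronization point. The sketch keeps everything in one process to show only the control flow.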
GPGPU Graph Processing
Medusa - a framework for graph processing using Graphics Processing Units (GPUs) on both shared memory and distributed environments. Medusa allows users with no GPU programming expertise to leverage GPUs for graph processing.
APIs and Graph Query/Programming Languages
Bounds Language - terse C-style syntax which initiates concurrent traversals in GraphBase and supports interaction between them.
Pipes - a lazy dataflow framework written in Java that forms the foundation for various property graph traversal languages.
Pixy - a declarative graph query language that works on any Blueprints-compatible graph database.
Pygr - a Python API for large-scale analysis of biological sequences and genomes, with alignments represented as graphs.
Rexster - a graph database server that provides a REST or binary protocol API (RexPro). Supports Titan, Neo4j, OrientDB, Dex, and any TinkerPop/Blueprints-enabled graph.
RDFSharp - a .NET API for modeling RDF graphs, storing them on many SQL databases (Firebird, MySQL, PostgreSQL, SQL Server, SQLite) and querying them with SPARQL.
SPASQL - an extension of the SQL standard, allowing execution of SPARQL queries within SQL statements, typically by treating them as subquery or function clauses. This also allows SPARQL queries to be issued through "traditional" data access APIs (ODBC, JDBC, OLE DB, ADO.NET, etc.).