Replace resolver rocksdb with sqlite
jamesamcl authored Dec 20, 2024
1 parent 750f07e commit 70f0300
Showing 55 changed files with 1,580 additions and 793 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -1,6 +1,6 @@
# GrEBI (Graphs@EBI)

HPC pipeline to aggregate knowledge graphs from [EMBL-EBI resources](https://www.ebi.ac.uk/services/data-resources-and-tools), the [MONARCH Initiative KG](https://monarch-initiative.github.io/monarch-ingest/Sources/), [ROBOKOP](https://robokop.renci.org/), [Ubergraph](https://github.com/INCATools/ubergraph), and other sources into giant (multi-terabyte) transient Neo4j+Solr+RocksDB databases for querying.
HPC pipeline to aggregate knowledge graphs from [EMBL-EBI resources](https://www.ebi.ac.uk/services/data-resources-and-tools), the [MONARCH Initiative KG](https://monarch-initiative.github.io/monarch-ingest/Sources/), [ROBOKOP](https://robokop.renci.org/), [Ubergraph](https://github.com/INCATools/ubergraph), and other sources into giant (multi-terabyte) transient Neo4j+Solr databases for querying.

## Outputs

@@ -88,7 +88,7 @@ The pipeline is implemented as [Rust](https://www.rust-lang.org/) programs with
* Cliques of equivalent nodes are merged into single nodes
* Cliques of equivalent properties are merged into single properties (and for ontology-defined properties, the [qualified safe labels](https://github.com/VirtualFlyBrain/neo4j2owl/blob/master/README.md) are used)

The primary output of the pipeline is a [property graph](https://docs.oracle.com/en/database/oracle/property-graph/22.2/spgdg/what-are-property-graphs.html) for [Neo4j](https://github.com/neo4j/neo4j). The nodes and edges are also loaded into [Solr](https://solr.apache.org/) for full-text search and [RocksDB](https://rocksdb.org/) for id->object resolution.
The primary output of the pipeline is a [property graph](https://docs.oracle.com/en/database/oracle/property-graph/22.2/spgdg/what-are-property-graphs.html) for [Neo4j](https://github.com/neo4j/neo4j). The nodes and edges are also loaded into [Solr](https://solr.apache.org/) for full-text search and sqlite for id->compressed object resolution.



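The updated README line says that sqlite now handles id->compressed object resolution. This commit view does not include the sqlite schema or the lookup code, so the following is only a minimal, std-only sketch of the lookup contract under stated assumptions: each object is stored as compressed bytes keyed by its raw id (matching the id/blob pairs that `grebi_make_compressed_blob` emits), an in-memory `HashMap` stands in for the sqlite table, and decompression is left to the caller. `CompressedResolver` and its methods are hypothetical names, not part of GrEBI.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the sqlite id -> compressed object table.
/// Keys are raw id bytes; values are the compressed object bytes.
struct CompressedResolver {
    table: HashMap<Vec<u8>, Vec<u8>>,
}

impl CompressedResolver {
    fn new() -> Self {
        CompressedResolver { table: HashMap::new() }
    }

    /// Store one (id, compressed blob) pair. Inserting the same id again
    /// overwrites the previous blob.
    fn insert(&mut self, id: &[u8], compressed: Vec<u8>) {
        self.table.insert(id.to_vec(), compressed);
    }

    /// Look up the compressed bytes for one id. The caller would then
    /// decompress them (e.g. with a zlib decoder) to recover the object.
    fn resolve(&self, id: &[u8]) -> Option<&[u8]> {
        self.table.get(id).map(|v| v.as_slice())
    }

    /// Resolve several ids, preserving input order; missing ids give None.
    fn resolve_many(&self, ids: &[&[u8]]) -> Vec<Option<&[u8]>> {
        ids.iter().map(|id| self.resolve(id)).collect()
    }
}
```

Swapping the map for an sqlite table keyed on id would keep the same `resolve` contract while storing the data on disk instead of in memory.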
2 changes: 1 addition & 1 deletion dataload/01_ingest/grebi_ingest_sqlite/Cargo.toml
@@ -7,7 +7,7 @@ edition = "2021"
Inflector = "0.11.4"
clap = { version = "4.4.11", features = ["derive"] }
hex = "0.4.3"
rusqlite = "0.31.0"
rusqlite = "0.32.1"
serde_json = { version = "1.0.108", features=["preserve_order"] }
jemallocator = "0.5.4"

@@ -0,0 +1,13 @@
[package]
name = "grebi_make_compressed_blob"
version = "0.1.0"
edition = "2021"

[dependencies]
clap = { version = "4.4.11", features = ["derive"] }
grebi_shared = { path = "../../grebi_shared" }
flate2 = {version="1.0.28", features=["zlib-ng"]}
serde_json = { version = "1.0.108", features=["preserve_order"] }
jemallocator = "0.5.4"


@@ -0,0 +1,52 @@

use flate2::write::ZlibEncoder;
use flate2::Compression;
use grebi_shared::get_id;
use std::io::BufReader;
use std::io::BufRead;
use std::io::BufWriter;
use std::io;
use std::io::Write;

#[global_allocator]
static ALLOC: jemallocator::Jemalloc = jemallocator::Jemalloc;

fn main() {

    let stdin = io::stdin().lock();
    let mut reader = BufReader::new(stdin);

    let stdout = io::stdout().lock();
    let mut writer = BufWriter::new(stdout);

    let mut n: i64 = 0;

    let mut line: Vec<u8> = Vec::new();

    loop {

        line.clear();
        reader.read_until(b'\n', &mut line).unwrap();

        // read_until returns no bytes only at end of input
        if line.is_empty() {
            eprintln!("saw {} lines", n);
            break;
        }

        n += 1;

        // Each output record is:
        // [u32 LE id length][id bytes][u32 LE compressed length][compressed line]
        let id = get_id(&line);

        writer.write_all(&(id.len() as u32).to_le_bytes()).unwrap();
        writer.write_all(id).unwrap();

        // zlib-compress the whole input line (including its trailing newline, if any) at level 9
        let mut enc = ZlibEncoder::new(Vec::new(), Compression::new(9));

        enc.write_all(&line).unwrap();
        let compressed = enc.finish().unwrap();

        writer.write_all(&(compressed.len() as u32).to_le_bytes()).unwrap();
        writer.write_all(&compressed).unwrap();
    }

    // Flush explicitly so write errors surface instead of being ignored on drop
    writer.flush().unwrap();
}
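The program above writes a simple length-prefixed framing to stdout: for each input line, a little-endian `u32` id length, the id bytes, a little-endian `u32` compressed length, then the zlib-compressed line. The consumer that loads this stream into sqlite is not shown in this commit view, so the reader below is only a std-only sketch that assumes exactly that framing. `BlobRecord`, `read_record`, and `write_record` are hypothetical names, and decompressing `compressed` (for example with `flate2::read::ZlibDecoder`) is left out so the block builds without external crates.

```rust
use std::io::{self, Read, Write};

/// One record from the stream: the raw id and its compressed line.
#[derive(Debug, PartialEq)]
struct BlobRecord {
    id: Vec<u8>,
    compressed: Vec<u8>,
}

/// Read a little-endian u32 length prefix. Returns Ok(None) on a clean EOF
/// (no bytes at all) and an error if the prefix is cut off part-way.
fn read_len<R: Read>(r: &mut R) -> io::Result<Option<u32>> {
    let mut buf = [0u8; 4];
    let mut filled = 0;
    while filled < 4 {
        let n = match r.read(&mut buf[filled..]) {
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if n == 0 {
            if filled == 0 {
                return Ok(None);
            }
            return Err(io::Error::new(io::ErrorKind::UnexpectedEof, "truncated length prefix"));
        }
        filled += n;
    }
    Ok(Some(u32::from_le_bytes(buf)))
}

/// Read exactly `len` bytes; read_exact reports UnexpectedEof if the stream is short.
fn read_bytes<R: Read>(r: &mut R, len: u32) -> io::Result<Vec<u8>> {
    let mut v = vec![0u8; len as usize];
    r.read_exact(&mut v)?;
    Ok(v)
}

/// Read the next (id, compressed) record, or Ok(None) at end of stream.
fn read_record<R: Read>(r: &mut R) -> io::Result<Option<BlobRecord>> {
    let id_len = match read_len(r)? {
        Some(n) => n,
        None => return Ok(None),
    };
    let id = read_bytes(r, id_len)?;
    // Once an id has been read, the compressed-length prefix must follow.
    let c_len = read_len(r)?
        .ok_or_else(|| io::Error::new(io::ErrorKind::UnexpectedEof, "missing compressed length"))?;
    let compressed = read_bytes(r, c_len)?;
    Ok(Some(BlobRecord { id, compressed }))
}

/// Write one record in the same framing as the program above.
fn write_record<W: Write>(w: &mut W, id: &[u8], compressed: &[u8]) -> io::Result<()> {
    w.write_all(&(id.len() as u32).to_le_bytes())?;
    w.write_all(id)?;
    w.write_all(&(compressed.len() as u32).to_le_bytes())?;
    w.write_all(compressed)
}
```

Because the stream has no header or record count, end of input is recognised as a clean EOF just before a length prefix; an EOF anywhere else is reported as an error rather than silently dropping a partial record.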
14 changes: 0 additions & 14 deletions dataload/07_create_db/rocksdb/grebi_make_rocks/Cargo.toml

This file was deleted.

80 changes: 0 additions & 80 deletions dataload/07_create_db/rocksdb/grebi_make_rocks/src/main.rs

This file was deleted.
