Could this tool import csv to backend of cql and es ? [cassandra 3.11][janusgraph 0.3.0] #1

Open
juncaofish opened this issue Sep 26, 2018 · 10 comments

Comments

@juncaofish

juncaofish commented Sep 26, 2018

It seems the cql backend hasn't been implemented in the following code:

public class ProxyBulkLoader implements BulkLoader {


    private BulkLoader real;

    public ProxyBulkLoader(StandardJanusGraph graph){

        String backend = graph.getConfiguration().getConfiguration().get(STORAGE_BACKEND);

        if ("cassandrathrift".equals(backend)){
            real = new CassandraSSTableLoader();
        }else if ("hbase".equals(backend)){
            // ignore
        }else if ("bigtable".equals(backend)){
            // ignore
        }else {
            // any other backend, including cql, ends up here and no loader is assigned
        }
    }
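
In the constructor quoted above, only `cassandrathrift` gets a loader; for every other `storage.backend` value the `real` field is never assigned. A minimal sketch of a fail-fast check that could run before an import is shown here. `BackendCheck` and its methods are hypothetical names made up for illustration and are not part of janusgraph-util; the only thing taken from the thread is that `cassandrathrift` is the one backend with a loader.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper for illustration; not part of janusgraph-util.
class BackendCheck {

    // The only backend for which the quoted constructor assigns a loader.
    static final String SUPPORTED_BACKEND = "cassandrathrift";

    // Reads storage.backend from properties-file text; returns null if the key is absent.
    static String backendOf(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String value = props.getProperty("storage.backend");
        return value == null ? null : value.trim();
    }

    // True only when the quoted constructor would assign a loader.
    static boolean hasBulkLoader(String backend) {
        return SUPPORTED_BACKEND.equals(backend);
    }

    public static void main(String[] args) {
        String config = "storage.backend=cql\nstorage.cql.keyspace=graph1\n";
        String backend = backendOf(config);
        if (!hasBulkLoader(backend)) {
            System.out.println("No bulk loader for storage.backend=" + backend);
        }
    }
}
```

Checking up front like this would surface an unsupported backend before any files are written, instead of leaving the loader unset.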

My config file looks like this:

cache.db-cache = true
cache.db-cache-clean-wait = 20
cache.db-cache-time = 180000
cache.db-cache-size = 0.5

storage.backend=cql
storage.cql.keyspace=graph1
gremlin.graph=org.janusgraph.core.JanusGraphFactory
storage.hostname=xx

index.base.index-name=graph1
index.base.backend=elasticsearch
index.base.hostname=xx:xx
index.base.elasticsearch.create.ext.number_of_shards=3
index.base.elasticsearch.create.ext.number_of_replicas=1
index.base.elasticsearch.create.ext.shard.check_on_startup=true
index.base.elasticsearch.create.ext.refresh_interval=10s

query.batch=true
ids.block-size=100000000
ids.renew-percentage=0.3
schema.default=none
storage.batch-loading=true
@juncaofish juncaofish changed the title Could this tool import csv to backend with cassandra and es ? Could this tool import csv to backend of cql and es ? Sep 26, 2018
@dengziming
Owner

Yes, it can import CSV into a Cassandra backend, but I set storage.backend=cassandrathrift instead of storage.backend=cql. My config looks like this:

storage.backend=cassandrathrift
storage.hostname=localhost
storage.cassandra.keyspace=janusgraph
index.search.backend=elasticsearch
index.search.hostname=localhost
storage.buffer-size=10000
ids.block-size=10000000

@juncaofish
Author

I set the config with:

storage.backend=cassandrathrift

The import log indicates that the data has been imported:
[node]389[map]52[edge]272
.......... .......... .......... .......... .......... 5%
.......... .......... .......... .......... .......... 10%
.......... .......... .......... .......... .......... 15%
.......... .......... .......... .......... .......... 20%
.......... .......... .......... .......... .......... 25%
.......... .......... .......... .......... .......... 30%
.......... .......... .......... .......... .......... 35%
.......... .......... .......... .......... .......... 40%
.......... .......... .......... .......... .......... 45%
.......... .......... .......... .......... .......... 50%
.......... .......... .......... .......... .......... 55%
.......... .......... .......... .......... .......... 60%
.......... .......... .......... .......... .......... 65%
.......... .......... .......... .......... .......... 70%
.......... .......... .......... .......... .......... 75%
.......... .......... .......... .......... .......... 80%
.......... .......... .......... .......... .......... 85%
.......... .......... .......... .......... .......... 90%
.......... .......... .......... .......... .......... 95%
.......... .......... .......... .......... .......... 100%

IMPORT DONE in 1s 193ms.
Imported:
14 nodes
18 edges
31 properties
Peak memory usage: 8000060

but it ends with this exception:

InvalidRequestException(why:unconfigured table schema_columnfamilies)
java.lang.RuntimeException: Could not retrieve endpoint ranges: 
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:344)
        at org.apache.cassandra.io.sstable.SSTableLoader.stream(SSTableLoader.java:156)
        at org.apache.cassandra.tools.BulkLoader.main(BulkLoader.java:111)
        at janusgraph.util.batchimport.unsafe.load.cassandra.CassandraSSTableLoader.load(CassandraSSTableLoader.java:86)
        at janusgraph.util.batchimport.unsafe.load.ProxyBulkLoader.load(ProxyBulkLoader.java:39)
        at janusgraph.util.batchimport.unsafe.BulkLoad.main(BulkLoad.java:311)
Caused by: InvalidRequestException(why:unconfigured table schema_columnfamilies)
        at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50297)
        at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result$execute_cql3_query_resultStandardScheme.read(Cassandra.java:50274)
        at org.apache.cassandra.thrift.Cassandra$execute_cql3_query_result.read(Cassandra.java:50189)
        at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
        at org.apache.cassandra.thrift.Cassandra$Client.recv_execute_cql3_query(Cassandra.java:1734)
        at org.apache.cassandra.thrift.Cassandra$Client.execute_cql3_query(Cassandra.java:1719)
        at org.apache.cassandra.tools.BulkLoader$ExternalClient.init(BulkLoader.java:323)

When I query the graph, it seems nothing has been imported yet.
Could you help me figure out what's going on with the exception?

@dengziming
Owner

@juncaofish
I think your Cassandra version is 3.X?
The bulk load uses neither Thrift nor cqlsh, so the Cassandra version bundled with JanusGraph must match the Cassandra version you run. For example, if you are running Cassandra 2.0, your JanusGraph should depend on Cassandra 2.X.

You can either change the Cassandra dependency version of your JanusGraph to 3.X, or change your local Cassandra to 2.X (I am using 2.1.8).
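
The version rule described above can be sketched as a small check. This is only a rough illustration: `CassandraVersionCheck` is a hypothetical helper, and comparing major versions alone is an assumption made here; real compatibility may be stricter, since the on-disk format can also change between minor releases. The version strings in `main` are sample inputs, not values read from any cluster.

```java
// Hypothetical helper for illustration; not part of janusgraph-util or Cassandra.
class CassandraVersionCheck {

    // Extracts the major version number: "3.11" -> 3, "2.1.8" -> 2.
    static int major(String version) {
        String head = version.trim().split("\\.")[0];
        return Integer.parseInt(head);
    }

    // Assumption: only the major versions are compared.
    static boolean sameMajor(String serverVersion, String bundledVersion) {
        return major(serverVersion) == major(bundledVersion);
    }

    public static void main(String[] args) {
        String server = "3.11";   // sample: version of the running Cassandra
        String bundled = "2.1.8"; // sample: version of the Cassandra library on the classpath
        if (!sameMajor(server, bundled)) {
            System.out.println("Cassandra major versions differ: server " + server
                    + ", bundled " + bundled);
        }
    }
}
```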

@juncaofish
Author

We use Cassandra 3.11 and JanusGraph 0.3.0. Are they consistent?

@dengziming
Owner

@juncaofish
I haven't tried Cassandra 3.11, but I think it's OK. Just make sure the Cassandra version used by JanusGraph is the same as the Cassandra version you run.
I will make some changes and try importing into Cassandra 3.0. Please wait a few days, sorry!

@juncaofish
Author

@dengziming
Great. Thanks for your help.

@dengziming
Owner

dengziming commented Oct 11, 2018

@juncaofish
Hello, I have tried JanusGraph 3.0 and Cassandra 3.10. Bulk import is difficult with this combination, but you can do it using the code on the following branch:
https://github.com/dengziming/janusgraph-util/tree/feature_cassandra3

Unfortunately, you have to run the Java command first and then run the sstableloader command manually.

For example, if you run BulkLoad in your IDE, the args look like this:

--into /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db \ 
--janus-config-file janusgraph.properties \ 
--skip-duplicate-nodes true \ 
--skip-bad-relationships true \ 
--ignore-extra-columns true \ 
--ignore-empty-strings true \ 
--bad-tolerance 10000000 \ 
--processors 1 \ 
--id-type string \ 
--max-memory 2G \ 
--nodes:titan /Users/dengziming/opt/data/tmp/v_titan.csv \ 
--nodes:location /Users/dengziming/opt/data/tmp/v_location.csv \ 
--nodes:god /Users/dengziming/opt/data/tmp/v_god.csv \ 
--nodes:demigod /Users/dengziming/opt/data/tmp/v_demigod.csv \ 
--nodes:human /Users/dengziming/opt/data/tmp/v_human.csv \ 
--nodes:monster /Users/dengziming/opt/data/tmp/v_monster.csv \ 
--edges:father /Users/dengziming/opt/data/tmp/e_god_titan_father.csv \ 
--edges:father /Users/dengziming/opt/data/tmp/e_demigod_god_father.csv \ 
--edges:mother /Users/dengziming/opt/data/tmp/e_demigod_human_mother.csv \ 
--edges:lives /Users/dengziming/opt/data/tmp/e_god_location_lives.csv \ 
--edges:lives /Users/dengziming/opt/data/tmp/e_monster_location_lives.csv \ 
--edges:brother /Users/dengziming/opt/data/tmp/e_god_god_brother.csv \ 
--edges:battled /Users/dengziming/opt/data/tmp/e_demigod_monster_battled.csv \ 
--edges:pet /Users/dengziming/opt/data/tmp/e_god_monster_pet.csv 

After it finishes, you will see a log like this:

IMPORT DONE in 1s 11ms. 
Imported:
  12 nodes
  17 edges
  27 properties
Peak memory usage: 8000052
you are using cassandra 3.0, please use `sstableloader` to manually load into cassandra
sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore
sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore

Then run all of these commands to load the data into JanusGraph.
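
The two sstableloader invocations printed in the log follow the same pattern, so they can be assembled programmatically. A minimal sketch, assuming the directory layout seen in the log (`<into>/Nodes/0/<keyspace>/edgestore` and `<into>/Edges/0/<keyspace>/edgestore`) holds in general, which the thread does not confirm. `SstableLoaderCommands` is a hypothetical name, and the import path in `main` is a placeholder. The sketch only builds the argument lists; it does not run anything.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of janusgraph-util.
class SstableLoaderCommands {

    // Builds: sstableloader -d <host> <importDir>/<part>/0/<keyspace>/edgestore
    // The "0" segment and the "edgestore" table name are copied from the log above.
    static List<String> command(String host, String importDir, String part, String keyspace) {
        String dir = String.join("/", importDir, part, "0", keyspace, "edgestore");
        return Arrays.asList("sstableloader", "-d", host, dir);
    }

    public static void main(String[] args) {
        String importDir = "/path/to/import.db"; // placeholder for the --into directory
        for (String part : new String[] {"Nodes", "Edges"}) {
            System.out.println(String.join(" ",
                    command("localhost", importDir, part, "janusgraph")));
        }
    }
}
```

To actually execute a command, the returned list could be passed to `new ProcessBuilder(cmd).inheritIO().start()`, provided `sstableloader` is on the PATH.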

## here is the command
dengziming@dengzimings-MacBook-Pro:~/Desktop/worknotes$ sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore
objc[68147]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home//bin/java (0x1039524c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x1051bc4e0). One of the two will be used. Which one is undefined.
WARN  16:02:14,081 Only 26.308GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Nodes/0/janusgraph/edgestore/mc-1-big-Data.db to [localhost/127.0.0.1]
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.137KiB/s (avg: 0.137KiB/s)
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.000KiB/s (avg: 0.136KiB/s)

Summary statistics:
   Connections per host    : 1
   Total files transferred : 1
   Total bytes transferred : 0.581KiB
   Total duration          : 4255 ms
   Average transfer rate   : 0.136KiB/s
   Peak transfer rate      : 0.137KiB/s

WARN  16:02:18,949 JNA link failure, one or more native method will be unavailable.

## here is the command
dengziming@dengzimings-MacBook-Pro:~/Desktop/worknotes$ sstableloader -d localhost /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore
objc[68154]: Class JavaLaunchHelper is implemented in both /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home//bin/java (0x10a70f4c0) and /Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home/jre/lib/libinstrument.dylib (0x10a7d74e0). One of the two will be used. Which one is undefined.
WARN  16:02:21,344 Only 26.316GiB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots
Established connection to initial hosts
Opening sstables and calculating sections to stream
Streaming relevant part of /Users/dengziming/opt/soft/neo4j-community-3.3.3/data/databases/all2018701.db/Edges/0/janusgraph/edgestore/mc-2-big-Data.db to [localhost/127.0.0.1]
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.170KiB/s (avg: 0.170KiB/s)
progress: [localhost/127.0.0.1]0:1/1 100% total: 100% 0.000KiB/s (avg: 0.169KiB/s)

Summary statistics:
   Connections per host    : 1
   Total files transferred : 1
   Total bytes transferred : 0.605KiB
   Total duration          : 3575 ms
   Average transfer rate   : 0.169KiB/s
   Peak transfer rate      : 0.170KiB/s

WARN  16:02:25,488 JNA link failure, one or more native method will be unavailable.

Then I query JanusGraph:

gremlin> g.V().count()
==>12
gremlin> saturn = g.V().has("name", "saturn").next();
==>v[4096]
gremlin> g.V(saturn).in("father").in("father").values("name")
==>hercules

So it works. Good luck!

@juncaofish juncaofish changed the title Could this tool import csv to backend of cql and es ? Could this tool import csv to backend of cql and es ? [cassandra 3.11][janusgraph 0.3.0] Oct 11, 2018
@cljxhouse

Hello, the latest code now uses TxImportStoreImpl to load data. Can this tool add support for cql based on this?
Currently I am getting this exception:
janusgraph-import/janusgraph.properties
Input error: Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
Caused by:Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
java.lang.IllegalArgumentException: Could not find implementation class: org.janusgraph.diskstorage.cql.CQLStoreManager
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:60)
at org.janusgraph.diskstorage.Backend.getImplementationClass(Backend.java:476)
at org.janusgraph.diskstorage.Backend.getStorageManager(Backend.java:408)
at org.janusgraph.graphdb.configuration.GraphDatabaseConfiguration.<init>(GraphDatabaseConfiguration.java:1254)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:160)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:131)
at org.janusgraph.core.JanusGraphFactory.open(JanusGraphFactory.java:78)
at janusgraph.util.batchimport.unsafe.graph.GraphUtil.getGraph(GraphUtil.java:41)
at janusgraph.util.batchimport.unsafe.BulkLoad.getGraph(BulkLoad.java:417)
at janusgraph.util.batchimport.unsafe.BulkLoad.main(BulkLoad.java:284)
Caused by: java.lang.ClassNotFoundException: org.janusgraph.diskstorage.cql.CQLStoreManager
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
at org.janusgraph.util.system.ConfigurationUtil.instantiate(ConfigurationUtil.java:56)
... 9 more

@dengziming
Owner

@cljxhouse If you use TxImportStoreImpl to load data into Cassandra, you can use cql to persist the data.

If you get the "Could not find implementation class" error, you can solve it by adding the janusgraph-cql dependency to pom.xml.
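
The ClassNotFoundException in the trace means the CQL store manager class is not on the classpath at runtime. A small probe like the one below can confirm whether adding the dependency took effect. `ClasspathProbe` is a hypothetical helper written for this illustration; the class name it looks up is the one from the stack trace.

```java
// Hypothetical helper for illustration; not part of janusgraph-util.
class ClasspathProbe {

    // Returns true if the named class can be located without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the stack trace in this thread.
        String cql = "org.janusgraph.diskstorage.cql.CQLStoreManager";
        System.out.println(cql + " on classpath: " + isOnClasspath(cql));
    }
}
```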

@fanweber

When I use sstableloader, I get an error:

[root@localhost janusgraph-import]# sstableloader -d 127.0.0.1 --verbose data/all20190709_01.db/Nodes/0/janusgraph/edgestore
Keyspace system_schema does not exist
com.datastax.driver.core.exceptions.InvalidQueryException: Keyspace system_schema does not exist

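The Cassandra version is 3.11.4.
The janusgraph-util version is janusgraph-util-feature_cassandra3.

I don't know whether this is a version or configuration problem, or whether sstableloader no longer supports Cassandra.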
