Indexing more than 250M rows from Hive to Solr #23
Hi all,

We are trying to index more than 250 million rows from a Hive table (ORC format), but we have noticed that the indexing is too slow.

We have 9 Solr nodes (9 shards and 2 replicas per shard), and we have set the maxIndexingThreads parameter to 128 and ramBufferSizeMB to 60 MB.

While running the INSERT INTO on the external table (which uses the Hive SerDe), the servers' CPUs stay idle and the indexing throughput is below 1 million documents per hour.

Since the servers are idle, how can we speed this up? We have plenty of CPU and RAM, but we cannot get the indexing process to use them.

Any suggestions? Would configuring any client-side parameters help use all the threads?

Thanks in advance.

PS: We have set the commit (auto and soft) to 10 minutes or 1 million documents.
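For context, here is a minimal sketch of where the settings mentioned above live in solrconfig.xml: maxIndexingThreads (available in the Solr 4.x/5.x line of that era, later removed) and ramBufferSizeMB sit under indexConfig, and the 10-minute/1M-document commit policy from the PS maps to autoCommit/autoSoftCommit. The values below simply mirror the numbers quoted in the issue:

```xml
<!-- solrconfig.xml (sketch): index-time settings quoted in this issue -->
<indexConfig>
  <maxIndexingThreads>128</maxIndexingThreads> <!-- Solr 4.x/5.x only -->
  <ramBufferSizeMB>60</ramBufferSizeMB>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit: every 10 minutes or 1M documents, whichever comes first -->
  <autoCommit>
    <maxTime>600000</maxTime>
    <maxDocs>1000000</maxDocs>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit: new documents become searchable every 10 minutes -->
  <autoSoftCommit>
    <maxTime>600000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

Comments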
@disoardi Please share your Solr table configuration.
Hive external table with SerDe:

Hortonworks Data Platform 2.3.2. We have 9 YARN nodes with 96 GB per node (864 GB total in the YARN queue). Thanks in advance.
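The actual table definition was not captured in the thread. As a point of reference only, a Solr-backed external table built with the Lucidworks hive-solr storage handler usually looks something like the sketch below; the table and field names are hypothetical, and property names can vary between connector versions:

```sql
-- Hypothetical sketch: Solr-backed Hive external table
-- (Lucidworks hive-solr storage handler; names and properties are illustrative)
CREATE EXTERNAL TABLE solr_index_table (id STRING, field1_s STRING, field2_i INT)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/tmp/solr'
TBLPROPERTIES (
  'solr.zkhost'         = 'zk1:2181,zk2:2181,zk3:2181/solr', -- ZooKeeper ensemble
  'solr.collection'     = 'collection1',
  'lww.commit.on.close' = 'true'
);

-- Indexing is then driven by an INSERT from the source ORC table:
INSERT INTO TABLE solr_index_table
SELECT id, field1, field2 FROM orc_source_table;
```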
Do you have only one ZooKeeper node? The usual recommended minimum for a ZooKeeper ensemble is 3 nodes. The zk string should be something like:
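The example string itself was lost in the capture. A typical SolrCloud ZooKeeper connection string for a three-node ensemble (placeholder hostnames, with an optional chroot) looks like:

```
zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181/solr
```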
Can you share the output of the indexing? Are there any errors in the YARN/Hive logs? You can also try increasing the Solr buffer size.
Some tests with 3 Solr/YARN nodes (Solr and YARN were installed on the same nodes):
Sorry for the delay, but I found the solution: I set solr.client.threads. The default is 1. Do you have any documentation about this option? Thanks in advance.
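The thread does not show how this option is passed. Assuming it is set as a table property alongside the connector's other solr.*/lww.* options (hypothetical table name and an illustrative thread count), it would look roughly like:

```sql
-- Hypothetical: raise the connector's client-side indexing threads
-- (default is 1, per the comment above; exact property scope may vary by version)
ALTER TABLE solr_index_table SET TBLPROPERTIES ('solr.client.threads' = '32');
```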