
Error loading into solr table from another hive table. #13

Open
aftnix opened this issue Sep 25, 2016 · 6 comments
Labels
information needed More information is needed to answer

Comments


aftnix commented Sep 25, 2016

>sudo -u solr bin/solr create -c hiveCollection -d basic_configs -n hiveCollection -s 2 -rf 2
>hive>CREATE EXTERNAL TABLE authproc_syslog_solr (hid STRING, tstamp TIMESTAMP, type STRING, msg STRING, thost STRING, tservice STRING, tyear STRING, tmonth STRING, tday STRING) STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler' LOCATION '/tmp/solr' TBLPROPERTIES('solr.zkhost' = 'hadoop1.openstacksetup.com:2181/solr', 'solr.collection'='hiveCollection', 'solr.query' = '*:*');

>hive>INSERT OVERWRITE TABLE authproc_syslog_solr SELECT s.* FROM authproc_syslog s;

Vertex did not succeed due to OWN_TASK_FAILURE, failedTasks:1 killedTasks:32, Vertex vertex_1473357519389_0194_6_00 [Map 1] killed/failed due to:OWN_TASK_FAILURE]

DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:0
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1473357519389_0194_6_00, diagnostics=[Task failed, taskId=task_1473357519389_0194_6_00_000009, diagnostics=[TaskAttempt 0 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row
Caused by: java.lang.NullPointerException
        at com.lucidworks.hadoop.io.impl.LWSolrDocument.getId(LWSolrDocument.java:46)
        at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:184)
        at com.lucidworks.hadoop.hive.LWHiveOutputFormat$1.write(LWHiveOutputFormat.java:39)
        at org.apache.hadoop.hive.ql.exec.FileSinkOperator.process(FileSinkOperator.java:764)
        at org.apache.hadoop.hive.ql.exec.vector.VectorFileSinkOperator.process(VectorFileSinkOperator.java:102)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.vector.VectorSelectOperator.process(VectorSelectOperator.java:138)
        at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:838)
        at org.apache.hadoop.hive.ql.exec.TableScanOperator.process(TableScanOperator.java:133)
        at org.apache.hadoop.hive.ql.exec.MapOperator$MapOpCtx.forward(MapOperator.java:170)
        at org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(VectorMapOperator.java:45)

The Hive source table and the hive_solr table have exactly the same schema.


aftnix commented Sep 26, 2016

Turns out my table didn't have id as the first field. I fixed that, but now the INSERT never finishes (I waited a couple of hours, reduced the dataset, etc., and the query never completes).

The YARN logs contain:

2016-09-26 17:24:21,082 [INFO] [Dispatcher thread {Central}] |history.HistoryEventHandler|: [HISTORY][DAG:dag_1474881768573_0002_2][Event:VERTEX_FINISHED]: vertexName=Map 1, vertexId=vertex_1474881768573_0002_2_00, initRequestedTime=1474883498501, initedTime=1474883499061, startRequestedTime=1474883498577, startedTime=1474883499061, finishTime=1474889061031, timeTaken=5561970, status=KILLED, diagnostics=Vertex received Kill while in RUNNING state.
Vertex did not succeed due to DAG_KILL, failedTasks:0 killedTasks:3
Vertex vertex_1474881768573_0002_2_00 [Map 1] killed/failed due to:DAG_KILL, counters=Counters: 0, vertexStats=firstTaskStartTime=1474883503313, firstTasksToStart=[ task_1474881768573_0002_2_00_000001 ], lastTaskFinishTime=1474889061030, lastTasksToFinish=[ task_1474881768573_0002_2_00_000002,task_1474881768573_0002_2_00_000001 ], minTaskDuration=-1, maxTaskDuration=-1, avgTaskDuration=-1.0, numSuccessfulTasks=0, shortestDurationTasks=[  ], longestDurationTasks=[  ], vertexTaskStats={numFailedTaskAttempts=0, numKilledTaskAttempts=0, numCompletedTasks=3, numSucceededTasks=0, numKilledTasks=3, numFailedTasks=0}

Don't know what's going wrong here :(
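For readers hitting the same NullPointerException in LWSolrDocument.getId: per the comment above, the serde apparently expects an id field as the first column. A hedged sketch of the corrected DDL under that assumption — renaming the original `hid` column to `id` is my guess at the author's fix, not something stated in the thread:

```sql
-- Sketch, not a verified fix: assumes the serde maps the first column, `id`,
-- to Solr's uniqueKey field; without it, LWSolrDocument.getId() throws an NPE.
CREATE EXTERNAL TABLE authproc_syslog_solr (
  id STRING,        -- unique document id, first column (assumption)
  tstamp TIMESTAMP,
  type STRING,
  msg STRING,
  thost STRING,
  tservice STRING,
  tyear STRING,
  tmonth STRING,
  tday STRING)
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
LOCATION '/tmp/solr'
TBLPROPERTIES(
  'solr.zkhost' = 'hadoop1.openstacksetup.com:2181/solr',
  'solr.collection' = 'hiveCollection',
  'solr.query' = '*:*');
```

The INSERT's SELECT list would then need to supply a non-null value for `id` in the first position.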

@ctargett
Contributor

Sorry for the delay of a few days in getting back to you.

Are there any errors besides those messages?

Can you also share a little about your environment? It seems you're using Tez; what version and distro of Hive?

@vishnucg

vishnucg commented Nov 4, 2016

I am able to load data into the Solr external table from another managed Hive table, but when I try to retrieve data from the Solr table it throws:
"Failed with exception java.io.IOException:java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String"
I am using solr-hive-serde-2.2.6.jar on Hive 1.1.0-cdh5.4.5.
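A ClassCastException like this on read usually means a Hive column's declared type doesn't match the type Solr actually returns for that field. A hedged sketch of one workaround under that assumption — the field name `tyear` and the INT mapping are illustrative, not taken from this reporter's schema:

```sql
-- Sketch, assuming Solr stores the field as an integer type: declare the
-- matching Hive column as INT so the serde doesn't try to hand an Integer
-- to a STRING column on read.
CREATE EXTERNAL TABLE solr_read_example (
  id STRING,
  tyear INT)   -- hypothetical: must match the Solr field's actual type
STORED BY 'com.lucidworks.hadoop.hive.LWStorageHandler'
TBLPROPERTIES(
  'solr.zkhost' = 'hadoop1.openstacksetup.com:2181/solr',
  'solr.collection' = 'hiveCollection',
  'solr.query' = '*:*');
```

Alternatively, keeping the field as a string type on the Solr side (so both sides agree on STRING) may avoid the cast entirely; either way, the two schemas need to line up field by field.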

@acesar
Contributor

acesar commented Nov 5, 2016

@vishnucg can you please open a new issue with your question?

@shazack

shazack commented Mar 2, 2017

Did this issue get resolved? I'm getting the same error.

@shazack

shazack commented Mar 2, 2017

I'm getting:
Caused by: java.lang.NullPointerException at com.lucidworks.hadoop.io.impl.LWSolrDocument.getId(LWSolrDocument.java:46) at com.lucidworks.hadoop.io.LucidWorksWriter.write(LucidWorksWriter.java:190) ... 22 more ], TaskAttempt 3 failed, info=[Error: Failure while running task:java.lang.RuntimeException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row (tag=0) {"key":{},"value":{"_col0":null,"_col1":null,"_col2":null,"_col3":null,"_col4":null,"_col5":null,"_col6":null,"_col7":null,"_col8":null,"_col9":null,"_col10":null,"_col11":null,"_col12":null,"_col13":null,"_col14":null,"_col15":null,"_col16

@ctargett ctargett added the information needed More information is needed to answer label May 3, 2018