I have a Kafka stream with updates to objects that are stored in HBase. Each update carries a version and a timestamp of the change. I need to determine whether my HBase object is newer or older than the one I got from Kafka. The best way to do that is to join the stream with a static HBase DataFrame on objectId and filter out the rows whose version is lower than the one already stored. The remaining rows are to be updated (or upserted). The alternative is to set the HBase version timestamp of each row manually, using the update timestamp. My code (Python) looks like this:
But I always get this error: streamingQueryException: key not found: hbase_version
It seems this type of join is not supported, or am I missing something?
The resulting solution must be in Python, but it is fine to pass additional jars with spark-submit.
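For readers following the thread, the version check described above can be sketched in plain Python, independent of Spark and HBase. This is not the code from the question. The `Update` record, the `select_upserts` helper, and the sample data are hypothetical names made up for illustration, and dropping updates whose version equals the stored one is an assumption (the question only mentions lower versions). Objects that are not stored yet are kept, which corresponds to a left outer join followed by a filter.

```python
from typing import Dict, Iterable, List, NamedTuple


class Update(NamedTuple):
    """One update message from the stream (hypothetical shape)."""
    object_id: str
    version: int
    timestamp: int


def select_upserts(updates: Iterable[Update],
                   stored_versions: Dict[str, int]) -> List[Update]:
    """Keep only the updates that are newer than what is already stored.

    An object missing from stored_versions has never been written, so its
    update is kept; this mirrors a left outer join followed by a filter.
    Assumption: an update with the same version as the stored one is dropped.
    """
    fresh = []
    for update in updates:
        stored = stored_versions.get(update.object_id)  # None if not stored yet
        if stored is None or update.version > stored:
            fresh.append(update)
    return fresh


# Hypothetical sample data: versions currently stored, keyed by object id.
stored_versions = {"a": 3, "b": 7}
batch = [
    Update("a", 4, 1000),  # newer than stored version 3: keep
    Update("b", 7, 1001),  # same as stored version 7: drop
    Update("c", 1, 1002),  # not stored yet: keep
]
result = select_upserts(batch, stored_versions)
print([u.object_id for u in result])  # prints ['a', 'c']
```

In a streaming job the same rule would be expressed as a join condition plus a filter on the two version columns; the sketch only pins down which rows are expected to survive.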
In my opinion, you should replace .select('id', col('hbase_version').cast('integer')) with .selectExpr('id', 'cast(hbase_version as integer) as hbase_version'), or use .alias().
Did you try with the latest HBaseSync from #238?
I'm wondering about .option('hbasecat', catalog_kafka): the option name for passing the HBase schema is now hbase.catalog. 'hbasecat' doesn't mean anything to me.
@ser0t0nin I'm also trying to connect to HBase via Spark Structured Streaming, but failing to do so. For writing the data I'm using the Hortonworks Spark connector, with the following code.
I have a similar question on StackOverflow.