read sequence file from HDFS #101
Comments
It's almost correct! The mistake is that using JavaCall
# make sure we read the signature correctly
listmethods(sc.jsc, "sequenceFile")
# 2-element Vector{JMethod}:
# org.apache.spark.api.java.JavaPairRDD sequenceFile(java.lang.String, java.lang.Class, java.lang.Class, int)
# org.apache.spark.api.java.JavaPairRDD sequenceFile(java.lang.String, java.lang.Class, java.lang.Class)
# create class objects - using a hack for simplicity
# though a more robust approach would be to use Java reflection API
jstr = getclass(JString(""))
# actually call sequenceFile()
path = ...
jcall(sc.jsc, "sequenceFile", JJavaPairRDD, (JString, JClass, JClass), path, jstr, jstr)
Thank you, @dfdx! It works now. Are there any functions in Spark.jl that I can use to get the keys and values of a JavaPairRDD?
Update:
But I got errors saying,
and
Just to make sure: in your sequence file, both keys and values are strings, right? I used JString just as an example class :D
Aaaaa, they are actually
I had to refresh my memory about the Reflection API a bit, but it turns out we already have a convenient function to create a class object:
jtext = JavaCall.classforname("org.apache.hadoop.io.Text")
text = jcall(sc.jsc, "sequenceFile", JJavaPairRDD, (JString, JClass, JClass), filepath_input, jtext, jtext)
Thank you so much! That part works now. I saw an error when running collect(text_v),
After some searching, it seems that Text is not serializable and I found this solution.
and I got results like,
Something is still missing in my code.
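For readers following along, the serialization problem mentioned above can be reproduced without Spark or Hadoop. The sketch below is only an illustration under stated assumptions: `FakeText` is a hypothetical stand-in for `org.apache.hadoop.io.Text` (which implements Hadoop's `Writable` but not `java.io.Serializable`), and `canSerialize` and `toStrings` are helper names invented for this example. It shows that Java serialization rejects a list of such objects, while the same records converted to `String` serialize fine, which is presumably why converting before `collect` helps.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class SerializationSketch {
    // Hypothetical stand-in for org.apache.hadoop.io.Text:
    // it holds a string but does NOT implement java.io.Serializable.
    static final class FakeText {
        private final String value;
        FakeText(String value) { this.value = value; }
        @Override public String toString() { return value; }
    }

    // Returns true if plain Java serialization accepts the object,
    // false if it fails with NotSerializableException.
    static boolean canSerialize(Object obj) {
        try {
            new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Converts every record to a String, which is Serializable.
    static List<String> toStrings(List<FakeText> records) {
        List<String> out = new ArrayList<>();
        for (FakeText t : records) {
            out.add(t.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<FakeText> records = new ArrayList<>();
        records.add(new FakeText("key1"));
        records.add(new FakeText("value1"));

        System.out.println(canSerialize(records));            // prints false
        System.out.println(canSerialize(toStrings(records))); // prints true
    }
}
```

In a real Spark job the conversion would have to run on the executors (for example, as a map over the pair RDD) before the results are collected; how best to express that from Spark.jl is not settled in this thread.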
I understand the error, but I don't see an immediate solution. In your code you call The most straightforward way to convert As a side note, very few people still work with sequence files and Hadoop's
Hi @dfdx, thank you for showing me how to add new functions to Spark.jl in issue #98. Now I am trying to add "sequenceFile" following the same steps but could not get it working. Here is the code snippet.
I got an error message saying
Any suggestions on how to make it work? Thanks.