This is a quick introduction to the Spark.jl core functions. It closely follows the official PySpark tutorial and copies many examples verbatim. In most cases, the PySpark docs should work for Spark.jl as is or with little adaptation.
DataFrames.jl is definitely the way to go, but the integration isn't done yet. In the simplest case, you can convert the rows of a DataFrames.DataFrame to Spark.Row objects and use Spark.createDataFrame(...) to convert it. A more efficient solution would be to use Arrow.jl to pass data between Julia and the JVM underlying Spark.jl, but that requires a bit more research than I currently have time for :( I will take a look at it in the next iteration of working on Spark.jl.
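The row-by-row conversion suggested above could be sketched roughly as follows. This is a hypothetical, untested sketch: it assumes `Row` accepts keyword arguments (as in the examples below) and that a `createDataFrame(::SparkSession, ::Vector{Row})` method exists, as listed among the candidates in the error message; the helper name `to_spark` is made up for illustration.

```julia
using Spark
using DataFrames

# Hypothetical helper: build one Spark.Row per DataFrame row,
# then hand the vector of rows to createDataFrame.
function to_spark(spark::SparkSession, df::DataFrames.DataFrame)
    cols = propertynames(df)                        # column names as Symbols
    rows = [Row(; (c => r[c] for c in cols)...)     # splat name=>value pairs as kwargs
            for r in eachrow(df)]
    return spark.createDataFrame(rows)
end
```

This trades efficiency for simplicity: every cell crosses the Julia/JVM boundary individually, which is why an Arrow.jl-based path would be preferable in the long run.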
Hello,
Good job on Spark.jl.
I have an issue. I tried to learn Spark and followed the documentation:
I found that it is possible to create a PySpark DataFrame from a pandas DataFrame, so I tried to create a Spark DataFrame from a Julia DataFrames.jl DataFrame.
```julia
df = spark.createDataFrame([
    Row(a=1, b=2.0, c="string1", d=Date(2000, 1, 1), e=DateTime(2000, 1, 1, 12, 0)),
    Row(a=2, b=3.0, c="string2", d=Date(2000, 2, 1), e=DateTime(2000, 1, 2, 12, 0)),
    Row(a=4, b=5.0, c="string3", d=Date(2000, 3, 1), e=DateTime(2000, 1, 3, 12, 0))
])
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 13:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 13:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 13:00:00|
+---+---+-------+----------+-------------------+

julia> println(df)
+---+---+-------+----------+-------------------+
|  a|  b|      c|         d|                  e|
+---+---+-------+----------+-------------------+
|  1|2.0|string1|2000-01-01|2000-01-01 13:00:00|
|  2|3.0|string2|2000-02-01|2000-01-02 13:00:00|
|  4|5.0|string3|2000-03-01|2000-01-03 13:00:00|
+---+---+-------+----------+-------------------+

julia> df_dataframes = DataFrames.DataFrame(A=1:4, B=["M", "F", "F", "M"])
4×2 DataFrame
 Row │ A      B
     │ Int64  String
─────┼───────────────
   1 │     1  M
   2 │     2  F
   3 │     3  F
   4 │     4  M

julia> df = spark.createDataFrame(df_dataframes)
ERROR: MethodError: no method matching createDataFrame(::SparkSession, ::DataFrames.DataFrame)
Closest candidates are:
  createDataFrame(::SparkSession, ::Vector{Vector{Any}}, ::Union{String, Vector{String}}) at ~/.julia/packages/Spark/89BUd/src/session.jl:92
  createDataFrame(::SparkSession, ::Vector{Row}) at ~/.julia/packages/Spark/89BUd/src/session.jl:98
  createDataFrame(::SparkSession, ::Vector{Row}, ::Union{String, Vector{String}}) at ~/.julia/packages/Spark/89BUd/src/session.jl:87
  ...
Stacktrace:
 [1] (::Spark.DotChainer{SparkSession, typeof(Spark.createDataFrame)})(args::DataFrames.DataFrame)
   @ Spark ~/.julia/packages/Spark/89BUd/src/chainable.jl:13
 [2] top-level scope
   @ REPL[21]:1
```
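One of the candidates listed in the error message, `createDataFrame(::SparkSession, ::Vector{Vector{Any}}, ::Union{String, Vector{String}})`, suggests a possible workaround. The sketch below is untested and assumes each inner vector is one row and the second argument supplies the column names; check the Spark.jl source (session.jl) to confirm the expected layout.

```julia
using Spark
using DataFrames

df_dataframes = DataFrames.DataFrame(A=1:4, B=["M", "F", "F", "M"])

# Flatten each DataFrame row into a Vector{Any} (assumed row-wise layout).
rows = [Any[r...] for r in eachrow(df_dataframes)]   # Vector{Vector{Any}}
colnames = String.(names(df_dataframes))             # ["A", "B"]

df = spark.createDataFrame(rows, colnames)
```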
Is there a way to create a Spark DataFrame from DataFrames.jl, or do I have to use Pandas.jl?
Regards