When joining two tables with a joining function that returns a named tuple, the number and names of the columns in the joined table are not consistent. It seems to depend on whether the type of one of the values in the named tuple changes or not:
```julia
julia> a = table((x = 1:10, y = rand(10)), pkey = :x);

julia> b = table((x = 1:2:20, z = rand(10)), pkey = :x);

julia> join((l, r) -> (answer = l.y + r.z, what = l.y), a, b) # looks right:
Table with 5 rows, 3 columns:
x  answer    what
──────────────────────
1  0.305182  0.177503
3  1.15616   0.391545
5  0.442893  0.0824903
7  1.21581   0.752759
9  1.27959   0.614901

julia> join((l, r) -> (answer = l.y + r.z, what = rand(Bool) ? 1.0 : "wrong"), a, b) # what happened?
Table with 5 rows, 2 columns:
1  2
──────────────────────────────────────
1  (answer = 0.305182, what = "wrong")
3  (answer = 1.15616, what = 1.0)
5  (answer = 0.442893, what = "wrong")
7  (answer = 1.21581, what = "wrong")
9  (answer = 1.27959, what = "wrong")
```
Note that (after enough tries) even when all the values happen to have the same type, the columns still change:
```julia
julia> join((l, r) -> (answer = l.y + r.z, what = rand(Bool) ? 1.0 : "wrong"), a, b)
Table with 5 rows, 2 columns:
1  2
──────────────────────────────────────
1  (answer = 0.209555, what = "wrong")
3  (answer = 1.34304, what = "wrong")
5  (answer = 0.381569, what = "wrong")
7  (answer = 0.809494, what = "wrong")
9  (answer = 1.13374, what = "wrong")
```
But it gets it right when I "hard-wire" the result:
```julia
julia> join((l, r) -> (answer = l.y + r.z, what = "wrong"), a, b)
Table with 5 rows, 3 columns:
x  answer    what
────────────────────
1  0.209555  "wrong"
3  1.34304   "wrong"
5  0.381569  "wrong"
7  0.809494  "wrong"
9  1.13374   "wrong"
```
I believe the issue is that join uses inference to determine the output type, and the behavior is somewhat undefined when inference fails (as far as I understand, at least). In particular, if inference cannot prove that the return type is a named tuple, you get an array of named tuples rather than a set of columns. The roadmap to fix this is in JuliaData/IndexedTables.jl#151. join is not done yet because:
- the join code is very complex
- StructArrays has the necessary machinery to implement iterator-style groupby (see "Simplify grouping iteration mechanism", IndexedTables.jl#223) but not iterator-style join (I have some ideas, though, and will try to give it a shot in the next few days)
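To see the inference behavior described above without IndexedTables itself, the three joining functions from the report can be handed to Julia's reflection utility `Base.return_types`, which reports what the compiler can prove about the return type. This is only a sketch: the row types `LRow` and `RRow` are assumptions standing in for the rows of `a` and `b`, the "concrete or not" test is an assumed approximation of whatever criterion join actually uses, and `only` requires Julia 1.4 or later.

```julia
# Stand-in row types for the tables `a` (fields x, y) and `b` (fields x, z).
# These concrete types are assumptions chosen to mirror the report.
const LRow = NamedTuple{(:x, :y), Tuple{Int, Float64}}
const RRow = NamedTuple{(:x, :z), Tuple{Int, Float64}}

# The three joining functions from the report
stable(l, r)    = (answer = l.y + r.z, what = l.y)
unstable(l, r)  = (answer = l.y + r.z, what = rand(Bool) ? 1.0 : "wrong")
hardwired(l, r) = (answer = l.y + r.z, what = "wrong")

# Ask inference for each function's return type given the row types.
# Base.return_types returns one inferred type per matching method; each function here has one.
for f in (stable, unstable, hardwired)
    T = only(Base.return_types(f, (LRow, RRow)))
    println(rpad(string(f), 10), " concrete: ", isconcretetype(T), "  inferred: ", T)
end
```

Only `unstable` should come out non-concrete, and that is a property of the code rather than of the values: the answer does not depend on which branch `rand(Bool)` takes at run time, which is consistent with the run where every value was "wrong" and the table still had two columns.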
Relevant PR is JuliaData/IndexedTables.jl#225. It fixes the issue but still requires a fair amount of work to match the performance of the original implementation.
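One way to stop depending on inference, sketched here under the assumption (not confirmed in this thread) that the iterator-style approach decides column types from the values actually produced, in the spirit of how Base's `collect` widens element types: each column starts with the type of its first value and is widened to a `Union` only when a later value does not fit. The helper `collect_columns_widening` is hypothetical; it is not part of IndexedTables or StructArrays.

```julia
# Hypothetical helper: collect an iterator of NamedTuples into a NamedTuple of
# column vectors, choosing each column's element type from the actual values
# (widening on mismatch) instead of from inference.
function collect_columns_widening(itr)
    y = iterate(itr)
    y === nothing && throw(ArgumentError("this sketch does not handle empty input"))
    row, state = y
    names = keys(row)
    # One vector per field, typed from the first row's values
    # (stored in a Vector{Any}, so this sketch is not itself type-stable)
    cols = Any[[v] for v in values(row)]
    y = iterate(itr, state)
    while y !== nothing
        row, state = y
        keys(row) == names || throw(ArgumentError("all rows must have the same field names"))
        for (i, v) in enumerate(values(row))
            col = cols[i]
            if !(v isa eltype(col))
                # Widen this column to a Union of its old element type and the new value's type
                T = Union{eltype(col), typeof(v)}
                widened = Vector{T}(undef, length(col))
                copyto!(widened, col)
                cols[i] = col = widened
            end
            push!(col, v)
        end
        y = iterate(itr, state)
    end
    return NamedTuple{names}(Tuple(cols))
end

# Usage: mixed Float64/String values in `what` still yield two named columns
rows = ((answer = Float64(i), what = isodd(i) ? 1.0 : "wrong") for i in 1:5)
cols = collect_columns_widening(rows)
println(eltype(cols.answer))  # Float64
println(eltype(cols.what))    # a Union of Float64 and String
```

With this strategy the output always has one column per field, whatever inference can prove; the cost is an occasional copy when a column has to be widened.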