inconsistency in join with a function resulting in namedtuple #266

Closed
yakir12 opened this issue Mar 20, 2019 · 3 comments

yakir12 commented Mar 20, 2019

When joining two tables with a joining function that returns a named tuple, the number of columns in the joined table and their names are not consistent. The outcome seems to depend on whether the type of one of the values in the named tuple can change:

julia> a = table((x = 1:10,   y = rand(10)), pkey = :x);

julia> b = table((x = 1:2:20, z = rand(10)), pkey = :x);

julia> join((l, r) -> (answer = l.y + r.z, what = l.y), a, b) # looks right:
Table with 5 rows, 3 columns:
x  answer    what
──────────────────────
1  0.305182  0.177503
3  1.15616   0.39154
5  0.442893  0.0824903
7  1.21581   0.752759
9  1.27959   0.614901

julia> join((l, r) -> (answer = l.y + r.z, what = rand(Bool) ? 1.0 : "wrong"), a, b) # what happened?
Table with 5 rows, 2 columns:
1  2
──────────────────────────────────────
1  (answer = 0.305182, what = "wrong")
3  (answer = 1.15616, what = 1.0)
5  (answer = 0.442893, what = "wrong")
7  (answer = 1.21581, what = "wrong")
9  (answer = 1.27959, what = "wrong")

Note that, after enough tries, even when every row ends up with the same actual type, the result still collapses into a single column of named tuples:

julia> join((l, r) -> (answer = l.y + r.z, what = rand(Bool) ? 1.0 : "wrong"), a, b)
Table with 5 rows, 2 columns:
1  2
──────────────────────────────────────
1  (answer = 0.209555, what = "wrong")
3  (answer = 1.34304, what = "wrong")
5  (answer = 0.381569, what = "wrong")
7  (answer = 0.809494, what = "wrong")
9  (answer = 1.13374, what = "wrong")

But it gets the columns right when I "hard-wire" the result:

julia> join((l, r) -> (answer = l.y + r.z, what = "wrong"), a, b)
Table with 5 rows, 3 columns:
x  answer    what
────────────────────
1  0.209555  "wrong"
3  1.34304   "wrong"
5  0.381569  "wrong"
7  0.809494  "wrong"
9  1.13374   "wrong"
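
One way to see the difference between the two joining functions is to ask Julia which return type it infers for each. This is only a minimal sketch in plain Julia, without IndexedTables: the names `stable`, `unstable`, `lrow` and `rrow` are hypothetical stand-ins, and it assumes that whether the inferred type is concrete is what matters here; it does not show what `join` does internally.

```julia
# Hypothetical stand-ins for one row from each table (plain named tuples).
lrow = (x = 1, y = 0.5)
rrow = (x = 1, z = 0.25)

# The same two joining functions as in the examples above.
stable   = (l, r) -> (answer = l.y + r.z, what = l.y)
unstable = (l, r) -> (answer = l.y + r.z, what = rand(Bool) ? 1.0 : "wrong")

# Ask the compiler which return type it infers for these argument types.
argtypes   = (typeof(lrow), typeof(rrow))
T_stable   = first(Base.return_types(stable, argtypes))
T_unstable = first(Base.return_types(unstable, argtypes))

# A concrete named tuple type fixes every field type up front;
# a non-concrete one leaves the type of `what` open.
println(T_stable, " concrete? ", isconcretetype(T_stable))
println(T_unstable, " concrete? ", isconcretetype(T_unstable))
```

The first function should report a concrete named tuple type and the second a non-concrete one, regardless of which branch `rand(Bool)` happens to take at run time. That would match the last two examples, where identical values still produced different column layouts.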

piever commented Mar 20, 2019

I believe the issue is that join uses inference to determine the output type, and the behavior is somewhat undefined when inference fails (as far as I understand, at least). In particular, if inference cannot prove that the return type is a named tuple, you get an array of named tuples rather than a set of columns. The roadmap to fix this is in JuliaData/IndexedTables.jl#151. join is not done yet because:


piever commented Mar 23, 2019

The relevant PR is JuliaData/IndexedTables.jl#225. It fixes the issue but still needs a fair amount of work to match the performance of the original implementation.


piever commented Mar 25, 2019

Fixed by JuliaData/IndexedTables.jl#225

piever closed this as completed Mar 25, 2019