transpose on Panel fails #390

Open
tnielens opened this issue Mar 20, 2022 · 4 comments

@tnielens
Collaborator

import org.saddle._
Panel(Vec(1, 2, 3), Vec("hello", "world", "!")).T

throws

java.lang.ArrayStoreException
	at java.lang.System.arraycopy(Native Method)
	at org.saddle.array.package$.$anonfun$flatten$2(package.scala:623)
	at org.saddle.array.package$.$anonfun$flatten$2$adapted(package.scala:621)
	at scala.collection.immutable.Vector.foreach(Vector.scala:1856)
	at org.saddle.array.package$.flatten(package.scala:621)
	at org.saddle.scalar.ScalarTagAny.concat(ScalarTagAny.scala:64)
	at org.saddle.Frame.toMat(Frame.scala:1426)
	at org.saddle.Frame.T(Frame.scala:168)
	at repl.MdocSession$App.<init>(scalar.worksheet.sc:11)
	at repl.MdocSession$.app(scalar.worksheet.sc:3)
@pityka
Owner

pityka commented Mar 20, 2022

Thanks for finding this.

What do you think of Panel? In the last 10 years I never used it.

@tnielens
Collaborator Author

I don't have much experience with saddle.
This works fine with the regular frame constructor:

Frame(Vec[Any](1, 2, 3), Vec[Any]("hello", "world", "!")).T

The issue is probably that the underlying arrays of Vec[Int] and Vec[String] aren't compatible for the transposition. Panel allows that combination by taking Vec[_] as input, whereas the Frame example above forces Vec[Any] and the corresponding ScalarTag.
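
To illustrate the suspected cause, here is a minimal sketch using only the JVM's `System.arraycopy`, which is the call at the top of the stack trace. It does not reproduce saddle's `flatten`; the object and helper names (`ArrayStoreDemo`, `copyFails`) are made up for this example. It shows that a primitive `int[]` cannot be copied into an `Object[]`, while a `String[]` or a boxed array can.

```scala
object ArrayStoreDemo {
  // Returns true if copying `src` into `dst` throws ArrayStoreException.
  def copyFails(src: AnyRef, dst: AnyRef, len: Int): Boolean =
    try {
      System.arraycopy(src, 0, dst, 0, len)
      false
    } catch {
      case _: ArrayStoreException => true
    }

  def main(args: Array[String]): Unit = {
    val ints: Array[Int] = Array(1, 2, 3)                    // int[] on the JVM
    val strings: Array[String] = Array("hello", "world", "!") // String[] on the JVM
    val target = new Array[Any](3)                            // Object[] on the JVM

    println(copyFails(strings, target, 3)) // false: String[] -> Object[] is allowed
    println(copyFails(ints, target, 3))    // true: int[] -> Object[] is rejected

    // Boxing the elements first yields a reference array that can be copied.
    val boxed: Array[Any] = ints.map(i => (i: Any))
    println(copyFails(boxed, target, 3))   // false
  }
}
```

This is consistent with the Vec[Any] version working: there every column is already backed by a reference array.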

@pityka pityka added the bug label Mar 28, 2022
@bbuchsbaum

> Thanks for finding this.
>
> What do you think of Panel? In the last 10 years I never used it.

Just chiming in to say heterogeneous Frames are important for statistical modeling where you have categorical variables and continuous variables in the same dataset. Does this forked version of saddle support heterogeneous Frames?

@pityka
Owner

pityka commented May 24, 2023

No, in this fork Frame is Frame[RowIndexType, ColIndexType, ValueType], so the values in the frame must all be of a single type.

For categorical variables you can work around this by one-hot encoding, which is often done downstream for analysis anyway. You can also encode the levels as doubles (just don't treat them as you would a numeric value).
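
As a rough sketch of both encodings, the following uses plain Scala collections rather than saddle's API, so every name here (`CategoricalEncoding`, `levels`, `oneHot`, `levelCodes`) is an assumption made for illustration. The resulting Double vectors are the kind of data that could then go into a frame with a single Double value type.

```scala
object CategoricalEncoding {
  // Sorted distinct levels give a stable ordering of the categories.
  def levels(values: Seq[String]): Vector[String] =
    values.distinct.sorted.toVector

  // One-hot: one indicator column (1.0 or 0.0) per level.
  def oneHot(values: Seq[String]): Map[String, Vector[Double]] =
    levels(values).map { level =>
      level -> values.map(v => if (v == level) 1.0 else 0.0).toVector
    }.toMap

  // Level codes: each value is replaced by the index of its level, as a Double.
  // The codes are labels, not quantities, so arithmetic on them is meaningless.
  def levelCodes(values: Seq[String]): Vector[Double] = {
    val index = levels(values).zipWithIndex.toMap
    values.map(v => index(v).toDouble).toVector
  }

  def main(args: Array[String]): Unit = {
    val colour = Seq("red", "green", "red", "blue")
    println(oneHot(colour)("red")) // Vector(1.0, 0.0, 1.0, 0.0)
    println(levelCodes(colour))    // Vector(2.0, 1.0, 2.0, 0.0) with blue=0, green=1, red=2
  }
}
```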

I think there are many use cases for a heterogeneous column-wise data structure, but as it is now, Frame is not that. To remove this confusion and make the code simpler to maintain, I removed those Panel constructors. I think there is a place for a new data structure which zips together heterogeneous columns.
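
Purely as a sketch of what such a structure might look like, and not something that exists in saddle, the following zips named columns that each keep their own element type. All types and methods (`Column`, `DoubleColumn`, `StringColumn`, `ColumnTable`) are hypothetical. It deliberately offers no transpose, since a row of mixed columns has no single element type.

```scala
object HeteroColumns {
  // Hypothetical column type: each column keeps its own element type.
  sealed trait Column { def length: Int }
  final case class DoubleColumn(values: Vector[Double]) extends Column {
    def length: Int = values.length
  }
  final case class StringColumn(values: Vector[String]) extends Column {
    def length: Int = values.length
  }

  // Hypothetical container zipping named columns of equal length.
  final case class ColumnTable(columns: Vector[(String, Column)]) {
    require(
      columns.map(_._2.length).distinct.size <= 1,
      "all columns must have the same length"
    )

    def numRows: Int = columns.headOption.map(_._2.length).getOrElse(0)

    // Typed access: None if the column is missing or holds another type.
    def doubles(name: String): Option[Vector[Double]] =
      columns.collectFirst { case (`name`, DoubleColumn(v)) => v }

    def strings(name: String): Option[Vector[String]] =
      columns.collectFirst { case (`name`, StringColumn(v)) => v }
  }

  def main(args: Array[String]): Unit = {
    val table = ColumnTable(Vector(
      "count" -> DoubleColumn(Vector(1.0, 2.0, 3.0)),
      "word"  -> StringColumn(Vector("hello", "world", "!"))
    ))
    println(table.numRows)          // 3
    println(table.doubles("count")) // Some(Vector(1.0, 2.0, 3.0))
    println(table.doubles("word"))  // None: "word" is a string column
  }
}
```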
