Remove Ballista python #1069
The docs also need updating, e.g. https://datafusion.apache.org/ballista/user-guide/python.html
relates to: apache#1066 & apache#1067
c062f33 to 738d4ce
If we remove the Python bindings, how will users submit queries? I currently rely on them to run benchmarks, for example. Yes, users could start writing and compiling Rust, but I think most data scientists/engineers are much more comfortable with Python.
I guess we should push to make DataFusion Python run on Ballista. We could keep the current bindings until DF Python supports Ballista.
As follow-up work, I'd try to use SessionContext instead of BallistaContext; if that works, we could deprecate Python and BallistaContext at the same time. Would that make sense, @andygrove?
To make For Python, I think the trick is to somehow use the DataFusion Python bindings but pass the Ballista SessionContext to them?
Maybe @timsaucer or @Michael-J-Ward can help with this conversation?
I'm not quite sure. I haven't looked at Ballista since you recommended I look at Ray instead. Taking a very quick look, my guess is that we'd need to do something similar to what I'm working on with the FFI table provider. I'm sorry I can't be of more help right now.
I've done a quick POC in https://github.com/milenkovicm/arrow-ballista/tree/poc_client_interface where I replaced Also,

    use ballista::ext::BallistaExt;
    use datafusion::{execution::options::ParquetReadOptions, prelude::SessionContext};

    #[tokio::main]
    async fn main() -> datafusion::error::Result<()> {
        //
        // should we remove BallistaContext ?!
        //
        let ctx: SessionContext = SessionContext::ballista_standalone().await?;

        ctx.register_parquet(
            "test",
            &format!("{testdata}/alltypes_plain.parquet"),
            ParquetReadOptions::default(),
        )
        .await?;

        let df = ctx.sql("select * from test").await?;

        df.write_csv(
            "/directory/to_write/csv",
            Default::default(),
            Default::default(),
        )
        .await?;

        Ok(())
    }

It resolves and executes the plan and writes the file, but unfortunately the file does not make sense (some kind of binary; not sure what the issue is, will have a look once I get through the current PRs):
I guess if we have a SessionContext, it could be used with DataFusion Python; I haven't done much with DF Python.
I have created #1081, in which I'll try to replace
If you agree, @andygrove, I'll close this task. But I will bother the two of you once I get #1081 in shape, to implement those changes in Python.
As part of the effort outlined in #1066 and #1067, this PR removes the Python crate.
Relates to: #1066 & #1067
Which issue does this PR close?
Closes #.
Rationale for this change
We should focus our effort on supporting DataFusion Python rather than maintaining this crate.
What changes are included in this PR?
Are there any user-facing changes?
The Python API has been removed.