Replace BallistaContext with SessionContext #1081

Closed
Tracked by #1068
milenkovicm opened this issue Oct 14, 2024 · 1 comment · Fixed by #1088
Labels
enhancement New feature or request

Comments

@milenkovicm
Contributor

milenkovicm commented Oct 14, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

As we aim to reduce the code footprint, I would like to propose replacing BallistaContext with SessionContext.

It would improve usability, as we would get most of the methods available in SessionContext for free. Also, some DataFusion applications could be deployed to Ballista with a single-line change:

```rust
use ballista::{extension::SessionContextExt, prelude::*};
use datafusion::prelude::SessionContext;

// inside an async fn that returns a Result
let ctx: SessionContext = SessionContext::ballista_standalone().await?;
```
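The one-line switch works through Rust's extension-trait pattern: a trait defined in the Ballista crate adds associated constructors to DataFusion's `SessionContext`, which is why `SessionContextExt` has to be imported before `SessionContext::ballista_standalone()` resolves. The following is a minimal, std-only sketch of that pattern, assuming a stand-in `SessionContext` struct (not DataFusion's type) and synchronous constructors (the proposed one is awaited). `Backend` and `ballista_remote` are hypothetical names used only for illustration.

```rust
// Sketch of the extension-trait pattern behind `SessionContextExt`.
// ASSUMPTIONS: `SessionContext` is a hypothetical stand-in for DataFusion's type,
// `Backend` and `ballista_remote` are illustrative names, and constructors are
// synchronous here (the proposed `ballista_standalone` is async).

/// Where queries created from a context are executed.
#[derive(Debug, PartialEq)]
pub enum Backend {
    /// Plain DataFusion: plans run in-process.
    Local,
    /// Ballista standalone: scheduler and executor run in-process.
    BallistaStandalone,
    /// Ballista remote: plans are submitted to a scheduler at this URL.
    BallistaRemote(String),
}

/// Stand-in for a type owned by another crate (DataFusion's `SessionContext`).
#[derive(Debug)]
pub struct SessionContext {
    pub backend: Backend,
}

impl SessionContext {
    /// The ordinary, local-only constructor.
    pub fn new() -> Self {
        SessionContext { backend: Backend::Local }
    }
}

/// Extension trait: a downstream crate adds constructors to the foreign type.
/// `SessionContext::ballista_standalone()` only resolves when this trait is in
/// scope, so callers must `use` it explicitly.
pub trait SessionContextExt: Sized {
    fn ballista_standalone() -> Result<Self, String>;
    fn ballista_remote(url: &str) -> Result<Self, String>;
}

impl SessionContextExt for SessionContext {
    fn ballista_standalone() -> Result<Self, String> {
        // A real implementation would start a scheduler and executor and
        // install a Ballista query planner; here we only record the choice.
        Ok(SessionContext { backend: Backend::BallistaStandalone })
    }

    fn ballista_remote(url: &str) -> Result<Self, String> {
        if url.is_empty() {
            return Err("scheduler URL must not be empty".to_string());
        }
        Ok(SessionContext { backend: Backend::BallistaRemote(url.to_string()) })
    }
}

fn main() {
    // Only the constructor changes; the context type stays the same.
    let local = SessionContext::new();
    let standalone = SessionContext::ballista_standalone().expect("standalone");
    let remote = SessionContext::ballista_remote("df://localhost:50050").expect("remote");
    println!("{:?}\n{:?}\n{:?}", local.backend, standalone.backend, remote.backend);
}
```

Because the returned value is still a `SessionContext`, any code written against the ordinary DataFusion API keeps compiling; only the construction site differs.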

With write sinks now in place, we would get write support as well, a feature Ballista did not have before.

IMHO it would make a lot of sense to have a single API across DataFusion and Ballista.

If the replacement is successful, it would enable us to reuse the DataFusion Python crate, eliminating the need to maintain Ballista Python. We would need to provide SessionContext::ballista_standalone and equivalent methods.

```python
import datafusion
import ballista.standalone
from datafusion import col

# create a context (DataFusion context with Ballista standalone enabled)
ctx = ballista.standalone.SessionContext()
```

There are clear benefits to deprecating BallistaContext; however, the decision may be problematic, as we cannot hide SessionContext methods that do not work with Ballista. SessionContext may bring usability issues with UDF support, configuration, and basically all functionality that needs to be propagated across the cluster to work, which may not be trivial to address. We could try to address this by "turning off" those methods in Ballista or simply by documenting them; either way, some effort is needed. Or maybe it is not an issue at all?

Describe the solution you'd like

Rough action plan:

  • Create SessionContextExt, which would expose methods for creating standalone and remote contexts, reusing BallistaQueryPlanner.
  • Verify basic SQL and DataFrame support.
  • Verify/fix write support (plans with write sinks are generated, but the write operation does not create valid files).
  • Update the Python crate to create contexts via SessionContextExt.
  • Deprecate BallistaContext.
  • Deprecate Ballista Python.

Describe alternatives you've considered

Additional context

relates to #1068

@milenkovicm milenkovicm added the enhancement New feature or request label Oct 14, 2024
@milenkovicm
Contributor Author

I'll take this task

milenkovicm added 13 commits to milenkovicm/arrow-ballista that referenced this issue between Oct 14 and Oct 20, 2024
andygrove pushed a commit that referenced this issue Oct 21, 2024
* Initial SessionContextExt skeleton

relates to #1081

* add a few more tests ...

to find missing functionality and verify that
`SessionContextExt` does not fail any of the tests
for `BallistaContext`

* Detect if LogicalPlan is scanning information schema

... if it does, we will use `DefaultPhysicalPlanner`
and execute the query locally.

* change extension interface, simplifying it

* Change SessionContextExt interface ...

... add more tests

* update rustdocs

* remote methods accept `url` ...

... it would be easier to add security later.

* remove config options for now ...

... will add them in later commits, once I have
a better idea of them.

* debug failed Windows test

* remove `standalone` from default features in client

* fix clippy in tests

* fix formatting as well
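The information-schema commit describes a small dispatch rule: if the logical plan scans the information schema, plan it with DataFusion's `DefaultPhysicalPlanner` and execute it locally; otherwise hand it to Ballista. The following is a minimal, std-only sketch of that rule, assuming a toy plan tree. `LogicalPlan` here is a hypothetical four-variant enum, not DataFusion's type, and `Planner`, `scans_information_schema`, and `choose_planner` are illustrative names. Real code would walk DataFusion's plan nodes instead.

```rust
// Sketch of "scan information_schema -> plan locally, else distribute".
// ASSUMPTIONS: toy plan tree and illustrative names; not DataFusion/Ballista types.

#[derive(Debug)]
enum LogicalPlan {
    TableScan { table: String },
    Filter { input: Box<LogicalPlan> },
    Projection { input: Box<LogicalPlan> },
    Join { left: Box<LogicalPlan>, right: Box<LogicalPlan> },
}

#[derive(Debug, PartialEq)]
enum Planner {
    /// Plan and execute in-process, as with DataFusion's `DefaultPhysicalPlanner`.
    LocalDefault,
    /// Hand the plan to the Ballista scheduler.
    DistributedBallista,
}

/// Returns true if any table scan anywhere in the tree targets `information_schema`.
fn scans_information_schema(plan: &LogicalPlan) -> bool {
    match plan {
        LogicalPlan::TableScan { table } => {
            // Names may be qualified (e.g. "information_schema.tables"); this sketch
            // assumes a case-insensitive match on any dot-separated component.
            table
                .split('.')
                .any(|part| part.eq_ignore_ascii_case("information_schema"))
        }
        LogicalPlan::Filter { input } | LogicalPlan::Projection { input } => {
            scans_information_schema(input)
        }
        LogicalPlan::Join { left, right } => {
            scans_information_schema(left) || scans_information_schema(right)
        }
    }
}

/// A single information_schema scan anywhere in the plan forces local execution.
fn choose_planner(plan: &LogicalPlan) -> Planner {
    if scans_information_schema(plan) {
        Planner::LocalDefault
    } else {
        Planner::DistributedBallista
    }
}

fn main() {
    let show_tables = LogicalPlan::Projection {
        input: Box::new(LogicalPlan::TableScan { table: "information_schema.tables".into() }),
    };
    let user_query = LogicalPlan::Filter {
        input: Box::new(LogicalPlan::Join {
            left: Box::new(LogicalPlan::TableScan { table: "orders".into() }),
            right: Box::new(LogicalPlan::TableScan { table: "customers".into() }),
        }),
    };
    println!("{:?}", choose_planner(&show_tables));
    println!("{:?}", choose_planner(&user_query));
}
```

The walk visits both sides of every join, so in this sketch a metadata lookup hidden deep inside a larger query still routes the whole query to the local planner.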