Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor sparkctl #2368

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

ChenYi015
Copy link
Contributor

Purpose of this PR

Proposed changes:

Refactor sparkctl:

  • Use controller-runtime client instead of generated crd client to interact with API server
  • sparkctl status sub command is moved to sparkctl get so that it be consistent with kubectl get
  • Adjust the sparkctl directory structure as follows:
$ tree -L 2 cmd/sparkctl
cmd/sparkctl
├── README.md
├── create
│   ├── create.go
│   ├── create_test.go
│   ├── gcs.go
│   ├── s3.go
│   └── testdata
├── delete
│   └── delete.go
├── event
│   └── event.go
├── forward
│   └── forward.go
├── get
│   └── get.go
├── list
│   └── list.go
├── log
│   └── log.go
└── main.go

Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from chenyi015. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Signed-off-by: Yi Chen <[email protected]>
@ChenYi015
Copy link
Contributor Author

/assign @jacobsalway

@jacobsalway
Copy link
Member

jacobsalway commented Jan 21, 2025

Apologies for late review. What are your thoughts on removing sparkctl altogether? I think most of its functionality can already be achieved with kubectl, except for the job/dependency staging and creating a SparkApplication from an existing ScheduledSparkApplication.

I personally haven't ever used sparkctl in nearly two years of running the operator in production and haven't used any custom CLIs for any other Kubernetes operators either.

As for alternatives to create functionality:

  • Usually I've packaged my Spark jobs as pre-built container images, but I'm open to opinions on the job staging functionality.
  • For triggering a run from an existing scheduled application, one approach that comes to mind is an annotation-based approach similarly to how Strimzi supports restarting Kafka connectors.

@ChenYi015
Copy link
Contributor Author

ChenYi015 commented Jan 23, 2025

Actually, I do not use sparkctl at all, it is totally fine with me to remove it altogether, but I am not sure whether or not anyone is using it. I refactored this binary using the controller-runtime client so that we can remove the generated clientsets, informers, and listers later. Currently, the generated code is out-of-date for that the hack/update-codegen.sh script did not work.

@vara-bonthu
Copy link
Contributor

I agree with @jacobsalway sparkctl might not be necessary, as users typically deploy jobs using kubectl or through schedulers. We can revisit this topic in our next meeting to decide whether to keep it or remove it. Removing it could reduce maintenance efforts and simplify the codebase.

@jacobsalway
Copy link
Member

Here are some actions I think we can take or discuss at the next meeting:

  1. Consider removing sparkctl given its overlap with kubectl and to simplify the codebase.
  2. Allow users to add a k8s.sparkoperator.io/trigger=true annotation to a ScheduledSparkApplication to replicate the functionality that would be lost from the removal of sparkctl that doesn't overlap with kubectl.
  3. Fix the repo semantic versioning and fix the informer/lister/watcher code generation. I'd actually be against removing the generated code since I'm actually using it for some internal usecases, though with an older version of the CRs. [BUG] Not possible to reference module by release tags #2267 and [BUG] Generated clientsets/informers aren't updated #2268.

@jacobsalway
Copy link
Member

On a side note the clientset code gen isn't relevant anymore #2268

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants