Support SKU spec upgrade #459

Merged
merged 3 commits into from
Jun 13, 2023

Collaborator

@Andyz26 Andyz26 commented Jun 12, 2023

Context

Currently, the upgrade operation on a resource cluster (RC) only upgrades the image version.
This PR adds a new flag to the RC upgrade request so that the SKU spec can also be upgraded as part of the upgrade operation.

Checklist

  • ./gradlew build compiles code correctly
  • Added new tests where applicable
  • ./gradlew test passes all tests
  • Extended README or added javadocs where applicable

@Andyz26 Andyz26 temporarily deployed to Integrate Pull Request June 12, 2023 22:11 — with GitHub Actions Inactive
github-actions bot commented Jun 12, 2023

Test Results

127 files ±0, 127 suites ±0, duration 6m 24s ±0s
538 tests +1: 528 passed ±0, 8 skipped ±0, 2 failed +1
540 runs +2: 530 passed +1, 8 skipped ±0, 2 failed +1

For more details on these failures, see this check.

Results for commit e179555. ± Comparison against base commit b119679.

♻️ This comment has been updated with latest results.

github-actions bot commented Jun 12, 2023

Uploaded Artifacts

To use these artifacts in your Gradle project, paste the following lines in your build.gradle.

resolutionStrategy {
    force "io.mantisrx:mantis-client:0.1.0-20230613.173850-288"
    force "io.mantisrx:mantis-common:0.1.0-20230613.173850-287"
    force "io.mantisrx:mantis-common-serde:0.1.0-20230613.173850-287"
    force "io.mantisrx:mantis-discovery-proto:0.1.0-20230613.173850-287"
    force "io.mantisrx:mantis-network:0.1.0-20230613.173850-287"
    force "io.mantisrx:mantis-remote-observable:0.1.0-20230613.173850-288"
    force "io.mantisrx:mantis-runtime:0.1.0-20230613.173850-288"
    force "io.mantisrx:mantis-runtime-loader:0.1.0-20230613.173850-288"
    force "io.mantisrx:mantis-shaded:0.1.0-20230613.173850-286"
    force "io.mantisrx:mantis-connector-iceberg:0.1.0-20230613.173850-286"
    force "io.mantisrx:mantis-connector-job:0.1.0-20230613.173850-288"
    force "io.mantisrx:mantis-connector-kafka:0.1.0-20230613.173850-288"
    force "io.mantisrx:mantis-connector-publish:0.1.0-20230613.173850-287"
    force "io.mantisrx:mantis-control-plane-client:0.1.0-20230613.173850-287"
    force "io.mantisrx:mantis-control-plane-core:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-control-plane-server:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-core:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-groupby-sample:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-jobconnector-sample:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-mantis-publish-sample:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-sine-function:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-synthetic-sourcejob:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-twitter-sample:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-examples-wordcount:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-publish-core:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-publish-netty:0.1.0-20230613.173850-280"
    force "io.mantisrx:mantis-publish-netty-guice:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-server-agent:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-server-worker:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-server-worker-client:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-source-job-kafka:0.1.0-20230613.173850-281"
    force "io.mantisrx:mantis-source-job-publish:0.1.0-20230613.173850-281"
}

@Andyz26 Andyz26 changed the title [WIP, Do not review] Support SKU spec upgrade Support SKU spec upgrade Jun 13, 2023
Comment on lines 43 to 44
@Nullable
MantisResourceClusterSpec resourceClusterSpec;
Collaborator

nit: why do we need this here? Do we not have a separate endpoint to update the cluster spec?

Collaborator

I was thinking along the lines of:

resourceCluster.sku(SKU).clusterSpec(NEW_SPEC).image(IMAGE).reconcile()
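
For illustration only, here is a minimal sketch of how a fluent call chain like the one above could be shaped. The class name `ResourceClusterUpdate`, the string-typed fields, and the idea that `reconcile()` returns a list of planned steps are all assumptions made for this sketch; none of them come from the Mantis codebase.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fluent wrapper; none of these names are Mantis APIs.
final class ResourceClusterUpdate {
    private final String clusterId;
    private String sku;
    private String clusterSpec; // null means "leave the spec unchanged"
    private String image;       // null means "leave the image unchanged"

    ResourceClusterUpdate(String clusterId) {
        this.clusterId = clusterId;
    }

    ResourceClusterUpdate sku(String sku) {
        this.sku = sku;
        return this;
    }

    ResourceClusterUpdate clusterSpec(String clusterSpec) {
        this.clusterSpec = clusterSpec;
        return this;
    }

    ResourceClusterUpdate image(String image) {
        this.image = image;
        return this;
    }

    // Returns the planned steps instead of performing them,
    // so the sketch stays free of side effects.
    List<String> reconcile() {
        if (sku == null) {
            throw new IllegalStateException("sku must be set before reconcile()");
        }
        List<String> steps = new ArrayList<>();
        if (clusterSpec != null) {
            steps.add("apply spec " + clusterSpec + " to " + clusterId + "/" + sku);
        }
        if (image != null) {
            steps.add("roll image " + image + " on " + clusterId + "/" + sku);
        }
        return steps;
    }
}
```

With this shape, a spec change and an image change are both optional inputs to a single reconcile step, which is roughly the trade-off discussed in this thread.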

Collaborator Author

@Andyz26 Andyz26 Jun 13, 2023

This serves as a translation between the actors and the Temporal workflow invoker. It's probably cleaner to have a separate intermediate contract class here instead.

Comment on lines 352 to 365
upgradeFut = this.resourceClusterStorageProvider.getResourceClusterSpecWritable(req.getClusterId())
    .thenCompose(specW -> {
        if (specW == null) {
            return CompletableFuture.completedFuture(UpgradeClusterContainersResponse.builder()
                .responseCode(ResponseCode.CLIENT_ERROR_NOT_FOUND)
                .build());
        }

        UpgradeClusterContainersRequest enrichedReq =
            req.toBuilder()
                .resourceClusterSpec(specW.getClusterSpec())
                .build();
        return this.resourceClusterProvider.upgradeContainerResource(enrichedReq);
    })
Collaborator

I think ideally we should not have this here as it increases the complexity of this endpoint.

Collaborator Author

The Temporal workflow doesn't know how to get the spec from the internal kvDal translation, so the choice is between:

  1. Have the workflow call back to the master endpoint to GET the resource spec, then apply it in the workflow.
  2. Have the actor include the spec in the request sent to the workflow invoker directly (the current approach). I think this is cleaner and requires fewer interactions.
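
To make option 2 concrete, here is a minimal, self-contained sketch of the "enrich the request before invoking the workflow" pattern. Every type here (`UpgradeRequest`, `UpgradeResponse`, `SpecStore`, `WorkflowInvoker`, `UpgradeHandler`) is a hypothetical stand-in, the spec is reduced to a plain string, and the numeric response codes are placeholders; only the `enableSkuSpecUpgrade` flag name and the not-found short circuit mirror what the diff in this PR shows.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-ins for the request/response contracts; not the Mantis classes.
final class UpgradeRequest {
    final String clusterId;
    final boolean enableSkuSpecUpgrade;
    final String clusterSpec; // null until the handler enriches the request

    UpgradeRequest(String clusterId, boolean enableSkuSpecUpgrade, String clusterSpec) {
        this.clusterId = clusterId;
        this.enableSkuSpecUpgrade = enableSkuSpecUpgrade;
        this.clusterSpec = clusterSpec;
    }

    UpgradeRequest withClusterSpec(String spec) {
        return new UpgradeRequest(clusterId, enableSkuSpecUpgrade, spec);
    }
}

final class UpgradeResponse {
    final int code;           // placeholder codes: 200 = accepted, 404 = spec not found
    final String appliedSpec; // spec the invoker received, if any

    UpgradeResponse(int code, String appliedSpec) {
        this.code = code;
        this.appliedSpec = appliedSpec;
    }
}

// Assumed storage layer: resolves to null when the cluster has no stored spec.
final class SpecStore {
    private final Map<String, String> specs = new ConcurrentHashMap<>();

    void put(String clusterId, String spec) {
        specs.put(clusterId, spec);
    }

    CompletableFuture<String> getSpec(String clusterId) {
        return CompletableFuture.completedFuture(specs.get(clusterId));
    }
}

// Assumed workflow invoker: it only sees what is carried in the request.
final class WorkflowInvoker {
    CompletableFuture<UpgradeResponse> upgrade(UpgradeRequest req) {
        return CompletableFuture.completedFuture(new UpgradeResponse(200, req.clusterSpec));
    }
}

final class UpgradeHandler {
    private final SpecStore store;
    private final WorkflowInvoker invoker;

    UpgradeHandler(SpecStore store, WorkflowInvoker invoker) {
        this.store = store;
        this.invoker = invoker;
    }

    CompletableFuture<UpgradeResponse> handle(UpgradeRequest req) {
        if (!req.enableSkuSpecUpgrade) {
            return invoker.upgrade(req); // image-only upgrade, no spec lookup
        }
        return store.getSpec(req.clusterId).thenCompose(spec -> {
            if (spec == null) {
                return CompletableFuture.completedFuture(new UpgradeResponse(404, null));
            }
            return invoker.upgrade(req.withClusterSpec(spec)); // spec travels with the request
        });
    }
}
```

In this shape the workflow side never calls back into the master: whatever it needs arrives in the request, at the cost of one extra storage read in the handler.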

@Andyz26 Andyz26 temporarily deployed to Integrate Pull Request June 13, 2023 17:38 — with GitHub Actions Inactive

pipe(this.resourceClusterProvider.upgradeContainerResource(req), getContext().dispatcher()).to(getSender());
// For scaling-down the decision requires getting idle hosts first.
// if enableSkuSpecUpgrade is true, first fetch the latest spec to override the sku spec during upgrade
Collaborator

Instead of fetching this, could we send this as an argument to the workflow?

Collaborator Author

It already does: the spec is sent over the wire as part of the request.


String region;

String optionalImageId;
Collaborator

This need not be optional, correct?

Collaborator Author

I think it currently defaults to "latest".


int optionalBatchMaxSize;

boolean forceUpgradeOnSameImage;
Collaborator

s/forceUpgradeOnSameImage/force


MantisResourceClusterEnvType optionalEnvType;

int optionalBatchMaxSize;
Collaborator

What is this batchMaxSize for?

Collaborator Author

It is a per-upgrade batch setting: each batch in an upgrade should involve no more than x agents.
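
As a rough illustration of that cap, the sketch below splits a list of agents into consecutive batches of at most `batchMaxSize` entries. The `UpgradeBatches` class and its `partition` helper are hypothetical and only show the batching arithmetic; how the actual workflow selects and orders agents is not covered in this PR discussion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; it only illustrates the "no more than x agents per batch" rule.
final class UpgradeBatches {
    static <T> List<List<T>> partition(List<T> agents, int batchMaxSize) {
        if (batchMaxSize <= 0) {
            throw new IllegalArgumentException("batchMaxSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < agents.size(); start += batchMaxSize) {
            int end = Math.min(start + batchMaxSize, agents.size());
            // Copy the slice so each batch is independent of the input list.
            batches.add(new ArrayList<>(agents.subList(start, end)));
        }
        return batches;
    }
}
```

Only the final batch can be smaller than the cap, and an empty agent list yields no batches at all.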

@Andyz26 Andyz26 had a problem deploying to Integrate Pull Request June 13, 2023 21:37 — with GitHub Actions Failure
@Andyz26 Andyz26 merged commit 16ae78d into master Jun 13, 2023
1 of 3 checks passed
@Andyz26 Andyz26 deleted the andyz/upgradeWithSkuSpec branch July 10, 2023 05:27
2 participants