Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add KEP for AddOnPlacementScoreGenerator #93

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

mvazquezc
Copy link

As discussed at the community meeting on April 6th, this is the KEP for adding the AddOnPlacementScoreGenerator API.

@mvazquezc mvazquezc force-pushed the addon-placement-score-generator branch from f9cf88f to 8031826 Compare April 13, 2023 14:25
addOnPlacementSelector:
name: latency
namespace: cluster1
latencies:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is, should this field latencies be a more general field? and how could a user define a URL here, what's the requirement for the URL and the output?

Copy link
Author

@mvazquezc mvazquezc Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @haoqing0110 yes, other than more generic to me this would be "any string" and even we could have more than one, something like this for example:

apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScoreGenerator
metadata:
  name: cluster1-generator
  namespace: cluster1
spec:
  addOnPlacementSelector:
    name: latency
    namespace: cluster1
  latencies:
    - name: redhat-com-avgLatency
      url: https://redhat.com
      runs: 2
      waitBetweenRuns: 10s
    - name: google-com-avgLatency
      url: https://google.com
    - name: linux-com-avgLatency
      url: https://linux.com
  energy:
    - name: some-name
      someValue: some-value

Then, the controller interested in latencies will consume latencies , the controller interested in energy will consume energy.

Copy link
Member

@haoqing0110 haoqing0110 Apr 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation. This AddOnPlacementScoreGenerator sounds like a configuration for the Addon controller, and to generate different scores, developers still need to develop their own addon controller, correct? Sounds similar to the idea of https://github.com/open-cluster-management-io/enhancements/tree/main/enhancements/sig-architecture/58-addon-configuration, not sure if this can satisfy the motivation of this proposal, and this proposal is like to propose a new addon configuration called AddOnPlacementScoreGenerator.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I'm wondering could the controller to consume latencies and energy could be a unified controller, it can consume different kind of data source, eg, an URL https://redhat.com or some host path /some/path, or some other things. And this API just defines the supported data source types.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cc @qiujian16 , want to know your suggestions.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@haoqing0110 the use cases defined by 58-addon-configuration would work for us, I have checked the examples and the example with ManagedProxyConfiguration is very similar to what we want.

And answering your second comment, yes, the same controller could use latencies and energy data.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is more like a cluster scoped resource that applies to all clusters, it is less likely that we want different configs for different clusters. Also how could we parse the result of the http responses? It might be straightforward for latency. But if we try to fetch data from a metrics server or 3rd party services to generate the score, the returned output varies and we might need a way to define how to parse the result.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @qiujian16,

I think that for latencies global object or per-cluster object would both work. In terms of data, I was thinking of an unstructured structure where people developing custom controllers with the addon framework will consume the data the way they need it. I don't think this should be a controller taking care of multiple data sources, instead a single controller taking care of a data source, so if anything were wrong with the data itself it wouldn't affect the other add-ons.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on Mario's approach on a single controller taking care of a single data source type.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding, in this API we should define

  1. how to get the raw data. It can not be only latency. It probably could be a field type and latency could be a certain type to get raw data.
  2. how to convent raw data to the score

@mvazquezc mvazquezc force-pushed the addon-placement-score-generator branch from 8031826 to 2bf05f9 Compare July 3, 2023 08:48
@openshift-ci
Copy link

openshift-ci bot commented Jul 3, 2023

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mvazquezc
Once this PR has been reviewed and has the lgtm label, please assign qiujian16 for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

Copy link
Member

@qiujian16 qiujian16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

put some comment, but I haven't thought through the whole idea yet. Will take some time this week and add more comments.

name: cluster1-generator
namespace: cluster1
spec:
addOnPlacementSelector:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this actually the name of the score?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not really, it's the name of the AddOnPlacementScore. This selector would be used by the controller generating the AddOnPlacementScore to get the AddOnPlacementScoreGenerator that applies to this specific AddOnPlacementScore.


~~~yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScoreGenerator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe AddOnPlacementScoreDataSource?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wfm

Copy link
Author

@mvazquezc mvazquezc Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To the comment below:

  1. I don't think we want to create types... I was thinking of an unstructured data type where the controller consuming the API should take care of reading the fields it expects to be present in order to generate the required scores.
  2. Controller should take care of that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not suggest unstructured type, it will lose the schema validation and make the API unbounded.

addOnPlacementSelector:
name: latency
namespace: cluster1
latencies:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To my understanding, in this API we should define

  1. how to get the raw data. It can not be only latency. It probably could be a field type and latency could be a certain type to get raw data.
  2. how to convent raw data to the score

Copy link
Member

@qiujian16 qiujian16 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still not convinced that this should be put in the API repo. The major uncertainty is how this API can be evolved and consumed by multiple score providers. It seems to me a new score provider has to update this API or build their own CRDs to config how score is generated. It does not seems change the fact that AddOnPlacementScore is the only integration point. But I am 100% ok if we create a separated repo and maintain this API and controller there.


~~~yaml
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScoreGenerator
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not suggest unstructured type, it will lose the schema validation and make the API unbounded.

@mvazquezc
Copy link
Author

I am still not convinced that this should be put in the API repo. The major uncertainty is how this API can be evolved and consumed by multiple score providers. It seems to me a new score provider has to update this API or build their own CRDs to config how score is generated. It does not seems change the fact that AddOnPlacementScore is the only integration point. But I am 100% ok if we create a separated repo and maintain this API and controller there.

I'm not sure I follow you. The idea behind this CRD is to allow the customization of score generations by introducing a new API that can be consumed by the addon controllers, I don't think we want to have a dedicated controller for this API.

Multiple score providers will be required to read this API in case they want to get score-generation customizations. In case these score providers don't need customizations they're fine not reading this CRD.

To me, this API would be like having your config in a ConfigMap and reading it upon controller start.

@qiujian16
Copy link
Member

I think I understand the intention, but I do not see how it can be achieved when a new provider comes without adding additional field in this API. For example, if I have a new provider comes and allow user to configure it to score the cluster based on resource usage, I have to update the API spec or introduce a new API to config. I do not see how you can define a common api schema that can adapt to different providers.

@mvazquezc
Copy link
Author

I think I understand the intention, but I do not see how it can be achieved when a new provider comes without adding additional field in this API. For example, if I have a new provider comes and allow user to configure it to score the cluster based on resource usage, I have to update the API spec or introduce a new API to config. I do not see how you can define a common api schema that can adapt to different providers.

That's why I think it should be unstructured, so anyone can put their customization data without having to change the API. The controllers are responsible for reading their data and making sure that in case of misconfiguration, they can gracefully handle it.

@qiujian16
Copy link
Member

qiujian16 commented Jul 14, 2023

That's why I think it should be unstructured, so anyone can put their customization data without having to change the API. The controllers are responsible for reading their data and making sure that in case of misconfiguration, they can gracefully handle it.

if it is unstructured, why we need to define an API schema instead of directly using a configmap :)? I think we need to at least define some general concept in this API rather than arbitrary type...This is why I am struggling with how this API can provide commonality in this scenario.

@mvazquezc
Copy link
Author

if it is unstructured, why we need to define an API schema instead of directly using a configmap :)? I think we need to at least define some general concept in this API rather than arbitrary type...This is why I am struggling with how this API can provide commonality in this scenario.

It could be done via ConfigMaps, we won't be able to leverage things defined in the KEP like addOnPlacementSelector but I guess that's not that bad.

I cannot think of a common structured API that can be leveraged without having to update API types for every new provider. Requirements for different data providers may be very different and as such, hard to put in common.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants