koordlet: add resctrl qos collector #2005
base: main
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2005      +/-   ##
==========================================
- Coverage   67.19%   67.14%   -0.06%
==========================================
  Files         451      454       +3
  Lines       43468    43686     +218
==========================================
+ Hits        29208    29331     +123
- Misses      11714    11798      +84
- Partials     2546     2557      +11
Flags with carried forward coverage won't be shown.
Signed-off-by: Rouzip <[email protected]>
Sorry for the late PR; any comments are welcome.
Signed-off-by: Rouzip <[email protected]>
2d1efd6 to c28c849
Hello, PTAL 😊 @saintube
Signed-off-by: Rouzip <[email protected]>
@Rouzip Thanks for your great contributions. Since this is a large patch, we need to run some tests before it is merged.
😊 Is there anything I can do?
We will verify this patch in some test environments later. It would also be appreciated if you could add more UTs to raise the patch coverage to at least the 70% target.
Signed-off-by: Rouzip <[email protected]>
Signed-off-by: Frame <[email protected]>
Signed-off-by: Frame <[email protected]>
/lgtm
PTAL /cc @zwzhang0107 @hormes
"github.com/koordinator-sh/koordinator/pkg/koordlet/metriccache"
"github.com/koordinator-sh/koordinator/pkg/koordlet/metrics"
"github.com/koordinator-sh/koordinator/pkg/koordlet/metricsadvisor/framework"
"github.com/koordinator-sh/koordinator/pkg/koordlet/qosmanager/plugins/resctrl"
It's better not to import the qos plugin.
These constants can be moved to utils.
return &ResctrlAMDReader{}
}

func (rr *ResctrlBaseReader) ReadResctrlL3Stat(parent string) (map[CacheId]uint64, error) {
More comments on ReadResctrlL3Stat/ReadResctrlMBStat would make them easier for readers to understand.
@@ -48,7 +50,9 @@ func NewDefaultConfig() *Config {
PSICollectorInterval: 10 * time.Second,
CPICollectorTimeWindow: 10 * time.Second,
ColdPageCollectorInterval: 5 * time.Second,
ResctrlCollectorInterval: 1 * time.Second,
1 second seems too short and is not necessary for now. How about 10 seconds by default?
QosResctrl = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Subsystem: KoordletSubsystem,
Name: "qos_resctrl",
Help: "qos resctrl collected by koordlet",
A detailed help message would be helpful, since resctrl metrics are quite advanced.
var (
QosResctrl = prometheus.NewGaugeVec(prometheus.GaugeOpts{
Subsystem: KoordletSubsystem,
Name: "qos_resctrl",
"qos" is already included in the labels. Maybe define two metrics, such as "llc_occupancy" and "mbm_occupancy", so that we don't need to set MetricPropertyResctrlMbType="" when recording metrics.
The same applies to MetricPropertiesFunc.
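The label-schema argument can be shown without pulling in client_golang. In this sketch, `gaugeVec` is a hypothetical toy stand-in for `prometheus.GaugeVec` that rejects label sets not matching its schema (similar in spirit to `GaugeVec.GetMetricWith` returning an error for inconsistent labels). The metric names follow the reviewer's suggestion; the label names and QoS values are assumptions for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// gaugeVec is a toy stand-in for prometheus.GaugeVec: a named gauge with a
// fixed label schema, storing one value per label combination.
type gaugeVec struct {
	name   string
	labels []string
	values map[string]float64
}

func newGaugeVec(name string, labels ...string) *gaugeVec {
	return &gaugeVec{name: name, labels: labels, values: map[string]float64{}}
}

// set records v for the given label values; label sets that do not match
// the schema exactly are rejected.
func (g *gaugeVec) set(lv map[string]string, v float64) error {
	if len(lv) != len(g.labels) {
		return fmt.Errorf("%s: want labels %v, got %v", g.name, g.labels, lv)
	}
	parts := make([]string, 0, len(g.labels))
	for _, l := range g.labels {
		val, ok := lv[l]
		if !ok {
			return fmt.Errorf("%s: missing label %q", g.name, l)
		}
		parts = append(parts, fmt.Sprintf("%s=%q", l, val))
	}
	g.values[g.name+"{"+strings.Join(parts, ",")+"}"] = v
	return nil
}

// Two metrics instead of one: LLC occupancy has no mb_type dimension, so
// callers never have to pass an empty mb_type label.
var (
	llcOccupancy = newGaugeVec("llc_occupancy", "node", "qos", "cache_id")
	mbmOccupancy = newGaugeVec("mbm_occupancy", "node", "qos", "cache_id", "mb_type")
)

func main() {
	_ = llcOccupancy.set(map[string]string{"node": "n1", "qos": "LS", "cache_id": "0"}, 1024)
	_ = mbmOccupancy.set(map[string]string{"node": "n1", "qos": "LS", "cache_id": "0", "mb_type": "total"}, 4096)
	for _, g := range []*gaugeVec{llcOccupancy, mbmOccupancy} {
		keys := make([]string, 0, len(g.values))
		for k := range g.values {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			fmt.Println(k, g.values[k])
		}
	}
}
```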
@@ -102,6 +106,12 @@ const (

MetricPropertyCPIResource MetricProperty = "cpi_resource"

MetricPropertyNodeQos MetricProperty = "node_qos"
MetricPropertyQos MetricProperty = "qos"
Ⅰ. Describe what this PR does
Add resctrl qos collector.
Ⅱ. Does this pull request fix one issue?
fixes #1832
Ⅲ. Describe how to verify it
After enabling the resctrl flag in the config:
curl http://localhost:9316/metrics | grep resctrl
Ⅳ. Special notes for reviews
V. Checklist
make test