[WIP] Added HTM anomaly code with respect to Operate First cpu_usage data #9
base: master
483d020 to 8ff8a27
/test pre-commit
Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks.
Why are you printing this out? It's clearly defined in the above cell.
What might be useful here is some explanation of these parameters and how their values were selected.
Line #3. predictor = Predictor(steps=[1, 5], alpha=parameters["predictor"]["sdrc_alpha"])
The parameters defined in this notebook appear to be those selected for the gymdata.csv dataset used in the example hotgym.py.
Is there any work that needs to be done on our part to ensure these parameters are correct for the CPU dataset?
Why are steps 1 and 5 used in the predictor? Why not other values?
Can you explain a bit better what is being plotted here? What is 1 Step Prediction vs 5 Step? Why is there no 10 Step prediction? What are the Instantaneous and Likelihood anomalies? Are they the same algorithm with different amounts of training time? Or are they different from each other? Why is Anomaly Likelihood "considered to be the best predictor of Anomaly"? Are there any performance metrics to back this up?
Can you space out the subplots? The titles and x-axis labels are running into each other. You might also want to use seaborn, as it creates slightly nicer graphs.
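A minimal sketch of one way to space the subplots, assuming the notebook builds them with matplotlib. The three series and their titles are placeholders, not the notebook's actual data; `constrained_layout=True` is a standard matplotlib option that reserves room for titles and axis labels.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Placeholder series standing in for the notebook's inputs, predictions and scores.
x = np.arange(100)
series = {
    "Input": np.sin(x / 10.0),
    "Predictions": np.cos(x / 10.0),
    "Anomaly scores": np.abs(np.sin(x / 5.0)),
}

# constrained_layout keeps each title clear of the x-axis label of the plot above it.
fig, axes = plt.subplots(len(series), 1, figsize=(10, 9), constrained_layout=True)
for ax, (title, values) in zip(axes, series.items()):
    ax.plot(x, values)
    ax.set_title(title)
    ax.set_xlabel("Time step")
```

If the figure is created elsewhere, calling `fig.tight_layout()` or `fig.subplots_adjust(hspace=0.5)` after plotting is a comparable alternative.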
How do we know this method is good? Are there any baselines to compare it with? For example, does HTM significantly outperform just flagging all changes in value greater than 3x standard deviation over a rolling window as an anomaly? Is there anything else we can compare it to to justify the claim "the model does a good job"?
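The rolling-window baseline mentioned above could be sketched roughly as follows. The function name, the window of 24 samples, and the synthetic series are all assumptions for illustration; the real comparison would run on the cpu_usage column and would still need some labelled or agreed-upon anomalies to score against.

```python
import numpy as np
import pandas as pd

def rolling_sigma_baseline(values, window=24, n_sigma=3.0):
    """Flag points whose step change exceeds n_sigma rolling standard deviations.

    Hypothetical helper: `window` and `n_sigma` are illustrative defaults.
    """
    diffs = pd.Series(values, dtype=float).diff()
    # Use only past changes (shift by one) so a spike does not inflate its own threshold.
    rolling_std = diffs.rolling(window).std().shift(1)
    # Comparisons against NaN (the warm-up period) evaluate to False.
    return diffs.abs() > n_sigma * rolling_std

# Synthetic stand-in for cpu_usage: a small oscillation with one injected spike.
cpu = 0.5 + 0.01 * np.sin(np.arange(200))
cpu[150] += 0.5
flags = rolling_sigma_baseline(cpu)
print(flags[flags].index.tolist())  # indices flagged by the baseline
```

Running the HTM anomaly likelihood and this baseline over the same series would give a first, rough basis for the comparison asked about here.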
Maybe this is a naive question, but is it not possible to read the data with df = pd.read_csv("df_cpu.csv")?
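A small sketch of what the pandas route might look like, assuming the file has a header row with a timestamp column and a value column. The column names `timestamp` and `cpu_usage` and the sample rows are placeholders, and an in-memory buffer stands in for df_cpu.csv so the snippet runs on its own.

```python
import io
import pandas as pd

# Stand-in for df_cpu.csv; the column names and values here are assumptions.
sample_csv = io.StringIO(
    "timestamp,cpu_usage\n"
    "2021-06-01 00:00:00,0.42\n"
    "2021-06-01 00:05:00,0.47\n"
    "2021-06-01 00:10:00,0.51\n"
)

# With the real file this would be pd.read_csv("df_cpu.csv", parse_dates=["timestamp"]).
df = pd.read_csv(sample_csv, parse_dates=["timestamp"])

# Rebuild (timestamp, value) tuples for code that iterates record by record.
records = list(df.itertuples(index=False, name=None))
```

This also removes the need for manual header skipping and string-to-float conversion in the loading loop.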
Line #13. len(records)
Hmm, inspecting the CSV file in JupyterLab shows 576 rows, but here I see 574 records. Any idea what might cause this inconsistency?
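One possible explanation, offered as an unverified guess: if the notebook reuses a hotgym.py-style loading loop that skips the header line plus two further lines, two data rows would be dropped. The helper below is hypothetical and uses synthetic data only to show how such a gap can be checked.

```python
import csv
import io

def count_rows(text, skipped_lines):
    """Return (total lines, records left after skipping the first `skipped_lines` lines)."""
    rows = list(csv.reader(io.StringIO(text)))
    return len(rows), len(rows[skipped_lines:])

# Synthetic stand-in: one header line followed by 576 data lines.
text = "timestamp,cpu_usage\n" + "".join(f"{i},0.5\n" for i in range(576))

# Skipping three lines (header plus two more) drops two data rows.
total, kept = count_rows(text, skipped_lines=3)
print(total, kept)
```

Running the same count against the real file would confirm or rule this out.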
Hey @suppathak, so I see that in addition to the notebook, README, and the csv, there's also a bunch of markdown docs and images being added in this PR. Is that intentional, or were these supposed to be in another PR?
Hey @chauhankaranraj, thanks for the comments. I will work on them. The rest of the markdown docs are from another PR and have already been merged into the master branch. Their appearance here was probably due to a merge error in this branch. After rebasing, the issue is now resolved. Thanks :)
4253fa5 to fc73b9a
[APPROVALNOTIFIER] This PR is NOT APPROVED
@suppathak: The following test failed, say
As a data scientist working on HTM anomaly detection techniques, I want to create a Jupyter notebook that applies the HTM anomaly detection technique to data from the Operate First smaug cluster. I have included the notebook, the dataset, and a README file describing the process.
Feel free to provide feedback! Thank you.
Closes #5