-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ENG-6264] add monthly "item usage" report #10760
[ENG-6264] add monthly "item usage" report #10760
Conversation
b877b5a
to
4119193
Compare
3ca15b1
to
48ee82b
Compare
980523c
to
cfd914b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A good start! Just a few questions and odds and ends.
try: | ||
_next_after = _agg_result.after_key | ||
except AttributeError: | ||
return # all done |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to show more details about the error being caught. If you know the missing attribute is after_key
it should explicitly check for that.
except _SkipItem: | ||
pass | ||
|
||
def _report_from_buckets(self, exact_bucket, contained_views_bucket): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This would be a good place for type hinting.
def _mocks(self): | ||
with ( | ||
# set a tiny page size to force aggregation pagination: | ||
mock.patch('osf.metrics.reporters.public_item_usage._CHUNK_SIZE', 1), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like you might want to vary this, this would be a good benchmarking tests. CHUNK_SIZE is currently hard coded, it would be nice to have it be optimized or have a reasoning behind it being 500.
_empty = list(_reporter.report(ym_empty)) | ||
assert _empty == [] | ||
|
||
def test_reporter(self, ym_empty, ym_sparse, ym_busy, sparse_month_usage, busy_month_item0, busy_month_item1, busy_month_item2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should split this into multiple test cases, I can't tell what this is testing for.
for _actionbucket in exact_bucket.agg_action: | ||
if _actionbucket.key == CountedAuthUsage.ActionLabel.VIEW.value: | ||
_report.view_count = _actionbucket.doc_count | ||
# note: view_session_count computed separately to avoid double-counting |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
return _report | ||
|
||
def _init_report_from_exact_bucket(self, exact_bucket) -> PublicItemUsageReport: | ||
# in the (should-be common) case of an item that has been directly viewed in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"should-be common"? without numbers this is very opaque to me. It would help to have some context about how often this happens, benchmarking would help us decide that.
assert _busy_item1.item_osfid == 'item1' | ||
assert _busy_item1.provider_id == ['prov0', 'prov1'] | ||
assert _busy_item1.platform_iri == ['http://osf.example'] | ||
assert _busy_item1.view_count == 6 * 9 + 11 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of magic numbers around here.
further reviewed and merged with #10764 |
Purpose
compute monthly "usage" metrics for public items on osf
Changes
osf.metrics.reports.PublicItemUsageReport
(elasticsearch metric document model)osf.metrics.reporters.public_item_usage.PublicItemUsageReporter
(monthly reporter)osf.metrics.counted_usage
, addget_item_type
andget_provider_id
for consistent reuse with the new reporterQA Notes
Please make verification statements inspired by your code and what your code touches.
What are the areas of risk?
Any concerns/considerations/questions that development raised?
Documentation
Side Effects
Ticket