Do not clobber the Total line in mgr:mem report #1945
base: master
Since 2022 commit a750837, Squid reported incorrect aggregate "Total" line stats. The line itself was mislabeled by repeating the name of the last MemPools::pools pool (which may vary with traffic patterns!), confusing admins and wreaking havoc on mgr:mem analysis scripts.
Sample mgr:mem output difference:
- cbdata store_client (14) 272 4961 1 1 0.00 27200.000 4961 1 ...
+ Total 1 4961 962 962 0.00 100.000 4961 962 ...
The actual difference varies based on traffic (at least) because the last MemPools::pools entry depends on traffic.
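The failure mode described above can be illustrated with a minimal sketch. This is not Squid's actual code: SketchStats, SketchPool, and globalStats() are hypothetical, heavily simplified stand-ins, and the only behavior assumed from the PR is that each per-pool getStats() call overwrites descriptive members (label, obj_size) while accumulating counters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for PoolStats (not the real class).
struct SketchStats {
    std::string label;    // name of the pool these stats describe
    int obj_size = 0;     // size of one object, in bytes
    long items_inuse = 0; // accumulated across getStats() calls
};

// Hypothetical, simplified stand-in for a memory pool.
struct SketchPool {
    std::string label;
    int obj_size;
    long items_inuse;

    // Accumulates counters but overwrites the descriptive members,
    // so the caller's stats end up labeled after the last pool visited.
    void getStats(SketchStats &stats) const {
        assert(items_inuse >= 0);
        stats.label = label;
        stats.obj_size = obj_size;
        stats.items_inuse += items_inuse;
    }
};

// Aggregates all pools. Without the two assignments after the loop,
// the result would keep the label and obj_size of the last pool.
SketchStats globalStats(const std::vector<SketchPool> &pools) {
    SketchStats total;
    for (const auto &pool : pools)
        pool.getStats(total);
    total.label = "Total"; // reset what the loop clobbered
    total.obj_size = 1;    // report the aggregate in bytes
    return total;
}
```

Because the order of pools may depend on traffic, skipping the reset would make the aggregate line's name unpredictable, which is the symptom shown in the sample output.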
Squid v6 also suffers from this bug. The change applies to v6 and appears to work there in my primitive tests.
src/mem/Stats.cc
Outdated
@@ -29,5 +24,12 @@ Mem::GlobalStats(PoolStats &stats)
        stats.overhead += sizeof(Allocator *);
    }

    // Reset PoolStats::meter, label, and obj_size data members after getStats()
    // calls in the above loop set them. TODO: Refactor to remove these members.
Refactoring mentioned in the above TODO will produce a lot of noise, so I did not include it in this bug-fixing PR.
I did remove PoolStats::pool member because keeping a stale value would essentially preserve a part of the bug this PR is fixing while resetting that raw pointer to nil is dangerous (some future code can easily start dereferencing it). That member removal did not produce a lot of noise because the removed data member was barely used.
P.S. This comment does not request any PR changes.
Please clarify in the code comment what you mean by "these members".
FTR, this particular stats variable holds the details for the entire MemPools pooling structure: the meta-pool of all pools (stats.overhead += sizeof(MemPools)) and allocations in terms of bytes (stats.obj_size = 1).
Please clarify in the code comment what you mean by "these members".
Done in branch commit a644ade? If you need a different kind of clarification, please suggest replacement wording or detail what seems unclear. I am also OK with removing that TODO if you prefer to remove it.
Well, going by that clarification, I think nobody else will be able to enact it without a much more detailed knowledge of what you in particular think is wrong with them. For now they serve a clear purpose.
So ... remove the TODO and enact it in a future PR if/when you have time to figure out the details.
remove the TODO
Done in commit 5f6793b.
@@ -586,7 +586,7 @@ Mem::Report(std::ostream &stream)
        PoolStats mp_stats;
        pool->getStats(mp_stats);

-        if (mp_stats.pool->meter.gb_allocated.count > 0)
+        if (pool->meter.gb_allocated.count > 0)
This exposes an old TOCTOU bug. It is important for this if-statement to check a member of mp_stats when determining whether its pool was "used" or not.
Perhaps meter->gb_allocated will work better:
- if (pool->meter.gb_allocated.count > 0)
+ if (mp_stats.meter->gb_allocated.count > 0)
This exposes an old TOCTOU bug.
I do not see a TOCTOU bug in this code (official or PR):
- Time Of Check: This if statement condition.
- Time Of Use: The "true" and "false" branches of this if statement.
It is important for this if-statement to check a member of mp_stats when determining whether its pool was "used" or not.
stream << "Pools ever used: " << poolCount - not_used << " (shown above)\n";
It is pretty much the opposite: In the context of this if statement, we want to know whether the pool was "ever used" rather than whether it is "currently used" (look for "ever used" reporting in code further below that relies on the results of this if statement's branches; I quoted that code line above). Asking mp_stats is worse than asking the pool itself because mp_stats is problematic on several levels, including obscuring the distinction between "ever/cumulative" and "current" use statistics (e.g., mp_stats::items_inuse represents current use rather than cumulative use).
Perhaps meter->gb_allocated will work better:
- if (pool->meter.gb_allocated.count > 0)
+ if (mp_stats.meter->gb_allocated.count > 0)
The proposed change has no effect on this if statement condition value: The two variants use two different ways to access the same gb_allocated object because mp_stats.meter just points to pool->meter:
stats.meter = &meter;
That gb_allocated object represents long-term (i.e. "ever used") history because MemPools::flushMeters() and flushCounters() methods preserve past gb_allocated values.
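The aliasing argument above can be checked with a small sketch. The types below (SketchCounter, SketchMeter, SketchStats, SketchPool) and the two helper functions are hypothetical simplifications; the only detail taken from the discussion is that getStats() stores the address of the pool's own meter in the stats object.

```cpp
#include <cassert>

// Hypothetical stand-ins; member names loosely mirror those quoted above.
struct SketchCounter { long count = 0; };
struct SketchMeter { SketchCounter gb_allocated; }; // cumulative history

struct SketchStats {
    const SketchMeter *meter = nullptr; // points into the pool
    long items_inuse = 0;               // current (not cumulative) use
};

struct SketchPool {
    SketchMeter meter;
    long items_inuse = 0;

    void getStats(SketchStats &stats) const {
        stats.meter = &meter; // same kind of assignment as quoted above
        stats.items_inuse = items_inuse;
    }
};

// Variant used by the PR: ask the pool directly.
bool everUsedViaPool(const SketchPool &pool) {
    return pool.meter.gb_allocated.count > 0;
}

// Suggested variant: go through the pointer stored in the stats.
bool everUsedViaStats(const SketchStats &stats) {
    assert(stats.meter); // only valid after getStats()
    return stats.meter->gb_allocated.count > 0;
}
```

Under these assumptions both helpers read the same counter object, so they cannot disagree; they differ only in how directly they reach it.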
The PR condition variant is better than the suggested replacement because it is more direct, because it avoids a problematic object, and because we are computing an "ever used" condition that the pool ought to know about even when mp_stats members like items_inuse represent "current use".
I believe this change request about an alleged old bug is invalid (for the reasons detailed above), but even if you disagree, please dismiss your negative review (so that this arguably minimal PR fixing another bug and not changing this if statement condition can be merged) and then post a PR dedicated to fixing that alleged old bug. That way, we will be making steady progress.
This exposes an old TOCTOU bug.
I do not see a TOCTOU bug in this code (official or PR):
* Time Of Check: This `if` statement condition.
Correct.
* Time Of Use: The "true" and "false" branches of this `if` statement.
Not necessarily, TOU refers to the value(s) being validated by the conditional.
In this case some values are being copied from pool to mp_stats outside the control of this if-statement. Their TOU is inside getStats(..) somewhere, with some pools being changed between the TOU and TOC.
We do not notice it because the affected pools have large counts and the difference is single digits.
The proposed change has no effect on this if statement condition value: The two variants use two different ways to access the same gb_allocated object because mp_stats.meter just points to pool->meter:
You are right, yes. What we should do instead is have the getStats(..) call inside the if-statement branch with the new condition you made depend on pool.
if (pool->meter.gb_allocated.count > 0) {
PoolStats mp_stats;
pool->getStats(mp_stats);
usedPools.emplace_back(mp_stats);
} else
This exposes an old TOCTOU bug.
I do not see a TOCTOU bug in this code (official or PR):
* Time Of Check: This `if` statement condition.
Correct.
* Time Of Use: The "true" and "false" branches of this `if` statement.
Not necessarily, TOU refers to the value(s) being validated by the conditional.
TOU refers to use of values after the check (i.e. after TOC). That use is not limited to this if statement -- there is more value-using code after the if statement, of course.
As far as this PR is concerned, the condition is effectively unchanged and makes sense. The values used by conditional code (and code further below) are unchanged and make sense. We should not be discussing any of this in this PR!
In this case some values are being copied from pool to mp_stats outside the control of this if-statement.
Yes, this if statement does not care how copied values were computed. It simply uses what was computed. There is nothing wrong with that.
Their TOU is inside getStats(..) somewhere, with some pools being changed between the TOU and TOC.
TOCTOU (a.k.a. Time Of Check To Time Of Use) bugs are bugs that happen when the value changes between TOC and TOU. Whatever use happens before TOC is irrelevant to TOCTOU. (There is also no relevant "use" inside getStats() as far as this report is concerned. "Use" happens later, when computed values are added to the report.)
For the purpose of assessing whether this if statement introduces a TOCTOU bug, any use of a value before this if statement is not relevant.
We do not notice it because the affected pools have large counts and the difference is single digits.
The alleged difference you are talking about (i.e. reporting zero stats for a pool that became used after its getStats() call) is not relevant to this PR.
The proposed change has no effect on this if statement condition value: The two variants use two different ways to access the same gb_allocated object because mp_stats.meter just points to pool->meter:
You are right, yes.
That "this if statement is effectively unchanged" fact should resolve this change request. This PR does not change anything related to this change request.
What we should do instead is have the getStats(..) call inside the if-statement branch with the new condition you made depend on pool.
The condition is not new, as we have established above. I have not "made depend on pool" anything that had not depended on the very same pool before this PR! No use/check timings have changed.
if (pool->meter.gb_allocated.count > 0) {
PoolStats mp_stats;
pool->getStats(mp_stats);
usedPools.emplace_back(mp_stats);
} else
I agree that something like that should be done to avoid reporting all zeros for pools that became used after getStats(). This change has nothing to do with this PR though. Please let this PR merge and post a dedicated PR improving this aspect of this code. That other PR should also move the condition inside a pool method (easy) OR make getStats() constant (more improvements, but requires additional code adjustments). Otherwise, we should not reorder getStats() (that may compute meter.gb_allocated!) and the condition that checks meter.gb_allocated.
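For reference, the follow-up idea discussed here (condition inside a pool method, constant getStats(), stats collected only for pools that were ever used) might look roughly like the sketch below. Everything in it is hypothetical: SketchPool, everUsed(), SketchReport, and collect() are illustrative names, not Squid APIs, and the sketch assumes getStats() has no side effects on the cumulative counter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct SketchStats { long items_inuse = 0; };

// Hypothetical pool with a constant getStats() and an "ever used" query.
class SketchPool {
public:
    SketchPool(long allocatedEver, long inUseNow):
        allocatedEver_(allocatedEver), inUseNow_(inUseNow) {
        assert(inUseNow <= allocatedEver);
    }

    // The condition lives inside the pool, as suggested above.
    bool everUsed() const { return allocatedEver_ > 0; }

    // const: computing stats cannot change what everUsed() reports.
    void getStats(SketchStats &stats) const { stats.items_inuse = inUseNow_; }

private:
    long allocatedEver_; // cumulative allocations
    long inUseNow_;      // objects currently in use
};

struct SketchReport {
    std::vector<SketchStats> usedPools;
    std::size_t notUsed = 0;
};

// Checks the pool first and takes its snapshot only inside the branch.
SketchReport collect(const std::vector<SketchPool> &pools) {
    SketchReport report;
    for (const auto &pool : pools) {
        if (pool.everUsed()) {
            SketchStats stats;
            pool.getStats(stats);
            report.usedPools.push_back(stats);
        } else {
            ++report.notUsed;
        }
    }
    return report;
}
```

With the check and the snapshot taken back to back, a pool that was ever used is reported with stats gathered after the check, rather than with an earlier all-zero snapshot.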
If you insist on adding these out of scope changes to this PR, I will instead revert PR changes that remove PoolStats::pool member, so that this code is completely unchanged. Would you prefer that?
Modified requests from before.
Since 2022 commit a750837, Squid reported incorrect aggregate "Total"
line stats. The line itself was mislabeled by repeating the name of the
last MemPools::pools pool (which may vary with traffic patterns!),
confusing admins and wreaking havoc on mgr:mem analysis scripts.