Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do not clobber the Total line in mgr:mem report #1945

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rousskov
Copy link
Contributor

Since 2022 commit a750837, Squid reported incorrect aggregate "Total"
line stats. The line itself was mislabeled by repeating the name of the
last MemPools::pools pool (which may vary with traffic patterns!),
confusing admins and wreaking havoc on mgr:mem analysis scripts.

Since 2022 commit a750837, Squid reported incorrect aggregate "Total"
line stats. The line itself was mislabeled by repeating the name of the
last MemPools::pools pool (which may vary with traffic patterns!),
confusing admins and wreaking havoc on mgr:mem analysis scripts.
@rousskov
Copy link
Contributor Author

Sample mgr:mem output difference:

- cbdata store_client (14) 272 4961   1   1 0.00 27200.000 4961   1 ...
+ Total                      1 4961 962 962 0.00   100.000 4961 962 ...

The actual difference varies based on traffic (at least) because the last MemPools::pools entry depends on traffic.

Copy link
Contributor Author

@rousskov rousskov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Squid v6 also suffers from this bug. The change applies to v6 and appears to work there in my primitive tests.

src/mem/Stats.cc Outdated
@@ -29,5 +24,12 @@ Mem::GlobalStats(PoolStats &stats)
stats.overhead += sizeof(Allocator *);
}

// Reset PoolStats::meter, label, and obj_size data members after getStats()
// calls in the above loop set them. TODO: Refactor to remove these members.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactoring mentioned in the above TODO will produce a lot of noise, so I did not include it in this bug-fixing PR.

I did remove PoolStats::pool member because keeping a stale value would essentially preserve a part of the bug this PR is fixing while resetting that raw pointer to nil is dangerous (some future code can easily start dereferencing it). That member removal did not produce a lot of noise because the removed data member was barely used.

P.S. This comment does not request any PR changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify in the code comment what you mean by "these members".

FTR; this particular stats variable holds the details for the entire MemPools pooling structure. The meta-pool of all pools (stats.overhead += sizeof(MemPools)) and allocations in terms of bytes (stats.obj_size = 1) .

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify in the code comment what you mean by "these members".

Done in branch commit a644ade? If you need a different kind of clarification, please suggest replacement wording or detail what seems unclear. I am also OK with removing that TODO if you prefer to remove it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, going by that clarification, I think nobody else will be able to enact it without a much more detailed knowledge of what you in particular think is wrong with them. For now they serve a clear purpose.

So ... remove the TODO and enact it in a future PR if/when you have time to figure out the details.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove the TODO

Done in commit 5f6793b.

@rousskov rousskov added M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels S-could-use-an-approval An approval may speed this PR merger (but is not required) labels Nov 13, 2024
@kinkie kinkie added the backport-to-v6 maintainer has approved these changes for v6 backporting label Nov 13, 2024
@@ -586,7 +586,7 @@ Mem::Report(std::ostream &stream)
PoolStats mp_stats;
pool->getStats(mp_stats);

if (mp_stats.pool->meter.gb_allocated.count > 0)
if (pool->meter.gb_allocated.count > 0)
Copy link
Contributor

@yadij yadij Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exposes an old TOCTOU bug. It is important for this if-statement to check a member of mp_stats when determining whether its pool was "used" or not.

Perhapse meter->gb_allocated will work better:

Suggested change
if (pool->meter.gb_allocated.count > 0)
if (mp_stats.meter->gb_allocated.count > 0)

Copy link
Contributor Author

@rousskov rousskov Nov 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exposes an old TOCTOU bug.

I do not see a TOCTOU bug in this code (official or PR):

  • Time Of Check: This if statement condition.
  • Time Of Use: The "true" and "false" branches of this if statement.

It is important for this if-statement to check a member of mp_stats when determining whether its pool was "used" or not.

    stream << "Pools ever used:     " << poolCount - not_used << " (shown above)\n";

It is pretty much the opposite: In the context of this if statement, we want to know whether the pool was "ever used" rather than whether it is "currently used" (look for "ever used" reporting in code further below that relies on the results of this if statement branches; I quoted that code line above). Asking mp_stats is worse than asking the pool itself because mp_stats is problematic on several levels, including obscuring the distinction between "ever/cumulative" and "current" use statistics (e.g., mp_stats::items_inuse represents current use rather than cumulative use).

Perhapse meter->gb_allocated will work better:

Suggested change
if (pool->meter.gb_allocated.count > 0)
if (mp_stats.meter->gb_allocated.count > 0)

The proposed change has no effect on this if statement condition value: The two variants use two different ways to access the same gb_allocated object because mp_stats.meter just points to pool->meter:

    stats.meter = &meter;

That gb_allocated object represents long-term (i.e. "ever used") history because MemPools::flushMeters() and flushCounters() methods preserve past gb_allocated values.

PR condition variant is better than the suggested replacement because it is more direct, because it avoids a problematic object, and because we are computing an "ever used" condition that the pool ought to know about even when mp_stats members like items_inuse represent "current use".

I believe this change request about alleged old bug is invalid (for the reasons detailed above), but even if you disagree, please dismiss your negative review (so that this arguably minimal PR fixing another bug and not changing this if statement condition can be merged) and then post a PR dedicated to fixing that alleged old bug. That way, we will be making steady progress.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exposes an old TOCTOU bug.

I do not see a TOCTOU bug in this code (official or PR):

* Time Of Check: This `if` statement condition.

Correct.

* Time Of Use: The "true" and "false" branches of this `if` statement.

Not necessarily, TOU refers to the value(s) being validated by the conditional.

In this case the some values are being copied from pool to mp_stats outside the control of this if-statement. Their TOU is inside getStats(..) somewhere, with some pool's being changed between the TOU and TOC.

We do not notice it because the affected pools have large counts and the difference is single digits.

The proposed change has no effect on this if statement condition value: The two variants use two different ways to access the same gb_allocated object because mp_stats.meter just points to pool->meter:

You are right, yes. What we should do instead is have the getStats(..) call inside the if-statement branch with the new condition you made depend on pool.

if (pool->meter.gb_allocated.count > 0) {
    PoolStats mp_stats;
    pool->getStats(mp_stats);
    usedPools.emplace_back(mp_stats);
} else

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exposes an old TOCTOU bug.

I do not see a TOCTOU bug in this code (official or PR):

* Time Of Check: This `if` statement condition.

Correct.

* Time Of Use: The "true" and "false" branches of this `if` statement.

Not necessarily, TOU refers to the value(s) being validated by the conditional.

TOU refers to use of values after the check (i.e. after TOC). That use is not limited to this if statement -- there is more value-using code after the if statement, of course.

As far as this PR is concerned, the condition is effectively unchanged and makes sense. The values used by conditional code (and code further below) are unchanged and make sense. We should not be discussing any of this in this PR!

In this case the some values are being copied from pool to mp_stats outside the control of this if-statement.

Yes, this if statement does not care how copied values were computed. It simply uses what was computed. There is nothing wrong with that.

Their TOU is inside getStats(..) somewhere, with some pool's being changed between the TOU and TOC.

TOCTOU (a.k.a. Time Of Check To Time Of Use) bugs are bugs that happen when the value changes between TOC and TOU. Whatever use happens before TOC is irrelevant to TOCTOU. ( There is also no relevant "use" inside getStats() as far as this report is concerned. "Use" happens later, when computed values are added to the report. )

For the purpose of assessing whether this if statement introduces a TOCTOU bug, any use of a value before this if statement is not relevant.

We do not notice it because the affected pools have large counts and the difference is single digits.

The alleged difference you are talking about (i.e. reporting zero stats for a pool that became used after its getStats()) call is not relevant to this PR.

The proposed change has no effect on this if statement condition value: The two variants use two different ways to access the same gb_allocated object because mp_stats.meter just points to pool->meter:

You are right, yes.

That "this if statement is effectively unchanged" fact should resolve this change request. This PR does not change anything related to this change request.

What we should do instead is have the getStats(..) call inside the if-statement branch with the new condition you made depend on pool.

The condition is not new, as we have established above. I have not "made depend on pool" anything that had not depend on the very same pool before this PR! No use/check timings have changed.

if (pool->meter.gb_allocated.count > 0) {
    PoolStats mp_stats;
    pool->getStats(mp_stats);
    usedPools.emplace_back(mp_stats);
} else

I agree that something like that should be done to avoid reporting all zeros for pools that became used after getStats(). This change has nothing to do with this PR though. Please let this PR merge and post a dedicated PR improving this aspect of this code. That other PR should also move the condition inside a pool method (easy) OR make getStats() constant (more improvements, but requires additional code adjustments). Otherwise, we should not reorder getStats() (that may compute meter.gb_allocated!) and the condition that checks meter.gb_allocated.

If you insist on adding these out of scope changes to this PR, I will instead revert PR changes that remove PoolStats::pool member, so that this code is completely unchanged. Would you prefer that?

src/mem/Stats.cc Outdated
@@ -29,5 +24,12 @@ Mem::GlobalStats(PoolStats &stats)
stats.overhead += sizeof(Allocator *);
}

// Reset PoolStats::meter, label, and obj_size data members after getStats()
// calls in the above loop set them. TODO: Refactor to remove these members.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please clarify in the code comment what you mean by "these members".

FTR; this particular stats variable holds the details for the entire MemPools pooling structure. The meta-pool of all pools (stats.overhead += sizeof(MemPools)) and allocations in terms of bytes (stats.obj_size = 1) .

@yadij yadij added S-waiting-for-author author action is expected (and usually required) and removed M-cleared-for-merge https://github.com/measurement-factory/anubis#pull-request-labels S-could-use-an-approval An approval may speed this PR merger (but is not required) labels Nov 13, 2024
@rousskov rousskov requested a review from yadij November 14, 2024 15:04
@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Nov 14, 2024
@yadij yadij added S-waiting-for-author author action is expected (and usually required) and removed S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box labels Nov 16, 2024
Copy link
Contributor

@yadij yadij left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified requests from before.

@rousskov rousskov added S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box and removed S-waiting-for-author author action is expected (and usually required) labels Nov 16, 2024
@rousskov rousskov requested a review from yadij November 16, 2024 03:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-v6 maintainer has approved these changes for v6 backporting S-waiting-for-reviewer ready for review: Set this when requesting a (re)review using GitHub PR Reviewers box
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants