Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make inverting the call tree fast, by computing inverted call nodes lazily #4900

Merged
merged 32 commits into from
Feb 13, 2025

Conversation

mstange
Copy link
Contributor

@mstange mstange commented Jan 23, 2024

Deploy preview

Fixes #467, fixes #337, fixes #3313.

┆Issue is synchronized with this Jira Task

@mstange mstange self-assigned this Jan 23, 2024
Copy link

codecov bot commented Jan 23, 2024

Codecov Report

Attention: Patch coverage is 94.41261% with 39 lines in your changes missing coverage. Please review.

Project coverage is 86.04%. Comparing base (39c4722) to head (0cd70fa).

Files with missing lines Patch % Lines
src/profile-logic/call-node-info.js 93.86% 21 Missing and 2 partials ⚠️
src/profile-logic/profile-data.js 92.03% 9 Missing ⚠️
src/profile-logic/call-tree.js 96.75% 5 Missing ⚠️
src/components/flame-graph/FlameGraph.js 66.66% 1 Missing ⚠️
src/components/stack-chart/index.js 0.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4900      +/-   ##
==========================================
+ Coverage   85.93%   86.04%   +0.11%     
==========================================
  Files         312      312              
  Lines       29773    30271     +498     
  Branches     8197     8275      +78     
==========================================
+ Hits        25585    26047     +462     
- Misses       3597     3631      +34     
- Partials      591      593       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@mstange mstange force-pushed the fast-invert4 branch 3 times, most recently from 8fd4e74 to 533a704 Compare August 6, 2024 22:01
@mstange mstange force-pushed the fast-invert4 branch 3 times, most recently from 22b07b4 to b730acb Compare August 7, 2024 23:35
@mstange mstange marked this pull request as ready for review August 7, 2024 23:35
@mstange mstange requested a review from julienw August 7, 2024 23:35
@mstange
Copy link
Contributor Author

mstange commented Aug 7, 2024

This is now ready for review! A few more comments are probably needed in the "Create inverted call nodes lazily" commit, but I think it's reviewable in its current state.

mstange added a commit that referenced this pull request Aug 9, 2024

Verified

This commit was signed with the committer’s verified signature.
daviehh daviehh
…erComparator (#5076)

In #4900 I'm going to add an alternative code path to compute the
samples selected states for the inverted view. I've broken this new test
out of that PR to reduce its scope a tiny bit.
Copy link
Contributor

@julienw julienw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

submitted my current comments! Hopefully they still make sense, but I think they do :-)

* Tree Left aligned Right aligned Reordered by suffix
* - [cn0] A = A = A [so0] [so0] [cn0] A
* - [cn1] B = A -> B = A -> B [so3] [so1] [cn4] A <- A
* - [cn2] A = A -> B -> A = A -> B -> A [so2] ↘↗ [so2] [cn2] A <- B <- A
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A difference with the previous situation, is that previously, in the inverted call tree, we'd have both [A, B] and [A, B, A], despite that [B, A] doesn't exist in the uninverted call tree.
Now several inverted call nodes (in this case [A, B] and [A, B, A]) could map to the same non-inverted call node. I guess the non-inverted call node maps to its deeper version?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A difference with the previous situation, is that previously, in the inverted call tree, we'd have both [A, B] and [A, B, A], despite that [B, A] doesn't exist in the uninverted call tree. Now several inverted call nodes (in this case [A, B] and [A, B, A]) could map to the same non-inverted call node.

That's right. Later on in this comment, it says that both in2 and in3 map to cn2; that's exactly your example.

I guess the non-inverted call node maps to its deeper version?

There is not really any mapping of non-inverted call nodes to inverted call nodes. You usually don't ever need to go in that direction. The only time when we ask "which inverted call node corresponds to this non-inverted call node?" is when we the user toggles inverted mode and we try to move the selection to a place that makes sense. That's done in findHeavyPathToSameFunctionAfterInversion. And for the non-inverted node [A, B, A], this function will pick the inverted node [A]. I think that makes sense - you clicked the deepest node (an actual "leaf" node in this case), and, after inverting, leaf nodes become roots.

@mstange mstange force-pushed the fast-invert4 branch 4 times, most recently from 5cc7d34 to 21d53c7 Compare November 24, 2024 19:49
@mstange
Copy link
Contributor Author

mstange commented Nov 24, 2024

I've now split the incremental order refinement out into a separate commit at the end. I've also polished both split commits to be easier to read.

Both the "Create inverted call nodes lazily" and the "Refine the suffix order incrementally, as new inverted nodes are created" commit should be much easier to review now.

This is the main commit of this PR. Now that nothing is relying on having
an inverted call node for each sample, or on having a fully-computed
inverted call node table, we can make it so that we only add entries to
the inverted call node table when we actually need a node, for example
because it was revealed in the call tree. This makes it a lot faster
to click the "Invert call stack" checkbox - before this commit, we were
computing a lot of inverted call nodes that were never shown to the user.

After this commit, CallNodeInfoInvertedImpl no longer inherits from
CallNodeInfoImpl - it is now a fully separate implementation.

This commit reduces the time spent in `getInvertedCallNodeInfo` on an
example profile (https://share.firefox.dev/411Vg2T) from 11 seconds to 718 ms.
Before: https://share.firefox.dev/3CTNApp
After: https://share.firefox.dev/492F7wl (15x faster)
The new structure gives us a nice guarantee about roots
of the inverted tree: There is an inverted root for every
func, and their indexes are identical. This makes it really
cheap to translate between the call node index and the func index
(no conversion or lookup is necessary) and also makes it cheap
to check if a node is a root.

This commit also replaces a few maps and sets with typed arrays
for performance. This is easier now that the root indexes are
all contiguous.
This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable,
because we'll now only be comparing integers. This speeds up
_createInvertedRootCallNodeTable by almost 2x.
…ted.

This saves a lot of work upfront that's not needed. At any given time,
we just need the suffix order to be accurate enough so that the "suffix
order index range" for every existing inverted call node is correct.

This commit reduces the time spent in `getInvertedCallNodeInfo` + `getChildren`
on an example profile (https://share.firefox.dev/411Vg2T) from 721 ms to 20 ms.
Before: https://share.firefox.dev/40Wdi6S
After: https://share.firefox.dev/3AZjbpg (35x faster)
Copy link
Contributor

@julienw julienw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed up to "Remove now-unused getCallNodeTable()." included. (all good so far)

I'm pushing my comments so far, I'm not sure I'll be able to do more today (but who knows)

*
* ## Definition
*
* We define the suffix order as the lexicographic order of the inverted call path.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth precising that it's also depth-first like the non-inverted table (this is also explained below in call paths become grouped in such a way that call paths which belong to the same *inverted* tree node but because we write "depth-first" right above, I think it's good to mention it here as well, so that a reader doesn't wonder "oh, I guess the suffix order is different" but it's not).

(if I'm not mistaken :-) )

Copy link
Contributor Author

@mstange mstange Jan 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, "depth-first" is something that describes a tree traversal, i.e. an ordering of nodes. I don't think we can use the term to describe an ordering of paths. If you want to say every path corresponds to one inverted node, then the list would be "skipping" some nodes. For example, here's a suffix-ordered list of call paths: A, A-A, C-B-A, B, C-B, C, B-D. If you make an inverted tree for this list, then that tree needs two nodes for which there aren't any exact call path matches in the list, specifically B-A and D. So it's not really a depth-first traversal of the inverted tree because it's missing some ancestor nodes.

Copy link
Contributor

@julienw julienw Feb 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm that's true.
I think I wanted to mean we have the ancestors before their children (when present)

Comment on lines 347 to 355
const tracedTimingNonInverted =
tracedTiming !== null && tracedTiming.type === 'NON_INVERTED'
? tracedTiming.timings
: null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I spent too much time trying to understand why we have null for the INVERTED case... until I realized that this can't happen here. Can you please be more specific at the start of the comment, saying something like that (suggestion) "When the flame graph panel is displayed, all tree computations are non inverted -- see UrlState.getInvertCallstack"

@@ -504,6 +500,9 @@ class CallNodeContextMenuImpl extends React.PureComponent<Props> {
const fileName =
filePath &&
parseFileNameFromSymbolication(filePath).path.match(/[^\\/]+$/)?.[0];

const callNodeTable = callNodeInfo.getNonInvertedCallNodeTable();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note to self, this change could have been in the previous commit (use the non inverted table for the recursive function transform)

Copy link
Contributor

@julienw julienw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm only missing the commits " Create inverted call nodes lazily" and "Refine the suffix order incrementally, as new inverted nodes are created", I reviewed the others.
Pushing the comments I have so far!

type InvertedNonRootCallNodeTable = {|
prefix: InvertedCallNodeHandle[],
func: IndexIntoFuncTable[], // IndexIntoInvertedNonRootCallNodeTable -> IndexIntoFuncTable
pathHash: string[], // IndexIntoInvertedNonRootCallNodeTable -> string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah interesting and good idea that we store it here, we might as well store it for the other tables too? The only drawback is that the strings might be large with a lot of duplication if the tree is deep.

(not in this PR)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only drawback is that the strings might be large with a lot of duplication if the tree is deep.

Although that maybe by constructing them iteratively (parent + new_function), maybe the engine shares the string parts (as dependent strings?) between the strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, hard to say. There's a good chance that the string starts out as a rope when it's created, like you say. But then we use this string as a key in a map, and the lookup will flatten the rope and atomize the flattened string. So in the end I'm not sure what will end up getting stored.


// Non-null for non-root nodes whose children haven't been created yet.
// For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
// and deepNodes = deepNodes[x] be its non-null deep nodes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's confusing that you use the same name deepNodes in left and right. Then below (line 327) it's not clear to me what deepNodes[x] refers to.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh indeed, that's confusing. I wonder if I had different names here in the past and this got the way it is by search & replace.

// For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
// and deepNodes = deepNodes[x] be its non-null deep nodes.
// Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
// the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need an example, it's not clear to me what these deepNodes are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's an example in the long comment above export class CallNodeInvertedImpl, search for | self node | corresponding deep node |.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I've just read it! It's a lot clearer now.
Maybe this comment here would be better as a reference to the comment above then?

Something such as:

The array at index k caches the children for the inverted node k, up to some depth.
This is useful to quickly compute the children for this inverted node.
Afterwards it's set to null for that index, but passed on to its children.
Please refer to the comment on CallNodeInvertedImpl below for a more detailed explanation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array at index k caches the children for the inverted node k,

Although this sentence here is wrong.

Is it more:
Let's take <arr> the array at index <in> for inverted node <in>.
<arr> contains the <depth>th ancestor for all paths that "ends with" this inverted node <in>, that is all its children (no matter their respective depth).

?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe add something such as "as a consequence, the deepnodes are ordered by call node suffix order"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've changed it to the following:

  // Non-null for non-root nodes whose children haven't been created yet.
  // The array at index x caches ancestors of the non-inverted nodes belonging
  // to the inverted node x, specifically the ancestor "k steps up" from each
  // non-inverted node, with k being the depth of the inverted node.
  // This is useful to quickly compute the children for this inverted node.
  // Afterwards it's set to null for that index, and the next level is passed on
  // to the newly-created children.
  // Please refer to 'Why do we keep a "deepNodes" property in the inverted table?'
  // in the comment on CallNodeInvertedImpl below for a more detailed explanation.
  //
  // For every inverted non-root call node x with deepNodes[x] !== null:
  //   For every suffix order index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
  //   the k'th parent node of suffixOrderedCallNodes[i] is stored at
  //   deepNodes[x][i - suffixOrderIndexRangeStart[x]], with k = depth[x].
  deepNodes: Array<Uint32Array | null>, // IndexIntoInvertedNonRootCallNodeTable -> (Uint32Array | null)

Also maybe add something such as "as a consequence, the deepnodes are ordered by call node suffix order"?

Hmm, I haven't thought about whether that's true. It might well be true!

return this._invertedNonRootCallNodeTable.pathHash[nonRootIndex];
}

prefixForNode(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

prefixForNode doesn't seem covered by tests, but I think this shows a lack of prior tests for inverted trees rather than a problem in this patch. This is called by CallTree.getParent, which is called when the user presses ArrowLeft in the tree.
I think this is out of scope for this patch to add coverage for this.

} else if (suffixOrderIndex >= selectedSubtreeRangeEnd) {
sampleSelectedState = 'UNSELECTED_ORDERED_AFTER_SELECTED';
}
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we also have no test covering inverted trees with samples filtered out...

(out of scope for the PR)

Copy link
Contributor

@julienw julienw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I couldn't go as far as I wanted, but at least I've read the long comment now, which is pretty good, thanks a lot for that! I had a few comments on it to maybe improve it, tell me what you think.

Comment on lines 480 to 489
* - [in3] A (so:1..2) = A <- A = ... A -> A (cn4)
* - [in4] B (so:2..3) = A <- B = ... B -> A (cn2)
* - [in6] A (so:2..3) = A <- B <- A = ... A -> B -> A (cn2)
* - [in1] B (so:3..5) = B = ... B (cn1, cn5)
* - [in5] A (so:3..5) = B <- A = ... A -> B (cn1, cn5)
* - [in6] A (so:4..5) = B <- A <- A = ... A -> A -> B (cn5)
* - [in7] C (so:5..7) = C = ... C (cn6, cn3)
* - [in8] A (so:5..6) = C <- A = ... A -> C (cn6)
* - [in9] B (so:6..7) = C <- B = ... B -> C (cn3)
* - [in10] A (so:6..7) = C <- B <- A = ... A -> B -> C (cn3)
* - [in10] A (so:4..5) = B <- A <- A = ... A -> A -> B (cn5)
* - [in2] C (so:5..7) = C = ... C (cn6, cn3)
* - [in7] A (so:5..6) = C <- A = ... A -> C (cn6)
* - [in8] B (so:6..7) = C <- B = ... B -> C (cn3)
* - [in9] A (so:6..7) = C <- B <- A = ... A -> B -> C (cn3)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now the counter are a little bit "random" (except for the roots), depending on when they've been accessed for the first time, aren't they?
It would be good to mention that in line 474 above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yes they are. I'll add a comment to mention that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

 * inX:     Inverted call node index X (this index is somewhat arbitrary because
 *          it's based on the order in which callNodeInfoInverted.getChildren is
 *          called)

Comment on lines 602 to 604
* What this example shows is that we need to look not at a self node's immediate
* parent, but rather at its (k + 1)'th parent, where k is the depth of the
* inverted node whose children we're creating.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced :D why do we need to look at this? Is that to know if the kth inverted node has children?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it just that, when we need to look at the children of in4, in4 being depth 1, its children will be depth 2, so we need to look 2 parents up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's exactly it. Ok I'll try to change it to say 2 first, and then do the generic k depth in the next paragraph. Anyway, what I'm trying to get across is that, if you only keep the self nodes around, then you would need to go really far up the non-inverted parents to know the functions for the inverted children. I'm trying to make the point that we also need to keep the deep nodes around.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok sounds good, maybe you can write something like that as an introduction: In the next few paragraphs, we'll explain why we need to constantly iterate over the non-inverted parents, and that the "deepNodes" property is a cache to make it faster. Keep reading!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added.

Comment on lines 585 to 598
*
* in4 has one self node: cn2. This is the only non-inverted node whose call path
* ends in "... -> B -> A".
*
* in4 has depth 1.
*
* in4's func is B.
* cn2's func is A. (!)
*
* cn2's func still corresponds to the inverted root, i.e. in0's func.
* But cn2's parent, cn1, has func B.
*
* And cn1's parent, cn0, has func A.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why we need to enter into so much details here.
Isn't it quite clear that in4 has one child, with func A, because cn2 has one parent, which is A? I'm sure I'm missing some complexity...

Is that to detail how many steps we need to go up from cn2's self func to find the children?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that's a good point, it really looks rather obvious if you put it like that. Yes, the purpose was to show that you need to go up one level from the self node if you want to know the function of the inverted child. I can try to shorten this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one key is to talk about iterations too.
Iteration 0: compute children of root
Iteration 1: compute children of one of these children
Iteration 2: compute children of one of the children of iteration 1.

And mention that these iterations are not done right away but only on demand from a user action (for example the user expands a row in the inverted call tree, or clicks somewhere in the activity graph). This way this links the algorithm with what happens in the app and makes things more concrete.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea with the iterations! Added.

Comment on lines 613 to 615
* cn2's 0th parent is cn2.
* cn2's 1st parent is cn1.
* cn2's 2nd parent (i.e. its grandparent) is cn0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* cn2's 0th parent is cn2.
* cn2's 1st parent is cn1.
* cn2's 2nd parent (i.e. its grandparent) is cn0.
* cn2's 0th parent is cn2 (func: A).
* cn2's 1st parent is cn1 (func: B).
* cn2's 2nd parent (i.e. its grandparent) is cn0 (func: A).

Note: I find this more concise and clearer than the write-up above! Maybe the write-up above could just be that too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I've changed it to consistently have (func: X) behind most nodes.

* So in5 has no children.
*
* ---
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a title such as Why do we keep a "deepNodes" property in the inverted table so that we can grep it easily (and follow where you're going)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a great idea!

// For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
// and deepNodes = deepNodes[x] be its non-null deep nodes.
// Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
// the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep I've just read it! It's a lot clearer now.
Maybe this comment here would be better as a reference to the comment above then?

Something such as:

The array at index k caches the children for the inverted node k, up to some depth.
This is useful to quickly compute the children for this inverted node.
Afterwards it's set to null for that index, but passed on to its children.
Please refer to the comment on CallNodeInvertedImpl below for a more detailed explanation.

// and deepNodes = deepNodes[x] be its non-null deep nodes.
// Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
// the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
deepNodes: Array<Uint32Array | null>, // IndexIntoInvertedNonRootCallNodeTable -> (Uint32Array | null)
Copy link
Contributor

@julienw julienw Feb 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this could be deepNodesCache, to better refer that it's used as an optimizing trick... but up to you

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not asking to change the name now, this can be done in a follow-up if we feel it's better)

Another idea is that instead of the terminology of "deep node", it could be "ancestor node". Indeed what I struggled with is that the values in the Uint32Array are call node indexes in the non inverted call tree, they do not represent call nodes in the inverted call tree. But the term "deep" implies to me that they are children of this inverted call node (they're not!).
But I'm not sure of that so it might be good to wait (also I'd prefer to see this landed rather nitpicking on the exact terms)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the name deepNodesCache, happy to rename it to this in a follow-up.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about the term "ancestor node" too but I was worried that it would be to easy to think it was referring to an ancestor in the inverted tree.

Copy link
Contributor

@julienw julienw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still haven't finished but I think I'm more and more confident about understanding this fully. I'm now looking at the meat of this commit (_computeChildrenInfo and _createChildrenForInfo).
I'm sorry this is so long.
I'm pushing the current comments so that you can see them early.

Comment on lines +1166 to +1172
while (currentHandle >= rootCount) {
const nonRootIndex = currentHandle - rootCount;
callNodePath.push(this._invertedNonRootCallNodeTable.func[nonRootIndex]);
currentHandle = this._invertedNonRootCallNodeTable.prefix[nonRootIndex];
}
const rootFunc = currentHandle;
callNodePath.push(rootFunc);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional: I would have used a do/while loop here, so that the rootFunc would be pushed inside the loop

(also there are too few occasions to use do/while so I try to use it when I can :D poor little underused loop control instruction)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what that would look like to be honest.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused, I forgot about the existence of the 2 separate tables for root vs non root nodes, sorry.

Comment on lines 1177 to 1179
// Returns a CallNodeIndex from a CallNodePath.
getCallNodeIndexFromPath(
callNodePath: CallNodePath
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please precise if this is an inverted call node path? (I believe it is?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, whenever the caller has a CallNodeInfoInverted in their hands, the caller will be dealing with inverted call node handles and inverted call node paths. Clarified the comment.

return childNodeHandle;
}

_findDeepestKnownAncestor(callPath: CallNodePath): InvertedCallNodeHandle {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function seems to be packed with small surprising things.
My understanding is that it is essentially bisecting, right?
It would be good to mention it in a comment for the function, so that we can follow along more easily.

Also what does "known" means in this context? Is it one that's already created it the call node table (and therefore one that's in the cache)?

The function seems to be used only in getCallNodePathFromIndex it could be better that it's located closer to that function below.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note: "deepest anscestor" is a term that's difficult to understand quickly IMO, it's kind of an oxymoron... Rationally I understand of course, it's "deepest" in the inverted table, but "ancestor" in the non-inverted table. I don't have a better idea at the moment though. But it could be good to mention that in the comment to the function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh haha you're right, it is packed with surprises!

I've renamed _findDeepestKnownAncestor to _findDeepestExistingInvertedAncestorNode and moved it to just below its caller.

I'm also adding the following comment:

    // Find the depth of the deepest existing ancestor node using bisection.
    // For each tested depth `currentDepth`, we create a partial inverted call
    // path `callPath.slice(0, currentDepth + 1)` and check whether we have an
    // inverted node for that partial call path. We find the largest value of
    // `currentDepth` for which the partial call path refers to an existing node,
    // and set `bestNode` to that node.

And no, it finds an ancestor node in the inverted tree!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And no, it finds an ancestor node in the inverted tree!

Ah yeah, your comments further down show that you understood this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I'm not so sure anymore. I'd need to think about it again to ask a good question, but I don't have the time atm :) let's discuss this at some point later.

const parentDeepNodeCount = parentDeepNodes.length;
const [parentIndexRangeStart, parentIndexRangeEnd] =
this.getSuffixOrderIndexRangeForCallNode(parentNodeHandle);
if (parentIndexRangeStart + parentDeepNodeCount !== parentIndexRangeEnd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was gonna write: I'm not fully sure I understand this invariant, can you please explain?

but now I think I understand that by construction, the amount of deepNodes should be equal to the amount of children.
Maybe it would be good to write it as a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW do we want to keep this condition explicitely, or would you rather have it as a comment assert like in other places?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't sure what to do about this. I left the explicit checks in. I don't think they affect performance much.

Comment on lines 585 to 598
*
* in4 has one self node: cn2. This is the only non-inverted node whose call path
* ends in "... -> B -> A".
*
* in4 has depth 1.
*
* in4's func is B.
* cn2's func is A. (!)
*
* cn2's func still corresponds to the inverted root, i.e. in0's func.
* But cn2's parent, cn1, has func B.
*
* And cn1's parent, cn0, has func A.
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe one key is to talk about iterations too.
Iteration 0: compute children of root
Iteration 1: compute children of one of these children
Iteration 2: compute children of one of the children of iteration 1.

And mention that these iterations are not done right away but only on demand from a user action (for example the user expands a row in the inverted call tree, or clicks somewhere in the activity graph). This way this links the algorithm with what happens in the app and makes things more concrete.

// For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
// and deepNodes = deepNodes[x] be its non-null deep nodes.
// Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
// the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The array at index k caches the children for the inverted node k,

Although this sentence here is wrong.

Is it more:
Let's take <arr> the array at index <in> for inverted node <in>.
<arr> contains the <depth>th ancestor for all paths that "ends with" this inverted node <in>, that is all its children (no matter their respective depth).

?

// For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
// and deepNodes = deepNodes[x] be its non-null deep nodes.
// Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
// the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe add something such as "as a consequence, the deepnodes are ordered by call node suffix order"?

let nodesWhichEndHereCount = 0;
while (nodesWhichEndHereCount < parentDeepNodeCount) {
const deepNode = parentDeepNodes[nodesWhichEndHereCount];
if (callNodeTable.prefix[deepNode] !== -1) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: add a comment such as "This is the first deepNode that has a parent => all the following ones will have a parent as well."

Copy link
Contributor

@julienw julienw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

review finished for Create inverted call nodes lazily.

next up is Refine the suffix order incrementally, as new inverted nodes are crea… (the last one)

* - [cn4] A = A -> A = A -> A [so1]
* - [cn5] B = A -> A -> B = A -> A -> B [so4]
* - [cn6] C = A -> C = A -> C [so5]
*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When reviewing, it helped me looking at an actual inverted tree in the UI and poking at the data structures such as the deepNodes.
I think it would be useful to show full inverted tree here. It would be the introduction to the algorithm that constructs it ("this is the our expected result, now let's look at how we build it" in a way)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, done!

const parentDeepNodeCount = parentDeepNodes.length;
const [parentIndexRangeStart, parentIndexRangeEnd] =
this.getSuffixOrderIndexRangeForCallNode(parentNodeHandle);
if (parentIndexRangeStart + parentDeepNodeCount !== parentIndexRangeEnd) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW do we want to keep this condition explicitely, or would you rather have it as a comment assert like in other places?

Comment on lines 891 to 896
const firstChildFirstParentDeepNode =
parentDeepNodes[nodesWhichEndHereCount];
const firstChildFirstDeepNode =
callNodeTable.prefix[firstChildFirstParentDeepNode];
childrenDeepNodes[0] = firstChildFirstDeepNode;
const firstChildFunc = callNodeTable.func[firstChildFirstDeepNode];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without looking too close, I wonder why this can't be part of the loop below?

(but I see this code is removed later anyway so let's skip this comment)

Comment on lines +968 to +969
nextChildSuffixOrderIndexRangeStart += deepNodeCount;
nextChildDeepNodeIndex += deepNodeCount;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(optional) if you move these additions at the end of the loop, I think this could be somewhat clearer (it's common that the end of a loop contains the operations for the next iteration), also I think you could rename them currentChildXXX instead of next, and use them directly (so you could remove suffixOrderIndexRangeStart too).

Comment on lines 990 to 991
* For all i in 0..deepNodes.length, deepNodes[i] is the k'th parent node
* of suffixOrderedCallNodes[suffixOrderIndexRangeStart + i],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* For all i in 0..deepNodes.length, deepNodes[i] is the k'th parent node
* of suffixOrderedCallNodes[suffixOrderIndexRangeStart + i],
* For all i in 0..deepNodes.length, deepNodes[i] is the k'th parent node
* of suffixOrderedCallNodes[suffixOrderIndexRangeStart + i] in the non-inverted tree,

Copy link
Contributor

@julienw julienw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I admit I haven't done as much scrutiny for the "refine order" commit than the others, but it looks very reasonable still.

Let's land this after the small comment changes are done!

Thanks a ton for the algorithm update, I could test with some of my old profiles that were also pretty slow (deploy preview / production) and they improve immensely.

Comment on lines +956 to +960
if (
nodesWhichEndHereCount + childrenDeepNodeCount !==
parentDeepNodeCount
) {
throw new Error('indexes out of sync');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to keep this assertion in the final code, or would you rather have it as a comment?

return null;
}

const childrenDeepNodeCount = parentDeepNodeCount - nodesWhichEndHereCount;
// We create one child for each distinct func we found. The children need to
// be ordered by func.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why exactly does it need to be ordered by func? (we easily see it is ordered by func with the sort operation, but not clearly why). Is that because it needs to be suffix-ordered?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "We order the children by func so that _getChildWithFunc can use bisection."

// We have the parent's self nodes and their corresponding deep nodes.
// These nodes are currently only sorted up to the parent's depth:
// we know that every parentDeepNode has the parent's func.
// But if we look at the prefix of each parentDoopNode, we'll encounter
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo

Suggested change
// But if we look at the prefix of each parentDoopNode, we'll encounter
// But if we look at the prefix of each parentDeepNode, we'll encounter

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doop node LOL

// and deepNodes = deepNodes[x] be its non-null deep nodes.
// Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
// the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
deepNodes: Array<Uint32Array | null>, // IndexIntoInvertedNonRootCallNodeTable -> (Uint32Array | null)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not asking to change the name now, this can be done in a follow-up if we feel it's better)

Another idea is that instead of the terminology of "deep node", it could be "ancestor node". Indeed what I struggled with is that the values in the Uint32Array are call node indexes in the non inverted call tree, they do not represent call nodes in the inverted call tree. But the term "deep" implies to me that they are children of this inverted call node (they're not!).
But I'm not sure of that so it might be good to wait (also I'd prefer to see this landed rather nitpicking on the exact terms)

@mstange mstange enabled auto-merge February 13, 2025 23:12
@mstange mstange merged commit d8bba23 into firefox-devtools:main Feb 13, 2025
15 of 16 checks passed
@canova canova mentioned this pull request Feb 19, 2025
canova added a commit that referenced this pull request Feb 19, 2025
## Updates:

[Nicolas Chevobbe] Make timeline ruler notches visible in High Contrast
Mode (#5346)
[Nazım Can Altınova] Add the ability to mark marker fields as hidden
(#5354)
[Maxx Crawford] Update guide-startup-shutdown.md (#5357)
[Nazım Can Altınova] Enable prettier on the docs-user markdown files
(#5358)
[Paul Adenot] Allow searching by Content-Type in the network marker view
(#5351)
[Florian Quèze] Hide the pid in global tracks if it is 0. (#5361)
[Markus Stange] Make inverting the call tree fast, by computing inverted
call nodes lazily (#4900)
[Markus Stange] Use 64-bit floats for call tree timings. (#5371)
[Markus Stange] Extend the workaround in the v53 upgrader to generate
missing subcategory columns. (#5369)

## Also thanks to our localizers:

de: Michael Köhler
el: Jim Spentzos
en-GB: Paul
es-CL: ravmn
fr: Théo Chevalier
fur: Fabio Tomat
fy-NL: Fjoerfoks
ia: Melo46
it: Francesco Lodolo
nl: Mark Heijl
pt-BR: Marcelo Ghelman
ru: Valery Ledovskoy
sv-SE: Luna Jernberg, Andreas Pettersson
tr: Grk
uk: Іhor Hordiichuk
zh-CN: Olvcpr423
zh-TW: Pin-guang Chen
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants