Make inverting the call tree fast, by computing inverted call nodes lazily #4900

mstange · 2024-01-23T16:53:07Z

Fixes #467, fixes #337, fixes #3313.

┆Issue is synchronized with this Jira Task

codecov · 2024-01-23T20:37:01Z

Codecov Report

Attention: Patch coverage is 94.41261% with 39 lines in your changes missing coverage. Please review.

Project coverage is 86.04%. Comparing base (39c4722) to head (0cd70fa).

Files with missing lines	Patch %	Lines
src/profile-logic/call-node-info.js	93.86%	21 Missing and 2 partials ⚠️
src/profile-logic/profile-data.js	92.03%	9 Missing ⚠️
src/profile-logic/call-tree.js	96.75%	5 Missing ⚠️
src/components/flame-graph/FlameGraph.js	66.66%	1 Missing ⚠️
src/components/stack-chart/index.js	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4900      +/-   ##
==========================================
+ Coverage   85.93%   86.04%   +0.11%     
==========================================
  Files         312      312              
  Lines       29773    30271     +498     
  Branches     8197     8275      +78     
==========================================
+ Hits        25585    26047     +462     
- Misses       3597     3631      +34     
- Partials      591      593       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

mstange · 2024-08-07T23:37:07Z

This is now ready for review! A few more comments are probably needed in the "Create inverted call nodes lazily" commit, but I think it's reviewable in its current state.

…erComparator (#5076) In #4900 I'm going to add an alternative code path to compute the samples selected states for the inverted view. I've broken this new test out of that PR to reduce its scope a tiny bit.

julienw

submitted my current comments! Hopefully they still make sense, but I think they do :-)

julienw · 2024-08-23T13:45:33Z

src/types/profile-derived.js

+ *   Tree            Left aligned    Right aligned        Reordered by suffix
+ * - [cn0] A      =  A            =            A [so0]    [so0] [cn0] A
+ *   - [cn1] B    =  A -> B       =       A -> B [so3]    [so1] [cn4] A <- A
+ *     - [cn2] A  =  A -> B -> A  =  A -> B -> A [so2] ↘↗ [so2] [cn2] A <- B <- A


A difference with the previous situation, is that previously, in the inverted call tree, we'd have both [A, B] and [A, B, A], despite that [B, A] doesn't exist in the uninverted call tree.
Now several inverted call nodes (in this case [A, B] and [A, B, A]) could map to the same non-inverted call node. I guess the non-inverted call node maps to its deeper version?

A difference with the previous situation, is that previously, in the inverted call tree, we'd have both [A, B] and [A, B, A], despite that [B, A] doesn't exist in the uninverted call tree. Now several inverted call nodes (in this case [A, B] and [A, B, A]) could map to the same non-inverted call node.

That's right. Later on in this comment, it says that both in2 and in3 map to cn2; that's exactly your example.

I guess the non-inverted call node maps to its deeper version?

There is not really any mapping of non-inverted call nodes to inverted call nodes. You usually don't ever need to go in that direction. The only time when we ask "which inverted call node corresponds to this non-inverted call node?" is when we the user toggles inverted mode and we try to move the selection to a place that makes sense. That's done in findHeavyPathToSameFunctionAfterInversion. And for the non-inverted node [A, B, A], this function will pick the inverted node [A]. I think that makes sense - you clicked the deepest node (an actual "leaf" node in this case), and, after inverting, leaf nodes become roots.

src/types/profile-derived.js

src/utils/bisect.js

src/profile-logic/call-node-info.js

src/types/profile-derived.js

src/profile-logic/call-node-info.js

mstange · 2024-11-24T19:59:09Z

I've now split the incremental order refinement out into a separate commit at the end. I've also polished both split commits to be easier to read.

Both the "Create inverted call nodes lazily" and the "Refine the suffix order incrementally, as new inverted nodes are created" commit should be much easier to review now.

This is the main commit of this PR. Now that nothing is relying on having an inverted call node for each sample, or on having a fully-computed inverted call node table, we can make it so that we only add entries to the inverted call node table when we actually need a node, for example because it was revealed in the call tree. This makes it a lot faster to click the "Invert call stack" checkbox - before this commit, we were computing a lot of inverted call nodes that were never shown to the user. After this commit, CallNodeInfoInvertedImpl no longer inherits from CallNodeInfoImpl - it is now a fully separate implementation. This commit reduces the time spent in `getInvertedCallNodeInfo` on an example profile (https://share.firefox.dev/411Vg2T) from 11 seconds to 718 ms. Before: https://share.firefox.dev/3CTNApp After: https://share.firefox.dev/492F7wl (15x faster)

The new structure gives us a nice guarantee about roots of the inverted tree: There is an inverted root for every func, and their indexes are identical. This makes it really cheap to translate between the call node index and the func index (no conversion or lookup is necessary) and also makes it cheap to check if a node is a root. This commit also replaces a few maps and sets with typed arrays for performance. This is easier now that the root indexes are all contiguous.

This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable, because we'll now only be comparing integers. This speeds up _createInvertedRootCallNodeTable by almost 2x.

…ted. This saves a lot of work upfront that's not needed. At any given time, we just need the suffix order to be accurate enough so that the "suffix order index range" for every existing inverted call node is correct. This commit reduces the time spent in `getInvertedCallNodeInfo` + `getChildren` on an example profile (https://share.firefox.dev/411Vg2T) from 721 ms to 20 ms. Before: https://share.firefox.dev/40Wdi6S After: https://share.firefox.dev/3AZjbpg (35x faster)

julienw

Reviewed up to "Remove now-unused getCallNodeTable()." included. (all good so far)

I'm pushing my comments so far, I'm not sure I'll be able to do more today (but who knows)

julienw · 2025-01-30T14:11:12Z

src/types/profile-derived.js

+ *
+ * ## Definition
+ *
+ * We define the suffix order as the lexicographic order of the inverted call path.


It might be worth precising that it's also depth-first like the non-inverted table (this is also explained below in call paths become grouped in such a way that call paths which belong to the same *inverted* tree node but because we write "depth-first" right above, I think it's good to mention it here as well, so that a reader doesn't wonder "oh, I guess the suffix order is different" but it's not).

(if I'm not mistaken :-) )

Hmm, "depth-first" is something that describes a tree traversal, i.e. an ordering of nodes. I don't think we can use the term to describe an ordering of paths. If you want to say every path corresponds to one inverted node, then the list would be "skipping" some nodes. For example, here's a suffix-ordered list of call paths: A, A-A, C-B-A, B, C-B, C, B-D. If you make an inverted tree for this list, then that tree needs two nodes for which there aren't any exact call path matches in the list, specifically B-A and D. So it's not really a depth-first traversal of the inverted tree because it's missing some ancestor nodes.

mmm that's true.
I think I wanted to mean we have the ancestors before their children (when present)

src/profile-logic/call-node-info.js

src/profile-logic/profile-data.js

julienw · 2025-01-30T18:23:32Z

src/components/flame-graph/FlameGraph.js

+    const tracedTimingNonInverted =
+      tracedTiming !== null && tracedTiming.type === 'NON_INVERTED'
+        ? tracedTiming.timings
+        : null;


I spent too much time trying to understand why we have null for the INVERTED case... until I realized that this can't happen here. Can you please be more specific at the start of the comment, saying something like that (suggestion) "When the flame graph panel is displayed, all tree computations are non inverted -- see UrlState.getInvertCallstack"

src/profile-logic/call-tree.js

julienw · 2025-01-31T14:25:13Z

src/components/shared/CallNodeContextMenu.js

@@ -504,6 +500,9 @@ class CallNodeContextMenuImpl extends React.PureComponent<Props> {
    const fileName =
      filePath &&
      parseFileNameFromSymbolication(filePath).path.match(/[^\\/]+$/)?.[0];
+
+    const callNodeTable = callNodeInfo.getNonInvertedCallNodeTable();


note to self, this change could have been in the previous commit (use the non inverted table for the recursive function transform)

julienw

I'm only missing the commits " Create inverted call nodes lazily" and "Refine the suffix order incrementally, as new inverted nodes are created", I reviewed the others.
Pushing the comments I have so far!

julienw · 2025-02-03T16:57:53Z

src/profile-logic/call-node-info.js

+type InvertedNonRootCallNodeTable = {|
+  prefix: InvertedCallNodeHandle[],
+  func: IndexIntoFuncTable[], // IndexIntoInvertedNonRootCallNodeTable -> IndexIntoFuncTable
+  pathHash: string[], // IndexIntoInvertedNonRootCallNodeTable -> string


ah interesting and good idea that we store it here, we might as well store it for the other tables too? The only drawback is that the strings might be large with a lot of duplication if the tree is deep.

(not in this PR)

The only drawback is that the strings might be large with a lot of duplication if the tree is deep.

Although that maybe by constructing them iteratively (parent + new_function), maybe the engine shares the string parts (as dependent strings?) between the strings?

Hmm, hard to say. There's a good chance that the string starts out as a rope when it's created, like you say. But then we use this string as a key in a map, and the lookup will flatten the rope and atomize the flattened string. So in the end I'm not sure what will end up getting stored.

julienw · 2025-02-03T17:02:31Z

src/profile-logic/call-node-info.js

+
+  // Non-null for non-root nodes whose children haven't been created yet.
+  // For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
+  // and deepNodes = deepNodes[x] be its non-null deep nodes.


it's confusing that you use the same name deepNodes in left and right. Then below (line 327) it's not clear to me what deepNodes[x] refers to.

Oh indeed, that's confusing. I wonder if I had different names here in the past and this got the way it is by search & replace.

julienw · 2025-02-03T17:02:46Z

src/profile-logic/call-node-info.js

+  // For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
+  // and deepNodes = deepNodes[x] be its non-null deep nodes.
+  // Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
+  // the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].


I need an example, it's not clear to me what these deepNodes are.

There's an example in the long comment above export class CallNodeInvertedImpl, search for | self node | corresponding deep node |.

Yep I've just read it! It's a lot clearer now.
Maybe this comment here would be better as a reference to the comment above then?

Something such as:

The array at index k caches the children for the inverted node k, up to some depth. This is useful to quickly compute the children for this inverted node. Afterwards it's set to null for that index, but passed on to its children. Please refer to the comment on CallNodeInvertedImpl below for a more detailed explanation.

The array at index k caches the children for the inverted node k,

Although this sentence here is wrong.

Is it more:
Let's take <arr> the array at index <in> for inverted node <in>.
<arr> contains the <depth>th ancestor for all paths that "ends with" this inverted node <in>, that is all its children (no matter their respective depth).

?

Also maybe add something such as "as a consequence, the deepnodes are ordered by call node suffix order"?

I've changed it to the following:

// Non-null for non-root nodes whose children haven't been created yet. // The array at index x caches ancestors of the non-inverted nodes belonging // to the inverted node x, specifically the ancestor "k steps up" from each // non-inverted node, with k being the depth of the inverted node. // This is useful to quickly compute the children for this inverted node. // Afterwards it's set to null for that index, and the next level is passed on // to the newly-created children. // Please refer to 'Why do we keep a "deepNodes" property in the inverted table?' // in the comment on CallNodeInvertedImpl below for a more detailed explanation. // // For every inverted non-root call node x with deepNodes[x] !== null: // For every suffix order index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x], // the k'th parent node of suffixOrderedCallNodes[i] is stored at // deepNodes[x][i - suffixOrderIndexRangeStart[x]], with k = depth[x]. deepNodes: Array<Uint32Array | null>, // IndexIntoInvertedNonRootCallNodeTable -> (Uint32Array | null)

Also maybe add something such as "as a consequence, the deepnodes are ordered by call node suffix order"?

Hmm, I haven't thought about whether that's true. It might well be true!

julienw · 2025-02-03T18:12:28Z

src/profile-logic/call-node-info.js

+    return this._invertedNonRootCallNodeTable.pathHash[nonRootIndex];
+  }
+
+  prefixForNode(


prefixForNode doesn't seem covered by tests, but I think this shows a lack of prior tests for inverted trees rather than a problem in this patch. This is called by CallTree.getParent, which is called when the user presses ArrowLeft in the tree.
I think this is out of scope for this patch to add coverage for this.

julienw · 2025-02-03T18:13:35Z

src/profile-logic/profile-data.js

+      } else if (suffixOrderIndex >= selectedSubtreeRangeEnd) {
+        sampleSelectedState = 'UNSELECTED_ORDERED_AFTER_SELECTED';
+      }
+    } else {


Looks like we also have no test covering inverted trees with samples filtered out...

(out of scope for the PR)

julienw

Sorry, I couldn't go as far as I wanted, but at least I've read the long comment now, which is pretty good, thanks a lot for that! I had a few comments on it to maybe improve it, tell me what you think.

julienw · 2025-02-05T14:46:11Z

src/types/profile-derived.js

+ *   - [in3] A  (so:1..2)      =  A <- A        =       ... A -> A (cn4)
+ *   - [in4] B  (so:2..3)      =  A <- B        =       ... B -> A (cn2)
+ *     - [in6] A  (so:2..3)    =  A <- B <- A   =  ... A -> B -> A (cn2)
+ * - [in1] B  (so:3..5)        =  B             =            ... B (cn1, cn5)
 *   - [in5] A  (so:3..5)      =  B <- A        =       ... A -> B (cn1, cn5)
- *     - [in6] A  (so:4..5)    =  B <- A <- A   =  ... A -> A -> B (cn5)
- * - [in7] C  (so:5..7)        =  C             =            ... C (cn6, cn3)
- *   - [in8] A  (so:5..6)      =  C <- A        =       ... A -> C (cn6)
- *   - [in9] B  (so:6..7)      =  C <- B        =       ... B -> C (cn3)
- *     - [in10] A  (so:6..7)   =  C <- B <- A   =  ... A -> B -> C (cn3)
+ *     - [in10] A  (so:4..5)   =  B <- A <- A   =  ... A -> A -> B (cn5)
+ * - [in2] C  (so:5..7)        =  C             =            ... C (cn6, cn3)
+ *   - [in7] A  (so:5..6)      =  C <- A        =       ... A -> C (cn6)
+ *   - [in8] B  (so:6..7)      =  C <- B        =       ... B -> C (cn3)
+ *     - [in9] A  (so:6..7)    =  C <- B <- A   =  ... A -> B -> C (cn3)


Now the counter are a little bit "random" (except for the roots), depending on when they've been accessed for the first time, aren't they?
It would be good to mention that in line 474 above.

Ah, yes they are. I'll add a comment to mention that.

Changed to:

* inX: Inverted call node index X (this index is somewhat arbitrary because * it's based on the order in which callNodeInfoInverted.getChildren is * called)

julienw · 2025-02-05T17:36:52Z

src/profile-logic/call-node-info.js

+ * What this example shows is that we need to look not at a self node's immediate
+ * parent, but rather at its (k + 1)'th parent, where k is the depth of the
+ * inverted node whose children we're creating.


I'm not convinced :D why do we need to look at this? Is that to know if the kth inverted node has children?

Is it just that, when we need to look at the children of in4, in4 being depth 1, its children will be depth 2, so we need to look 2 parents up?

Yes that's exactly it. Ok I'll try to change it to say 2 first, and then do the generic k depth in the next paragraph. Anyway, what I'm trying to get across is that, if you only keep the self nodes around, then you would need to go really far up the non-inverted parents to know the functions for the inverted children. I'm trying to make the point that we also need to keep the deep nodes around.

ok sounds good, maybe you can write something like that as an introduction: In the next few paragraphs, we'll explain why we need to constantly iterate over the non-inverted parents, and that the "deepNodes" property is a cache to make it faster. Keep reading!

julienw · 2025-02-05T17:47:16Z

src/profile-logic/call-node-info.js

+ *
+ * in4 has one self node: cn2. This is the only non-inverted node whose call path
+ * ends in "... -> B -> A".
+ *
+ * in4 has depth 1.
+ *
+ * in4's func is B.
+ * cn2's func is A. (!)
+ *
+ * cn2's func still corresponds to the inverted root, i.e. in0's func.
+ * But cn2's parent, cn1, has func B.
+ *
+ * And cn1's parent, cn0, has func A.
+ *


It's not clear to me why we need to enter into so much details here.
Isn't it quite clear that in4 has one child, with func A, because cn2 has one parent, which is A? I'm sure I'm missing some complexity...

Is that to detail how many steps we need to go up from cn2's self func to find the children?

Hmm that's a good point, it really looks rather obvious if you put it like that. Yes, the purpose was to show that you need to go up one level from the self node if you want to know the function of the inverted child. I can try to shorten this.

Maybe one key is to talk about iterations too.
Iteration 0: compute children of root
Iteration 1: compute children of one of these children
Iteration 2: compute children of one of the children of iteration 1.

And mention that these iterations are not done right away but only on demand from a user action (for example the user expands a row in the inverted call tree, or clicks somewhere in the activity graph). This way this links the algorithm with what happens in the app and makes things more concrete.

Good idea with the iterations! Added.

julienw · 2025-02-05T17:48:50Z

src/profile-logic/call-node-info.js

+ * cn2's 0th parent is cn2.
+ * cn2's 1st parent is cn1.
+ * cn2's 2nd parent (i.e. its grandparent) is cn0.


Suggested change

* cn2's 0th parent is cn2.

* cn2's 1st parent is cn1.

* cn2's 2nd parent (i.e. its grandparent) is cn0.

* cn2's 0th parent is cn2 (func: A).

* cn2's 1st parent is cn1 (func: B).

* cn2's 2nd parent (i.e. its grandparent) is cn0 (func: A).

Note: I find this more concise and clearer than the write-up above! Maybe the write-up above could just be that too?

Good idea, I've changed it to consistently have (func: X) behind most nodes.

julienw · 2025-02-05T17:56:07Z

src/profile-logic/call-node-info.js

+ * So in5 has no children.
+ *
+ * ---
+ *


nit: add a title such as Why do we keep a "deepNodes" property in the inverted table so that we can grep it easily (and follow where you're going)

That's a great idea!

julienw · 2025-02-05T18:00:01Z

src/profile-logic/call-node-info.js

+  // For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
+  // and deepNodes = deepNodes[x] be its non-null deep nodes.
+  // Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
+  // the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].


Yep I've just read it! It's a lot clearer now.
Maybe this comment here would be better as a reference to the comment above then?

Something such as:

The array at index k caches the children for the inverted node k, up to some depth. This is useful to quickly compute the children for this inverted node. Afterwards it's set to null for that index, but passed on to its children. Please refer to the comment on CallNodeInvertedImpl below for a more detailed explanation.

julienw · 2025-02-05T18:00:59Z

src/profile-logic/call-node-info.js

+  // and deepNodes = deepNodes[x] be its non-null deep nodes.
+  // Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
+  // the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
+  deepNodes: Array<Uint32Array | null>, // IndexIntoInvertedNonRootCallNodeTable -> (Uint32Array | null)


I wonder if this could be deepNodesCache, to better refer that it's used as an optimizing trick... but up to you

(Not asking to change the name now, this can be done in a follow-up if we feel it's better)

Another idea is that instead of the terminology of "deep node", it could be "ancestor node". Indeed what I struggled with is that the values in the Uint32Array are call node indexes in the non inverted call tree, they do not represent call nodes in the inverted call tree. But the term "deep" implies to me that they are children of this inverted call node (they're not!).
But I'm not sure of that so it might be good to wait (also I'd prefer to see this landed rather nitpicking on the exact terms)

I like the name deepNodesCache, happy to rename it to this in a follow-up.

I was thinking about the term "ancestor node" too but I was worried that it would be to easy to think it was referring to an ancestor in the inverted tree.

julienw

I still haven't finished but I think I'm more and more confident about understanding this fully. I'm now looking at the meat of this commit (_computeChildrenInfo and _createChildrenForInfo).
I'm sorry this is so long.
I'm pushing the current comments so that you can see them early.

julienw · 2025-02-06T14:46:35Z

src/profile-logic/call-node-info.js

+    while (currentHandle >= rootCount) {
+      const nonRootIndex = currentHandle - rootCount;
+      callNodePath.push(this._invertedNonRootCallNodeTable.func[nonRootIndex]);
+      currentHandle = this._invertedNonRootCallNodeTable.prefix[nonRootIndex];
+    }
+    const rootFunc = currentHandle;
+    callNodePath.push(rootFunc);


optional: I would have used a do/while loop here, so that the rootFunc would be pushed inside the loop

(also there are too few occasions to use do/while so I try to use it when I can :D poor little underused loop control instruction)

I'm not sure what that would look like to be honest.

I was confused, I forgot about the existence of the 2 separate tables for root vs non root nodes, sorry.

julienw · 2025-02-07T13:31:51Z

src/profile-logic/call-node-info.js

+  // Returns a CallNodeIndex from a CallNodePath.
+  getCallNodeIndexFromPath(
+    callNodePath: CallNodePath


can you please precise if this is an inverted call node path? (I believe it is?)

Yes, whenever the caller has a CallNodeInfoInverted in their hands, the caller will be dealing with inverted call node handles and inverted call node paths. Clarified the comment.

julienw · 2025-02-07T13:43:42Z

src/profile-logic/call-node-info.js

+    return childNodeHandle;
+  }
+
+  _findDeepestKnownAncestor(callPath: CallNodePath): InvertedCallNodeHandle {


This function seems to be packed with small surprising things.
My understanding is that it is essentially bisecting, right?
It would be good to mention it in a comment for the function, so that we can follow along more easily.

Also what does "known" means in this context? Is it one that's already created it the call node table (and therefore one that's in the cache)?

The function seems to be used only in getCallNodePathFromIndex it could be better that it's located closer to that function below.

note: "deepest anscestor" is a term that's difficult to understand quickly IMO, it's kind of an oxymoron... Rationally I understand of course, it's "deepest" in the inverted table, but "ancestor" in the non-inverted table. I don't have a better idea at the moment though. But it could be good to mention that in the comment to the function.

Oh haha you're right, it is packed with surprises!

I've renamed _findDeepestKnownAncestor to _findDeepestExistingInvertedAncestorNode and moved it to just below its caller.

I'm also adding the following comment:

// Find the depth of the deepest existing ancestor node using bisection. // For each tested depth `currentDepth`, we create a partial inverted call // path `callPath.slice(0, currentDepth + 1)` and check whether we have an // inverted node for that partial call path. We find the largest value of // `currentDepth` for which the partial call path refers to an existing node, // and set `bestNode` to that node.

And no, it finds an ancestor node in the inverted tree!

And no, it finds an ancestor node in the inverted tree!

Ah yeah, your comments further down show that you understood this.

Ah I'm not so sure anymore. I'd need to think about it again to ask a good question, but I don't have the time atm :) let's discuss this at some point later.

src/profile-logic/call-node-info.js

julienw · 2025-02-07T14:57:50Z

src/profile-logic/call-node-info.js

+    const parentDeepNodeCount = parentDeepNodes.length;
+    const [parentIndexRangeStart, parentIndexRangeEnd] =
+      this.getSuffixOrderIndexRangeForCallNode(parentNodeHandle);
+    if (parentIndexRangeStart + parentDeepNodeCount !== parentIndexRangeEnd) {


I was gonna write: I'm not fully sure I understand this invariant, can you please explain?

but now I think I understand that by construction, the amount of deepNodes should be equal to the amount of children.
Maybe it would be good to write it as a comment.

BTW do we want to keep this condition explicitely, or would you rather have it as a comment assert like in other places?

I wasn't sure what to do about this. I left the explicit checks in. I don't think they affect performance much.

julienw · 2025-02-07T15:01:19Z

src/profile-logic/call-node-info.js

+ *
+ * in4 has one self node: cn2. This is the only non-inverted node whose call path
+ * ends in "... -> B -> A".
+ *
+ * in4 has depth 1.
+ *
+ * in4's func is B.
+ * cn2's func is A. (!)
+ *
+ * cn2's func still corresponds to the inverted root, i.e. in0's func.
+ * But cn2's parent, cn1, has func B.
+ *
+ * And cn1's parent, cn0, has func A.
+ *


Maybe one key is to talk about iterations too.
Iteration 0: compute children of root
Iteration 1: compute children of one of these children
Iteration 2: compute children of one of the children of iteration 1.

And mention that these iterations are not done right away but only on demand from a user action (for example the user expands a row in the inverted call tree, or clicks somewhere in the activity graph). This way this links the algorithm with what happens in the app and makes things more concrete.

julienw · 2025-02-07T16:04:48Z

src/profile-logic/call-node-info.js

+  // For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
+  // and deepNodes = deepNodes[x] be its non-null deep nodes.
+  // Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
+  // the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].


The array at index k caches the children for the inverted node k,

Although this sentence here is wrong.

Is it more:
Let's take <arr> the array at index <in> for inverted node <in>.
<arr> contains the <depth>th ancestor for all paths that "ends with" this inverted node <in>, that is all its children (no matter their respective depth).

?

julienw · 2025-02-07T16:13:10Z

src/profile-logic/call-node-info.js

+  // For a non-root node x of the inverted tree, let k = depth[x] its depth in the inverted tree,
+  // and deepNodes = deepNodes[x] be its non-null deep nodes.
+  // Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
+  // the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].


Also maybe add something such as "as a consequence, the deepnodes are ordered by call node suffix order"?

julienw · 2025-02-07T16:14:24Z

src/profile-logic/call-node-info.js

+    let nodesWhichEndHereCount = 0;
+    while (nodesWhichEndHereCount < parentDeepNodeCount) {
+      const deepNode = parentDeepNodes[nodesWhichEndHereCount];
+      if (callNodeTable.prefix[deepNode] !== -1) {


nit: add a comment such as "This is the first deepNode that has a parent => all the following ones will have a parent as well."

julienw

review finished for Create inverted call nodes lazily.

next up is Refine the suffix order incrementally, as new inverted nodes are crea… (the last one)

julienw · 2025-02-11T13:55:49Z

src/profile-logic/call-node-info.js

+ *   - [cn4] A    =  A -> A       =       A -> A [so1]
+ *     - [cn5] B  =  A -> A -> B  =  A -> A -> B [so4]
+ *   - [cn6] C    =  A -> C       =       A -> C [so5]
+ *


When reviewing, it helped me looking at an actual inverted tree in the UI and poking at the data structures such as the deepNodes.
I think it would be useful to show full inverted tree here. It would be the introduction to the algorithm that constructs it ("this is the our expected result, now let's look at how we build it" in a way)

Good idea, done!

julienw · 2025-02-11T13:58:10Z

src/profile-logic/call-node-info.js

+    const parentDeepNodeCount = parentDeepNodes.length;
+    const [parentIndexRangeStart, parentIndexRangeEnd] =
+      this.getSuffixOrderIndexRangeForCallNode(parentNodeHandle);
+    if (parentIndexRangeStart + parentDeepNodeCount !== parentIndexRangeEnd) {


BTW do we want to keep this condition explicitely, or would you rather have it as a comment assert like in other places?

julienw · 2025-02-11T14:03:22Z

src/profile-logic/call-node-info.js

+    const firstChildFirstParentDeepNode =
+      parentDeepNodes[nodesWhichEndHereCount];
+    const firstChildFirstDeepNode =
+      callNodeTable.prefix[firstChildFirstParentDeepNode];
+    childrenDeepNodes[0] = firstChildFirstDeepNode;
+    const firstChildFunc = callNodeTable.func[firstChildFirstDeepNode];


Without looking too close, I wonder why this can't be part of the loop below?

(but I see this code is removed later anyway so let's skip this comment)

julienw · 2025-02-11T14:23:34Z

src/profile-logic/call-node-info.js

+      nextChildSuffixOrderIndexRangeStart += deepNodeCount;
+      nextChildDeepNodeIndex += deepNodeCount;


(optional) if you move these additions at the end of the loop, I think this could be somewhat clearer (it's common that the end of a loop contains the operations for the next iteration), also I think you could rename them currentChildXXX instead of next, and use them directly (so you could remove suffixOrderIndexRangeStart too).

julienw · 2025-02-11T14:50:32Z

src/profile-logic/call-node-info.js

+   * For all i in 0..deepNodes.length, deepNodes[i] is the k'th parent node
+   * of suffixOrderedCallNodes[suffixOrderIndexRangeStart + i],


Suggested change

* For all i in 0..deepNodes.length, deepNodes[i] is the k'th parent node

* of suffixOrderedCallNodes[suffixOrderIndexRangeStart + i],

* For all i in 0..deepNodes.length, deepNodes[i] is the k'th parent node

* of suffixOrderedCallNodes[suffixOrderIndexRangeStart + i] in the non-inverted tree,

julienw

I admit I haven't done as much scrutiny for the "refine order" commit than the others, but it looks very reasonable still.

Let's land this after the small comment changes are done!

Thanks a ton for the algorithm update, I could test with some of my old profiles that were also pretty slow (deploy preview / production) and they improve immensely.

julienw · 2025-02-12T12:39:52Z

src/profile-logic/call-node-info.js

+    if (
+      nodesWhichEndHereCount + childrenDeepNodeCount !==
+      parentDeepNodeCount
+    ) {
+      throw new Error('indexes out of sync');


do we want to keep this assertion in the final code, or would you rather have it as a comment?

julienw · 2025-02-12T12:50:46Z

src/profile-logic/call-node-info.js

      return null;
    }

-    const childrenDeepNodeCount = parentDeepNodeCount - nodesWhichEndHereCount;
+    // We create one child for each distinct func we found. The children need to
+    // be ordered by func.


Why exactly does it need to be ordered by func? (we easily see it is ordered by func with the sort operation, but not clearly why). Is that because it needs to be suffix-ordered?

Changed to "We order the children by func so that _getChildWithFunc can use bisection."

julienw · 2025-02-12T13:09:50Z

src/profile-logic/call-node-info.js

+    // We have the parent's self nodes and their corresponding deep nodes.
+    // These nodes are currently only sorted up to the parent's depth:
+    // we know that every parentDeepNode has the parent's func.
+    // But if we look at the prefix of each parentDoopNode, we'll encounter


typo

Suggested change

// But if we look at the prefix of each parentDoopNode, we'll encounter

// But if we look at the prefix of each parentDeepNode, we'll encounter

doop node LOL

julienw · 2025-02-12T13:50:36Z

src/profile-logic/call-node-info.js

+  // and deepNodes = deepNodes[x] be its non-null deep nodes.
+  // Then, for every index i in suffixOrderIndexRangeStart[x]..suffixOrderIndexRangeEnd[x],
+  // the k'th prefix node of suffixOrderedCallNodes[i] is stored at deepNodes[x][i - suffixOrderIndexRangeStart[x]].
+  deepNodes: Array<Uint32Array | null>, // IndexIntoInvertedNonRootCallNodeTable -> (Uint32Array | null)


(Not asking to change the name now, this can be done in a follow-up if we feel it's better)

Another idea is that instead of the terminology of "deep node", it could be "ancestor node". Indeed what I struggled with is that the values in the Uint32Array are call node indexes in the non inverted call tree, they do not represent call nodes in the inverted call tree. But the term "deep" implies to me that they are children of this inverted call node (they're not!).
But I'm not sure of that so it might be good to wait (also I'd prefer to see this landed rather nitpicking on the exact terms)

… in the flame graph, it'll be non-inverted traced timing.

…er is somewhat arbitrary) as requested in julien's review

…erted call node paths.

…torNode and improve its comments.

## Updates: [Nicolas Chevobbe] Make timeline ruler notches visible in High Contrast Mode (#5346) [Nazım Can Altınova] Add the ability to mark marker fields as hidden (#5354) [Maxx Crawford] Update guide-startup-shutdown.md (#5357) [Nazım Can Altınova] Enable prettier on the docs-user markdown files (#5358) [Paul Adenot] Allow searching by Content-Type in the network marker view (#5351) [Florian Quèze] Hide the pid in global tracks if it is 0. (#5361) [Markus Stange] Make inverting the call tree fast, by computing inverted call nodes lazily (#4900) [Markus Stange] Use 64-bit floats for call tree timings. (#5371) [Markus Stange] Extend the workaround in the v53 upgrader to generate missing subcategory columns. (#5369) ## Also thanks to our localizers: de: Michael Köhler el: Jim Spentzos en-GB: Paul es-CL: ravmn fr: Théo Chevalier fur: Fabio Tomat fy-NL: Fjoerfoks ia: Melo46 it: Francesco Lodolo nl: Mark Heijl pt-BR: Marcelo Ghelman ru: Valery Ledovskoy sv-SE: Luna Jernberg, Andreas Pettersson tr: Grk uk: Іhor Hordiichuk zh-CN: Olvcpr423 zh-TW: Pin-guang Chen

mstange self-assigned this Jan 23, 2024

mstange force-pushed the fast-invert4 branch from 850b79a to 6036bbd Compare January 23, 2024 20:27

mstange mentioned this pull request Jan 24, 2024

Various preparations in call tree code for fast inverted tree #4897

Merged

mstange force-pushed the fast-invert4 branch 3 times, most recently from 8fd4e74 to 533a704 Compare August 6, 2024 22:01

mstange mentioned this pull request Aug 7, 2024

Add an inverted tree test for getSamplesSelectedStates and getTreeOrderComparator #5076

Merged

mstange force-pushed the fast-invert4 branch 3 times, most recently from 22b07b4 to b730acb Compare August 7, 2024 23:35

mstange marked this pull request as ready for review August 7, 2024 23:35

mstange requested a review from julienw August 7, 2024 23:35

mstange mentioned this pull request Aug 8, 2024

[deploy preview] Fast inverted tree, take 2 #4806

Closed

mstange force-pushed the fast-invert4 branch from b730acb to 086aec7 Compare August 8, 2024 17:57

julienw reviewed Oct 11, 2024

View reviewed changes

mstange force-pushed the fast-invert4 branch 4 times, most recently from 5cc7d34 to 21d53c7 Compare November 24, 2024 19:49

mstange force-pushed the fast-invert4 branch from 21d53c7 to 0e7998b Compare November 25, 2024 01:24

mstange mentioned this pull request Nov 26, 2024

Add a "top functions" call tree view #15

Open

mstange force-pushed the fast-invert4 branch from 0e7998b to 430bb75 Compare December 5, 2024 20:16

mstange mentioned this pull request Dec 5, 2024

Speed up createCallNodeTable by 2.3x #5248

Open

mstange force-pushed the fast-invert4 branch from 430bb75 to 9ac75e1 Compare January 4, 2025 21:20

julienw self-requested a review January 10, 2025 15:38

mstange mentioned this pull request Jan 13, 2025

Float32Array has insufficient precision in call tree timings - incorrect byte sums displayed in call tree #5310

Closed

mstange added 7 commits January 29, 2025 17:20

Make sourceFramesInlinedIntoSymbol an Int32Array.

a891ee2

This avoids a CompareIC when comparing to null in _createInvertedRootCallNodeTable, because we'll now only be comparing integers. This speeds up _createInvertedRootCallNodeTable by almost 2x.

Fold the CallNodeInfoInverted interface into the implementation class.

9440725

Remove unused methods from CallNodeInfoNonInverted.

e1a726b

Add a test for focus-category on an inverted call tree.

Loading
Loading status checks…

0cd70fa

mstange force-pushed the fast-invert4 branch from 0137465 to 0cd70fa Compare January 29, 2025 22:29

julienw reviewed Jan 31, 2025

View reviewed changes

julienw reviewed Feb 3, 2025

View reviewed changes

julienw reviewed Feb 5, 2025

View reviewed changes

julienw reviewed Feb 7, 2025

View reviewed changes

julienw reviewed Feb 11, 2025

View reviewed changes

julienw approved these changes Feb 12, 2025

View reviewed changes

mstange added 5 commits February 13, 2025 15:02

Merge fast-invert4 with main.

f942782

Be more explicit about the expectation that, if we have traced timing…

b8a472d

… in the flame graph, it'll be non-inverted traced timing.

Expand comment about inverted call node indexes (mention that the ord…

3462954

…er is somewhat arbitrary) as requested in julien's review

Adjust CallNodeInfoInverted documentation based on julien's feedback.

9ddbf1c

Clarify that the CallNodePaths used with CallNodeInfoInverted are inv…

3c79aff

…erted call node paths.

mstange enabled auto-merge February 13, 2025 23:12

mstange added 6 commits February 13, 2025 18:14

Rename _findDeepestKnownAncestor to _findDeepestExistingInvertedAnces…

e5f0b4c

…torNode and improve its comments.

Improve comments in getCallNodeIndexFromPath.

9252918

Move _getChildWithFunc and improve its comments.

034244c

Adjust the comment above _createNonRootNode as requested by julien.

0331c6b

Clarify the comment about ordering the inverted children by func.

3f4e927

Fix parentDoopNode typo.

Loading
Loading status checks…

c80f34a

mstange force-pushed the fast-invert4 branch from 2d573dd to c80f34a Compare February 13, 2025 23:15

mstange merged commit d8bba23 into firefox-devtools:main Feb 13, 2025
15 of 16 checks passed

canova mentioned this pull request Feb 19, 2025

Deploy Feb 19, 2025 #5377

Merged

		nextChildSuffixOrderIndexRangeStart += deepNodeCount;
		nextChildDeepNodeIndex += deepNodeCount;

		* For all i in 0..deepNodes.length, deepNodes[i] is the k'th parent node
		* of suffixOrderedCallNodes[suffixOrderIndexRangeStart + i],

	// But if we look at the prefix of each parentDoopNode, we'll encounter
	// But if we look at the prefix of each parentDeepNode, we'll encounter

Make inverting the call tree fast, by computing inverted call nodes lazily #4900

Make inverting the call tree fast, by computing inverted call nodes lazily #4900

Conversation

mstange commented Jan 23, 2024 • edited Loading

codecov bot commented Jan 23, 2024 • edited Loading

Codecov Report

mstange commented Aug 7, 2024

julienw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mstange commented Nov 24, 2024 • edited Loading

julienw left a comment • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mstange Jan 31, 2025 • edited Loading

Choose a reason for hiding this comment

julienw Feb 3, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

julienw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

julienw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

julienw Feb 5, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

julienw left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

julienw left a comment

mstange commented Jan 23, 2024 •

edited

Loading

codecov bot commented Jan 23, 2024 •

edited

Loading

mstange commented Nov 24, 2024 •

edited

Loading

julienw left a comment •

edited

Loading

mstange Jan 31, 2025 •

edited

Loading

julienw Feb 3, 2025 •

edited

Loading

julienw Feb 5, 2025 •

edited

Loading