metrics: CGo memory allocations should be per-store, or clarified as aggregated #4129

Open
itsbilal opened this issue Nov 1, 2024 · 0 comments
Labels: A-storage, C-enhancement (New feature or request), O-testcluster (Issues found as part of DRT testing), P-2 (Issues/test failures with a fix SLA of 3 months), T-storage

itsbilal commented Nov 1, 2024

Currently, if the Pebble block cache is shared across multiple Pebble instances, the "Cgo memory usage" line in db.Metrics() is aggregated across all of the stores in the process, while counts such as zombie memtables are specific to that instance only. This makes it difficult to correlate the two numbers at first glance. Here's an example string representation from a Pebble instance with very high memtable memory utilization (this CockroachDB node had 8 stores, each with a high zombie memtable count):

      |                             |       |       |   ingested   |     moved    |    written   |       |    amp   |     multilevel
level | tables  size val-bl vtables | score |   in  | tables  size | tables  size | tables  size |  read |   r   w  |    top   in  read
------+-----------------------------+-------+-------+--------------+--------------+--------------+-------+----------+------------------
    0 |     0     0B     0B       0 |  0.00 | 188GB |  1.1K  970KB |     0     0B |   16K   19GB |    0B |   0  0.1 |    0B    0B    0B
    1 |     0     0B     0B       0 |  0.00 |    0B |     0     0B |     0     0B |     0     0B |    0B |   0  0.0 |    0B    0B    0B
    2 |    18   52MB  3.3MB       0 |  0.96 | 8.9GB |   125  111KB |  3.0K  9.2GB |   53K  162GB | 162GB |   1 18.2 |    0B    0B    0B
    3 |    53  261MB   34MB       0 |  0.92 |  11GB |   327  290KB |  1.9K  4.8GB |  7.8K   28GB |  29GB |   1  2.5 |    0B    0B    0B
    4 |   161  1.6GB  356MB       1 |  0.84 |  17GB |   550  498KB |   382 1023MB |   17K  160GB | 161GB |   1  9.6 | 2.0GB 6.0GB  21GB
    5 |   634  2.3GB  554MB       2 |  1.00 |  15GB |  3.8K  4.9MB |   158  368MB |  133K  497GB | 499GB |   1 34.0 | 356MB 2.7GB 8.5GB
    6 |  6.2K  167GB  2.5GB      57 |     - |  11GB |   20K  224GB |   294   44MB |   24K 1009GB | 1.0TB |   1 93.7 | 767KB  26MB  58MB
total |  7.0K  171GB  3.4GB      60 |     - | 412GB |   26K  224GB |  5.8K   15GB |  252K  2.2TB | 1.9TB |   5  5.6 | 2.3GB 8.7GB  30GB
---------------------------------------------------------------------------------------------------------------------------------------
WAL: 1 files (48MB)  in: 186GB  written: 188GB (1% overhead) failover: (switches: 6, primary: ‹22h19m21.694006978s›, secondary: ‹59.301058386s›)
Flushes: 5296
Compactions: 48680  estimated debt: 0B  in progress: 0 (0B)
             default: 17201  delete: 2590  elision: 6356  move: 5835  read: 45  tombstone-density: 16653  rewrite: 0  copy: 0  multi-level: 1044
MemTables: 1 (64MB)  zombie: 50 (3.1GB)
Zombie tables: 2410 (29GB, local: 29GB)
Backing tables: 38 (1.8GB)
Virtual tables: 60 (1.4GB)
Local tables size: 172GB
Compression types: snappy: 6975 unknown: 60
Block cache: 0 entries (0B)  hit rate: 77.1%
Table cache: 78K entries (60MB)  hit rate: 99.9%
Secondary cache: 0 entries (0B)  hit rate: 0.0%
Snapshots: 3  earliest seq num: 1019853557
Table iters: 621
Filter utility: 72.8%
Ingestions: 3605  as flushable: 928 (53GB in 6493 tables)
Cgo memory usage: 24GB  block cache: 541MB (data: 392MB, maps: 148MB, entries: 5.4KB)  memtables: 23GB
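
For illustration, here's a minimal sketch (not from the issue) of the setup that produces this: two Pebble instances in one process sharing a single block cache. The store directories are hypothetical; the point is that each db.Metrics() reports its own memtable/zombie counts while the Cgo memory line reflects the whole process.

```go
package main

import (
	"fmt"
	"log"

	"github.com/cockroachdb/pebble"
)

func main() {
	// One block cache shared by both stores, as a multi-store CockroachDB
	// node does.
	cache := pebble.NewCache(512 << 20)
	defer cache.Unref()

	var dbs []*pebble.DB
	for _, dir := range []string{"/tmp/store1", "/tmp/store2"} { // hypothetical paths
		db, err := pebble.Open(dir, &pebble.Options{Cache: cache})
		if err != nil {
			log.Fatal(err)
		}
		dbs = append(dbs, db)
	}

	for i, db := range dbs {
		m := db.Metrics()
		// The MemTables / Zombie tables lines are specific to this store,
		// but the "Cgo memory usage" line is the same in both printouts
		// because it comes from a process-global counter.
		fmt.Printf("store %d:\n%s\n", i+1, m)
	}

	for _, db := range dbs {
		_ = db.Close()
	}
}
```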

At the very least, the string representation of the metrics should clarify that the memory counters are aggregated across all Pebble instances in the same process (they are stored in a global variable). Better yet, we should make the counters per-store and print them out per-store.
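
For the per-store direction, a purely illustrative sketch (not Pebble's actual internals; all names are hypothetical) of what per-store accounting could look like: each store owns its own counter for C-allocated bytes instead of updating a process-global one, so its Metrics output can report a figure attributable to that store alone.

```go
// Hypothetical per-store accounting; package and type names are illustrative only.
package manualmem

import "sync/atomic"

// Accounter tracks C-allocated bytes charged to a single store.
type Accounter struct {
	bytes atomic.Int64
}

// Alloc records n newly allocated bytes for this store.
func (a *Accounter) Alloc(n int64) { a.bytes.Add(n) }

// Free records n released bytes for this store.
func (a *Accounter) Free(n int64) { a.bytes.Add(-n) }

// Bytes returns the store's current C-allocated footprint, suitable for
// inclusion in that store's metrics output.
func (a *Accounter) Bytes() int64 { return a.bytes.Load() }
```

Memtable allocation paths would then be handed the owning store's counter, while memory for the shared block cache could either be reported once per process (and labeled as such) or attributed to stores proportionally.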

Jira issue: PEBBLE-293

Epic CRDB-41111

@itsbilal itsbilal added the C-enhancement New feature or request label Nov 1, 2024
@github-project-automation github-project-automation bot moved this to Incoming in Storage Nov 1, 2024
@itsbilal itsbilal added the O-testcluster Issues found as part of DRT testing label Nov 1, 2024
@itsbilal itsbilal moved this from Incoming to Backlog in Storage Nov 5, 2024
@exalate-issue-sync exalate-issue-sync bot added the P-2 Issues/test failures with a fix SLA of 3 months label Nov 5, 2024