fix: optimize proctree memory consumption #4384

Merged
merged 5 commits into from
Dec 16, 2024

Conversation

geyslan
Member

@geyslan geyslan commented Nov 11, 2024

Close: #4409

1. Explain what the PR does

f8963e0 chore(proctree): lower process default cache size
4a5bb5d chore(proctree): del Process interpreter FileInfo
d02ce61 chore(changelog): reduce mem footprint
9823a73 chore(changelog): benchmark changelog.Set()
6b805bb chore(utils): add PrintStructSizes() helper

f8963e0 chore(proctree): lower process default cache size

Adjust the default process cache size to 16,384, setting a 2:1 ratio
between thread and process caches.

With GOGC=5 and both caches stressed, an average heap usage of 77MB was
observed.

4a5bb5d chore(proctree): del Process interpreter FileInfo

Remove the interpreter FileInfo fields from the Process struct.
This reduces the memory footprint of the Proctree by 8MB on average.

d02ce61 chore(changelog): reduce mem footprint

The current Changelog structure consumes a significant amount of memory
due to the allocation of metadata for each field/instance. As the number
of fields increases, the memory usage grows linearly. Approximately 240
bytes per field were observed just for metadata, excluding the actual
data and pointers for each field.

To reduce memory consumption, the new changelog.Changelog[T] type was
created.

The Changelog[T] implementation uses a single slice to store the changes
of a single value of type T. This approach is more memory-efficient than
the previous implementation, since the map is no longer used. Its main
advantage comes from setting T to a complete structure type, which
removes the need for per-field metadata.

Avoid distributing Changelog[T] across multiple fields, as this would
negate the purpose of reducing memory usage by increasing the metadata
associated with each Changelog. However, one might consider using it for
individual structure fields when a mutable field (with many entries and
a high frequency of change) is larger than the sum of the Changelog
metadata overhead and the size of the outer data structure itself.

FileInfo and TaskInfo were modified to use the new Changelog[T], with
T being their respective feed structures.
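
The single-slice idea described above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the `taskFeed` struct, the `demo` helper, the eviction policy (drop the oldest entry when full) and the assumption that timestamps never go backwards are all hypothetical simplifications.

```go
package main

import (
	"fmt"
	"time"
)

// entry is a single recorded change: its timestamp and the value set.
type entry[T comparable] struct {
	tsUnixNano int64
	value      T
}

// Changelog tracks up to maxEntries changes of one value of type T in a
// single slice, oldest first. This sketch is not thread-safe.
type Changelog[T comparable] struct {
	entries    []entry[T]
	maxEntries int
}

// NewChangelog returns a Changelog holding at most maxEntries changes (minimum 1).
func NewChangelog[T comparable](maxEntries int) *Changelog[T] {
	if maxEntries < 1 {
		maxEntries = 1
	}
	return &Changelog[T]{entries: make([]entry[T], 0, maxEntries), maxEntries: maxEntries}
}

// Set appends a change. Simplifying assumption: timestamps never go backwards.
// When full, the oldest entry is dropped by shifting in place, so the
// backing array is never reallocated.
func (c *Changelog[T]) Set(value T, ts time.Time) {
	if len(c.entries) == c.maxEntries {
		copy(c.entries, c.entries[1:])
		c.entries = c.entries[:len(c.entries)-1]
	}
	c.entries = append(c.entries, entry[T]{tsUnixNano: ts.UnixNano(), value: value})
}

// Get returns the value in effect at ts (latest entry at or before ts),
// or the zero value of T if there is none.
func (c *Changelog[T]) Get(ts time.Time) T {
	nano := ts.UnixNano()
	for i := len(c.entries) - 1; i >= 0; i-- {
		if c.entries[i].tsUnixNano <= nano {
			return c.entries[i].value
		}
	}
	var zero T
	return zero
}

// GetCurrent returns the most recent value, or the zero value of T.
func (c *Changelog[T]) GetCurrent() T {
	if len(c.entries) == 0 {
		var zero T
		return zero
	}
	return c.entries[len(c.entries)-1].value
}

// taskFeed is a hypothetical whole-struct T: one Changelog covers all
// of its fields, so there is no per-field metadata.
type taskFeed struct {
	Name string
	Pid  int
}

// demo records two changes and reads them back.
func demo() (atStart, current string) {
	start := time.Unix(100, 0)
	cl := NewChangelog[taskFeed](2)
	cl.Set(taskFeed{Name: "bash", Pid: 10}, start)
	cl.Set(taskFeed{Name: "ls", Pid: 10}, start.Add(time.Second))
	return cl.Get(start).Name, cl.GetCurrent().Name
}

func main() {
	atStart, current := demo()
	fmt.Println(atStart, current) // prints "bash ls"
}
```

Storing the whole `taskFeed` as T means each instance pays for one slice header and one capacity field, rather than one set of metadata per tracked field.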

---

| Caches | GOGC | Branch | Heap Use, Avg (MB) | Heap Growth (MB) | Diff of main | Proctree |
|--------|------|--------|--------------------|------------------|--------------|----------|
| -      | 5    | main   | 18                 | -                | -            | off      |
| 32768  | 5    | main   | 209                | 191              | -            | on       |
| -      | 5    | new    | 18                 | -                | -            | off      |
| 32768  | 5    | new    | 96                 | 78               | -59.16%      | on       |

With GOGC set to 5 and cache sizes of 32,768, the new implementation
reduces the proctree's heap growth by approximately 59% (from 191MB to
78MB); average heap usage drops from 209MB to 96MB.

The "Heap Use" and "Heap Growth" columns serve as a good indicator of
memory consumption and can assist in determining optimal cache sizes.
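
How the heap figures were sampled is not stated here. As one possible approach, the sketch below reads `runtime.MemStats` after a forced GC and reproduces the table's "Heap Growth" and "Diff of main" arithmetic. The helper names (`heapInUseMB`, `heapGrowthMB`, `diffPercent`) are hypothetical, and using `HeapInuse` as the metric is an assumption.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
)

// heapInUseMB forces a GC cycle and returns the heap currently in use, in MB.
// Assumption: HeapInuse is a reasonable stand-in for the "Heap Use" column.
func heapInUseMB() float64 {
	runtime.GC()
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.HeapInuse) / (1 << 20)
}

// heapGrowthMB is the extra heap attributable to the proctree:
// usage with it enabled minus the baseline with it disabled.
func heapGrowthMB(baselineMB, withProctreeMB float64) float64 {
	return withProctreeMB - baselineMB
}

// diffPercent is the relative change of the new growth against the reference growth.
func diffPercent(refGrowthMB, newGrowthMB float64) float64 {
	return (newGrowthMB - refGrowthMB) / refGrowthMB * 100
}

func main() {
	debug.SetGCPercent(5) // same effect as running with GOGC=5

	fmt.Printf("heap in use right now: %.1f MB\n", heapInUseMB())

	// Figures taken from the table above.
	mainGrowth := heapGrowthMB(18, 209)
	newGrowth := heapGrowthMB(18, 96)
	fmt.Printf("growth main=%.0fMB new=%.0fMB diff=%.2f%%\n",
		mainGrowth, newGrowth, diffPercent(mainGrowth, newGrowth))
	// last line prints: growth main=191MB new=78MB diff=-59.16%
}
```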

---

The Set method was hugely improved, reducing the number of allocations,
memory usage, and execution time. The benchmark results are as follows:

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_Set$
github.com/aquasecurity/tracee/pkg/changelog -benchtime=10000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/changelog
cpu: AMD Ryzen 9 7950X 16-Core Processor

Benchmark_Set/All_Scenarios-32 10000000  322.8 ns/op  112 B/op  3 allocs/op
Benchmark_Set/Within_Limit-32  10000000  506.8 ns/op  496 B/op  5 allocs/op
PASS
ok      github.com/aquasecurity/tracee/pkg/changelog    367.236s

| Scenario      | ns/op change | B/op change | allocs/op change |
|:--------------|:-------------|:------------|:-----------------|
| All Scenarios | -75.9%       | -88.7%      | -92.9%           |
| Within Limit  | -76.2%       | -85.5%      | -91.7%           |
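
The percentages in this table follow from the two benchmark runs quoted in this PR (the baseline from commit 9823a73 and the run above). A small sketch of the arithmetic, with a hypothetical `result` type and `change` helper:

```go
package main

import "fmt"

// result holds the three per-operation figures reported by `go test -benchmem`.
type result struct {
	nsPerOp, bytesPerOp, allocsPerOp float64
}

// change returns the relative change from before to after, in percent.
func change(before, after float64) float64 {
	return (after - before) / before * 100
}

func main() {
	scenarios := []struct {
		name          string
		before, after result
	}{
		{"All Scenarios", result{1337, 992, 42}, result{322.8, 112, 3}},
		{"Within Limit", result{2130, 3424, 60}, result{506.8, 496, 5}},
	}
	for _, s := range scenarios {
		fmt.Printf("%-13s ns/op %.1f%%  B/op %.1f%%  allocs/op %.1f%%\n",
			s.name,
			change(s.before.nsPerOp, s.after.nsPerOp),
			change(s.before.bytesPerOp, s.after.bytesPerOp),
			change(s.before.allocsPerOp, s.after.allocsPerOp))
	}
	// prints:
	// All Scenarios ns/op -75.9%  B/op -88.7%  allocs/op -92.9%
	// Within Limit  ns/op -76.2%  B/op -85.5%  allocs/op -91.7%
}
```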

9823a73 chore(changelog): benchmark changelog.Set()

Running tool: /home/gg/.goenv/versions/1.22.4/bin/go test -benchmem
-run=^$ -tags ebpf -bench ^Benchmark_Set$
github.com/aquasecurity/tracee/pkg/changelog -benchtime=10000000x

goos: linux
goarch: amd64
pkg: github.com/aquasecurity/tracee/pkg/changelog
cpu: AMD Ryzen 9 7950X 16-Core Processor
Benchmark_Set/All_Scenarios-32 10000000  1337 ns/op   992 B/op  42 allocs/op
Benchmark_Set/Within_Limit-32  10000000  2130 ns/op  3424 B/op  60 allocs/op
PASS
ok      github.com/aquasecurity/tracee/pkg/changelog    350.043s

2. Explain how to test it

3. Other comments

@geyslan
Member Author

geyslan commented Nov 12, 2024

@trvll FYI.

@NDStrahilevitz

This comment was marked as outdated.

@geyslan

This comment was marked as resolved.

@geyslan

This comment was marked as outdated.

@geyslan geyslan force-pushed the new-changelog branch 2 times, most recently from 54b4943 to 10c2af2 Compare November 14, 2024 12:37
Collaborator

@NDStrahilevitz NDStrahilevitz left a comment

I think I have a very accessible optimization to add, plus some general questions and comment suggestions.

Overall, while I don't like removing the explicit member references, this will be much more memory efficient, and at least we kept type safety. Well done.

@geyslan geyslan force-pushed the new-changelog branch 4 times, most recently from 6cb4b3a to f553a67 Compare November 25, 2024 17:43
Collaborator

@NDStrahilevitz NDStrahilevitz left a comment

LGTM. Adding from our internal discussions: we can consider the slice-of-slices optimization in the future, in case we want to reduce CPU time here (at the cost of ~12MB, according to your tests).
Nice work!

@geyslan

This comment was marked as outdated.

@geyslan

This comment was marked as outdated.

@geyslan geyslan force-pushed the new-changelog branch 2 times, most recently from cdfa568 to 0dbfdb9 Compare December 4, 2024 19:31
@geyslan geyslan mentioned this pull request Dec 12, 2024
@geyslan geyslan force-pushed the new-changelog branch 4 times, most recently from 4629ed8 to 740bbbf Compare December 13, 2024 23:30
@geyslan geyslan force-pushed the new-changelog branch 2 times, most recently from 8f80d65 to 7a03821 Compare December 13, 2024 23:38
// entry is an internal structure representing a single change in the entryList.
// It includes the timestamp and the value of the change.
type entry[T comparable] struct {
	tsUnixNano int64 // timestamp of the change (nanoseconds since epoch)
Member Author

@geyslan geyslan Dec 13, 2024

Replaced time.Time with int64 (UnixNano). I believe that's a safe change, since it's an internal field used only to compare elapsed time. It saved us ~4MB (proctree with 32k elements in both caches).
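
To illustrate where the per-entry saving comes from, the sketch below compares the size of an entry that stores a `time.Time` with one that stores an `int64`. The two struct names and the `sizes` helper are hypothetical, and `uint64` is an arbitrary stand-in for T. On 64-bit platforms `time.Time` occupies 24 bytes against 8 for `int64`, so each entry shrinks by 16 bytes.

```go
package main

import (
	"fmt"
	"time"
	"unsafe"
)

// entryWithTime mirrors an entry that stores the timestamp as time.Time.
type entryWithTime[T comparable] struct {
	timestamp time.Time
	value     T
}

// entryWithNano mirrors an entry that stores nanoseconds since the epoch.
type entryWithNano[T comparable] struct {
	tsUnixNano int64
	value      T
}

// sizes returns the in-memory size of both layouts for T = uint64.
func sizes() (withTime, withNano uintptr) {
	return unsafe.Sizeof(entryWithTime[uint64]{}), unsafe.Sizeof(entryWithNano[uint64]{})
}

func main() {
	withTime, withNano := sizes()
	fmt.Printf("time.Time entry: %d bytes, int64 entry: %d bytes, saved: %d bytes\n",
		withTime, withNano, withTime-withNano)
}
```

The trade-off is that the monotonic clock reading and location carried by `time.Time` are dropped, which is acceptable when the field is only used internally to order and compare changes.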

Collaborator

@yanivagman yanivagman left a comment

LGTM, good work @geyslan
Let's just remove changelog kind if it is not currently used

Collaborator

@yanivagman yanivagman left a comment

LGTM

@geyslan
Member Author

geyslan commented Dec 16, 2024

/fast-forward

@github-actions github-actions bot merged commit f8963e0 into aquasecurity:main Dec 16, 2024
26 checks passed
@geyslan
Member Author

geyslan commented Dec 16, 2024

The ChangelogKind[T] was created during this effort as an initial solution. After more refactoring, it turned out not to be that relevant for our current use case, despite being an interesting optimisation. However, as it might be useful for future ones, I leave it below for posterity.

package changelog

import (
	"time"

	"github.com/aquasecurity/tracee/pkg/logger"
)

// MemberKind represents the unique identifier for each kind of entry in the ChangelogKind.
// It is used to categorize different kinds of changes tracked by the ChangelogKind.
//
// NOTE: Declare your own MemberKind constants sequentially starting from 0,
// since they are used as the indexes in the flags slice passed to NewChangelogKind and
// other methods. For example:
//
//	const MyKind1 MemberKind = 0
//	const MyKind2 MemberKind = 1
//
//	var flags = []MaxEntries{
//	    MyKind1: 3,
//	    MyKind2: 5,
//	}
type MemberKind uint8

// ChangelogKind manages a list of changes (entries) for multiple member kinds.
// It keeps track of specifically configured members indicated by MemberKind identifiers.
// When instantiating a ChangelogKind struct, one must supply a relevant mapping between the desired
// unique members and the maximum amount of changes that member can track.
//
// ATTENTION: You should use ChangelogKind within a struct and provide methods to access it,
// coordinating access through your struct mutexes. DO NOT EXPOSE the ChangelogKind object directly to
// the outside world as it is not thread-safe.
type ChangelogKind[T comparable] struct {
	kindLists []entryList[T] // list of entries for each kind
}

// NewChangelogKind initializes a new `ChangelogKind` structure using the provided `MaxEntries` slice.
func NewChangelogKind[T comparable](maxEntries []MaxEntries) *ChangelogKind[T] {
	newKindLists := make([]entryList[T], 0, len(maxEntries))

	for _, max := range maxEntries {
		if max == 0 {
			logger.Fatalw("maxEntries must be greater than 0")
		}
		newList := newEntryList[T](max)
		// DEBUG: uncomment this to populate entries to measure memory footprint.
		// newList.populateEntries()
		newKindLists = append(newKindLists, newList)
	}

	return &ChangelogKind[T]{
		kindLists: newKindLists,
	}
}

// Set adds or updates an entry in the ChangelogKind for the specified `MemberKind` ordered by timestamp.
// If the new entry has the same value as the latest one, only the timestamp is updated.
// If there are already the maximum number of entries for this kind, it reuses or replaces an existing entry.
//
// ATTENTION: Make sure to pass a value of the correct type for the specified `MemberKind`.
func (ck *ChangelogKind[T]) Set(kind MemberKind, value T, timestamp time.Time) {
	if int(kind) >= len(ck.kindLists) {
		logger.Errorw("kind is not present in the entries", "kind", kind)
		return
	}

	kindList := ck.kindLists[kind]
	ck.kindLists[kind] = kindList.set(value, timestamp)
}

// Get retrieves the value of the entry for the specified `MemberKind` at or before the given timestamp.
// If no matching entry is found, it returns the default value for the entry type.
func (ck *ChangelogKind[T]) Get(kind MemberKind, timestamp time.Time) T {
	if ck.invalidKind(kind) {
		return getZero[T]()
	}

	return ck.kindLists[kind].get(timestamp)
}

// GetCurrent retrieves the most recent value for the specified `MemberKind`.
// If no entry is found, it returns the default value for the entry type.
func (ck *ChangelogKind[T]) GetCurrent(kind MemberKind) T {
	if ck.invalidKind(kind) {
		return getZero[T]()
	}

	return ck.kindLists[kind].getCurrent()
}

// GetAll retrieves all values for the specified `MemberKind`, from the newest to the oldest.
func (ck *ChangelogKind[T]) GetAll(kind MemberKind) []T {
	if ck.invalidKind(kind) {
		return nil
	}

	return ck.kindLists[kind].getAll()
}

// Count returns the number of entries recorded for the specified `MemberKind`.
func (ck *ChangelogKind[T]) Count(kind MemberKind) int {
	if ck.invalidKind(kind) {
		return 0
	}

	return len(ck.kindLists[kind].entries)
}

// private

// invalidKind checks if the specified `MemberKind` is invalid.
func (ck *ChangelogKind[T]) invalidKind(kind MemberKind) bool {
	return int(kind) >= len(ck.kindLists)
}

@itaysk itaysk changed the title from "New changelog" to "fix: optimize proctree memory consumption" Dec 16, 2024
@geyslan geyslan deleted the new-changelog branch February 19, 2025 20:14
Development

Successfully merging this pull request may close these issues.

Reduce proctree memory footprint
3 participants