-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Feature/process tree data source #3054
Feature/process tree data source #3054
Conversation
03ef9da
to
f5f3f11
Compare
4209377
to
b29ea83
Compare
Thanks @AlonZivony, would like to review this one before merging. |
@@ -169,4 +169,5 @@ require ( | |||
kernel.org/pub/linux/libs/security/libcap/psx v1.2.64 // indirect | |||
) | |||
|
|||
replace github.com/aquasecurity/tracee/types => ./types |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note: add a types PR later
+1 for review |
I guess everyone wants to review this one, including me. =D |
b29ea83
to
b8a20dc
Compare
b8a20dc
to
e4dae80
Compare
pkg/proctree/export.go
Outdated
"github.com/aquasecurity/tracee/types/datasource" | ||
"github.com/aquasecurity/tracee/types/trace" | ||
) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like having a export.go
file. It does not make sense for me. I would keep all methods in the same file as others (its easier for readers to understand what is going on, as you already have the ProcessTree methods also split into events_processing.go
, exec_processing.go
and exit_processing.go
(those do make sense to have).
// setDefaultExecInfo change the execution information assumed for the process before its first | ||
// execution received. | ||
func (p *processNode) setDefaultExecInfo(info procExecInfo) { | ||
p.execInfo.ChangeDefault(info) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Like here, for example. You have the internal to execInfo mutex, protecting the execInfo internal fields, but you still have a Load() operation for the pointer when you de-reference p.execInfo
. So the correct thread-safety operation here would either use the mutex or use an atomic Load for a *Struct{}
(*procExecInfo in this case). Just like an int64 should have an atomic Load()/Store(), etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then we can argue on fields that are changed runtime and the fields that aren't changed runtime. If that is the case (fields that are never changed runtime), we should make it very explicity they don't need to be atomic because they're immutable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, the problem you point here is not relevant because this is not a thread-safe operation.
I think that maybe you are confused because the TimeSeries
object is thread-safe, and I use it. I implemented it that way so it will be safe to use in all places, not specifically in the current process tree nodes which are not thread-safe.
The face I use a thread-safe object in my nodes doesn't mean the nodes are threa-safe.
I will add a documentation for it on the nodes so it will be clear that they are not thread-safe.
// getChildren return all registered children processes of current process | ||
func (p *processNode) getChildren() []*processNode { | ||
return maps.Values(p.childProcesses) | ||
} | ||
|
||
// amountOfChildren return the amount of processes registered as children of current process | ||
func (p *processNode) amountOfChildren() int { | ||
return len(p.childProcesses) | ||
} | ||
|
||
// hasChildren return if the current process has children registered to it | ||
func (p *processNode) hasChildren() bool { | ||
return p.amountOfChildren() != 0 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In here.. the len(p.childProcesses) read the metadata for the map (to get its length). At the same time, you have maps.Values
walking the map to construct a slice (and its not thread-safe). Those 2 aren't thread-safe among them (lets say you call amountOfChildre()
in 2 different threads).
Now the question: "Oh but I won't be calling methods from 2 different threads because its always coming from the control plane". Then there is no need for thread-safety at all. Also, its not up to the caller of the interface to know which methods are thread-safe and which ones are not. The interface/class/struct is either thread-safe or not (all of its methods).
I strongly think all our structs/classes should be thread-safe for all its operations to all its fields/transactions... arguable and I know others disagree.
return err | ||
} | ||
} | ||
newProcess.mutex.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a no for me. The synchronization mechanism has to be controled by the interface/class/struct that contains it. Or else things will get out of control in no time (and the encapsulation characteristic of object orientation vanishes).
pkg/proctree/fork_processing.go
Outdated
newProcess.mutex.Lock() | ||
tree.copyParentBinaryInfo(types.Timestamp(event.Timestamp), newProcess) | ||
|
||
newProcess.setGeneralInfoProtected( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a no for me. The struct/class is either thread-safe or not. User can't chose what method to pick (I could use the same struct else where and use an unprotected method and I would brake your code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the processNode
and threadNode
are both unexported structs.
They wouldn't be used by users outside of the tree scope.
The are not thread-safe.
The tree itself is thread-safe, and implement all of the operations needed to make it possible.
pkg/proctree/fork_processing.go
Outdated
if fatherProcess == nil { | ||
return | ||
} | ||
fatherProcess.mutex.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto on external (to type) synchronization control and encapsulation.
Have gone through 50% of the changes, there are many comments to be discussed, etc... so I stopped and will resume once discussions are done. Cheers! |
d476f1f
to
1b480a3
Compare
Create an object representing a changing value over time. This allows following values that are bound to change, and getting their value for any queried time. An example for changing value is process execution binary, changed in each execve syscall.
Create a type enveloping a map, protecting it from concurrent use. The protection is done using a RWMutex, to maximize the performance of accessing the map. This type has adventage over Sync.Map because it is implemented using generics instead of interface, which reduce reflection overhead.
Introduce a real-time process tree updating using Tracee's events. It exposes both processes and threads information. It is implemented to work with multiple threads, but with only one writing thread and the others readers. Using this process tree, we can reduce data transmission from the kernel and unite existing data storages doing the same logic in the future.
Add the process tree to the Tracee struct. Add the process tree to the pipeline, feeding it with events during the processing stage. This will enable using it in the future for new features.
Create a data source for signatures to use exposing Tracee's process tree. This will allow signatures to efficiently access stateful information and event data that otherwise they couldn't have accessed about the system processes and threads
1b480a3
to
28ee582
Compare
I'm working on this together with Alon and we will have a new PR soon. |
1. Explain what the PR does
This PR first purpose is to add data source for signatures, giving information about processes and threads.
To achieve it, user space process tree is reintroduced.
Fix #2586
2. Explain how to test it
The PR include unit testing for all of the logic of the process tree.
3. Other comments
The current process tree is only fed upon events.
We need to initialize it with the information of the system using procfs.
I don't think it has to be in the same PR, as after short amount of runtime all relevant information will be gathered.
After this will be approved I will do the effort to initialize it.