-
Notifications
You must be signed in to change notification settings - Fork 378
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce a limit of collected items #2051
Conversation
This commit extracts the code that marks the item produced by probe as incomplete to a static function _mark_collected_object_as_incomplete so that it can be reused in other code later
If a single OVAL object matches more than 10000 items, we will stop collecting these items and mark the object as incomplete. This should prevent some situations in which the produced results are too large and cause memory problems or inability to generate the HTML report.
This commit adds a simple test that covers a situation when the probe collects more items than the limit for a single OVAL object.
The `probe_item_collect` function can fail and it can return various return codes. But, the return codes aren't handled in the caller function `process_file`. That means the `process_file` continues to process items even if no other items can be collected due to memory constraints or other problems. With this commit, we will read the return code of `probe_item_collect` and react accordingly.
This won't fix the original problem (XSLT will still have problems if OVAL author concatenates multiple results), and on top of that it will introduce failing tests where HTML is not rendered at all (OCP4 Compliance Operator). In my opinion a correct fix would modify XSLT template or, at least, limit amount of rendered objects. |
Even if this functionality could be useful for someone in some cases, I'm strongly against putting it as a default behavior: OVAL does not define any kind of limit, which makes this an extremely unexpected feature without a control knob. |
src/OVAL/probes/probe/icache.c
Outdated
|
||
#include "probe.h" | ||
#include "icache.h" | ||
#include "_sexp-ID.h" | ||
|
||
#define PROBE_ITEM_COLLECT_MAX 1000 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here was used 1.000
but in the commit description is mentioned 10.000
. Also, do you have information about how much memory one item consumes? I would like to understand how this proposed limit was calculated.
I see value in this feature since there are valid cases it can be used. But I also agree that introducing it as a default behavior might not be ideal. Can we introduce this feature as a command option, with no limit by default? I believe it would be easier for users to use this option when necessary. |
My understanding of the original problem in https://issues.redhat.com/browse/RHEL-11925 is that the ARF tree produced by oscap which is the input of the XSLT template is so huge that libxslt has problems processing it. I think that introducing a limit for the collected items will directly reduce the size of the produced ARF, so it will lower the chance that libxslt won't be able to process the tree. You're right that it won't exactly fix the problem, It also doesn't aim to fix it by this PR, but the aim is to mitigate it or make it happening less often. We're running multiple efforts that could improve the situation when applied together, eg. #2052, changes of well-known rules in ComplianceAsCode/content. I agree with you that XSLT will still have problems if OVAL definition uses multiple objects that lead to a large item collection. In that situation, I think having the limit still could help mitigate the issue. For example, if a definition contains 3 objects and each of them corresponds to 100000 items, then we would have 300000 items if no limit is in place but only 3000 items if there is a limit 1000 per object. Another aspect is the evaluated XCCDF profile and the amount of XCCDF rules in the profile and the nature of their individual OVAL checks. If we have a "bad chance" and we have many rules with these OVAL objects, the output will be reduced but it won't be sufficient. A different solution would be a global limit, ie. limit that doesn't apply per object but on the whole evaluation run. Regarding the introduction of failing tests, I believe it depends on how specific OVAL definition is constructed. Can you elaborate on what you have in mind?
The amount of rendered items in HTML report is limited to 100 items per OVAL object. See openscap/xsl/xccdf-report-oval-details.xsl Lines 105 to 119 in c27e108
I really like the idea to make the limit configurable in run time, eg. by command line option. |
OCP Compliance Operator (for example) does not render HTML and does not care about XSLT problems. If this feature will be enabled by default OCP evaluation results could become incomplete instead of passing.
What I mean here is to try and throw out elements from the tree before trying to render them with XSLT instead of not collecting them at all. With this approach the ARF and XCCDF results would stay correct.
Could it be better to actually use |
Users can limit the amount of items collected by setting the `OSCAP_PROBE_MAX_COLLECTED_ITEMS` environment variable. By default, the amount of items is unlimited.
@jan-cerny We do have quite a lot of env. variables that could affect oscap behavior. What do you think about having a special section somewhere in the main module, that would print all of them on some debug level? It might be useful to know all the input just from the log, without asking extra questions. The same, actually, goes for the command line arguments. |
@evgenyz Yes, great idea! I believe it would be useful especially if the log output is attached to bug reports. Can you create a ticket for that? |
I think the fail on Rawhide is caused by removal of xmlParseMemory from libxml2, I have found this thread: https://lists.fedoraproject.org/archives/list/[email protected]/thread/AI7QXMRXYQ3KAZTLIGICEX6YHFTM5RBR/ |
I think removing elements from ARF before passing the ARF to XSLT would be in practice implemented by XSLT or a combination of XPath and DOM operations. I can imagine that this approach could lead a similar problem with memory issues. Currently, OpenSCAP internally creates an ARF XML tree, applies XSLT on that to create HTML report, saves the HTML to a file and then saves the ARF file. I think we can easily limit the amount of the items in the ARF during the serialization into OVAL results DOM. I look into that and it could be done by modifying |
The CI fail on Rawhide is tracked in #2059 and isn't caused by the contents of PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation is sound from my PoV. Just a couple of notes regarding style and names.
src/OVAL/probes/probe/icache.c
Outdated
@@ -528,6 +529,31 @@ static int probe_cobj_memcheck(size_t item_cnt, double max_ratio) | |||
return (0); | |||
} | |||
|
|||
static int _mark_collected_object_as_incomplete(struct probe_ctx *ctx, const char *message) | |||
{ | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Redundant \n
tests/CMakeLists.txt
Outdated
@@ -30,6 +30,7 @@ add_subdirectory("curl") | |||
add_subdirectory("CPE") | |||
add_subdirectory("DS") | |||
add_subdirectory("mitre") | |||
add_subdirectory("memory") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior alteration we are testing here is not exactly connected to memory. I'd rather name it like probe_behavior
or oval_behavior
or something more generic to make the folder a collection of tests related to these env. vars.
The behavior alteration we are testing here is not exactly connected to memory. I'd rather name it like probe_behavior to make the folder a collection of tests related to these env. vars.
I have removed empty line and renamed "memory" directory to "probe_behavior". |
This PR introduces possibility to set a limit of items collected by probes.
If a single OVAL object matches more than given limit, we will stop collecting these items and mark the object as incomplete.
Users will set the limit by defining the
OSCAP_PROBE_MAX_COLLECTED_ITEMS
environment variable.By default, the limit isn't set which means the probes will collect all the items matching the evaluated OVAL object.
Setting this limit can be useful in some situations in which the produced results are too large and cause memory problems or inability to generate the HTML report.
Related to: https://issues.redhat.com/browse/RHEL-11925, https://issues.redhat.com/browse/RHEL-4141
This change isn't a silver solution of these tickets. It's rather one of the measures that could improve our situation in combination with other measures both on the scanner and on the content side.