Introduce a limit of collected items #2051

jan-cerny · 2023-11-16T16:14:30Z

This PR introduces possibility to set a limit of items collected by probes.
If a single OVAL object matches more than given limit, we will stop collecting these items and mark the object as incomplete.

Users will set the limit by defining the OSCAP_PROBE_MAX_COLLECTED_ITEMS environment variable.
By default, the limit isn't set which means the probes will collect all the items matching the evaluated OVAL object.

Setting this limit can be useful in some situations in which the produced results are too large and cause memory problems or inability to generate the HTML report.

Related to: https://issues.redhat.com/browse/RHEL-11925, https://issues.redhat.com/browse/RHEL-4141

This change isn't a silver solution of these tickets. It's rather one of the measures that could improve our situation in combination with other measures both on the scanner and on the content side.

This commit extracts the code that marks the item produced by probe as incomplete to a static function _mark_collected_object_as_incomplete so that it can be reused in other code later

If a single OVAL object matches more than 10000 items, we will stop collecting these items and mark the object as incomplete. This should prevent some situations in which the produced results are too large and cause memory problems or inability to generate the HTML report.

This commit adds a simple test that covers a situation when the probe collects more items than the limit for a single OVAL object.

The `probe_item_collect` function can fail and it can return various return codes. But, the return codes aren't handled in the caller function `process_file`. That means the `process_file` continues to process items even if no other items can be collected due to memory constraints or other problems. With this commit, we will read the return code of `probe_item_collect` and react accordingly.

evgenyz · 2023-11-17T00:24:13Z

This won't fix the original problem (XSLT will still have problems if OVAL author concatenates multiple results), and on top of that it will introduce failing tests where HTML is not rendered at all (OCP4 Compliance Operator).

In my opinion a correct fix would modify XSLT template or, at least, limit amount of rendered objects.

evgenyz · 2023-11-17T00:34:23Z

Even if this functionality could be useful for someone in some cases, I'm strongly against putting it as a default behavior: OVAL does not define any kind of limit, which makes this an extremely unexpected feature without a control knob.

marcusburghardt · 2023-11-17T08:11:59Z

src/OVAL/probes/probe/icache.c


 #include "probe.h"
 #include "icache.h"
 #include "_sexp-ID.h"

+#define PROBE_ITEM_COLLECT_MAX 1000


Here was used 1.000 but in the commit description is mentioned 10.000. Also, do you have information about how much memory one item consumes? I would like to understand how this proposed limit was calculated.

marcusburghardt · 2023-11-17T08:20:33Z

Even if this functionality could be useful for someone in some cases, I'm strongly against putting it as a default behavior: OVAL does not define any kind of limit, which makes this an extremely unexpected feature without a control knob.

I see value in this feature since there are valid cases it can be used. But I also agree that introducing it as a default behavior might not be ideal. Can we introduce this feature as a command option, with no limit by default? I believe it would be easier for users to use this option when necessary.

jan-cerny · 2023-11-20T10:23:41Z

This won't fix the original problem (XSLT will still have problems if OVAL author concatenates multiple results), and on top of that it will introduce failing tests where HTML is not rendered at all (OCP4 Compliance Operator).

My understanding of the original problem in https://issues.redhat.com/browse/RHEL-11925 is that the ARF tree produced by oscap which is the input of the XSLT template is so huge that libxslt has problems processing it.

I think that introducing a limit for the collected items will directly reduce the size of the produced ARF, so it will lower the chance that libxslt won't be able to process the tree.

You're right that it won't exactly fix the problem, It also doesn't aim to fix it by this PR, but the aim is to mitigate it or make it happening less often. We're running multiple efforts that could improve the situation when applied together, eg. #2052, changes of well-known rules in ComplianceAsCode/content.

I agree with you that XSLT will still have problems if OVAL definition uses multiple objects that lead to a large item collection. In that situation, I think having the limit still could help mitigate the issue. For example, if a definition contains 3 objects and each of them corresponds to 100000 items, then we would have 300000 items if no limit is in place but only 3000 items if there is a limit 1000 per object. Another aspect is the evaluated XCCDF profile and the amount of XCCDF rules in the profile and the nature of their individual OVAL checks. If we have a "bad chance" and we have many rules with these OVAL objects, the output will be reduced but it won't be sufficient. A different solution would be a global limit, ie. limit that doesn't apply per object but on the whole evaluation run.

Regarding the introduction of failing tests, I believe it depends on how specific OVAL definition is constructed. Can you elaborate on what you have in mind?

In my opinion a correct fix would modify XSLT template or, at least, limit amount of rendered objects.

The amount of rendered items in HTML report is limited to 100 items per OVAL object. See

openscap/xsl/xccdf-report-oval-details.xsl

Lines 105 to 119 in c27e108

    
               <!-- table body (possibly item-type-specific) --> 
        
               <!-- limited to 100 lines --> 
        
               <tbody> 
        
                   <xsl:for-each select='$items'> 
        
                       <xsl:if test="not(position() > 100)"> 
        
                           <xsl:for-each select='key("oval-items", @item_id)'> 
        
                               <xsl:apply-templates select='.' mode='item-body'/> 
        
                           </xsl:for-each> 
        
                       </xsl:if> 
        
                   </xsl:for-each> 
        
               </tbody> 
        
           </table> 
        
           <xsl:if test="count($items) > 100"> 
        
               ... and <xsl:value-of select="count($items)-100"/> more items. 
        
           </xsl:if>

Even if this functionality could be useful for someone in some cases, I'm strongly against putting it as a default behavior: OVAL does not define any kind of limit, which makes this an extremely unexpected feature without a control knob.

I really like the idea to make the limit configurable in run time, eg. by command line option.

evgenyz · 2023-11-20T11:30:39Z

Regarding the introduction of failing tests, I believe it depends on how specific OVAL definition is constructed. Can you elaborate on what you have in mind?

OCP Compliance Operator (for example) does not render HTML and does not care about XSLT problems. If this feature will be enabled by default OCP evaluation results could become incomplete instead of passing.

In my opinion a correct fix would modify XSLT template or, at least, limit amount of rendered objects.

What I mean here is to try and throw out elements from the tree before trying to render them with XSLT instead of not collecting them at all. With this approach the ARF and XCCDF results would stay correct.

Even if this functionality could be useful for someone in some cases, I'm strongly against putting it as a default behavior: OVAL does not define any kind of limit, which makes this an extremely unexpected feature without a control knob.

I really like the idea to make the limit configurable in run time, eg. by command line option.

Could it be better to actually use OSCAP_PROBE_MAX_COLLECTED_OBJECTS env. variable (or smth like that)? This is the way we usually influence the behaviour of OVAL probes.

Users can limit the amount of items collected by setting the `OSCAP_PROBE_MAX_COLLECTED_ITEMS` environment variable. By default, the amount of items is unlimited.

evgenyz · 2023-11-22T02:31:26Z

@jan-cerny We do have quite a lot of env. variables that could affect oscap behavior. What do you think about having a special section somewhere in the main module, that would print all of them on some debug level? It might be useful to know all the input just from the log, without asking extra questions. The same, actually, goes for the command line arguments.

jan-cerny · 2023-11-22T10:03:48Z

@evgenyz Yes, great idea! I believe it would be useful especially if the log output is attached to bug reports. Can you create a ticket for that?

jan-cerny · 2023-11-22T10:23:55Z

I think the fail on Rawhide is caused by removal of xmlParseMemory from libxml2, I have found this thread: https://lists.fedoraproject.org/archives/list/[email protected]/thread/AI7QXMRXYQ3KAZTLIGICEX6YHFTM5RBR/

jan-cerny · 2023-11-23T16:18:57Z

What I mean here is to try and throw out elements from the tree before trying to render them with XSLT instead of not collecting them at all. With this approach the ARF and XCCDF results would stay correct.

I think removing elements from ARF before passing the ARF to XSLT would be in practice implemented by XSLT or a combination of XPath and DOM operations. I can imagine that this approach could lead a similar problem with memory issues.

Currently, OpenSCAP internally creates an ARF XML tree, applies XSLT on that to create HTML report, saves the HTML to a file and then saves the ARF file.

I think we can easily limit the amount of the items in the ARF during the serialization into OVAL results DOM. I look into that and it could be done by modifying oval_syschar_model_to_dom and oval_syschar_to_dom. See /src/OVAL/oval_sysModel.c around line 360. Due to the way how it works now it would cause a problem that also ARF file would be incomplete, which you probably don't want. But we can avoid it by creating a dedicated ARF tree just for the purpose of HTML creation, ie. first a full ARF tree would be created and saved to file and then another tree with reduced items would be created passed to XSLT to create a HTML output.

jan-cerny · 2023-11-24T10:00:38Z

I have added a option to configure this using an environment variable OSCAP_PROBE_MAX_COLLECTED_ITEMS.

I also have extracted the proposal by @evgenyz from here to #2058.

jan-cerny · 2023-11-28T14:56:35Z

The CI fail on Rawhide is tracked in #2059 and isn't caused by the contents of PR.

evgenyz

The implementation is sound from my PoV. Just a couple of notes regarding style and names.

evgenyz · 2023-11-30T01:37:19Z

src/OVAL/probes/probe/icache.c

@@ -528,6 +529,31 @@ static int probe_cobj_memcheck(size_t item_cnt, double max_ratio)
 	return (0);
 }

+static int _mark_collected_object_as_incomplete(struct probe_ctx *ctx, const char *message)
+{
+


Redundant \n

evgenyz · 2023-11-30T01:43:36Z

tests/CMakeLists.txt

@@ -30,6 +30,7 @@ add_subdirectory("curl")
 add_subdirectory("CPE")
 add_subdirectory("DS")
 add_subdirectory("mitre")
+add_subdirectory("memory")


The behavior alteration we are testing here is not exactly connected to memory. I'd rather name it like probe_behavior or oval_behavior or something more generic to make the folder a collection of tests related to these env. vars.

evgenyz · 2023-11-30T01:57:56Z

@evgenyz Yes, great idea! I believe it would be useful especially if the log output is attached to bug reports. Can you create a ticket for that?

#2063

The behavior alteration we are testing here is not exactly connected to memory. I'd rather name it like probe_behavior to make the folder a collection of tests related to these env. vars.

jan-cerny · 2023-12-04T08:24:32Z

I have removed empty line and renamed "memory" directory to "probe_behavior".

jan-cerny added 4 commits November 13, 2023 15:43

Refactor: Extract function

d1045b2

This commit extracts the code that marks the item produced by probe as incomplete to a static function _mark_collected_object_as_incomplete so that it can be reused in other code later

Test collected items limit

5e3d8bc

This commit adds a simple test that covers a situation when the probe collects more items than the limit for a single OVAL object.

marcusburghardt reviewed Nov 17, 2023

View reviewed changes

Make probe items limit configurable

f2461c7

Users can limit the amount of items collected by setting the `OSCAP_PROBE_MAX_COLLECTED_ITEMS` environment variable. By default, the amount of items is unlimited.

jan-cerny marked this pull request as draft November 22, 2023 10:33

jan-cerny added this to the 1.3.10 milestone Nov 22, 2023

Add comment for visual consistency

fb97a51

jan-cerny mentioned this pull request Nov 24, 2023

Shift the limit of OVAL items in HTML report #2058

Open

jan-cerny marked this pull request as ready for review November 24, 2023 10:00

evgenyz approved these changes Nov 30, 2023

View reviewed changes

marcusburghardt mentioned this pull request Nov 30, 2023

Improve Performance on rules probing the whole file system ComplianceAsCode/content#11319

Merged

10 tasks

jan-cerny added 2 commits December 4, 2023 09:12

Remove empty line

dd1977c

Rename "memory" directory to "probe_behavior"

6b29c04

The behavior alteration we are testing here is not exactly connected to memory. I'd rather name it like probe_behavior to make the folder a collection of tests related to these env. vars.

jan-cerny assigned evgenyz Dec 5, 2023

jan-cerny requested a review from evgenyz December 5, 2023 08:49

evgenyz approved these changes Dec 5, 2023

View reviewed changes

evgenyz merged commit 826eeb7 into OpenSCAP:maint-1.3 Dec 5, 2023
18 of 20 checks passed

evgenyz mentioned this pull request Jan 11, 2024

Skip certain paths from scanning #2052

Merged

evgenyz mentioned this pull request Feb 1, 2024

HTML Report Creation Fails With XPath error: growing nodeset hit limit #2082

Closed

evgenyz mentioned this pull request Apr 2, 2024

oscap #2099

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introduce a limit of collected items #2051

Introduce a limit of collected items #2051

jan-cerny commented Nov 16, 2023 •

edited

Loading

evgenyz commented Nov 17, 2023

evgenyz commented Nov 17, 2023

marcusburghardt Nov 17, 2023

marcusburghardt commented Nov 17, 2023

jan-cerny commented Nov 20, 2023

evgenyz commented Nov 20, 2023 •

edited

Loading

evgenyz commented Nov 22, 2023

jan-cerny commented Nov 22, 2023

jan-cerny commented Nov 22, 2023

jan-cerny commented Nov 23, 2023

jan-cerny commented Nov 24, 2023

jan-cerny commented Nov 28, 2023

evgenyz left a comment

evgenyz Nov 30, 2023

evgenyz Nov 30, 2023

evgenyz commented Nov 30, 2023

jan-cerny commented Dec 4, 2023

Introduce a limit of collected items #2051

Introduce a limit of collected items #2051

Conversation

jan-cerny commented Nov 16, 2023 • edited Loading

evgenyz commented Nov 17, 2023

evgenyz commented Nov 17, 2023

marcusburghardt Nov 17, 2023

Choose a reason for hiding this comment

marcusburghardt commented Nov 17, 2023

jan-cerny commented Nov 20, 2023

evgenyz commented Nov 20, 2023 • edited Loading

evgenyz commented Nov 22, 2023

jan-cerny commented Nov 22, 2023

jan-cerny commented Nov 22, 2023

jan-cerny commented Nov 23, 2023

jan-cerny commented Nov 24, 2023

jan-cerny commented Nov 28, 2023

evgenyz left a comment

Choose a reason for hiding this comment

evgenyz Nov 30, 2023

Choose a reason for hiding this comment

evgenyz Nov 30, 2023

Choose a reason for hiding this comment

evgenyz commented Nov 30, 2023

jan-cerny commented Dec 4, 2023

jan-cerny commented Nov 16, 2023 •

edited

Loading

evgenyz commented Nov 20, 2023 •

edited

Loading