[cleaner] Skip obfuscation of substrings for hostnames properly #3850
Leftover from sosreport#3403 / sosreport#3496, where match_full_words_only is ignored for hostname mapping because the hostname map has its own get_regex_result method. Relevant: sosreport#3593 Resolves: sosreport#3850 Signed-off-by: Pavel Moravec <[email protected]>
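To illustrate why an overriding get_regex_result can silently drop match_full_words_only, here is a minimal sketch with hypothetical, simplified classes (SimpleMap, SimpleHostnameMap, BypassingHostnameMap and hostname_fragment are illustrative names, not the sos API). The word guard used in the base class is an assumption chosen for the example, not necessarily the exact regex sos uses. A subclass that compiles its own regex never reaches the parent's guard, while one that delegates through super() keeps it.

```python
import re

HOST_SEP = r'(\.|_)'  # a '.' in a hostname may also appear as '_' in file names


class SimpleMap:
    """Hypothetical, simplified stand-in for a cleaner mapping base class."""
    match_full_words_only = False

    def get_regex_result(self, pattern):
        # `pattern` is already a regex fragment. Wrap it in (assumed) word
        # guards only when the mapping asks for full-word matches.
        if self.match_full_words_only:
            pattern = rf'(?<![a-z0-9]){pattern}(?![a-z0-9])'
        return re.compile(pattern, re.I)


def hostname_fragment(item):
    # Escape each label so metacharacters stay literal, then let every '.'
    # match either '.' or '_', like the replace() in the reviewed diff.
    return HOST_SEP.join(re.escape(label) for label in item.split('.'))


class SimpleHostnameMap(SimpleMap):
    """Hypothetical hostname mapping that delegates to the parent class."""
    match_full_words_only = True

    def get_regex_result(self, item):
        # Calling the parent keeps match_full_words_only in effect.
        return super().get_regex_result(hostname_fragment(item))


class BypassingHostnameMap(SimpleHostnameMap):
    """Hypothetical mapping that compiles directly, bypassing the parent."""

    def get_regex_result(self, item):
        # The flag is silently ignored: the word guards are never added.
        return re.compile(hostname_fragment(item), re.I)


fixed = SimpleHostnameMap().get_regex_result('db.example.com')
bypassed = BypassingHostnameMap().get_regex_result('db.example.com')
print(bool(fixed.search('mydb.example.com')))     # False
print(bool(bypassed.search('mydb.example.com')))  # True
print(bool(fixed.search('x_db_example_com_y')))   # True
```

The only difference between the two subclasses is whether the compiled pattern passes through the parent method, which is the kind of change the commit makes.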
Force-pushed from 8dbae3f to 5a4f38a
Review comment on sos/cleaner/mappings/hostname_map.py (outdated):
@@ -87,7 +87,7 @@ def get_regex_result(self, item):
         """
         if '.' in item:
             item = item.replace('.', '(\\.|_)')
-        return re.compile(item, re.I)
+        return super(SoSHostnameMap, self).get_regex_result(item)
If you want to keep returning it that way, perhaps add
# pylint: disable=super-with-arguments
to the end of the line?
Or, better, use this form, and then we don't need to disable the check :)
return super().get_regex_result(item)
Force-pushed from 5a4f38a to 1d5b4ce
Tests fail when obfuscating the hostname.
These logs do contain the hostname, often the FQDN, wrapped in underscores, like:
Question: should the hostname mapper treat an underscore as a word separator or not? See https://github.com/sosreport/sos/blob/main/sos/cleaner/mappings/__init__.py#L102 and the original problem I aim to fix: #3593 (comment). In other words: if a host name is, say,
Exactly all those with
I think that makes sense to me. Is it worth updating the tests to reflect this at all?
I came to the same conclusion - that So:
So I will come up with a better patch.
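The underscore question comes down to how `\b` behaves in Python's `re` module. A small sketch (the pattern and sample strings are illustrative only) shows that `\b` never sees a boundary between a letter and `_`, because `_` counts as a word character, whereas `-` and `.` do create boundaries:

```python
import re

# '\b' sits between a word character ([A-Za-z0-9_]) and a non-word character.
# Because '_' is itself a word character, '\b' finds no boundary around "host"
# in "log_host_1", while '-' and '.' do produce boundaries.
WORD_BOUNDARY = re.compile(r'\bhost\b', re.I)

for text in ('log_host_1', 'log-host-1', 'log.host.1', 'loghost1'):
    print(f'{text!r}: {bool(WORD_BOUNDARY.search(text))}')
# 'log_host_1': False
# 'log-host-1': True
# 'log.host.1': True
# 'loghost1': False
```

So a full-word check built only on `\b` will not find a hostname wrapped in underscores; treating `_` as a separator needs an explicit character test.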
This is tricky :) Let's see what the tests return, but OK, let's try a positive lookbehind. Let's reformulate it as a negative lookbehind :) So, in addition to the patch, we must:
Currently, matching full words checks two lookaheads but does not check what is behind the word. Since Python requires fixed-width patterns for lookbehind, enumerate the forbidden characters behind the word explicitly. Relevant: sosreport#3593 Closes: sosreport#3850 Signed-off-by: Pavel Moravec <[email protected]>
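The fixed-width constraint and the reason for a negative lookbehind can be checked directly. In the sketch below, full_word_pattern and variable_width_lookbehind_rejected are hypothetical helpers, and treating letters and digits as the forbidden neighbours is an assumption for illustration, not the exact character list in the commit. A lookbehind mixing zero-width `\b` with one-character `_` or `-` is refused by `re`; a single-character negative class is accepted and, unlike a positive lookbehind, also matches at the start of the string.

```python
import re


def variable_width_lookbehind_rejected():
    """Return True if re refuses a lookbehind whose alternatives differ in
    width (a zero-width word boundary vs. one-character '_' or '-')."""
    try:
        re.compile(r'(?<=\b|_|-)host')
    except re.error:
        return True
    return False


def full_word_pattern(item):
    """Hypothetical helper: match `item` only when no letter or digit is
    glued to either side of it (assumed set of forbidden characters)."""
    # One-character negative lookbehind listing the forbidden preceding chars.
    # It is fixed-width, and it also succeeds at the start of the string,
    # where a positive lookbehind such as (?<=[^a-z0-9]) would fail.
    return re.compile(rf'(?<![a-z0-9]){re.escape(item)}(?![a-z0-9])', re.I)


print(variable_width_lookbehind_rejected())  # True

pattern = full_word_pattern('myhost.example.com')
for text in ('myhost.example.com', 'sos_myhost.example.com_1',
             'notmyhost.example.com', 'myhost.example.community'):
    print(f'{text!r}: {bool(pattern.search(text))}')
# 'myhost.example.com': True
# 'sos_myhost.example.com_1': True
# 'notmyhost.example.com': False
# 'myhost.example.community': False
```

Using a negative class rather than a positive `[^a-z0-9]` is what lets a hostname at the very beginning of a line still be found.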
Force-pushed from 520de8d to f3d7c3e
FYI, I made a test script similar to #3766 (comment):
The script tests whether a given word (
I've put this through regex101 too (always my fallback these days ;)), and it works as expected.
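A table-driven check in the spirit of such a test script might look like the sketch below. It is not the script referenced in the thread; full_word_pattern, CASES and mismatches are hypothetical names, and the pattern repeats the assumed "no letter or digit on either side" rule.

```python
import re


def full_word_pattern(word):
    # Hypothetical full-word pattern: no letter or digit directly before or
    # after `word`; '_', '-', '.' and string edges count as separators.
    return re.compile(rf'(?<![a-z0-9]){re.escape(word)}(?![a-z0-9])', re.I)


# (text, whether `word` is expected to be found in it)
CASES = [
    ('myhost', True),
    ('_myhost_', True),
    ('-myhost-', True),
    ('MyHost.log', True),
    ('notmyhost', False),
    ('myhosts', False),
    ('my_host', False),
]


def mismatches(word, cases):
    """Return every case where the pattern disagrees with the expectation."""
    pattern = full_word_pattern(word)
    return [(text, expected) for text, expected in cases
            if bool(pattern.search(text)) != expected]


print(mismatches('myhost', CASES) or 'all cases behave as expected')
```

Keeping the expectations in a list makes it cheap to add the underscore-wrapped FQDN cases that tripped the tests.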
Force-pushed from f3d7c3e to 3cf79d0
Now that I've fixed the stupid typo, it should pass the tests :)
The failing test is a false alarm:
and the test finds the hostname correctly; the next round of the PR will fix the test :)
Related: sosreport#3850 Signed-off-by: Pavel Moravec <[email protected]>
Since I think I am touching a fragile, core part of the cleaner (how exactly we search for matches), and the PR fixes three bugs altogether (each commit fixes one), I would prefer another ACK from either @jcastill or @TurboTurtle before merging.
I'll give this a run-through this Saturday.
@pmoravec I ran some very basic tests and the commits seem to work pretty well. I'll see if I have time this weekend to run something more complex.
LGTM
Leftover from #3403 / #3496, where match_full_words_only is ignored for hostname mapping because the hostname map has its own get_regex_result method.
Resolves: #3593