Too many tailed files collected #3783
Comments
I'm not sure these files are needed at all, but instead of dropping them we could add an option to collect them if needed, in case anyone relies on them for any scripts. "Interpreted/decoded" ones in plain text are more useful.
Agreed, maybe two/three days should be enough by default, or even just one day.
Agreed.
These columns are either empty, containing passwords or some encoded data. Get the *remaining* column names and query for them. If the query for column names fails, fail over to the current "SELECT *". Relevant: sosreport#3783 Resolves: sosreport#3784 Signed-off-by: Pavel Moravec <[email protected]>
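For reference, a rough sketch of that approach, assuming the plugin builds its queries as `psql` command strings; the table name, the skipped columns, and the helper below are illustrative placeholders, not the actual plugin code:

```python
# Hedged sketch: query the column names first, drop the assumed
# empty/encoded/secret ones, then build an explicit column list; fall back
# to "SELECT *" if the lookup fails. Names here are placeholders.
import subprocess

SKIP_COLUMNS = {"args", "output", "error"}   # assumed sensitive/useless columns


def build_task_query(dbname="pulpcore", table="core_task"):
    cols_sql = (
        "SELECT column_name FROM information_schema.columns "
        f"WHERE table_name = '{table}';"
    )
    try:
        out = subprocess.run(
            ["psql", "-d", dbname, "-tAc", cols_sql],
            check=True, capture_output=True, text=True,
        ).stdout
        columns = [c for c in out.split() if c not in SKIP_COLUMNS]
        select = ", ".join(columns) if columns else "*"
    except (OSError, subprocess.CalledProcessError):
        # Column lookup failed -> keep the original behaviour.
        select = "*"
    return f"SELECT {select} FROM {table};"


if __name__ == "__main__":
    print(build_task_query())
```

The fallback keeps today's behaviour whenever the column lookup fails, so the worst case is the current `SELECT *` output.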
This happens for Satellite / foreman, where we already increased the sizelimit to 100MB via a preset, and I confirm it is applied to these files. Raising it higher is possible, but not worth much: usually the tailed-off content is from previous days only, so what remains is sufficient.
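For context, a minimal sketch of how a plugin-side `sizelimit` bump looks (the path glob, class name, and the 100 value are illustrative; the real Satellite tuning is applied through a preset rather than hard-coded like this):

```python
# Minimal sketch of a sos plugin raising the per-file size limit.
# The glob and the 100 MB value are illustrative only.
from sos.report.plugins import Plugin, RedHatPlugin


class ExamplePostgresLogs(Plugin, RedHatPlugin):
    short_desc = 'Example: collect postgres logs with a larger sizelimit'
    plugin_name = 'example_postgres_logs'

    def setup(self):
        # sizelimit is in MB; files larger than this get tail-collected.
        self.add_copy_spec(
            '/var/lib/pgsql/data/log/postgresql-*.log',
            sizelimit=100,
        )
```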
From my perspective as a node team member, crio and kubelet logs are the most important pieces for us to debug issues. We don't need them if they're caught in the overall journal, though. Is bumping the size limit an option for those? Or we bump the size limit for the overall journal and drop the crio/kubelet-specific journals. What do folks think?
I'd prefer increasing the size limit of unit-specific journals and/or log files over increasing the system journal collection. It gives us granularity without enforcing potentially very large system journal collections across the board. Granted, I get the point of "well, it's going to be the majority of the system journal anyway...", but I think this is the least-bad option overall. As far as the sar/sa files go, I'd defer to support teams on how often they're used. I know there's been a general shift away from sar, but there's a lot of knowledge built around the use of these files, at least the plaintext translations. I'd be open to dropping the binary collections, since you need the same version that generated them to translate them (hence why we do that during collection at all), but I'd be wary of dropping them entirely.
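A sketch of what a unit-specific bump could look like; note the `sizelimit` keyword on `add_journal()` is an assumption here, mirroring `add_copy_spec()`, and should be checked against the sos version in use:

```python
# Hedged sketch: cap the crio unit journal at its own, larger limit instead
# of growing the whole system journal collection.
# Assumption: add_journal() accepts a sizelimit (MB) keyword like
# add_copy_spec(); if a given sos version does not, the global --log-size
# option is the fallback knob.
from sos.report.plugins import Plugin, RedHatPlugin


class ExampleCrioJournal(Plugin, RedHatPlugin):
    short_desc = 'Example: unit-specific journal size limit'
    plugin_name = 'example_crio_journal'

    def setup(self):
        self.add_journal(units='crio', sizelimit=100)
```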
The plain text ones are used a lot. They are not the most accurate output you could get, but as a first step when looking into performance issues they are good enough.
Pacemaker: It's been a couple of years since I've worked in support, so I would defer to any support engineers. Whether the limit is sufficient will always depend on how promptly the user opens a support ticket after an issue occurs, and on whether additional verbosity has been configured (it usually hasn't been). We could increase the size limit to some arbitrary higher number. I don't know what fraction of sosreports currently have truncated Pacemaker log files, or whether this would be worth doing. Support engineers should not hesitate to request the full `pacemaker.log`.
Hello all, +1 to remove the sa*.xml files. They are redundant; the binary saXX files are also included and they contain the full day dump. Sometimes they are also truncated, but that is not usual; it can happen if the interval is too short. I'd also like to suggest increasing the size limit for the foreman plugin. These CSV files are sometimes truncated, which leads to missing important dynflow steps. Note that the plugin already limits the output to the last 14 days, which should be enough for any support case. That said, although I fully agree a limit is mandatory, in this specific plugin the file size limit is somewhat "redundant". IMO increasing it to 150/200M could be a good choice to let the 14-day window limit the output in as many cases as possible.
SAR data: I would drop the xml as rarely-if-at-all used (I am asking internally, either way), while I would keep the binary data (the "source of truth" that we can copy to another system with the same sysstat version).

Increasing the 100M limit of foreman's dynflow* tables: no strong opinion. Can you @pafernanr evaluate the impact? I.e. generate enough foreman tasks to have 200M of data in each such table, and compare execution time and tarball size for sizelimits of 100MB, 150MB and 200MB? On one side, we would get some more history of tasks. On the other side, the data are already ordered by time so the most recent is always present, and I am torn whether it is worth paying the extra cost in longer run time and tarball size to get that info. This sizelimit affected my own investigation of foreman/Satellite support cases only rarely, hence my reluctance. But if others hit it more often, no objections.
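A hedged sketch of that measurement, assuming the limit under test can be driven via the global `--log-size` option and that the test box already has more dynflow data than the largest limit:

```python
# Hedged sketch of the requested comparison: run sos with different size
# limits and record wall-clock time and resulting tarball size.
# Assumptions: `sos report` with --batch, -o, --log-size and --tmp-dir
# behaves as on current sos 4.x, and the box has >200M of dynflow data.
import glob
import os
import subprocess
import time

for limit_mb in (100, 150, 200):
    tmpdir = f"/var/tmp/sos-limit-{limit_mb}"
    os.makedirs(tmpdir, exist_ok=True)
    start = time.monotonic()
    subprocess.run(
        ["sos", "report", "--batch", "-o", "foreman",
         "--log-size", str(limit_mb), "--tmp-dir", tmpdir],
        check=True,
    )
    elapsed = time.monotonic() - start
    tarballs = glob.glob(os.path.join(tmpdir, "sosreport-*.tar.xz"))
    size_mb = sum(os.path.getsize(t) for t in tarballs) / 2**20
    print(f"log-size {limit_mb} MB: {elapsed:.0f}s, tarball {size_mb:.1f} MB")
```

Comparing the three printed lines should show whether the extra task history justifies the longer run time and larger tarball.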
SAR: Feedback from two groups of support engineers in Red Hat: "we don't use XML format, but we heavily use binary
We noticed a high occurrence of tailing some specific files in different sosreports. Below is a list of the most often tailed files and my suggestion for each. Any comment / suggestion is welcome. Possible options are "leave as is", "increase sizelimit", or "drop that file or some data to truncate it".

- `postgresql/var.lib.pgsql.data.log.postgresql-*.log` - this is most probably from Satellite / foreman systems with bigger postgres queries logged. Probably worth increasing the sizelimit; I will raise a PR for it.
- `var/log/*` files, namely `messages*` or `audit.log` or `secure` - probably let it be; maybe audit or secure logs should be collected for the past X days instead of up to a given filesize..? (a rough sketch of a day-based collection follows below)
- `pacemaker/var.log.pacemaker.pacemaker.log` - any suggestion from `pacemaker` plugin authors @TurboTurtle, @nrwahl2?
- `pulpcore/core_task` - we collect all details about the tasks. Since many of the details are encrypted now, to prevent password leaks, a lot of the data are useless and I should improve the query. TODO point on me.
- `crio/journalctl_--no-pager_--unit_crio` - any suggestion from `crio` plugin authors @TurboTurtle, @vteratipally, @haircommander?
- `openshift/journalctl_--no-pager_--unit_kubelet` - any suggestion from `openshift` plugin authors @TurboTurtle, @vwalek?
- `logs/journalctl_--no-pager` - that is expected and reasonable, no action.