💡 [Feature] Log download stats from THREDDS server #444

Open
huard opened this issue Apr 5, 2024 · 9 comments
Labels
enhancement New feature or request


@huard
Collaborator

huard commented Apr 5, 2024

Description

It would be useful for reporting purposes to monitor data downloads from THREDDS:

  • total download volume (GB/day);
  • per-file download volume (GB/day);
  • per-dataset OPeNDAP streaming volume (GB/day).

References

This information can be parsed from NGINX logs, but those logs need to be exposed to Prometheus to be aggregated and archived within the current architecture.
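A minimal sketch of that parsing step in Python, assuming the proxy writes NGINX's default `combined` log format (the actual Birdhouse proxy log format may differ): each access line is reduced to a date, a request path and a `$body_bytes_sent` value, then summed into the daily totals listed above. `parse_line` and `daily_volumes` are hypothetical names, not part of any existing component.

```python
import re
from collections import defaultdict
from datetime import datetime

# Assumed NGINX "combined" format:
# $remote_addr - $remote_user [$time_local] "$request" $status $body_bytes_sent "$http_referer" "$http_user_agent"
COMBINED = re.compile(
    r'(?P<addr>\S+) - (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Return (date, path, bytes_sent) for a 2xx request, or None if the line is skipped."""
    m = COMBINED.match(line)
    if m is None or not m["status"].startswith("2"):
        return None
    sent = 0 if m["bytes"] == "-" else int(m["bytes"])
    day = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z").date()
    # Drop the query string (e.g. OPeNDAP constraint expressions) so volumes group by path.
    path = m["path"].split("?", 1)[0]
    return day, path, sent


def _to_gb(byte_counts):
    """Convert byte totals to GB, using 1 GB = 1e9 bytes."""
    return {key: count / 1e9 for key, count in byte_counts.items()}


def daily_volumes(lines):
    """Aggregate GB/day in total and per (day, path)."""
    total = defaultdict(int)
    per_file = defaultdict(int)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        day, path, sent = parsed
        total[day] += sent
        per_file[(day, path)] += sent
    return _to_gb(total), _to_gb(per_file)
```

Only 2xx responses are counted here, so failed requests do not inflate the volumes; whether partial (206) responses should count is a reporting choice.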


huard added the enhancement label on Apr 5, 2024
@fmigneault
Collaborator

Consider downloads from WPS outputs and STAC data proxy endpoints as well for the same reasons.

@huard
Collaborator Author

huard commented Apr 26, 2024

ESGF uses Beats and Logstash to collect logs and compute their stats. See https://drive.google.com/drive/folders/1LbvoYeQ_6L_bzTsO-EEhwqjIx1jZ-G1k

@fmigneault
Copy link
Collaborator

If the "node collector" can be located on the same instance, Logstash seems like an interesting candidate. If there is no distinction between Beats and Logstash as "log producers", I would favor the second architecture to limit the number of configurations/technologies involved.

mishaschwartz added a commit that referenced this issue May 14, 2024
## Overview

This version of canarie-api permits running the proxy (nginx) container
independently of the canarie-api application. This makes it easier to
monitor the logs of canarie-api and proxy containers simultaneously and
allows for the configuration files for canarie-api to be mapped to the
canarie-api containers where appropriate.

## Changes

**Non-breaking changes**
- New component version canarie-api:1.0.0

**Breaking changes**

## Related Issue / Discussion

- Resolves [issue id](url)

## Additional Information

Links to other issues or sources.

- This might make parsing the nginx logs slightly easier as well which
could help with #12 and #444

## CI Operations

<!--
The test suite can be run using a different DACCS config with
``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use
``birdhouse_skip_ci`` set to ``true`` in the PR description.
Note that using ``[skip ci]``, ``[ci skip]`` or ``[no ci]`` in the
commit message will override ``birdhouse_skip_ci`` from the PR
description.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false
@huard
Collaborator Author

huard commented Oct 4, 2024

Parser for nginx logs and prometheus counter
https://gist.github.com/huard/25ca5be3479f72546f748da54f7097e7
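The gist itself is not reproduced here. As an independent sketch of the same idea, assuming the official `prometheus_client` package, a counter labelled by request path can be incremented by the bytes sent on each parsed log line, leaving Prometheus to derive GB/day at query time. The metric name `thredds_download_bytes` and the `record_download` helper are hypothetical.

```python
from prometheus_client import CollectorRegistry, Counter

# A dedicated registry keeps this example isolated from the global default registry.
registry = CollectorRegistry()

# Hypothetical metric; the client exposes it as thredds_download_bytes_total.
DOWNLOAD_BYTES = Counter(
    "thredds_download_bytes",
    "Bytes sent by THREDDS, by request path",
    ["path"],
    registry=registry,
)


def record_download(path, bytes_sent):
    """Add the bytes sent for one request to the per-path counter."""
    # Counters only go up; inc() raises ValueError for negative amounts.
    DOWNLOAD_BYTES.labels(path=path).inc(bytes_sent)
```

In a long-running process, `prometheus_client.start_http_server(port, registry=registry)` would expose these counters for Prometheus to scrape. Keeping raw byte counts and converting to GB/day in a query (for example with PromQL's `increase()` over a one-day range) avoids hard-coding a reporting period in the exporter. A per-file label can, however, create a large number of time series on a big catalogue.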

@mishaschwartz
Collaborator

mishaschwartz commented Nov 7, 2024

I've created two PRs to implement log parsing in different ways. I'd like to summarize each below and briefly discuss the pros and cons of each. We should decide here which one we are interested in.

Prometheus log parser: #473

Summary: reads log files with a lightweight python library and converts log lines to metrics using python functions built on the prometheus python client.

Pros:

  • lightweight and easy to configure
  • we write the underlying log parser code so we can update it as needed
  • write custom python functions to convert lines to metrics (simple to write a basic parser and can be as complex as needed within the limits of python)
  • the prometheus python client is well maintained and is an official prometheus product
  • the prometheus python client is well documented and very simple, so it's easy to learn.
  • can easily be deployed on multiple machines (supports a federated architecture)

Cons:

  • we write the underlying log parser code so we need to maintain it
  • slightly slower than promtail (written in python vs. go)
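To illustrate the "custom python functions" point above, here is a hypothetical sketch of how per-file line handlers could be registered and dispatched. It is not the log-parser's actual configuration API; the decorator, the log path and the plain-dict counter (used instead of a prometheus metric to keep the example self-contained) are all assumptions.

```python
from collections import defaultdict

# Hypothetical registry: log file path -> functions called with each new line.
LINE_HANDLERS = defaultdict(list)


def handles(log_path):
    """Decorator registering a function as a line handler for log_path."""
    def register(func):
        LINE_HANDLERS[log_path].append(func)
        return func
    return register


def dispatch(log_path, line):
    """Pass one line from log_path to every handler registered for that file."""
    for handler in LINE_HANDLERS.get(log_path, []):
        handler(line)


# Example handler; a real one would update a prometheus_client metric instead.
status_counts = defaultdict(int)


@handles("/var/log/nginx/access_file.log")  # hypothetical path
def count_status(line):
    # In the NGINX combined format, the 9th whitespace-separated field is the status code.
    status_counts[line.split()[8]] += 1
```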

Promtail and Loki #474

Summary: reads log files with the promtail component and converts log lines to metrics using the metrics pipeline stage. Optionally supports shipping the parsed logs themselves to grafana (through loki) for custom log inspection.

Pros:

  • slightly faster than the prometheus log parser (written in go vs. python)
  • part of the grafana stack and an official grafana product
  • can easily be deployed on multiple machines (supports a federated architecture)
  • we don't need to maintain the underlying code

Cons:

  • we can't customize the underlying code
  • officially, promtail cannot be run without loki so if we just want to generate metrics and nothing else, we can do it but it's a bit of a hack.
  • writing pipeline stages to extract log lines into metrics is complex and (in my opinion) very poorly documented which makes it very difficult to learn

Why not something else... logstash, beats, fluentbit ...

These could totally work as well... probably. I didn't have time to investigate them all. The main reason I didn't investigate these options is that most of them require additional plugins to export log data as metrics, and those plugins seemed much more complex to set up.

For our goals, I think we can achieve what we want with promtail or the prometheus log parser. If there's a use-case that we can't achieve with either of those two, I'm happy to look into other technologies, but I'd rather stick with these two options for now.

@huard
Collaborator Author

huard commented Nov 7, 2024

Thanks for the overview. I think one challenge we're having by plugging together different servers is the expertise required to configure each one. I don't think we have within our group someone fluent in Grafana, for example. I'm concerned that as we add components, we're going to make the problem worse.

In that sense, I'm leaning toward your first approach, which is simple and can be easily extended without delving into yet another configuration format.

@fmigneault
Collaborator

I agree with @huard for the same reasons.

@mishaschwartz
Collaborator

Thanks @huard and @fmigneault for your input. I'm happy with that decision as well. I'll un-draft #473 and close #474.

Once we're happy with #473, we can start adding some of the other metrics discussed here.

@tlvu
Collaborator

tlvu commented Nov 11, 2024

> I agree with @huard for the same reasons.

Same here.

mishaschwartz added a commit that referenced this issue Nov 19, 2024
## Overview

This component parses log files from other components and converts their
logs to prometheus metrics that are then ingested by the monitoring
Prometheus instance (the one created by the `components/monitoring`
component).

For more information on how this component reads log files and converts
them to prometheus metrics, see the
[log-parser](https://github.com/DACCS-Climate/log-parser/)
documentation.

To configure this component:

* set the `PROMETHEUS_LOG_PARSER_POLL_DELAY` variable to a number of
seconds to set how often the log parser checks if new lines have been
added to log files (default: 1)
* set the `PROMETHEUS_LOG_PARSER_TAIL` variable to `"true"` to only
parse new lines in log files. If set to `"false"`, this will also parse
all existing lines in the log file (default: `"true"`)
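A rough sketch of what these two settings control, assuming a simple polling "tail" loop; the log-parser's real implementation may differ (for example in how it handles log rotation, which this sketch ignores). `settings_from_env` and `follow` are hypothetical helpers, and reading the variables straight from the environment is an assumption about how they reach the parser.

```python
import os
import time


def settings_from_env(env=os.environ):
    """Read poll delay (seconds) and tail mode, using the defaults documented above."""
    poll_delay = float(env.get("PROMETHEUS_LOG_PARSER_POLL_DELAY", "1"))
    tail = env.get("PROMETHEUS_LOG_PARSER_TAIL", "true").lower() == "true"
    return poll_delay, tail


def follow(path, poll_delay=1.0, tail=True, max_polls=None):
    """Yield complete lines appended to path, checking for new data every poll_delay seconds.

    tail=True starts at the end of the file (only new lines);
    tail=False yields the existing lines first.
    max_polls bounds the loop (useful in tests); None polls forever.
    """
    with open(path) as f:
        if tail:
            f.seek(0, os.SEEK_END)
        buffer = ""
        polls = 0
        while max_polls is None or polls < max_polls:
            chunk = f.readline()
            if chunk:
                buffer += chunk
                # Only emit a line once its trailing newline has been written.
                if buffer.endswith("\n"):
                    yield buffer[:-1]
                    buffer = ""
                continue
            polls += 1
            time.sleep(poll_delay)
```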

To view all metrics exported by the log parser:

* Navigate to the `https://<BIRDHOUSE_FQDN>/prometheus/graph` search
page
* Put `{job="log_parser"}` in the search bar and click the "Execute"
button

Update the prometheus version to the current latest `v2.53.3`. This is
required to support loading multiple prometheus scrape configuration
files with the `scrape_config_files` configuration option.

## Changes

**Non-breaking changes**
- New component version prometheus:v2.53.3

**Breaking changes**
- None

## Related Issue / Discussion

- #444

## Additional Information

- implements parser given as an example here:
#444 (comment)

- this is an alternative to #474. See discussion in #444 to help decide
which we should pick.

## CI Operations

<!--
The test suite can be run using a different DACCS config with
``birdhouse_daccs_configs_branch: branch_name`` in the PR description.
To globally skip the test suite regardless of the commit message use
``birdhouse_skip_ci`` set to ``true`` in the PR description.

Using ``[<cmd>]`` (with the brackets) where ``<cmd> = skip ci`` in the
commit message will override ``birdhouse_skip_ci`` from the PR
description.
Such commit command can be used to override the PR description behavior
for a specific commit update.
However, a commit message cannot 'force run' a PR for which the description
turns off the CI.
To run the CI, the PR should instead be updated with a ``true`` value,
and a running message can be posted in following PR comments to trigger
tests once again.
-->

birdhouse_daccs_configs_branch: master
birdhouse_skip_ci: false