Remote write target for Prometheus that saves metrics to parquet files
The prom2parquet remote write endpoint for Prometheus listens for incoming datapoints from Prometheus and saves them to parquet files in a user-configurable location: currently either pod-local storage or an AWS S3 bucket. Metrics are saved in the following directory structure:
/data/<prefix>/<metric name>/2024022021.parquet
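The filename encodes the hour of the datapoints it contains (`YYYYMMDDHH`). As a minimal sketch, a partition path like the one above could be derived from a datapoint's timestamp as follows (the helper name and the prefix/metric values here are illustrative, not part of prom2parquet):

```python
from datetime import datetime, timezone

def partition_path(root: str, prefix: str, metric: str, ts_millis: int) -> str:
    """Illustrative only: map a millisecond timestamp to an hourly parquet partition."""
    dt = datetime.fromtimestamp(ts_millis / 1000, tz=timezone.utc)
    return f"{root}/{prefix}/{metric}/{dt.strftime('%Y%m%d%H')}.parquet"

# 2024-02-20 21:15:00 UTC lands in the 2024022021 partition
print(partition_path("/data", "myprefix", "node_cpu_seconds_total", 1708463700000))
# → /data/myprefix/node_cpu_seconds_total/2024022021.parquet
```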
Each file for a particular metric will have the same schema, but different metrics may have different schemas. At a minimum, each file has a `timestamp` column and a `value` column, along with a variety of other extracted columns corresponding to the labels on the Prometheus timeseries. Each file also has a "catch-all" `labels` column to contain any other unextracted labels.
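As an illustration of the shape described above, a single datapoint might flatten into a row like the following. The label names and which of them get their own columns are hypothetical examples, not prom2parquet's actual extraction rules:

```python
# Hypothetical row for the timeseries
#   node_cpu_seconds_total{pod="web-0", namespace="prod", mode="idle"}
# Here `pod` and `namespace` are extracted into their own columns;
# any remaining labels land in the catch-all `labels` column.
row = {
    "timestamp": 1708463700000,  # millisecond timestamp
    "value": 12345.6,
    "pod": "web-0",              # extracted label column
    "namespace": "prod",         # extracted label column
    "labels": {"mode": "idle"},  # catch-all for unextracted labels
}
print(sorted(row))
```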
Usage:
prom2parquet [flags]
Flags:
--backend backend supported remote backends for saving parquet files
(valid options: none, s3/aws) (default local)
--backend-root string root path/location for the specified backend (e.g. bucket name for AWS S3)
(default "/data")
-h, --help help for prom2parquet
--prefix string directory prefix for saving parquet files
-p, --server-port int port for the remote write endpoint to listen on (default 1234)
-v, --verbosity verbosity log level (valid options: debug, error, fatal, info, panic, trace, warning/warn)
(default info)
Here is a brief overview of the options:

- `--backend`: Where to store the Parquet files; currently supports pod-local storage and AWS S3.
- `--backend-root`: "Root" location for the backend storage. For pod-local storage this is the base directory; for AWS S3 it is the bucket name.
- `--prefix`: A directory prefix that can be used to differentiate between metrics collections.
- `--server-port`: The port prom2parquet should listen on for timeseries data from Prometheus.
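Putting these flags together, an invocation that writes to an S3 bucket might look like the following (the bucket name and prefix are made up for illustration):

```shell
prom2parquet \
  --backend s3/aws \
  --backend-root my-metrics-bucket \
  --prefix cluster-a \
  --server-port 1234
```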
Prometheus needs to know where to send timeseries data. You can include this block in your Prometheus config.yml:
remote_write:
- url: http://prom2parquet-svc.monitoring:1234/receive
remote_timeout: 30s
Alternately, if you're using the Prometheus operator, you can add this configuration to your Prometheus custom resource:
spec:
remoteWrite:
- url: http://prom2parquet-svc.monitoring:1234/receive
We welcome any and all contributions to the prom2parquet project! Please open a pull request.
To set up your development environment, run `git submodule init && git submodule update` and `make setup`. To build prom2parquet, run `make build`.
This project uses 🔥Config to generate Kubernetes manifests from definitions located in ./k8s/. If you want to use this mechanism for deploying prom2parquet, you can just type `make` to build the executable, create and push the Docker images, and deploy to the configured Kubernetes cluster.
All build artifacts are placed in the .build/ subdirectory. You can remove this directory or run `make clean` to clean up.
Run `make test` to run all the unit/integration tests. If you want to test using pod-local storage, and you want to flush the Parquet files to disk without terminating the pod (e.g., so you can copy them elsewhere), you can send the process a SIGUSR1:
> kubectl exec prom2parquet-pod -- kill -s SIGUSR1 <pid>
> kubectl cp prom2parquet-pod:/path/to/files ./
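The flush-on-signal pattern used here can be sketched in a few lines of Python. This is an illustration of the mechanism only, not prom2parquet's actual implementation; the handler name and the `flushed` bookkeeping are invented for the example:

```python
import os
import signal

flushed = []

def flush_writers(signum, frame):
    # In a real writer process, this is where open parquet files would be
    # flushed to the backend; here we just record that the signal arrived.
    flushed.append(signum)

signal.signal(signal.SIGUSR1, flush_writers)

# Simulate `kill -s SIGUSR1 <pid>` by signalling our own process
os.kill(os.getpid(), signal.SIGUSR1)
print(flushed == [signal.SIGUSR1])
# → True
```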
Applied Computing Research Labs has a strict code of conduct we expect all contributors to adhere to. Please read the full text so that you understand the expectations upon you as a contributor.
prom2parquet is licensed under the MIT License. Contributors to this project agree that they own the copyrights to all contributed material, and agree to license their contributions under the same terms. This is "inbound=outbound", and is the GitHub default.
Warning
Due to the uncertain nature of copyright and IP law, this repository does not accept contributions that have been all or partially generated with GitHub Copilot or other LLM-based code generation tools. Please disable any such tools before authoring changes to this project.