diff --git a/docs/learn.md b/docs/learn.md
index c7cfebc..5dcb6b3 100644
--- a/docs/learn.md
+++ b/docs/learn.md
@@ -7,237 +7,135 @@ slug: /
 # Learn Tailpipe
 
-Tailpipe allows you to create "pipelines as code" to define workflows and other tasks that run in a sequence.
-
-## Creating your first pipeline
-
-Getting started is easy! If you haven't already done so, [download and install Tailpipe](/downloads).
-
-Tailpipe pipelines and triggers are packaged into [mods](/docs/build), and Tailpipe requires a mod to run. Let's create a new directory for our mod, and then run `tailpipe mod init` to initialize it:
-
-```bash
-mkdir learn_tailpipe
-cd learn_tailpipe
-tailpipe mod init
-```
-
-The `tailpipe mod init` command creates a file named `mod.fp` in the directory. This file contains a `mod` definition for our new mod:
-
-```hcl
-mod "local" {
-  title = "learn_tailpipe"
-}
-```
-
-You can customize the [mod definition](/docs/tailpipe-hcl/mod) if you like, but the default is sufficient for our purposes.
-
-Let's create our first pipeline.
-
-Tailpipe mods are written in HCL. When Tailpipe runs, it will load the mod from the working directory and will read all files with the `.fp` extension from the directory and its subdirectories recursively. Create a file named `learn.fp` and add the following code:
-
-```hcl
-pipeline "learn_tailpipe" {
-  step "http" "get_ipv4" {
-    url = "https://api.ipify.org?format=json"
-  }
-
-  output "ip_address" {
-    value = step.http.get_ipv4.response_body.ip
-  }
-}
-```
-
-A Tailpipe [pipeline](/docs/tailpipe-hcl/step/pipeline) is a sequence of steps to do work. This snippet creates a pipeline called `learn_tailpipe` that has a single [http step](/docs/tailpipe-hcl/step/http), and a single [output](/docs/tailpipe-hcl/step/pipeline#outputs).
-
-Let's run it!
-
-```bash
-tailpipe pipeline run learn_tailpipe
-```
-
-![](/images/docs/learn/get-ipv4.png)
-
-Tailpipe runs the pipeline and prints its outputs once it is complete.
-
-When troubleshooting, it's often useful to view more information about the currently executing steps. You can use the `--verbose` flag to show this detailed information.
+Tailpipe is a high-performance data collection and querying tool that makes it easy to collect, store, and analyze log data. With Tailpipe, you can:
+
+- Collect logs from various sources and store them efficiently in Parquet files
+- Query your data using familiar SQL syntax through DuckDB
+- Share collected data with your team using remote object storage
+- Create filtered views of your data using schemas
+- Join log data with other data sources for enriched analysis
+
+## Install the NGINX Plugin
+
+This tutorial uses the NGINX plugin to demonstrate collecting and analyzing web server access logs. First, [download and install Tailpipe](/downloads), and then install the plugin:
+
+```bash
+tailpipe plugin install nginx
+```
+
+## Configure Data Collection
+
+Tailpipe uses HCL configuration files to define what data to collect. Create a file named `nginx.tpc` with the following content:
+
+```hcl
+partition "nginx_access_log" "web_servers" {
+  plugin = "nginx"
+  source "nginx_access_log_file" {
+    log_path = "/var/log/nginx/access.log"
+  }
+}
+```
+
+This configuration tells Tailpipe to collect NGINX access logs from the specified log file. It defines:
+
+- A partition named "web_servers" for the "nginx_access_log" table
+- The source type "nginx_access_log_file", which reads NGINX-formatted logs
+- The path to the log file to collect from
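+
+If you collect logs from more than one location, you can define additional partitions for the same table. Below is a minimal sketch of a second partition; the `staging` partition label and the log path are hypothetical, and it assumes the same `nginx_access_log_file` source arguments shown above:
+
+```hcl
+# Hypothetical second partition: same table, different log file
+partition "nginx_access_log" "staging" {
+  plugin = "nginx"
+  source "nginx_access_log_file" {
+    log_path = "/var/log/nginx/staging_access.log"
+  }
+}
+```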
-```bash
-tailpipe pipeline run learn_tailpipe --verbose
-```
-
-![](/images/docs/learn/get-ipv4-verbose.png)
-
-## Using mods
-
-Tailpipe's modular design allows you to build pipelines from other pipelines. Let's install the `reallyfreegeoip` mod:
-
-```bash
-tailpipe mod install github.com/turbot/tailpipe-mod-reallyfreegeoip
-```
-
-```bash
-Installed 1 mod:
-
-local
-└── github.com/turbot/tailpipe-mod-reallyfreegeoip@v0.1.0
-```
-
-The mod is installed into the `.tailpipe/mods` subdirectory, and a dependency is added to your `mod.fp`.
-
-Now that the mod is installed, you should see its pipelines:
-
-```bash
-tailpipe pipeline list
-```
-
-```bash
-MOD               NAME                                           DESCRIPTION
-local             learn_tailpipe
-reallyfreegeoip   reallyfreegeoip.pipeline.get_ip_geolocation    Get geolocation data for an IPv4 or IPv6 address.
-```
-
-You can run pipelines from the dependency mod on the command line:
-
-```bash
-tailpipe pipeline run reallyfreegeoip.pipeline.get_ip_geolocation --arg ip_address=35.236.238.30
-```
-
-![](/images/docs/learn/reallyfreegeoip.png)
-
-## Composing with pipelines
-
-While running the dependency pipelines directly in the CLI is useful, the real power is the ability to compose pipelines from other pipelines. Let's add a [pipeline step](/docs/tailpipe-hcl/step/pipeline) to take our IP address and look up our geo-location information.
-
-```hcl
-pipeline "learn_tailpipe" {
-  step "http" "get_ipv4" {
-    url = "https://api.ipify.org?format=json"
-  }
-
-  step "pipeline" "get_geo" {
-    pipeline = reallyfreegeoip.pipeline.get_ip_geolocation
-    args = {
-      ip_address = step.http.get_ipv4.response_body.ip
-    }
-  }
-
-  output "ip_address" {
-    value = step.http.get_ipv4.response_body.ip
-  }
-
-  output "latitude" {
-    value = step.pipeline.get_geo.output.geolocation.latitude
-  }
-
-  output "longitude" {
-    value = step.pipeline.get_geo.output.geolocation.longitude
-  }
-}
-```
-
-Notice that we used the IP address from the first step (`step.http.get_ipv4.response_body.ip`) as an argument to the second step. Tailpipe automatically detects this dependency and runs the steps in the correct order!
+## Collect Data
+
+Now let's collect the logs:
+
+```bash
+tailpipe collect nginx_access_log.web_servers
+```
+
+This command will:
+
+1. Read the NGINX access logs from the specified file
+2. Parse and standardize the log entries
+3. Store the data in Parquet files organized by date
+4. Update the local database with table definitions
+
+## Query Your Data
+
+Tailpipe provides an interactive SQL shell for analyzing your collected data. Let's look at some examples of what you can do.
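+
+If you want to follow along, open the shell before running the examples. A minimal sketch: it assumes the shell is started with `tailpipe query`, and the one-off form that passes a statement as an argument is hypothetical, so check `tailpipe query --help` if it doesn't work:
+
+```bash
+# Open the interactive SQL shell
+tailpipe query
+
+# Hypothetical one-off invocation: run a single statement and exit
+tailpipe query "select count(*) from nginx_access_log"
+```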
-Let's add a couple more steps to get the weather forecast for our location.
-
-```hcl
-pipeline "learn_tailpipe" {
-  step "http" "get_ipv4" {
-    url = "https://api.ipify.org?format=json"
-  }
-
-  step "pipeline" "get_geo" {
-    pipeline = reallyfreegeoip.pipeline.get_ip_geolocation
-
-    args = {
-      ip_address = step.http.get_ipv4.response_body.ip
-    }
-  }
-
-  step "http" "get_weather" {
-    url = join("", [
-      "https://api.open-meteo.com/v1/forecast",
-      "?latitude=${step.pipeline.get_geo.output.geolocation.latitude}",
-      "&longitude=${step.pipeline.get_geo.output.geolocation.longitude}",
-      "&current=temperature",
-      "&forecast_days=1",
-      "&daily=temperature_2m_min,temperature_2m_max,precipitation_probability_mean",
-      "&temperature_unit=${step.pipeline.get_geo.output.geolocation.country_code == "US" ? "fahrenheit" : "celsius"}"
-    ])
-  }
-
-  step "transform" "friendly_forecast" {
-    value = join("", [
-      "It is currently ",
-      step.http.get_weather.response_body.current.temperature,
-      step.http.get_weather.response_body.current_units.temperature,
-      ", with a high of ",
-      step.http.get_weather.response_body.daily.temperature_2m_max[0],
-      step.http.get_weather.response_body.daily_units.temperature_2m_max,
-      " and a low of ",
-      step.http.get_weather.response_body.daily.temperature_2m_min[0],
-      step.http.get_weather.response_body.daily_units.temperature_2m_min,
-      ". There is a ",
-      step.http.get_weather.response_body.daily.precipitation_probability_mean[0],
-      step.http.get_weather.response_body.daily_units.precipitation_probability_mean,
-      " chance of precipitation."
-    ])
-  }
-
-  output "ip_address" {
-    value = step.http.get_ipv4.response_body.ip
-  }
-
-  output "latitude" {
-    value = step.pipeline.get_geo.output.geolocation.latitude
-  }
-
-  output "longitude" {
-    value = step.pipeline.get_geo.output.geolocation.longitude
-  }
-
-  output "forecast" {
-    value = step.transform.friendly_forecast.value
-  }
-}
-```
-
-![](/images/docs/learn/weather-report.png)
-
-## Send a message
-
-Now we have a pipeline that can get the local forecast - let's send it somewhere! The [message step](/docs/tailpipe-hcl/step/message) provides a mechanism for sending messages via multiple communication channels, such as Slack and Email.
-
-Add this step to the `learn_tailpipe` pipeline.
-
-```hcl
-  step "message" "send_forecast" {
-    notifier = notifier.default
-    subject  = "Todays Forecast"
-    text     = step.transform.friendly_forecast.value
-  }
-```
-
-And run the pipeline again.
-
-```bash
-tailpipe pipeline run learn_tailpipe
-```
+### Analyze Traffic by Server
+
+This query shows a summary of traffic for each server for a specific date:
+
+```sql
+SELECT
+  tp_index as server,
+  count(*) as requests,
+  count(distinct remote_addr) as unique_ips,
+  round(avg(bytes_sent)) as avg_bytes,
+  count(CASE WHEN status = 200 THEN 1 END) as success_count,
+  count(CASE WHEN status >= 500 THEN 1 END) as error_count,
+  round(avg(CASE WHEN method = 'GET' THEN bytes_sent END)) as avg_get_bytes
+FROM nginx_access_log
+WHERE tp_date = '2024-11-01'
+GROUP BY tp_index
+ORDER BY requests DESC;
+```
+
+```
+┌──────────────────────────────────────────────────────────────────────────────────────┐
+│ server      requests  unique_ips  avg_bytes  success_c…  error_cou…  avg_get_b…      │
+│──────────────────────────────────────────────────────────────────────────────────────│
+│ web-01.ex…  349       346         7036       267         7           7158            │
+│ web-02.ex…  327       327         6792       246         11          6815            │
+│ web-03.ex…  324       322         7001       254         8           6855            │
+└──────────────────────────────────────────────────────────────────────────────────────┘
+```
+
+This shows us:
+
+- Number of requests per server
+- Count of unique IP addresses
+- Average response size
+- Success and error counts
+- Average size of GET requests
+
+### Join with External Data
+
+One of Tailpipe's powerful features is the ability to join log data with other tables.
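+
+The example that follows assumes an `ip_info` table with `ip_address` and `description` columns already exists in your database. If you don't have one, here is a minimal sketch of creating a small lookup table to experiment with; the table contents are illustrative only, and it assumes your shell session permits `CREATE TABLE`:
+
+```sql
+-- Hypothetical lookup table for enriching requests by source IP
+CREATE TABLE ip_info (ip_address VARCHAR, description VARCHAR);
+INSERT INTO ip_info VALUES ('198.51.100.7', 'Example lab network');
+```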
+Here's an example joining with an IP information table to get more context about the traffic:
+
+```sql
+SELECT
+  n.remote_addr as ip,
+  i.description,
+  count(*) as requests,
+  count(distinct n.server_name) as servers_accessed,
+  round(avg(n.bytes_sent)) as avg_bytes,
+  string_agg(distinct n.method, ', ') as methods_used,
+  count(CASE WHEN n.status >= 400 THEN 1 END) as errors
+FROM nginx_access_log n
+LEFT JOIN ip_info i ON n.remote_addr = i.ip_address
+WHERE i.description IS NOT NULL
+GROUP BY n.remote_addr, i.description
+ORDER BY requests DESC;
+```
+
+```
+┌──────────────────────────────────────────────────────────────────────────────────────┐
+│ ip          descripti…  requests  servers_a…  avg_bytes  methods_u…  errors          │
+│──────────────────────────────────────────────────────────────────────────────────────│
+│ 203.0.113…  Test Netw…  1         1           1860       GET         0               │
+└──────────────────────────────────────────────────────────────────────────────────────┘
+```
+
+This enriched query shows:
+
+- IP addresses and their descriptions
+- How many servers each IP accessed
+- Average response sizes
+- HTTP methods used
+- Error counts
-
-You should see the message printed to the console when you run the pipeline.
-
-Console messages and inputs are useful, but Tailpipe can also route these input requests, approvals and notifications to external systems like Slack, MS Teams, and Email!
-
-Tailpipe [Integrations](/docs/reference/config-files/integration) allow you to interface with external systems. [Notifiers](/docs/reference/config-files/notifier) allow you to route [message](/docs/tailpipe-hcl/step/message) and [input](/docs/build/input) steps to one or more integrations. Integrations are only loaded in [server-mode](/docs/run/server).
-
-Tailpipe server creates a default [`http` integration](/docs/reference/config-files/integration/http) as well as a [default notifier](/docs/reference/config-files/notifier#default-notifier) that routes to it, but you can send it via [Email](/docs/reference/config-files/integration/email), [Slack](/docs/reference/config-files/integration/slack) or [Microsoft Teams](/docs/reference/config-files/integration/msteams) without modifying the pipeline code. Just create the appropriate [integrations](/docs/reference/config-files/integration), add them to the [default notifier](/docs/reference/config-files/notifier#default-notifier), and run the pipeline again from a server instance!
-
-```bash
-tailpipe server &
-tailpipe pipeline run learn_tailpipe --host local
-```
-
-![](/images/docs/learn/slack-weather-report.png)
+
+## What's Next?
+
+We've demonstrated basic log collection and analysis with Tailpipe. Here's what to explore next:
+
+- [Discover more plugins on the Hub →](https://hub.tailpipe.io/plugins)
+- [Learn about data compaction and optimization →](https://tailpipe.io/docs/managing/compaction)
+- [Share data with your team using remotes →](https://tailpipe.io/docs/sharing/remotes)
+- [Create schemas for filtered views →](https://tailpipe.io/docs/schemas)
+- [Join #tailpipe on Slack →](https://turbot.com/community/join)