To help you get started quickly with super, this repository contains small sample sets of Zeek data. The log formats listed below are available, all representing events based on the same network traffic:
Directory | Format |
---|---|
zeek-default/ | Zeek default output format |
zeek-json/ | JSON as output by the Zeek package for JSON Streaming Logs |
bsup/ | Super Binary, output with super's default LZ4-compressed format |
bsup-uncompressed/ | Super Binary, output with super's option -bsup.compress=false to disable compression |
jsup/ | Super JSON, a text output format that has the look and feel of JSON |
This sample data is used frequently for a simple SuperDB performance test and to check for unexpected changes in the SuperDB output formats.
Because prior changes to the Super Binary and Super JSON output formats have added some bulk to the revision history, you'll typically want to save time by just downloading the latest revision:
# git clone --depth=1 https://github.com/brimdata/zed-sample-data.git
This sample data set was generated from a subset of the packet capture archives (formerly at https://archive.wrccdc.org/pcaps, though the site has been down of late) that are distributed by the WRCCDC.
This sample data is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License, as it is built upon the WRCCDC PCAP data that is distributed under the same license.
We would like to express our thanks to the WRCCDC for generously making their packet capture archives available to the public and for commercial use. The terabytes of "real world" data have been invaluable to us in testing the foundations of super at scale.
The data set was made from several PCAP files in the 2018 set. Zeek v6.2.0 was used in its default configuration, with the only change being the addition and enabling of the JSON Streaming Logs package. The packet captures were then processed with the following commands:
# mergecap -w wrccdc.pcap wrccdc.2018-03-24.10*.pcap
# zeek -r wrccdc.pcap local "JSONStreaming::enable_log_rotation=F"
This produced the logs in Zeek default and JSON formats. As Super Binary and Super JSON are not output by Zeek, the logs in those formats were created by sending each Zeek default log through super, e.g.:
# mkdir -p bsup && \
for file in zeek-default/*
do
super -f bsup "$file" \
| gzip -n > bsup/"$(basename "$file" | sed 's/\.log\.gz//')".bsup.gz
done
# mkdir -p bsup-uncompressed && \
for file in zeek-default/*
do
super -f bsup -bsup.compress=false "$file" \
| gzip -n > bsup-uncompressed/"$(basename "$file" | sed 's/\.log\.gz//')".bsup.gz
done
# mkdir -p jsup && \
for file in zeek-default/*
do
super -f jsup "$file" \
| gzip -n > jsup/"$(basename "$file" | sed 's/\.log\.gz//')".jsup.gz
done
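The three loops above differ only in the output format, the target directory, and any extra flags, so they can be folded into one helper. The following is a minimal sketch, not part of this repository: the function name convert_logs is hypothetical, and it assumes the inputs are named zeek-default/*.log.gz, as the sed expressions above imply. It uses only the super flags already shown.

```shell
# Hypothetical helper (not shipped in this repository).
# Usage: convert_logs FORMAT OUTDIR [extra super flags...]
# Assumes it is run from the repository root and that super is in $PATH.
convert_logs() {
  fmt=$1
  outdir=$2
  shift 2
  mkdir -p "$outdir" || return 1
  for file in zeek-default/*.log.gz; do
    # Skip the unexpanded pattern if no input files are present.
    [ -e "$file" ] || continue
    name=$(basename "$file" .log.gz)
    # Same pipeline as the loops above: convert, then gzip without a timestamp.
    super -f "$fmt" "$@" "$file" | gzip -n > "$outdir/$name.$fmt.gz"
  done
}
```

With this in place, the three runs would be convert_logs bsup bsup, convert_logs bsup bsup-uncompressed -bsup.compress=false, and convert_logs jsup jsup.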
Since the sample Super Binary and Super JSON logs are generated by super, regenerating these outputs is a useful test of super. Assuming super is in your $PATH, a script is provided to regenerate the hash for each Super Binary and Super JSON log and compare it to a last known "good" hash stored in the md5sums/ directory.
# scripts/check_md5sums.sh bsup
capture_loss:62949d22a0a557342d28ee5ee4b64d50
...
x509:10333d3d004c718b04cbedb8ee195cca
diff'ing current "super -f bsup" output hashes vs. committed hashes:
7c7
< ftp:c84824c8114df4db745399ff875b0d92
---
> ftp:2d8d90df3c4b84eb9e281a3f10767aa5
======> diffs detected! Check for a super bug or intentional Super Binary format change.
Current hashes are in /var/folders/yn/jbkxxkpd4vg142pc3_bd_krc0000gn/T/tmp.9X7Gab9I
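For readers curious how such a check can work, here is a rough sketch of the idea. It is not the contents of scripts/check_md5sums.sh. The function names are hypothetical, the file of committed hashes is passed as an argument because the layout of md5sums/ is not described here, and GNU md5sum is assumed (on macOS, md5 -q is the usual substitute).

```shell
# Hypothetical sketch: print "name:md5" for each Zeek default log after
# converting it with super to the given format. Assumes GNU md5sum.
current_hashes() {
  fmt=$1
  for file in zeek-default/*.log.gz; do
    [ -e "$file" ] || continue
    name=$(basename "$file" .log.gz)
    hash=$(super -f "$fmt" "$file" | md5sum | cut -d' ' -f1)
    printf '%s:%s\n' "$name" "$hash"
  done
}

# Compare fresh hashes against a file of known-good "name:md5" lines.
# Returns non-zero and reports where the fresh hashes were saved on mismatch.
compare_hashes() {
  fmt=$1
  committed=$2
  current=$(mktemp)
  current_hashes "$fmt" > "$current"
  if diff "$current" "$committed"; then
    echo "no diffs detected"
  else
    echo "diffs detected; current hashes are in $current"
    return 1
  fi
}
```

A call would look like compare_hashes bsup path/to/known-good-hashes, where the second argument is whatever file holds the committed hashes for that format.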