Commit

[UPDATE] add new chart and update readme
pratik-choudhari committed Sep 14, 2024
1 parent 562bfc6 commit 62b55d5
Showing 2 changed files with 92 additions and 33 deletions.
13 changes: 11 additions & 2 deletions README.md
@@ -18,5 +18,14 @@ An apache spark ETL pipeline to ingest HTTP web server logs and analyse using ma
 - Read logs from text file
 - Use regex to extract groups. Groups correspond to columns
 - Update schema of dataframe
-- Add new columns
-- Partition and store in parquet format
+- Add new columns using transformations
+- Partition and store in parquet format
+
+## Analysis
+
+- Most requested file types
+- Reply size stats
+- Avg reply size over time
+- HTTP methods distribution
+- Avg reply size by HTTP method
+- Request count by weekday
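
The regex step in the README hunk above ("Groups correspond to columns") can be illustrated with a minimal sketch. This is not the repository's code: the pattern assumes logs in Common Log Format, the names `LOG_PATTERN` and `parse_line` are hypothetical, the sample line is made up, and plain Python stands in for Spark so the idea can be run without a cluster. In PySpark the same pattern would typically be applied per column with `regexp_extract`.

```python
import re
from datetime import datetime

# Assumed Common Log Format; the repository's actual pattern is not shown in this diff.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)(?: \S+)?" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)$'
)


def parse_line(line):
    """Return a dict of columns for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    row = match.groupdict()  # each named group becomes one column
    # Schema update: cast the string groups to typed values.
    row["status"] = int(row["status"])
    row["size"] = 0 if row["size"] == "-" else int(row["size"])
    ts = datetime.strptime(row["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    row["timestamp"] = ts
    # Derived columns of the kind the Analysis list relies on.
    row["weekday"] = ts.strftime("%A")
    last_segment = row["path"].rsplit("/", 1)[-1]
    row["file_type"] = (
        last_segment.rsplit(".", 1)[-1].lower() if "." in last_segment else ""
    )
    return row


# Made-up sample line, for illustration only.
sample = '192.0.2.1 - - [01/Jul/1995:00:00:01 -0400] "GET /images/logo.gif HTTP/1.0" 200 6245'
row = parse_line(sample)
print(row["method"], row["file_type"], row["size"], row["weekday"])
```

Lines that do not match return `None` rather than raising, so malformed entries can be counted or dropped before the typed columns are written out.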