Update mermaid graph of onigumo processing with a new approach #237

Merged: 7 commits, Aug 16, 2024
39 changes: 19 additions & 20 deletions README.md
@@ -15,28 +15,27 @@ Onigumo is composed of three sequentially interconnected components:
The flowchart below illustrates the flow of data between those parts:

```mermaid
---
title: Onigumo architecture
---
flowchart LR
start([START]) --> onigumo_operator[OPERATOR]
onigumo_operator -- <hash>.urls ---> onigumo_downloader[DOWNLOADER]
onigumo_downloader -- <hash>.raw ---> onigumo_parser[PARSER]
onigumo_parser -- <hash>.json ---> onigumo_operator

onigumo_operator <-.-> spider_operator[OPERATOR]
onigumo_parser <-.-> spider_parser[PARSER]

onigumo_operator --> spider_materialization[MATERIALIZER]

subgraph "Onigumo (kernel)"
onigumo_operator
onigumo_downloader
onigumo_parser
end

subgraph "Spider (application)"
spider_operator
spider_parser
spider_materialization
subgraph Crawling
direction BT
spider_parser(🕷️ PARSER)
spider_operator(🕷️ OPERATOR)
Comment on lines +21 to +22
Owner

I like the idea of using the emoji to get around GitHub not supporting fontawesome icons in Mermaid diagrams. 💪🏻

onigumo_downloader[DOWNLOADER]
end

start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
Owner

Can we use only a single space,

Suggested change
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])

or align the items to table columns?

Suggested change
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])

I don’t have a preference (at least for now), but I’d like it to be consistent at least.


spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
Owner

Suggested change
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
spider_parser -. "<hash>.json" .-> spider_operator
Suggested change
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
spider_parser -. "<hash>.json" .-> spider_operator

```
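
For easier reading, the lines that appear to be additions in this diff can be put together as one diagram. This is only a sketch. The `+`/`-` markers did not survive in this view, so the split between removed and added lines is an assumption, and the `title` front matter is left out because it is unclear which side it belongs to. Single-space formatting is used throughout, following the first option from the review. It is not necessarily the diagram that was merged.

```mermaid
flowchart LR
    %% Assumed new-side content of the diff; node names are taken from the lines above.
    subgraph Crawling
        direction BT
        spider_parser(🕷️ PARSER)
        spider_operator(🕷️ OPERATOR)
        onigumo_downloader[DOWNLOADER]
    end

    %% The feeder hands files over to the crawling loop.
    start([START]) --> onigumo_feeder[FEEDER]
    onigumo_feeder -- .raw --> Crawling
    onigumo_feeder -- .urls --> Crawling
    onigumo_feeder -- .json --> Crawling
    Crawling --> spider_materializer(🕷️ MATERIALIZER)
    spider_materializer --> done([END])

    %% Cycle inside the loop: operator -> downloader -> parser -> operator.
    spider_operator -. "<hash>.urls" .-> onigumo_downloader
    onigumo_downloader -. "<hash>.raw" .-> spider_parser
    spider_parser -. "<hash>.json" .-> spider_operator
```

The dotted edges keep the original order from the diff (`.urls`, `.raw`, `.json`); the review suggestion that lists the `.raw` edge first changes only the order of the lines, not the edges themselves.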

### Operator ###