Update mermaid graph of onigumo processing with a new approach #237

Merged
merged 7 commits into from
Aug 16, 2024
64 changes: 44 additions & 20 deletions README.md
@@ -16,27 +16,51 @@ The flowchart below illustrates the flow of data between those parts:

```mermaid
flowchart LR
start([START]) --> onigumo_operator[OPERATOR]
onigumo_operator -- <hash>.urls ---> onigumo_downloader[DOWNLOADER]
onigumo_downloader -- <hash>.raw ---> onigumo_parser[PARSER]
onigumo_parser -- <hash>.json ---> onigumo_operator

onigumo_operator <-.-> spider_operator[OPERATOR]
onigumo_parser <-.-> spider_parser[PARSER]

onigumo_operator --> spider_materialization[MATERIALIZER]

subgraph "Onigumo (kernel)"
onigumo_operator
onigumo_downloader
onigumo_parser
end

subgraph "Spider (application)"
spider_operator
spider_parser
spider_materialization
subgraph Crawling
direction BT
spider_parser(🕷️ PARSER)
spider_operator(🕷️ OPERATOR)
Comment on lines +21 to +22
Owner

I like the idea of using the emoji to get around GitHub not supporting fontawesome icons in Mermaid diagrams. 💪🏻

onigumo_downloader[DOWNLOADER]
end

start([START]) --> onigumo_feeder[FEEDER]

onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling

Crawling --> spider_materializer(🕷️ MATERIALIZER)

spider_materializer --> done([END])

spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
```

Owner

We were thinking about adding a title to the second diagram, but I don’t know what it should say.

```mermaid
flowchart LR
subgraph "🕷️ Spider"
direction TB
spider_parser(PARSER)
spider_operator(OPERATOR)
spider_materializer(MATERIALIZER)
end

subgraph Onigumo
onigumo_feeder[FEEDER]
onigumo_downloader[DOWNLOADER]
end

onigumo_feeder -- .json --> spider_operator
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser

spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader

spider_operator ---> spider_materializer
```
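
The dotted edges in the diagram above form a cycle of hashed files. As a rough illustration only, that cycle can be written down as a lookup table. The names `FLOW` and `trace` are hypothetical and are not part of Onigumo; the sketch only restates the three `<hash>.*` edges and does not model the Feeder or the Materializer.

```python
# Hypothetical summary of the hashed-file edges in the diagram above:
# file extension -> (component that consumes it, extension it produces).
FLOW = {
    ".urls": ("DOWNLOADER", ".raw"),
    ".raw": ("PARSER", ".json"),
    ".json": ("OPERATOR", ".urls"),
}


def trace(start_ext, steps):
    """Follow the cycle for `steps` hops, starting from a file extension."""
    hops = []
    ext = start_ext
    for _ in range(steps):
        component, produced = FLOW[ext]
        hops.append((ext, component, produced))
        ext = produced
    return hops


if __name__ == "__main__":
    # Print one full trip around the cycle, starting from a .urls file.
    for consumed, component, produced in trace(".urls", 3):
        print(f"{component}: <hash>{consumed} -> <hash>{produced}")
```

After three hops starting from `.urls`, the trace returns to `.urls`, which matches the loop drawn between the Operator, Downloader, and Parser.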

### Operator ###