Update mermaid graph of onigumo processing with a new approach #237

Merged
merged 7 commits into from
Aug 16, 2024
60 changes: 40 additions & 20 deletions README.md
@@ -16,27 +16,47 @@ The flowchart below illustrates the flow of data between those parts:

```mermaid
flowchart LR
start([START]) --> onigumo_operator[OPERATOR]
onigumo_operator -- <hash>.urls ---> onigumo_downloader[DOWNLOADER]
onigumo_downloader -- <hash>.raw ---> onigumo_parser[PARSER]
onigumo_parser -- <hash>.json ---> onigumo_operator

onigumo_operator <-.-> spider_operator[OPERATOR]
onigumo_parser <-.-> spider_parser[PARSER]

onigumo_operator --> spider_materialization[MATERIALIZER]

subgraph "Onigumo (kernel)"
onigumo_operator
onigumo_downloader
onigumo_parser
end

subgraph "Spider (application)"
spider_operator
spider_parser
spider_materialization
subgraph Crawling
direction BT
spider_parser(🕷️ PARSER)
spider_operator(🕷️ OPERATOR)
Comment on lines +21 to +22
Owner

I like the idea of using the emoji to get around GitHub not supporting fontawesome icons in Mermaid diagrams. 💪🏻

onigumo_downloader[DOWNLOADER]
end

start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
Owner

Suggested change
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])
start([START]) --> onigumo_feeder[FEEDER]
onigumo_feeder -- .raw --> Crawling
onigumo_feeder -- .urls --> Crawling
onigumo_feeder -- .json --> Crawling
Crawling --> spider_materializer(🕷️ MATERIALIZER)
spider_materializer --> done([END])

I’d split this into smaller groups.


spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
```

Owner

We were thinking about adding a title to the second diagram, but I don’t know what it should say.

```mermaid
flowchart LR
subgraph "🕷️ Spider"
direction TB
spider_parser(PARSER)
spider_operator(OPERATOR)
spider_materializer(MATERIALIZER)
end

subgraph Onigumo
onigumo_feeder[FEEDER]
onigumo_downloader[DOWNLOADER]
end

onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser

spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
Owner

Suggested change
onigumo_feeder -- .json --> spider_operator
spider_operator ---> spider_materializer
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
onigumo_feeder -- .json --> spider_operator
onigumo_feeder -- .urls --> onigumo_downloader
onigumo_feeder -- .raw --> spider_parser
spider_parser -. "<hash>.json" .-> spider_operator
onigumo_downloader -. "<hash>.raw" .-> spider_parser
spider_operator -. "<hash>.urls" .-> onigumo_downloader
spider_operator ---> spider_materializer

See my comment from my previous review. I’d push the spider_operator ---> spider_materializer relation to the bottom, so that the arrows within the Spider subgraph stay grouped together.

```
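
For reference, the second diagram with the reviewer's suggested edge ordering applied might look like the sketch below. The node names and edges are taken from the diff above; the indentation, the blank lines between groups, and the `%%` comments are assumptions added for readability and are not part of the merged change.

```mermaid
flowchart LR
    subgraph "🕷️ Spider"
        direction TB
        spider_parser(PARSER)
        spider_operator(OPERATOR)
        spider_materializer(MATERIALIZER)
    end

    subgraph Onigumo
        onigumo_feeder[FEEDER]
        onigumo_downloader[DOWNLOADER]
    end

    %% Edges leaving the feeder (grouping is an assumption)
    onigumo_feeder -- .json --> spider_operator
    onigumo_feeder -- .urls --> onigumo_downloader
    onigumo_feeder -- .raw --> spider_parser

    %% Crawling cycle, keyed by <hash>
    spider_parser -. "<hash>.json" .-> spider_operator
    onigumo_downloader -. "<hash>.raw" .-> spider_parser
    spider_operator -. "<hash>.urls" .-> onigumo_downloader

    %% Materializer relation moved to the bottom, as suggested in the review
    spider_operator ---> spider_materializer
```

Putting the materializer edge last keeps the three dotted cycle edges adjacent in the source, which is the grouping the review comment asks for.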

### Operator ###