Long runs of Crawled (200) #343

Open
Macedonialapadian opened this issue Sep 22, 2024 · 0 comments

Macedonialapadian commented Sep 22, 2024

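As the screenshot shows, when comments are crawled normally, every log entry has the form DEBUG: Scraped from <200 URL>{content}. In the screenshot, however, there are long runs of entries of the form DEBUG: Crawled (200) (referer: None). Once this happens, comment.py usually finishes very quickly (possibly because it simply skips the Weibo posts it cannot crawl).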
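To make the symptom easier to quantify, here is a minimal sketch that scans a saved Scrapy log and lists the URLs that were downloaded with status 200 but never produced a "Scraped from" entry, meaning the spider callback yielded no item for them. It assumes Scrapy's default log line layout; the function name `urls_without_items` and the example.com URLs are hypothetical and are not part of this project.

```python
import re

# Default Scrapy log fragments, e.g.
#   DEBUG: Crawled (200) <GET https://example.com/a> (referer: None)
#   DEBUG: Scraped from <200 https://example.com/a>
CRAWLED = re.compile(r"Crawled \((\d+)\) <(?:GET|POST) (\S+)>")
SCRAPED = re.compile(r"Scraped from <(\d+) (\S+)>")


def urls_without_items(log_lines):
    """Return URLs crawled with status 200 that never yielded an item."""
    crawled = []      # URLs in the order they were downloaded
    scraped = set()   # URLs that produced at least one item
    for line in log_lines:
        match = CRAWLED.search(line)
        if match:
            if match.group(1) == "200":
                crawled.append(match.group(2))
            continue
        match = SCRAPED.search(line)
        if match:
            scraped.add(match.group(2))
    return [url for url in crawled if url not in scraped]


if __name__ == "__main__":
    # Hypothetical sample lines; replace with the lines of a real log file.
    sample = [
        "[scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.com/a> (referer: None)",
        "[scrapy.core.scraper] DEBUG: Scraped from <200 https://example.com/a>",
        "[scrapy.core.engine] DEBUG: Crawled (200) <GET https://example.com/b> (referer: None)",
    ]
    print(urls_without_items(sample))  # ['https://example.com/b']
```

If the list is long, the requests succeed at the HTTP level but the responses contain nothing the callback turns into an item, so the body of one such response is worth inspecting by hand.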

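I modified comment.py so that tweet_id is added to the data recorded for each comment (see the attachment).
I also changed the concurrency in setting.py from 16 to 8 and raised the upper limit of the random request delay from 1 to 5.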
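For readers who want to reproduce the configuration, the following is only a sketch of what such a change could look like if the project relies on Scrapy's standard setting names. That is an assumption: the issue does not say which settings were edited, and the "random request delay" may instead be handled by a custom middleware.

```python
# Sketch of a Scrapy settings file; the setting names are assumed, not confirmed by the issue.
CONCURRENT_REQUESTS = 8          # previously 16
DOWNLOAD_DELAY = 5               # previously 1 (seconds)
RANDOMIZE_DOWNLOAD_DELAY = True  # Scrapy default: wait 0.5x to 1.5x DOWNLOAD_DELAY
```

Note that with RANDOMIZE_DOWNLOAD_DELAY enabled, Scrapy treats DOWNLOAD_DELAY as the centre of the random range rather than its upper limit.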

Screenshot 2024-09-22 15 02 55 [comment.py.zip](https://github.com/user-attachments/files/17088611/comment.py.zip)