You can now use the module as a command line interface (CLI). Usage: python -m newspaper --url https://www.test.com. More information is available in the documentation.
I have added an evaluation script that runs against a dataset from scrapinghub. This will help keep track of future improvements.
Better handling of multithreaded requests. The previous version had a bug that could lead to a deadlock. Multithreading is now implemented with ThreadPoolExecutor from the concurrent.futures module, which is more stable. The previous news_pool was replaced with a fetch_news() function.
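To illustrate the pattern behind this change, here is a minimal sketch of fetching several URLs with ThreadPoolExecutor. It is not the library's fetch_news() implementation: the names download and fetch_all are hypothetical, and download is a stand-in that does no network I/O.

```python
from concurrent.futures import ThreadPoolExecutor


def download(url):
    # Hypothetical stand-in for a real network download; returns fake HTML.
    return f"<html>content of {url}</html>"


def fetch_all(urls, threads=4):
    # The with-block waits for every worker to finish and then shuts the
    # pool down, so no threads are left waiting on each other afterwards.
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # map() returns results in the same order as the input URLs.
        return list(pool.map(download, urls))


urls = ["https://www.test.com/a", "https://www.test.com/b"]
pages = fetch_all(urls)
```

Because the executor owns the worker threads and their lifecycle, the calling code does not need to manage queues or joins by hand.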
Caching is now much more flexible. You can disable it completely or for one request.
You can now use the newspaper.article() function for convenience. It creates, downloads and parses an article in one step, and it accepts all the parameters of the Article class.
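A short sketch of how the convenience function might be used, assuming the package is installed and the URL is reachable. The helper name article_title is hypothetical, and the import is placed inside the function so the snippet can be loaded without the library present.

```python
def article_title(url):
    # Imported lazily so that defining this helper does not require
    # the newspaper package to be installed.
    import newspaper

    # newspaper.article() creates, downloads and parses in one step.
    parsed = newspaper.article(url)
    return parsed.title


# Example call (requires network access and the newspaper package):
# print(article_title("https://www.test.com"))
```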
Sites protected by Cloudflare are now detected more reliably and raise an exception. The reason is given in the exception message.