Currently, our robots.txt is:

Well, the Crawl-Delay is not respected by the biggest crawlers. We've seen days where a bot User-Agent averages 5 req/s. We can, however, hope they respect Allow, which we could generate dynamically to avoid letting crawlers hope they can fully index Elixir one day.

We should limit indexing to a few versions of each project. E.g., for Linux, that would be a few old releases (such as the latest v2.6), the LTS releases, and the last N versions.
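A minimal sketch of what that could look like, assuming a /&lt;project&gt;/&lt;version&gt;/ URL layout; the tag list, LTS series, keep-last-N count and Crawl-delay value are all hypothetical placeholders, not the project's actual configuration:

```python
N_LATEST = 5                                              # hypothetical "keep the last N tags" count
LTS_SERIES = ("v5.4", "v5.10", "v5.15", "v6.1", "v6.6")   # assumed LTS branches

def keep_tags(all_tags):
    """Pick a small allow-list: a few old releases, the LTS releases, the last N tags."""
    kept = []
    # A historic release, e.g. the latest v2.6 tag.
    kept += [t for t in all_tags if t.startswith("v2.6")][-1:]
    # The latest tag of each LTS series.
    for series in LTS_SERIES:
        series_tags = [t for t in all_tags if t.startswith(series)]
        if series_tags:
            kept.append(series_tags[-1])
    # The last N tags overall.
    kept += all_tags[-N_LATEST:]
    # Deduplicate while preserving order.
    return list(dict.fromkeys(kept))

def robots_txt(project, tags):
    """Allow the kept versions, then disallow everything else.

    Allow lines come first, since some parsers apply rules in order of appearance.
    """
    lines = ["User-agent: *", "Crawl-delay: 1"]           # delay value is a placeholder
    lines += [f"Allow: /{project}/{tag}/" for tag in tags]
    lines.append("Disallow: /")
    return "\n".join(lines) + "\n"

# Tags would normally come from the project's git repository, sorted by version.
linux_tags = ["v2.6.38", "v2.6.39", "v5.4", "v5.10.200", "v5.15.150",
              "v6.1.80", "v6.6.20", "v6.7", "v6.8", "v6.9"]
print(robots_txt("linux", keep_tags(linux_tags)))
```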
For example, for Linux, we would go from 6843 tags down to roughly 10-30. Doing so would mean that crawlers could actually finish indexing Elixir and stop once done. Currently, given the theoretical page count, it isn't possible for any crawler to ever be done.
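As a sanity check, the standard library's urllib.robotparser, which applies the first matching rule, confirms that such a file (Allow lines listed before Disallow: /) only lets crawlers into the kept versions; the paths below are again hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical restricted rules: a couple of kept versions, everything else blocked.
rules = """\
User-agent: *
Allow: /linux/v6.6/
Allow: /linux/v6.1/
Disallow: /
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

print(parser.can_fetch("*", "/linux/v6.6/source/Makefile"))     # True: kept version
print(parser.can_fetch("*", "/linux/v4.19.1/source/Makefile"))  # False: everything else
```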
The top crawlers in terms of requests per second label themselves properly in User-Agent; we hope those respect robots.txt.