
Incomplete Visits Table #1076

Closed Answered by vringar
MohammadMahdiJavid asked this question in Q&A

Hey,

I just realized that we never documented this anywhere.

The short version is that you should discard all data from incomplete visits.

A visit is considered failed in two cases: there was an error while executing the command sequence, or the command sequence was interrupted by a shutdown.

Some data might still have been saved for such a visit, but because it is incomplete we never considered it as part of our analysis.
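As a rough illustration of discarding that data at analysis time, here is a minimal sketch using Python's built-in `sqlite3`. It assumes the crawl database has an `incomplete_visits` table with a `visit_id` column and that the data tables (here `http_requests`) carry the same `visit_id`; check these names against your own schema. The helper `complete_visit_rows` is made up for this example and is not part of OpenWPM.

```python
import sqlite3


def complete_visit_rows(conn, table):
    """Return rows of `table` whose visit_id is NOT listed in incomplete_visits.

    `table` is interpolated into the SQL string, so only pass trusted names.
    Table and column names are assumptions about the crawl database schema.
    """
    query = (
        f"SELECT * FROM {table} "
        "WHERE visit_id NOT IN (SELECT visit_id FROM incomplete_visits) "
        "ORDER BY visit_id"
    )
    return conn.execute(query).fetchall()


# Tiny in-memory stand-in for a crawl database, for demonstration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE incomplete_visits (visit_id INTEGER NOT NULL);
    CREATE TABLE http_requests (visit_id INTEGER NOT NULL, url TEXT);
    INSERT INTO incomplete_visits VALUES (2);
    INSERT INTO http_requests VALUES
        (1, 'https://a.example/'),
        (2, 'https://b.example/'),
        (3, 'https://c.example/');
    """
)

# Visit 2 is marked incomplete, so its rows are dropped.
rows = complete_visit_rows(conn, "http_requests")
print(rows)
```

For a real crawl you would open the crawl's SQLite file instead of `:memory:` and apply the same filter to every table you analyze.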

If you are running larger crawls with lots of failing websites, I would recommend you take a look at crawler.py, which is what we previously used. Retrying each website up to three times after a failure helped us significantly reduce the failure percentage.
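The retry idea can be sketched generically as below. This is not the logic from crawler.py, only an illustration of "try each site up to three times"; `crawl_with_retries` and the `visit` callable are hypothetical names, and `visit` is assumed to raise an exception when a visit fails.

```python
def crawl_with_retries(sites, visit, max_attempts=3):
    """Call visit(site) for each site, retrying on failure.

    Returns the list of sites that still failed after max_attempts tries.
    `visit` is a hypothetical callable that raises on a failed visit.
    """
    failed = []
    for site in sites:
        for _attempt in range(max_attempts):
            try:
                visit(site)
                break  # success, stop retrying this site
            except Exception:
                continue  # failure, try again if attempts remain
        else:
            # The inner loop never hit `break`: every attempt failed.
            failed.append(site)
    return failed


# Demonstration with a fake visit function that tracks attempts per site.
attempts = {}


def flaky_visit(site):
    attempts[site] = attempts.get(site, 0) + 1
    if site == "https://always-fails.example":
        raise RuntimeError("visit failed")
    if site == "https://flaky.example" and attempts[site] < 3:
        raise RuntimeError("visit failed")


failed = crawl_with_retries(
    [
        "https://ok.example",
        "https://flaky.example",
        "https://always-fails.example",
    ],
    flaky_visit,
)
print(failed, attempts)
```

Sites that still fail after the last attempt end up in `failed`, and their data should be discarded as described above.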

Answer selected by MohammadMahdiJavid