[Bug] scout:reimport moves index too soon, multiple race conditions #171
Comments
Hi @nathanblogs. Thanks for reporting this issue. A solution for your issue would be setting the
If it works, would you mind creating a pull request adding this as an optional argument on the |
where does |
It's not declared, that's why the configuration option that is When is scout-extended/src/Jobs/UpdateJob.php Line 139 in f3475be
Give it a try, and tell me if it solves your problem. |
I'm not sure how UpdateJob is relevant. I can't see how that would help; isn't it calling the existing Scout function makeAllSearchable?
This in turn calls searchable in batches, which dispatches jobs. After that there is a moveIndex.
This moveIndex has no ability to tell if all the dispatched searchable jobs have finished, which I believe is the cause of the issue. I think the whole reimport needs a rethink to be a sync function instead of a temporary index. |
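To make the ordering problem concrete, here is a minimal in-memory sketch in plain PHP. It does not use scout-extended or Algolia; the arrays `$queue`, `$tempIndex` and `$liveIndex` are hypothetical stand-ins for the job queue, the temporary index and the production index.

```php
<?php

// Hypothetical in-memory stand-ins; none of these names come from scout-extended.
$records   = range(1, 10); // ids of the models to import
$queue     = [];           // pending "searchable" jobs
$tempIndex = [];           // temporary index being filled by workers
$liveIndex = [];           // index that serves searches

// Step 1: the import only dispatches jobs, one per batch of 3 records.
foreach (array_chunk($records, 3) as $batch) {
    $queue[] = $batch;
}

// A worker has processed only the first job so far.
$tempIndex = array_merge($tempIndex, array_shift($queue));

// Step 2: the move happens immediately, without checking the queue.
$liveIndex = $tempIndex;

// Records still sitting in queued jobs are absent from the live index.
$missing = array_values(array_diff($records, $liveIndex));

echo 'live index holds ' . count($liveIndex) . ' of ' . count($records) . " records\n";
echo 'missing ids: ' . implode(', ', $missing) . "\n";
```

In this sketch the move only becomes safe once `$queue` is empty, which is exactly the information the real moveIndex step does not have.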
Problem finally understood. I will try to investigate this during the next couple days. |
I'm taking this issue @nunomaduro |
I got it to work by turning off the queue entirely. Example:
|
Hopefully your reimports are as rare as mine. I had to make a structural change to everything in an index and added the above to a migration up() method so that I don't have to remember to SSH in to prod when a batch of changes gets deployed. You would also be able to do this through SSH via |
@torrancemiller Maybe we can work on a pull request together on this, no? |
@torrancemiller that solution doesn't resolve the race condition that can occur unless you put your system into read-only mode or maintenance mode. Basically what can happen is:
That is a race condition: you now have stale data in Algolia for model id 1. This is particularly a problem if you have a large dataset or a high frequency of updates on searchable models. |
I do not believe there is a technical solution to that sort of thing. If that is not an option, you will need to get fancy and write a SQL query to retrieve all models updated after you decided to reimport and call the ->searchable() method on that result set and you should be peachy with them. |
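A rough sketch of that catch-up step, assuming a hypothetical `App\Post` model that uses Scout's `Searchable` trait and has the standard `updated_at` timestamp column. The timestamp value is a placeholder for whenever the reimport was started.

```php
<?php

use App\Post; // hypothetical searchable model
use Illuminate\Support\Carbon;

// Placeholder: the moment the reimport was started, noted down beforehand.
$startedAt = Carbon::parse('2019-05-01 12:00:00');

// After the reimport has finished, push every model that changed in the
// meantime back to the index. searchable() here is the query builder macro
// that Laravel Scout registers for models using the Searchable trait.
Post::where('updated_at', '>=', $startedAt)->searchable();
```

This only covers models that were created or updated during the reimport. Models deleted in that window would still need to be removed from the index separately.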
@nunomaduro well, I suppose there may be much more elegant fixes if they were to be incorporated into the package. I wish I had more free time to see if there may be a surgical fix here. |
@torrancemiller of course there are technical solutions to this. I believe the easiest solution would be to update UpdateJob function to add a sha1/md5 hash to the array returned by Then we could update ImportCommand to do a sync instead of a full import:
|
Any thoughts? |
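One way to picture the hash-based sync idea is the plain PHP sketch below. It does not touch UpdateJob, ImportCommand or the Algolia client; `recordHash()` and the two arrays are hypothetical, and the hashes are assumed to be stored per object in the index.

```php
<?php

// Hypothetical helper: hash a flat record payload independently of key order.
function recordHash(array $payload): string
{
    ksort($payload);
    return sha1(json_encode($payload));
}

// Hypothetical data: what the database would produce now, keyed by objectID.
$local = [
    1 => ['title' => 'First', 'body' => 'updated text'],
    2 => ['title' => 'Second', 'body' => 'same'],
    4 => ['title' => 'Fourth', 'body' => 'new'],
];

// Hypothetical data: objectID => hash currently stored in the index.
$remote = [
    1 => recordHash(['title' => 'First', 'body' => 'old text']),
    2 => recordHash(['title' => 'Second', 'body' => 'same']),
    3 => recordHash(['title' => 'Third', 'body' => 'gone']),
];

// Save records that are new or whose hash differs from the indexed one.
$toSave = [];
foreach ($local as $id => $payload) {
    if (!isset($remote[$id]) || $remote[$id] !== recordHash($payload)) {
        $toSave[] = $id;
    }
}

// Delete records that exist in the index but no longer locally.
$toDelete = array_keys(array_diff_key($remote, $local));

echo 'save: ' . implode(', ', $toSave) . "\n";
echo 'delete: ' . implode(', ', $toDelete) . "\n";
```

Because a sync like this writes into the existing index, there is no temporary index to move, so the "moved too soon" problem does not arise. Concurrent updates during the sync would still need care.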
@nunomaduro any reason not to just force SCOUT_QUEUE to be ignored when scout is invoked from this context? I don't see any other way this could possibly work. |
You mean in the command itself? |
I have my scout queue setting set like this: Specifying the queue like this will not wait for the temp index to be ready, but moves an empty index to 'production'. Anything I can do to get this fixed? |
Well, maybe you/we can work on a pull request to get this fixed. Just to be clear, the problem occurs only when the user is using queues:
|
I agree on the problem, gonna see if I can fix it in the upcoming days :) |
Any news on this? Edit: I went ahead and just created a new command that sets queue to false, then sets it back to my configured settings.
|
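A wrapper command along those lines might look like the sketch below. The class name and the `search:reimport-sync` signature are made up for illustration, and it assumes that Scout reads `scout.queue` at the moment models are made searchable, so overriding it for the current process is enough.

```php
<?php

namespace App\Console\Commands;

use Illuminate\Console\Command;

// Hypothetical wrapper command; name and signature are illustrative only.
class ReimportSync extends Command
{
    protected $signature = 'search:reimport-sync {searchable?}';

    protected $description = 'Run scout:reimport with the Scout queue disabled';

    public function handle()
    {
        // Remember the configured queue setting so it can be restored.
        $previous = config('scout.queue');

        // Disable queueing for this process only.
        config(['scout.queue' => false]);

        try {
            // Forward the optional searchable argument only when it was given.
            return $this->call('scout:reimport', array_filter([
                'searchable' => $this->argument('searchable'),
            ]));
        } finally {
            // Restore the original setting even if the reimport fails.
            config(['scout.queue' => $previous]);
        }
    }
}
```

This avoids moving a half-filled temporary index, but it does not address updates that happen while the reimport is running.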
If you are using a queue to reimport into Algolia that isn't FIFO, e.g. AWS SQS, there is a very high chance of hitting a race condition where the temp index is moved to the real index before all the data has been pushed into it. I've seen examples of 5-10% of the records missing.
Also, depending on how long the reimport takes and how frequently you are updating records, there seems to be a high chance that the reimport will import stale data. Example timeline:
You now have stale data in the newly moved index
Unsure if this is related, but you seem to reference
config('scout.synchronous', false)
in the code, but I can't figure out where or how that is defined.