Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] scout:reimport moves index too soon, multiple race conditions #171

Open
nathanblogs opened this issue Apr 8, 2019 · 20 comments
Open
Labels
bug Something isn't working

Comments

@nathanblogs
Copy link

nathanblogs commented Apr 8, 2019

If you are using a queue to reimport into algolia that isn't FIFO eg AWS SQS, there is a very high chance of hitting a race condition where you move the temp index to the real index without pushing all the data into the new index. I've seen examples of 5-10% of the records missing.

Also depending on how long the reimport takes and how frequently you are updating records it seems like there is a high chance the reimport will import stale data by using this approach timeline eg:

  1. start reimport
  2. records gets pushed into new temp index
  3. record gets updated and pushed into old index
  4. temp index gets moved to main index.

You now have stale data in the newly moved index

Unsure is this is related but you seem to reference config('scout.synchronous', false) in the code but I can't seem to figure out where or how that is defined ?

@nunomaduro
Copy link
Contributor

Hi @nathanblogs. Thanks for reporting this issue.

A solution for your issue would be setting the scout.synchronous to true. Using:

config(['scout.synchronous' => true]);
// Run your command
config(['scout.synchronous' => false]);

If it works, do you mind of creating a pull request adding this as optional argument on the reimport command?

@nunomaduro nunomaduro added the bug Something isn't working label Apr 8, 2019
@nathanblogs
Copy link
Author

where does scout.synchronous come from it doesn't seem to be declared anywhere ?

@nunomaduro
Copy link
Contributor

It's not declared, that's why the configuration option that is false by default.

When is true, we actually wait for the saveObjects to finished before moving to the next one:

$result->wait();
.

Give it a try, and tell me if it solves your problem.

@nathanblogs
Copy link
Author

I'm not sure how UpdateJob is relevent ?

I can't see how that would help, isn't it calling the existing scout function makeAllSearchable.

$searchable::makeAllSearchable();

This intern calls searchable in batches which dispatches jobs.

After that there is a moveindex.

$response = $client->moveIndex($temporaryName, $index->getIndexName());

This moveIndex has no ability to tell if all the dispatched searchable jobs have finished, which I believe is the cause of the issue.

I think the whole reimport needs a rethink to be a sync function instead of a temporary index.

@nunomaduro
Copy link
Contributor

Problem finally understood.

I will try to investigate this during the next couple days.

@TomKlotzPro
Copy link

I'm taking this issues @nunomaduro

@torrancemiller
Copy link

torrancemiller commented Apr 25, 2019

I got it to work by turning-off the queue entirely.

Example:
config(['scout.queue' => false]);

\Illuminate\Support\Facades\Artisan::call('scout:reimport "App\\\\Models\\\\Thread"');

config(['scout.queue' => true]);

@torrancemiller
Copy link

torrancemiller commented Apr 25, 2019

Hopefully your reimports are as rare as mine. I had to make a structural change to everything in an index and added the above to a migration up() method so that I don't have to remember to SSH in to prod when a batch of changes get deployed.

You would also be able to do this through SSH via php artisan tinker of course

@nunomaduro
Copy link
Contributor

@torrancemiller Maybe we can work in a pull request together on this no?

@nathanblogs
Copy link
Author

@torrancemiller that solution doesn't resolve the race condition that can occur unless you put your system into read only model or maintenance.

Basically what can happen is:

  1. you start the scout:reimport
  2. model id 1 gets push to algolia new temp index.
  3. model id 1 gets update in your system
  4. scout:reimport finish moving the temp index over the real index

Which is a race condition you now have old / stale data in algolia for model id 1, this is particularly a problem if you have a large dataset or a high frequency of updates on searchable models.

@torrancemiller
Copy link

torrancemiller commented Apr 26, 2019

@nathanblogs

I do not believe there is a technical solution to that sort of thing.
I would recommend doing this during a service maintenance window, and close to never on production servers.
However, if you did have to do this in production, I recommend running the php artisan down command to block incoming requests to your server while the reimport runs.

If that is not an option, you will need to get fancy and write a SQL query to retrieve all models updated after you decided to reimport and call the ->searchable() method on that result set and you should be peachy with them.

@torrancemiller
Copy link

torrancemiller commented Apr 26, 2019

@nunomaduro well, I suppose there may be much more elegant fixes if they were to be incorporated into the package. I wish I had more free time to see if there may be a surgical fix here.

@nathanblogs
Copy link
Author

nathanblogs commented May 14, 2019

@torrancemiller of course there are technical solutions to this.

I believe the easiest solution would be to update UpdateJob function to add a sha1/md5 hash to the array returned by toSearchableArray().

Then we could update ImportCommand to do a sync instead of a full import:

  • chunk the searchable model on setting scout.chunk.searchable.
  • fetch objects from algolia using getObjects for this chunk, if missing item from index run searchable.
  • compare local hash from toSearchableArray() to algolia record, run `searchable if different.
  • browse all objects in algolia index, check they are a valid and searchable object delete from index if they are not.

@nathanblogs
Copy link
Author

Any thoughts ?

@jimbojsb
Copy link
Contributor

jimbojsb commented Jan 8, 2020

@nunomaduro any reason not to just force SCOUT_QUEUE to be ignored when scout is invoked from this context? I don't see any other way this could possibly work.

@nunomaduro
Copy link
Contributor

You mean in the command itself?

@tomcoonen
Copy link

I have my scout queue setting set like this:
'queue' => [ 'queue' => 'search-index', ],

Specifying the queue like this will not wait for the temp index to be ready, but moves an empty index to 'production'.

Anything I can do to get this fixed?

@nunomaduro
Copy link
Contributor

nunomaduro commented Mar 20, 2020

Well, maybe you/we can work in a pull request to get this fixed. Just to be clear, the problem is:

Only when the user is using queues:

1. we copy settings to temporary index
2. we queue jobs of indexation
3. we move the temporary index to production without waiting for the queued jobs to be complete.

@tomcoonen
Copy link

I agree on the problem, gonna see if I can fix it in the upcoming days :)

@logan-jobzmall
Copy link

logan-jobzmall commented Mar 11, 2021

Any news on this?

Edit: I went ahead and just created a new command that sets queue to false, then sets it back to my configured settings.


public function handle()
    {
        \Config::set('scout.queue', false);
        $this->call('scout:reimport');
        \Config::set('scout.queue',
            [
                "queue" => "search",
                "connection" => "redis"
            ]);
    }
    

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants