Add support for dataset load with multiple processes #23

daverigby · 2024-02-14T09:50:41Z

Problem

Combining dataset loading (--pinecone-dataset) with multiple processes (--processes) does not currently work: google.cloud.storage uses multithreading to download dataset files, and this interacts badly with the fork() calls locust makes to create multiple processes.

Solution

Fix this by downloading the dataset only in the parent process, when locust first starts (the 'init' event), and having the child processes only read the already-downloaded dataset files later, when the test starts (the 'test_start' event). This also avoids downloading the same data multiple times, which would be unnecessary and could conflict.
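The split between the two phases can be sketched as follows. This is a minimal, stdlib-only illustration and not the code from this PR: `download_dataset`, `load_dataset`, `worker`, `run`, and the `documents.json` cache file are hypothetical names, and the download is simulated by writing a small local file. It assumes a platform where the `fork` start method is available (Linux/macOS).

```python
import json
import multiprocessing as mp
import tempfile
from pathlib import Path


def download_dataset(cache_dir: Path) -> Path:
    """Parent-only step: populate the local cache (hypothetical stand-in)."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / "documents.json"
    if not target.exists():
        # Stand-in for the real multithreaded download; runs before any fork.
        rows = [{"id": str(i), "values": [0.1 * i]} for i in range(4)]
        target.write_text(json.dumps(rows))
    return target


def load_dataset(cache_dir: Path) -> list:
    """Child step: only read files that are already on disk."""
    return json.loads((cache_dir / "documents.json").read_text())


def worker(cache_dir: str, results) -> None:
    # Each child reports how many records it could read from the cache.
    results.put(len(load_dataset(Path(cache_dir))))


def run(num_processes: int = 2) -> list:
    with tempfile.TemporaryDirectory() as tmp:
        cache_dir = Path(tmp) / "dataset-cache"
        download_dataset(cache_dir)  # once, in the parent, before forking
        ctx = mp.get_context("fork")
        results = ctx.Queue()
        procs = [
            ctx.Process(target=worker, args=(str(cache_dir), results))
            for _ in range(num_processes)
        ]
        for p in procs:
            p.start()
        counts = [results.get(timeout=30) for _ in procs]
        for p in procs:
            p.join()
        return counts


if __name__ == "__main__":
    print(run(num_processes=3))
```

In the locust setup described above, the download step would correspond to a listener on the 'init' event and the read-only step to a listener on the 'test_start' event. Keeping all thread-using download code on the parent side of the fork means the children never inherit a partially running downloader.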

Type of Change

Bug fix (non-breaking change which fixes an issue)

Test Plan

New integration test added.

daverigby force-pushed the load_data_multi_process branch from 0bdf1df to f49407a · February 14, 2024 10:52

daverigby force-pushed the load_data_multi_process branch from f49407a to 85847c5 · February 14, 2024 11:10

daverigby merged commit 0a347f3 into main Feb 14, 2024
7 checks passed

daverigby deleted the load_data_multi_process branch February 14, 2024 11:54
