What happened?
My goal was to create a StreamingDiskANN index of 10,000,000,000 records.
To this end, I was running my DB under heavy INSERT load (see reproduction steps below) for many hours.
All 32 cores of the machine were at 85%+ CPU usage.
In the morning, I found the machine almost idle, with every core at 20% utilization or less.
All the client processes I had running on the same machine were still up, but at 0% CPU and in sleeping state.
~4 DB processes were running, and showed "INSERT" state in htop.
I was able to connect to the DB via psql, and could query other tables, but not my main table. When I tried to query my main table the connection was lost.
Those psql query attempts triggered the following messages in the DB's log:
2025-01-22 09:57:50.152 UTC [194477] FATAL: the database system is not yet accepting connections
2025-01-22 09:57:50.152 UTC [194477] DETAIL: Consistent recovery state has not been yet reached.
I am running the timescale/timescaledb-ha:pg16.4-ts2.17.1 docker image (ID 7f9533ca34d7), since 2.17.2 is unusable for me due to #193.
pgvectorscale extension affected
No response
PostgreSQL version used
16.4
What operating system did you use?
Ubuntu 22 x64
What installation method did you use?
Docker
What platform did you run on?
Google Cloud Platform (GCP)
Relevant log output and stack trace
2025-01-22 09:53:05.951 UTC [19] LOG: checkpoint starting: time
2025-01-22 09:55:39.338 UTC [1] LOG: server process (PID 182319) was terminated by signal 11: Segmentation fault
2025-01-22 09:55:39.338 UTC [1] DETAIL: Failed process was running: SELECT (class_embedding <=> (select class_embedding from basic_objects where id=228703)) as cosine_dist, (class_embedding <-> (select class_embedding from basic_objects where id=228703)) as l2_dist, *
FROM basic_objects ORDER BY cosine_dist LIMIT 100 offset 0;
2025-01-22 09:55:39.338 UTC [1] LOG: terminating any other active server processes
2025-01-22 09:55:39.368 UTC [182421] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.957 UTC [182423] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.957 UTC [182424] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.959 UTC [182422] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.964 UTC [182426] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.972 UTC [182427] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.976 UTC [182428] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.993 UTC [182429] FATAL: the database system is in recovery mode
2025-01-22 09:55:39.998 UTC [182425] FATAL: the database system is in recovery mode
2025-01-22 09:55:40.012 UTC [182431] FATAL: the database system is in recovery mode
How can we reproduce the bug?
Use 24 parallel processes, each running 2 async workers that upload data from the payload I shared in #193, using multi-row INSERTs in batches of 1000 records.
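For anyone trying to reproduce this, here is a minimal sketch of what one of those client processes could look like. It is not my actual loader. The table name `basic_objects` and the column `class_embedding` come from the failing query in the log above. The two-column list, the `$1..$N` placeholder style, and the `execute` callable (a stand-in for whatever async driver call you use) are assumptions made for illustration. Launching 24 such processes is left out.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, Iterator, List, Sequence

Row = Sequence[object]
# Hypothetical signature: an async callable that sends one statement with its arguments.
Execute = Callable[[str, List[object]], Awaitable[None]]

BATCH_SIZE = 1000        # from the report: batches of 1000 records
WORKERS_PER_PROCESS = 2  # from the report: 2 async workers per process


def batches(rows: Iterable[Row], size: int = BATCH_SIZE) -> Iterator[List[Row]]:
    """Yield consecutive lists of at most `size` rows."""
    batch: List[Row] = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_multi_insert(table: str, columns: Sequence[str], n_rows: int) -> str:
    """Build a single INSERT with n_rows VALUES tuples and $1..$N placeholders."""
    width = len(columns)
    tuples = []
    for r in range(n_rows):
        placeholders = ", ".join(f"${r * width + c + 1}" for c in range(width))
        tuples.append(f"({placeholders})")
    return f"INSERT INTO {table} ({', '.join(columns)}) VALUES {', '.join(tuples)}"


async def worker(queue: asyncio.Queue, execute: Execute,
                 table: str, columns: Sequence[str]) -> int:
    """Take batches off the queue and send each as one multi-row INSERT."""
    sent = 0
    while True:
        batch = await queue.get()
        if batch is None:  # sentinel: no more work
            return sent
        sql = build_multi_insert(table, columns, len(batch))
        args = [value for row in batch for value in row]  # flatten rows in order
        await execute(sql, args)
        sent += len(batch)


async def upload(rows: Iterable[Row], execute: Execute,
                 table: str = "basic_objects",
                 columns: Sequence[str] = ("id", "class_embedding")) -> int:
    """Run the workers of one client process; return the number of rows sent."""
    queue: asyncio.Queue = asyncio.Queue(maxsize=WORKERS_PER_PROCESS * 2)
    tasks = [asyncio.create_task(worker(queue, execute, table, columns))
             for _ in range(WORKERS_PER_PROCESS)]
    for batch in batches(rows):
        await queue.put(batch)
    for _ in tasks:
        await queue.put(None)  # one sentinel per worker
    return sum(await asyncio.gather(*tasks))
```

To drive it against a real database, `execute` would wrap the driver's own call; the sketch deliberately does not name one.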
Are you going to work on the bugfix?
🆘 No, could someone else please work on the bugfix?
Note that the segfault itself occurred when I tried to query the DB. The log shows no indication of a bad state before that, despite the condition I found the DB in.