[Bug]: assertion failed: (*header).pd_special >= SizeOfPageHeaderData as u16 #193

Open
Eli-Airis opened this issue Jan 16, 2025 · 7 comments
Labels
bug Something isn't working community pgvectorscale

Comments

@Eli-Airis

What happened?

I am using the official docker image unchanged:

$ docker images
REPOSITORY                 TAG               IMAGE ID       CREATED        SIZE
timescale/timescaledb-ha   pg17.2-ts2.17.2   16b410ac50dc   3 weeks ago    1.79GB

and I'm frequently getting the following assertion error when (multi-)inserting into a table with a diskann index: assertion failed: (*header).pd_special >= SizeOfPageHeaderData as u16.
My loaded extensions:

# SELECT extname, extversion
FROM pg_extension;
       extname       | extversion
---------------------+------------
 plpgsql             | 1.0
 vector              | 0.8.0
 vectorscale         | 0.5.1
 pg_stat_monitor     | 2.1
 pageinspect         | 1.12
 timescaledb_toolkit | 1.19.0
 timescaledb         | 2.17.2

I did not set any option explicitly when creating the index:

# CREATE INDEX my_index ON my_table USING diskann (my_vector_field vector_cosine_ops);
NOTICE:  Starting index build. num_neighbors=-1 search_list_size=100, max_alpha=1.2, storage_layout=SbqCompression

I only changed the following parameters in my configuration based on https://pgtune.leopard.in.ua/?dbVersion=17&osType=linux&dbType=dw&cpuNum=32&totalMemory=32&totalMemoryUnit=GB&connectionNum=&hdType=ssd:

# DB Version: 17
# OS Type: linux
# DB Type: dw
# Total Memory (RAM): 32 GB
# CPUs num: 32
# Data Storage: ssd

#max_connections = 40  # Why was this even suggested for my strong machine?
shared_buffers = 8GB
effective_cache_size = 24GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 500
random_page_cost = 1.1
effective_io_concurrency = 200
work_mem = 6553kB
huge_pages = try
min_wal_size = 4GB
max_wal_size = 16GB
max_worker_processes = 32
max_parallel_workers_per_gather = 16
max_parallel_workers = 32
max_parallel_maintenance_workers = 4

I have also tried running without any change to the default postgresql.conf, and still got the same assertion error.
SHOW block_size returns 8192.

I would appreciate any help resolving this.

pgvectorscale extension affected

0.5.1

PostgreSQL version used

17.2

What operating system did you use?

Linux my-db-machine 6.8.0-1020-gcp #22-Ubuntu SMP Mon Dec 9 17:09:22 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

What installation method did you use?

Docker

What platform did you run on?

Google Cloud Platform (GCP)

Relevant log output and stack trace

All I'm getting in the container's log is:
2025-01-16 18:49:08.093 UTC [538] ERROR:  assertion failed: (*header).pd_special >= SizeOfPageHeaderData as u16
followed by something like
2025-01-16 18:49:08.093 UTC [538] STATEMENT:  INSERT INTO my_table (field1, field2, field3, my_vector_field, final_field

How can we reproduce the bug?

The only unusual thing about my vector field is that its dimension isn't a power of 2: it's 513.

I run multi-row inserts from SQLAlchemy in batches of 1000 rows.
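
The insert pattern described above can be approximated with the sketch below. It is only an illustration under stated assumptions: the real workload was shared privately, so the table name repro_table, its single vector column, and the random data are hypothetical, and random vectors may or may not trigger the assertion. The table is assumed to have a diskann index created as shown earlier in this report.

```python
import random
from typing import Iterator

DIM = 513          # vector dimension from the report
BATCH_SIZE = 1000  # rows per insert batch, from the report


def vector_literal(values: list[float]) -> str:
    """Format floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(f"{v:.6f}" for v in values) + "]"


def make_batches(total_rows: int, seed: int = 0) -> Iterator[list[dict]]:
    """Yield lists of parameter dicts, each at most BATCH_SIZE rows long."""
    rng = random.Random(seed)
    batch = []
    for _ in range(total_rows):
        batch.append({"v": vector_literal([rng.random() for _ in range(DIM)])})
        if len(batch) == BATCH_SIZE:
            yield batch
            batch = []
    if batch:
        yield batch


def run_inserts(db_url: str, total_rows: int) -> None:
    """Insert random vectors batch by batch (needs SQLAlchemy and a driver)."""
    from sqlalchemy import create_engine, text  # imported lazily on purpose

    engine = create_engine(db_url)
    # Hypothetical table, not the reporter's schema:
    #   CREATE TABLE repro_table (id bigserial, my_vector_field vector(513));
    stmt = text(
        "INSERT INTO repro_table (my_vector_field) VALUES (CAST(:v AS vector))"
    )
    for batch in make_batches(total_rows):
        with engine.begin() as conn:   # one transaction per batch
            conn.execute(stmt, batch)  # a list of dicts runs as executemany
```

Calling run_inserts with a connection URL and a row count of a few thousand exercises several full batches plus a partial one. Passing a list of dicts makes SQLAlchemy use executemany, which is close to, but not necessarily identical to, the multi-row INSERT that the original application issues.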

Are you going to work on the bugfix?

None

@Eli-Airis Eli-Airis added bug Something isn't working community pgvectorscale labels Jan 16, 2025
@Eli-Airis Eli-Airis changed the title [Bug]: <Title> [Bug]: assertion failed: (*header).pd_special >= SizeOfPageHeaderData as u16 Jan 16, 2025
@Eli-Airis
Author

I also checked the page size:

# CREATE EXTENSION pageinspect;
CREATE EXTENSION
# SELECT page_header(get_raw_page('pg_class', 0));
              page_header
----------------------------------------
 (0/278CFA0,0,1,212,4400,8192,8192,4,0)
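
For reference, the nine fields in that tuple can be named and checked against the invariant from the failing assertion. This is a minimal sketch, assuming the column order documented for pageinspect's page_header (lsn, checksum, flags, lower, upper, special, pagesize, version, prune_xid) and a 24-byte fixed page header (SizeOfPageHeaderData on stock PostgreSQL builds). The helper names parse_page_header and header_problems are hypothetical and not part of any extension.

```python
from typing import NamedTuple

# Assumption: the fixed page header is 24 bytes on stock PostgreSQL builds.
SIZE_OF_PAGE_HEADER_DATA = 24


class PageHeader(NamedTuple):
    lsn: str
    checksum: int
    flags: int
    lower: int
    upper: int
    special: int
    pagesize: int
    version: int
    prune_xid: int


def parse_page_header(row: str) -> PageHeader:
    """Parse the text form of a page_header() row, e.g. '(0/278CFA0,0,1,...)'."""
    parts = row.strip().strip("()").split(",")
    if len(parts) != 9:
        raise ValueError(f"expected 9 fields, got {len(parts)}")
    return PageHeader(parts[0], *(int(p) for p in parts[1:]))


def header_problems(h: PageHeader) -> list[str]:
    """Return the header invariants that do not hold (empty list if all hold)."""
    problems = []
    if h.special < SIZE_OF_PAGE_HEADER_DATA:
        problems.append("pd_special < SizeOfPageHeaderData")
    if not (h.lower <= h.upper <= h.special <= h.pagesize):
        problems.append("expected pd_lower <= pd_upper <= pd_special <= pagesize")
    return problems


header = parse_page_header("(0/278CFA0,0,1,212,4400,8192,8192,4,0)")
print(header.special, header.pagesize, header_problems(header))
# prints: 8192 8192 []
```

The pg_class page passes both checks, so this particular heap page looks healthy. One possible next step would be to run the same check on pages of the diskann index itself, fetched with get_raw_page on the index name and a block number.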

@tjgreen42
Contributor

tjgreen42 commented Jan 16, 2025

Ouch, page corruption. @Eli-Airis would you be willing to share the workload you're running that triggers this? Likely me or @syvb will need to try to repro this ourselves to make progress. (I can share contact info if you're not comfortable posting it somewhere publicly accessible).

@Eli-Airis
Author

Eli-Airis commented Jan 17, 2025 via email

@Eli-Airis
Author

Hi @tjgreen42 and @smoya, I have approval to share the workload. Could you please provide the email address that I should give access to? The workload isn't sensitive, but we would still prefer that you not share it outside of Timescale.

@Eli-Airis
Author

Looks like the regression was introduced between the images tagged ts2.17.1 and ts2.17.2:

I COULD NOT reproduce the assertion error with the docker image tagged timescale/timescaledb-ha:pg16.4-ts2.17.1 (image ID 7f9533ca34d7),
and COULD reproduce the error with the docker images tagged timescale/timescaledb-ha:pg16 (image ID 2bfd8ae73876), timescale/timescaledb-ha:pg17-ts2.17-oss (image ID f512c54b681c), or timescale/timescaledb-ha:pg17.2-ts2.17.2 (image ID 16b410ac50dc)

Note that the issue is reproduced with version 2.17.2 both on pg16 and on pg17.
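
To rerun the same comparison, the four images can be started side by side. The sketch below only builds the docker run commands and prints them unless dry_run is turned off. The container names, host ports, and password are hypothetical, and the floating tags (pg16, pg17-ts2.17-oss) may resolve to different image IDs than the ones listed above.

```python
import subprocess

# Image tags taken from the comment above; results are as reported there.
IMAGES = [
    "timescale/timescaledb-ha:pg16.4-ts2.17.1",  # reported: not reproduced
    "timescale/timescaledb-ha:pg16",             # reported: reproduced
    "timescale/timescaledb-ha:pg17-ts2.17-oss",  # reported: reproduced
    "timescale/timescaledb-ha:pg17.2-ts2.17.2",  # reported: reproduced
]


def docker_run_args(image: str, index: int, password: str = "repro") -> list[str]:
    """Build a `docker run` argv that starts one image on its own host port."""
    return [
        "docker", "run", "-d",
        "--name", f"repro-{index}",       # hypothetical container name
        "-p", f"{5500 + index}:5432",      # hypothetical host port
        "-e", f"POSTGRES_PASSWORD={password}",
        image,
    ]


def start_all(dry_run: bool = True) -> list[list[str]]:
    """Return the commands; actually run them only when dry_run is False."""
    commands = [docker_run_args(img, i) for i, img in enumerate(IMAGES)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


for cmd in start_all():
    print(" ".join(cmd))
```

Running the same insert workload against each port, and recording the output of the pg_extension query shown at the top of this issue for each container, would show which extension versions differ between the working and failing images.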

@tjgreen42
Contributor

Hi @Eli-Airis, my work email address is [email protected]. Thanks much for being willing to share the workload.

@Eli-Airis
Author

Great! I shared the payload with you.
