Clean up leftover replication slots in tests (#7338)
This commit fixes the flakiness in the `logical_replication` and
`citus_non_blocking_split_shard_cleanup` tests. The flakiness
was caused by leftover replication slots.
Below is a flaky run for each test:

logical_replication https://github.com/citusdata/citus/actions/runs/6721324131/attempts/1#summary-18267030604
citus_non_blocking_split_shard_cleanup https://github.com/citusdata/citus/actions/runs/6721324131/attempts/1#summary-18267006967

```diff
 -- Replication slots should be cleaned up
 SELECT slot_name FROM pg_replication_slots;
             slot_name            
 ---------------------------------
-(0 rows)
+ citus_shard_split_slot_19_10_17
+(1 row)
```
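
For context, PostgreSQL exposes leftover slots through the `pg_replication_slots` view, and an inactive slot can be dropped with `pg_drop_replication_slot()`. A minimal sketch, reusing the slot name from the failing output above (the tests themselves rely on Citus' deferred cleanup rather than manual drops):

```sql
-- List any replication slots still present on the node,
-- including whether each one is currently in use
SELECT slot_name, slot_type, active FROM pg_replication_slots;

-- Manually drop the leftover slot (illustrative only; the fix below
-- waits for Citus' deferred cleanup instead of dropping slots by hand)
SELECT pg_drop_replication_slot('citus_shard_split_slot_19_10_17');
```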

The tests by themselves are not flaky: 32 flaky-test schedules,
each with 20 runs, all completed successfully.
https://github.com/citusdata/citus/actions/runs/6822020127?pr=7338

The conclusion is that:
1. `multi_tenant_isolation_nonblocking` is the problematic test running
before `logical_replication` in the `enterprise_schedule`, so I added a
cleanup at the end of `multi_tenant_isolation_nonblocking`.
https://github.com/citusdata/citus/actions/runs/6824334614/attempts/1#summary-18560127461
2. `citus_split_shard_by_split_points_negative` is the problematic test
running before `citus_non_blocking_split_shard_cleanup` in the split
schedule, so I added a cleanup line there as well (an illustrative
schedule excerpt follows below).
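
For reference, a pg_regress schedule runs the tests listed on separate lines in file order, so a test that leaks a replication slot breaks whichever later test inspects the slots. An illustrative excerpt (not the actual schedule file contents) of the ordering described in point 1:

```
# enterprise_schedule (illustrative): the leaking test runs before
# the test that expects pg_replication_slots to be empty
test: multi_tenant_isolation_nonblocking
test: logical_replication
```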

For details on the investigation of leftover replication slots,
please check PR #7338.
naisila authored Nov 14, 2023
1 parent cdef2d5 commit a960799
Showing 4 changed files with 17 additions and 0 deletions.
src/test/regress/expected/citus_split_shard_by_split_points_negative.out

```diff
@@ -135,4 +135,10 @@ NOTICE: drop cascades to 3 other objects
 DETAIL: drop cascades to table citus_split_shard_by_split_points_negative.range_paritioned_table_to_split
 drop cascades to table citus_split_shard_by_split_points_negative.table_to_split
 drop cascades to table citus_split_shard_by_split_points_negative.table_to_split_replication_factor_2
+SELECT public.wait_for_resource_cleanup();
+ wait_for_resource_cleanup
+---------------------------------------------------------------------
+
+(1 row)
+
 --END : Cleanup
```
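
The `public.wait_for_resource_cleanup()` helper called in these hunks comes from Citus' regression-test helpers. A hypothetical sketch of the kind of polling it provides, assuming a simplified world where replication slots are the only tracked resource (the actual definition differs):

```sql
-- Hypothetical sketch: block until no replication slots remain on the
-- node, approximating what waiting for deferred cleanup achieves here
CREATE OR REPLACE FUNCTION public.wait_for_resource_cleanup_sketch()
RETURNS void AS $$
BEGIN
    WHILE EXISTS (SELECT 1 FROM pg_replication_slots) LOOP
        PERFORM pg_sleep(0.1);  -- poll until background cleanup has run
    END LOOP;
END;
$$ LANGUAGE plpgsql;
```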
src/test/regress/expected/multi_tenant_isolation_nonblocking.out

```diff
@@ -1275,3 +1275,10 @@ SELECT count(*) FROM pg_catalog.pg_dist_partition WHERE colocationid > 0;
 TRUNCATE TABLE pg_catalog.pg_dist_colocation;
 ALTER SEQUENCE pg_catalog.pg_dist_colocationid_seq RESTART 100;
 ALTER SEQUENCE pg_catalog.pg_dist_placement_placementid_seq RESTART :last_placement_id;
+-- make sure we don't have any replication objects leftover on the nodes
+SELECT public.wait_for_resource_cleanup();
+ wait_for_resource_cleanup
+---------------------------------------------------------------------
+
+(1 row)
+
```
src/test/regress/sql/citus_split_shard_by_split_points_negative.sql

```diff
@@ -113,4 +113,5 @@ SELECT citus_split_shard_by_split_points(
 --BEGIN : Cleanup
 \c - postgres - :master_port
 DROP SCHEMA "citus_split_shard_by_split_points_negative" CASCADE;
+SELECT public.wait_for_resource_cleanup();
 --END : Cleanup
```
3 changes: 3 additions & 0 deletions src/test/regress/sql/multi_tenant_isolation_nonblocking.sql
```diff
@@ -607,3 +607,6 @@ TRUNCATE TABLE pg_catalog.pg_dist_colocation;
 ALTER SEQUENCE pg_catalog.pg_dist_colocationid_seq RESTART 100;
 
 ALTER SEQUENCE pg_catalog.pg_dist_placement_placementid_seq RESTART :last_placement_id;
+
+-- make sure we don't have any replication objects leftover on the nodes
+SELECT public.wait_for_resource_cleanup();
```
