Commit

Add zone config troubleshooting guide v1
Fixes DOC-9210

Summary of changes:

- Add Zone Config troubleshooting guide

  - a.k.a. Chapter 3 of _The ZoneConfigonomicon (tm)_ (the existing
    'Replication Controls' page is Chapter 1, and 'Zone Config
    Extensions' is Chapter 2)

- Update 'Replication controls' page with more detailed info re: zone
  config inheritance hierarchy and behavior

- Fix incorrect ALTER RANGE statements since they're needed to map range
  IDs from critical nodes endpoint (mentioned in troubleshooting guide)
  to actual schema objects

- Add links from various zone config-related pages to the new
  troubleshooting guide
rmloveland committed Jan 14, 2025
1 parent 9074f54 commit 31bc927
Showing 27 changed files with 191 additions and 31 deletions.
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
For instructions showing how to troubleshoot replication zones, see [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).
6 changes: 6 additions & 0 deletions src/current/_includes/v24.3/sidebar-data/troubleshooting.json
@@ -56,6 +56,12 @@
"/${VERSION}/query-replication-reports.html"
]
},
{
"title": "Troubleshoot Replication Zone Configurations",
"urls": [
"/${VERSION}/troubleshoot-replication-zones.html"
]
},
{
"title": "Benchmarking",
"items": [
10 changes: 10 additions & 0 deletions src/current/v24.3/alter-database.md
@@ -715,6 +715,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER DATABASE movr CONFIGURE ZONE DISCARD;
~~~

#### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Use Zone Config Extensions

The following examples show:
@@ -1078,6 +1082,12 @@ When you discard a zone configuration, the objects it was applied to will then i
However, this statement will not remove any configuration created by the [multi-region abstractions]({% link {{ page.version.version }}/multiregion-overview.md %}).
{{site.data.alerts.end}}

#### Troubleshoot Zone Config Extensions

The process for troubleshooting Zone Config Extensions is the same as troubleshooting any other changes to zone configs.

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Change database owner

{% include {{page.version.version}}/sql/movr-statements.md %}
4 changes: 4 additions & 0 deletions src/current/v24.3/alter-index.md
@@ -225,6 +225,10 @@ You cannot `DISCARD` any zone configurations on multi-region tables, indexes, or
ALTER INDEX vehicles@vehicles_auto_index_fk_city_ref_users CONFIGURE ZONE DISCARD;
~~~

#### Troubleshoot replication zones

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

### Define partitions

#### Define a list partition on an index
8 changes: 8 additions & 0 deletions src/current/v24.3/alter-partition.md
@@ -9,6 +9,8 @@ docs_area: reference.sql

To view details about existing replication zones, use [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}). For more information about replication zones, see [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %}).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.


@@ -44,3 +46,9 @@ The user must have the [`CREATE`]({% link {{ page.version.version }}/grant.md %}
### Create a replication zone for a partition

{% include {{ page.version.version }}/zone-configs/create-a-replication-zone-for-a-table-partition.md hide-enterprise-warning="true" %}

## See also

- [Table partitioning]({% link {{page.version.version}}/partitioning.md %})
- [`SHOW PARTITIONS`]({% link {{page.version.version}}/show-partitions.md %})
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
14 changes: 7 additions & 7 deletions src/current/v24.3/alter-range.md
@@ -34,9 +34,9 @@ Additional parameters are documented for the respective [subcommands](#subcomman

### `CONFIGURE ZONE`

`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove replication zones for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).
`ALTER RANGE ... CONFIGURE ZONE` is used to add, modify, reset, or remove [replication zones]({% link {{ page.version.version }}/configure-replication-zones.md %}) for a range. To view details about existing replication zones, see [`SHOW ZONE CONFIGURATIONS`]({% link {{ page.version.version }}/show-zone-configurations.md %}).

You can use *replication zones* to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.
You can use replication zones to control the number and location of replicas for specific sets of data, both when replicas are first added and when they are rebalanced to maintain cluster equilibrium.

#### Required privileges

@@ -121,7 +121,7 @@ For example, to get all range IDs, leaseholder store IDs, and leaseholder locali

{% include_cached copy-clipboard.html %}
~~~ sql
WITH user_info AS (SHOW RANGES FROM TABLE users) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
WITH user_info AS (SHOW RANGES FROM TABLE users WITH DETAILS) SELECT range_id, lease_holder, lease_holder_locality FROM user_info;
~~~

~~~
@@ -163,7 +163,7 @@ To move the leases for all data in the [`movr.users`]({% link {{ page.version.ve

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users'
ALTER RANGE RELOCATE LEASE TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -205,7 +205,7 @@ To move the replicas for all data in the [`movr.users`]({% link {{ page.version.

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE FROM 2 TO 7 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -231,7 +231,7 @@ To move all of a range's voting replicas from one store to another store:

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE VOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
@@ -261,7 +261,7 @@ This statement will only have an effect on clusters that have non-voting replica

{% include_cached copy-clipboard.html %}
~~~ sql
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from crdb_internal.ranges where table_name = 'users';
ALTER RANGE RELOCATE NONVOTERS FROM 7 TO 2 FOR SELECT range_id from [SHOW RANGES FROM TABLE users WITH DETAILS];
~~~

~~~
2 changes: 2 additions & 0 deletions src/current/v24.3/alter-table.md
@@ -223,6 +223,8 @@ You can use *replication zones* to control the number and location of replicas f

For examples, see [Replication Controls](#configure-replication-zones).

{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

#### Required privileges

The user must be a member of the [`admin` role]({% link {{ page.version.version }}/security-reference/authorization.md %}#admin-role) or have been granted [`CREATE`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) or [`ZONECONFIG`]({% link {{ page.version.version }}/security-reference/authorization.md %}#supported-privileges) privileges. To configure [`system` objects]({% link {{ page.version.version }}/configure-replication-zones.md %}#for-system-data), the user must be a member of the `admin` role.
1 change: 1 addition & 0 deletions src/current/v24.3/backup.md
@@ -378,3 +378,4 @@ To use an external connection URI to back up to cloud storage with an associated
- [`CREATE SCHEDULE FOR BACKUP`]({% link {{ page.version.version }}/create-schedule-for-backup.md %})
- [`RESTORE`]({% link {{ page.version.version }}/restore.md %})
- [Replication Controls]({% link {{ page.version.version }}/configure-replication-zones.md %})
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
2 changes: 1 addition & 1 deletion src/current/v24.3/cluster-api.md
@@ -21,7 +21,7 @@ Endpoint | Name | Description | Support
[`/databases/{database}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseDetails) | Get database details | Get the descriptor ID of a specified database. | Stable
[`/databases/{database}/grants`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseGrants) | List database grants | List all [privileges](security-reference/authorization.html#managing-privileges) granted to users for a specified database. | Stable
[`/databases/{database}/tables`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/databaseTables) | List database tables | List all tables in a specified database. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and zone configuration. | Stable
[`/databases/{database}/tables/{table}`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/tableDetails) | Get table details | Get details on a specified table, including schema, grants, indexes, range count, and [zone configurations]({% link {{ page.version.version }}/configure-replication-zones.md %}). | Stable
[`/events`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listEvents) | List events | List the latest [events](eventlog.html) on the cluster, in descending order. | Unstable
[`/health`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/health) | Check node health | Determine if the node is running and ready to accept SQL connections. | Stable
[`/nodes`](https://cockroachlabs.com/docs/api/cluster/v2.html#operation/listNodes) | List nodes | Get details on all nodes in the cluster, including node IDs, software versions, and hardware. | Stable
6 changes: 3 additions & 3 deletions src/current/v24.3/cluster-setup-troubleshooting.md
@@ -635,15 +635,15 @@ Even with `server.eventlog.enabled` set to `false`, notable log events are still
## Check for under-replicated or unavailable data
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
To see if any data is under-replicated or unavailable in your cluster, follow the steps described in [Critical nodes endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint).
## Check for replication zone constraint violations
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
To see if any of your cluster's [data placement constraints]({% link {{ page.version.version }}/configure-replication-zones.md %}#replication-constraints) are being violated, follow the steps described in [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %}).
## Check for critical localities
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.
To see which of your [localities]({% link {{ page.version.version }}/cockroach-start.md %}#locality) (if any) are critical, follow the steps described in the [Critical nodes endpoint documentation]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint). A locality is "critical" for a range if all of the nodes in that locality becoming [unreachable](#node-liveness-issues) would cause the range to become unavailable. In other words, the locality contains a majority of the range's replicas.
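
The definition of a "critical" locality above can be illustrated with a small sketch. This is not how CockroachDB or the critical nodes endpoint computes the result; `critical_localities` is a hypothetical helper that assumes every listed replica is a voting replica and that a range stays available only while a simple majority of its replicas is reachable.

```python
from collections import Counter


def critical_localities(replica_localities):
    """Return the localities whose loss would leave the range without a quorum.

    `replica_localities` holds one locality name per voting replica of a
    single range (hypothetical input format, chosen for illustration).
    """
    total = len(replica_localities)
    quorum = total // 2 + 1  # simple majority of all replicas
    counts = Counter(replica_localities)
    # A locality is critical if the replicas outside it cannot form a quorum.
    return sorted(loc for loc, n in counts.items() if total - n < quorum)


# Two of three replicas share a locality, so losing it loses the quorum.
skewed = critical_localities(["us-east", "us-east", "us-west"])

# One replica per locality: no single locality is critical.
spread = critical_localities(["us-east", "us-central", "us-west"])
```

For an odd number of replicas, this check matches the statement that the locality contains a majority of the range's replicas. With an even number of replicas, a locality holding exactly half of them would also be flagged under the same quorum assumption.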
## Something else?
3 changes: 2 additions & 1 deletion src/current/v24.3/common-errors.md
@@ -106,7 +106,7 @@ When running a single-node CockroachDB cluster, an error about replicas failing
E160407 09:53:50.337328 storage/queue.go:511 [replicate] 7 replicas failing with "0 of 1 store with an attribute matching []; likely not enough nodes in cluster"
~~~
This happens because CockroachDB expects three nodes by default. If you do not intend to add additional nodes, you can stop this error by using [`ALTER RANGE ... CONFIGURE ZONE`]({% link {{ page.version.version }}/alter-range.md %}#configure-zone) to update your default zone configuration to expect only one node:
This happens because CockroachDB expects three nodes by default. If you do not intend to add additional nodes, you can stop this error by using [`ALTER RANGE ... CONFIGURE ZONE`]({% link {{ page.version.version }}/alter-range.md %}#configure-zone) to update your default [zone configuration]({% link {{ page.version.version }}/configure-replication-zones.md %}) to expect only one node:
{% include_cached copy-clipboard.html %}
~~~ shell
@@ -222,3 +222,4 @@ Try searching the rest of our docs for answers or using our other [support resou
- [StackOverflow](http://stackoverflow.com/questions/tagged/cockroachdb)
- [CockroachDB Support Portal](https://support.cockroachlabs.com)
- [Transaction retry error reference]({% link {{ page.version.version }}/transaction-retry-error-reference.md %})
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})
43 changes: 36 additions & 7 deletions src/current/v24.3/configure-replication-zones.md
@@ -65,13 +65,41 @@ System Range | CockroachDB comes with pre-configured replication zones for impor

### Level priorities

[XXX](): EDIT THIS SECTION MORE

When replicating data, whether table or system, CockroachDB always uses the most granular replication zone available. For example, for a piece of user data:

1. If there's a replication zone for the row, CockroachDB uses it.
1. If there's no applicable row replication zone and the row is from a secondary index, CockroachDB uses the secondary index replication zone.
1. If the row isn't from a secondary index or there is no applicable secondary index replication zone, CockroachDB uses the table replication zone.
1. If there's no applicable table replication zone, CockroachDB uses the database replication zone.
1. If there's no applicable database replication zone, CockroachDB uses the `default` cluster-wide replication zone.
1. If there's a replication zone [for the row](#create-a-replication-zone-for-a-partition) (a.k.a. [partition]({% link {{ page.version.version }}/partitioning.md %})), CockroachDB uses it.
1. If there's no applicable row replication zone and the row is from a secondary index, CockroachDB uses the [secondary index replication zone](#create-a-replication-zone-for-a-secondary-index).
1. If the row isn't from a secondary index or there is no applicable secondary index replication zone, CockroachDB uses the [table replication zone](#create-a-replication-zone-for-a-table).
1. If there's no applicable table replication zone, CockroachDB uses the [database replication zone](#create-a-replication-zone-for-a-database).
1. If there's no applicable database replication zone, CockroachDB uses [the `default` cluster-wide replication zone](#view-the-default-replication-zone).

The hierarchy of inheritance for zone configs can be visualized as follows:

```
- default
  - database
    - table
      - index
        - partition (row) A
          - (sub)partition A.1 (DOES NOT INHERIT, SEE NOTE BELOW)
            - (sub)partition A.1.1 (DOES NOT INHERIT)
          - (sub)partition A.2 (DOES NOT INHERIT)
          - ... etc.
```

Put differently, the system does a depth-first search (DFS) down the inheritance tree and always takes the most specific modified zone configuration at the current node of the tree, unless or until it finds a different modified value for that zone configuration further down the tree. In cases where it doesn't find a modified value, it inherits from the current node's parent, which inherits from its parent, and so on all the way back up the tree.

{{site.data.alerts.callout_info}}
The exception to the inheritance behavior described above is that sub-partitions do not inherit their values from their parent partitions (rows). Instead, they inherit their fields from the parent table. For more information, see [cockroachdb/cockroach#75862](https://github.com/cockroachdb/cockroach/issues/75862).
{{site.data.alerts.end}}

Each zone config inherits all of its initial values from its parent object. Any changes to a zone configuration's initial values are therefore supplied by the user.

A zone config only stores the values that differ from its parent. CockroachDB then looks up the values for any unset fields in the parent object’s zone configuration. This continues recursively up the inheritance tree all the way to the default zone config. In practice, most values are cached to avoid performance impacts.

All configurations will be modified versions of the [`default` range](#view-the-default-replication-zone).
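
The lookup described above can be sketched in a few lines of Python. This is an illustrative model only, not CockroachDB's implementation: the `Zone` class, the `effective` helper, and the field values are hypothetical. Each zone stores only the fields that differ from its parent, and unset fields are resolved by walking up the tree. The sub-partition is attached directly to the table to mirror the exception noted above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Zone:
    """One node in a hypothetical zone config inheritance tree."""
    name: str
    parent: Optional["Zone"] = None
    # Only the fields that were explicitly changed at this level.
    overrides: dict = field(default_factory=dict)


def effective(zone: Zone, key: str):
    """Return the value of `key`, taken from the nearest ancestor that sets it."""
    node = zone
    while node is not None:
        if key in node.overrides:
            return node.overrides[key]
        node = node.parent
    raise KeyError(key)


# Illustrative values only; they are assumptions, not taken from a real cluster.
default = Zone("default", overrides={"num_replicas": 3, "gc.ttlseconds": 14400})
database = Zone("database", parent=default, overrides={"num_replicas": 5})
table = Zone("table", parent=database)
index = Zone("index", parent=table, overrides={"gc.ttlseconds": 600})
partition_a = Zone("partition A", parent=index, overrides={"num_replicas": 7})

# Sub-partitions do not inherit from their parent partition; per the note
# above, they take their unset fields from the parent table instead.
subpartition_a1 = Zone("subpartition A.1", parent=table)

table_replicas = effective(table, "num_replicas")
subpartition_replicas = effective(subpartition_a1, "num_replicas")
```

In this sketch the table has no overrides of its own, so it resolves `num_replicas` from the database, and the sub-partition resolves the same value rather than the one set on partition A.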

## Manage replication zones

@@ -136,7 +164,7 @@ See [Cluster Topology]({% link {{ page.version.version }}/recommended-production

### Troubleshooting zone constraint violations

To see if any of the data placement constraints defined in your replication zone configurations are being violated, use the `system.replication_constraint_stats` report as described in [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %}).
{% include {{ page.version.version }}/see-zone-config-troubleshooting-guide.md %}

## View replication zones

@@ -690,4 +718,5 @@ There's no need to make zone configuration changes; by default, the cluster is c
- [`SHOW PARTITIONS`]({% link {{ page.version.version }}/show-partitions.md %})
- [SQL Statements]({% link {{ page.version.version }}/sql-statements.md %})
- [Table Partitioning]({% link {{ page.version.version }}/partitioning.md %})
- [Replication Reports]({% link {{ page.version.version }}/query-replication-reports.md %})
- [Critical nodes endpoint]({% link {{ page.version.version }}/monitoring-and-alerting.md %}#critical-nodes-endpoint)
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version }}/troubleshoot-replication-zones.md %})
4 changes: 2 additions & 2 deletions src/current/v24.3/demo-low-latency-multi-region-deployment.md
@@ -68,7 +68,7 @@ To determine which nodes are in which regions, you will need to refer to two (2)
Here is the output of `\demo ls` from the SQL shell.

{% include_cached copy-clipboard.html %}
~~~ sql
~~~
> \demo ls
~~~

@@ -145,7 +145,7 @@ Follow these steps to start 3 instances of MovR. Each instance is pointed at a n

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE DATABASE movr;
CREATE DATABASE IF NOT EXISTS movr;
~~~

1. Open a second terminal and run the command below to populate the MovR data set. The options are mostly self-explanatory. We limit the application to 1 thread because using multiple threads quickly overloads this small demo cluster's ability to ingest data. As a result, loading the data takes about 90 seconds on a fast laptop.
1 change: 1 addition & 0 deletions src/current/v24.3/migrate-to-multiregion-sql.md
@@ -307,3 +307,4 @@ SHOW ZONE CONFIGURATION FROM TABLE promo_codes;
- [Low Latency Reads and Writes in a Multi-Region Cluster]({% link {{ page.version.version }}/demo-low-latency-multi-region-deployment.md %})
- [Configure Replication Zones]({% link {{ page.version.version }}/configure-replication-zones.md %})
- [Non-voting replicas]({% link {{ page.version.version }}/architecture/replication-layer.md %}#non-voting-replicas)
- [Troubleshoot Replication Zone Configurations]({% link {{ page.version.version}}/troubleshoot-replication-zones.md %})