Added the Ingest Workflow Developer's Guide

lsst · Oct 15, 2024 · a1e798d · a1e798d
1 parent d1e4716
commit a1e798d
Show file tree

Hide file tree

Showing 36 changed files with 5,175 additions and 118 deletions.
diff --git a/doc/_static/ingest-transaction-fsm.png b/doc/_static/ingest-transaction-fsm.png
diff --git a/doc/_static/ingest-transaction-fsm.pptx b/doc/_static/ingest-transaction-fsm.pptx
diff --git a/doc/_static/ingest-transactions-aborted.png b/doc/_static/ingest-transactions-aborted.png
diff --git a/doc/_static/ingest-transactions-aborted.pptx b/doc/_static/ingest-transactions-aborted.pptx
diff --git a/doc/_static/ingest-transactions-failed.png b/doc/_static/ingest-transactions-failed.png
diff --git a/doc/_static/ingest-transactions-failed.pptx b/doc/_static/ingest-transactions-failed.pptx
diff --git a/doc/_static/ingest-transactions-resolved.png b/doc/_static/ingest-transactions-resolved.png
diff --git a/doc/_static/ingest-transactions-resolved.pptx b/doc/_static/ingest-transactions-resolved.pptx
diff --git a/doc/admin/data-table-indexes.rst b/doc/admin/data-table-indexes.rst
diff --git a/doc/admin/index.rst b/doc/admin/index.rst
@@ -1,7 +1,5 @@
-.. warning::
 
-   **Information in this guide is known to be outdated.** A documentation sprint is underway which will
-   include updates and revisions to this guide.
+.. _admin:
 
 #####################
 Administrator's Guide
@@ -11,3 +9,5 @@ Administrator's Guide
    :maxdepth: 4
 
    k8s
+   row-counters
+   data-table-indexes
diff --git a/doc/admin/row-counters.rst b/doc/admin/row-counters.rst
@@ -0,0 +1,176 @@
+
+.. _admin-row-counters:
+
+=========================
+Row counters optimization
+=========================
+
+.. _admin-row-counters-intro:
+
+Introduction
+------------
+
+Shortly after the first public instances of Qserv were made available to the users, it was observed that many users
+were launching the following query:
+
+.. code-block:: sql
+
+    SELECT COUNT(*) FROM <database>.<table>
+
+Normally, Qserv would process the query by broadcasting the query to all workers to count rows at each chunk
+table and aggregate results into a single number at Czar. This is basically the very same mechanism that is found
+behind the *shared scan* (or just *scan*) queries. The performnce of the *scan* queries is known to vary depending on
+the following factors:
+
+- the number of chunks in the table of interest
+- the number of workers
+- and the presence of other competing queries (especially the slow ones)
+
+In the best case scenario such scan would take seconds, in the worst one - many minutes or even hours.
+This could quickly cause (and has caused) frustration among users since this query looks like (and in reality is)
+the very trivial non-scan query.
+
+To address this situation, Qserv has a built-in optimization that is targeting exactly this class of queries.
+Here is how it works. For each data table Qserv Czar would have an optional metadata table to store the number
+of rows for each chunk. The table is populated and managed by the Qserv Replication system.
+
+Note that this optimization is presently an option. And these are the reasons:
+
+-  Counter collection requires scanning all chunk tables, which would take time. Doing this during
+   the catalog *publishing* time would prolong the ingest time and increase the chances of instabilities
+   for the workflows (in general, the longer some operation is going - the higher the probability of runing into
+   the infrastructure-related faulures).
+- The counters are not needed for the purposes of the data ingest *per se*. These are just optimizations for the queries.
+- Building the counters before the ingested data have been Q&A-ed may not be a good idea.
+- The counters may need to be rebuilt if the data have been changed (after fix ups to the ingested catalogs)
+
+The rest of this section along with the formal description of the corresponding REST services explains how to build
+and manage the counters.
+
+.. note::
+
+    In the future, the per-chunk counters will be used for optimizing another class of the unconditional queries
+    presented below:
+
+    .. code-block:: sql
+
+        SELECT * FROM <database>.<table> LIMIT <N>
+        SELECT `col`,`col2` FROM <database>.<table> LIMIT <N>
+
+    For these "indiscriminate" data probes Qserv would dispatch chunk queries to a subset of random chunks that have enough
+    rows to satisfy the requirements specified in ``LIMIT <N>``.
+
+
+.. _admin-row-counters-build:
+
+Building and deploying
+----------------------
+
+.. warning::
+
+    Depending on a scale of a catalog (data size of the affected table), it may take a while before this operation
+    will be complete.
+
+.. note::
+
+    Please, be advised that the very same operation could be performed at the catalog publishing time as explained in:
+
+    - :ref:`ingest-db-table-management-publish-db` (REST)
+
+    The choice of doing this at the catalog publishing time, or doing this as a separate operation explained in this document
+    is left to the Qserv administrators or developers of the ingest workflows. The general recommendation is to make it
+    a separate stage of the ingest workflow. In this case, the overall transition time of a catalog to the final published
+    state would be faster. In the end, the row counters optimization is optional, and it doesn't affect the overall
+    functionality of Qserv or query results seen by users.
+
+To build and deploy the counters one would need to use the following REST service:
+
+- :ref:`ingest-row-counters-deploy` (REST)
+
+The service needs to be invoked for every table of the ingested catalog. This is the typical example of using this service
+that would work regardless if the very same operation was already done before:
+
+.. code-block:: bash
+
+    curl http://localhost:25080/ingest/table-stats \
+      -X POST -H "Content-Type: application/json" \
+      -d '{"database":"test101",
+           "table":"Object",
+           "overlap_selector":"CHUNK_AND_OVERLAP",
+           "force_rescan":1,
+           "row_counters_state_update_policy":"ENABLED",
+           "row_counters_deploy_at_qserv":1,
+           "auth_key":""}'
+
+This would work for tables of any type: *director*, *dependent*, *RefMatch*, or *regular* (fully replicated). If the counters
+already existed in the Replication system's database, they would still be rescanned and redeployed in there.
+
+It may be a good idea to compare the performance of Qserv for executing the above-mentioned queries before and after running
+this operation. Normally, if the table statistics are available at Qserv, it should take a small fraction of
+a second (about 10 milliseconds) to see the result on the lightly loaded Qserv.
+
+.. _admin-row-counters-delete:
+
+Deleting
+--------
+
+Sometimes, if there is doubt that the row counters were incorrectly scanned, or when Q&A-in the ingested catalog,
+a data administrator may want to remove the counters and let Qserv do the full scan of the table instead. This can be done
+by using the following REST service:
+
+
+- :ref:`ingest-row-counters-delete` (REST)
+
+Likewise, the previously explained service, this one should also be invoked for each table needing attention. Here is
+an example:
+
+.. code-block:: bash
+
+    curl http://localhost:25080/ingest/table-stats/test101/Object \
+      -X DELETE -H "Content-Type: application/json" \
+      -d '{"overlap_selector":"CHUNK_AND_OVERLAP","qserv_only":1,"auth_key":""}'
+
+Note that with a combination of the parameters shown above, the statistics will be removed from Qserv only.
+So, the system would not need to rescan the tables again should the statistics need to be rebuilt. The counters could be simply
+redeployed later at Qserv. To remove the counters from the Replication system's persistent state as well
+the request should have ``qserv_only=0``.
+
+An alternative technique explained in the next section is to tell Qserv not to use the counters for optimizing queries.
+
+
+.. _admin-row-counters-disable:
+
+Disabling the optimization at run-time
+---------------------------------------
+
+.. warning::
+
+    This is a global setting that affects all users of Qserv. All new quries will be ru w/o the optimization.
+    It should be used with caution. Normally, it is meant to be used by the Qserv data administrator to investigate
+    suspected issues with Qserv or the catalogs it serves.
+
+To complement the previously explained methods for scanning, deploying, or deleting row counters for query optimization,
+Qserv also supports the run-time switch. The switch is turned on or off by the following statement to be submitted via
+the front-ends of Qserv:
+
+.. code-block:: sql
+
+    SET GLOBAL QSERV_ROW_COUNTER_OPTIMIZATION = 1
+    SET GLOBAL QSERV_ROW_COUNTER_OPTIMIZATION = 0
+
+
+The default behavior of Qserv when the variable is not set is to enable the optimization for tables where the counters
+are available.
+
+.. _admin-row-counters-retrieve:
+
+Inspecting
+----------
+
+It's also possible to retrieve the counters from the Replication system's state using the following REST service:
+
+- :ref:`ingest-row-counters-inspect` (REST)
+
+The information obtained in this way could be used for various purposes, such as investigating suspected issues with
+the counters, monitoring data placement in the chunks, or making visual representations of the chunk density maps.
+See the description of the REST service for further details on this subject.
diff --git a/doc/conf.py b/doc/conf.py
@@ -42,6 +42,7 @@
     r"^https://rubinobs.atlassian.net/wiki/",
     r"^https://rubinobs.atlassian.net/browse/",
     r"^https://www.slac.stanford.edu/",
+    r".*/_images/",
 ]
 
 html_additional_pages = {

diff --git a/doc/ingest/api/advanced/index.rst b/doc/ingest/api/advanced/index.rst
@@ -0,0 +1,44 @@
+
+.. _ingest-api-advanced:
+
+==================
+Advanced Scenarios
+==================
+
+.. _ingest-api-advanced-multiple-transactions:
+
+Multiple transactions
+----------------------
+
+.. _ingest-api-advanced-unpublishing-databases:
+
+Ingesting tables into the published catalogs
+--------------------------------------------
+
+.. _ingest-api-advanced-transaction-abort:
+
+Aborting transactions
+----------------------
+
+When should a transaction be aborted?
+What happens when a transaction is aborted?
+How long does it take to abort a transaction?
+How to abort a transaction?
+What to do if a transaction cannot be aborted?
+What happens if a transaction is aborted while contributions are being processed?
+Will the transaction be aborted if the Master Replication Controller is restarted?
+Is it possible to have unfinished transactions in the system when publishing a catalog?
+
+.. _ingest-api-advanced-contribution-requests:
+
+Options for making contribution requests
+----------------------------------------
+
+.. _ingest-api-advanced-server-config:
+
+Ingest server configuration options
+------------------------------------
+
+Explain what configuration options are available for the Ingest server:
+
+- safe