Skip to content

Commit

Permalink
Fixed code and results typos in seventeen files and corresp in code s…
Browse files Browse the repository at this point in the history
…amples (yugabyte#7165)
  • Loading branch information
bllewell authored Feb 8, 2021
1 parent eb01013 commit 8fe53c1
Show file tree
Hide file tree
Showing 17 changed files with 138 additions and 130 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ This is the result:
20 | 40
```

When data-changing CTEs (`INSERT`, `UPDATE` , and `DELETE` are named in a CTE, and, when these use a `RETURNING` clause, the returned values can be used in other CTEs and in the overall statement's final [sub]statement. Here os an example.
When a data-changing [sub]statement (`INSERT`, `UPDATE` , or `DELETE`) is named in a CTE, and, when it uses a `RETURNING` clause, the returned values can be used in other CTEs and in the overall statement's final [sub]statement. Here is an example.

```plpgsql
set client_min_messages = warning;
Expand Down Expand Up @@ -103,9 +103,9 @@ This is the result:
t2 | 20 | 40
```

The central notion is that each CTE that you name in a `WITH` clause can then be invoked by its name, either in a subsequent CTE in that `WITH` clause or in the overall statement's final, main, [sub]statement. In this way, a CTE is analogous, in the declarative programming domain of SQL, to a procedure or a function in the imperative programming domain of a procedural language, bringing the modular programming benefit of hiding names, and the implementations that they stand for, from scopes that have no interest in them.
The central notion is that each CTE that you name in a `WITH` clause can then be invoked by its name, either in a subsequent CTE in that `WITH` clause or in the overall statement's final, main, [sub]statement. In this way, a CTE is analogous, in the declarative programming domain of SQL, to a procedure or a function in the domain of an "if-then-else" programming language, bringing the modular programming benefit of hiding names, and the implementations that they stand for, from scopes that have no interest in them.

Finally, the use of a _recursive_ CTE in a `WITH` clause enables graph analysis. For example, an _"employees"_ table often has a self-referential foreign key like _"manager_id"_ that points to the table's primary key, _"employee_id"_. `SELECT` statements that use a recursive CTE allow the reporting structure to be presented in various ways. This result shows the reporting paths of employees, in an organization with a strict hierarchical reporting scheme, in depth-first order. See the section [Pretty-printing the top-down depth-first report of paths](./emps-hierarchy/#pretty-printing-the-top-down-depth-first-report-of-paths).
Finally, the use of a _recursive_ CTE in a `WITH` clause enables advanced functionality, like graph analysis. For example, an _"employees"_ table often has a self-referential foreign key like _"manager_id"_ that points to the table's primary key, _"employee_id"_. `SELECT` statements that use a recursive CTE allow the reporting structure to be presented in various ways. This result shows the reporting paths of employees, in an organization with a strict hierarchical reporting scheme, in depth-first order. See the section [Pretty-printing the top-down depth-first report of paths](./emps-hierarchy/#pretty-printing-the-top-down-depth-first-report-of-paths).

```
emps hierarchy
Expand Down Expand Up @@ -145,7 +145,7 @@ Anyway, the ordinary good practice principle holds even more here: always check

{{< tip title="Download a zip of WITH clause demonstration scripts" >}}

The [`recursive-cte-code-examples.zip`](https://raw.githubusercontent.com/yugabyte/yugabyte-db/master/sample/recursive-cte-code-examples/recursive-cte-code-examples.zip) file contains the `.sql` scripts that illustrate the use off the [recursive CTE](./recursive-cte/):
The [`recursive-cte-code-examples.zip`](https://raw.githubusercontent.com/yugabyte/yugabyte-db/master/sample/recursive-cte-code-examples/recursive-cte-code-examples.zip) file contains the `.sql` scripts that illustrate the use of the [recursive CTE](./recursive-cte/):

- [Case study—Using a recursive CTE to traverse an employee hierarchy](./emps-hierarchy/)

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@ The Bacon Numbers problem is conventionally formulated in the context of the dat
- _each actor must act in at least one movie_
- _each movie's cast must list at least one actor_

The actors are the nodes in an undirected cyclic graph. There is an edge between two actors when the both have acted together in at least one movie.
The actors are the nodes in an undirected cyclic graph. There is an edge between two actors when they both have acted together in at least one movie.

The ERD implies the conventional three-table representation with an _"actors_ table, a _"movies_table"_, and a _"cast_members"_ intersection table. Create them with this script:

Expand Down Expand Up @@ -62,10 +62,12 @@ create table cast_members(

Of course, the IMDb has facts like _date of birth_, _nationality_, and so on for the actors and like _release date_, _language_ and so on for the movies. The information would doubtless allow the _"cast_members"_ table to have columns like _"character_name"_. The data that this case study uses happen to include the movie release date, in parentheses, after the movie name in a single text field. The pedagogy is sufficiently served without parsing out these two facts into separate columns in the _"movies"_ table.

Notice that the notion of a graph is so far only implied. A derived _"edges"_ table makes the graph explicit. An edge exists between a pair of actors if they are both on the cast list of one or more movies. The SQL needed to populate the _"edges"_ table from the _"cast_members"_ table is straightforward.
Notice that the notion of a graph is so far only implied. A derived _"edges"_ table makes the graph explicit. An edge exists between a pair of actors if they are both on the cast list of the same one or more movies. The SQL needed to populate the _"edges"_ table from the _"cast_members"_ table is straightforward.

When the paths have been found, it's useful to be able to annotate each edge with the list of movies that are responsible for its existence. The annotation code could, of course, derive this information dynamically. But it simplifies the overall coding scheme if a denormalization is adopted to annotate the paths at the time that they are discovered. Another departure from strict purity simplifies the overall coding scheme further. If the row for the edge between a particular pair of actors records the _list_ of movies that brought it (rather than recording many edges, each with a single-valued _"movie"_ attribute), then the path-tracing code that the section [Using a recursive CTE to traverse graphs of all kinds](../traversing-general-graphs/) presented can be used "as is". To this end, the columns that represent the actor pair in the _"edges"_ table are called _"node_1"_ and "_node_2"_ rather than the more natural _"actor_1"_ and _"actor_2"_.

**Note:** The previous paragraph was stated as something of a sketch. In fact, each edge between a pair of actors is recorded twice—once in each direction, as is described in the section [Graph traversal using the denormalized "edges" table design](../traversing-general-graphs/undirected-cyclic-graph/#graph-traversal-using-the-denormalized-edges-table-design). Each of the edges in such a pair is annotated with the same list of movies.

This code creates the _"edges"_ table and the procedure that populates it.

##### `cr-actors-movies-edges-table-and-proc.sql`
Expand Down Expand Up @@ -223,7 +225,7 @@ $body$;

The section [Computing Bacon Numbers for a small set of synthetic actors and movies data](./synthetic-data/) demonstrates the approach using a small data set.

The section [Computing Bacon Numbers for real IMDb data](./imdb-data/) shows how to ingest the raw _"imdb.small.txt"_ file into the same representation that was used for the synthetic data. (The subsection [Download and ingest some IMDb data](./imdb-data/#download-and-ingest-some-imdb-data) explains how to download the IMDb subset that this case study uses.)
The section [Computing Bacon Numbers for real IMDb data](./imdb-data/) shows how to ingest the raw `imdb.small.txt` file into the same representation that was used for the synthetic data. (The subsection [Download and ingest some IMDb data](./imdb-data/#download-and-ingest-some-imdb-data) explains how to download the IMDb subset that this case study uses.)

While a straightforward use of a recursive CTE can be used to produce the solution for the small synthetic data set quickly, it fails to complete before crashing (see the section [Stress testing different find_paths() implementations on maximally connected graphs](../traversing-general-graphs/stress-test/)) when it's applied to the ingested *"imdb.small.txt"* data. The approach described in the [How to implement early path pruning](../traversing-general-graphs/undirected-cyclic-graph/#how-to-implement-early-path-pruning) section comes to the rescue.
While a straightforward use of a recursive CTE can be used to produce the solution for the small synthetic data set quickly, it fails to complete before crashing (see the section [Stress testing different find_paths() implementations on maximally connected graphs](../traversing-general-graphs/stress-test/)) when it's applied to the ingested `imdb.small.txt` data. The approach described in the [How to implement early path pruning](../traversing-general-graphs/undirected-cyclic-graph/#how-to-implement-early-path-pruning) section comes to the rescue.

Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ All of the `.sql` scripts that this section presents for copy-and-paste at the `

[Download `recursive-cte-code-examples.zip`](https://raw.githubusercontent.com/yugabyte/yugabyte-db/master/sample/recursive-cte-code-examples/recursive-cte-code-examples.zip).

After unzipping it on a convenient new directory, you'll see a `README.txt`. It tells you how to start the master-script that invokes all the child scripts that jointly instantiate the small and then the large IMDb data sets and compute the Bacon Numbers on these. Simply start it in `ysqlsh`. You can run it time and again. It always finishes silently. You can see the report that it produces on a dedicated spool directory and confirm that your reports are identical to the reference copies that are delivered in the zip-file.
After unzipping it on a convenient new directory, you'll see a `README.txt`. It tells you how to start the Bacon Numbers master-script. This invokes all the child scripts that: (1) ingest the `imdb.small.txt` data set from the Oberlin College Computer Science department's [CSCI 151 Lab 10—Everything is better with Bacon](https://www.cs.oberlin.edu/~rhoyle/16f-cs151/lab10/index.html); and (2) solve the Bacon Numbers problem using this data set. (The `imdb.small.txt` file is included in the `.zip`.) Simply start it in `ysqlsh`. You can run it time and again. It always finishes silently. You can see the report that it produces on a dedicated spool directory and confirm that your reports are identical to the reference copies that are delivered in the zip-file.
{{< /tip >}}

## Download and ingest some IMDb data
Expand All @@ -31,7 +31,7 @@ Before trying the code in this section, make sure that you have created the supp

- All the code shown in the section [Common code for traversing all kinds of graph](../../traversing-general-graphs/common-code/). Be sure to chose the [cr-raw-paths-no-tracing.sql](../../traversing-general-graphs/common-code#cr-raw-paths-no-tracing-sql) option.

Use Internet search for "download IMDb data for Bacon Numbers". This brings you to useful subsets of the vast, entire, IMDb data set that have been prepared by, for example, university Computer Science departments specifically to support Bacon Numbers course assignments. The [Oberlin College Computer Science](https://www.cs.oberlin.edu/) department is a good source. Go to [CSCI 151 Lab 10—Everything is better with Bacon](https://www.cs.oberlin.edu/~rhoyle/16f-cs151/lab10/index.html) and look for this:
Use Internet search for "download IMDb data for Bacon Numbers". This brings you to useful subsets of the vast, entire, IMDb data set that have been prepared by, for example, university Computer Science departments specifically to support Bacon Numbers course assignments. The [Oberlin College Computer Science department](https://www.cs.oberlin.edu/) is a good source. Go to [CSCI 151 Lab 10—Everything is better with Bacon](https://www.cs.oberlin.edu/~rhoyle/16f-cs151/lab10/index.html) and look for this:

> "[imdb.small.txt](http://cs.oberlin.edu/~gr151/imdb/imdb.small.txt): a 1817 line file with just a handful of performers (161), fully connected"
Expand All @@ -49,7 +49,7 @@ Alec Guinness|Lovesick (1983)
...
```

After downloading it, rename it to `imdb-small.text` so that its name follows the same convention (a hyphen between the English words) that the all the other files that [`recursive-cte-code-examples.zip`](https://raw.githubusercontent.com/yugabyte/yugabyte-db/master/sample/recursive-cte-code-examples/recursive-cte-code-examples.zip) contains. The information content is identical to what the _"cast_members"_ table was designed to represent. (See the [`cr-actors-movies-cast-members-tables.sql`](../../bacon-numbers#cr-actors-movies-cast-members-tables-sql) script.) This means that the `ysqlsh` metacommand `\COPY` can be used, straightforwardly, to ingest the downloaded data.
The information content is identical to what the _"cast_members"_ table was designed to represent. (See the [`cr-actors-movies-cast-members-tables.sql`](../../bacon-numbers#cr-actors-movies-cast-members-tables-sql) script.) This means that the `ysqlsh` metacommand `\COPY` can be used, straightforwardly, to ingest the downloaded data.

However, as this script creates it, the _"actors"_ and _"movies"_ tables must be populated first so that the foreign key constraints to these from the _"cast_members"_ table will be satisfied when it its populated. Use this script to do the steps in the proper order and display the resulting table counts:

Expand All @@ -67,7 +67,7 @@ create table cast_members(
movie text not null,
constraint imdb_facts_pk primary key(actor, movie));

\copy cast_members(actor, movie) from 'imdb-small.txt' with delimiter '|';
\copy cast_members(actor, movie) from 'imdb.small.txt' with delimiter '|';

insert into actors select distinct actor from cast_members;
insert into movies select distinct movie from cast_members;
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ Before trying the code in this section, make sure that you have created the supp

- The _"edges"_ table and the procedure to populate it from the _"cast_members"_ table—see [`cr-actors-movies-edges-table-and-proc-sql`](../../bacon-numbers#cr-actors-movies-edges-table-and-proc-sql)

- All the code shown in the section [Common code for traversing all kinds of graph](../../traversing-general-graphs/common-code/). Be sure to chose the [cr-raw-paths-with-tracing.sql](../../traversing-general-graphs/common-code#cr-raw-paths-with-tracing-sql) option.
- All the code shown in the section [Common code for traversing all kinds of graph](../../traversing-general-graphs/common-code/). Be sure to choose the [cr-raw-paths-with-tracing.sql](../../traversing-general-graphs/common-code#cr-raw-paths-with-tracing-sql) option.

{{< tip title="Download a zip of scripts that include all the code examples that implement this case study" >}}

Expand All @@ -35,9 +35,7 @@ Here is a depiction of the synthetic data that this section uses:

![undirected-cyclic-graph](/images/api/ysql/the-sql-language/with-clause/bacon-numbers/actors-movies-1.jpg)

It has six nodes and nine edges.

Before trying any of the code in this section, make sure that you have installed all the code shown in the section [Common code for traversing all kinds of graph](../../traversing-general-graphs/common-code/). Then use this script to insert the synthetic data:
It has six nodes and nine edges. Use this script to insert the synthetic data:

##### `insert-synthetic-data.sql`

Expand Down Expand Up @@ -169,7 +167,7 @@ This is the result:
Twelfth Night
```

List the edges that have _"node_1 < node_1"_:
List the edges that have _"node_1 < node_2"_:

```plpgsql
select
Expand Down Expand Up @@ -304,7 +302,7 @@ The five paths are highlighted in red here:

![undirected-cyclic-graph](/images/api/ysql/the-sql-language/with-clause/bacon-numbers/actors-movies-2.jpg)

Now invoke _"find_paths()"_ _with_ early pruning and confirm that the result is identical to the produced by invoking in _without_ early pruning and then deriving the shortest paths:
Now invoke _"find_paths()"_ _with_ early pruning and confirm that the result is identical to that produced by invoking in _without_ early pruning and then deriving the shortest paths:

```plpgsql
call find_paths('Emily', true);
Expand All @@ -327,7 +325,7 @@ This is the result:
2 3 Emily > Chloe > Helen
3 3 Emily > James > Steve
```
The first and second result both contain _"Emily > Chloe"_ and the third contains _"Emily > James"_. So all five paths are accounted for.
The first result and the second result each contains _"Emily > Chloe"_ and the third contains _"Emily > James"_. So all five paths are accounted for.

## Decorate the path edges with the list of movies that brought each edge

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -33,7 +33,7 @@ After unzipping it on a convenient new directory, you'll see a `README.txt`. It

## Create and populate the "emps" table

The following code creates a table of employees that records each employee's manager this way. This is a stylized example where _"name"_ serves as the primary key column, _"mgr_name"_ serves as the foreign key column, and there are no other columns.
The following code creates a table of employees that records each employee's manager using a self-referential foreign key as explained above. This is a stylized example where _"name"_ serves as the primary key column, _"mgr_name"_ serves as the self-referential foreign key column, and there are no other columns.

##### `cr-table.sql`

Expand Down Expand Up @@ -126,7 +126,7 @@ It cases this error:
Key (name)=(fred) is still referenced from table "emps".
```

This constraint, together with the fact that the "_mgr_name"_ column is nullable, guarantees the invariant that there is exactly one ultimate manager—in other words that the employee graph is a tree (a.k.a. hierarchy). Check it thus:
This constraint, together with the fact that the "_mgr_name"_ column is nullable, guarantees the invariant that there is exactly one ultimate manager—in other words that the employee graph is a rooted tree (a.k.a. hierarchy). Check it thus:

##### `do-assert-is-hierarchy.sql`

Expand Down Expand Up @@ -236,7 +236,7 @@ The blank lines were added by hand to make the results easier to read.

## List the path top-down from the ultimate manager to each employee in breadth first order

The term of art "path" denotes the list of managers from the ultimate manager through each next direct report down to the current employee. It is easily calculated by using array concatenation as described in the [The&nbsp;||&nbsp;operator](../../../datatypes/type_array/functions-operators/concatenation/#the-160-160-160-160-operator) subsection of the [Array data types and functionality](../../../datatypes/type_array/) major section. Yet again, "array" functionality comes to the rescue.
The term of art "path" denotes the list of managers from the ultimate manager through each next direct report down to the current employee. It is easily calculated by using array concatenation as described in [the&nbsp;||&nbsp;operator](../../../datatypes/type_array/functions-operators/concatenation/#the-160-160-160-160-operator) subsection of the [Array data types and functionality](../../../datatypes/type_array/) major section. Yet again, "array" functionality comes to the rescue.

##### `cr-view-top-down-paths.sql`

Expand Down Expand Up @@ -621,8 +621,8 @@ with
union all

-- Recursive term.
-- Treat the emps from the previous iteration as reports.
-- Join these with their managers.
-- Treat the employee from the previous iteration as a report.
-- Join this with its manager.
-- Increase the depth with each step upwards.
-- Stop when the current putative report has no manager, i.e. is
-- the ultimate manager.
Expand Down
Loading

0 comments on commit 8fe53c1

Please sign in to comment.