-
Notifications
You must be signed in to change notification settings - Fork 503
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
IMPALA-13257: [DOCS] Documentation for unnest() and querying arrays
Currently, the two topics, Querying Arrays and Zipping Unnest on Arrays from Views, were missing. The documentation has been added, and the parent topic has been updated with references to the child topics. Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b Reviewed-on: http://gerrit.cloudera.org:8080/21651 Tested-by: Impala Public Jenkins <[email protected]> Reviewed-by: Quanlong Huang <[email protected]>
- Loading branch information
1 parent
6736063
commit 6edbf57
Showing
4 changed files
with
219 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> | ||
<concept id="impala_queryingarrays" rev="4.1.0"> | ||
<title>Querying arrays (<keyword keyref="impala41"/> or higher only)</title> | ||
<titlealts audience="PDF"> | ||
<navtitle>Querying arrays</navtitle> | ||
</titlealts> | ||
<prolog> | ||
<metadata> | ||
<data name="Category" value="Impala"/> | ||
<data name="Category" value="Impala Data Types"/> | ||
<data name="Category" value="SQL"/> | ||
<data name="Category" value="Data Analysts"/> | ||
<data name="Category" value="Developers"/> | ||
<data name="Category" value="Schemas"/> | ||
</metadata> | ||
</prolog> | ||
<conbody> | ||
<p rev="4.1.0"> | ||
<indexterm audience="hidden">Querying arrays</indexterm> Describes how to use UNNEST function | ||
to query arrays. ARRAY data types represent collections with arbitrary numbers of elements, | ||
where each element is the same type.</p> | ||
<section id="section_yl4_2qb_3cc"> | ||
<title>Querying arrays using JOIN and UNNEST</title> | ||
<p>You can query arrays by making a join between the table and the array inside the table. | ||
This approach is improved with the introduction of the <codeph>UNNEST</codeph> function in | ||
the <codeph>SELECT</codeph> list or in the <codeph>FROM</codeph> clause in the | ||
<codeph>SELECT</codeph> statement. When you use <codeph>UNNEST</codeph>, you can provide | ||
more than one array in the <codeph>SELECT</codeph> statement. If you use JOINs for querying | ||
arrays it will yield a <term>joining unnest</term> however the latter will provide a | ||
<term>zipping unnest</term>.</p> | ||
</section> | ||
<section id="section_hmf_hqb_3cc"> | ||
<title>Example of querying arrays using JOIN</title> | ||
<p>Use <codeph>JOIN</codeph> in cases where you must join unnest of multiple arrays. However | ||
if you must zip unnest then use the newly implemented <codeph>UNNEST</codeph> function.</p> | ||
<p>Here is an example of a <codeph>SELECT</codeph> statement that uses JOINs to query an | ||
array.</p> | ||
<codeblock id="codeblock_uqy_rqb_3cc">SELECT id, arr1.item, arr2.item FROM tbl_name tbl, tbl.arr1, tbl.arr2; | ||
|
||
ID, ARR1.ITEM, ARR2.ITEM | ||
[1, 1, 10] | ||
[1, 1, 11] | ||
[1, 2, 10] | ||
[1, 2, 11] | ||
[1, 3, 10] | ||
[1, 3, 11] | ||
</codeblock> | ||
<note id="note_tpb_wln_htb"> | ||
<p>The test data used in this example is ID: 1, arr1: {1, 2, 3}, arr2: {10, 11}</p> | ||
</note> | ||
</section> | ||
<section id="section_ipq_tqb_3cc"> | ||
<title>Examples of querying arrays using UNNEST</title> | ||
<p>You can use one of the two different syntaxes shown here to unnest multiple arrays in one | ||
query. This results in the items of the arrays being zipped together instead of joining.</p> | ||
<ul id="ul_jpq_tqb_3cc"> | ||
<li>ISO:SQL 2016 compliant syntax: | ||
<codeblock id="codeblock_kpq_tqb_3cc">SELECT a1.item, a2.item | ||
FROM complextypes_arrays t, UNNEST(t.arr1, t.arr2) AS (a1, a2); | ||
</codeblock></li> | ||
<li>Postgres compatible | ||
syntax:<codeblock id="codeblock_yl5_3mn_htb">SELECT UNNEST(arr1), UNNEST(arr2) FROM complextypes_arrays;</codeblock></li> | ||
</ul> | ||
<p><b>Unnest operator in SELECT list</b></p> | ||
<codeblock id="codeblock_lpq_tqb_3cc">SELECT id, unnest(arr1), unnest(arr2) FROM tbl_name;</codeblock> | ||
<p><b>Unnest operator in FROM clause</b></p> | ||
<codeblock id="codeblock_mpq_tqb_3cc">SELECT id, arr1.item, arr2.item FROM tbl_name tbl_alias, UNNEST(tbl_alias.arr1, tbl_alias.arr2);</codeblock> | ||
<p>This new functionality would zip the arrays next to each other as shown here. </p> | ||
<codeblock id="codeblock_npq_tqb_3cc">ID, ARR1.ITEM, ARR2.ITEM | ||
[1, 1, 10] | ||
[1, 2, 11] | ||
[1, 3, NULL] | ||
</codeblock> | ||
<p>Note, that arr2 is shorter than arr1 so the "missing" items in its column will be filled | ||
with NULLs.</p> | ||
<note id="note_opq_tqb_3cc">The test data used in this example is ID: 1, arr1: {1, 2, 3}, | ||
arr2: {10, 11}</note> | ||
</section> | ||
<section id="section_i1g_wqb_3cc"> | ||
<title>Limitations in Using UNNEST</title> | ||
<p> | ||
<ul id="ul_j1g_wqb_3cc"> | ||
<li>Only arrays from the same table can be zipping unnested</li> | ||
<li>The old (joining) and the new (zipping) unnests cannot be used together</li> | ||
<li>You can add a <codeph>WHERE</codeph> filter on an unnested item only if you add a | ||
wrapper <codeph>SELECT</codeph> and do the filtering | ||
<p>Example:</p><codeblock id="codeblock_k1g_wqb_3cc">SELECT id, arr1_unnest FROM (SELECT id, unnest(arr1) as arr1_unnest FROM tbl_name) WHERE arr1_unnest < 10;</codeblock></li> | ||
</ul> | ||
</p> | ||
</section> | ||
<section id="section_ewb_yqb_3cc"> | ||
<title>Using ARRAY columns in the SELECT list</title> | ||
<!--Removing this since zipping unnest for arrays (IMPALA-10920) and allow array type in SELECT list (IMPALA-9498) are all added in the same upstream release (Impala 4.1) | ||
<p>Prior to this release to look into the content of an array you had to unnest the array | ||
either by the joining syntax or by using the zipping <codeph>UNNEST</codeph> operator as | ||
shown in the following example:</p> | ||
<codeblock id="codeblock_fwb_yqb_3cc">SELECT unnest(IDs), unnest(NAMES) FROM table_name;</codeblock> | ||
--> | ||
<p rev="4.1">Impala 4.1 adds support to return <codeph>ARRAYs</codeph> as | ||
<codeph>STRINGs</codeph> (<term>JSON arrays</term>) in the <codeph>SELECT</codeph> list, | ||
for example: </p> | ||
<codeblock id="codeblock_gwb_yqb_3cc">select id, int_array from functional_parquet.complextypestbl where id = 1; | ||
returns: 1, “[1,2,3]” | ||
</codeblock> | ||
<p>Returning <codeph>ARRAYs</codeph> from inline or Hive Metastore views is also supported. | ||
These arrays can be used both in the select list or as relative table references.</p> | ||
<codeblock id="codeblock_hwb_yqb_3cc">select id, int_array from (select id, int_array from complextypestbl) s;</codeblock> | ||
<p>Though <codeph>STRUCTs</codeph> are already supported, <codeph>ARRAYs</codeph> and | ||
<codeph>STRUCTs</codeph> nested within each other are not supported yet. Using them as | ||
non-relative table references is also not supported yet.</p> | ||
</section> | ||
</conbody> | ||
</concept> |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,80 @@ | ||
<?xml version="1.0" encoding="UTF-8"?> | ||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd"> | ||
<concept id="impala_unnest_views" rev="4.1.0"> | ||
<title>Zipping unnest on arrays from views (<keyword keyref="impala41"/> or higher only)</title> | ||
<titlealts audience="PDF"> | ||
<navtitle>Zipping unnest on arrays from views</navtitle> | ||
</titlealts> | ||
<prolog> | ||
<metadata> | ||
<data name="Category" value="Impala"/> | ||
<data name="Category" value="Impala Data Types"/> | ||
<data name="Category" value="SQL"/> | ||
<data name="Category" value="Data Analysts"/> | ||
<data name="Category" value="Developers"/> | ||
<data name="Category" value="Schemas"/> | ||
</metadata> | ||
</prolog> | ||
<conbody> | ||
<p rev="4.1.0"> | ||
<indexterm audience="hidden">Zipping unnest on arrays from views</indexterm> | ||
<keyword keyref="impala41"/>, the zipping unnest functionality works for arrays in both tables | ||
and | ||
views.<codeblock id="codeblock_nwz_lyc_3cc">SELECT UNNSET(arr1) FROM view_name;</codeblock> | ||
</p> | ||
<section id="section_irf_rrb_3cc"> | ||
<title>UNNEST() on array columns</title> | ||
<p>You can use <codeph>UNNEST()</codeph> on array columns in two ways. Using one of these two | ||
ways results in the items of the arrays being zipped together instead of joining.</p> | ||
<ul id="ul_jrf_rrb_3cc"> | ||
<li>ISO:SQL 2016 compliant syntax</li> | ||
</ul> | ||
<codeblock id="codeblock_krf_rrb_3cc">SELECT a1.item, a2.item | ||
FROM complextypes_arrays t, UNNEST(t.arr1, t.arr2) AS (a1, a2); | ||
</codeblock> | ||
<ul id="ul_lrf_rrb_3cc"> | ||
<li>Postgres compatible syntax</li> | ||
</ul> | ||
<codeblock id="codeblock_mrf_rrb_3cc">SELECT UNNEST(arr1), UNNEST(arr2) FROM complextypes_arrays;</codeblock> | ||
<p>When unnesting multiple arrays with zipping unnest, the i'th item of one array will be | ||
placed next to the i'th item of the other arrays in the results. If the size of the arrays | ||
is not equal then the shorter arrays will be filled with NULL values up to the size of the | ||
longest array as shown in the following example:</p> | ||
<p>The test data used in this example is arr1: {1, 2, 3}, arr2: {11, 12}</p> | ||
<p>After running any of the queries listed in the examples, the result will be as shown | ||
here:</p> | ||
<p> | ||
<codeblock id="codeblock_n5z_cjq_mtb">arr1 arr2 | ||
[1, 11] | ||
[2, 12] | ||
[3, NULL] | ||
</codeblock> | ||
</p> | ||
</section> | ||
<section id="section_ifc_trb_3cc"> | ||
<title>Example of an UNNEST() in an inline view using SELECT/FILTER of the inline view</title> | ||
<p>In the following example the filter is not in the <codeph>SELECT</codeph> query that | ||
creates the inline view but a level above that.</p> | ||
<codeblock id="codeblock_jfc_trb_3cc">SELECT id, ac1, ac2 FROM (SELECT id, UNNEST(array_col1) AS ac1, UNNEST(array_col2) AS ac2 FROM some_view) WHERE id <10; | ||
</codeblock> | ||
</section> | ||
</conbody> | ||
</concept> |