Skip to content

Commit

Permalink
IMPALA-13257: [DOCS] Documentation for unnest() and querying arrays
Browse files Browse the repository at this point in the history
Currently, the two topics, Querying Arrays and Zipping Unnest on
Arrays from Views, were missing.

The documentation has been added, and the parent topic has been
updated with references to the child topics.

Change-Id: I3ad29153bf6ed3939fb1d87d6220bd22f8f7fa1b
Reviewed-on: http://gerrit.cloudera.org:8080/21651
Tested-by: Impala Public Jenkins <[email protected]>
Reviewed-by: Quanlong Huang <[email protected]>
  • Loading branch information
m-sanjana19 authored and stiga-huang committed Aug 13, 2024
1 parent 6736063 commit 6edbf57
Show file tree
Hide file tree
Showing 4 changed files with 219 additions and 3 deletions.
5 changes: 4 additions & 1 deletion docs/impala.ditamap
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,10 @@ under the License.
</topicref>
<topicref href="topics/impala_tinyint.xml"/>
<topicref href="topics/impala_varchar.xml"/>
<topicref href="topics/impala_complex_types.xml"/>
<topicref href="topics/impala_complex_types.xml">
<topicref href="topics/impala_queryingarrays.xml"/>
<topicref href="topics/impala_unnest_views.xml"/>
</topicref>
</topicref>
<topicref href="topics/impala_literals.xml"/>
<topicref href="topics/impala_operators.xml"/>
Expand Down
5 changes: 3 additions & 2 deletions docs/topics/impala_complex_types.xml
Original file line number Diff line number Diff line change
Expand Up @@ -45,8 +45,6 @@ under the License.
and higher. The Hive <codeph>UNION</codeph> type is not currently supported.
</p>

<p outputclass="toc inpage"/>

<p>
Once you understand the basics of complex types, refer to the individual type topics when you need to refresh your memory about syntax
and examples:
Expand All @@ -65,6 +63,9 @@ under the License.
<xref href="impala_map.xml#map"/>
</li>
</ul>
<p outputclass="toc inpage"/>
<p>For information on querying arrays and improvements to zipping unnest functionality, refer to
the following:</p>

</conbody>

Expand Down
132 changes: 132 additions & 0 deletions docs/topics/impala_queryingarrays.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_queryingarrays" rev="4.1.0">
<title>Querying arrays (<keyword keyref="impala41"/> or higher only)</title>
<titlealts audience="PDF">
<navtitle>Querying arrays</navtitle>
</titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Data Types"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Schemas"/>
</metadata>
</prolog>
<conbody>
<p rev="4.1.0">
<indexterm audience="hidden">Querying arrays</indexterm> Describes how to use UNNEST function
to query arrays. ARRAY data types represent collections with arbitrary numbers of elements,
where each element is the same type.</p>
<section id="section_yl4_2qb_3cc">
<title>Querying arrays using JOIN and UNNEST</title>
<p>You can query arrays by making a join between the table and the array inside the table.
This approach is improved with the introduction of the <codeph>UNNEST</codeph> function in
the <codeph>SELECT</codeph> list or in the <codeph>FROM</codeph> clause in the
<codeph>SELECT</codeph> statement. When you use <codeph>UNNEST</codeph>, you can provide
more than one array in the <codeph>SELECT</codeph> statement. If you use JOINs for querying
arrays it will yield a <term>joining unnest</term> however the latter will provide a
<term>zipping unnest</term>.</p>
</section>
<section id="section_hmf_hqb_3cc">
<title>Example of querying arrays using JOIN</title>
<p>Use <codeph>JOIN</codeph> in cases where you must join unnest of multiple arrays. However
if you must zip unnest then use the newly implemented <codeph>UNNEST</codeph> function.</p>
<p>Here is an example of a <codeph>SELECT</codeph> statement that uses JOINs to query an
array.</p>
<codeblock id="codeblock_uqy_rqb_3cc">SELECT id, arr1.item, arr2.item FROM tbl_name tbl, tbl.arr1, tbl.arr2;

ID, ARR1.ITEM, ARR2.ITEM
[1, 1, 10]
[1, 1, 11]
[1, 2, 10]
[1, 2, 11]
[1, 3, 10]
[1, 3, 11]
</codeblock>
<note id="note_tpb_wln_htb">
<p>The test data used in this example is ID: 1, arr1: {1, 2, 3}, arr2: {10, 11}</p>
</note>
</section>
<section id="section_ipq_tqb_3cc">
<title>Examples of querying arrays using UNNEST</title>
<p>You can use one of the two different syntaxes shown here to unnest multiple arrays in one
query. This results in the items of the arrays being zipped together instead of joining.</p>
<ul id="ul_jpq_tqb_3cc">
<li>ISO:SQL 2016 compliant syntax:
<codeblock id="codeblock_kpq_tqb_3cc">SELECT a1.item, a2.item
FROM complextypes_arrays t, UNNEST(t.arr1, t.arr2) AS (a1, a2);
</codeblock></li>
<li>Postgres compatible
syntax:<codeblock id="codeblock_yl5_3mn_htb">SELECT UNNEST(arr1), UNNEST(arr2) FROM complextypes_arrays;</codeblock></li>
</ul>
<p><b>Unnest operator in SELECT list</b></p>
<codeblock id="codeblock_lpq_tqb_3cc">SELECT id, unnest(arr1), unnest(arr2) FROM tbl_name;</codeblock>
<p><b>Unnest operator in FROM clause</b></p>
<codeblock id="codeblock_mpq_tqb_3cc">SELECT id, arr1.item, arr2.item FROM tbl_name tbl_alias, UNNEST(tbl_alias.arr1, tbl_alias.arr2);</codeblock>
<p>This new functionality would zip the arrays next to each other as shown here. </p>
<codeblock id="codeblock_npq_tqb_3cc">ID, ARR1.ITEM, ARR2.ITEM
[1, 1, 10]
[1, 2, 11]
[1, 3, NULL]
</codeblock>
<p>Note, that arr2 is shorter than arr1 so the "missing" items in its column will be filled
with NULLs.</p>
<note id="note_opq_tqb_3cc">The test data used in this example is ID: 1, arr1: {1, 2, 3},
arr2: {10, 11}</note>
</section>
<section id="section_i1g_wqb_3cc">
<title>Limitations in Using UNNEST</title>
<p>
<ul id="ul_j1g_wqb_3cc">
<li>Only arrays from the same table can be zipping unnested</li>
<li>The old (joining) and the new (zipping) unnests cannot be used together</li>
<li>You can add a <codeph>WHERE</codeph> filter on an unnested item only if you add a
wrapper <codeph>SELECT</codeph> and do the filtering
<p>Example:</p><codeblock id="codeblock_k1g_wqb_3cc">SELECT id, arr1_unnest FROM (SELECT id, unnest(arr1) as arr1_unnest FROM tbl_name) WHERE arr1_unnest &lt; 10;</codeblock></li>
</ul>
</p>
</section>
<section id="section_ewb_yqb_3cc">
<title>Using ARRAY columns in the SELECT list</title>
<!--Removing this since zipping unnest for arrays (IMPALA-10920) and allow array type in SELECT list (IMPALA-9498) are all added in the same upstream release (Impala 4.1)
<p>Prior to this release to look into the content of an array you had to unnest the array
either by the joining syntax or by using the zipping <codeph>UNNEST</codeph> operator as
shown in the following example:</p>
<codeblock id="codeblock_fwb_yqb_3cc">SELECT unnest(IDs), unnest(NAMES) FROM table_name;</codeblock>
-->
<p rev="4.1">Impala 4.1 adds support to return <codeph>ARRAYs</codeph> as
<codeph>STRINGs</codeph> (<term>JSON arrays</term>) in the <codeph>SELECT</codeph> list,
for example: </p>
<codeblock id="codeblock_gwb_yqb_3cc">select id, int_array from functional_parquet.complextypestbl where id = 1;
returns: 1, “[1,2,3]”
</codeblock>
<p>Returning <codeph>ARRAYs</codeph> from inline or Hive Metastore views is also supported.
These arrays can be used both in the select list or as relative table references.</p>
<codeblock id="codeblock_hwb_yqb_3cc">select id, int_array from (select id, int_array from complextypestbl) s;</codeblock>
<p>Though <codeph>STRUCTs</codeph> are already supported, <codeph>ARRAYs</codeph> and
<codeph>STRUCTs</codeph> nested within each other are not supported yet. Using them as
non-relative table references is also not supported yet.</p>
</section>
</conbody>
</concept>
80 changes: 80 additions & 0 deletions docs/topics/impala_unnest_views.xml
Original file line number Diff line number Diff line change
@@ -0,0 +1,80 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->
<!DOCTYPE concept PUBLIC "-//OASIS//DTD DITA Concept//EN" "concept.dtd">
<concept id="impala_unnest_views" rev="4.1.0">
<title>Zipping unnest on arrays from views (<keyword keyref="impala41"/> or higher only)</title>
<titlealts audience="PDF">
<navtitle>Zipping unnest on arrays from views</navtitle>
</titlealts>
<prolog>
<metadata>
<data name="Category" value="Impala"/>
<data name="Category" value="Impala Data Types"/>
<data name="Category" value="SQL"/>
<data name="Category" value="Data Analysts"/>
<data name="Category" value="Developers"/>
<data name="Category" value="Schemas"/>
</metadata>
</prolog>
<conbody>
<p rev="4.1.0">
<indexterm audience="hidden">Zipping unnest on arrays from views</indexterm>
<keyword keyref="impala41"/>, the zipping unnest functionality works for arrays in both tables
and
views.<codeblock id="codeblock_nwz_lyc_3cc">SELECT UNNSET(arr1) FROM view_name;</codeblock>
</p>
<section id="section_irf_rrb_3cc">
<title>UNNEST() on array columns</title>
<p>You can use <codeph>UNNEST()</codeph> on array columns in two ways. Using one of these two
ways results in the items of the arrays being zipped together instead of joining.</p>
<ul id="ul_jrf_rrb_3cc">
<li>ISO:SQL 2016 compliant syntax</li>
</ul>
<codeblock id="codeblock_krf_rrb_3cc">SELECT a1.item, a2.item
FROM complextypes_arrays t, UNNEST(t.arr1, t.arr2) AS (a1, a2);
</codeblock>
<ul id="ul_lrf_rrb_3cc">
<li>Postgres compatible syntax</li>
</ul>
<codeblock id="codeblock_mrf_rrb_3cc">SELECT UNNEST(arr1), UNNEST(arr2) FROM complextypes_arrays;</codeblock>
<p>When unnesting multiple arrays with zipping unnest, the i'th item of one array will be
placed next to the i'th item of the other arrays in the results. If the size of the arrays
is not equal then the shorter arrays will be filled with NULL values up to the size of the
longest array as shown in the following example:</p>
<p>The test data used in this example is arr1: {1, 2, 3}, arr2: {11, 12}</p>
<p>After running any of the queries listed in the examples, the result will be as shown
here:</p>
<p>
<codeblock id="codeblock_n5z_cjq_mtb">arr1 arr2
[1, 11]
[2, 12]
[3, NULL]
</codeblock>
</p>
</section>
<section id="section_ifc_trb_3cc">
<title>Example of an UNNEST() in an inline view using SELECT/FILTER of the inline view</title>
<p>In the following example the filter is not in the <codeph>SELECT</codeph> query that
creates the inline view but a level above that.</p>
<codeblock id="codeblock_jfc_trb_3cc">SELECT id, ac1, ac2 FROM (SELECT id, UNNEST(array_col1) AS ac1, UNNEST(array_col2) AS ac2 FROM some_view) WHERE id &lt;10;
</codeblock>
</section>
</conbody>
</concept>

0 comments on commit 6edbf57

Please sign in to comment.