Document array type
iamlucaswolf committed Aug 5, 2024
1 parent 47f2de8 commit 0526779
Showing 3 changed files with 108 additions and 6 deletions.
84 changes: 84 additions & 0 deletions website/docs/sql/datatype/array.md
@@ -0,0 +1,84 @@
# Array Type

An array is a sequential collection of elements packed into a single SQL value.
Arrays are useful to model dependent information with an inherent ordering in a single row – for example, time-series data from longitudinal studies, or embedding vectors from machine learning models.

Generally, array values are denoted in curly braces with elements separated by commas, like so:

```
{'Lorem', 'ipsum', 'dolor', 'sit', 'amet'}
```


## Element Types and Nullability

Hyper's arrays are _strongly typed_:
All elements must be of the same type – the array's _element type_.
The element type is a defining part of the array's overall type, meaning that `array(integer)` is a different type than `array(boolean)` or `array(text)`.
Arrays can be built from all [atomic types](./index.md) available in Hyper.

Part of an array's element type is its nullability.
For example, `array(smallint)` is a different array type than `array(smallint not null)`.
Note that this is independent of the array's overall nullability.
The following four options all represent different types in Hyper:

|Type|Array nullable?|Elements nullable?|Possible values|
|---|---|---|---|
|`array(integer)`|yes|yes|`{}`, `{1,2,3}`, `{1,2,null}`, `null`|
|`array(integer not null)`|yes|no|`{}`, `{1,2,3}`, `null`|
|`array(integer) not null`|no|yes|`{}`, `{1,2,3}`, `{1,2,null}`|
|`array(integer not null) not null`|no|no|`{}`, `{1,2,3}`|

Array types can be converted using the conventional [cast syntax](../scalar_func/conversion.md).
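
As a minimal sketch of how these four types might be used, the statements below declare one column per combination and then cast between array types. The table and column names are hypothetical. How a cast behaves when it meets a `null` element while targeting a non-nullable element type is not covered here, so the example avoids that case.

```sql
-- Hypothetical table: one column per nullability combination listed above.
create table array_nullability_demo (
    a array(integer),                    -- array and elements may be null
    b array(integer not null),           -- array may be null, elements may not
    c array(integer) not null,           -- array may not be null, elements may
    d array(integer not null) not null   -- neither array nor elements may be null
);

-- Casting a string literal to a specific array type.
select '{1,2,3}'::array(integer not null);

-- Converting between two array types with the cast syntax.
select ('{1,2,3}'::array(integer))::array(integer not null);
```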

:::info
Non-nullable element types use less memory and enable optimizations for certain array operations. Users are therefore advised to use the most "restrictive" element type possible, if the semantics of the use case allow it.
:::

For nullable element types, there is an alternative shorthand bracket syntax of the form `type[]`. For example, `integer[]` and `array(integer)` refer to the same type.

Contrary to some common programming languages, the length (i.e., the number of elements) of an array in Hyper is not part of its type.
While all arrays inside a column must have the same element type, they can have different lengths.
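
The following sketch illustrates this with a single column that holds arrays of three different lengths. The table and column names are hypothetical, and the explicit casts are used only to make the intended type obvious.

```sql
-- Hypothetical table with a single array column.
create table series (readings integer[]);

-- All three rows share the element type but differ in length (3, 1, and 0 elements).
insert into series values
    ('{1,2,3}'::integer[]),
    ('{4}'::integer[]),
    ('{}'::integer[]);
```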

## Working with Arrays

Arrays can be created in two ways:

- Using the type constructor syntax:
```sql
> select array[1,2,3];
{1,2,3}
```
- Using casts from string data:
```sql
> select '{1,2,3}'::array(integer);
{1,2,3}
```

Note that with the constructor syntax, the array type is inferred automatically.
If the constructed array does not contain a `null` value, the element type is inferred as non-nullable.
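
A short sketch of this inference rule follows. The comments restate the rule above rather than reporting observed output, and the final cast assumes that the constructor result can be converted like any other array value.

```sql
-- No null element: the element type is inferred as non-nullable.
select array[1,2,3];

-- Contains a null element: the element type is inferred as nullable.
select array[1,2,null];

-- To request a nullable element type regardless of the contents, cast explicitly.
select (array[1,2,3])::array(integer);
```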


Array elements can be retrieved using the conventional bracket-indexing notation. Indexes always start at one.
```sql
> select ('{1,1,2,3,5}'::integer[])[4];
3
```

:::info
Arrays are always one-dimensional. Higher-dimensional objects (e.g., matrices) must be flattened explicitly.
:::
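
One way to store a matrix is to flatten it in row-major order and compute the index by hand. This is a convention chosen for this sketch, not something Hyper prescribes. For a matrix with `ncols` columns, the element in row `r` and column `c` (both one-based) sits at index `(r - 1) * ncols + c`. The example assumes that an arithmetic expression is accepted as an index.

```sql
-- 2x3 matrix ((11,12,13),(21,22,23)) flattened row by row.
-- Row 2, column 3 maps to index (2 - 1) * 3 + 3 = 6, i.e. the value 23.
select (array[11,12,13,21,22,23])[(2 - 1) * 3 + 3];
```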

For more operations on arrays, see the section on [Array Functions](../scalar_func/arrays.md).

## Limitations

While arrays are first-class citizens in Hyper, they are subject to some moderate limitations:

- The size of an array is limited to 4,294,967,296 elements. The actual limit may be lower, depending on the sizes of its elements.
- Arrays cannot be nested.
- Persisting arrays requires [file format version 4](../../hyper-api/hyper_process.md#version-4) or higher.

:::note
Also, see the restrictions regarding array support in [external formats](../external/formats.md).
:::
27 changes: 22 additions & 5 deletions website/docs/sql/datatype/index.md
@@ -1,10 +1,15 @@
# Data Types

Hyper has a rich set of native data types available to users.
Hyper provides a rich set of native data types.

The table below shows all the built-in general-purpose
data types. Most of the alternative names listed in the "Aliases" column
are the names supported by Hyper for compatibility reasons with Postgres.
Generally, Hyper's type system can be divided into two buckets: _atomic_ types which describe single values, and _composite_ types which describe collections of values.
However, this distinction is made for educational purposes only; both kinds are equally supported, and there is no fundamental limitation applying to either category.

## Atomic Types

Atomic types comprise fundamental, general-purpose data types.
The following table lists all available atomic types.
Most of the alternative names listed in the "Aliases" column are supported for compatibility with Postgres.

Name|Aliases|Description
---|---|---
@@ -34,11 +39,23 @@ Persisting 32-bit floating point values (e.g., type `REAL`) requires at least [d
Up until Hyper API release [0.0.18825](/docs/releases#0.0.18825) Hyper used 64-bit floating points for all float types (i.e., also for `REAL`).
:::

## Composite Types

Composite types are collections of data in a single SQL value.
They allow for dedicated [schema denormalization][schema-denormalization] which can be useful for specific domains, such as machine learning applications.

Links to detailed documentation:
### Array

An array is an ordered sequence of values.
Arrays in Hyper are strongly typed and can be built from all supported atomic types.
See [Array Type](./array.md) for more details.

## Further Reading

```mdx-code-block
import DocCardList from '@theme/DocCardList';
<DocCardList />
```

[schema-denormalization]: https://en.wikipedia.org/wiki/Denormalization
3 changes: 2 additions & 1 deletion website/sidebars.js
@@ -80,14 +80,14 @@ const sidebars = {
"sql/datatype/numeric",
"sql/datatype/datetime",
"sql/datatype/binary",
"sql/datatype/array"
],
},
{
type: 'category',
label: 'Scalar Functions and Operators',
link: { type: 'doc', id: 'sql/scalar_func/index' },
items: [
"sql/scalar_func/arrays",
"sql/scalar_func/conversion",
"sql/scalar_func/comparison",
"sql/scalar_func/subquery_comparison",
@@ -97,6 +97,7 @@ const sidebars = {
"sql/scalar_func/string",
"sql/scalar_func/string_matching",
"sql/scalar_func/formatting",
"sql/scalar_func/arrays",
"sql/scalar_func/datetime",
"sql/scalar_func/geography",
],