-
Notifications
You must be signed in to change notification settings - Fork 0
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Showing
2 changed files
with
114 additions
and
58 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,94 +1,100 @@ | ||
--- | ||
title: Querying JSON and STRUCT | ||
title: Querying JSON | ||
--- | ||
|
||
# Querying STRUCT and JSON Columns | ||
# Querying JSON Columns | ||
|
||
Logs can contain complex data represented as JSON. Tailpipe plugins store such objects as one of two native DuckDB types: STRUCT or JSON. | ||
Logs can contain complex data represented as JSON. Tailpipe plugins store such objects as one of two native DuckDB types: JSON or STRUCT. Learn about JSON idioms here, see [Querying STRUCT Columns](/docs/sql/querying-struct) for STRUCT idioms. | ||
|
||
When all instances of the object have a regular shape, a plugin uses DuckDB's STRUCT type. The `user_identity` column of the `aws_cloudtrail_log` table is a STRUCT column, as you can verify using the `typeof` function. | ||
When an object's instances have irregular shape, a plugin uses DuckDB's JSON type. The `request_parameters` column of the `aws_cloudtrail_log` table is a JSON column, as you can verify using the `typeof` function. | ||
|
||
```sql | ||
select typeof(user_identity) from aws_cloudtrail_log limit 1; | ||
typeof(user_identity) = STRUCT("type" VARCHAR, principal_id VARCHAR, arn VARCHAR, account_id VARCHAR, access_key_id VARCHAR, user_name VARCHAR, session_context STRUCT(attributes STRUCT(mfa_authenticated VARCHAR, creation_date BIGINT), ... | ||
select typeof(request_parameters) from aws_cloudtrail_log limit 1; | ||
JSON | ||
``` | ||
|
||
>[!NOTE] | ||
> if .inspect is available, that'll be the preferred way to observe the type | ||
You can list the keys of a JSON object: | ||
|
||
```sql | ||
select json_keys(request_parameters) from aws_cloudtrail_log limit 1; | ||
``` | ||
|
||
Dot notation is the native method for accessing STRUCT fields. An element extracted by the `.` operator has the type defined by the STRUCT. Because the `invoked_by` field is of type VARCHAR, you can compare it to a string. | ||
The request_parameters` column contains a JSON object that includes a `Host` key whose value you can extract with a function: | ||
|
||
```sql | ||
select user_identity.invoked_by | ||
from aws_cloudtrail_log | ||
where user_identity.invoked_by = 'AWS Internal' | ||
limit 1; | ||
select json_extract(request_parameters, '$.Host') from aws_cloudtrail_log; | ||
``` | ||
|
||
When an object's instances have irregular shape, a plugin uses DuckDB's JSON type. The `request_parameters` column of the `aws_cloudtrail_log` table is a JSON column, as you can again verify using the `typeof` function. | ||
Or with the JSON operator that returns a stringified representation of an element: | ||
|
||
```sql | ||
select typeof(request_parameters) from aws_cloudtrail_log limit 1; | ||
JSON | ||
select request_parameters ->> 'Host' from aws_cloudtrail_log; | ||
``` | ||
|
||
Both methods return a string that you can compare. | ||
|
||
```sql | ||
select json_extract(request_parameters, '$.Host') ilike '%aws%' from aws_cloudtrail_log; | ||
|
||
select ( request_parameters ->> 'Host' ) ilike '%aws%' from aws_cloudtrail_log; | ||
``` | ||
|
||
In this case, DuckDB's precedence rules require you to parenthesize the `->>` expression. To avoid confusion we prefer functions over operators in Tailpipe mods. | ||
|
||
|
||
The `resource` column of `aws_cloudtrail_log` is a JSON array of objects. You use 0-based indexing to access elements of an array. To access the first element: | ||
|
||
```sql | ||
select resources[0] from aws_cloudtrail_log; | ||
``` | ||
|
||
The `->` and `->>` operators work with JSON columns. To compare the `Host` field to a string, use `->>` which returns a stringified representation of the JSON value. | ||
Alternatively you can use the JSON `->` operator: | ||
|
||
```sql | ||
select resources -> 0 from aws_cloudtrail_log; | ||
``` | ||
|
||
Use 1-based indexing for STRUCT arrays and 0-based for JSON arrays. For example, here are two equivalent columns. | ||
Either approach returns a JSON object that you can drill into. | ||
|
||
While this is also possible: | ||
|
||
```sql | ||
CREATE TABLE example ( | ||
struct_col STRUCT( | ||
num DOUBLE, | ||
text STRING, | ||
string_array STRING[] | ||
), | ||
json_col JSON | ||
); | ||
|
||
INSERT INTO example VALUES | ||
( | ||
{num: 42.0, text: 'Hello', string_array: ['one', 'two', 'three']}, | ||
'{"num": 42.0, "text": "Hello", "string_array": ["one", "two", "three"]}' | ||
); | ||
select resources ->> 0 from aws_cloudtrail_log; | ||
``` | ||
|
||
To access the first element of the STRUCT's array: | ||
it's probably not you want because `->>` returns a string, not a JSON object. | ||
|
||
To access an element of the zeroth object in the array: | ||
|
||
```sql | ||
select struct_col.string_array[1] from example; | ||
+----------------------------+ | ||
| one | | ||
+----------------------------+ | ||
select json_extract(resources[0], '$.ARN') from aws_cloudtrail_log; | ||
``` | ||
|
||
To access the first element of the JSON array: | ||
Or: | ||
|
||
```sql | ||
select json_col->'string_array'->>0 from example; | ||
+----------------------------+ | ||
| one | | ||
+----------------------------+ | ||
select resources -> 0 ->> 'ARN' from aws_cloudtrail_log; | ||
``` | ||
|
||
Note the distinction between the `->` and `->>` operators. The type of `json_col->'string_array'` is JSON. If we instead accessed `json_col->>'string_array'` the type would be VARCHAR, and the result would be a string that we cannot index into. Similarly the type of `json_col->'string_array'->>0` is VARCHAR, and the result is a string we can compare to another string. If we instead accessed `json_col->'string_array'->>0` the type would be JSON. The printed value would look the same (*one*), but this result would not be comparable to a string. | ||
Either method returns a string suitable for comparison with another string, or evaluation by a string-oriented operator such as `ilike`. | ||
|
||
While this is also possible: | ||
|
||
```sql | ||
select resources -> 0 -> 'ARN' from aws_cloudtrail_log; | ||
``` | ||
|
||
For JSON columns you can alternatively extract elements using functions. | ||
The JSON object it returns has type JSON, not string, so this comparison fails: | ||
|
||
```sql | ||
select resources -> 0 -> 'ARN' ilike '%aws%' from aws_cloudtrail_log; | ||
``` | ||
select json_extract(json_col, '$.text') from example; | ||
+----------------------------------+ | ||
| "Hello" | | ||
+----------------------------------+ | ||
select json_extract_string(json_col, '$.num') from example; | ||
+----------------------------------------+ | ||
| 42.0 | | ||
+----------------------------------------+ | ||
select json_extract(json_col, '$.string_array')[0] from example; | ||
+---------------------------------------------+ | ||
| "one" | | ||
+---------------------------------------------+ | ||
|
||
You can parenthesize to enable the comparison: | ||
|
||
```sql | ||
select ( resources -> 0 -> 'ARN' ) ilike '%aws%' from aws_cloudtrail_log; | ||
``` | ||
|
||
But again, to avoid confusion, you may want to prefer JSON functions over operators. | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,50 @@ | ||
--- | ||
title: Querying STRUCT | ||
--- | ||
|
||
# Querying STRUCT Columns | ||
|
||
Logs can contain complex data represented as JSON. Tailpipe plugins store such objects as one of two native DuckDB types: STRUCT or JSON. Learn about STRUCT idioms here, see [Querying JSON Columns](/docs/sql/querying-json) for JSON idioms. | ||
|
||
When all instances of the object have a regular shape, a plugin uses DuckDB's STRUCT type. The `user_identity` column of the `aws_cloudtrail_log` table is a STRUCT column, as you can verify using the `typeof` function. | ||
|
||
```sql | ||
select typeof(user_identity) from aws_cloudtrail_log limit 1; | ||
typeof(user_identity) = STRUCT("type" VARCHAR, principal_id VARCHAR, arn VARCHAR, \ | ||
account_id VARCHAR, access_key_id VARCHAR, user_name VARCHAR, \ | ||
session_context STRUCT(attributes STRUCT(mfa_authenticated VARCHAR, creation_date BIGINT), \ | ||
... | ||
``` | ||
|
||
>[!NOTE] | ||
> if .inspect is available, that'll be the preferred way to observe the type | ||
DuckDB doesn't have a `struct_keys` function analogous to `json_keys` but you can list the keys of STRUCT by casting to JSON | ||
|
||
```sql | ||
select json_keys( json(user_identity) ) from aws_cloudtrail_log limit 1; | ||
``` | ||
|
||
|
||
The `user_identity` column includes an `invoked_by` field that you can extract using dot notation: | ||
|
||
```sql | ||
select user_identity.invoked_by from aws_cloudtrail_log | ||
``` | ||
|
||
Because the STRUCT-defined type of `invoked_by` field is VARCHAR, you can compare it to a string. | ||
|
||
```sql | ||
select user_identity.invoked_by | ||
from aws_cloudtrail_log | ||
where user_identity.invoked_by = 'AWS Internal' | ||
``` | ||
|
||
The `user_identity` column includes a nested STRUCT, `session_context`. You can use dot notation to drill into it: | ||
|
||
``` | ||
select user_identity.session_context.attributes.mfa_authenticated | ||
from aws_cloudtrail_log | ||
``` | ||
|
||
|