Zyp: Improve documentation
amotl committed Sep 19, 2024
1 parent a22fd28 commit 00ff329
Showing 11 changed files with 726 additions and 241 deletions.
1 change: 1 addition & 0 deletions CHANGES.md
@@ -9,6 +9,7 @@
- MongoDB: Use improved decoding machinery also for `MongoDBCDCTranslator`
- Dependencies: Make MongoDB subsystem not strictly depend on Zyp
- Zyp: Translate a few special treatments to jq-based `MokshaTransformation` again
- Zyp: Improve documentation

## 2024/09/10 v0.0.15
- Added Zyp Treatments, a slightly tailored transformation subsystem
2 changes: 1 addition & 1 deletion README.md
@@ -2,7 +2,7 @@

[![Tests](https://github.com/crate/commons-codec/actions/workflows/tests.yml/badge.svg)](https://github.com/crate/commons-codec/actions/workflows/tests.yml)
[![Coverage](https://codecov.io/gh/crate/commons-codec/branch/main/graph/badge.svg)](https://app.codecov.io/gh/crate/commons-codec)
[![Build status (documentation)](https://readthedocs.org/projects/commons-codec/badge/)](https://cratedb.com/docs/commons-codec/)
[![Build status (documentation)](https://readthedocs.org/projects/commons-codec/badge/)](https://commons-codec.readthedocs.io/)
[![PyPI Version](https://img.shields.io/pypi/v/commons-codec.svg)](https://pypi.org/project/commons-codec/)
[![Python Version](https://img.shields.io/pypi/pyversions/commons-codec.svg)](https://pypi.org/project/commons-codec/)
[![PyPI Downloads](https://pepy.tech/badge/commons-codec/month)](https://pepy.tech/project/commons-codec/)
8 changes: 8 additions & 0 deletions doc/cdc/index.md
@@ -25,6 +25,14 @@ and need further curation and improvements.
:::


## Prior Art

- [core-cdc] by Alejandro Cora González
- [Carabas Research]


[Carabas Research]: https://lorrystream.readthedocs.io/carabas/research.html
[core-cdc]: https://pypi.org/project/core-cdc/
[DynamoDB CDC Relay for CrateDB]: https://cratedb-toolkit.readthedocs.io/io/dynamodb/cdc.html
[MongoDB CDC Relay for CrateDB]: https://cratedb-toolkit.readthedocs.io/io/mongodb/cdc.html
[Replicating CDC Events from DynamoDB to CrateDB]: https://cratedb.com/blog/replicating-cdc-events-from-dynamodb-to-cratedb
4 changes: 3 additions & 1 deletion doc/conf.py
@@ -85,7 +85,9 @@
intersphinx_mapping = {
# "influxio": ("https://influxio.readthedocs.io/", None),
}
linkcheck_ignore = []
linkcheck_ignore = [
r"https://stackoverflow.com/questions/70518350",
]

# Disable caching remote inventories completely.
# http://www.sphinx-doc.org/en/stable/ext/intersphinx.html#confval-intersphinx_cache_limit
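The `linkcheck_ignore` setting in `doc/conf.py` is a list of regular expressions that Sphinx's `linkcheck` builder uses to skip URLs. The sketch below approximates that filtering in plain Python, assuming each pattern is matched from the start of the URI (`re.match` semantics); `is_ignored` is a hypothetical helper, not a Sphinx API.

```python
import re

# The value introduced in doc/conf.py by this commit.
linkcheck_ignore = [
    r"https://stackoverflow.com/questions/70518350",
]

# Assumption: patterns are compiled once and matched at the start of each URI.
_patterns = [re.compile(pattern) for pattern in linkcheck_ignore]


def is_ignored(uri: str) -> bool:
    """Return True if any ignore pattern matches at the start of `uri`."""
    return any(pattern.match(uri) for pattern in _patterns)


print(is_ignored("https://stackoverflow.com/questions/70518350"))  # True
print(is_ignored("https://stackoverflow.com/questions/70518350/some-title"))  # True
print(is_ignored("https://commons-codec.readthedocs.io/"))  # False
```

Because the match is anchored only at the start, deeper URLs below the same question are skipped as well.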
8 changes: 0 additions & 8 deletions doc/index.md
@@ -37,14 +37,6 @@ decode
zyp/index
```

```{toctree}
:maxdepth: 3
:caption: Topics
:hidden:
prior-art
```

```{toctree}
:maxdepth: 1
:caption: Workbench
16 changes: 0 additions & 16 deletions doc/prior-art.md

This file was deleted.

76 changes: 37 additions & 39 deletions doc/zyp/backlog.md
@@ -1,53 +1,51 @@
# Zyp Backlog

## Iteration +1
- Refactor module namespace to `zyp`
- Documentation
- CLI interface
- Apply to MongoDB Table Loader in CrateDB Toolkit
- Document `jq` functions
- [x] Refactor module namespace to `zyp`
- [x] Documentation
- [ ] CLI interface
- [x] Apply to MongoDB Table Loader in CrateDB Toolkit
- [ ] Document `jq` functions
- `builtin.jq`: https://github.com/jqlang/jq/blob/master/src/builtin.jq
- `function.jq`
- [ ] Renaming needs JSON Pointer support. Alternatively, can `jq` do it?
- [ ] Documentation: Add Python example to "Synopsis" section on /index.html
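The backlog notes that renaming needs JSON Pointer support. A minimal sketch of what that could look like in plain Python, assuming RFC 6901 pointers address the field to rename and the new name stays within the same parent object; `parse_pointer`, `resolve_parent`, and `rename_field` are hypothetical names, not part of Zyp.

```python
from typing import Any


def parse_pointer(pointer: str) -> list[str]:
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if pointer == "":
        return []  # The empty pointer addresses the whole document.
    if not pointer.startswith("/"):
        raise ValueError(f"Invalid JSON Pointer: {pointer!r}")
    # RFC 6901 requires decoding "~1" before "~0".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def resolve_parent(doc: Any, tokens: list[str]) -> Any:
    """Walk all tokens except the last one and return the containing node."""
    node = doc
    for token in tokens[:-1]:
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def rename_field(doc: dict, source: str, target_name: str) -> dict:
    """Rename the object member addressed by `source` to `target_name`, in place."""
    tokens = parse_pointer(source)
    if not tokens:
        raise ValueError("Cannot rename the document root")
    parent = resolve_parent(doc, tokens)
    if not isinstance(parent, dict):
        raise TypeError("Only object members can be renamed")
    parent[target_name] = parent.pop(tokens[-1])
    return doc


record = {"_id": 42, "meta": {"a/b": 1}}
rename_field(record, "/_id", "id")
rename_field(record, "/meta/a~1b", "ab")
print(record)  # {'meta': {'ab': 1}, 'id': 42}
```

Handling the `~1` and `~0` escapes is what lets a pointer address keys that themselves contain `/` or `~`.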

## Iteration +2
Demonstrate!
- math expressions
- omit key (recursively)
- combine keys
- filter on keys and/or values
- Pathological cases like "Not defined" in typed fields like `TIMESTAMP`
- Use simpleeval, like Meltano, and provide the same built-in functions
- https://sdk.meltano.com/en/v0.39.1/stream_maps.html#other-built-in-functions-and-names
- https://github.com/MeltanoLabs/meltano-map-transform/pull/255
- https://github.com/MeltanoLabs/meltano-map-transform/issues/252
- Use JSONPath, see https://sdk.meltano.com/en/v0.39.1/code_samples.html#use-a-jsonpath-expression-to-extract-the-next-page-url-from-a-hateoas-response
- Is `jqpy` better than `jq`?
https://baterflyrity.github.io/jqpy/
Demonstrate more use cases, like...
- [ ] math expressions
- [ ] omit key (recursively)
- [ ] combine keys
- [ ] filter on keys and/or values
- [ ] Pathological cases like "Not defined" in typed fields like `TIMESTAMP`
- [ ] Use simpleeval, like Meltano, and provide the same built-in functions
- https://sdk.meltano.com/en/v0.39.1/stream_maps.html#other-built-in-functions-and-names
- https://github.com/MeltanoLabs/meltano-map-transform/pull/255
- https://github.com/MeltanoLabs/meltano-map-transform/issues/252
- [ ] Use JSONPath, see https://sdk.meltano.com/en/v0.39.1/code_samples.html#use-a-jsonpath-expression-to-extract-the-next-page-url-from-a-hateoas-response
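Two of the use cases listed above, omitting a key recursively and filtering on keys and/or values, can be sketched in plain Python as a reference point for what a Zyp transformation would need to express. `omit_key` and `filter_items` are hypothetical helpers, not part of Zyp.

```python
from typing import Any, Callable


def omit_key(value: Any, key: str) -> Any:
    """Return a copy of `value` with `key` removed from every nested dict."""
    if isinstance(value, dict):
        return {k: omit_key(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [omit_key(item, key) for item in value]
    return value  # Scalars are returned unchanged.


def filter_items(record: dict, predicate: Callable[[str, Any], bool]) -> dict:
    """Keep only top-level items for which `predicate(key, value)` is true."""
    return {k: v for k, v in record.items() if predicate(k, v)}


doc = {"id": 1, "_meta": {"x": 1}, "children": [{"id": 2, "_meta": None}]}
print(omit_key(doc, "_meta"))  # {'id': 1, 'children': [{'id': 2}]}
print(filter_items({"a": 1, "b": None, "c": 3}, lambda k, v: v is not None))  # {'a': 1, 'c': 3}
```

For comparison, the recursive case should be expressible in jq along the lines of `walk(if type == "object" then del(._meta) else . end)`, using `walk` from `builtin.jq`.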

## Iteration +3
- Moksha transformations on Buckets
- Investigate using JSON Schema
- Fluent API interface
- https://github.com/Halvani/alphabetic
- Mappers do not support external API lookups.
- [ ] Moksha transformations on Buckets
- [ ] Fluent API interface
```python
# Proposed interface (open backlog item), not implemented yet.
from zyp.model.fluent import FluentTransformation

# Parentheses allow the method chain to span multiple lines.
transformation = (
    FluentTransformation()
    .jmes("records[?starts_with(location, 'B')]")
    .rename_fields({"_id": "id"})
    .convert_values({"/id": "int", "/value": "float"}, type="pointer-python")
    .jq(".[] |= (.value /= 100)")
)
```
- [ ] Investigate using JSON Schema
- [ ] https://github.com/Halvani/alphabetic
- [ ] Mappers do not support external API lookups.
To add external API lookups, you can either (a) land all your data and
then join it using a transformation tool like dbt, or (b) create a custom
mapper plugin with inline lookup logic.
=> Example from Luftdatenpumpe, using a reverse geocoder
- [ ] Define schema
https://sdk.meltano.com/en/latest/typing.html
- https://docs.meltano.com/guide/v2-migration/#migrate-to-an-adapter-specific-dbt-transformer
- https://github.com/meltano/sdk/blob/v0.39.1/singer_sdk/mapper.py
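The "Fluent API interface" item proposes chaining transformation steps. A minimal, self-contained sketch of how such a builder could work, assuming each method records a step and returns `self`; this `FluentTransformation` is hypothetical, uses plain keys instead of JSON Pointers and Python callables instead of type names, and is not the planned `zyp.model.fluent` API.

```python
from __future__ import annotations

from typing import Any, Callable

Record = dict[str, Any]


class FluentTransformation:
    """Hypothetical fluent builder: each method records a step and returns self."""

    def __init__(self) -> None:
        self.steps: list[Callable[[list[Record]], list[Record]]] = []

    def rename_fields(self, mapping: dict[str, str]) -> FluentTransformation:
        def step(records: list[Record]) -> list[Record]:
            return [{mapping.get(k, k): v for k, v in r.items()} for r in records]

        self.steps.append(step)
        return self

    def convert_values(self, converters: dict[str, Callable[[Any], Any]]) -> FluentTransformation:
        def step(records: list[Record]) -> list[Record]:
            return [
                {k: converters[k](v) if k in converters else v for k, v in r.items()}
                for r in records
            ]

        self.steps.append(step)
        return self

    def apply(self, records: list[Record]) -> list[Record]:
        # Run the recorded steps in the order they were chained.
        for step in self.steps:
            records = step(records)
        return records


transformation = (
    FluentTransformation()
    .rename_fields({"_id": "id"})
    .convert_values({"id": int, "value": float})
)
print(transformation.apply([{"_id": "1", "value": "4.2"}]))  # [{'id': 1, 'value': 4.2}]
```

Deferring execution until `apply` keeps the chain declarative, so the same transformation can be built once and reused across batches.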

## Fluent API Interface

```python
from zyp.model.fluent import FluentTransformation

# Parentheses allow the method chain to span multiple lines.
transformation = (
    FluentTransformation()
    .jmes("records[?starts_with(location, 'B')]")
    .rename_fields({"_id": "id"})
    .convert_values({"/id": "int", "/value": "float"}, type="pointer-python")
    .jq(".[] |= (.value /= 100)")
)
```
- https://sdk.meltano.com/en/latest/typing.html
- https://docs.meltano.com/guide/v2-migration/#migrate-to-an-adapter-specific-dbt-transformer
- https://github.com/meltano/sdk/blob/v0.39.1/singer_sdk/mapper.py
- [ ] Is `jqpy` better than `jq`?
- https://baterflyrity.github.io/jqpy/