Commit

feat(docs): migrated docs v2

paulobressan committed Jul 20, 2023
1 parent 8354e69 commit 6ab8262
Showing 42 changed files with 2,374 additions and 0 deletions.
11 changes: 11 additions & 0 deletions docs/pages/v2/_meta.json
@@ -0,0 +1,11 @@
{
"index": "Introduction",
"installation": "Installation",
"usage": "Usage",
"filters": "Filters",
"sources": "Sources",
"sinks": "Sinks",
"reference": "Reference",
"advanced": "Advanced Features",
"guides": "Guides"
}
11 changes: 11 additions & 0 deletions docs/pages/v2/advanced.mdx
@@ -0,0 +1,11 @@
# Advanced Features

This section provides detailed information on some of the advanced features available in Oura:

- [Stateful Cursor](advanced/stateful_cursor.md): provides a mechanism to persist the "position" of the processing pipeline to make it resilient to restarts.
- [Rollback Buffer](advanced/rollback_buffer.md): provides a way to mitigate the impact of chain rollbacks in downstream stages.
- [Pipeline Metrics](advanced/pipeline_metrics.md): allows operators to track the progress and performance of long-running Oura sessions.
- [Mapper Options](advanced/mapper_options.md): a set of "expensive" event mapping procedures that require an explicit opt-in to be activated.
- [Intersect Options](advanced/intersect_options.md): advanced options for instructing Oura from which point in the chain to start reading.
- [Custom Network](advanced/custom_network.md): instructions on how to configure Oura to connect to a custom network.
- [Retry Policy](advanced/retry_policy.md): instructions on how to configure retry policies for different operations.
53 changes: 53 additions & 0 deletions docs/pages/v2/advanced/custom_network.mdx
@@ -0,0 +1,53 @@
# Custom networks

Instructions on how to configure Oura to connect to a custom network (i.e., one other than mainnet / testnet).

## Context

Oura requires certain information about the chain it is reading from. In a way, this is similar to the JSON config files required to run the Cardano node. These values are used for procedures such as encoding bech32 addresses, computing wall-clock time for blocks, etc.

Since `mainnet` and `testnet` are well-known, heavily used networks, Oura hardcodes these values as part of the binary release so that the user is spared from having to manually specify them. On the other hand, custom networks require the user to configure these values manually for Oura to establish a connection.

## Feature

By adding a `[chain]` section in the daemon configuration file, users can provide the information required by Oura to connect to a custom network.

The `[chain]` section has the following properties:

| Name | DataType | Description |
| :------------------- | :------- | :--------------------------------------------------------- |
| byron_epoch_length | integer | the length (in seconds) of a Byron epoch in this network |
| byron_slot_length | integer | the length (in seconds) of a Byron slot in this network |
| byron_known_slot | integer | the slot of a Byron block known to exist in this network |
| byron_known_hash | string | the hash of the known Byron block |
| byron_known_time | integer | the unix timestamp of the known Byron block |
| shelley_epoch_length | integer | the length (in seconds) of a Shelley epoch in this network |
| shelley_slot_length | integer | the length (in seconds) of a Shelley slot in this network |
| shelley_known_slot | integer | the slot of a Shelley block known to exist in this network |
| shelley_known_hash   | string   | the hash of the known Shelley block                         |
| shelley_known_time | integer | the unix timestamp of the known Shelley block |
| address_hrp | string | the human readable part for addresses of this network |
| adahandle_policy     | string   | the minting policy for AdaHandle on this network            |


## Examples

### Chain information for Testnet

This example configuration shows the values for Testnet. Since testnet values are hardcoded as part of Oura's release, users are not required to input these exact values anywhere, but it serves as a good example of what the configuration looks like.

```toml
[chain]
byron_epoch_length = 432000
byron_slot_length = 20
byron_known_slot = 0
byron_known_hash = "8f8602837f7c6f8b8867dd1cbc1842cf51a27eaed2c70ef48325d00f8efb320f"
byron_known_time = 1564010416
shelley_epoch_length = 432000
shelley_slot_length = 1
shelley_known_slot = 1598400
shelley_known_hash = "02b1c561715da9e540411123a6135ee319b02f60b9a11a603d3305556c04329f"
shelley_known_time = 1595967616
address_hrp = "addr_test"
adahandle_policy = "8d18d786e92776c824607fd8e193ec535c79dc61ea2405ddf3b09fe3"
```
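
These reference values are what make slot-to-time conversion possible. As a rough sketch (illustrative Rust, not Oura's actual API), the wall-clock time of a Shelley-era slot can be derived from the known reference block like this:

```rust
// Illustrative only: derives wall-clock time for a Shelley-era slot from the
// Testnet reference values above. Oura's real implementation lives elsewhere.
fn shelley_slot_to_wallclock(slot: u64) -> u64 {
    let shelley_known_slot: u64 = 1_598_400;
    let shelley_known_time: u64 = 1_595_967_616; // unix timestamp of the known block
    let shelley_slot_length: u64 = 1; // seconds per slot

    shelley_known_time + (slot - shelley_known_slot) * shelley_slot_length
}

fn main() {
    // the known slot maps back to its own recorded timestamp
    assert_eq!(shelley_slot_to_wallclock(1_598_400), 1_595_967_616);
}
```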
62 changes: 62 additions & 0 deletions docs/pages/v2/advanced/intersect_options.mdx
@@ -0,0 +1,62 @@
# Intersect Options

Advanced options for instructing Oura from which point in the chain to start reading.

## Feature

When running in daemon mode, Oura provides 4 different strategies for finding the intersection point within the chain sync process.

- `Origin`: Oura will start reading from the beginning of the chain.
- `Tip`: Oura will start reading from the current tip of the chain.
- `Point`: Oura will start reading from a particular point (slot, hash) in the chain. If the point is not found, the process will be terminated with a non-zero exit code.
- `Fallbacks`: Oura will start reading from the first valid point within a set of alternative positions. If a point is not valid, the process will fall back to the next available point in the list of options. If none of the points are valid, the process will be terminated with a non-zero exit code.

The default strategy used by Oura is `Tip`, unless an alternative option is specified via configuration.

You can also define a finalizing point by providing a block hash at which Oura will stop reading from the chain and exit gracefully.

## Configuration

To modify the default behaviour used by the daemon mode, a section named `[source.intersect]` needs to be added in the `daemon.toml` file.

```toml
[source.intersect]
type = <Type>
value = <Value>
```

- `type`: Defines which strategy to use. Valid values are `Origin`, `Tip`, `Point`, `Fallbacks`. Default value is `Tip`.
- `value`: Either a point or an array of points to be used as argument for the selected strategy.

If you'd like to sync only a specific section of the chain, you can also instruct Oura to stop syncing when it reaches a specific block hash by defining a `[source.finalize]` config:

```toml
[source.finalize]
until_hash = <BlockHash>
```

Note that unlike the intersect point, no slot is provided for the finalizer.

## Examples

The following example shows how to configure Oura to use a set of fallback intersection points. The chain sync process will attempt to first intersect at slot `4449598`. If not found, it will continue with slot `43159` and finally with slot `0`.

```toml
[source.intersect]
type = "Fallbacks"
value = [
[4449598, "2c9ba2611c5d636ecdb3077fde754413c9d6141c6288109922790e53bbb938b5"],
[43159, "f5d398d6f71a9578521b05c43a668b06b6103f94fcf8d844d4c0aa906704b7a6"],
[0, "f0f7892b5c333cffc4b3c4344de48af4cc63f55e44936196f365a9ef2244134f"],
]
```

This configuration will sync the whole Byron era only:

```toml
[source.intersect]
type = "Origin"

[source.finalize]
until_hash = "aa83acbf5904c0edfe4d79b3689d3d00fcfc553cf360fd2229b98d464c28e9de"
```
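
A single-point intersect follows the same shape. Reusing the first point from the fallback example above (any known slot / hash pair works):

```toml
[source.intersect]
type = "Point"
value = [4449598, "2c9ba2611c5d636ecdb3077fde754413c9d6141c6288109922790e53bbb938b5"]
```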
36 changes: 36 additions & 0 deletions docs/pages/v2/advanced/mapper_options.mdx
@@ -0,0 +1,36 @@
# Mapper Options

A set of "expensive" event mapping procedures that require an explicit opt-in to be activated.

## Context

One of the main concerns of Oura is turning block / tx data into atomic events to send down the pipeline for further processing. The `source` stage is responsible for executing these mapping procedures.

Most of the time, this logic is generic enough that it can be reused in different scenarios. For example, the `N2N` and the `N2C` sources share the same mapping procedures. If a particular use-case needs to cherry-pick, enrich or alter the data in some way, the recommendation is to handle the transformation in downstream stages, by using any of the built-in filters or by creating new ones.

There are some exceptions, though: whenever a mapping has a heavy impact on performance, it is better to disable it completely at the `source` level to avoid paying the overhead associated with the initial processing of the data.

## Feature

We consider a mapping procedure "expensive" if it involves handling a relatively large amount of data, computing some relatively expensive value, or generating redundant data required only for very particular use cases.

For these expensive procedures, we provide configurable options that instruct an Oura instance running in daemon mode to opt in to each particular rule.

## Configuration

The mapper options can be defined by adding the following configuration in the `daemon.toml` file:

```toml
[source.mapper]
include_block_end_events = <bool>
include_transaction_details = <bool>
include_transaction_end_events = <bool>
include_block_cbor = <bool>
include_byron_ebb = <bool>
```

- `include_block_end_events`: if enabled, the source will output an event signaling the end of a block, duplicating all of the data already sent in the corresponding block start event. Default value is `false`.
- `include_transaction_details`: if enabled, each transaction event payload will contain a nested version of all of the details of the transaction (inputs, outputs, mint, assets, metadata, etc.). Useful when the pipeline needs to process the tx as a unit, instead of handling each sub-object as an independent event. Default value is `false`.
- `include_transaction_end_events`: if enabled, the source will output an event signaling the end of a transaction, duplicating all of the data already sent in the corresponding transaction start event. Default value is `false`.
- `include_block_cbor`: if enabled, the block event will include the raw, unaltered CBOR content received from the node, formatted as a hex string. Useful when some custom CBOR decoding is required. Default value is `false`.
- `include_byron_ebb`: if enabled, a block event will be emitted for the legacy epoch boundary blocks of the Byron era (deprecated in newer eras). Useful when performing validation on previous block hashes. Default value is `false`.
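
As an illustration, a daemon config that opts in to transaction details and raw block CBOR while leaving the other procedures disabled could look like this (the `N2N` source values are taken from the rollback buffer example elsewhere in these docs):

```toml
[source]
type = "N2N"
address = ["Tcp", "relays-new.cardano-mainnet.iohk.io:3001"]
magic = "mainnet"

[source.mapper]
include_transaction_details = true
include_block_cbor = true
```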
79 changes: 79 additions & 0 deletions docs/pages/v2/advanced/pipeline_metrics.mdx
@@ -0,0 +1,79 @@
# Pipeline Metrics

The _metrics_ feature allows operators to track the progress and performance of long-running Oura sessions.

## Context

Some use-cases require Oura to run either continuously or for prolonged periods of time. Keeping a sink updated in real-time requires 24/7 operation; dumping historical chain data from origin might take several hours.

These scenarios require an understanding of the internal state of the pipeline to facilitate monitoring and troubleshooting of the system. In a data-processing pipeline such as this, two important aspects need to be observable: progress and performance.

## Feature

Oura provides an optional `/metrics` HTTP endpoint that uses the Prometheus format to expose real-time operational metrics of the pipeline. Each stage (source / sink) is responsible for notifying its progress as it processes each event. These notifications are then aggregated via counters & gauges and exposed via HTTP using the well-known Prometheus encoding.

The following metrics are available:

- `chain_tip`: the last detected tip of the chain (height)
- `rollback_count`: number of rollback events occurred
- `source_current_slot`: last slot processed by the source of the pipeline
- `source_current_height`: last height (block #) processed by the source of the pipeline
- `source_event_count`: number of events processed by the source of the pipeline
- `sink_current_slot`: last slot processed by the sink of the pipeline
- `sink_event_count`: number of events processed by the sink of the pipeline

## Configuration

The _metrics_ feature is a configurable setting available when running in daemon mode. A top level `[metrics]` section of the daemon toml file controls the feature:

```toml
# daemon.toml file

[metrics]
address = "0.0.0.0:9186"
endpoint = "/metrics"
```

- `[metrics]` section needs to be present to enable the feature. Absence of the section will not expose any HTTP endpoints.
- `address`: The address at which the HTTP server will be listening for requests. Expected format is `<ip>:<port>`. Use the IP value `0.0.0.0` to allow connections on any of the available IP addresses of the network interface. Default value is `0.0.0.0:9186`.
- `endpoint`: The path at which the metrics will be exposed. Default value is `/metrics`.

## Usage

Once enabled, a quick method to check the metrics output is to navigate to the HTTP endpoint using any common browser. A local instance of Oura with metrics enabled on port `9186` can be accessed by opening the URL http://localhost:9186/metrics

An output similar to the following should be shown by the browser:

```
# HELP chain_tip the last detected tip of the chain (height)
# TYPE chain_tip gauge
chain_tip 6935733
# HELP rollback_count number of rollback events occurred
# TYPE rollback_count counter
rollback_count 1
# HELP sink_current_slot last slot processed by the sink of the pipeline
# TYPE sink_current_slot gauge
sink_current_slot 2839340
# HELP sink_event_count number of events processed by the sink of the pipeline
# TYPE sink_event_count counter
sink_event_count 2277714
# HELP source_current_height last height (block #) processed by the source of the pipeline
# TYPE source_current_height gauge
source_current_height 2837810
# HELP source_current_slot last slot processed by the source of the pipeline
# TYPE source_current_slot gauge
source_current_slot 2839340
# HELP source_event_count number of events processed by the source of the pipeline
# TYPE source_event_count counter
source_event_count 2277715
```

Regardless of the above mechanism, the intended approach for tracking Oura's metrics is to use a monitoring infrastructure compatible with the Prometheus format. Setting up and managing the monitoring stack is outside the scope of Oura. If you don't have any infrastructure in place, we recommend checking out some of the more common stacks:

- Prometheus Server + Grafana
- Metricbeat + Elasticsearch + Kibana
- Telegraf + InfluxDB + Chronograf
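
If you go the Prometheus route, a minimal scrape configuration targeting Oura's default metrics address could look like the snippet below (the job name and scrape interval are illustrative):

```yaml
scrape_configs:
  - job_name: "oura"
    scrape_interval: 15s
    static_configs:
      - targets: ["localhost:9186"]
```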

The following screenshot is an example of a _Grafana_ dashboard showing Prometheus data scraped from an Oura instance:

![Grafana Dashboard](/v2/grafana.png)
22 changes: 22 additions & 0 deletions docs/pages/v2/advanced/retry_policy.mdx
@@ -0,0 +1,22 @@
# Retry Policy

Advanced options for instructing Oura how to deal with failed attempts.

## Configuration

To modify the default behavior, a section named `[retries]` needs to be added to the `daemon.toml` file.

```toml
[retries]
max_retries = 3
backoff_unit_sec = 10
backoff_factor = 3
max_backoff_sec = 10
dismissible = true
```

- `max_retries`: the max number of retries before failing the whole pipeline.
- `backoff_unit_sec`: the delay expressed in seconds between each retry.
- `backoff_factor`: the amount to increase the backoff delay after each attempt.
- `max_backoff_sec`: the longest possible delay in seconds.
- `dismissible`:
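
The exact retry semantics are implementation-specific, but assuming the usual exponential backoff convention (`delay(n) = backoff_unit_sec * backoff_factor^n`, capped at `max_backoff_sec`), the delay schedule implied by the values above can be sketched as follows:

```rust
// A sketch of the delay schedule implied by the example config above,
// assuming a conventional exponential backoff; Oura's exact formula may differ.
fn backoff_delay_secs(attempt: u32) -> u64 {
    let backoff_unit_sec: u64 = 10;
    let backoff_factor: u64 = 3;
    let max_backoff_sec: u64 = 10;

    // uncapped, the delay grows geometrically: 10s, 30s, 90s, ...
    let delay = backoff_unit_sec * backoff_factor.pow(attempt);

    // ...but the cap keeps every retry at 10s with these example values
    delay.min(max_backoff_sec)
}

fn main() {
    for attempt in 0..3 {
        println!("retry {} after {}s", attempt + 1, backoff_delay_secs(attempt));
    }
}
```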
39 changes: 39 additions & 0 deletions docs/pages/v2/advanced/rollback_buffer.mdx
@@ -0,0 +1,39 @@
# Rollback Buffer

The "rollback buffer" feature provides a way to mitigate the impact of chain rollbacks in downstream stages of the data-processing pipeline.

## Context

Handling rollbacks in persistent storage requires clearing the orphaned data / blocks before adding new records. The complexity of this process may vary by concrete storage engine, but it always has an associated impact on performance. In some scenarios, it might even be prohibitive to process events without a reasonable level of confidence about the immutability of the record.

Rollbacks occur frequently under normal conditions, but the chances of a block becoming orphaned diminish as the depth of the block increases. Some Oura use-cases may benefit from this property: some pipelines might prefer fewer rollback events, even if it means waiting for a certain number of confirmations.

## Feature

Oura provides a "rollback buffer" that will hold blocks in memory until they reach a certain depth. Only blocks above a min depth threshold will be sent down the pipeline. If a rollback occurs and the intersection is within the scope of the buffer, the rollback operation will occur within memory, totally transparent to the subsequent stages of the pipeline.

If a rollback occurs and the intersection is outside of the scope of the buffer, Oura will fall back to the original behaviour and publish a rollback event so that the "sink" stages may handle the rollback procedure manually.

## Trade-off

There's an obvious trade-off to this approach: latency. A pipeline will not process any events until the buffer fills up. Once the initial wait is over, the throughput of the whole pipeline should be equivalent to having no buffer at all (due to Oura's "pipelining" nature). If a rollback occurs, an extra delay will be required to fill the buffer again.

Notice that even if the throughput isn't affected, the latency (measured as the delta between the timestamp at which the event reaches the "sink" stage and the original timestamp of the block) will always be affected by a fixed value proportional to the size of the buffer.

## Implementation Details

The buffer logic is implemented in the pallas-miniprotocols library. It works by keeping a `VecDeque` of chain "points": roll-forward operations accumulate at the back of the deque, and retrieving confirmed points means popping from the front of the deque.
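
For intuition, here is a simplified, self-contained sketch of that mechanism (illustrative Rust, not the actual pallas code; the `Point` type below is a stand-in for pallas' chain point type):

```rust
use std::collections::VecDeque;

// Illustrative sketch only; the real logic lives in pallas-miniprotocols
// and differs in detail.
#[derive(Debug, Clone, PartialEq)]
struct Point {
    slot: u64,
    hash: String,
}

struct RollbackBuffer {
    points: VecDeque<Point>,
    min_depth: usize,
}

impl RollbackBuffer {
    // roll-forward operations accumulate at the back of the deque
    fn roll_forward(&mut self, point: Point) {
        self.points.push_back(point);
    }

    // points buried deeper than min_depth are confirmed and popped
    // from the front, ready to be sent down the pipeline
    fn confirmed(&mut self) -> Vec<Point> {
        let mut out = Vec::new();
        while self.points.len() > self.min_depth {
            out.push(self.points.pop_front().expect("len checked above"));
        }
        out
    }

    // a rollback within the buffer is handled in memory by truncating
    // the deque; outside the buffer, the caller must emit a rollback event
    fn roll_back(&mut self, to: &Point) -> Result<(), ()> {
        match self.points.iter().position(|p| p == to) {
            Some(idx) => {
                self.points.truncate(idx + 1);
                Ok(())
            }
            None => Err(()),
        }
    }
}

fn main() {
    let mut buffer = RollbackBuffer {
        points: VecDeque::new(),
        min_depth: 6,
    };

    for slot in 0..10 {
        buffer.roll_forward(Point {
            slot,
            hash: format!("hash-{slot}"),
        });
    }

    // with min_depth = 6, only the 4 oldest of the 10 points are confirmed
    assert_eq!(buffer.confirmed().len(), 4);

    // rolling back to a point still held in the buffer is transparent
    let target = Point { slot: 7, hash: "hash-7".to_string() };
    assert!(buffer.roll_back(&target).is_ok());
}
```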

## Configuration

The min depth is a configurable setting available when running in daemon mode. Higher `min_depth` values will lower the chances of experiencing a rollback event, at the cost of adding more latency. A node-to-node source stage config would look like this:

```toml
[source]
type = "N2N"
address = ["Tcp", "relays-new.cardano-mainnet.iohk.io:3001"]
magic = "mainnet"
min_depth = 6
```

Node-to-client sources provide an equivalent setting.
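
For reference, a node-to-client source with the same `min_depth` could be configured along these lines (the Unix socket path is illustrative and depends on your node setup):

```toml
[source]
type = "N2C"
address = ["Unix", "/opt/cardano/cnode/sockets/node.socket"]
magic = "mainnet"
min_depth = 6
```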