doc review
jonmmease committed Nov 12, 2024
1 parent 116fec5 commit a5abbc5
Showing 8 changed files with 50 additions and 11 deletions.
18 changes: 18 additions & 0 deletions docs/source/_static/custom-icon.js

4 changes: 4 additions & 0 deletions docs/source/about/related_projects.md
@@ -11,6 +11,8 @@ These workflows were not supported by the initial version of VegaFusion, but sup

`altair-transform` does not support evaluating transforms on the server in interactive workflows like linked histogram brushing, which was the initial focus of VegaFusion.

+`altair-transform` is not under active development.

## [`ibis-vega-transform`](https://github.com/Quansight/ibis-vega-transform)
`ibis-vega-transform` is a Python library and JupyterLab extension developed by [Quansight](https://www.quansight.com/). It translates pipelines of Vega transforms into [Ibis](https://ibis-project.org/) query expressions, which can then be evaluated with a variety of Ibis database backends (in particular, OmniSci).

@@ -21,3 +23,5 @@ In contrast to the Planner approach used by VegaFusion, `ibis-vega-transform` re
An advantage of this approach is that the Vega JavaScript library remains in control of the entire specification so the external `ibis-vega-transform` library does not need to maintain an independent task graph in order to support interactivity. A downside of this approach is that the result of every transform pipeline must be sent back to the client and be stored in the Vega dataflow graph. Often times this is not a problem, because the transform pipeline includes an aggregation stage that significantly reduces the dataset size. However, sometimes the result of a transform pipeline is quite large, but it is only used as input to other transform pipelines. In this case, it is advantageous to keep the large intermediary result cached on the server and to not send it to the client at all. This use case is one of the reasons that VegaFusion uses the Planner+Runtime architecture described previously.

Currently, VegaFusion implements all of its transform logic in the Python process (with efficient multi-threading) and has no capability to connect to external data providers like databases. This is certainly a desirable capability, and may be enabled in VegaFusion by the [datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation) project.

+`ibis-vega-transform` is not under active development.
2 changes: 1 addition & 1 deletion docs/source/about/technology.md
@@ -5,4 +5,4 @@ VegaFusion uses a fairly diverse technology stack. The planner and runtime are b
The Task Graph specifications are defined as protocol buffer messages. The [prost](https://github.com/tokio-rs/prost) library is used to generate Rust data structures from these protocol buffer messages. When Arrow tables appear as task graph root values, they are serialized inside the protocol buffer specification using the [Apache Arrow IPC format](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc). The binary representation of the task graph protocol buffer message is what is transferred across the Jupyter Comms protocol.

## DataFusion integration
-[Apache Arrow DataFusion](https://github.com/apache/arrow-datafusion) is an SQL compatible query engine that integrates with the Rust implementation of Apache Arrow. VegaFusion uses DataFusion to implement many of the Vega transforms, and it compiles the Vega expression language directly into the DataFusion expression language. In addition to being very fast, a particularly powerful characteristic of DataFusion is that it provides many interfaces that can be extended with custom Rust logic. For example, VegaFusion defines a few custom UDFs that are designed to implement the precise semantics of the Vega transforms and the Vega expression language.
+[Apache DataFusion](https://github.com/apache/datafusion) is an SQL compatible query engine that integrates with the Rust implementation of Apache Arrow. VegaFusion uses DataFusion to implement many of the Vega transforms, and it compiles the Vega expression language directly into the DataFusion expression language. In addition to being very fast, a particularly powerful characteristic of DataFusion is that it provides many interfaces that can be extended with custom Rust logic. For example, VegaFusion defines a few custom UDFs that are designed to implement the precise semantics of the Vega transforms and the Vega expression language.
21 changes: 20 additions & 1 deletion docs/source/conf.py
@@ -17,11 +17,30 @@
html_static_path = ['_static']
html_logo = "_static/VegaFusionLogo-Color.svg"
html_favicon = "_static/favicon.ico"

+html_theme_options = {
+    "icon_links": [
+        {
+            "name": "Twitter",
+            "url": "https://twitter.com/vegafusion_io",
+            "icon": "fa-brands fa-twitter",
+        },
+        {
+            "name": "GitHub",
+            "url": "https://github.com/vega/vegafusion",
+            "icon": "fa-brands fa-github",
+        },
+        {
+            "name": "PyPI",
+            "url": "https://pypi.org/project/vegafusion/",
+            "icon": "fa-custom fa-pypi",
+        }
+    ],
+}
# Add custom CSS
html_css_files = [
    'custom.css',
]
+html_js_files = ["custom-icon.js"]

_social_img = "https://vegafusion.io/_static/vegafusion_social.png"
_description = "VegaFusion provides serverside scaling for Vega visualizations"
4 changes: 2 additions & 2 deletions docs/source/features/grpc.md
@@ -1,5 +1,5 @@
# gRPC
-The VegaFusion Runtime can run as a [gRPC](https://grpc.io/) service, which makes it possible for multiple clients to connect to the same runtime, and share a cache. This also makes it possible for the Runtime to reside on a different host than the client.
+The VegaFusion Runtime can run as a [gRPC](https://grpc.io/) service, which makes it possible for multiple clients to connect to the same runtime, and share a cache (See [How it Works](../about/how_it_works) for more details). This also makes it possible for the Runtime to reside on a different host than the client.

:::{warning}
VegaFusion's gRPC server does not currently support authentication, and chart specifications may reference the local file system of the machine running the server. It is not currently recommended to use VegaFusion server with untrusted Vega specifications unless other measures are taken to isolate the service.
@@ -35,7 +35,7 @@ See [grpc.py](https://github.com/vega/vegafusion/tree/v2/examples/python-example
## Rust
The `GrpcVegaFusionRuntime` struct is an alternative to the `VegaFusionRuntime` struct that provides the same interface, but connects to a VegaFusion Server.

-See [grpc.rs](https://github.com/vega/vegafusion/tree/v2/examples/rust-examples/grpc.rs) for a complete example.
+See [grpc.rs](https://github.com/vega/vegafusion/tree/v2/examples/rust-examples/examples/grpc.rs) for a complete example.

## JavaScript
The `vegafusion-wasm` package can connect to an instance of VegaFusion Server over [gRPC-Web](https://github.com/grpc/grpc-web).
2 changes: 1 addition & 1 deletion docs/source/features/transform_spec.md
@@ -5,7 +5,7 @@ VegaFusion can be used to evaluate datasets in a Vega spec, remove unused column
This is the foundation of Vega-Altair's [``"vegafusion"`` data transformer](https://altair-viz.github.io/user_guide/large_datasets.html#vegafusion-data-transformer) when used with the default HTML or static image renderers.

:::{warning}
-The pre-transform process will, by default, preserve the interactive behavior of the input Vega specification. For interactive charts that perform filtering, this may result in the generation of a spec containing the full input dataset. If interactivity does not need to be preserved (e.g. if the resulting chart is used in a static context) then the ``preserve_interactivity`` option should be set to False. If interactivity is needed, then the Chart State workflow may be more appropriate.
+The pre-transform process will, by default, preserve the interactive behavior of the input Vega specification. For interactive charts that perform filtering, this may result in the generation of a spec containing the full input dataset. If interactivity does not need to be preserved (e.g. if the resulting chart is used in a static context) then the ``preserve_interactivity`` option should be set to False. If interactivity is needed, then the [Chart State](./chart_state) workflow may be more appropriate.
:::

## Python
2 changes: 1 addition & 1 deletion docs/source/vega_coverage/supported_transforms.md
@@ -1,6 +1,6 @@
# Supported Transforms

-VegaFusion implements a subset of [Vega's transforms](https://vega.github.io/vega/). Below is a detailed breakdown of transform support status.
+VegaFusion implements a subset of [Vega's transforms](https://vega.github.io/vega/docs/transforms/). Below is a detailed breakdown of transform support status.

:::{note}

8 changes: 3 additions & 5 deletions vegafusion-python/vegafusion/runtime.py
@@ -452,8 +452,6 @@ def pre_transform_datasets(
Extract the fully evaluated form of the requested datasets from a Vega
specification.
-Extracts datasets as pandas DataFrames.
Args:
spec: A Vega specification dict or JSON string.
datasets: A list with elements that are either:
@@ -606,9 +604,9 @@ def pre_transform_extract(
"""
Evaluate supported transforms in an input Vega specification.
-Produces a new specification with small pre-transformed datasets (under 100
-rows) included inline and larger inline datasets (20 rows or more) extracted
-into pyarrow tables.
+Produces a new specification with small pre-transformed datasets
+(under ``extract_threshold`` rows) included inline and larger inline
+datasets (``extract_threshold`` rows or more) extracted into arrow tables.
Args:
spec: A Vega specification dict or JSON string.
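
The revised docstring describes a single cutoff: datasets with fewer than ``extract_threshold`` rows stay inline, and datasets with ``extract_threshold`` rows or more are extracted. The rule can be sketched in plain Python as follows. This is only an illustration of the partitioning described in the docstring, not VegaFusion's implementation; the function name `split_by_threshold` and the dict-of-row-lists representation are assumptions made for the example.

```python
from typing import Any

Rows = list[dict[str, Any]]


def split_by_threshold(
    datasets: dict[str, Rows], extract_threshold: int
) -> tuple[dict[str, Rows], dict[str, Rows]]:
    """Partition datasets by row count (hypothetical helper, illustration only).

    Datasets with fewer than ``extract_threshold`` rows are kept inline;
    datasets with ``extract_threshold`` rows or more are extracted.
    """
    inline: dict[str, Rows] = {}
    extracted: dict[str, Rows] = {}
    for name, rows in datasets.items():
        if len(rows) < extract_threshold:
            inline[name] = rows
        else:
            extracted[name] = rows
    return inline, extracted


# Example inputs: one dataset below the cutoff, one exactly at it, one above it.
datasets = {
    "small": [{"x": i} for i in range(5)],
    "boundary": [{"x": i} for i in range(20)],
    "large": [{"x": i} for i in range(50)],
}
inline, extracted = split_by_threshold(datasets, extract_threshold=20)
```

With a threshold of 20, only `small` stays inline; `boundary` has exactly 20 rows, so it falls on the "or more" side and is extracted along with `large`.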