doc review

vega · Nov 12, 2024 · a5abbc5
1 parent 116fec5
commit a5abbc5
Showing 8 changed files with 50 additions and 11 deletions.
diff --git a/docs/source/_static/custom-icon.js b/docs/source/_static/custom-icon.js
diff --git a/docs/source/about/related_projects.md b/docs/source/about/related_projects.md
@@ -11,6 +11,8 @@ These workflows were not supported by the initial version of VegaFusion, but sup
 
 `altair-transform` does not support evaluating transforms on the server in interactive workflows like linked histogram brushing, which was the initial focus of VegaFusion. 
 
+`altair-transform` is not under active development.
+
 ## [`ibis-vega-transform`](https://github.com/Quansight/ibis-vega-transform)
 `ibis-vega-transform` is a Python library and JupyterLab extension developed by [Quansight](https://www.quansight.com/). It translates pipelines of Vega transforms into [Ibis](https://ibis-project.org/) query expressions, which can then be evaluated with a variety of Ibis database backends (in particular, OmniSci). 
 
@@ -21,3 +23,5 @@ In contrast to the Planner approach used by VegaFusion, `ibis-vega-transform` re
 An advantage of this approach is that the Vega JavaScript library remains in control of the entire specification so the external `ibis-vega-transform` library does not need to maintain an independent task graph in order to support interactivity.  A downside of this approach is that the result of every transform pipeline must be sent back to the client and be stored in the Vega dataflow graph.  Often times this is not a problem, because the transform pipeline includes an aggregation stage that significantly reduces the dataset size.  However, sometimes the result of a transform pipeline is quite large, but it is only used as input to other transform pipelines.  In this case, it is advantageous to keep the large intermediary result cached on the server and to not send it to the client at all.  This use case is one of the reasons that VegaFusion uses the Planner+Runtime architecture described previously.
 
 Currently, VegaFusion implements all of its transform logic in the Python process (with efficient multi-threading) and has no capability to connect to external data providers like databases.  This is certainly a desirable capability, and may be enabled in VegaFusion by the [datafusion-federation](https://github.com/datafusion-contrib/datafusion-federation) project.
+
+`ibis-vega-transform` is not under active development.
diff --git a/docs/source/about/technology.md b/docs/source/about/technology.md
@@ -5,4 +5,4 @@ VegaFusion uses a fairly diverse technology stack. The planner and runtime are b
 The Task Graph specifications are defined as protocol buffer messages. The [prost](https://github.com/tokio-rs/prost) library is used to generate Rust data structures from these protocol buffer messages.  When Arrow tables appear as task graph root values, they are serialized inside the protocol buffer specification using the [Apache Arrow IPC format](https://arrow.apache.org/docs/format/Columnar.html#serialization-and-interprocess-communication-ipc).  The binary representation of the task graph protocol buffer message is what is transferred across the Jupyter Comms protocol.
 
 ## DataFusion integration
-[Apache Arrow DataFusion](https://github.com/apache/arrow-datafusion) is an SQL compatible query engine that integrates with the Rust implementation of Apache Arrow.  VegaFusion uses DataFusion to implement many of the Vega transforms, and it compiles the Vega expression language directly into the DataFusion expression language.  In addition to being very fast, a particularly powerful characteristic of DataFusion is that it provides many interfaces that can be extended with custom Rust logic.  For example, VegaFusion defines a few custom UDFs that are designed to implement the precise semantics of the Vega transforms and the Vega expression language.
+[Apache DataFusion](https://github.com/apache/datafusion) is an SQL compatible query engine that integrates with the Rust implementation of Apache Arrow.  VegaFusion uses DataFusion to implement many of the Vega transforms, and it compiles the Vega expression language directly into the DataFusion expression language.  In addition to being very fast, a particularly powerful characteristic of DataFusion is that it provides many interfaces that can be extended with custom Rust logic.  For example, VegaFusion defines a few custom UDFs that are designed to implement the precise semantics of the Vega transforms and the Vega expression language.
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -17,11 +17,30 @@
 html_static_path = ['_static']
 html_logo = "_static/VegaFusionLogo-Color.svg"
 html_favicon = "_static/favicon.ico"
-
+html_theme_options = {
+    "icon_links": [
+        {
+            "name": "Twitter",
+            "url": "https://twitter.com/vegafusion_io",
+            "icon": "fa-brands fa-twitter",
+        },
+        {
+            "name": "GitHub",
+            "url": "https://github.com/vega/vegafusion",
+            "icon": "fa-brands fa-github",
+        },
+        {
+            "name": "PyPI",
+            "url": "https://pypi.org/project/vegafusion/",
+            "icon": "fa-custom fa-pypi",
+        }
+    ],
+}
 # Add custom CSS
 html_css_files = [
     'custom.css',
 ] 
+html_js_files = ["custom-icon.js"]
 
 _social_img = "https://vegafusion.io/_static/vegafusion_social.png"
 _description = "VegaFusion provides serverside scaling for Vega visualizations"

diff --git a/docs/source/features/grpc.md b/docs/source/features/grpc.md
@@ -1,5 +1,5 @@
 # gRPC
-The VegaFusion Runtime can run as a [gRPC](https://grpc.io/) service, which makes it possible for multiple clients to connect to the same runtime, and share a cache. This also makes it possible for the Runtime to reside on a different host than the client.
+The VegaFusion Runtime can run as a [gRPC](https://grpc.io/) service, which makes it possible for multiple clients to connect to the same runtime, and share a cache (see [How it Works](../about/how_it_works) for more details). This also makes it possible for the Runtime to reside on a different host than the client.
 
 :::{warning}
 VegaFusion's gRPC server does not currently support authentication, and chart specifications may reference the local file system of the machine running the server. It is not currently recommended to use VegaFusion server with untrusted Vega specifications unless other measures are taken to isolate the service.
@@ -35,7 +35,7 @@ See [grpc.py](https://github.com/vega/vegafusion/tree/v2/examples/python-example
 ## Rust
 The `GrpcVegaFusionRuntime` struct is an alternative to the `VegaFusionRuntime` struct that provides the same interface, but connects to a VegaFusion Server.
 
-See [grpc.rs](https://github.com/vega/vegafusion/tree/v2/examples/rust-examples/grpc.rs) for a complete example.
+See [grpc.rs](https://github.com/vega/vegafusion/tree/v2/examples/rust-examples/examples/grpc.rs) for a complete example.
 
 ## JavaScript
 The `vegafusion-wasm` package can connect to an instance of VegaFusion Server over [gRPC-Web](https://github.com/grpc/grpc-web). 

diff --git a/docs/source/features/transform_spec.md b/docs/source/features/transform_spec.md
@@ -5,7 +5,7 @@ VegaFusion can be used to evaluate datasets in a Vega spec, remove unused column
 This is the foundation of Vega-Altair's [``"vegafusion"`` data transformer](https://altair-viz.github.io/user_guide/large_datasets.html#vegafusion-data-transformer) when used with the default HTML or static image renderers. 
 
 :::{warning}
-The pre-transform process will, by default, preserve the interactive behavior of the input Vega specification. For interactive charts that perform filtering, this may result in the generation of a spec containing the full input dataset. If interactivity does not need to be preserved (e.g. if the resulting chart is used in a static context) then the ``preserve_interactivity`` option should be set to False. If interactivity is needed, then the Chart State workflow may be more appropriate.
+The pre-transform process will, by default, preserve the interactive behavior of the input Vega specification. For interactive charts that perform filtering, this may result in the generation of a spec containing the full input dataset. If interactivity does not need to be preserved (e.g. if the resulting chart is used in a static context) then the ``preserve_interactivity`` option should be set to False. If interactivity is needed, then the [Chart State](./chart_state) workflow may be more appropriate.
 :::
 
 ## Python

diff --git a/docs/source/vega_coverage/supported_transforms.md b/docs/source/vega_coverage/supported_transforms.md
@@ -1,6 +1,6 @@
 # Supported Transforms
 
-VegaFusion implements a subset of [Vega's transforms](https://vega.github.io/vega/). Below is a detailed breakdown of transform support status.
+VegaFusion implements a subset of [Vega's transforms](https://vega.github.io/vega/docs/transforms/). Below is a detailed breakdown of transform support status.
 
 :::{note}
 

diff --git a/vegafusion-python/vegafusion/runtime.py b/vegafusion-python/vegafusion/runtime.py
@@ -452,8 +452,6 @@ def pre_transform_datasets(
         Extract the fully evaluated form of the requested datasets from a Vega
         specification.
 
-        Extracts datasets as pandas DataFrames.
-
         Args:
             spec: A Vega specification dict or JSON string.
             datasets: A list with elements that are either:
@@ -606,9 +604,9 @@ def pre_transform_extract(
         """
         Evaluate supported transforms in an input Vega specification.
 
-        Produces a new specification with small pre-transformed datasets (under 100
-        rows) included inline and larger inline datasets (20 rows or more) extracted
-        into pyarrow tables.
+        Produces a new specification with small pre-transformed datasets 
+        (under ``extract_threshold`` rows) included inline and larger inline
+        datasets (``extract_threshold`` rows or more) extracted into arrow tables.
 
         Args:
             spec: A Vega specification dict or JSON string.
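
The updated `pre_transform_extract` docstring replaces two inconsistent hard-coded cutoffs (100 and 20 rows) with a single `extract_threshold` parameter. Below is a minimal Python sketch of that partition rule, written for review purposes only. `split_by_threshold` and `ExtractResult` are hypothetical helpers, not part of VegaFusion's API. Plain lists of row dicts stand in for the Arrow tables VegaFusion actually returns. Because this diff does not show the parameter's default value, the example passes the threshold explicitly.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

Rows = List[Dict[str, Any]]


@dataclass
class ExtractResult:
    # Hypothetical container: datasets small enough to stay inline in the spec
    inline: Dict[str, Rows] = field(default_factory=dict)
    # Datasets large enough to be pulled out of the spec (VegaFusion produces
    # Arrow tables here; row lists are a stand-in for this sketch)
    extracted: Dict[str, Rows] = field(default_factory=dict)


def split_by_threshold(datasets: Dict[str, Rows], extract_threshold: int) -> ExtractResult:
    """Partition pre-transformed datasets by row count.

    Follows the docstring rule: datasets with fewer than ``extract_threshold``
    rows stay inline; datasets with ``extract_threshold`` rows or more are
    extracted.
    """
    result = ExtractResult()
    for name, rows in datasets.items():
        if len(rows) < extract_threshold:
            result.inline[name] = rows
        else:
            result.extracted[name] = rows
    return result


datasets = {
    "small": [{"x": i} for i in range(5)],
    "edge": [{"x": i} for i in range(20)],
    "large": [{"x": i} for i in range(500)],
}
res = split_by_threshold(datasets, extract_threshold=20)
print(sorted(res.inline))     # ['small']
print(sorted(res.extracted))  # ['edge', 'large']
```

The boundary case matters here: a dataset with exactly `extract_threshold` rows is extracted, not inlined. The old docstring made that boundary ambiguous by quoting two different numbers.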