-
Notifications
You must be signed in to change notification settings - Fork 56
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rust runtime and connectors: phase one #1167
Commits on Sep 1, 2023
-
runtime protocol updates: $internal and validation network ports
* Remove NetworkPort from Request.Validate of the runtime protocols. This is now an internal detail negotiated during validation. It's outcome is still captured in the built flow specs, but it's not validated by the connector (and isn't actually known until after validation completes). * Breaking: remove use of `Any` for internal field and switch to bytes. Any has major compatibility issues in its JSON encoding between Go and Rust. For one, the pbjson_types serde imlementation for Any does not match the protobuf spec (`typeUrl` instead of `@type`). For another, Go `jsonpb` decoding requires that @type dynamically resolves to a concrete Go type which is unmarshalled. Neat, but that's not the intended purpose of the internal field, which is really a pass-through piggyback for runtime-internal structured data. So, instead have internal be regular bytes. Also use `$internal` as the JSON name to make the special nature of this field clearer. Add and update tooling in Rust / Go to make it easier to work with internal fields. For the moment, filter out `$internal` fields in the connector-proxy. We can remove this once these protocol changes have propagated to connectors. * Remove most of the internal BuildAPI, leaving only its Config message. * Derive: update Response.Spec to exactly match that of the capture & materialize protocols. We should further extract the various Response.Spec messages into a common message with the same tags and names. I avoided doing so just now to reduce churn. * Add explicit zeroing in Rust of the `config_json` fields. `sops` decryption will shortly happen from Rust, and this provides a tight bound on how long decrypted credentials can be resident in memory.
Configuration menu - View commit details
-
Copy full SHA for dae5e19 - Browse repository at this point
Copy the full SHA dae5e19View commit details -
crates/sources: rework Loader to be Send + Sync
For use in a tokio::Runtime, which requires these traits. This is basically using a BoxFuture instead of LocalBoxFuture, and making the internal `tables` field thread-safe. It remains the case that Loader does not evaluate in parallel -- only concurently.
Configuration menu - View commit details
-
Copy full SHA for bbd3344 - Browse repository at this point
Copy the full SHA bbd3344View commit details -
derive-sqlite/typescript: containerize and update for protocol updates
Both connectors are updated for protocol changes, especially handling of internal fields and new helper routines. `derive-typescript` is further updated to build outside of the root Cargo workspace, as a stand-alone Dockerized binary.
Configuration menu - View commit details
-
Copy full SHA for ba2bdc0 - Browse repository at this point
Copy the full SHA ba2bdc0View commit details -
crates/validation: update Connectors + ControlPlane traits
* Make `Connectors` and `ControlPlane` Send + Sync so it can be used in tokio::Runtime. * Have `Connectors` take and receive top-level Request/Response so that `internal` fields may be provided. This is used for threading-through log level as well as for resolving flow.NetworkPorts. * Remove `inspect_image` from Connectors, as that is no longer required (related network ports are instead carried by the Response.internal field). * Remove all dependencies on build_api::Config. All we actually need is the `build_id` and `project_root`, so use those instead. * Some use of destructuring is updated due to new zero-ing Drop implementations for message types. * Add a new explicit validation that generated file URLs are valid. (They sometimes weren't in some prior build configurations).
Configuration menu - View commit details
-
Copy full SHA for 2c88f15 - Browse repository at this point
Copy the full SHA 2c88f15View commit details
Commits on Sep 12, 2023
-
crates/async-process: add
input_output
For passing input to a process while also accumulating its output.
Configuration menu - View commit details
-
Copy full SHA for 52efa51 - Browse repository at this point
Copy the full SHA 52efa51View commit details -
crates/runtime: support for connector images
Add `unseal` module which performs `sops` decryptions, matching our existing Go implementation. Add `container` module which manages an image container instance, similar to our Go implementation. Containers have an explicit, random, and recognizable container name like `fc_a50cfebf` instead of docker-generated pet names like `bold_bose`. This allows us to inspect containers that we start. Containers do NOT use host port-mappings, as the Go implementation does. Instead a runtime::Container description is returned which includes the containers allocated (host-local) IP address as well as the network-ports inspected from its container image. There are a variety of startup races that Docker doesn't do a great job of managing -- there is no precise means of knowing, through docker, when it's safe to ask what a started container's IP address is, because Docker does network setup as an async background task. It IS reliable to wait until the internal container process has definititively started, which we can do by waiting for _something_ to arrive over its forward stderr. So, we have flow-connector-init write a single whitespace byte on startup and expect to read it from the host. Next, building on module `container`, add an `image_connector` adapter which manages the lifecyle of an RPC stream which may start and restart *many* container instances over its lifetime. Refactor deriev::Middleware into a common runtime::Runtime which provides all three (capture, derive, materialize) of the core Runtime RPCs. We'll extend this further with shuffling services in the future. Capture and materialize have minimal implementations which dispatch to an image connector. Derivations are re-worked to transparently map a `derive: using: typescript` into an image connector of `ghcr.io/estuary/derive-typescript:dev`. The hacky `flowctl` flow for running these derivations is removed. Also refactor TaskService & TaskRuntime into a thinner TaskServie and reusable TokioContext. TokioContext makes it easy to build an isolated tokio::Runtime with that dispatches its tracing events to an ops::Log handler, and has the required non-synchronous Drop semantics.
Configuration menu - View commit details
-
Copy full SHA for ddf0484 - Browse repository at this point
Copy the full SHA ddf0484View commit details -
crates/build: remove CGO build API and refactor routines
The CGO build API is removed entirely. Instead, the role of the `build` crate is now to hold common routines shared across the `agent` and `flowctl` crates for building Flow specifications into built catalog specs. It also provides general-purpose implementations of sources::Fetcher and validation::Connectors, but not validation::ControlPlane.
Configuration menu - View commit details
-
Copy full SHA for 035fe1b - Browse repository at this point
Copy the full SHA 035fe1bView commit details -
crates/flowctl:
raw build/json-schema
andbuild
-crate refactorsOverhaul the `local_specs` module atop the refactored `build` crate. What remains is intended to be opinionated wrappers of routines provide by `build`. Don't immediately fail un-credentialed users. This allows invocations that don't require authenticated control-plane access to succeed. Initialize to a WARN log-level by default (so that "you're not authenticated" warnings actually show up). Add a `raw build` subcommand which performs a build::managed_build(). This is a replacement for `flowctl-go api build`. Add a `raw json-schema` subcommand to replace `flowctl-go json-schema`.
Configuration menu - View commit details
-
Copy full SHA for 2431586 - Browse repository at this point
Copy the full SHA 2431586View commit details -
crates/agent: perform in-process managed builds
Instead of shelling out to `flowctl-go`. This is a fairly stright forward drop-in replacement with `build::managed_build`, dispatched to an isolated runtime::TokioContext so that in-process tracing events can also be gathered. However, this means the Go logrus package is no longer responsible for text-formatting logs and we need to present structured instances of `ops::Log` to the control-plane and UI. Eventually we'd like to directly pass-through structured logs into the `internal.log_lines` table, but for now I've introduced a reasonable ANSI-colored rendering of deeply structured logs (that, IMO, improves on logrus). When we're ready, we can rip this out and trivially pass-through JSON encodings of ops::Logs.
Configuration menu - View commit details
-
Copy full SHA for 0d81819 - Browse repository at this point
Copy the full SHA 0d81819View commit details -
Configuration menu - View commit details
-
Copy full SHA for eedd881 - Browse repository at this point
Copy the full SHA eedd881View commit details -
go: use
flowctl raw build
for catalog buildsbindings.BuildCatalog() remains, but is now a simple wrapper around `flowctl raw build`. bindings.CatalogJSONSchema and `flowctl-go json-schema` are removed.
Configuration menu - View commit details
-
Copy full SHA for cc21dce - Browse repository at this point
Copy the full SHA cc21dceView commit details -
Configuration menu - View commit details
-
Copy full SHA for 473a6a7 - Browse repository at this point
Copy the full SHA 473a6a7View commit details -
crates/sources: fix some subtle bugs in spec indirection and merging
Merging specs into source tables and indirecting specs are transformations done by `flowctl` as part of manipulating a user's local catalog specs. It's not used in the control-plane or data-plane. As such, in my testing I found some aspects were a bit fast-and-lose. Bug 1) sources::extend_from_catalog() was not properly attaching the fragment scopes to entities merged into tables::Sources from a models::Catalog. This matters because those scopes may be used to idenfity JSON schemas to bundle when later inline-ing the schema of a collection, so they need to be correct. Update the routine to be much more pedantic about producing correct scopes. Bug 2) sources::indirect_large_files() wasn't producing tables::Imports for the resources that it chose to indirect. This *also* matters if those sources are later inlined again, because JSON schema bundling relies on correct imports to identify schemas to bundle. Fixing both of these bugs allows `flowctl catalog pull-specs` to correctly pull down, indirect, then inline, validate, and generate files for a control-plane derivation (where it was failing previously). Extend sources scenario test coverage to fully compare each fixture for equality after fully round-tripping through indirection. Bonuses, that got wrapped up in this changeset and which I'm not bothering to pull out right now: * build::Fetcher now uses a thirty-second timeout for all fetches. In combination with the connectors timeout, this means catalog builds are now guaranteed to complete in a bounded time frame. * Move flowctl::local_specs::into_catalog() => sources::merge::into_catalog(). * When running with RUST_LOG=debug, `flowctl` will now persist a SQLite build database for examination or shipping to Estuary support.
Configuration menu - View commit details
-
Copy full SHA for 9a6e209 - Browse repository at this point
Copy the full SHA 9a6e209View commit details -
flowctl: support docker desktop and missing flow-connector-init
When running on non-linux OS's, ask docker to publish all ports (including the init port), then extract and pass around the mapping of container ports to mapped host addresses. Also fix a flow-connector-init startup race where it writes to stderr before its port has been bound.
Configuration menu - View commit details
-
Copy full SHA for e751641 - Browse repository at this point
Copy the full SHA e751641View commit details