feat!: add parse module #89

Closed
wants to merge 29 commits into from

Conversation

mbrobbel
Member

@mbrobbel mbrobbel commented Apr 5, 2023

Some requirements of Substrait data cannot be expressed via Protobuf definition or schema files. This module provides new types that wrap the generated types and can only be constructed after their invariants have been checked. This lets producers and consumers skip redundant checks of the invariants described by the specification.

This is based on the idea described in the "Parse, don't validate" blog post.
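As a minimal sketch of that idea (not code from this PR): the checked type wraps the raw message in a private field, so holding a `Version` is proof that the check ran. `RawVersion` is a simplified stand-in for the generated message, and the "all components zero means missing" rule is an assumption made for illustration.

```rust
use std::fmt;

/// Simplified stand-in for a generated Protobuf message (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub struct RawVersion {
    pub major_number: u32,
    pub minor_number: u32,
    pub patch_number: u32,
}

/// A checked version. The inner field is private, so code outside this
/// module can only obtain a value through `TryFrom`, which runs the check.
#[derive(Debug, Clone, PartialEq)]
pub struct Version(RawVersion);

/// Errors for the version check.
#[derive(Debug, PartialEq)]
pub enum VersionError {
    /// All components are zero (assumed here to mean "not set").
    Missing,
}

impl fmt::Display for VersionError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "version is missing")
    }
}

impl std::error::Error for VersionError {}

impl TryFrom<RawVersion> for Version {
    type Error = VersionError;

    fn try_from(raw: RawVersion) -> Result<Self, Self::Error> {
        if raw.major_number == 0 && raw.minor_number == 0 && raw.patch_number == 0 {
            return Err(VersionError::Missing);
        }
        Ok(Version(raw))
    }
}

/// Converting back to the raw message is always possible.
impl From<Version> for RawVersion {
    fn from(version: Version) -> Self {
        version.0
    }
}

/// Consumers accept the checked type and do not repeat the check.
pub fn describe(version: &Version) -> String {
    format!(
        "{}.{}.{}",
        version.0.major_number, version.0.minor_number, version.0.patch_number
    )
}
```

A function that takes `&Version` instead of `&RawVersion` cannot be called with unchecked data, which is what removes the need for repeated validation downstream.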

Proposal

The approach is to build the parsing around two traits: one for parsing and one for the parser context.

/// A parse trait.
pub trait Parse<C: Context>: fmt::Debug + Sized {
    /// The parsed type.
    ///
    /// After parsing this type must be able to convert back. Note that it is
    /// not required for the conversion to be lossless, as long as the semantics
    /// don't change.
    ///
    /// This bound also helps with tracking breaking Protobuf definition changes
    /// via compilation errors.
    type Parsed: Into<Self>;

    /// The error type for this parser.
    type Error: error::Error;

    /// Parse and return a parsed type or error.
    fn parse(self, ctx: &mut C) -> Result<Self::Parsed, Self::Error>;
}

/// A parse context.
///
/// Parsing Substrait data is context-sensitive. This trait provides methods
/// that can be used by parser implementations to parse Substrait data.
pub trait Context {
    /// Parse an item using this context.
    ///
    /// See [Parse::parse].
    #[tracing::instrument(skip_all, err, fields(item = %std::any::type_name::<T>()))]
    fn parse<T: Parse<Self>>(&mut self, item: T) -> Result<T::Parsed, T::Error>
    where
        Self: Sized,
    {
        item.parse(self)
    }

    /// Parse an item using this context and record the item value on the parse span.
    ///
    /// See [Parse::parse].
    #[tracing::instrument(name = "parse", skip_all, err, fields(item = %std::any::type_name::<T>(), value = ?item))]
    fn parse_record_value<T: Parse<Self>>(&mut self, item: T) -> Result<T::Parsed, T::Error>
    where
        Self: Sized,
    {
        item.parse(self)
    }

    /// Parse an item using this context and record the item value and its index on the parse span.
    ///
    /// See [Parse::parse].
    #[tracing::instrument(name = "parse", skip_all, err, fields(item = %std::any::type_name::<T>(), index = index, value = ?item))]
    fn parse_record_value_with_index<T: Parse<Self>>(
        &mut self,
        item: T,
        index: usize,
    ) -> Result<T::Parsed, T::Error>
    where
        Self: Sized,
    {
        item.parse(self)
    }

    /// Parse an item using this context and record the item index on the parse span.
    ///
    /// See [Parse::parse].
    #[tracing::instrument(name = "parse", skip_all, err, fields(item = %std::any::type_name::<T>(), index = index))]
    fn parse_record_index<T: Parse<Self>>(
        &mut self,
        item: T,
        index: usize,
    ) -> Result<T::Parsed, T::Error>
    where
        Self: Sized,
    {
        item.parse(self)
    }

    /// Add a [SimpleExtensionURI] to this context. Should return an error for duplicate
    /// anchors or when the URI is not supported.
    ///
    /// This function must eagerly resolve and parse the simple extension, returning an
    /// error if either fails.
    fn add_simple_extension_uri(
        &mut self,
        simple_extension_uri: &SimpleExtensionURI,
    ) -> Result<&SimpleExtensions, ContextError>;

    /// Returns the simple extensions for the given simple extension anchor.
    fn simple_extensions(&self, anchor: Anchor<SimpleExtensionURI>) -> &SimpleExtensions;

    /// Add an extension type declaration to this context.
    fn add_extension_type(
        &mut self,
        extension_type: &ExtensionType,
    ) -> Result<&SimpleExtensionsTypesItem, ContextError>;

    /// Returns the extension type for the given extension type anchor.
    fn extension_type(
        &self,
        extension_type_anchor: Anchor<ExtensionType>,
    ) -> &SimpleExtensionsTypesItem;

    /// Add an extension type variation declaration to this context.
    fn add_extension_type_variation(
        &mut self,
        extension_type_variation: &ExtensionTypeVariation,
    ) -> Result<&SimpleExtensionsTypeVariationsItem, ContextError>;

    /// Returns the extension type variation for the given extension type variation anchor.
    fn extension_type_variation(
        &self,
        extension_type_variation_anchor: Anchor<ExtensionTypeVariation>,
    ) -> &SimpleExtensionsTypeVariationsItem;

    /// Add an extension function declaration to this context.
    fn add_extension_function(
        &mut self,
        extension_function: &ExtensionFunction,
    ) -> Result<&SimpleExtensionFunction, ContextError>;

    /// Returns the extension function for the given extension function anchor.
    fn extension_function(
        &self,
        extension_function_anchor: Anchor<ExtensionFunction>,
    ) -> &SimpleExtensionFunction;
}
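To illustrate how the two traits could fit together, here is a reduced sketch, not the PR's implementation. It drops the tracing attributes and replaces the Substrait types with hypothetical stand-ins: `RawUri` plays the role of the generated `SimpleExtensionURI`, `add_uri` is a simplified form of `add_simple_extension_uri`, and `MapContext` is an assumed in-memory context that only detects duplicate anchors.

```rust
use std::collections::HashMap;
use std::{error, fmt};

/// Simplified parse context (hypothetical, reduced from the proposal).
pub trait Context {
    /// Register a URI under an anchor; errors on duplicate anchors.
    fn add_uri(&mut self, anchor: u32, uri: &str) -> Result<(), ContextError>;

    /// Parse an item using this context.
    fn parse<T: Parse<Self>>(&mut self, item: T) -> Result<T::Parsed, T::Error>
    where
        Self: Sized,
    {
        item.parse(self)
    }
}

/// Simplified parse trait with the same shape as the proposal.
pub trait Parse<C: Context>: fmt::Debug + Sized {
    type Parsed: Into<Self>;
    type Error: error::Error;

    fn parse(self, ctx: &mut C) -> Result<Self::Parsed, Self::Error>;
}

#[derive(Debug, PartialEq)]
pub enum ContextError {
    DuplicateAnchor(u32),
}

impl fmt::Display for ContextError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ContextError::DuplicateAnchor(anchor) => write!(f, "duplicate anchor `{}`", anchor),
        }
    }
}

impl error::Error for ContextError {}

/// Stand-in for the generated `SimpleExtensionURI` message.
#[derive(Debug, Clone, PartialEq)]
pub struct RawUri {
    pub extension_uri_anchor: u32,
    pub uri: String,
}

/// Checked counterpart; only constructed by `Parse::parse`.
#[derive(Debug, Clone, PartialEq)]
pub struct CheckedUri(RawUri);

impl From<CheckedUri> for RawUri {
    fn from(checked: CheckedUri) -> Self {
        checked.0
    }
}

impl<C: Context> Parse<C> for RawUri {
    type Parsed = CheckedUri;
    type Error = ContextError;

    fn parse(self, ctx: &mut C) -> Result<Self::Parsed, Self::Error> {
        // Registering the anchor is the context-sensitive part of the check.
        ctx.add_uri(self.extension_uri_anchor, &self.uri)?;
        Ok(CheckedUri(self))
    }
}

/// In-memory context that tracks which anchors have been declared.
#[derive(Default)]
pub struct MapContext {
    uris: HashMap<u32, String>,
}

impl Context for MapContext {
    fn add_uri(&mut self, anchor: u32, uri: &str) -> Result<(), ContextError> {
        if self.uris.contains_key(&anchor) {
            return Err(ContextError::DuplicateAnchor(anchor));
        }
        self.uris.insert(anchor, uri.to_owned());
        Ok(())
    }
}
```

With this shape, calling `ctx.parse(raw)` twice with the same anchor succeeds the first time and returns `ContextError::DuplicateAnchor` the second time, and the `Into<Self>` bound guarantees a `CheckedUri` can always be turned back into a `RawUri`.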

Progress

  • [plan::Plan]
    • Version
    • SimpleExtensionURIs
    • ...
  • [plan_version::PlanVersion]
    • Version
  • [extended_expression::ExtendedExpression]
    • Version
    • SimpleExtensionURIs
    • ...

@universalmind303
Contributor

@mbrobbel Just wondering if you've had a chance to make any more progress on this? I cloned this branch & started expanding the functionality. I'd be happy to continue this work if you are unable to.

@mbrobbel
Member Author

> @mbrobbel Just wondering if you've had a chance to make any more progress on this? I cloned this branch & started expanding the functionality. I'd be happy to continue this work if you are unable to.

Yes I made some progress. I'll push it and fix conflicts. Happy to collaborate.

@mbrobbel mbrobbel mentioned this pull request Mar 5, 2024
32 tasks
@@ -140,6 +145,10 @@
//! # Ok(()) }
//! ```
//!
//! # Parsing
//!
//! Write about [parse] module.
Member

Documentation goes here?

/// Returns the compound names of this simple extensions function.
pub fn compound_names(&self) -> Vec<String> {
    // todo(mbrobbel): parse functions
    // notes :
Member

Leftover comments to self?

Substrait(semver::Version, semver::VersionReq),
}

impl<C: Context> Parse<C> for proto::Version {
Member

I'm surprised to see all of this validation here. Were you aiming for putting the validation into a separate step from the parsing?

@@ -12,6 +12,11 @@
//! Protobuf serialization and deserialization are provided via [prost] in the
//! [proto] module.
//!
//! > Please note that [protoc](https://grpc.io/docs/protoc-installation/) is
Member

Is the canonical protobuf installation guide for Rust on the grpc.io site? I would have expected a link to the protobuf site.

#[error("undefined reference to extension function with anchor `{0}`")]
UndefinedExtensionFunction(Anchor<ExtensionFunction>),

/// Undefined reference to extension function.
Member

function definition

version: Version,
simple_extension_uris: Vec<SimpleExtensionURI>,
simple_extension_declarations: Vec<SimpleExtensionDeclaration>,
// ...
Member

Is this a TODO?

#[cfg(feature = "parse-reqwest")]
#[derive(Default)]
/// A resolver that uses [reqwest] to resolve simple extensions.
pub struct ReqwestResolver {
Member

Oh, reqwest is a package name. It may be useful to disable external checking of URIs -- that seems more like a validation step that might not be useful on systems just trying to use plans.
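One way the lookup could be made optional is to put resolution behind a trait and offer implementations that never touch the network. The sketch below is an assumption for illustration only: `Resolver`, `ResolveError`, `SkipResolver`, and `StaticResolver` are hypothetical names, and the resolver abstraction in this PR may look different.

```rust
use std::collections::HashMap;

/// Hypothetical resolver abstraction; not the trait used in this PR.
pub trait Resolver {
    /// Return the simple extension document for `uri`, or `None` when
    /// resolution is deliberately skipped.
    fn resolve(&self, uri: &str) -> Result<Option<String>, ResolveError>;
}

/// Error returned when a URI cannot be resolved.
#[derive(Debug, PartialEq)]
pub struct ResolveError(pub String);

/// Accepts every URI without fetching it.
#[derive(Default)]
pub struct SkipResolver;

impl Resolver for SkipResolver {
    fn resolve(&self, _uri: &str) -> Result<Option<String>, ResolveError> {
        Ok(None)
    }
}

/// Serves documents from memory, for example extensions bundled at build time.
#[derive(Default)]
pub struct StaticResolver {
    docs: HashMap<String, String>,
}

impl StaticResolver {
    /// Register a document for a URI and return the resolver for chaining.
    pub fn with(mut self, uri: &str, doc: &str) -> Self {
        self.docs.insert(uri.to_owned(), doc.to_owned());
        self
    }
}

impl Resolver for StaticResolver {
    fn resolve(&self, uri: &str) -> Result<Option<String>, ResolveError> {
        match self.docs.get(uri) {
            Some(doc) => Ok(Some(doc.clone())),
            None => Err(ResolveError(format!("unknown URI: {}", uri))),
        }
    }
}
```

A consumer that only wants to use plans could pick `SkipResolver`, while a stricter setup could use `StaticResolver` or a network-backed resolver behind a feature flag.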

use crate::proto;
use thiserror::Error;

/// A parsed [proto::Plan].
Member

From a naming standpoint I'd consider the plan to be parsed once you use the protobuf parse methods to deserialize it. As a second step (perhaps called study) it makes more sense to me to have the validation that is present in this PR. I could see even more validation as an optional third step, like checking for issues with output mappings, function name checking, etc.

pub enum Relation {
    /// A read relation.
    Read(ReadRelation),
}
Member

TODO for the other relation types?

@mbrobbel
Member Author

mbrobbel commented Mar 7, 2024

Closing (I'm going to split this PR up). See #157.

@mbrobbel mbrobbel closed this Mar 7, 2024
mbrobbel added a commit that referenced this pull request May 21, 2024
Include core extensions from `Substrait`.
The majority of the code originates from the unmerged PR #89.
---------

Co-authored-by: Matthijs Brobbel