Questions About Creating Preprocessors #623
Hello everyone! I'm currently exploring the library's functionality and have found it quite promising. As part of a project to build a build tool for the languages used in many Bohemia games, I've been considering how to implement a preprocessing step that runs before parsing. Specifically, I want a preprocessor that transforms the input into a form suitable for later parsing: an input consisting of a list of strings (or slices, when they come from the original input), each with a corresponding span. Any nudges in the right direction would be greatly appreciated. I really love the project so far and can't wait to see its future! (Using the latest alpha.)
```rust
pub type BSpan<'ast> = SimpleSpan<usize, &'ast ParseSource>;
pub type Spanned<S: Span, T> = (T, S);
pub type BSpanned<'ast, T> = Spanned<BSpan<'ast>, T>;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseSource {
    Unknown,
    FileSystem(PathBuf),
}

pub type ProcessorOutput<'ast> = ????; // Some form of input that can later be parsed

pub trait ProcToken<'ast>: Debug + Clone + PartialEq + Eq + Hash + Sized {}

trait ProcError<'ast> {
    fn parse_error(errors: Vec<Rich<'ast, char, BSpan<'ast>>>) -> Self;
}

pub enum ProcessedToken<'ast> {
    Existed(&'ast str),
    Created(String),
}

pub trait PreProcessor<'ast, T: ProcToken<'ast>> {
    type ProcessorError: ProcError<'ast>;
    type ProcessorDriver: ProcDriver<'ast, T, ProcessorError = Self::ProcessorError>;

    fn lexer() -> impl Parser<
        'ast,
        WithContext<BSpan<'ast>, &'ast str>,
        Vec<BSpanned<'ast, T>>,
        extra::Err<Rich<'ast, char, BSpan<'ast>>>,
    >;

    fn process(input: ParserInput<'ast>, driver: &mut Self::ProcessorDriver) -> Result<(), Self::ProcessorError> {
        let processed_tokens: Vec<BSpanned<ProcessedToken>> = {
            let tokens = match Self::lexer().parse(input).into_result() {
                Ok(tokens) => tokens,
                Err(errors) => return Err(Self::ProcessorError::parse_error(errors)),
            };
            let mut res = vec![];
            for token in tokens {
                if let Some(expanded) = driver.expand(token)? {
                    res.extend(expanded);
                }
            }
            res
        };
        // How can I turn this into an input that can then be parsed later with the correct
        // spans, so that reported errors actually line up with their source locations?
        // The reasoning for `ProcessedToken` is to allow tokens to be generated if they don't
        // already exist: e.g. if someone uses the macro `__EXEC(1 + 2)` it should yield `3`,
        // which doesn't exist in the source, so we create an owned string with the value instead.
    }
}

// This driver would contain the logic for unwrapping macros.
pub trait ProcDriver<'ast, T: ProcToken<'ast>> {
    type ProcessorError: ProcError<'ast>;

    fn expand(&mut self, token: BSpanned<T>) -> Result<
        Option<Vec<BSpanned<ProcessedToken>>>,
        Self::ProcessorError,
    >;
}
```
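To make the question concrete, here is one way the span bookkeeping could work (not part of the original post, and not chumsky's API — a minimal std-only sketch with a plain `Range<usize>` span standing in for `SimpleSpan`): flatten the processed tokens into a `(char, span)` sequence, where characters from `Existed` slices keep per-character spans within the original source, while every character of a `Created` string reuses the span of the macro invocation that produced it, so errors in generated text point at the call site.

```rust
use std::ops::Range;

// Simplified stand-ins for the post's types (assumptions for illustration).
#[derive(Debug, Clone, PartialEq)]
enum ProcessedToken<'src> {
    Existed(&'src str),
    Created(String),
}

type Span = Range<usize>;

/// Flatten processed tokens into `(char, span)` pairs. Characters from
/// `Existed` slices get per-character spans inside their original span, while
/// every character of a `Created` string reuses the span of the macro call
/// that generated it, so reported errors point at the invocation site.
fn flatten(tokens: &[(ProcessedToken, Span)]) -> Vec<(char, Span)> {
    let mut out = Vec::new();
    for (token, span) in tokens {
        match token {
            ProcessedToken::Existed(s) => {
                for (i, c) in s.char_indices() {
                    let start = span.start + i;
                    out.push((c, start..start + c.len_utf8()));
                }
            }
            ProcessedToken::Created(s) => {
                for c in s.chars() {
                    out.push((c, span.clone()));
                }
            }
        }
    }
    out
}

fn main() {
    // Source: `a + __EXEC(1 + 2)`; the macro at bytes 4..17 expanded to "3".
    let tokens = vec![
        (ProcessedToken::Existed("a + "), 0..4),
        (ProcessedToken::Created("3".to_string()), 4..17),
    ];
    let flat = flatten(&tokens);
    assert_eq!(flat[0], ('a', 0..1));
    assert_eq!(flat[4], ('3', 4..17)); // generated char points at the macro call
    println!("{:?}", flat);
}
```

A sequence of this shape is exactly the kind of iterator that a chumsky `Stream` input can be built from, with each item carrying its own span.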
I'm not really familiar enough with the domain to give a more precise answer. That said, I'd recommend thinking carefully before jumping in: chumsky works best when parsing context-free grammars (or grammars with only a small amount of localised context sensitivity). Preprocessors (such as C's preprocessor) tend to be extremely context-sensitive and hence are often better off hand-written.
Hmm, so you're looking for an input that grows as parsing occurs? You might want to look into `Stream` and the `nested_in` combinator. In combination, they'll allow you to create new streams of input on the fly and parse them in the current parsing context.
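The pattern being suggested can be illustrated without chumsky at all (the names `expand` and `parse_sum` below are hypothetical, and this is a toy hand-rolled sketch, not the library's API): when expansion produces new text, you run the same parse routine over a freshly created sub-input, in the middle of parsing the outer one — which is the shape of recursion that `Stream` plus `nested_in` give you inside a real chumsky parser.

```rust
// Toy macro table: maps a macro name to the text it expands to (assumption
// for illustration only).
fn expand(name: &str) -> Option<String> {
    match name {
        "__THREE" => Some("3".to_string()),
        _ => None,
    }
}

/// Sum whitespace-separated terms. A term is either an integer literal or a
/// macro name, whose expansion is parsed recursively as a brand-new input,
/// in the middle of parsing the outer input.
fn parse_sum(input: &str) -> Result<i64, String> {
    input
        .split_whitespace()
        .map(|term| {
            if let Ok(n) = term.parse::<i64>() {
                Ok(n)
            } else if let Some(expanded) = expand(term) {
                // Create a new input on the fly and parse it in this context.
                parse_sum(&expanded)
            } else {
                Err(format!("unknown term `{term}`"))
            }
        })
        .sum()
}

fn main() {
    assert_eq!(parse_sum("1 2 __THREE"), Ok(6));
    println!("{:?}", parse_sum("1 2 __THREE"));
}
```

In chumsky terms, the recursive `parse_sum(&expanded)` call is where you would instead build a `Stream` over the expanded text (with spans mapped back to the macro invocation) and hand it to the existing parser via `nested_in`.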