Questions About Creating Preprocessors #623
Hello everyone! I'm currently exploring the library's functionality and have found it quite promising. As part of a project to build a build tool for the languages used in many Bohemia games, I've been considering how to implement a preprocessing step that runs before parsing. Specifically, I want a preprocessor that transforms the input into a form suitable for later parsing: an input consisting of a list of strings (or slices, when they come from the original input), each with a corresponding span. Any nudges in the right direction would be greatly appreciated. I really love the project so far and can't wait to see its future! (Using the latest alpha.)
```rust
pub type BSpan<'ast> = SimpleSpan<usize, &'ast ParseSource>;
pub type Spanned<S: Span, T> = (T, S);
pub type BSpanned<'ast, T> = Spanned<BSpan<'ast>, T>;

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ParseSource {
    Unknown,
    FileSystem(PathBuf),
}

pub type ProcessorOutput<'ast> = ????; // Some form of input that can later be parsed

pub trait ProcToken<'ast>: Debug + Clone + PartialEq + Eq + Hash + Sized {}

trait ProcError<'ast> {
    fn parse_error(errors: Vec<Rich<'ast, char, BSpan<'ast>>>) -> Self;
}

pub enum ProcessedToken<'ast> {
    Existed(&'ast str),
    Created(String),
}

pub trait PreProcessor<'ast, T: ProcToken<'ast>> {
    type ProcessorError: ProcError<'ast>;
    type ProcessorDriver: ProcDriver<'ast, T, ProcessorError = Self::ProcessorError>;

    fn lexer() -> impl Parser<
        'ast,
        WithContext<BSpan<'ast>, &'ast str>,
        Vec<BSpanned<'ast, T>>,
        extra::Err<Rich<'ast, char, BSpan<'ast>>>,
    >;

    fn process(input: ParserInput<'ast>, driver: &mut Self::ProcessorDriver) -> Result<(), Self::ProcessorError> {
        let processed_tokens: Vec<BSpanned<ProcessedToken>> = {
            let tokens = match Self::lexer().parse(input).into_result() {
                Ok(tokens) => tokens,
                Err(errors) => return Err(Self::ProcessorError::parse_error(errors)),
            };
            let mut res = vec![];
            for token in tokens {
                if let Some(expanded) = driver.expand(token)? {
                    res.extend(expanded);
                }
            }
            res
        };
        // How can I turn this into an input that can then be parsed later with the correct
        // spans, so that reported errors actually line up with their source locations?
        // The reasoning for `ProcessedToken` is to allow tokens to be generated if they don't
        // already exist: e.g. if someone uses the macro `__EXEC(1 + 2)` it should yield `3`,
        // which doesn't exist in the source, so we create an owned string with the value instead.
    }
}

// This driver would contain the logic for unwrapping macros.
pub trait ProcDriver<'ast, T: ProcToken<'ast>> {
    type ProcessorError: ProcError<'ast>;

    fn expand(&mut self, token: BSpanned<T>) -> Result<
        Option<Vec<BSpanned<ProcessedToken>>>,
        Self::ProcessorError,
    >;
}
```
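To make the question concrete, here is one way the span bookkeeping could work (not part of the original post, and not chumsky's API — a minimal std-only sketch with a plain `Range<usize>` span standing in for `SimpleSpan`): flatten the processed tokens into a `(char, span)` sequence, where characters from `Existed` slices keep per-character spans within the original source, while every character of a `Created` string reuses the span of the macro invocation that produced it, so errors in generated text point at the call site.

```rust
use std::ops::Range;

// Simplified stand-ins for the post's types (assumptions for illustration).
#[derive(Debug, Clone, PartialEq)]
enum ProcessedToken<'src> {
    Existed(&'src str),
    Created(String),
}

type Span = Range<usize>;

/// Flatten processed tokens into `(char, span)` pairs. Characters from
/// `Existed` slices get per-character spans inside their original span, while
/// every character of a `Created` string reuses the span of the macro call
/// that generated it, so reported errors point at the invocation site.
fn flatten(tokens: &[(ProcessedToken, Span)]) -> Vec<(char, Span)> {
    let mut out = Vec::new();
    for (token, span) in tokens {
        match token {
            ProcessedToken::Existed(s) => {
                for (i, c) in s.char_indices() {
                    let start = span.start + i;
                    out.push((c, start..start + c.len_utf8()));
                }
            }
            ProcessedToken::Created(s) => {
                for c in s.chars() {
                    out.push((c, span.clone()));
                }
            }
        }
    }
    out
}

fn main() {
    // Source: `a + __EXEC(1 + 2)`; the macro at bytes 4..17 expanded to "3".
    let tokens = vec![
        (ProcessedToken::Existed("a + "), 0..4),
        (ProcessedToken::Created("3".to_string()), 4..17),
    ];
    let flat = flatten(&tokens);
    assert_eq!(flat[0], ('a', 0..1));
    assert_eq!(flat[4], ('3', 4..17)); // generated char points at the macro call
    println!("{:?}", flat);
}
```

A sequence of this shape is exactly the kind of iterator that a chumsky `Stream` input can be built from, with each item carrying its own span.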
I'm not really familiar enough with the domain to give a more precise answer. That said, I'd recommend thinking carefully before jumping in: chumsky works best when parsing context-free grammars (or grammars with only a small amount of localised context sensitivity). Preprocessors (such as C's preprocessor) tend to be extremely context-sensitive and hence are often better off hand-written.
Hmm, so you're looking for an input that grows as parsing occurs? You might want to look into `Stream` and the `nested_in` combinator. In combination, they'll allow you to create new streams of input on the fly and parse them in the current parsing context.
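The pattern being suggested can be illustrated without chumsky at all (the names `expand` and `parse_sum` below are hypothetical, and this is a toy hand-rolled sketch, not the library's API): when expansion produces new text, you run the same parse routine over a freshly created sub-input, in the middle of parsing the outer one — which is the shape of recursion that `Stream` plus `nested_in` give you inside a real chumsky parser.

```rust
// Toy macro table: maps a macro name to the text it expands to (assumption
// for illustration only).
fn expand(name: &str) -> Option<String> {
    match name {
        "__THREE" => Some("3".to_string()),
        _ => None,
    }
}

/// Sum whitespace-separated terms. A term is either an integer literal or a
/// macro name, whose expansion is parsed recursively as a brand-new input,
/// in the middle of parsing the outer input.
fn parse_sum(input: &str) -> Result<i64, String> {
    input
        .split_whitespace()
        .map(|term| {
            if let Ok(n) = term.parse::<i64>() {
                Ok(n)
            } else if let Some(expanded) = expand(term) {
                // Create a new input on the fly and parse it in this context.
                parse_sum(&expanded)
            } else {
                Err(format!("unknown term `{term}`"))
            }
        })
        .sum()
}

fn main() {
    assert_eq!(parse_sum("1 2 __THREE"), Ok(6));
    println!("{:?}", parse_sum("1 2 __THREE"));
}
```

In chumsky terms, the recursive `parse_sum(&expanded)` call is where you would instead build a `Stream` over the expanded text (with spans mapped back to the macro invocation) and hand it to the existing parser via `nested_in`.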