From ff91b680233765afa3b311a02eb540f89358e0c1 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ma=C3=ABlle=20Salmon?= Date: Tue, 9 Apr 2024 09:41:49 +0200 Subject: [PATCH 1/6] post on Markdown --- .../index.md | 2 +- .../index.Rmd | 162 ++++++++++++++++++ .../2024-04-16-markdown-programmatic/index.md | 162 ++++++++++++++++++ 3 files changed, 325 insertions(+), 1 deletion(-) create mode 100644 content/blog/2024-04-16-markdown-programmatic/index.Rmd create mode 100644 content/blog/2024-04-16-markdown-programmatic/index.md diff --git a/content/blog/2023-06-01-troubleshooting-pandoc-problems-as-an-r-user/index.md b/content/blog/2023-06-01-troubleshooting-pandoc-problems-as-an-r-user/index.md index efc98f130..9837b7d73 100644 --- a/content/blog/2023-06-01-troubleshooting-pandoc-problems-as-an-r-user/index.md +++ b/content/blog/2023-06-01-troubleshooting-pandoc-problems-as-an-r-user/index.md @@ -384,4 +384,4 @@ It's crucial to remember that while this can seem like a lot, your Pandoc skills As an R user, do not forget that Pandoc supports a lot of your publication tools; and that there's a handy R package for interacting with Pandoc: pandoc 🎉. -If you enjoy playing with files in various formats, you might also appreciate reading about [rtika](/blog/2018/04/25/rtika-introduction/) by Sasha Goodman. +If you enjoy playing with files in various formats, you might also appreciate reading about [rtika](/blog/2018/04/25/rtika-introduction/) by Sasha Goodman. 
\ No newline at end of file diff --git a/content/blog/2024-04-16-markdown-programmatic/index.Rmd b/content/blog/2024-04-16-markdown-programmatic/index.Rmd new file mode 100644 index 000000000..93c8eff80 --- /dev/null +++ b/content/blog/2024-04-16-markdown-programmatic/index.Rmd @@ -0,0 +1,162 @@ +--- +slug: "markdown-programmatic-parsing.edits" +title: All the ways to programmatically edit R Markdown / Quarto documents +author: + - Maëlle Salmon + - Christophe Dervieux +# Set the date below to the publication date of your post +date: 2024-04-16 +# Minimal tags for a post about a community-contributed package +# that has passed software peer review are listed below +# Consult the Technical Guidelines for information on choosing tags +tags: + - pandoc + - rmarkdown + - tinkr + - quarto + - markdown + - tech notes +description: "" +output: hugodown::md_document +--- + +If life gives you a bunch of Markdown files to analyse or edit, do you warm up your regex muscles and get going? +How about using more specific parsing tools instead? +In this post, we shall give an overview of programmatic ways to parse and edit Markdown files: Markdown, R Markdown, Quarto, Hugo files, you name it. + +## What is Markdown? + +Markdown is a (punny, eh) markup language created by John Gruber and Aaron Swartz. +Here is an example: + +```md + +# My first header + +Some content, with parts in **bold** or *italic*. +Let me add a [link](https://ropensci.org). + +``` + +Different Markdown files can lead to the same output, for instance this is equivalent to our first example: + +```md + +My first header +=============== + +Some content, with parts in __bold__ or _italic_. Let me add a [link](https://ropensci.org). + +``` + +Furthermore there are different _flavors_ of Markdown, and some supplementary features added depending on what your Markdown files will be used by, like emoji written so: `:grin:`. 
+ +Common Markdown consumers R users interact with include: R Markdown (that uses Pandoc under the hood), Quarto (that uses Pandoc under the hood... see any trend here?), GitHub, Hugo. + +Many tools using Markdown also accept metadata at the top of Markdown files, either YAML or TOML. +Here is an example with YAML: + +```md +--- +title: My cool thing +author: Myself +--- + +Some content, *nice* content. +``` + +Most often R users will write Markdown manually, or with the help of an editor such as RStudio IDE visual editor. +But sometimes, one will have to edit a bunch of Markdown files at once. + +## Templating tools + +Imagine you need to create a bunch of different R Markdown files, for instance for students to use as personalized exercises. +In that case, you can create a boilerplate document as a template, and create its different output versions using a templating tool. + +Templating tools include: + +- `knitr::knit_expand()` by Yihui Xie; +- the [whisker package](https://github.com/edwindj/whisker) maintained by Edwin de Jonge (used in for instance pkgddown); +- the [brew package](https://github.com/gregfrog/brew) maintained by Greg Hunt; +- [Pandoc](/blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/) by John MacFarlane. + +The simplest example of the whisker package might furthermore remind you of the glue package. + +A common workflow would be: + +- You create a template in a file, where variable parts are indicated by strings such as `{{name}}`. +- You read this template in R using for instance the brio package. +- Mapping over your set of variables, you render the template using whisker and save each version to a file using the brio package. + +## String manipulation tools + +You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to accomodate many edge cases... which in the end means you are writing an actual Markdown parser. 
+Not for the faint of heart... neither necessary if you read the section after this one. :relieved: + +You'd detect heading using for instance `grep("^#", markdown_lines)`[^edge]. + +[^edge]: But this would also detect code comments! Don't do this! + +Example of string manipulation tools include base R (`sub()`, `grep()` and friends), [stringr](https://stringr.tidyverse.org/) (and stringi), `xfun::gsub_file()`. + +Although string manipulation tools are of a limited usefulness when parsing Markdown, they can _complement_ the actual parsing tools. +Even if using specific Markdown parsing tools will help you write less regular expressions yourself... they won't completely free you from them. + +## Parsing tools + +Parsing tools are fantastic, and numerous. +We will only mention the ones you can directly use from R. + + +The [tinkr package](http://docs.ropensci.org/tinkr/) maintained by Zhian Kamvar parses Markdown to XML using Commonmark, and writes it back to Markdown using XSLT. The YAML metadata is available as a string. + +With Pandoc that we presented in a [tech note last year](blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes), you can parse a Markdown files to a Pandoc Abstract Syntax Tree, or to, say HTML, and then back to Markdown. + +The [parsermd package](https://rundel.github.io/parsermd/) maintained by Colin Rundel is "implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. It also includes a collection of high level functions for working with the resulting abstract syntax tree." + +The [md4r package](https://rundel.github.io/md4r/), more recent and also maintained by Colin Rundel, is very similar except that it uses the MD4C (Markdown for C) library. + +### The impossibility of a perfect roundtrip + +When parsing and editing Markdown, then writing it back to Markdown, some undesired changes might appear. 
+For instance, with [tinkr](http://docs.ropensci.org/tinkr/#general-principles-and-solution) list items all start with a `-` even if in the original document they started with a `*`. + +Depending on your use case you might want to find ways to mitigate such losses, for instance only re-writing the lines you made intentional edits to. + +### How to choose a parser? + +You can choose a parser based on what it lets you manipulate the Markdown with: if you prefer XML and HTML to nested lists for instance, you might prefer using tinkr or Pandoc. +If the high-level functions of md4r or parsermd are suitable for your use case, you might prefer one of them. + +Another important criterion is to choose a parser that's a close to the use case of your Markdown files as possible. +If you are only going to work with Markdown files for GitHub, commonmark/tinkr is an excellent choice since GitHub itself uses commonmark. +Now, your work might encompass different sorts of Markdown files that will be used by different tools. +For instance, the babeldown package processes any Markdown file[^caveat]: Markdown, R Markdown, Quarto, Hugo. +In that case, or if there is no R parser doing exactly what your Markdown's end user does, you need to pay attention to the quirks of that end user. +Maybe you have to throw [Pandoc raw attributes](blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes) around a Hugo shortcode, for instance. +Furthermore, if you need to parse certain elements, like again Hugo shortcodes, you might need to write the parsing code yourself, that is, regular expressions. + +[^caveat]: Or at least it's supposed to :sweat_smile: Thankfully users report edge cases that are not covered yet. + +## What about the code chunks + +Programmatically parsing and editing R code is out of the scope of this post, but closely related enough to throw in a few tips. +As with Markdown, you might need to use regular expressions but try not to. 
+You can parse the code to XML using base R parsing and [xmlparsedata](https://r-lib.github.io/xmlparsedata/), then you manipulate the XML with [XPath](https://masalmon.eu/2022/04/08/xml-xpath/). +To write code back, you can make use of the attributes of each node that indicates the original lines and columns. + +So a possible workflow is + +- parse the code to XML, use xmlparsedata to inform what to change and where. Out of these steps you'd get a list of elements' positions for instance. +- use brio to read the lines, change a few of them with base R tools, then use brio again to write the lines back. + +## Examples of Markdown parsing and editing + +The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentrie's lessons for structural markdown elements, thanks to tinkr. + +The [babeldown package](https://docs.ropensci.org/babeldown/) maintained by Maëlle Salmon transforms Markdown to XML, sends it to DeepL API for translation, and writes the results back to Markdown, also using tinkr. + +## Conclusion + +In this post we explained how to best parse and edit Markdown files: using specific parsing tools, possibly complemented by ad-hoc string manipulation. +What do *you* use to handle Markdown files? 
\ No newline at end of file diff --git a/content/blog/2024-04-16-markdown-programmatic/index.md b/content/blog/2024-04-16-markdown-programmatic/index.md new file mode 100644 index 000000000..939c7d6fb --- /dev/null +++ b/content/blog/2024-04-16-markdown-programmatic/index.md @@ -0,0 +1,162 @@ +--- +slug: "markdown-programmatic-parsing.edits" +title: All the ways to programmatically edit R Markdown / Quarto documents +author: + - Maëlle Salmon + - Christophe Dervieux +# Set the date below to the publication date of your post +date: 2024-04-16 +# Minimal tags for a post about a community-contributed package +# that has passed software peer review are listed below +# Consult the Technical Guidelines for information on choosing tags +tags: + - pandoc + - rmarkdown + - tinkr + - quarto + - markdown + - tech notes +description: "" +output: hugodown::md_document +--- +
+If life gives you a bunch of Markdown files to analyse or edit, do you warm up your regex muscles and get going? +How about using more specific parsing tools instead? +In this post, we shall give an overview of programmatic ways to parse and edit Markdown files: Markdown, R Markdown, Quarto, Hugo files, you name it. + +## What is Markdown? + +Markdown is a (punny, eh) markup language created by John Gruber and Aaron Swartz. +Here is an example: + +```md + +# My first header + +Some content, with parts in **bold** or *italic*. +Let me add a [link](https://ropensci.org). + +``` + +Different Markdown files can lead to the same output, for instance this is equivalent to our first example: + +```md + +My first header +=============== + +Some content, with parts in __bold__ or _italic_. Let me add a [link](https://ropensci.org). + +``` + +Furthermore there are different _flavors_ of Markdown, and some supplementary features added depending on what your Markdown files will be used by, like emoji written so: `:grin:`. 
+ +Common Markdown consumers R users interact with include: R Markdown (that uses Pandoc under the hood), Quarto (that uses Pandoc under the hood... see any trend here?), GitHub, Hugo. + +Many tools using Markdown also accept metadata at the top of Markdown files, either YAML or TOML. +Here is an example with YAML: + +```md +--- +title: My cool thing +author: Myself +--- + +Some content, *nice* content. +``` + +Most often R users will write Markdown manually, or with the help of an editor such as RStudio IDE visual editor. +But sometimes, one will have to edit a bunch of Markdown files at once. + +## Templating tools + +Imagine you need to create a bunch of different R Markdown files, for instance for students to use as personalized exercises. +In that case, you can create a boilerplate document as a template, and create its different output versions using a templating tool. + +Templating tools include: + +- `knitr::knit_expand()` by Yihui Xie; +- the [whisker package](https://github.com/edwindj/whisker) maintained by Edwin de Jonge (used in for instance pkgddown); +- the [brew package](https://github.com/gregfrog/brew) maintained by Greg Hunt; +- [Pandoc](/blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/) by John MacFarlane. + +The simplest example of the whisker package might furthermore remind you of the glue package. + +A common workflow would be: + +- You create a template in a file, where variable parts are indicated by strings such as `{{name}}`. +- You read this template in R using for instance the brio package. +- Mapping over your set of variables, you render the template using whisker and save each version to a file using the brio package. + +## String manipulation tools + +You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to accomodate many edge cases... which in the end means you are writing an actual Markdown parser. 
+Not for the faint of heart... neither necessary if you read the section after this one. :relieved: + +You'd detect heading using for instance `grep("^#", markdown_lines)`[^edge]. + +[^edge]: But this would also detect code comments! Don't do this! + +Example of string manipulation tools include base R (`sub()`, `grep()` and friends), [stringr](https://stringr.tidyverse.org/) (and stringi), `xfun::gsub_file()`. + +Although string manipulation tools are of a limited usefulness when parsing Markdown, they can _complement_ the actual parsing tools. +Even if using specific Markdown parsing tools will help you write less regular expressions yourself... they won't completely free you from them. + +## Parsing tools + +Parsing tools are fantastic, and numerous. +We will only mention the ones you can directly use from R. + + +The [tinkr package](http://docs.ropensci.org/tinkr/) maintained by Zhian Kamvar parses Markdown to XML using Commonmark, and writes it back to Markdown using XSLT. The YAML metadata is available as a string. + +With Pandoc that we presented in a [tech note last year](blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes), you can parse a Markdown files to a Pandoc Abstract Syntax Tree, or to, say HTML, and then back to Markdown. + +The [parsermd package](https://rundel.github.io/parsermd/) maintained by Colin Rundel is "implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. It also includes a collection of high level functions for working with the resulting abstract syntax tree." + +The [md4r package](https://rundel.github.io/md4r/), more recent and also maintained by Colin Rundel, is very similar except that it uses the MD4C (Markdown for C) library. + +### The impossibility of a perfect roundtrip + +When parsing and editing Markdown, then writing it back to Markdown, some undesired changes might appear. 
+For instance, with [tinkr](http://docs.ropensci.org/tinkr/#general-principles-and-solution) list items all start with a `-` even if in the original document they started with a `*`. + +Depending on your use case you might want to find ways to mitigate such losses, for instance only re-writing the lines you made intentional edits to. + +### How to choose a parser? + +You can choose a parser based on what it lets you manipulate the Markdown with: if you prefer XML and HTML to nested lists for instance, you might prefer using tinkr or Pandoc. +If the high-level functions of md4r or parsermd are suitable for your use case, you might prefer one of them. + +Another important criterion is to choose a parser that's a close to the use case of your Markdown files as possible. +If you are only going to work with Markdown files for GitHub, commonmark/tinkr is an excellent choice since GitHub itself uses commonmark. +Now, your work might encompass different sorts of Markdown files that will be used by different tools. +For instance, the babeldown package processes any Markdown file[^caveat]: Markdown, R Markdown, Quarto, Hugo. +In that case, or if there is no R parser doing exactly what your Markdown's end user does, you need to pay attention to the quirks of that end user. +Maybe you have to throw [Pandoc raw attributes](blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes) around a Hugo shortcode, for instance. +Furthermore, if you need to parse certain elements, like again Hugo shortcodes, you might need to write the parsing code yourself, that is, regular expressions. + +[^caveat]: Or at least it's supposed to :sweat_smile: Thankfully users report edge cases that are not covered yet. + +## What about the code chunks + +Programmatically parsing and editing R code is out of the scope of this post, but closely related enough to throw in a few tips. +As with Markdown, you might need to use regular expressions but try not to. 
+You can parse the code to XML using base R parsing and [xmlparsedata](https://r-lib.github.io/xmlparsedata/), then you manipulate the XML with [XPath](https://masalmon.eu/2022/04/08/xml-xpath/). +To write code back, you can make use of the attributes of each node that indicates the original lines and columns. + +So a possible workflow is + +- parse the code to XML, use xmlparsedata to inform what to change and where. Out of these steps you'd get a list of elements' positions for instance. +- use brio to read the lines, change a few of them with base R tools, then use brio again to write the lines back. + +## Examples of Markdown parsing and editing + +The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentrie's lessons for structural markdown elements, thanks to tinkr. + +The [babeldown package](https://docs.ropensci.org/babeldown/) maintained by Maëlle Salmon transforms Markdown to XML, sends it to DeepL API for translation, and writes the results back to Markdown, also using tinkr. + +## Conclusion + +In this post we explained how to best parse and edit Markdown files: using specific parsing tools, possibly complemented by ad-hoc string manipulation. +What do *you* use to handle Markdown files? From ad5fc0ab7dda7b4b5ece61fc63f86b1057d55403 Mon Sep 17 00:00:00 2001 From: "Zhian N. Kamvar" Date: Tue, 25 Jun 2024 02:49:37 -0700 Subject: [PATCH 2/6] Structure packages and add more context (#788) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * first pass at edits - add links and context - reword some sections * Separate parsers into sections I also added some more context. 
* add additional context * Apply suggestions from Maëlle Code Review Co-authored-by: Maëlle Salmon * Add Zhian as author; define AST * Add code example for templating --------- Co-authored-by: Maëlle Salmon --- .../hw-template.md | 10 ++ .../index.Rmd | 126 +++++++++++++--- .../2024-04-16-markdown-programmatic/index.md | 139 +++++++++++++++--- 3 files changed, 233 insertions(+), 42 deletions(-) create mode 100644 content/blog/2024-04-16-markdown-programmatic/hw-template.md diff --git a/content/blog/2024-04-16-markdown-programmatic/hw-template.md b/content/blog/2024-04-16-markdown-programmatic/hw-template.md new file mode 100644 index 000000000..8125a0fbb --- /dev/null +++ b/content/blog/2024-04-16-markdown-programmatic/hw-template.md @@ -0,0 +1,10 @@ +--- +title: "Homework assignment 1" +author: "{{name}}" +--- + +Create a normal distribution with a mean of {{mean}} and a standard deviation of {{sd}}: + +```{r solution-1} +# hint: use the rnorm function +``` diff --git a/content/blog/2024-04-16-markdown-programmatic/index.Rmd b/content/blog/2024-04-16-markdown-programmatic/index.Rmd index 93c8eff80..ebf0a36fa 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.Rmd +++ b/content/blog/2024-04-16-markdown-programmatic/index.Rmd @@ -4,6 +4,7 @@ title: All the ways to programmatically edit R Markdown / Quarto documents author: - Maëlle Salmon - Christophe Dervieux + - Zhian N. Kamvar # Set the date below to the publication date of your post date: 2024-04-16 # Minimal tags for a post about a community-contributed package @@ -49,9 +50,14 @@ Some content, with parts in __bold__ or _italic_. Let me add a [link](https://ro ``` -Furthermore there are different _flavors_ of Markdown, and some supplementary features added depending on what your Markdown files will be used by, like emoji written so: `:grin:`. +Furthermore there are different _flavors_ of Markdown[^md-flavours], which add some [extended syntax], like emoji written so: `:grin:`. 
-Common Markdown consumers R users interact with include: R Markdown (that uses Pandoc under the hood), Quarto (that uses Pandoc under the hood... see any trend here?), GitHub, Hugo. +[^md-flavours]: As of 2024-06-20, there are [76 programs that parse markdown](https://github.com/markdown/markdown.github.com/wiki/Implementations), some with their own unique flavour. + +[extended syntax]: https://www.markdownguide.org/extended-syntax/ + + +Markdown formats that R users will commonly interact with include: R Markdown (uses Pandoc under the hood), Quarto (uses Pandoc under the hood... see any trend here?), GitHub, Hugo (for blogdown or hugodown websites). Many tools using Markdown also accept metadata at the top of Markdown files, either YAML or TOML. Here is an example with YAML: @@ -65,17 +71,19 @@ author: Myself Some content, *nice* content. ``` -Most often R users will write Markdown manually, or with the help of an editor such as RStudio IDE visual editor. -But sometimes, one will have to edit a bunch of Markdown files at once. +Most often R users will write Markdown manually, or with the help of an editor such as the RStudio IDE visual editor. +But sometimes, one will have to create or edit a bunch of Markdown files at once, and editing all those files by hand is a huge waste of time. +This blog post will give you resources in R that you can use to create, parse, and edit markdown documents, so that you can become the markdown wizard you have always dreamed of becoming :mage:! + -## Templating tools +## Templating Tools for Boilerplate Documents Imagine you need to create a bunch of different R Markdown files, for instance for students to use as personalized exercises. In that case, you can create a boilerplate document as a template, and create its different output versions using a templating tool. 
Templating tools include: -- `knitr::knit_expand()` by Yihui Xie; +- [`knitr::knit_expand()`](https://cran.r-project.org/web/packages/knitr/vignettes/knit_expand.html) by Yihui Xie; - the [whisker package](https://github.com/edwindj/whisker) maintained by Edwin de Jonge (used in for instance pkgddown); - the [brew package](https://github.com/gregfrog/brew) maintained by Greg Hunt; - [Pandoc](/blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/) by John MacFarlane. @@ -88,9 +96,48 @@ A common workflow would be: - You read this template in R using for instance the brio package. - Mapping over your set of variables, you render the template using whisker and save each version to a file using the brio package. -## String manipulation tools +### Example + +Here's an example markdown file that we can use as a template: + +````{r show-markdown, echo = FALSE, warn = FALSE, message = FALSE, results = 'asis', comment = ""} +md <- readLines("hw-template.md") +writeLines(c("````markdown", md, "````"), con = stdout()) +```` + +Using the workflow above, we can r + +```{r, message = FALSE} +# generate student variables ---- +students <- c("Maëlle", "Christophe", "Zhian") +n <- length(students) +key <- data.frame( + name = students, + mean = rpois(n, 5), + sd = sprintf("%.1f", runif(n)), + file = sprintf("%s-hw.md", students) +) +# render and write assignment from template ---- +make_assignment <- function(key, template) { + lapply(seq(n), function(i) { + new <- whisker::whisker.render(template, data = key[i, ]) + brio::write_lines(new, key$file[i]) + }) + return(invisible()) +} +md <- brio::read_lines("hw-template.md") +make_assignment(key, template = md) +print(key) +``` + +```{r, echo = FALSE} +# clean up after ourselves +unlink(list.files(pattern = "*-hw.md")) +``` + +## String Manipulation Tools -You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to 
accomodate many edge cases... which in the end means you are writing an actual Markdown parser. +You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to accommodate many edge cases... which in the end means you are writing an actual Markdown parser. Not for the faint of heart... neither necessary if you read the section after this one. :relieved: You'd detect heading using for instance `grep("^#", markdown_lines)`[^edge]. @@ -102,32 +149,66 @@ Example of string manipulation tools include base R (`sub()`, `grep()` and frien Although string manipulation tools are of a limited usefulness when parsing Markdown, they can _complement_ the actual parsing tools. Even if using specific Markdown parsing tools will help you write less regular expressions yourself... they won't completely free you from them. -## Parsing tools +## Parsing Tools Parsing tools are fantastic, and numerous. +These translate the Markdown document into a data structure called an [Abstract +Syntax Tree (AST)][AST] that gives you fine-grained control over specific elements of the document (e.g. individual headings or links regardless of how they are written). +With a formal data structure, you can programmatically manipulate the Markdown document by adding, removing, or manipulating pieces of Markdown in a standardized way. We will only mention the ones you can directly use from R. +[AST]: https://en.wikipedia.org/wiki/Abstract_syntax_tree -The [tinkr package](http://docs.ropensci.org/tinkr/) maintained by Zhian Kamvar parses Markdown to XML using Commonmark, and writes it back to Markdown using XSLT. The YAML metadata is available as a string. 
+### Fine-grain Parsing -With Pandoc that we presented in a [tech note last year](blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes), you can parse a Markdown files to a Pandoc Abstract Syntax Tree, or to, say HTML, and then back to Markdown. +Let's say you have created a bunch of tutorials that link to a website containing a gallery of extensions for a popular plotting package. +Let's also say that one day, someone discovers that the link to the website is suddenly [redirecting to a potentially malicious site that is most certainly not related to the grammar of graphics](https://github.com/ggplot2-exts/gallery/issues/112) and you +need to replace all instances of that link to `**redacted**`. Since links in Markdown could be written any number of ways, regex is not going to help you, but a fine-grained Markdown parser will! -The [parsermd package](https://rundel.github.io/parsermd/) maintained by Colin Rundel is "implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. It also includes a collection of high level functions for working with the resulting abstract syntax tree." +A workflow for this situation would be: -The [md4r package](https://rundel.github.io/md4r/), more recent and also maintained by Colin Rundel, is very similar except that it uses the MD4C (Markdown for C) library. +- read in the Markdown AST with your favourite parser +- pull out all links that point to the rotten link +- replace them with emphasized text that says "redacted" +- convert the AST and write back to file -### The impossibility of a perfect roundtrip +The [tinkr package](http://docs.ropensci.org/tinkr/) dreamed up by Maëlle Salmon and maintained by Zhian Kamvar parses Markdown to XML using Commonmark, allows you to extract and manipulate markdown using XPath via the xml2 package. +Tinkr writes the XML back to Markdown using XSLT. +The YAML metadata is available as a string. 
+ +The [md4r package](https://rundel.github.io/md4r/), is a recent experimental package maintained by Colin Rundel, and is an R wrapper around the MD4C (Markdown for C) library and represents the AST as a nested list with attributes in R. +The development version of the package has utilities for constructing Markdown documents programmatically. + +With Pandoc that we presented in a [tech note last year](blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes), you can parse a Markdown files to a Pandoc Abstract Syntax Tree (in JSON format). +Nic Crane has an experimental package called [parseqmd](https://github.com/thisisnic/parseqmd) that uses this strategy, parsing +the output with the jsonlite package. +You can also parse to, say HTML, and then back to Markdown. The benefit of parsing it to HTML is that you can use a package such as rvest to extract and manipulate the elements. + +### High-level Parsing + +If you are only interested in the heading structure of a document and code chunks, where you may or may not want to manipulate the rest of the Markdown, you might benefit from using a high-level parser. +The [parsermd package](https://rundel.github.io/parsermd/) is another package maintained by Colin Rundel is "implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. +It also includes a collection of high level functions for working with the resulting abstract syntax tree." +This package is different from other parsing options mentioned here because, in the words of its author, the aim of the package is "...to capture the fundamental structure of the document and as such we do not attempt to parse every detail of the Rmd." + +This package has functionality for a tidy workflow allowing you to select different sections of the document. 
+One useful feature is that it has the function [`rmd_check_template()`](https://rundel.github.io/parsermd/articles/templates.html) allowing you to compare student markdown submissions against a standard template. +You can watch his [RStudio::conf(2021) talk about it](https://posit.co/resources/videos/parsermd-parsing-r-markdown-for-fun-and-profit/). + +### The Impossibility of a Perfect Roundtrip When parsing and editing Markdown, then writing it back to Markdown, some undesired changes might appear. -For instance, with [tinkr](http://docs.ropensci.org/tinkr/#general-principles-and-solution) list items all start with a `-` even if in the original document they started with a `*`. +For instance, with [tinkr](http://docs.ropensci.org/tinkr/#general-principles-and-solution) list items all start with a `-` even if in the original document they started with a `*`. With md4r, lists that are indented with extra space will be readjusted. Depending on your use case you might want to find ways to mitigate such losses, for instance only re-writing the lines you made intentional edits to. -### How to choose a parser? +### How to Choose a Parser? -You can choose a parser based on what it lets you manipulate the Markdown with: if you prefer XML and HTML to nested lists for instance, you might prefer using tinkr or Pandoc. +You can choose a parser based on what it lets you manipulate the Markdown with: if you prefer XML[^maelle-approved] and HTML to nested lists for instance, you might prefer using tinkr or Pandoc. If the high-level functions of md4r or parsermd are suitable for your use case, you might prefer one of them. +[^maelle-approved]: Both Maëlle and Zhian are _huge_ fans of XML and XPath (see: https://masalmon.eu/2022/04/08/xml-xpath/ and https://zkamvar.netlify.app/blog/gh-task-lists/). + Another important criterion is to choose a parser that's a close to the use case of your Markdown files as possible. 
If you are only going to work with Markdown files for GitHub, commonmark/tinkr is an excellent choice since GitHub itself uses commonmark. Now, your work might encompass different sorts of Markdown files that will be used by different tools. @@ -138,21 +219,24 @@ Furthermore, if you need to parse certain elements, like again Hugo shortcodes, [^caveat]: Or at least it's supposed to :sweat_smile: Thankfully users report edge cases that are not covered yet. -## What about the code chunks +## What about the Code Chunks Programmatically parsing and editing R code is out of the scope of this post, but closely related enough to throw in a few tips. As with Markdown, you might need to use regular expressions but try not to. You can parse the code to XML using base R parsing and [xmlparsedata](https://r-lib.github.io/xmlparsedata/), then you manipulate the XML with [XPath](https://masalmon.eu/2022/04/08/xml-xpath/). To write code back, you can make use of the attributes of each node that indicates the original lines and columns. -So a possible workflow is +So a possible workflow is: - parse the code to XML, use xmlparsedata to inform what to change and where. Out of these steps you'd get a list of elements' positions for instance. - use brio to read the lines, change a few of them with base R tools, then use brio again to write the lines back. -## Examples of Markdown parsing and editing +## Examples of Markdown Parsing and Editing + +The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentries' lessons for structural markdown elements, including valid links, alt-text, and known fenced-divs thanks to tinkr. +This package was instrumental in converting all of The Carpentries lesson infrastructure from Jekyll's markdown syntax to Pandoc's markdown[^transition]. 
-The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentrie's lessons for structural markdown elements, thanks to tinkr. +[^transition]: For examples, see [The Carpentries Workbench Transition Guide](https://carpentries.github.io/workbench/transition-guide.html). The [babeldown package](https://docs.ropensci.org/babeldown/) maintained by MaĆ«lle Salmon transforms Markdown to XML, sends it to DeepL API for translation, and writes the results back to Markdown, also using tinkr. diff --git a/content/blog/2024-04-16-markdown-programmatic/index.md b/content/blog/2024-04-16-markdown-programmatic/index.md index 939c7d6fb..babf531e0 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.md +++ b/content/blog/2024-04-16-markdown-programmatic/index.md @@ -4,6 +4,7 @@ title: All the ways to programmatically edit R Markdown / Quarto documents author: - MaĆ«lle Salmon - Christophe Dervieux + - Zhian N. Kamvar # Set the date below to the publication date of your post date: 2024-04-16 # Minimal tags for a post about a community-contributed package @@ -49,9 +50,14 @@ Some content, with parts in __bold__ or _italic_. Let me add a [link](https://ro ``` -Furthermore there are different _flavors_ of Markdown, and some supplementary features added depending on what your Markdown files will be used by, like emoji written so: `:grin:`. +Furthermore there are different _flavors_ of Markdown[^md-flavours], which add some [extended syntax], like emoji written so: `:grin:`. -Common Markdown consumers R users interact with include: R Markdown (that uses Pandoc under the hood), Quarto (that uses Pandoc under the hood... see any trend here?), GitHub, Hugo. +[^md-flavours]: As of 2024-06-20, there are [76 programs that parse markdown](https://github.com/markdown/markdown.github.com/wiki/Implementations), some with their own unique flavour. 
+ +[extended syntax]: https://www.markdownguide.org/extended-syntax/ + + +Markdown formats that R users will commonly interact with include: R Markdown (uses Pandoc under the hood), Quarto (uses Pandoc under the hood... see any trend here?), GitHub, Hugo (for blogdown or hugodown websites). Many tools using Markdown also accept metadata at the top of Markdown files, either YAML or TOML. Here is an example with YAML: @@ -65,17 +71,19 @@ author: Myself Some content, *nice* content. ``` -Most often R users will write Markdown manually, or with the help of an editor such as RStudio IDE visual editor. -But sometimes, one will have to edit a bunch of Markdown files at once. +Most often R users will write Markdown manually, or with the help of an editor such as the RStudio IDE visual editor. +But sometimes, one will have to create or edit a bunch of Markdown files at once, and editing all those files by hand is a huge waste of time. +This blog post will give you resources in R that you can use to create, parse, and edit markdown documents, so that you can become the markdown wizard you have always dreamed of becoming :mage:! + -## Templating tools +## Templating Tools for Boilerplate Documents Imagine you need to create a bunch of different R Markdown files, for instance for students to use as personalized exercises. In that case, you can create a boilerplate document as a template, and create its different output versions using a templating tool. Templating tools include: -- `knitr::knit_expand()` by Yihui Xie; +- [`knitr::knit_expand()`](https://cran.r-project.org/web/packages/knitr/vignettes/knit_expand.html) by Yihui Xie; - the [whisker package](https://github.com/edwindj/whisker) maintained by Edwin de Jonge (used in for instance pkgddown); - the [brew package](https://github.com/gregfrog/brew) maintained by Greg Hunt; - [Pandoc](/blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/) by John MacFarlane. 
@@ -88,9 +96,61 @@ A common workflow would be: - You read this template in R using for instance the brio package. - Mapping over your set of variables, you render the template using whisker and save each version to a file using the brio package. -## String manipulation tools +### Example + +Here's an example markdown file that we can use as a template: + +````markdown +--- +title: "Homework assignment 1" +author: "{{name}}" +--- + +Create a normal distribution with a mean of {{mean}} and a standard deviation of {{sd}}: + +```{r solution-1} +# hint: use the rnorm function +``` +```` + +Using the workflow above, we can render one personalized assignment per student: + + +``` r +# generate student variables ---- +students <- c("Maëlle", "Christophe", "Zhian") +n <- length(students) +key <- data.frame( + name = students, + mean = rpois(n, 5), + sd = sprintf("%.1f", runif(n)), + file = sprintf("%s-hw.md", students) +) +# render and write assignment from template ---- +make_assignment <- function(key, template) { + lapply(seq_len(nrow(key)), function(i) { + new <- whisker::whisker.render(template, data = key[i, ]) + brio::write_lines(new, key$file[i]) + }) + return(invisible()) +} +md <- brio::read_lines("hw-template.md") +make_assignment(key, template = md) +print(key) +``` + +``` +## name mean sd file +## 1 Maëlle 5 0.2 Maëlle-hw.md +## 2 Christophe 6 0.1 Christophe-hw.md +## 3 Zhian 5 0.7 Zhian-hw.md +``` + + -You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to accomodate many edge cases... which in the end means you are writing an actual Markdown parser. +## String Manipulation Tools + +You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to accommodate many edge cases... which in the end means you are writing an actual Markdown parser.
neither necessary if you read the section after this one. :relieved: You'd detect heading using for instance `grep("^#", markdown_lines)`[^edge]. @@ -102,32 +162,66 @@ Example of string manipulation tools include base R (`sub()`, `grep()` and frien Although string manipulation tools are of a limited usefulness when parsing Markdown, they can _complement_ the actual parsing tools. Even if using specific Markdown parsing tools will help you write less regular expressions yourself... they won't completely free you from them. -## Parsing tools +## Parsing Tools Parsing tools are fantastic, and numerous. +These translate the Markdown document into a data structure called an [Abstract +Syntax Tree (AST)][AST] that gives you fine-grained control over specific elements of the document (e.g. individual headings or links regardless of how they are written). +With a formal data structure, you can programmatically manipulate the Markdown document by adding, removing, or manipulating pieces of Markdown in a standardized way. We will only mention the ones you can directly use from R. +[AST]: https://en.wikipedia.org/wiki/Abstract_syntax_tree + +### Fine-grain Parsing + +Let's say you have created a bunch of tutorials that link to a website containing a gallery of extensions for a popular plotting package. +Let's also say that one day, someone discovers that the link to the website is suddenly [redirecting to a potentially malicious site that is most certainly not related to the grammar of graphics](https://github.com/ggplot2-exts/gallery/issues/112) and you +need to replace all instances of that link to `**redacted**`. Since links in Markdown could be written any number of ways, regex is not going to help you, but a fine-grained Markdown parser will! -The [tinkr package](http://docs.ropensci.org/tinkr/) maintained by Zhian Kamvar parses Markdown to XML using Commonmark, and writes it back to Markdown using XSLT. The YAML metadata is available as a string. 
+A workflow for this situation would be: -With Pandoc that we presented in a [tech note last year](blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes), you can parse a Markdown files to a Pandoc Abstract Syntax Tree, or to, say HTML, and then back to Markdown. +- read in the Markdown AST with your favourite parser +- pull out all links that point to the rotten link +- replace them with emphasized text that says "redacted" +- convert the AST and write back to file -The [parsermd package](https://rundel.github.io/parsermd/) maintained by Colin Rundel is "implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. It also includes a collection of high level functions for working with the resulting abstract syntax tree." +The [tinkr package](http://docs.ropensci.org/tinkr/) dreamed up by Maëlle Salmon and maintained by Zhian Kamvar parses Markdown to XML using Commonmark, allows you to extract and manipulate markdown using XPath via the xml2 package. +Tinkr writes the XML back to Markdown using XSLT. +The YAML metadata is available as a string. -The [md4r package](https://rundel.github.io/md4r/), more recent and also maintained by Colin Rundel, is very similar except that it uses the MD4C (Markdown for C) library. +The [md4r package](https://rundel.github.io/md4r/) is a recent experimental package maintained by Colin Rundel. It is an R wrapper around the MD4C (Markdown for C) library and represents the AST as a nested list with attributes in R. +The development version of the package has utilities for constructing Markdown documents programmatically. -### The impossibility of a perfect roundtrip +With Pandoc, which we presented in a [tech note last year](/blog/2023/06/01/troubleshooting-pandoc-problems-as-an-r-user/#raw-attributes), you can parse a Markdown file to a Pandoc Abstract Syntax Tree (in JSON format).
+Nic Crane has an experimental package called [parseqmd](https://github.com/thisisnic/parseqmd) that uses this strategy, parsing +the output with the jsonlite package. +You can also parse to, say, HTML, and then back to Markdown. The benefit of parsing it to HTML is that you can use a package such as rvest to extract and manipulate the elements. + +### High-level Parsing + +If you are mostly interested in the heading structure and code chunks of a document, and may or may not want to manipulate the rest of the Markdown, you might benefit from using a high-level parser. +The [parsermd package](https://rundel.github.io/parsermd/), also maintained by Colin Rundel, is an "implementation of a formal grammar and parser for R Markdown documents using the Boost Spirit X3 library. +It also includes a collection of high level functions for working with the resulting abstract syntax tree." +This package is different from other parsing options mentioned here because, in the words of its author, the aim of the package is "...to capture the fundamental structure of the document and as such we do not attempt to parse every detail of the Rmd." + +This package has functionality for a tidy workflow allowing you to select different sections of the document. +One useful feature is that it has the function [`rmd_check_template()`](https://rundel.github.io/parsermd/articles/templates.html) allowing you to compare student markdown submissions against a standard template. +You can watch his [RStudio::conf(2021) talk about it](https://posit.co/resources/videos/parsermd-parsing-r-markdown-for-fun-and-profit/). + +### The Impossibility of a Perfect Roundtrip When parsing and editing Markdown, then writing it back to Markdown, some undesired changes might appear. -For instance, with [tinkr](http://docs.ropensci.org/tinkr/#general-principles-and-solution) list items all start with a `-` even if in the original document they started with a `*`.
+For instance, with [tinkr](http://docs.ropensci.org/tinkr/#general-principles-and-solution) list items all start with a `-` even if in the original document they started with a `*`. With md4r, lists that are indented with extra space will be readjusted. Depending on your use case you might want to find ways to mitigate such losses, for instance only re-writing the lines you made intentional edits to. -### How to choose a parser? +### How to Choose a Parser? -You can choose a parser based on what it lets you manipulate the Markdown with: if you prefer XML and HTML to nested lists for instance, you might prefer using tinkr or Pandoc. +You can choose a parser based on what it lets you manipulate the Markdown with: if you prefer XML[^maelle-approved] and HTML to nested lists for instance, you might prefer using tinkr or Pandoc. If the high-level functions of md4r or parsermd are suitable for your use case, you might prefer one of them. +[^maelle-approved]: Both MaĆ«lle and Zhian are _huge_ fans of XML and XPath (see: https://masalmon.eu/2022/04/08/xml-xpath/ and https://zkamvar.netlify.app/blog/gh-task-lists/). + Another important criterion is to choose a parser that's a close to the use case of your Markdown files as possible. If you are only going to work with Markdown files for GitHub, commonmark/tinkr is an excellent choice since GitHub itself uses commonmark. Now, your work might encompass different sorts of Markdown files that will be used by different tools. @@ -138,21 +232,24 @@ Furthermore, if you need to parse certain elements, like again Hugo shortcodes, [^caveat]: Or at least it's supposed to :sweat_smile: Thankfully users report edge cases that are not covered yet. -## What about the code chunks +## What about the Code Chunks Programmatically parsing and editing R code is out of the scope of this post, but closely related enough to throw in a few tips. As with Markdown, you might need to use regular expressions but try not to. 
You can parse the code to XML using base R parsing and [xmlparsedata](https://r-lib.github.io/xmlparsedata/), then you manipulate the XML with [XPath](https://masalmon.eu/2022/04/08/xml-xpath/). To write code back, you can make use of the attributes of each node that indicates the original lines and columns. -So a possible workflow is +So a possible workflow is: - parse the code to XML, use xmlparsedata to inform what to change and where. Out of these steps you'd get a list of elements' positions for instance. - use brio to read the lines, change a few of them with base R tools, then use brio again to write the lines back. -## Examples of Markdown parsing and editing +## Examples of Markdown Parsing and Editing + +The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentries' lessons for structural markdown elements, including valid links, alt-text, and known fenced-divs thanks to tinkr. +This package was instrumental in converting all of The Carpentries lesson infrastructure from Jekyll's markdown syntax to Pandoc's markdown[^transition]. -The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentrie's lessons for structural markdown elements, thanks to tinkr. +[^transition]: For examples, see [The Carpentries Workbench Transition Guide](https://carpentries.github.io/workbench/transition-guide.html). The [babeldown package](https://docs.ropensci.org/babeldown/) maintained by MaĆ«lle Salmon transforms Markdown to XML, sends it to DeepL API for translation, and writes the results back to Markdown, also using tinkr. 
From dfcb6870b77b272eb1b08ddaa396bd04513f88e8 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ma=C3=ABlle=20Salmon?= Date: Tue, 25 Jun 2024 12:07:06 +0200 Subject: [PATCH 3/6] fix case :nail_care: --- .../index.Rmd | 16 +++++++------- .../2024-04-16-markdown-programmatic/index.md | 22 +++++++++---------- 2 files changed, 19 insertions(+), 19 deletions(-) diff --git a/content/blog/2024-04-16-markdown-programmatic/index.Rmd b/content/blog/2024-04-16-markdown-programmatic/index.Rmd index ebf0a36fa..98d83594c 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.Rmd +++ b/content/blog/2024-04-16-markdown-programmatic/index.Rmd @@ -1,5 +1,5 @@ --- -slug: "markdown-programmatic-parsing.edits" +slug: "markdown-programmatic-parsing" title: All the ways to programmatically edit R Markdown / Quarto documents author: - MaĆ«lle Salmon @@ -52,7 +52,7 @@ Some content, with parts in __bold__ or _italic_. Let me add a [link](https://ro Furthermore there are different _flavors_ of Markdown[^md-flavours], which add some [extended syntax], like emoji written so: `:grin:`. -[^md-flavours]: As of 2024-06-20, there are [76 programs that parse markdown](https://github.com/markdown/markdown.github.com/wiki/Implementations), some with their own unique flavour. +[^md-flavours]: As of 2024-06-20, there are [76 programs that parse Markdown](https://github.com/markdown/markdown.github.com/wiki/Implementations), some with their own unique flavour. [extended syntax]: https://www.markdownguide.org/extended-syntax/ @@ -73,7 +73,7 @@ Some content, *nice* content. Most often R users will write Markdown manually, or with the help of an editor such as the RStudio IDE visual editor. But sometimes, one will have to create or edit a bunch of Markdown files at once, and editing all those files by hand is a huge waste of time. 
-This blog post will give you resources in R that you can use to create, parse, and edit markdown documents, so that you can become the markdown wizard you have always dreamed of becoming :mage:! +This blog post will give you resources in R that you can use to create, parse, and edit Markdown documents, so that you can become the Markdown wizard you have always dreamed of becoming :mage:! ## Templating Tools for Boilerplate Documents @@ -98,7 +98,7 @@ A common workflow would be: ### Example -Here's an example markdown file that we can use as a template: +Here's an example Markdown file that we can use as a template: ````{r show-markdown, echo = FALSE, warn = FALSE, message = FALSE, results = 'asis', comment = ""} md <- readLines("hw-template.md") @@ -172,7 +172,7 @@ A workflow for this situation would be: - replace them with emphasized text that says "redacted" - convert the AST and write back to file -The [tinkr package](http://docs.ropensci.org/tinkr/) dreamed up by MaĆ«lle Salmon and maintained by Zhian Kamvar parses Markdown to XML using Commonmark, allows you to extract and manipulate markdown using XPath via the xml2 package. +The [tinkr package](http://docs.ropensci.org/tinkr/) dreamed up by MaĆ«lle Salmon and maintained by Zhian Kamvar parses Markdown to XML using Commonmark, allows you to extract and manipulate Markdown using XPath via the xml2 package. Tinkr writes the XML back to Markdown using XSLT. The YAML metadata is available as a string. @@ -192,7 +192,7 @@ It also includes a collection of high level functions for working with the resul This package is different from other parsing options mentioned here because, in the words of its author, the aim of the package is "...to capture the fundamental structure of the document and as such we do not attempt to parse every detail of the Rmd." This package has functionality for a tidy workflow allowing you to select different sections of the document. 
-One useful feature is that it has the function [`rmd_check_template()`](https://rundel.github.io/parsermd/articles/templates.html) allowing you to compare student markdown submissions against a standard template. +One useful feature is that it has the function [`rmd_check_template()`](https://rundel.github.io/parsermd/articles/templates.html) allowing you to compare student Markdown submissions against a standard template. You can watch his [RStudio::conf(2021) talk about it](https://posit.co/resources/videos/parsermd-parsing-r-markdown-for-fun-and-profit/). ### The Impossibility of a Perfect Roundtrip @@ -233,8 +233,8 @@ So a possible workflow is: ## Examples of Markdown Parsing and Editing -The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentries' lessons for structural markdown elements, including valid links, alt-text, and known fenced-divs thanks to tinkr. -This package was instrumental in converting all of The Carpentries lesson infrastructure from Jekyll's markdown syntax to Pandoc's markdown[^transition]. +The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentries' lessons for structural Markdown elements, including valid links, alt-text, and known fenced-divs thanks to tinkr. +This package was instrumental in converting all of The Carpentries lesson infrastructure from Jekyll's Markdown syntax to Pandoc's Markdown[^transition]. [^transition]: For examples, see [The Carpentries Workbench Transition Guide](https://carpentries.github.io/workbench/transition-guide.html). 
diff --git a/content/blog/2024-04-16-markdown-programmatic/index.md b/content/blog/2024-04-16-markdown-programmatic/index.md index babf531e0..b67c07f00 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.md +++ b/content/blog/2024-04-16-markdown-programmatic/index.md @@ -1,5 +1,5 @@ --- -slug: "markdown-programmatic-parsing.edits" +slug: "markdown-programmatic-parsing" title: All the ways to programmatically edit R Markdown / Quarto documents author: - MaĆ«lle Salmon @@ -52,7 +52,7 @@ Some content, with parts in __bold__ or _italic_. Let me add a [link](https://ro Furthermore there are different _flavors_ of Markdown[^md-flavours], which add some [extended syntax], like emoji written so: `:grin:`. -[^md-flavours]: As of 2024-06-20, there are [76 programs that parse markdown](https://github.com/markdown/markdown.github.com/wiki/Implementations), some with their own unique flavour. +[^md-flavours]: As of 2024-06-20, there are [76 programs that parse Markdown](https://github.com/markdown/markdown.github.com/wiki/Implementations), some with their own unique flavour. [extended syntax]: https://www.markdownguide.org/extended-syntax/ @@ -73,7 +73,7 @@ Some content, *nice* content. Most often R users will write Markdown manually, or with the help of an editor such as the RStudio IDE visual editor. But sometimes, one will have to create or edit a bunch of Markdown files at once, and editing all those files by hand is a huge waste of time. -This blog post will give you resources in R that you can use to create, parse, and edit markdown documents, so that you can become the markdown wizard you have always dreamed of becoming :mage:! +This blog post will give you resources in R that you can use to create, parse, and edit Markdown documents, so that you can become the Markdown wizard you have always dreamed of becoming :mage:! 
## Templating Tools for Boilerplate Documents @@ -98,7 +98,7 @@ A common workflow would be: ### Example -Here's an example markdown file that we can use as a template: +Here's an example Markdown file that we can use as a template: ````markdown --- @@ -141,9 +141,9 @@ print(key) ``` ## name mean sd file -## 1 MaĆ«lle 5 0.2 MaĆ«lle-hw.md -## 2 Christophe 6 0.1 Christophe-hw.md -## 3 Zhian 5 0.7 Zhian-hw.md +## 1 MaĆ«lle 8 0.6 MaĆ«lle-hw.md +## 2 Christophe 6 0.2 Christophe-hw.md +## 3 Zhian 3 0.7 Zhian-hw.md ``` @@ -185,7 +185,7 @@ A workflow for this situation would be: - replace them with emphasized text that says "redacted" - convert the AST and write back to file -The [tinkr package](http://docs.ropensci.org/tinkr/) dreamed up by MaĆ«lle Salmon and maintained by Zhian Kamvar parses Markdown to XML using Commonmark, allows you to extract and manipulate markdown using XPath via the xml2 package. +The [tinkr package](http://docs.ropensci.org/tinkr/) dreamed up by MaĆ«lle Salmon and maintained by Zhian Kamvar parses Markdown to XML using Commonmark, allows you to extract and manipulate Markdown using XPath via the xml2 package. Tinkr writes the XML back to Markdown using XSLT. The YAML metadata is available as a string. @@ -205,7 +205,7 @@ It also includes a collection of high level functions for working with the resul This package is different from other parsing options mentioned here because, in the words of its author, the aim of the package is "...to capture the fundamental structure of the document and as such we do not attempt to parse every detail of the Rmd." This package has functionality for a tidy workflow allowing you to select different sections of the document. -One useful feature is that it has the function [`rmd_check_template()`](https://rundel.github.io/parsermd/articles/templates.html) allowing you to compare student markdown submissions against a standard template. 
+One useful feature is that it has the function [`rmd_check_template()`](https://rundel.github.io/parsermd/articles/templates.html) allowing you to compare student Markdown submissions against a standard template. You can watch his [RStudio::conf(2021) talk about it](https://posit.co/resources/videos/parsermd-parsing-r-markdown-for-fun-and-profit/). ### The Impossibility of a Perfect Roundtrip @@ -246,8 +246,8 @@ So a possible workflow is: ## Examples of Markdown Parsing and Editing -The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentries' lessons for structural markdown elements, including valid links, alt-text, and known fenced-divs thanks to tinkr. -This package was instrumental in converting all of The Carpentries lesson infrastructure from Jekyll's markdown syntax to Pandoc's markdown[^transition]. +The [pegboard package](https://carpentries.github.io/pegboard/) maintained by Zhian Kamvar, parses and validates Carpentries' lessons for structural Markdown elements, including valid links, alt-text, and known fenced-divs thanks to tinkr. +This package was instrumental in converting all of The Carpentries lesson infrastructure from Jekyll's Markdown syntax to Pandoc's Markdown[^transition]. [^transition]: For examples, see [The Carpentries Workbench Transition Guide](https://carpentries.github.io/workbench/transition-guide.html). 
From c98e8b1e323434ca6ab9c3075d389ba516ec896f Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ma=C3=ABlle=20Salmon?= Date: Tue, 25 Jun 2024 12:14:33 +0200 Subject: [PATCH 4/6] add @zkamvar's homework in full --- .../index.Rmd | 7 +++++++ .../2024-04-16-markdown-programmatic/index.md | 21 ++++++++++++++++--- 2 files changed, 25 insertions(+), 3 deletions(-) diff --git a/content/blog/2024-04-16-markdown-programmatic/index.Rmd b/content/blog/2024-04-16-markdown-programmatic/index.Rmd index 98d83594c..2cb4c5f15 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.Rmd +++ b/content/blog/2024-04-16-markdown-programmatic/index.Rmd @@ -130,6 +130,13 @@ make_assignment(key, template = md) print(key) ``` +Here's how Zhian's homework looks like: + +````{r show-zhian-markdown, echo = FALSE, warn = FALSE, message = FALSE, results = 'asis', comment = ""} +md <- readLines("Zhian-hw.md") +writeLines(c("````markdown", md, "````"), con = stdout()) +```` + ```{r, echo = FALSE} # clean up after ourselves unlink(list.files(pattern = "*-hw.md")) diff --git a/content/blog/2024-04-16-markdown-programmatic/index.md b/content/blog/2024-04-16-markdown-programmatic/index.md index b67c07f00..65479badf 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.md +++ b/content/blog/2024-04-16-markdown-programmatic/index.md @@ -141,11 +141,26 @@ print(key) ``` ## name mean sd file -## 1 MaĆ«lle 8 0.6 MaĆ«lle-hw.md -## 2 Christophe 6 0.2 Christophe-hw.md -## 3 Zhian 3 0.7 Zhian-hw.md +## 1 MaĆ«lle 5 0.4 MaĆ«lle-hw.md +## 2 Christophe 8 0.7 Christophe-hw.md +## 3 Zhian 7 0.1 Zhian-hw.md ``` +Here's how Zhian's homework looks like: + +````markdown +--- +title: "Homework assignment 1" +author: "Zhian" +--- + +Create a normal distribution with a mean of 7 and a standard deviation of 0.1: + +```{r solution-1} +# hint: use the rnorm function +``` +```` + ## String Manipulation Tools From 68e2ea4381dcbbd43ece9efb142a4bad9979a997 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ma=C3=ABlle=20Salmon?= 
Date: Tue, 25 Jun 2024 12:15:07 +0200 Subject: [PATCH 5/6] add missing s --- .../blog/2024-04-16-markdown-programmatic/index.Rmd | 2 +- content/blog/2024-04-16-markdown-programmatic/index.md | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/content/blog/2024-04-16-markdown-programmatic/index.Rmd b/content/blog/2024-04-16-markdown-programmatic/index.Rmd index 2cb4c5f15..4eab8a24b 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.Rmd +++ b/content/blog/2024-04-16-markdown-programmatic/index.Rmd @@ -147,7 +147,7 @@ unlink(list.files(pattern = "*-hw.md")) You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to accommodate many edge cases... which in the end means you are writing an actual Markdown parser. Not for the faint of heart... neither necessary if you read the section after this one. :relieved: -You'd detect heading using for instance `grep("^#", markdown_lines)`[^edge]. +You'd detect headings using for instance `grep("^#", markdown_lines)`[^edge]. [^edge]: But this would also detect code comments! Don't do this! 
diff --git a/content/blog/2024-04-16-markdown-programmatic/index.md b/content/blog/2024-04-16-markdown-programmatic/index.md index 65479badf..01a2a6007 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.md +++ b/content/blog/2024-04-16-markdown-programmatic/index.md @@ -141,9 +141,9 @@ print(key) ``` ## name mean sd file -## 1 MaĆ«lle 5 0.4 MaĆ«lle-hw.md -## 2 Christophe 8 0.7 Christophe-hw.md -## 3 Zhian 7 0.1 Zhian-hw.md +## 1 MaĆ«lle 3 1.0 MaĆ«lle-hw.md +## 2 Christophe 5 0.1 Christophe-hw.md +## 3 Zhian 3 0.8 Zhian-hw.md ``` Here's how Zhian's homework looks like: @@ -154,7 +154,7 @@ title: "Homework assignment 1" author: "Zhian" --- -Create a normal distribution with a mean of 7 and a standard deviation of 0.1: +Create a normal distribution with a mean of 3 and a standard deviation of 0.8: ```{r solution-1} # hint: use the rnorm function @@ -168,7 +168,7 @@ Create a normal distribution with a mean of 7 and a standard deviation of 0.1: You can use string manipulation tools to parse Markdown if you are sure of the Markdown variants your code will get as input, or if you are willing to grow your codebase to accommodate many edge cases... which in the end means you are writing an actual Markdown parser. Not for the faint of heart... neither necessary if you read the section after this one. :relieved: -You'd detect heading using for instance `grep("^#", markdown_lines)`[^edge]. +You'd detect headings using for instance `grep("^#", markdown_lines)`[^edge]. [^edge]: But this would also detect code comments! Don't do this! 
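The footnote's warning about `grep("^#", markdown_lines)` is easy to demonstrate. Below is a small base R sketch that skips lines inside fenced code blocks before looking for ATX headings (`# Title`). The helper name `find_atx_headings()` is hypothetical and does not come from any package. It deliberately handles only the simplest cases. Setext headings, indented code blocks, and nested fences of different lengths (such as a four-backtick fence wrapping a three-backtick chunk) are not supported. That is exactly the kind of edge-case creep that ends with you writing a Markdown parser.

```r
# Sketch only: detect ATX headings while ignoring lines inside fenced code.
# Assumptions: fences are ``` or ~~~ with at most 3 leading spaces and are
# never nested; setext headings and indented code blocks are ignored.
find_atx_headings <- function(lines) {
  # A fence line toggles between "outside" and "inside" a code block
  is_fence <- grepl("^ {0,3}(```|~~~)", lines)
  inside <- (cumsum(is_fence) %% 2 == 1) & !is_fence
  # ATX heading: 1 to 6 '#' followed by a space or the end of the line
  is_heading <- grepl("^ {0,3}#{1,6}( .*)?$", lines)
  which(is_heading & !inside)
}

md_lines <- c(
  "# My first header",
  "",
  "```r",
  "# not a heading, just an R comment",
  "x <- 1",
  "```",
  "## A second header"
)

grep("^#", md_lines)         # 1 4 7: the R comment on line 4 is a false positive
find_atx_headings(md_lines)  # 1 7
```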
From f0ea6945225b7462e1152bfb0b24c1b85b81da17 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ma=C3=ABlle=20Salmon?= Date: Tue, 25 Jun 2024 12:21:51 +0200 Subject: [PATCH 6/6] add link --- .../blog/2024-04-16-markdown-programmatic/index.Rmd | 2 +- content/blog/2024-04-16-markdown-programmatic/index.md | 10 +++++----- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/content/blog/2024-04-16-markdown-programmatic/index.Rmd b/content/blog/2024-04-16-markdown-programmatic/index.Rmd index 4eab8a24b..046bf6782 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.Rmd +++ b/content/blog/2024-04-16-markdown-programmatic/index.Rmd @@ -233,7 +233,7 @@ As with Markdown, you might need to use regular expressions but try not to. You can parse the code to XML using base R parsing and [xmlparsedata](https://r-lib.github.io/xmlparsedata/), then you manipulate the XML with [XPath](https://masalmon.eu/2022/04/08/xml-xpath/). To write code back, you can make use of the attributes of each node that indicates the original lines and columns. -So a possible workflow is: +So a possible workflow, as exemplified in a [blog post](https://masalmon.eu/2024/05/15/refactoring-xml/) is: - parse the code to XML, use xmlparsedata to inform what to change and where. Out of these steps you'd get a list of elements' positions for instance. - use brio to read the lines, change a few of them with base R tools, then use brio again to write the lines back. 
diff --git a/content/blog/2024-04-16-markdown-programmatic/index.md b/content/blog/2024-04-16-markdown-programmatic/index.md index 01a2a6007..30aeca8c2 100644 --- a/content/blog/2024-04-16-markdown-programmatic/index.md +++ b/content/blog/2024-04-16-markdown-programmatic/index.md @@ -141,9 +141,9 @@ print(key) ``` ## name mean sd file -## 1 Maëlle 3 1.0 Maëlle-hw.md -## 2 Christophe 5 0.1 Christophe-hw.md -## 3 Zhian 3 0.8 Zhian-hw.md +## 1 Maëlle 5 1.0 Maëlle-hw.md +## 2 Christophe 1 0.2 Christophe-hw.md +## 3 Zhian 1 0.5 Zhian-hw.md ``` Here's how Zhian's homework looks like: @@ -154,7 +154,7 @@ title: "Homework assignment 1" author: "Zhian" --- -Create a normal distribution with a mean of 3 and a standard deviation of 0.8: +Create a normal distribution with a mean of 1 and a standard deviation of 0.5: ```{r solution-1} # hint: use the rnorm function @@ -254,7 +254,7 @@ As with Markdown, you might need to use regular expressions but try not to. You can parse the code to XML using base R parsing and [xmlparsedata](https://r-lib.github.io/xmlparsedata/), then you manipulate the XML with [XPath](https://masalmon.eu/2022/04/08/xml-xpath/). To write code back, you can make use of the attributes of each node that indicates the original lines and columns. -So a possible workflow is: +So a possible workflow, as exemplified in a [blog post](https://masalmon.eu/2024/05/15/refactoring-xml/) is: - parse the code to XML, use xmlparsedata to inform what to change and where. Out of these steps you'd get a list of elements' positions for instance. - use brio to read the lines, change a few of them with base R tools, then use brio again to write the lines back.