Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The SAW Documentation Revolution (Part 1) #2202

Open
wants to merge 71 commits into
base: master
Choose a base branch
from

Conversation

ChrisEPhifer
Copy link
Member

@ChrisEPhifer ChrisEPhifer commented Jan 28, 2025

Overview

This is a pretty large PR, but is focused entirely on the contents and structure of doc.

TL;DR for reviewers: The following commits/commit ranges correspond to relatively high-level changes to SAW's documentation structure:

  • 51c669e
    Moves 'internal' documentation out of the way. A temporary measure modified later to include this documentation in the overall hierarchy.
  • dc42594 to e050f28
    Initial Markdown normalization and cleanup of doc/'s stale files.
  • 0df6156 to e98eea5
    Refactors the LLVM/Java tutorial into multiple Markdown sources, so maintenance is easier.
  • 9a72549 to ba8c180
    Similar to the above, but for the Rust tutorial
  • 81a6250 to a416bc9
    Ditto for the SAW User Manual.
  • dba1454 to c4444d1
    The tutorials/manual are given more memorable names.
  • c3eed8f to c318c83
    Rename 'internal' to 'development', and structure these documents to be included in the overall hierarchy.
  • a235e05
    Adds all of the necessary index.md files to tie the documentation together.
  • f808628
    Implements a pygments lexer for SAWScript (see full commit message for details - this can be removed eventually).
  • 71a2d4a
    Prepares doc/ for a Sphinx installation.
  • b7e0232 and 954ab67
    Add helper scripts to doc/ to build the documentation.
  • a15e5b7
    Add newly-rendered PDFs of existing resources.
  • 4266590
    Document the documents :)

TODOs not covered in this PR

  • Resolving pointers to doc/ that exist elsewhere in saw-script as appropriate

  • Moving other existing docs to doc/ (see the README for details)

  • Deploying documentation to the GitHub pages site for saw-script in CI

    The Cryptol setup to get this working with multiple versions is... complicated.
    I think we can adapt this SO answer for our purposes - it's far simpler than what Cryptol does currently, and should probably be considered as a solution in that context as well.

  • Replacing in-repo PDFs with appropriate CI/Action artifacts

Related issues

A few of the Markdown files that sat in doc/ are not intended for
public consumption (i.e. deployment to HTML/PDF versions of
documentation).
These include:

- doc/extcore.md -> doc/internal/extcore.md
  This is documentation for the SAWCore dump format, which
  is (hopefully) not something typical users are having to mess with.
- doc/limitations.md -> doc/internal/limitations.md
  A five-year-old attempt at documenting the limitations of SAW's use
  for verification.
  It should be updated or scrapped.
- doc/RELEASING.md -> doc/internal/releasing.md
  A (partially?) outdated documentation of SAW's release process.
  This needs some updating as well.
None of the files in doc/manual/code were referenced or used anywhere
in doc/manual, and at least one of these files made reference to
something completely nonexistent (see #2185).
One was ill-formed (links based on headers are written with a '#'
followed by a normalization of the header text), and another was
completely non-existent / referred to a section that had been renamed
at some point in the past.
This document is (in my opinion) fundamentally ill-formed, as there
are multiple level-1 headings.
Before fixing that, however, we might as well use the more-correct
incorrect thing.
I'm not entirely sure where the syntax `{.c}`/`{.rs}`/etc. came from,
but it was not recognized by the Sphinx Markdown engine while testing
locally, so I've used the conventional syntax everywhere instead.

This change doesn't break the existing PDF builds, either, so perhaps
it has to do with how the old website pages were built?
This temporarily makes this Markdown violate the "thou shalt only have
one <h1>" rule, however: Logically, we want this part of the SAW
documentation to be split into chapters, similar to the LLVM/Java
tutorial.
Before, this part of the documentation rendered to PDF (via Sphinx) as
one giant chapter.
Specifically, to 'Introduction'. Consistency is good!
Not in that order :)

In an effort to start standardizing the Markdown (e.g. removing
multiple top-level headers), this commit primarily restructures the
LLVM/Java tutorial into multiple source files.

This additionally makes individual parts of the tutorial easier to
maintain.
A similar change will be implemented for the manual and Rust tutorial
for the same reason.

This restructuring naturally broke the original PDF generation;
fortunately, the eventual introduction of Sphinx will give us both
HTML and PDF generation for free, and this generation strategy is no
longer necessary to maintain.

Note that this commit does _not_ perform any changes to the content of
the tutorial or indeed any other structural property of the Markdown.
Additional changes (e.g. standardizing the Markdown syntax used) will
be the subject of future commits.
This is part of a larger effort to standardize the Markdown sources
for SAW's various pieces of documentation.
Namely: Fenced with ``` ... ```, no extra space between the fence
marker and the language specifier, and _always_ labeled with a
language specifier.
Because we are using Sphinx/MyST, there is no need to use the
docode.hs script to insert code fragments, as we have the
literalinclude directive.

tutorial/using-smt-lib-solvers: Repair broken code references.

tutorial/compositional-proofs.md: Repair broken code references.

tutorial/interactive-interpreter: Repair broken code references.

tutorial/other-examples: Repair broken code references.

tutorial: Remove docode.hs.
And delete the PNG version, as the PDF will suffice to add the Galois
logo to PDF documentation.
A much more descriptive name :)
This prevents rendering issues, and is more consistent with the
directive mechanism.
It also allows for the use of 'admonitions', which may be helpful in
improving the documentation.
- Mismatched inline code elements
- Code blocks without whitespace in between
- Incorrect link (introduced by splitting up rust-tutorial.md)
- ::: fences
- {literalinclude} with :lines: for code/ sources (docode.hs defunct)
- {code-block} <lang> for everything else
- Invalid whitespace around headings / code blocks
- Bare URLs
- Incorrect emphasis notation (** instead of __)
- Invalid links to other pages/sections
- Code blocks (::: fences)
- __ for emphasis instead of **
- '-' instead of '*' for unordered lists
- Unordered -> Ordered (and vice versa) where appropriate
- Fix broken link(s)
For consistency with the actual title of this resource.
Analogous to dba1454 - more consistent with the title of the book
whose sources are contained within.
These documents may not be intended for most users, but they ought to
be available to read in a nicer format than Markdown files in the
repository.
Since we control which documents are rendered to PDF (i.e. when we
`make latexpdf`) explicitly, these developer documents will only be
rendered as part of the HTML (that will eventually be our GitHub Pages
site for saw-script).

This change reflects that these are documents related to the
development of SAW (rather than generically 'internal').
Additional restructuring will follow in additional commits.
This maintains the original internal linking of the affected
documents while using  explicit _reference labels_ rather than links
based on document names / headers.

In addition to removing dependence on references generated from
headers, which are subject to change during content editing (thereby
breaking links), reference labels allow all documents (eventually
managed by Sphinx) to straightforwardly cross-reference one another,
and are one of the documentation engine's most compelling features.

There is, however, a catch: For Sphinx to correctly render
cross-referenced documents, reference labels must be unique _to the
entirety of the documentation_.
In other words, the SAW User Manual and Rust Verification with SAW
tutorial could not both define a reference label `my-reference`.

There is no local scoping mechanism for reference labels, so there is
necessarily a tradeoff between creating a highly-connected, wiki-like
documentation (where items are given reference labels more often than
not), and having a massive namespace of reference labels to manage.
This is something to consider for the future, but certainly not our
biggest concern regarding documentation at the moment.
And normalize the Markdown formatting in the process.

This is analogous to earlier commits relating to the manual and
tutorials (but simpler, since there was no existing infrastructure
around this document).
And by 'fix', I mean: Add a document containing a verbatim rendering
of the 'real' `CHANGES.md` at the SAWScript repository root, and link
to that.

An alternative that was considered that would have resulted in the
'real' `CHANGES.md` being rendered directly was the `{include}`
directive.
Not only is it the case that "the include directive represents a
potential security hole"[^1], Sphinx will attempt (and fail) to
resolve the links within, which will fail fantastically since most of
them are outside of the doc/ root.

[^1]:
https://docutils.sourceforge.io/docs/ref/rst/directives.html#include
It will be a convenient place to link to from the index of the SAWCore
external format description in the developer materials.
To clarify that `make_docs.sh` handles setting up the Python environment, this
commit adds a link to the section of the README about the `setup_env.sh` script.

There is also a reiteration of the fact that a Python installation >=3.9 is
required.
Since we're still working on things, there's no good reason to upstream this to
`pygments`, and thusly no good reason to mention it in the README.

Additionally, using Markdown's absurdity, I reworded the description of
`saw-lexer` to properly refer to how code blocks need to be labeled for the
SAWScript lexer to be applied for highlighting (the absurdity being the need for
_four_ backticks on either side, in order to render three backticks within in
the code block).
So that it's clearer that the exclusion of the index document means exclusion of
its title, and that titles for PDFs are configured when adding a new entry to
the list of PDFs to generate.
First, my comment was inaccurate -- there is a big table of builtins in the
interpreter (thanks @sauclovian-g for pointing this out).
As he rightfully noted, one day we will like to have these / their docstrings
generate an appendix for the user manual.

Second, other than looking a bit fancier in the render, there's not too much
reason to give special treatment to the builtin functions and how they render;
so, since we already plan to generate documentation of the builtins eventually
anyway, we can let them be lexed as regular identifiers (and in the future add
references to the appendix).
We're going to hide developer docs from Sphinx anyway, but this essentially puts
things back to how they were before this PR (with the addition of an extra '..'
to get to the right place.

Essentially, this change reflects our desire for doc/developer/releasing.md to
be readable on GH, and properly link to the change log at the repository root.
As originally planned, this commit hides the development/ directory from the
Sphinx build, and removes all links to it from the document tree.

See #2202 (comment) for
relevant conversation on the matter of publishing developer documentation.
TL;DR: For now, there is nothing we want to publish (extcore is not something
intended for broader use than loading/storing SAW Terms, the release process is
only relevant to developers, and the limitations document is old / needs review
before being added to user-facing documentation, at which time it would likely
be best added to the manual instead of as a separate resource).
And, in the doc/README, move the description of the directory to a more
appropriate section of the file.
And, notably, we don't need bash, so use /bin/sh instead.

This allows the script to be run from anywhere, as long as the requirements.txt
file stays adjacent to it in the filesystem.

This is documented in doc/README.
And move setup_env.sh, make_docs.sh, and requirements.txt, which cleans up the
doc/ root significantly.

Includes updated documentation of the file structure in doc/README.
The use of `make` is ubiquitous to Sphinx builds, and it was rather
inappropriate to introduce scripts that invoked it to build the documentation.
This restores one functionality of the old make_docs.sh, namely the packaging of
code example directories.
Thank you to @sauclovian-g for suggesting this rearrangement - not only is it
"more correct", it resulted in smaller, more reusable scripts and a more
intuitive build process.
That's great!

First, we explicitly list SUPPORTED_DOCS (that is, the targets Sphinx's old
Makefile understood that we wish to continue supporting).
This allows us to (1) always show up-to-date help text documenting what someone
can run to build the docs, (2) easily add additional format support, and (3)
replace the old catch-all target with a more helpful suggestion given these
changes.

We replace Sphinx's help with a series of messages listing available targets,
but still leave Sphinx's original help available (this is mostly for developers
looking to add support for additional Sphinx targets) via target sphinx-help.

Targets setup-env and package-code are methods of executing the scripts in
scripts/.
These are also the dependencies of the document-rendering targets, and anything
related to running Sphinx.

mostlyclean and clean respectively clean up the documentation build (including
package code examples) and all of that plus the Python environment.

install-pdf copies rendered PDFs from the Sphinx output directory to doc/pdfs,
building them if necessary.
To eventually validate that the 'correct' PDFs have been checked in, this commit
adds a mechanism to control the reproduction of SAW documentation PDFs: The
SOURCE_DATE_EPOCH environment variable (see
https://reproducible-builds.org/docs/source-date-epoch/).

We set this in a .env file, and use if in the Makefile on documentation builds
so the last edit time is consistent.

For now, this value is set to (GMT) Tuesday, April 1, 2025 12:00:00 AM, but for
releases should be set to an epoch timestamp corresponding to the date of
release (for example).

Release instructions for the repository have been updated to reflect the need to
update this variable at release time (and correspondingly, the documentation
PDFs themselves).
In particular:

- Explicitly note that `make` is required to build documentation
- Replace the description of `make_docs.sh` with a description of the `Makefile`
  and its targets, as well as the `scripts/` directory and contents
- Update troubleshooting recommendations to match
- Update descriptions of the `doc/` directory contents:
  - In particular, remove note about removing PDFs from the repository.
    This was discussed out-of-band, and we aligned on leaving the PDFs around
    for developer benefit while browsing the repository
- Remove the no-longer-necessary disclaimer about packaging code
This reduces the 'escaped' dollar signs and produces more sensible
output (i.e. the output shows the result of expansion).
This Python package implements a new Sphinx role, download-dir, that can be used
to indicate that a referenced directory should be archived and made
downloadable.

For example: If we write {download-dir}`/my/cool/dir`, the role attempts to
create `$SOURCEDIR/my/cool/dir.tar.gz` from the contents of
`$SOURCEDIR/my/cool/dir`, where $SOURCEDIR is the Sphinx project root (for us,
`saw-script/doc/`).

In short, this means that you can only create downloads out of directories
somewhere under in `doc/`, including `doc/` itself.

As is Sphinx's convention, then, paths starting with `/` indicate that the
path is rooted at $SOURCEDIR, and relative paths are relative to the document
where the role was used.

The role also supports
{download-dir}`links with explicit titles <thing/to/download>`; when no title is
provided, the link will be the name of the archive (i.e. the stem of the target
directory + ".tar.gz").

This is an improvement over our current solution, which is inflexible (requiring
that only specific directories be used as download targets) and doesn't fully
utilize Sphinx's/docutils's extensible role framework (instead using scripts).
And, remove a spurious comment.

This makes sphinx-download-dir available in the build environment created by
scripts/setup_env.sh.
This will keep workspaces clean when building locally, and encourages handling
the archiving process via sphinx-download-dir.

There is currently one exception in the Rust tutorial sample code, where a
tar.gz of a large JSON file is provided.
Of course, one can use git add -f to commit additional archives as necessary.
This replaces uses of MyST's <path:...> link format used to generate download
links to individual files with uses of the role introduced by the
sphinx-download-dir package.
Given that we now use sphinx-download-dir (which handles the archiving of code
samples during the Sphinx build), we have no need for the packaging script /
make target.

Furthermore, we make the mostlyclean target interactive in its deletion of
tar.gz archives, to prevent unintentionally deleting a checked-in archive that
is part of a sample directory (the Rust tutorial is a case like this).

This is mostly necessary since we no longer force the sample directory / archive
to have a specific name, and simply attempt to clean up _all_ tar.gz archives in
and under `doc/`.
Mostly focuses on the effects on users (namely, that the PDFs they get with
releases have changed a bit).
Copy link
Member Author

@ChrisEPhifer ChrisEPhifer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I believe I addressed all of the review comments, and in addition I was able to further clean things up by replacing the somewhat hacked-together code sample packaging with a flexible Sphinx role.

doc/README.md Outdated Show resolved Hide resolved
doc/README.md Outdated Show resolved Hide resolved
doc/README.md Show resolved Hide resolved
doc/README.md Outdated Show resolved Hide resolved
doc/README.md Outdated Show resolved Hide resolved
doc/development/releasing.md Outdated Show resolved Hide resolved
doc/development/releasing.md Outdated Show resolved Hide resolved
doc/rust-tutorial/case-study-salsa20.md Outdated Show resolved Hide resolved
doc/tutorial/sawScriptTutorial.bib Outdated Show resolved Hide resolved
Copy link
Contributor

@sauclovian-g sauclovian-g left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few more minutiae, and there's one leftover from the last batch, but in general looks good. Thanks for taking care of all this!

@@ -49,16 +49,6 @@ class SAWScriptLexer(RegexLexer):
bygroups(Keyword.Type, Whitespace),
),
(r"(true|false|abc|z3)(\s+)", bygroups(Keyword.Constant, Whitespace)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not to kibitz further or anything 😸 but abc and z3 should probably get dropped too.

(or add the other solvers, but the full list is pretty large... there's eight each with four variant calls, plus or minus, plus a few addenda)

I think what we should do in the long run is either have saw dump out info from the primitives table, or generate both this and the primitives table from an external definition file. But right now the primitives table doesn't have enough info to make this worthwhile.

(Also what I should probably do at some point is run pygments manually on a bunch of existing proof code and fiddle with the coloring until it looks nice. FUTURE...)

$(info Tidy everything, including Python venv: 'make clean')

sphinx-help: setup-env
@. .venv/bin/activate && $(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you wrap this (and the other one) in parens so it runs in a subshell?
That is,

@(. ./venv/bin/activate && ...)

The reason is that the lifetime of the shells spawned by make is unspecified, and while it's de facto ok (especially with gmake) ... the extra fork doesn't cost anything important, and accidentally leaking the venv into some other command that wasn't supposed to have it could end up extremely confusing.

doc/Makefile Outdated Show resolved Hide resolved
@@ -0,0 +1 @@
SOURCE_DATE_EPOCH=1743465600
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you put a comment here indicating what it expands to?
(Also it might be nice to note that you can generate the stamp with date +%s -d 20250401)
(although apparently this works on BSD and Linux but not MacOS)

(and it might be better to put this in scripts/epoch.mk or something like that)

This can be used to provide downloads in the HTML rendering of the
documentation (see [below](#code-examples)).
- `install-pdf`: Build and install PDF renderings to `doc/pdfs/`.
- `mostlyclean`: Clear out all packaged code and Sphinx-generated files.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference for these two would be clean and distclean respectively, since the general convention is that clean removes build products and distclean also removes configuration. FWIW, may not matter...

@@ -2,4 +2,4 @@ __pycache__
*.egg-info

_build
code.tar.gz
*.tar.gz
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears .venv didn't get in here

pdf: latexpdf

install-pdf: latexpdf
mkdir -p pdfs
cp -f $(shell ls _build/latex/*.pdf | sed '/\/galois\.pdf$$/d') pdfs

mostlyclean:
rm -i $(shell find . -path ./$(BUILDDIR) -prune -o -name '*.tar.gz' -print)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is desirable (for one thing the CI ought to be able to run this; even if we don't need it now it's bound to come up eventually...)

But also, if you put *.tar.gz in .gitignore, git will get all upset and/or confused about having one checked in.

The one preexisting .tar.gz file contains a single file and could just be a .gz file instead. (Unless there's some specific reason to also use tar. @RyanGlScott?) Alternatively we could put the generated downloads somewhere else, e.g. in their own downloads/ subdir.

every Python under the sun).

This is enough to generate the HTML versions of documentation, which is what is
deployed to <https://galoisinc.github.io/saw-script>.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this accidentally got dropped again; the current version of the README doesn't have it. Can you stick something like "By default the build will configure and download a Python virtual environment in .venv; to set this up explicitly beforehand, run make setup-env" at the bottom of ### Prerequisites?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nonexistent java_symexec in doc/manual/code
3 participants