Caching of assets? #707

siefkenj · 2024-03-19T15:19:27Z

Currently there is some support for rebuilding assets only if they've changed, but it seems to rely on document structure. Since assets are extracted and then compiled in isolation, I imagine that if you stored <md5sum>.svg files in some .cache folder, you could just detect whether the asset contents were the same and copy over the cached version instead of running compile again. This method would not rely on document structure at all.


StevenClontz · 2024-03-19T17:12:44Z

+1

So we have an element like <latex-image xml:id="bar">FOO</latex-image>, we checksum FOO to abc123, then save the result to .cache/latex-image/abc123.svg as well as generated-assets/latex-image/bar.svg. Then on future builds, we simply copy .cache/latex-image/abc123.svg to generated-assets/latex-image/bar.svg (or wherever it should be, in case the filename changes).
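A minimal sketch of the checksum-then-copy idea described above, not the PreTeXt CLI's actual code. The function names `cache_key`, `store`, and `restore` are hypothetical, and SHA-256 is an assumption (the original post only mentions an md5sum-style name).

```python
import hashlib
import shutil
from pathlib import Path


def cache_key(source: str) -> str:
    """Checksum the asset source (the FOO in the example above)."""
    # Assumption: SHA-256; any stable digest would serve the same purpose.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def store(built: Path, source: str, cache_dir: Path) -> Path:
    """Save an already-built asset into the cache under its checksum."""
    cached = cache_dir / f"{cache_key(source)}{built.suffix}"
    cached.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(built, cached)
    return cached


def restore(source: str, suffix: str, cache_dir: Path, dest: Path) -> bool:
    """Copy the cached build to dest if one exists; report whether it did."""
    cached = cache_dir / f"{cache_key(source)}{suffix}"
    if not cached.exists():
        return False
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(cached, dest)
    return True
```

Because the lookup depends only on the source text, renaming the xml:id from bar to something else changes only `dest`, and the cached file is still found.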

rbeezer · 2024-03-19T17:43:35Z

+1


oscarlevin · 2024-03-19T18:37:43Z

I'm not sure I understand what issue this resolves. Currently, if you have an asset with xml:id="bar" (or if bar is the id of the youngest ancestor of the asset that has an xml:id), then we store the hash of the asset with the xml:id. If the author changes the asset, then the hashes won't match, so we ask for the asset to be regenerated (and put into generated-assets).

With this proposal, we keep a copy of the generated asset in .cache. If the author changes the asset, the hash will no longer match, so we regenerate the asset (and put it in .cache and generated-assets).

In both cases, if the asset isn't changed, nothing gets regenerated.

Last case: the asset isn't changed, but the xml:id is changed. Now, the asset is regenerated. Under the proposal, the asset isn't regenerated, but a new copy is made with the new name. I see there is an advantage here, but the disadvantage is keeping every version of the generated asset in the cache and copying over every asset from the cache to generated-assets.

What am I missing?

StevenClontz · 2024-03-19T18:59:11Z

Another potential use case: the user has <latex-image xml:id="foo">BAR</latex-image> and later <latex-image xml:id="baz">BAR</latex-image>. Maybe it's an anti-pattern that should have been solved with an xref, but this would avoid building the same image twice.

siefkenj · 2024-03-19T19:04:47Z

This would also mean images are cached without assigning an ID to them.

StevenClontz · 2024-06-16T21:01:41Z

I'm waiting on https://github.com/TeamBasedInquiryLearning/precalculus/actions/runs/9538778663 and I'm seeing a lot of duplication of assets being generated. This could probably be avoided through cleverer configuration of the action, but I still think having a .generated-cache directory that contains a bunch of ELEMENT/FORMAT/HASH.FMT files that is checked before every build and copied over (barring some kind of --force-regenerate) would be excellent.

Another use case: I change my sageplot from blue to green, then hate it, then change it back to blue. The old blue version is still cached so I get it immediately.

oscarlevin · 2024-06-17T02:25:24Z

I am coming around to really liking this idea. I think this would be handled by core though, correct? So definitely something we will want to collaborate on.

StevenClontz · 2024-06-17T03:42:41Z

> I think this would be handled by core though, correct?

💯 - and this is a good week to do it

StevenClontz · 2024-06-18T17:45:08Z

Caching should be used in tandem with https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows to speed up CI/CD for PreTeXt projects

StevenClontz · 2024-06-18T17:46:20Z

(meanwhile: https://github.com/TeamBasedInquiryLearning/precalculus/actions/runs/9569658606/job/26382647393 💀)

oscarlevin · 2025-01-20T17:56:38Z

Okay, @StevenClontz and @siefkenj, I'm going to implement this this week. Here is what I'm thinking; please feel free to nudge me a different direction if you have time to consider this.

When a user runs pretext build, we want to ensure the generated assets are up to date. We do the following in order:

1. Hash assets to see if any appear to have changed. For any type of asset that has a change, we add that type to the list that should be generated.
2. For each asset type that should be generated, we call core.generate_* on the entire document (not subsetting like we do now, unless this was explicitly requested). We also pass in our version of "individual_generate_asset" using the provided hook from core (already implemented for asymptote, latex-image, and sage).
3. Our custom generator function gets passed an individual file that will be generated. We hash this and check whether hash.ext exists in .generated_cache. If so, we just copy it over where it should go. If not, we call core's individual generation function and get the new asset that way, but in addition, make a copy into our .generated_cache folder with the appropriate hash and ext.
4. After all the assets are successfully generated (or copied), we update the hash table for that target.
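The custom generator step could look roughly like the following sketch. It is not the real hook: the name `cached_generate` and the `generate(extracted, dest)` callable signature are assumptions standing in for whatever core's individual generation function actually accepts, and SHA-256 is likewise an assumed choice of hash.

```python
import hashlib
import shutil
from pathlib import Path
from typing import Callable


def cached_generate(
    extracted: Path,
    dest: Path,
    cache_dir: Path,
    generate: Callable[[Path, Path], None],
) -> bool:
    """Produce dest from the extracted source file, reusing the cache if possible.

    Returns True on a cache hit and False when generate() had to run.
    """
    # Hash the individual source file that would be compiled.
    digest = hashlib.sha256(extracted.read_bytes()).hexdigest()
    cached = cache_dir / f"{digest}{dest.suffix}"
    dest.parent.mkdir(parents=True, exist_ok=True)

    if cached.exists():
        # Hit: copy the earlier build to where the asset should go.
        shutil.copyfile(cached, dest)
        return True

    # Miss: run the (hypothetical) per-asset generator, then remember the result.
    generate(extracted, dest)
    cache_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(dest, cached)
    return False
```

With a wrapper like this, reverting a source file to earlier content (the blue, green, blue case mentioned earlier in the thread) finds the old build in the cache and skips generation.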

I think that the .generated_cache folder should live in the root of the project and be added to .gitignore. It could also go inside generated_assets and when we build we don't copy it over ever.

Of course we would keep all the forced generation flags we have. Probably should also add a way to clear the generated cache (perhaps pretext generate --clean; or should this be done whenever forced generate happens?).

StevenClontz · 2025-01-21T21:07:42Z

I'm a little hazy on what gets hashed and what we compare with.

I imagine something like this workflow (which may be exactly what you're suggesting):

When generating assets for <element>...</element> (which should be expanded for any xi:includes) for format *.fmt, first hash the string <element>...</element> as HASH.
Check if HASH.fmt exists in .cache. (note: I would call it .cache to not be confused with generated-assets)
- If it does:
  1. Copy HASH.fmt to the correct filename within generated-assets
- If it does not:
  1. Use core PreTeXt routine to create correct file in generated-assets
  2. Copy it as HASH.fmt to the .cache directory

oscarlevin · 2025-01-21T23:26:37Z

That's basically the plan, except that we will hash the output of the extract_*.xsl where * is the asset type. So we are hashing the actual latex/tikz, not the xml source. This is better anyway, since there is no way to tell core to just build the xml element; it would need to extract all the tex code anyway.

StevenClontz · 2025-01-21T23:32:01Z

Does every format have an extract_ file? I think this might also be useful for preview images for interactives and YouTube videos, to avoid network calls and headless browsers.

StevenClontz · 2025-01-21T23:33:30Z

And would <latex-image>foo</latex-image> and <sageplot>foo</sageplot> be hashed the same or different?

oscarlevin · 2025-01-21T23:50:02Z

Yeah, probably the same, although I don't know exactly what the extract templates do.

I don't think this is an issue for latex-image, as you wouldn't have \begin{tikzpicture} in either a sage or asymptote. But perhaps there would be collision between those two?

I suppose we could always prepend the asset type to the hash.

StevenClontz · 2025-01-22T01:37:49Z

Another option: .cache/latex-image/HASH.fmt vs .cache/sageplot/HASH.fmt.
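A small sketch of that layout, assuming a hypothetical helper `cache_path` and SHA-256 as the hash. Putting the asset type in the directory name keeps identical source text from colliding across types even though the digest is the same.

```python
import hashlib
from pathlib import PurePosixPath


def cache_path(
    asset_type: str, source: str, fmt: str, root: str = ".cache"
) -> PurePosixPath:
    """Build ROOT/ASSET-TYPE/HASH.FMT for a given asset source."""
    # Assumption: the hash covers only the source text, not the asset type.
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return PurePosixPath(root) / asset_type / f"{digest}.{fmt}"


latex = cache_path("latex-image", "foo", "svg")
sage = cache_path("sageplot", "foo", "svg")
# Same file name (same hash), different directories, so no collision.
print(latex.name == sage.name, latex.parent, sage.parent)
```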

oscarlevin · 2025-01-22T01:54:30Z

Great idea. Implemented in #908

StevenClontz mentioned this issue Aug 1, 2024

shared generated_assets cache #811

Closed

oscarlevin mentioned this issue Aug 16, 2024

Add --clean option for generate #819

Closed

oscarlevin closed this as completed Jan 20, 2025

oscarlevin reopened this Jan 20, 2025

oscarlevin mentioned this issue Jan 21, 2025

Image cache #906

Merged

oscarlevin closed this as completed in #906 Jan 22, 2025
