Change search_index to be a JSON file #2517

aaruni96 · 2024-05-31T17:43:20Z

Not the best thing at the moment, because the fetch() function assumes index lives in "./". So search only works from /index.html.
I don't really know how to fix this.

Addresses #2516 , when its fixed.

Not the best thing at the moment, because the fetch() function assumes site is served from / So it won't find search_index.json if site is project.com/docs instead of docs.project.com

aaruni96 · 2024-06-07T10:40:03Z

Search now works from all pages in the built docs, and the basic file is a valid JSON file, which can be manipulated using standard JSON tools.

fingolfin · 2024-06-12T11:47:18Z

@mortenpi @fredrikekre could you please let the CI tests run on this one?

It seems quite useful to me (but I am biased, I work with @aaruni96 ;-) ).

mortenpi · 2024-06-13T08:09:42Z

Sorry for the delay here.

So, the reason we have it as a .js file is that you could load it from the .html files if you open them locally with the file:/// protocol in the browser (i.e. without having a web server, like LiveServer) running. Last I checked, browser security rules don't allow you to fetch files from a higher level directory. I.e. for example if you have build/search_index.json and try to fetch it from build/foo/bar.html, it won't be allowed.

And I think we still want to support that file:/// case (especially if prettyurls=false), e.g. for the Julia offline documentation.

Just a thought I had that we could try though is put the fetch call into the search_index.js file. I suspect it won't work, but maybe it will -- browsers can be weird 🤷

If that doesn't work, then I'm inclined to leave the default to generate the .js file. But we could have an option to change to the .json, if there's demand. Alternatively, we could also add an option to generate the .json in parallel maybe, but still use .js in the Documenter HTML? But I would like to understand the use case in more detail, before adding the complexity. There is also code out there that depends on search_index.js having the format that it has, so changing this will have some cost.

Last note: purely on aesthetic grounds, I agree that a JSON file would be cleaner. And would also make it much easier for anyone to consume or post-process it. But it's the file:/// requirement that made me implement it this way.

aaruni96 · 2024-06-13T10:02:03Z

browser security rules don't allow you to fetch files from a higher level directory

MDN seems to suggest fetch() will just fail if the URL starts with file:// ( https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSRequestNotHttp ).

Just a thought I had that we could try though is put the fetch call into the search_index.js file. I suspect it won't work,

That idea doesn't work for two reasons: 1) fetch still throws a CORS error, and 2) I couldn't figure out a way to have the path of the index be relative to the current page, if it was all inside the javascript.

But I would like to understand the use case in more detail, before adding the complexity

Currently the documentation for the OSCAR project (https://docs.oscar-system.org) includes the documentation for sub projects like Nemo, Hecke, AbstractAlgebra. We do want some, but not all documentation pages from these sub projects. While we make the unwanted pages unreachable via navigation, they still show up in search, being confusing to users (functions with the same name but different method might work in Nemo, but not Oscar). If the search index were a standard json file, we could run a post process script to trim out these unwanted pages from search.

This is the situation as I understand it, maybe @fingolfin can correct me.

aaruni96 · 2024-06-13T10:05:30Z

So, the reason we have it as a .js file is that you could load it from the .html files if you open them locally with the file:/// protocol in the browser

Local browsing anyways doesn't work well if prettyurls is true (default).

Maybe a compromise can be, use a valid json file if prettyurls is true, and the current method of loading the index to a variable from within the search_index.js if prettyurls is false ?

aaruni96 · 2024-06-13T10:08:26Z

To uncomplicate the two codepaths, we could keep the name search_index.js.

if prettyurls is true, add the fetch to HTML pages, and don't add the assignment to search_index.js
if prettyurls is false, don't write the fetch to the HTML pages, and add the assignment to the front of search_index.js

Would that be an idea worth considering ?

mortenpi · 2024-06-14T01:24:40Z

I would prefer not to complicate prettyurls -- it is meant to configure one thing. In an ideal world, instead of individual options, I think we'd have two modes: (1) web server / deploy mode, (2) local build mode. But we don't at the moment.

For the OSCAR use case.. what about hooking into the search index generation in Julia, in make.jl? Would that work, if we'd have some systematic way to do it? Or are you pulling in pre-built HTML?

But if you want to process the JSON file, then I'd say let's make this opt-in, configured via an option to Documenter.HTML.

browser security rules don't allow you to fetch files from a higher level directory

MDN seems to suggest fetch() will just fail if the URL starts with file:// ( https://developer.mozilla.org/en-US/docs/Web/HTTP/CORS/Errors/CORSRequestNotHttp ).

Uh, yea, okay, looks like things have become more strict, not less, over time. My knowledge predated CVE-2019-11730 😄

mortenpi · 2024-06-14T01:24:52Z

src/html/HTMLWriter.jl

        # convert Vector{SearchRecord} to a JSON string + do additional JS escaping
-        println(io, JSDependencies.json_jsescape(ctx.search_index), "\n}")
+        println(io, JSDependencies.json_jsescape(ctx.search_index))


Needs to be double checked, but JSON might need a different escaping function.

It looks to me like json_jsescape only escapes some newline unicode, so it turns already valid JSON to also be valid javascript (also, according to comments in the source code, this is not required since 2019).

More closely looking at the function, it calls JSON.json(), and then escapes the problematic unicode in whatever is returned, so this should be fine (at least for this change).

aaruni96 · 2024-06-14T12:10:33Z

what about hooking into the search index generation in Julia, in make.jl?

I'm going to need some more explanation here...

Or are you pulling in pre-built HTML?

Best as I can tell, we somehow convince Documenter to build the documentation for Oscar + all the sub projects at the same time, right @fingolfin ?

fingolfin · 2024-06-14T12:44:07Z

We do all kinds of crude things and hacks in https://github.com/oscar-system/Oscar.jl/blob/master/docs/make_work.jl to make it work, including using a nasty hack to workaround the fact that PR #552 here (from 2017) still hasn't been merged (perhaps I can volunteer @aaruni96 to help finish that? ;-) )

But I'll try to stay on topic: we actually copy documentation from Hecke, Nemo, AbstractAlgebra, but we do by copying (or rather: symlinking) .md files, not compiled HTML. See lines 142ff in the linked make_work.jl.

So I assume that the index creation is after that, so hooking into the search index generation might indeed work?

mortenpi · 2024-06-26T03:09:23Z

Sorry for the delay. So, the search index is just a linear list we populate while HTMLWriter goes through the document

Documenter.jl/src/html/HTMLWriter.jl

Line 646 in 0c55392

search_index :: Vector{SearchRecord}

It then gets written out here at the end

Documenter.jl/src/html/HTMLWriter.jl

Lines 809 to 813 in 0c55392

    
           open(joinpath(doc.user.build, ctx.search_index_js), "w") do io 
        
               println(io, "var documenterSearchIndex = {\"docs\":") 
        
               # convert Vector{SearchRecord} to a JSON string + do additional JS escaping 
        
               println(io, JSDependencies.json_jsescape(ctx.search_index), "\n}") 
        
           end

We could add an interface where you e.g. pass a callback to Documenter.HTML that gets passed that list just before it's written and you can then to whatever processing you want in make.jl, and you return a new list. And it defaults to identity, of course.

The SearchRecord-s themselves are pretty straightforward:

Documenter.jl/src/html/HTMLWriter.jl

Lines 619 to 627 in 0c55392

    
           struct SearchRecord 
        
               src :: String 
        
               page :: Documenter.Page 
        
               fragment :: String 
        
               category :: String 
        
               title :: String 
        
               page_title :: String 
        
               text :: String 
        
           end

Documenter.jl/src/html/HTMLWriter.jl

Lines 703 to 714 in 0c55392

    
           function JSON.lower(rec::SearchRecord) 
        
               # Replace any backslashes in links, if building the docs on Windows 
        
               src = replace(rec.src, '\\' => '/') 
        
               ref = string(src, rec.fragment) 
        
               Dict{String, String}( 
        
                   "location" => ref, 
        
                   "page" => rec.page_title, 
        
                   "title" => rec.title, 
        
                   "category" => rec.category, 
        
                   "text" => rec.text 
        
               ) 
        
           end

aaruni96 and others added 2 commits May 31, 2024 19:40

Change search_index to be a JSON file

7111ab6

Not the best thing at the moment, because the fetch() function assumes site is served from / So it won't find search_index.json if site is project.com/docs instead of docs.project.com

Load search index for all pages

d6dd0ab

mortenpi added the Format: HTML Related to the default HTML output label Jun 13, 2024

mortenpi reviewed Jun 14, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Change search_index to be a JSON file #2517

Change search_index to be a JSON file #2517

aaruni96 commented May 31, 2024 •

edited

Loading

aaruni96 commented Jun 7, 2024

fingolfin commented Jun 12, 2024

mortenpi commented Jun 13, 2024

aaruni96 commented Jun 13, 2024

aaruni96 commented Jun 13, 2024 •

edited

Loading

aaruni96 commented Jun 13, 2024

mortenpi commented Jun 14, 2024

mortenpi Jun 14, 2024

aaruni96 Jun 14, 2024

aaruni96 commented Jun 14, 2024

fingolfin commented Jun 14, 2024

mortenpi commented Jun 26, 2024

Change search_index to be a JSON file #2517

Are you sure you want to change the base?

Change search_index to be a JSON file #2517

Conversation

aaruni96 commented May 31, 2024 • edited Loading

aaruni96 commented Jun 7, 2024

fingolfin commented Jun 12, 2024

mortenpi commented Jun 13, 2024

aaruni96 commented Jun 13, 2024

aaruni96 commented Jun 13, 2024 • edited Loading

aaruni96 commented Jun 13, 2024

mortenpi commented Jun 14, 2024

mortenpi Jun 14, 2024

Choose a reason for hiding this comment

aaruni96 Jun 14, 2024

Choose a reason for hiding this comment

aaruni96 commented Jun 14, 2024

fingolfin commented Jun 14, 2024

mortenpi commented Jun 26, 2024

aaruni96 commented May 31, 2024 •

edited

Loading

aaruni96 commented Jun 13, 2024 •

edited

Loading