HTML in output (proof-of-concept) #5172

Open
PerBothner opened this issue Sep 30, 2024 · 10 comments
Labels
type/proposal A proposal that needs some discussion before proceeding

Comments

@PerBothner
Contributor

My html-blocks fork allows an application to "print" output lines containing HTML to an xterm.js terminal. This is a generalization of existing extensions to "print" images, such as Sixel. However, using HTML is often preferable, as the output can be scaled, selected (copied), can include clickable links and buttons, is usually more compact, and is more easily serialized. The output can also re-flow (based on terminal width and zoom), and can react to style changes, such as light vs. dark mode.

The current implementation is restricted to HTML blocks that extend the full width of the screen. It is also limited to output that gets appended to the end of the buffer. While limited, this is basically what you need to support a REPL with "rich" output: for example, a graphing program that displays plots using SVG, nicer-looking and copyable tables, or a symbolic math program that emits formulas using MathML.

Replacing/updating previously-printed HTML blocks would be a straightforward extension. Another natural extension would be a protocol for buttons that, when clicked, send a string to the application.

This is a very preliminary proof-of-concept, not ready for real use. It is based on my buffer-cell-cursor fork, which is mostly usable (though there are still bugs to fix). I think of the html-blocks branch as an example of, and motivation for, the buffer-cell-cursor branch: the former adds a new class ElementBufferLine that extends BufferLine.
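For readers unfamiliar with the buffer classes, here is a heavily simplified, hypothetical sketch of how a subclass like ElementBufferLine could relate to BufferLine. It is not code from either fork: the real xterm.js BufferLine stores packed cell data and attributes, and the members used here (`cells`, `toText`, `heightInRows`, `html`, `heightPx`) are assumptions made for illustration.

```typescript
// Hypothetical, heavily simplified model; not xterm.js's real BufferLine.
class BufferLine {
  constructor(protected readonly cells: string[] = []) {}

  // Number of character cells in this line.
  get length(): number {
    return this.cells.length;
  }

  // Plain-text content, e.g. for copying or serialization.
  toText(): string {
    return this.cells.join("");
  }

  // Ordinary text lines are always exactly one row tall.
  heightInRows(_rowHeightPx: number): number {
    return 1;
  }
}

// A line holding a full-width HTML block (hypothetical fields).
class ElementBufferLine extends BufferLine {
  constructor(
    readonly html: string,
    readonly heightPx: number,
  ) {
    // Addressed as a single (very large) cell.
    super([""]);
  }

  // Serialize the HTML source rather than cell text.
  override toText(): string {
    return this.html;
  }

  // Row-based scrolling work-around: round the pixel height up to whole rows.
  override heightInRows(rowHeightPx: number): number {
    return Math.max(1, Math.ceil(this.heightPx / rowHeightPx));
  }
}
```

Because ElementBufferLine is still a BufferLine, code that walks the buffer keeps working; only the places that care about height or serialization need to know about the subclass.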

Screenshots and examples

Gnuplot is a plotting program that can emit plots in a number of formats, including SVG and "domterm" (which is just SVG wrapped in an escape sequence). Gnuplot defaults to "domterm" output when the DOMTERM environment variable is set. The following shows an example, running gnuplot in batch (rather than interactive) mode.

Screenshot from 2024-09-29 17-49-50

More examples later.

Issues

  • Before polishing and finishing the html-blocks branch we need to polish and finish the buffer-cell-cursor branch that it depends on.

  • Scrolling in xterm.js is done in multiples of rows. This doesn't work well when lines have different heights. A work-around is to treat an HTML block as multiple rows, rounding its height divided by the standard row height up to a whole number. However, this leads to ugly excess space. It is also not good long-term. For example, one might want to allow plain-text lines with a mix of font sizes:
    We don't want each line to be forced to some multiple of the "standard row size", whatever font it uses.

    Probably the best solution is to change the scrolling API and implementation to work in terms of pixels rather than rows.
    This is not inherently complicated, but it is extensive. I added a scrollPartialLines option to enable scrolling by fractional rows, but it does not do much yet. Getting it working should be a separate issue and PR.

  • If lines no longer have a fixed height, mapping between pixel offsets and (row, char) offsets is no longer a simple
    multiplication or division. Linear or binary search may be needed, augmented with caching. However, note that while (for example) mapping a mouse click to a (row, char) offset may require a linear or binary search, the constant factors are small, since we are restricted to the visible screen.

  • Only the DOM renderer has the needed support, but I see no reason why supporting the WebGL renderer should be a problem.

  • Selection is not implemented. Ideally, one would want selection to extend across both regular rows and parts of HTML blocks.

  • Re-flow on screen resize is not implemented.

  • Truncation of output is not implemented. (Most people who want rich HTML output will probably want infinite scrollback, so it is a lesser priority.)

  • Serialization of HTML segments has not been implemented, though no complicated issues are foreseen.
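The row-rounding work-around and the pixel-offset lookup described in the list above can be sketched as follows. This is a minimal illustration, assuming each visible line's rendered height in pixels is known; `RowLayout` and `roundUpToRows` are hypothetical names, not xterm.js APIs. Prefix sums give the top of each row in O(1), and a binary search maps a pixel offset (e.g. a mouse click) back to a row in O(log n).

```typescript
// Hypothetical helper: pixel <-> row mapping when rows differ in height.
class RowLayout {
  // tops[i] is the pixel offset of the top of row i; tops[n] is the total height.
  private readonly tops: number[] = [0];

  // Assumes at least one row; heights are in CSS pixels.
  constructor(heightsPx: readonly number[]) {
    for (const h of heightsPx) {
      this.tops.push(this.tops[this.tops.length - 1] + h);
    }
  }

  get totalHeight(): number {
    return this.tops[this.tops.length - 1];
  }

  // Pixel offset of the top of `row` (prefix sum, O(1)).
  topOf(row: number): number {
    return this.tops[row];
  }

  // Row containing pixel offset `y`, clamped to the valid rows (O(log n)).
  rowAt(y: number): number {
    let lo = 0;
    let hi = this.tops.length - 2; // index of the last row
    // Find the largest i with tops[i] <= y (or 0 if y is above the first row).
    while (lo < hi) {
      const mid = (lo + hi + 1) >> 1;
      if (this.tops[mid] <= y) lo = mid;
      else hi = mid - 1;
    }
    return lo;
  }
}

// The current work-around: round a block up to whole rows, wasting some space.
function roundUpToRows(heightPx: number, rowHeightPx: number): { rows: number; excessPx: number } {
  const rows = Math.max(1, Math.ceil(heightPx / rowHeightPx));
  return { rows, excessPx: rows * rowHeightPx - heightPx };
}
```

Keeping the prefix sums cached and rebuilding them only when some line's height changes is one way to get the "augmented with caching" behavior; since only the visible screen is involved, even a rebuild per frame would be cheap.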

Trying it out

Let me know if you want to try it out.

My current test-bed uses xterm.js embedded in DomTerm. DomTerm provides "safety-scrubbing" of the HTML that most people will want, and some other features that make the feature easier to use. I can provide instructions, if requested.

The next step is to make this feature not depend on DomTerm. Specifically, it should be accessible from the Demo. This would probably involve a new addon (which we might call addon-html-blocks), which would include customizable safety-scrubbing.

@jerch
Member

jerch commented Sep 30, 2024

@PerBothner The first question coming to my mind is indeed about security - can we get this secure enough within a browser env? Linked to this:

  • What about JS in html snippets? Can we forbid any JS? Could we allow JS but keep it separated from the terminal context (ECMA spec speaks here of "realms")?
  • What about asset sources? Can we forbid side channel loading, force them all to be inlined? If not, can we force them to be routed through the TE connection?

Currently I have my doubts about both JS execution and asset loading. We might get somewhere by putting the snippets into iframes and applying very strict security policies; re-routing of asset sources can be achieved with a service worker. Still, that type of "isolation" is subject to many browser bugs, so I feel a bit uneasy about the whole concept.

To not only argue with FUD from my side: the main attack vector I see here is breaking out of any secluded area and gaining access to the terminal's JS context and thus direct shell access. Such a break could happen through any weakness in our countermeasures or through browser bugs themselves, which means that we need perfect knowledge/control here (eww, what a burden).
A second vector would be the ability to track the terminal user through side-channel loading of foreign assets, which violates a fundamental pattern of current terminal interaction (everything has to come through the TE connection).

So any ideas, how not to fall into these traps?

@Tyriar
Member

Tyriar commented Sep 30, 2024

Something I tried, and gave up on because it was a bigger change than I wanted to make at the time, was to add the concept of Monaco editor-style "view zones" to the terminal. Basically, the embedder would register a view zone that inserts a chunk of empty space in the renderer and manages a DOM element that is put there. The embedder could dispose of this view zone whenever it was done with it, and maybe also change properties on it (like height).

The great thing about this approach is that it works similarly to decorations (which were also inspired by Monaco): it's entirely up to the embedder to implement their sequence handler/feature (e.g. clicking a link could open the image inline), and it doesn't get into the mess of touching how the buffer works and making it more complex by having multiple buffer line types. The main challenges are scrolling, as you've called out in your example, and having the renderer support the gap properly.
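To make the view-zone idea concrete, here is a minimal sketch of what an embedder-facing registry might look like. It is loosely modeled on Monaco's view zones (a zone sits after a line and has a pixel height), but `ViewZone`, `ViewZoneRegistry`, and their members are hypothetical and not part of xterm.js. It also shows why pixel-based scrolling matters: a zone's height need not be a multiple of the row height, so the top of a row is no longer simply `row * rowHeight`.

```typescript
// Hypothetical "view zone" model; not an existing xterm.js API.
interface ViewZone {
  afterRow: number; // buffer row the gap is inserted after
  heightPx: number; // height of the gap; need not be a multiple of the row height
  element?: unknown; // embedder-managed DOM element placed in the gap
}

interface Disposable {
  dispose(): void;
}

class ViewZoneRegistry {
  private readonly zones = new Set<ViewZone>();

  // The embedder registers a zone and disposes it when it is done with it.
  addZone(zone: ViewZone): Disposable {
    this.zones.add(zone);
    return {
      dispose: () => {
        this.zones.delete(zone);
      },
    };
  }

  // Pixel offset of the top of buffer row `row`: the text rows above it plus
  // every zone inserted after a row above it.
  topOfRow(row: number, rowHeightPx: number): number {
    let y = row * rowHeightPx;
    for (const z of this.zones) {
      if (z.afterRow < row) y += z.heightPx;
    }
    return y;
  }
}
```

Since the registry keeps the zone object itself, an embedder could change `heightPx` later (for example after an image finishes loading) and the renderer would pick it up on the next layout pass; the buffer itself is never touched.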

I found the WIP branch I was working on master...Tyriar:xterm.js:zone_wip. I think the conclusion I came to was that we must work out smooth scrolling and then pixel-based scrolling, where the scroll position would be able to land in between buffer rows, instead of the top of the terminal always showing the top of some row.

@Tyriar Tyriar added the type/proposal A proposal that needs some discussion before proceeding label Sep 30, 2024
@jerch
Member

jerch commented Sep 30, 2024

Idk if inline placing of HTML snippets is a good idea. It basically can be arbitrary in height/width - how is that supposed to work inline? How will too wide content be handled (typically we only have a top-down scrollbar)? What about overprinting with text content?
To me it seems that inlining complex content into the terminal buffer output will lead to "hard to comprehend" UI pattern.

How about this: always render complex content into a separate buffer, thus making it a full-viewport view. This way the content size does not matter, as we can place scrollbars as needed in both directions. This "other buffer" could be accessible through a link-like annotation in the original REPL context, and could even be kept only as long as the marker in the original buffer stays active (thus getting auto-evicted when it scrolls off).
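A rough sketch of the "separate buffer tied to a marker" idea, assuming the annotation is anchored to a buffer marker that sends a dispose notification when its line is trimmed from the scrollback. xterm.js's IMarker does have an onDispose event, but the `MarkerLike` interface, `RichContentStore`, and `DemoMarker` below are simplified stand-ins invented for this example.

```typescript
// Minimal stand-in for a buffer marker (assumed shape, not the real IMarker).
interface MarkerLike {
  readonly id: number;
  onDispose(listener: () => void): void;
}

// Holds "rich" content out-of-band; the main buffer only shows a link-like
// annotation anchored at a marker. Content is evicted when its marker is
// disposed (e.g. when the line scrolls out of the scrollback).
class RichContentStore {
  private readonly content = new Map<number, string>();

  attach(marker: MarkerLike, html: string): void {
    this.content.set(marker.id, html);
    marker.onDispose(() => this.content.delete(marker.id));
  }

  // Called when the user activates the annotation: returns the HTML to show
  // in a separate full-viewport view, or undefined if already evicted.
  open(markerId: number): string | undefined {
    return this.content.get(markerId);
  }

  get size(): number {
    return this.content.size;
  }
}

// Simple marker implementation for demonstration purposes.
class DemoMarker implements MarkerLike {
  private listeners: Array<() => void> = [];

  constructor(readonly id: number) {}

  onDispose(listener: () => void): void {
    this.listeners.push(listener);
  }

  dispose(): void {
    for (const listener of this.listeners) listener();
    this.listeners = [];
  }
}
```

The design choice here is that the terminal buffer never holds the HTML at all; only the marker's lifetime ties the two together, which keeps the buffer model unchanged.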

@PerBothner
Contributor Author

About security, there are various approaches. I suggest we support some level of configurability, since different applications may prefer different approaches. I would focus on what should be default (standard) for a general-purpose terminal-emulator, either a stand-alone application or embedded in an IDE.

One approach is to remove "bad stuff" using a blacklist or (preferably) a whitelist of allowed HTML. DomTerm uses the scrubHtml function in domterm-utils.js for this. It has a whitelist of allowed elements (see the HTMLinfo table) and it also restricts attributes (see the allowAttribute function). Specifically, it disallows <script> elements, on* event-handler attributes, and javascript: URLs in <a>, <base>, and <img> elements.
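The whitelist approach can be sketched over a simplified element tree. This is not DomTerm's scrubHtml: the tiny whitelist, the `HtmlNode` shape, and the helper names are assumptions for illustration, and a real scrubber would operate on parsed DOM nodes. The sketch drops non-whitelisted elements together with their content, removes on* attributes, and rejects javascript: URLs after removing whitespace and control characters (a conservative superset of what browsers ignore inside URLs). It does not address the asset-loading concern: remote src URLs would still need to be blocked or rewritten.

```typescript
// Simplified element tree (hypothetical); "#text" marks a text node.
interface HtmlNode {
  tag: string;
  attrs?: Record<string, string>;
  text?: string;
  children?: HtmlNode[];
}

// Deliberately tiny example whitelist; a real one would be much larger.
const ALLOWED_ELEMENTS = new Set([
  "a", "b", "i", "em", "strong", "p", "pre", "code", "span", "div",
  "br", "table", "thead", "tbody", "tr", "td", "th", "img",
]);

// Attributes whose values are URLs and so may carry javascript: schemes.
const URL_ATTRIBUTES = new Set(["href", "src", "action", "formaction", "xlink:href"]);

function isAttributeAllowed(name: string, value: string): boolean {
  const n = name.toLowerCase();
  if (n.startsWith("on")) return false; // onclick, onload, onerror, ...
  if (URL_ATTRIBUTES.has(n)) {
    // Drop whitespace and control characters first, so "java\tscript:" is caught too.
    const compact = value.replace(/[\u0000-\u0020]/g, "").toLowerCase();
    if (compact.startsWith("javascript:")) return false;
  }
  return true;
}

// Returns a scrubbed copy, or null if the element must be removed entirely.
function scrub(node: HtmlNode): HtmlNode | null {
  if (node.tag === "#text") return { tag: "#text", text: node.text ?? "" };
  const tag = node.tag.toLowerCase();
  // Non-whitelisted elements (<script>, <iframe>, <style>, ...) are dropped with their content.
  if (!ALLOWED_ELEMENTS.has(tag)) return null;
  const attrs: Record<string, string> = {};
  for (const [name, value] of Object.entries(node.attrs ?? {})) {
    if (isAttributeAllowed(name, value)) attrs[name] = value;
  }
  const children: HtmlNode[] = [];
  for (const child of node.children ?? []) {
    const clean = scrub(child);
    if (clean !== null) children.push(clean);
  }
  return { tag, attrs, children };
}
```

Dropping a disallowed element's children (rather than unwrapping them) is the safer default, since the text inside a <script> or <style> should never be shown as content.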

Not allowing JavaScript has worked fine for the "REPL" use cases I'm most interested in. However, JavaScript might be useful in some applications: for example, you might want animated or interactive output. One possibility is to allow the terminal to install "extensions" under explicit user control, but that may be too limited.

Another approach is to wrap all HTML inside an <iframe>, as you suggest, and as YAET (see issue #5110) does. This should be fine for most REPL-style use cases, and in that case you could allow JavaScript.

A possible problem with using <iframe> is that it is presumably more resource-heavy. This might be an issue in a long session with hundreds of interactions, each producing a frame. One possibility is to have two different OSC codes: one that wraps the HTML in an <iframe> (and doesn't scrub, or only scrubs minimally), and another that inserts it directly (and scrubs heavily, including removing all JavaScript).
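The two-code idea could be dispatched roughly as below. This is a sketch under several assumptions: the mode names are hypothetical (no OSC numbers are implied), the heavy scrubber is passed in by the embedder, and the isolation relies on the standard iframe `sandbox` attribute (granting `allow-scripts` but not `allow-same-origin`, so the frame runs in an opaque origin) plus a Content-Security-Policy `<meta>` in the frame document that blocks network loads, which also speaks to the side-channel concern raised above. Whether this is robust enough in every target browser would need to be verified.

```typescript
// Hypothetical: which of the two proposed OSC variants carried the HTML.
type HtmlMode = "sandboxed" | "inline";

// Escape text for use inside a double-quoted HTML attribute value.
function escapeAttr(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// CSP for the frame document: no network loads at all (assets must be
// inlined, e.g. as data: URLs); inline styles and scripts are allowed.
const FRAME_CSP =
  "default-src 'none'; img-src data:; style-src 'unsafe-inline'; script-src 'unsafe-inline'";

function renderHtmlBlock(
  mode: HtmlMode,
  html: string,
  scrub: (html: string) => string, // heavy whitelist scrubber, supplied by the embedder
): string {
  if (mode === "inline") {
    // Inserted directly into the terminal's DOM, so scrub aggressively
    // (no <script>, no on* handlers, no javascript: URLs).
    return scrub(html);
  }
  // Sandboxed: allow scripts but NOT allow-same-origin, so the frame gets an
  // opaque origin and cannot reach the terminal's JS context.
  const doc = `<meta http-equiv="Content-Security-Policy" content="${FRAME_CSP}">` + html;
  return `<iframe sandbox="allow-scripts" srcdoc="${escapeAttr(doc)}"></iframe>`;
}
```

Only the inline path pays the cost of heavy scrubbing, and only the sandboxed path pays the cost of a frame, which is the trade-off the two codes are meant to expose.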

Another possibility is that emitted HTML must include a session-specific, randomly-generated passkey, passed in via an environment variable. I haven't really thought about this, but if I remember correctly this is the approach used by GraphTerm.
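A minimal sketch of such a passkey check, assuming a hypothetical payload format of `<key>;<html>` and that the key is exported to the child process through an environment variable (no variable name is given in the thread). The apparent intent is that HTML emitted by the trusted application carries the key, while HTML that merely appears in the output stream, for example from `cat`-ing an untrusted file, does not and is rejected.

```typescript
import { randomBytes, timingSafeEqual } from "node:crypto";

// Generated once per terminal session and exported to the child process.
function newSessionKey(): string {
  return randomBytes(16).toString("hex");
}

// Hypothetical payload format: "<key>;<html>". Returns the HTML if the key
// matches the session key, otherwise null.
function acceptHtmlPayload(payload: string, sessionKey: string): string | null {
  const sep = payload.indexOf(";");
  if (sep < 0) return null;
  const given = Buffer.from(payload.slice(0, sep));
  const expected = Buffer.from(sessionKey);
  // Constant-time comparison; timingSafeEqual requires equal lengths.
  if (given.length !== expected.length || !timingSafeEqual(given, expected)) {
    return null;
  }
  return payload.slice(sep + 1);
}
```

For example, the terminal could spawn the shell with `{ ...process.env, HTML_OUTPUT_KEY: key }` (a hypothetical variable name), and a helper like the xt-man script would prepend `$HTML_OUTPUT_KEY;` to its payload.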

@PerBothner
Contributor Author

By the way, a very interesting related project was GraphTerm, written by R. Saravanan. He also wrote the even older XMLTerm, which was a big inspiration to me.

@PerBothner
Contributor Author

@Tyriar The "view zone" approach might work. However, I don't know anything about Monaco view zones, nor have I looked into the decorator API and implementation.

I'm unclear if it saves us anything in terms of implementation complexity - we still need to deal with scrolling, and (preferably) zones that are a non-integral number of rows high. Furthermore, it seems like it would be helpful to have this extra content be part of the buffer. I think it may simplify some of the logic, and I think it ties in with being able to support lines with different heights and fonts (which I think we should support at some point). The user model should be that "rich output" is part of the buffer. Serialization is probably simpler if the DOM element is directly accessible from the buffer structure.

@PerBothner
Contributor Author

@jerch: "Idk if inline placing of HTML snippets is a good idea."

If by "inline" you mean in the CSS display: inline sense, I'm inclined to agree: it's quite a bit more complicated (both conceptually and in implementation), and I think less useful.

If you mean vertical interleaving of user input lines, fixed-width output lines, and "rich" HTML output lines, I disagree: I think that is very useful. It's the paradigm of REPL updated to allow rich text, graphics, images, math etc in the "print" part.

That said, having the option to individually show commands (or just their output) in a separate window, delete commands, "fold" command output, etc. is useful. That can be built on a "shell integration" protocol, so that the terminal understands commands as consisting of prompt, input, and output (with possible nesting).
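The command structure mentioned above can be sketched as grouping a stream of semantic marks into commands. The mark kinds here are hypothetical, though they mirror the A/B/C/D marks of the OSC 133 "semantic prompt" convention that several terminals already support; nesting is deliberately not handled. Once commands are known, actions such as folding or detaching a command's output reduce to operating on a row range.

```typescript
// Hypothetical semantic marks (prompt start, input start, output start, end).
type Mark = { kind: "prompt" | "input" | "output" | "end"; row: number };

interface Command {
  promptRow: number;
  inputRow?: number;
  outputRow?: number;
  endRow?: number;
}

// Groups a flat stream of marks into commands. Nesting is not handled.
function groupCommands(marks: readonly Mark[]): Command[] {
  const commands: Command[] = [];
  let current: Command | undefined;
  for (const m of marks) {
    switch (m.kind) {
      case "prompt":
        current = { promptRow: m.row };
        commands.push(current);
        break;
      case "input":
        if (current) current.inputRow = m.row;
        break;
      case "output":
        if (current) current.outputRow = m.row;
        break;
      case "end":
        if (current) current.endRow = m.row;
        current = undefined;
        break;
    }
  }
  return commands;
}

// Half-open row range [start, end) that a "fold output" action would hide,
// or null if the command has not finished producing output yet.
function outputRows(c: Command): [number, number] | null {
  if (c.outputRow === undefined || c.endRow === undefined) return null;
  return [c.outputRow, c.endRow];
}
```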

@jerch
Member

jerch commented Sep 30, 2024

Quoting @PerBothner: "If you mean vertical interleaving of user input lines, fixed-width output lines, and "rich" HTML output lines, I disagree: I think that is very useful. It's the paradigm of REPL updated to allow rich text, graphics, images, math etc in the "print" part."

Yes, by "inline" I mean within the normal text buffer progression. Please note that my argument was not about usefulness - I also think that's quite useful. My argument is about breaking the UI so badly that people will get confused about how to properly interact with it, especially for stuff that needs a proper width/height to show up correctly.

@PerBothner
Contributor Author

Another demo/screenshot - the man command outputting rich text instead of mono-space text:

Screenshot from 2024-09-30 14-36-46

The xt-man command is just this script:

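#!/bin/sh
# Wrap man's HTML output (groff via man -H, "browser" = cat) in the OSC 72 sequence used here for HTML.
# printf is used instead of "echo -en", which is not portable to POSIX sh.
printf '\033]72;'
man -Hcat "$1" 2>/dev/null
printf '\007'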

This assumes the patch in issue #5173 is applied. Without it, literal newlines in the man -Hcat output have to be converted to &#xA;. This is tricky because in HTML it is context-dependent whether whitespace is ignored.
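Doing that conversion without the patch has to respect context. The sketch below, an illustration rather than part of the proposal, replaces newlines in text and in quoted attribute values with `&#xA;` (which the HTML parser decodes back to U+000A), and newlines between attributes inside a tag with a space, since `&#xA;` there would corrupt the markup. It assumes reasonably well-formed HTML and does not handle comments or `<script>`/`<style>` contents, where character references are not decoded.

```typescript
// Makes an HTML fragment newline-free so it fits in one escape sequence.
function escapeNewlines(html: string): string {
  let out = "";
  let inTag = false; // between "<" and ">"
  let quote: string | null = null; // open quote character inside a tag, if any
  for (const ch of html) {
    if (ch === "\n") {
      // Inside a tag but outside an attribute value, plain whitespace is enough.
      out += inTag && quote === null ? " " : "&#xA;";
      continue;
    }
    if (inTag) {
      if (quote !== null) {
        if (ch === quote) quote = null; // attribute value ends
      } else if (ch === '"' || ch === "'") {
        quote = ch; // attribute value starts
      } else if (ch === ">") {
        inTag = false;
      }
    } else if (ch === "<") {
      inTag = true;
    }
    out += ch;
  }
  return out;
}
```

With such a filter, the xt-man script could pipe `man -Hcat` through it before emitting the closing BEL.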

@PerBothner
Contributor Author

@jerch: "My argument is about breaking the UI so badly, that ppl will get confused, how to properly interact with it."

The use-case I'm focusing on (for now) is a REPL (including a shell) with rich output. This uses the normal scrollable screen buffer, with full-width output. The cursor never moves backwards, except within the prompt+input area. I believe this would be straightforward and intuitive.

Mixing regular column-based output with rich text using the alternate screen buffer, where the cursor can jump between column-based and rich output - that is more complicated. Not necessarily for the user, but for the application programmer. We would have to define the cursor semantics of rich output (see below).

I think that if an application wants to combine rich text with interactive full-screen alternate-buffer use, the preferred way to do that would be to clear the screen and output HTML covering the entire screen (scrollable as needed). Instead of using row+column addressing to navigate to sections to update, one should use id attributes to specify a chunk to replace.

Of course, we still want to define how row+column addressing works when moving through rich-text HTML blocks, even if that's not the recommended way to do things. The simplest would be to define each HTML block as a single large character cell: an extra-tall line containing a single extra-wide character. This model is more or less what the prototype does.
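Under the single-large-cell model, row+column addressing stays simple. A hypothetical sketch (the `Line`/`CellRef` types and helper names are illustrative, not the prototype's code): every column of an HTML row resolves to the same cell, much as every column covered by a wide character resolves to that character, and cursor movement counts the block as exactly one row regardless of its rendered height.

```typescript
// Hypothetical line model: a text line has one cell per column; an HTML
// block is one extra-tall line holding a single cell as wide as the terminal.
type Line =
  | { kind: "text"; cells: string[] }
  | { kind: "html"; html: string };

type CellRef =
  | { kind: "char"; ch: string }
  | { kind: "html"; html: string };

// Resolve row+column addressing. Columns past the end of a text line read
// as blanks; every column of an HTML row lands on the same single cell.
function cellAt(lines: readonly Line[], row: number, col: number): CellRef | undefined {
  const line = lines[row];
  if (line === undefined) return undefined;
  if (line.kind === "html") return { kind: "html", html: line.html };
  return { kind: "char", ch: line.cells[col] ?? " " };
}

// Moving the cursor down advances one line per step, no matter how many
// pixels tall that line is rendered.
function cursorDown(lines: readonly Line[], row: number, n = 1): number {
  return Math.min(lines.length - 1, row + n);
}
```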

Projects
None yet
Development

No branches or pull requests

3 participants