Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[feat] Implementation of re-wrapping logic #64

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

cdesaintguilhem
Copy link
Contributor

@cdesaintguilhem cdesaintguilhem commented Dec 12, 2024

This draft PR proposes an implementation to re-wrap lines to the appropriate wrapping length. This is a feature first discussed in #62.

Summary

In essence, for each line that that doesn't require wrapping (i.e., is shorter than the maximum wrapping length) the formatter takes a look at the next line and checks how many words it can re-wrap onto the current line. If some, or all, words can be re-wrapped, it modifies the current line and the next one and returns both to the formatting queue. When performing checks, certain conditions de-activate re-wrapping to preserve formatting (e.g., splitting commands are not re-wrapped onto previous lines).

Discussion required

This draft PR contains a new test file, rewrap.tex which contains initial re-wrapping tests for expected behaviour. For now, the re-wrapping heavily affects the previous test files that were not written with re-wrapping in mind; hence, I would like to discuss here the expected behaviour of the re-wrapping logic to converge on how to

  1. modify existing test files to capture any changes in expected behaviour, and
  2. expand the re-wrapping test file for any other behaviour that is not captured in the other files.

This is to enable pushing lines to the front of the queue while preserving other pending lines.
This will enable the re-wrapping code to access the next line of text to check if there are available words for re-wrapping.
This commit maintains functionality by returning only the last wrap point from the `find_wrap_point()` function.
To maintain behaviour, `apply_wrap()` is changed to extract the _last_ wrap point from the list, and wrap there.
This is still not enough for a MVP because it will attempt to re-wrap lines of text onto previous lines that didn't initially have text.
This branch is going to change the wrapping strategy so large test files will be impractical until they can be amended.
@WGUNDERWOOD
Copy link
Owner

Thanks for your work on this. Does it affect math mode? I very often choose to have shorter lines in math mode (starting each = on a new line, keeping long operators such as \sum_{i=1}^n on their own lines, etc). Perhaps you can run this on some of the test files and comment on the differences?

@cdesaintguilhem
Copy link
Contributor Author

Thanks for your work on this. Does it affect math mode? I very often choose to have shorter lines in math mode (starting each = on a new line, keeping long operators such as \sum_{i=1}^n on their own lines, etc). Perhaps you can run this on some of the test files and comment on the differences?

Indeed currently it does affect math mode, and I agree with you that this is undesirable.

I think that a good way to structure the discussion would be to take the old test files one-by-one, look at the diff, and converge to what should the re-wrapping not touch, and what parts of the test should change. I can foresee this leading to several interleaved discussions about the behaviour for different parts (e.g., math mode, preamble, etc.) which it could be useful to separate into their own PRs or issues.

Would you find it useful to merge this PR into a rewrap branch in your repository, and I could then make individual PRs onto it for fixing different parts? Or would you prefer to keep the work constrained to this single PR and maybe open issues to discuss the behaviour for different parts?

@WGUNDERWOOD
Copy link
Owner

I think we can keep this as a PR for now. I'm hoping to get quite a large PR merged soon regarding web assembly support, which will end up moving a lot of files around. So maybe we branch of that once it's merged.

@cdesaintguilhem cdesaintguilhem force-pushed the cdsg-rewrap branch 2 times, most recently from 79c03a4 to f2df0b4 Compare December 12, 2024 15:03
@cdesaintguilhem
Copy link
Contributor Author

Looking at the short_document.tex file, the re-wrapping logic produces the following diffs when compared to the target test file.

Preamble

3,4c3
< \usepackage{amsmath}
< \usepackage{amsthm}
---
> \usepackage{amsmath} \usepackage{amsthm}
8,9c7
< \title{Testing \texttt{tex-fmt}}
< \author{William G.\ Underwood}
---
> \title{Testing \texttt{tex-fmt}} \author{William G.\ Underwood}

Re-wrapping currently affects the preamble. It probably shouldn't. Options I can imagine are:

  1. Add common preamble commands to the list of splitting commands.
    I don't like this option because the logic will still attempt to re-wrap other lines of the preamble that don't contain these common commands, and it will be hard to be exhaustive with the list of commands to ignore.
  2. Detect when the formatter is not in the preamble, and only activate re-wrapping after that point.
    This is tricky in the case of multi-file projects where files like commands.tex and section_X.tex might not contain \documentclass{} or \begin{document} and yet should be treated differently by the re-wrapping logic.

Math mode

14,20c12
<   E = m c^2 \\
<   1 + 2
<   + (3 + 4)
<   + (5 + 6
<   + 7 + 8)
<   + (9 + 10
<     + 11 + 12
---
>   E = m c^2 \\ 1 + 2 + (3 + 4) + (5 + 6 + 7 + 8) + (9 + 10 + 11 + 12

Re-wrapping currently affects math mode. It shouldn't. It must be able to not re-wrap multi-line math environments. This might require enforcing a style where \[ and \] should be on their own lines, or at least at the beginning or end of a line.

The logic could be: "If the math mode delimiters start or end a line, then everything in between is not re-wrapped, but if the math mode delimiters are inside other lines, their content can be re-wrapped".

Indentation tests

26,27c18
<   \item Item two on
<     multiple lines
---
>   \item Item two on multiple lines
51,56c42,44
< This line contains \emph{emphasized} text.
< \emph{This line contains only emphasized text,
< and is broken over two lines}.
< \emph{This line contains only
<   emphasized text,
< and is broken over three lines}.
---
> This line contains \emph{emphasized} text. \emph{This line contains
> only emphasized text, and is broken over two lines}. \emph{This line
> contains only emphasized text, and is broken over three lines}.

These tests were written to check correct indentation behaviour. The options I see here are:

  1. Amend the text so that the lines are longer (and therefore don't re-wrap) and preserve the test.
  2. Make a new flag for ignoring re-wrapping specifically, which can be used to isolate re-wrapping from tests. (Something like % tex-fmt: no-rewrap start and % tex-fmt: no-rewrap end.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants