
Implement tiling of canvas into smaller pieces #6419

Open
yurydelendik opened this issue Sep 4, 2015 · 20 comments

Comments

@yurydelendik
Contributor

The problem is that large canvases take a lot of memory. This is visible when a PDF page is large (e.g. a map) or zoomed in (e.g. at 800%+ zoom). Currently we limit the canvas size (#4834) on mobile devices. However, a proper solution would be to divide the page into smaller canvases and render only the visible parts.

It's mostly blocked by generating an operator list based on a crop area (useful for zooming into heavy maps), but we can proceed without it and try to render the same operator list on several canvases.
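To put rough numbers on the memory problem: an RGBA canvas costs 4 bytes per pixel, so its size grows with the square of the zoom level. The sketch below is an estimate only; it assumes 96 CSS pixels per 72 PDF points and ignores any canvas-size limit such as the one from #4834. `canvasBytes` is a hypothetical helper.

```javascript
// Rough size of one RGBA page canvas. Assumptions: 4 bytes per pixel,
// page size given in PDF points, 96 CSS pixels per 72 points, and the
// display's devicePixelRatio multiplying both dimensions.
function canvasBytes(widthPt, heightPt, zoom, devicePixelRatio = 1) {
  // Multiply before dividing by 72 so integer inputs stay exact.
  const width = Math.ceil((widthPt * 96 * zoom * devicePixelRatio) / 72);
  const height = Math.ceil((heightPt * 96 * zoom * devicePixelRatio) / 72);
  return { width, height, bytes: width * height * 4 };
}

// A US Letter page (612 x 792 pt) at 800% zoom on a 1x display.
const { width, height, bytes } = canvasBytes(612, 792, 8);
console.log(width, height, (bytes / 2 ** 20).toFixed(1) + " MiB"); // 6528 8448 210.4 MiB
```

With `devicePixelRatio = 2`, as on many HiDPI screens, the same page needs four times as much, about 840 MiB, for a single canvas.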

@yurydelendik yurydelendik changed the title Implement tiling canvas into smaller pieces Implement tiling of canvas into smaller pieces Sep 4, 2015
@yurydelendik
Contributor Author

(We might also consider tiling JPEG images when we read the operator list.)

@ManasJayanth
Contributor

@yurydelendik Can you help me get started on this?

@yurydelendik
Contributor Author

@prometheansacrifice The first step will be to modify the API (and canvas.js) at https://github.com/mozilla/pdf.js/blob/master/src/display/api.js#L834 to have, e.g., a targets property as an alternative to canvasContext. It will be an array of objects with canvas and location properties, where location defines the position of the canvas on the grid of canvases. (Note that we will now use the canvas, rather than its 2d context, to know its size; we might add a canvas property alongside canvasContext.) canvas.js will then need to render on multiple contexts at the same time.
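One possible shape for the proposed targets parameter, as a sketch only: `buildTargets` and the `location` fields `{ row, col, x, y }` are hypothetical names, not part of the pdf.js API. Following the note about using the canvas rather than its context, each entry's canvas carries the tile's size, while `location` only places it on the grid.

```javascript
// Hypothetical model of the proposed `targets` parameter: one
// { canvas, location } entry per cell of a grid covering the page.
// `createCanvas(width, height)` is injected so the sketch runs without a DOM;
// in a browser it could be
//   (w, h) => Object.assign(document.createElement("canvas"), { width: w, height: h })
function buildTargets(pageWidth, pageHeight, tileSize, createCanvas) {
  const rows = Math.ceil(pageHeight / tileSize);
  const cols = Math.ceil(pageWidth / tileSize);
  const targets = [];
  for (let row = 0; row < rows; row++) {
    for (let col = 0; col < cols; col++) {
      const x = col * tileSize;
      const y = row * tileSize;
      // Tiles on the right/bottom edges are trimmed to the page size.
      const width = Math.min(tileSize, pageWidth - x);
      const height = Math.min(tileSize, pageHeight - y);
      targets.push({
        canvas: createCanvas(width, height),
        location: { row, col, x, y },
      });
    }
  }
  return targets;
}

// Usage: a 6528 x 8448 px page split into 1024 px tiles.
const targets = buildTargets(6528, 8448, 1024, (w, h) => ({ width: w, height: h }));
console.log(targets.length); // 63
```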

@yurydelendik
Contributor Author

TODOs:

  • modify the API to render on multiple canvases and know their boundaries
  • modify the viewer to create multiple smaller canvases instead of huge ones, and determine which canvas tiles of the page are visible
  • provide a boundary to getOperatorList and discard objects/text from the list that are outside the boundary
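For the second TODO, the viewer needs to know which tiles intersect the visible area. A minimal sketch, assuming fixed-size tiles laid out from the page's top-left corner and a visible rectangle already expressed in the page's pixel space; `visibleTiles` is a hypothetical helper.

```javascript
// Returns the { row, col } cells of a tileSize grid that intersect the
// visible rectangle. `visible` is assumed to be in the same pixel space
// as the page's tile grid (e.g. scroll offsets relative to the page).
function visibleTiles(visible, pageWidth, pageHeight, tileSize) {
  // Clamp the visible rectangle to the page bounds.
  const left = Math.max(0, visible.x);
  const top = Math.max(0, visible.y);
  const right = Math.min(pageWidth, visible.x + visible.width);
  const bottom = Math.min(pageHeight, visible.y + visible.height);
  if (right <= left || bottom <= top) {
    return []; // The page is entirely off-screen.
  }
  // Tiles cover half-open ranges [k * tileSize, (k + 1) * tileSize).
  const firstCol = Math.floor(left / tileSize);
  const lastCol = Math.ceil(right / tileSize) - 1;
  const firstRow = Math.floor(top / tileSize);
  const lastRow = Math.ceil(bottom / tileSize) - 1;
  const tiles = [];
  for (let row = firstRow; row <= lastRow; row++) {
    for (let col = firstCol; col <= lastCol; col++) {
      tiles.push({ row, col });
    }
  }
  return tiles;
}

// Usage: a 1920 x 1080 window scrolled to (1000, 2000) on a 6528 x 8448 page.
const tiles = visibleTiles({ x: 1000, y: 2000, width: 1920, height: 1080 }, 6528, 8448, 1024);
console.log(tiles.length); // 9 (rows 1-3, cols 0-2)
```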

@Liestambeur

Is there any progress on this issue, or is it no longer planned?

@yurydelendik
Contributor Author

@Liestambeur it's still planned but not scheduled; other projects are in progress at the moment. Contributors who wish to help advance it can find us on the IRC channel.

@d01010101

I would just like to add that, with today's wide desktop displays, I find interactive page-less PDFs a great way to present, e.g., LaTeX documents on the web at high quality and 1:1 scale, where plain zooming works well enough and reflowing the text would only break typographic rules or the careful placement of floats. Of course, this won't replace richer interactivity, animations, or text reflow when they are actually needed, but it may have its niche.

Yet, when I generated a PDF 1x A4 wide and 25x A4 tall and rendered it with PDFSinglePageViewer, I probably hit one of the limits discussed here, even though the rendering was only about 100 dpi. The canvas was about 800 x 20000 pixels and produced a blocky, unusable rendering.

I would say that some kind of tiling might extend pdf.js applications beyond those of a viewer for printable PDFs. If anyone is interested, I've attached a test PDF with an aspect ratio similar to the one discussed.

@bobsingor

I would like to offer a bounty of 2000 USD for this feature.

This feature is becoming more important to our company. I see that 16 issues related to this one have been raised. Perhaps it deserves more attention?

I hope this bounty will help get this resolved for everyone interested.

@jjaychen1e

jjaychen1e commented Jun 8, 2023

Are there any plans for this issue? This limitation makes pdf.js unusable in many scenarios, for example when zooming into a PDF file or opening a PDF exported by Safari. It also affects many applications that depend on pdf.js, such as Logseq. I think this issue should get a higher priority :)

@alexcat3
Contributor

I would really appreciate a solution to this as I enjoy looking at track maps of railway and subway systems. These are generally large images intended to be viewed at high zoom levels. For example, see the track map of the NY Subway below, which is
https://www.vanshnookenraggen.com/_index/docs/NYC_full_trackmap.pdf

@AkiSakurai

> I would really appreciate a solution to this as I enjoy looking at track maps of railway and subway systems. These are generally large images intended to be viewed at high zoom levels. For example, see the track map of the NY Subway below, which is vanshnookenraggen.com/_index/docs/NYC_full_trackmap.pdf

A naive solution would be to pass a transform to the render function so that each part of the page is rendered to a smaller canvas. However, as you can imagine, the performance is quite bad. The render time increases linearly with the number of tiles.

Demo

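```js
// Render one tile: shift the page so the tile's top-left corner lands at
// the canvas origin, then render the whole page into the tile canvas.
const tile = popHighestPriorityTile(); // helper from the demo (not shown)
const canvasContext = tile.canvas.getContext("2d", { alpha: false });
const translate = [1, 0, 0, 1, -tile.x, -tile.y];
// Compose with any existing render transform (e.g. HiDPI output scaling).
const transform = renderContext.transform
  ? Util.transform(translate, renderContext.transform)
  : translate;
const renderTask = pdfPage.render({
  ...renderContext,
  canvasContext,
  transform,
});
await renderTask.promise;
```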
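To make the demo's transform composition concrete, here is a self-contained sketch of the matrix math. `multiply` is written to mirror the affine product that `Util.transform(m1, m2)` computes in pdf.js (apply `m2`, then `m1`); the 2x output scale and the tile position are made-up values. Only the translation differs between tiles, while the whole operator list is still replayed for each one, which is why the render time grows with the tile count.

```javascript
// Affine matrices are [a, b, c, d, e, f], as in canvas setTransform().
// The product applies m2 first, then m1.
function multiply(m1, m2) {
  return [
    m1[0] * m2[0] + m1[2] * m2[1],
    m1[1] * m2[0] + m1[3] * m2[1],
    m1[0] * m2[2] + m1[2] * m2[3],
    m1[1] * m2[2] + m1[3] * m2[3],
    m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
    m1[1] * m2[4] + m1[3] * m2[5] + m1[5],
  ];
}

function applyToPoint([x, y], m) {
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}

// A tile whose top-left corner sits at (1024, 2048) in scaled page pixels.
const tile = { x: 1024, y: 2048 };
const translate = [1, 0, 0, 1, -tile.x, -tile.y];
// Assumed existing render transform: a 2x HiDPI output scale.
const outputScale = [2, 0, 0, 2, 0, 0];
const transform = multiply(translate, outputScale);

// The point that scales to (1024, 2048) lands on the tile canvas origin.
console.log(applyToPoint([512, 1024], transform)); // [0, 0]
```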

@marco-c
Contributor

marco-c commented Dec 3, 2023

Even if performance is not ideal, it seems still better than the current status. WDYT @calixteman @Snuffleupagus @timvandermeij?

@Snuffleupagus
Collaborator

Snuffleupagus commented Dec 3, 2023

> Even if performance is not ideal, it seems still better than the current status.

Not really, since, as already mentioned in #6419 (comment), this will affect performance quite badly in many cases: "The render time increases linearly with the number of tiles."

It might not look so bad in the demo above, but that's probably because that particular PDF document isn't all that "complex". Please consider the case where a page (currently) takes 2 seconds to render: If that's split into 10 sub-canvases, that same page now takes 20 seconds to finish rendering!
The explanation is that while each individual canvas indeed becomes smaller, that doesn't really help performance-wise since the OperatorList is still the same and will be parsed (and rendered) in its entirety for each sub-canvas.

In order for this to work, we'd need a way for the src/display/canvas.js code to skip rendering instructions that are outside of the current sub-canvas, while still handling general graphics-state changes correctly. (This could perhaps be done somewhat similarly to how disabled OptionalContent is skipped in src/display/canvas.js.)
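A toy model of the skipping described here, under stated assumptions: the op objects below, with a `kind` flag and a precomputed device-space `bbox` on drawing ops, are hypothetical. pdf.js keeps its operator list as parallel `fnArray`/`argsArray` arrays, and nothing here reflects its actual code.

```javascript
// Hypothetical op shape: { kind: "state" | "draw", name, bbox? } where bbox
// is { minX, minY, maxX, maxY } in device pixels.
function intersects(a, b) {
  return a.minX < b.maxX && b.minX < a.maxX && a.minY < b.maxY && b.minY < a.maxY;
}

// Replays the operator list for one tile: state changes (colours,
// transforms, save/restore, ...) always run so later ops see the correct
// graphics state; drawing ops are skipped when they can't touch the tile.
function renderTile(ops, tileRect, execute) {
  let skipped = 0;
  for (const op of ops) {
    if (op.kind === "draw" && !intersects(op.bbox, tileRect)) {
      skipped++;
      continue;
    }
    execute(op);
  }
  return skipped;
}

// Usage: record which ops would run for the top-left 256x256 tile.
const ops = [
  { kind: "state", name: "setFillRGBColor" },
  { kind: "draw", name: "fill", bbox: { minX: 0, minY: 0, maxX: 10, maxY: 10 } },
  { kind: "draw", name: "fill", bbox: { minX: 500, minY: 500, maxX: 600, maxY: 600 } },
];
const ran = [];
const skipped = renderTile(ops, { minX: 0, minY: 0, maxX: 256, maxY: 256 }, op => ran.push(op.name));
console.log(ran, skipped); // ["setFillRGBColor", "fill"] 1
```

The part this glosses over is the bounding box itself: it has to be conservative (stroke width, clipping, glyph extents) and computed under the current transform, which is what the later "groups of ops" commits in this thread set out to record.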

@marco-c
Contributor

marco-c commented Dec 3, 2023

Other than potentially wasted CPU time, what is the downside if we use the current CSS zoom solution and replace it when the rendering is done? Isn't it still a net improvement?

@marco-c
Contributor

marco-c commented Dec 4, 2023

Here's the PDF for future reference: NYC_full_trackmap.pdf.

@github-project-automation github-project-automation bot moved this to To triage in PDF.js quality Mar 26, 2024
@marco-c marco-c moved this from To triage to High priority in PDF.js quality Mar 26, 2024
nicolo-ribaudo added a commit to nicolo-ribaudo/pdf.js that referenced this issue Nov 14, 2024
This commit is a first step towards mozilla#6419, and it can also help with
mozilla#13287. To support rendering _part_ of a page, we will need to
first compute which ops can affect what is visible in that part of
the page.

This commit adds logic to track "groups of ops" with their respective
bounding boxes. Each group either corresponds to a single op or
to a range, and it can have dependencies earlier in the ops list that
are not contiguous with the range.

Consider the following example:
```
0. setFillRGBColor
1. beginText
2. showText "Hello"
3. endText
4. constructPath [...]
5. eoFill
```
Here we have two groups: the text (range 1-3) and the path (range 4-5).
Each of them has a corresponding bounding box, and a dependency
on the op at index 0.

This tracking happens when first rendering a PDF: we wrap the canvas
with a "canvas recorder" that has the same API, but with additional
methods to mark the start/end of a group.
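The commit message describes groups with bounding boxes and out-of-range dependencies, but not how a tile would consume them. A minimal sketch, assuming a hypothetical group shape `{ start, end, bbox, dependencies }` (not taken from the branch's code) and made-up coordinates for the example above: keep every group whose box intersects the tile, add its dependencies, and replay the surviving indices in their original order.

```javascript
// Returns the sorted op indices to replay for one tile.
function opsForTile(groups, tileRect) {
  const keep = new Set();
  for (const g of groups) {
    const b = g.bbox;
    const hit =
      b.minX < tileRect.maxX && tileRect.minX < b.maxX &&
      b.minY < tileRect.maxY && tileRect.minY < b.maxY;
    if (!hit) continue;
    for (let i = g.start; i <= g.end; i++) keep.add(i); // the group's own range
    for (const d of g.dependencies) keep.add(d); // earlier, non-contiguous ops
  }
  // Original order matters: dependencies precede the ops that need them.
  return [...keep].sort((a, b) => a - b);
}

// The example above: text (ops 1-3) and path (ops 4-5), both depending on
// op 0 (setFillRGBColor). Coordinates are made up for illustration.
const groups = [
  { start: 1, end: 3, bbox: { minX: 10, minY: 10, maxX: 90, maxY: 30 }, dependencies: [0] },
  { start: 4, end: 5, bbox: { minX: 400, minY: 400, maxX: 600, maxY: 500 }, dependencies: [0] },
];
console.log(opsForTile(groups, { minX: 0, minY: 0, maxX: 256, maxY: 256 })); // [0, 1, 2, 3]
console.log(opsForTile(groups, { minX: 256, minY: 256, maxX: 512, maxY: 512 })); // [0, 4, 5]
```

This only follows direct dependencies; if a dependency could itself depend on earlier ops, the set would need to be closed transitively.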
@AkiSakurai

AkiSakurai commented Nov 30, 2024

One thing to note is that drawing outside the canvas takes significantly less time than drawing inside it. This measurement is based on drawing 1,000,000 Bézier curves. Therefore, re-issuing the drawing commands for every tile might not be as bad as it seems.

Here are some benchmark results:

| Browser | Time inside canvas (ms) | Time outside canvas (ms) |
|---|---|---|
| Chrome | 3800 | 203 |
| Safari | 53033 | 887 |
| Firefox | 17266 | 778 |
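A back-of-the-envelope model of what these numbers could imply for naive tiling, under a loud assumption: each command lands inside exactly one of N tiles and is clipped away (drawn "outside") on the other N - 1. It covers canvas drawing cost only; it ignores the JavaScript overhead of walking the operator list once per tile, which nicolo-ribaudo notes below dominates partial renders, and a Bézier benchmark is not a real page.

```javascript
// Assumed cost model: full "inside" cost once, plus "outside" cost on
// every other tile the operator list is replayed on.
function naiveTilingCost(insideMs, outsideMs, tiles) {
  return insideMs + (tiles - 1) * outsideMs;
}

// Numbers from the table above (1,000,000 Bézier curves).
const results = {
  Chrome: [3800, 203],
  Safari: [53033, 887],
  Firefox: [17266, 778],
};
for (const [browser, [inside, outside]] of Object.entries(results)) {
  const cost = naiveTilingCost(inside, outside, 10);
  console.log(browser, cost, (cost / inside).toFixed(2) + "x");
}
```

For 10 tiles this gives 5627 ms (1.48x) for Chrome, 61016 ms (1.15x) for Safari and 24268 ms (1.41x) for Firefox, rather than the 10x that re-drawing everything at full cost would imply.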

@nicolo-ribaudo
Contributor

nicolo-ribaudo commented Dec 1, 2024

I noticed the same in #19128, where rendering a tile is much faster than rendering the whole page. For a partial render, the time spent in JavaScript significantly outweighs the time spent drawing.

nicolo-ribaudo added a commit to nicolo-ribaudo/pdf.js that referenced this issue Dec 16, 2024
@marco-c
Contributor

marco-c commented Feb 7, 2025

This will likely be required to fully fix https://bugzilla.mozilla.org/show_bug.cgi?id=1936605.

Projects
Status: High priority
Development

No branches or pull requests