This repository has been archived by the owner on Jan 20, 2021. It is now read-only.

Detected spreadsheet area too small #99

Open
jazzido opened this issue May 30, 2015 · 6 comments

Comments

@jazzido
Contributor

jazzido commented May 30, 2015

Note the last ruling line at the right of the table:

[Screenshot 2015-05-30 at 6:17:16 PM]

Since that line is not included in the detected table area, the last column is not included in the extraction:

[Screenshot 2015-05-30 at 6:17:52 PM]

PDF is here: https://www.dropbox.com/s/wdneizaxgrsydqe/helen_boaden_central_bookings_q3_2011_12.pdf?dl=0

@jeremybmerrill
Member

Is this in newUI-ruby?

@jeremybmerrill
Member

The thing that treats the selection boundaries as rulings should deal with this, no?

@jazzido
Contributor Author

jazzido commented May 30, 2015

The too-small area comes directly from tabula-extractor, so it will also happen in tabulapdf/tabula@master.

There's no user selection involved in this issue, so the selection-boundaries-as-rulings feature won't solve it.

@jazzido
Contributor Author

jazzido commented May 30, 2015

A possible solution would be to expand the detected area by a small factor: big enough that it covers the entire area of interest, but small enough that it doesn't overlap with other elements on the page.
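A minimal sketch of that idea, written in Python rather than as tabula-extractor code. The `Rect` type, the `expand_area` function, its `factor` and `steps` parameters, and the retry-with-half-the-margin strategy are all hypothetical. It assumes Tabula-style page coordinates (points, y growing downward) and that `obstacles` holds other page elements outside the table (e.g. surrounding text), not the table's own ruling lines.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Rect:
    """Hypothetical axis-aligned rectangle in PDF points; y grows downward."""
    top: float
    left: float
    bottom: float
    right: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top

    def intersects(self, other: "Rect") -> bool:
        # Strict inequalities: rectangles that merely touch do not intersect.
        return (self.left < other.right and other.left < self.right
                and self.top < other.bottom and other.top < self.bottom)

    def grow(self, dx: float, dy: float) -> "Rect":
        return Rect(self.top - dy, self.left - dx, self.bottom + dy, self.right + dx)


def expand_area(area: Rect, obstacles: Iterable[Rect], page: Rect,
                factor: float = 0.02, steps: int = 4) -> Rect:
    """Hypothetical helper: grow `area` by up to `factor` of its size
    without running into any page element that lies outside it."""
    # Elements already inside the detected area are not obstacles.
    outside = [o for o in obstacles if not o.intersects(area)]
    for i in range(steps):
        f = factor / (2 ** i)  # halve the margin on each retry
        c = area.grow(area.width * f, area.height * f)
        # Never grow past the page boundaries.
        c = Rect(max(c.top, page.top), max(c.left, page.left),
                 min(c.bottom, page.bottom), min(c.right, page.right))
        if not any(c.intersects(o) for o in outside):
            return c
    return area  # every margin collided: keep the detected area as-is
```

Halving the margin on each retry keeps the expansion as close to `factor` as the surrounding elements allow; if even the smallest margin would swallow a neighbour, the detected area is returned unchanged.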

@jeremybmerrill
Member

@jazzido I don't see why that wouldn't solve it. Even if the selection areas come from tabula-extractor, we can still buffer them a bit to include lines.

A better solution might be outputting detected tables using the OUTER coordinates of the lines that comprise them. Do you think the problem could be that easy to solve?
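A sketch of the OUTER-coordinates idea, again in Python and not taken from tabula-extractor: the `Ruling` type and the `outer_bounds` function are hypothetical. It assumes each ruling is a segment with a known stroke width and butt line caps (the PDF default), so a stroke widens a horizontal or vertical line only perpendicular to its direction.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Ruling:
    """Hypothetical ruling segment in PDF points (y grows downward)."""
    x1: float
    y1: float
    x2: float
    y2: float
    stroke: float = 1.0  # painted line width


def outer_bounds(rulings: Iterable[Ruling]) -> Tuple[float, float, float, float]:
    """Return (top, left, bottom, right) covering the painted extent of all rulings."""
    tops, lefts, bottoms, rights = [], [], [], []
    for r in rulings:
        half = r.stroke / 2
        horizontal = r.y1 == r.y2 and r.x1 != r.x2
        vertical = r.x1 == r.x2 and r.y1 != r.y2
        # Butt caps: no padding along the line's own direction.
        # Diagonal or zero-length segments are padded on both axes (conservative).
        pad_x = 0.0 if horizontal else half
        pad_y = 0.0 if vertical else half
        tops.append(min(r.y1, r.y2) - pad_y)
        bottoms.append(max(r.y1, r.y2) + pad_y)
        lefts.append(min(r.x1, r.x2) - pad_x)
        rights.append(max(r.x1, r.x2) + pad_x)
    if not tops:
        raise ValueError("no rulings given")
    return min(tops), min(lefts), max(bottoms), max(rights)


# Usage: a box whose right-hand ruling is drawn 2 pt wide.
box = [
    Ruling(50, 100, 300, 100),       # top
    Ruling(50, 400, 300, 400),       # bottom
    Ruling(50, 100, 50, 400),        # left
    Ruling(300, 100, 300, 400, 2.0), # right
]
print(outer_bounds(box))  # (99.5, 49.5, 400.5, 301.0)
```

Using the painted extent instead of the line centerlines means a crop to these bounds contains the whole of every boundary ruling, including the rightmost one.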

@CJLees01

Is this ticket still live? I still see this in version 1.2.1 of tabula.
