Attach document image to tweets #6
A possible roadmap:
|
I'll give this one a try. |
Hi @paulocezar, to convert the PDF to an image I suggest you take a look at these notebooks from okfn-brasil/serenata-de-amor#238. This way, you can speed up your work by focusing on cropping the image and uploading it. The necessary libraries (OpenCV, Wand...) are listed at the end of the Dockerfile. Try looking into segmentation techniques to crop the receipt :) |
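A minimal sketch of the PDF-to-image step with Wand (one of the libraries listed in the Dockerfile), assuming ImageMagick and Ghostscript are installed, since ImageMagick relies on Ghostscript to read PDFs. The function names, the file naming scheme and the 150 DPI default are illustrative assumptions, not taken from the notebooks in #238.

```python
from pathlib import Path


def page_image_path(pdf_path, page_index, out_dir="."):
    """Build the output filename for one rendered page (hypothetical naming scheme)."""
    stem = Path(pdf_path).stem
    return str(Path(out_dir) / f"{stem}-page{page_index}.png")


def pdf_to_png_pages(pdf_path, out_dir=".", resolution=150):
    """Rasterize every page of a PDF into a PNG file and return the file paths.

    Wand is imported lazily so this module can be imported without ImageMagick.
    """
    from wand.image import Image  # needs ImageMagick (+ Ghostscript for PDFs)

    paths = []
    # The resolution (DPI) must be given when opening, so pages render at that DPI.
    with Image(filename=pdf_path, resolution=resolution) as pdf:
        for index, page in enumerate(pdf.sequence):
            # Copy each page into its own image so it can be saved separately.
            with Image(image=page) as page_image:
                page_image.format = "png"
                path = page_image_path(pdf_path, index, out_dir)
                page_image.save(filename=path)
                paths.append(path)
    return paths
```

A higher resolution gives the cropping step more detail to work with, at the cost of bigger files to upload.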
Hi, |
Thanks for sharing it with us, @murilobsd ;) In addition, I'd suggest that whoever implements this takes a look at these libraries for uploading the images :) https://pypi.python.org/pypi/python-tumblpy/1.1.4 |
Why not upload the image to Twitter? That way Twitter remains our only external service dependency. https://developer.twitter.com/en/docs/media/upload-media/api-reference/post-media-upload |
I think this is better/easier/simpler than using a third-party service for hosting images (unless @silviodc has other uses for the storage in mind). What is needed IMHO is an implementation such as:
|
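For reference, the two-step flow behind the media upload endpoint linked above looks roughly like this: POST the image to media/upload, read media_id_string from the response, then post the status with that ID in media_ids. This is a sketch assuming a requests-style session that already signs every request with OAuth 1.0a (for example requests_oauthlib.OAuth1Session); the function name is hypothetical.

```python
MEDIA_UPLOAD_URL = "https://upload.twitter.com/1.1/media/upload.json"
STATUS_UPDATE_URL = "https://api.twitter.com/1.1/statuses/update.json"


def tweet_with_image(session, status, image_bytes):
    """Upload an image, then tweet `status` with the image attached.

    `session` is assumed to behave like a requests.Session that signs
    requests with OAuth 1.0a (e.g. requests_oauthlib.OAuth1Session).
    """
    # Step 1: upload the raw image bytes as the multipart "media" field.
    upload = session.post(MEDIA_UPLOAD_URL, files={"media": image_bytes})
    upload.raise_for_status()
    media_id = upload.json()["media_id_string"]

    # Step 2: create the tweet, referencing the uploaded media by its ID.
    tweet = session.post(
        STATUS_UPDATE_URL, data={"status": status, "media_ids": media_id}
    )
    tweet.raise_for_status()
    return tweet.json()
```

Passing the session in, instead of creating it inside the function, keeps the credentials out of this code and makes it easy to test without hitting Twitter.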
Hi! Does anyone need some help with this issue? |
AFAIK there's no one working on that, @rodolfolottin – make yourself at home : ) |
OK @cuducos, I'll give it a try. |
So, I have some questions about how to test this functionality. The first one is: how can I get some data, given that the tests use mocks? I know I can get the PDF directly from the Câmara's website, but I also want to see the data from each reimbursement tuple. Another one is: what should my tests test? I get that I should test the content of the tweet that is going to be posted, but what about the fetched PDF? And the blank area that I have to crop: how can/should I test that? Should I use an example PDF as a fixture? Many thanks! |
Hi @rodolfolottin, let me recap the roadmap drafted above:
Given these steps, this is my 2c: in steps 1 and 5 we're responsible for generating the right calls to external services, but not for managing the calls themselves. What I mean is that:
That said, I'd say that in step 1 we can mock the download method and:
Then we must mock the Twitter API call and assert that we're calling it with the image as an attachment. Does that make sense to you all? |
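A minimal sketch of that testing strategy with unittest.mock. It assumes a hypothetical post_reimbursement function that receives its collaborators (PDF download, PDF-to-image conversion, Twitter API) as arguments so they can be replaced by mocks; the function, the document_id and url keys, and the status text are illustrative, not the project's actual code.

```python
from unittest import TestCase
from unittest.mock import Mock


def post_reimbursement(reimbursement, download_pdf, pdf_to_image, api):
    """Hypothetical flow: fetch the receipt, turn it into an image, tweet it."""
    pdf = download_pdf(reimbursement["document_id"])
    image = pdf_to_image(pdf)
    status = f"Suspicious reimbursement: {reimbursement['url']}"
    return api.PostUpdate(status, media=image)


class PostReimbursementTest(TestCase):
    def setUp(self):
        self.reimbursement = {"document_id": 42, "url": "https://example.com/42"}
        self.download_pdf = Mock(return_value=b"%PDF-fake")  # no network access
        self.pdf_to_image = Mock(return_value=b"fake-png")   # no image processing
        self.api = Mock()                                     # no real tweets

    def test_downloads_and_converts_the_right_document(self):
        post_reimbursement(self.reimbursement, self.download_pdf,
                           self.pdf_to_image, self.api)
        self.download_pdf.assert_called_once_with(42)
        self.pdf_to_image.assert_called_once_with(b"%PDF-fake")

    def test_tweets_with_the_image_attached(self):
        post_reimbursement(self.reimbursement, self.download_pdf,
                           self.pdf_to_image, self.api)
        self.api.PostUpdate.assert_called_once_with(
            "Suspicious reimbursement: https://example.com/42", media=b"fake-png"
        )
```

The cropping itself can then be tested separately, for example against a small sample image kept as a fixture. These tests run with `python -m unittest`.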
Sorry for the late answer. Yeah, @cuducos, thanks again! Now I'm working on cropping the image using Wand, but it isn't easy. My first approach was to crop the image based on its background color. As the image is itself a scan, the whole background has the same white color. I'm looking at related problems, but the ones I've found generally have two different colors, which makes it easier to tell the document apart from the background. Edit: just thinking here, but maybe (IDK) I could scan the rows and columns of the image and crop based on the presence of a color other than white. |
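The row/column idea from the edit above can be sketched in plain Python: treat the image as a grid of grayscale values (0 = black, 255 = white), find the first and last rows and columns that contain a pixel darker than a threshold, and crop to that bounding box. The threshold of 200 and the list-of-lists grid are assumptions for illustration; with a real scan the pixel grid would come from Wand, Pillow or NumPy.

```python
def content_bbox(pixels, threshold=200):
    """Return (top, left, bottom, right) of pixels darker than `threshold`.

    `pixels` is a list of rows of grayscale values (0 = black, 255 = white).
    Bounds are inclusive. Returns None if every pixel counts as "white".
    """
    rows = [y for y, row in enumerate(pixels) if any(v < threshold for v in row)]
    if not rows:
        return None
    cols = [
        x for x in range(len(pixels[0]))
        if any(row[x] < threshold for row in pixels)
    ]
    return rows[0], cols[0], rows[-1], cols[-1]


def crop(pixels, bbox, margin=0):
    """Crop the grid to `bbox`, keeping `margin` extra pixels on each side."""
    top, left, bottom, right = bbox
    top, left = max(top - margin, 0), max(left - margin, 0)
    bottom = min(bottom + margin, len(pixels) - 1)
    right = min(right + margin, len(pixels[0]) - 1)
    return [row[left:right + 1] for row in pixels[top:bottom + 1]]
```

On real scans a single dust speck far from the receipt would stretch the box, so requiring a minimum number of dark pixels per row or column is a natural next refinement.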
Hi @rodolfolottin, I see two non-exclusive possibilities here:
|
Hi guys, one question about cropping the images: doesn't the function mentioned by @murilobsd (trim) work? Its signature is trim(color=None, fuzz=0). PS: In your case you would call it without a color and pick a defaultfuzz value empirically. |
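For reference, a sketch of the trim call discussed here, assuming Wand's Image.trim(color=None, fuzz=0) as quoted above, with fuzz expressed in quantum units (from 0 up to the image's quantum_range, a property available in recent Wand versions) rather than as a percentage. The percent_to_fuzz helper and trim_scan are hypothetical names, not part of Wand.

```python
def percent_to_fuzz(percent, quantum_range):
    """Convert a fuzz percentage (0-100) into ImageMagick quantum units."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return quantum_range * percent / 100.0


def trim_scan(path, out_path, fuzz_percent=10, color=None):
    """Trim uniform borders from a scanned page and save the result.

    With color=None, Wand infers the border color from the image itself
    (assumption: the top-left pixel); pass e.g. Color("white") to force it.
    """
    from wand.image import Image  # lazy import: needs ImageMagick installed

    with Image(filename=path) as img:
        fuzz = percent_to_fuzz(fuzz_percent, img.quantum_range)
        # Pixels within `fuzz` of the border color count as background.
        img.trim(color=color, fuzz=fuzz)
        img.save(filename=out_path)
```

Trim only removes outer rows and columns that match the border color within the tolerance, so a large fuzz such as 50% also treats light-gray print as background, which could explain why parts of the invoice get cut away.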
Hi @silviodc and @cuducos, thanks for your help. @cuducos, as cropping the image was the hard part for me, I decided to go for it and run some tests. Because of that, I don't have the other part done yet, but I can work on finishing it. @silviodc, in my tests with this function I was using the white color and I didn't get what I expected. For this image, the more I increase the defaultfuzz value, the more of the image (the invoice) gets cropped away. I got the same results with and without the white color. And since the invoice is sometimes not centered in the scanned image, I didn't feel confident going on with this approach. As an example, I am attaching an image cropped with a defaultfuzz value of 50%. I'm taking @cuducos's advice of doing baby steps, and I will worry about the cropping function later. |
Hi @rodolfolottin |
Hey, today I asked a friend who works with image processing for help, and he suggested using OpenCV for this. The response: https://twitter.com/begnini/status/960547129264615425 Both are in Portuguese.
Hi, my friend @CauanCabral pointed me to this issue and I played a little with the documents. I work with digitized documents and I know some are hard to manipulate, so what I made is good, but not pixel perfect. As a proof of concept, I downloaded 100 PDFs from the jarbas.serenata.ai home page and extracted all images from them with pdfimages. After that, I processed these images. You can see the result here: https://github.com/begnini/document_crop/blob/master/crop.md. The code is in the same repository (https://github.com/begnini/document_crop/blob/master/crop.py). I'll improve the documentation later, but if you have any questions, feel free to ask me.
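The extraction step mentioned above can be scripted with poppler's pdfimages command-line tool, which pulls the embedded images out of the PDF instead of re-rendering the page (for a scanned receipt, that is usually the original scan). A sketch, assuming poppler-utils is installed; -png asks pdfimages to write PNG files, and the helper names are hypothetical, not taken from crop.py.

```python
import glob
import os
import subprocess


def pdfimages_command(pdf_path, out_prefix):
    """Build the pdfimages call that writes every embedded image as PNG."""
    return ["pdfimages", "-png", pdf_path, out_prefix]


def extract_images(pdf_path, out_dir):
    """Run pdfimages on one PDF and return the paths of the extracted PNGs."""
    os.makedirs(out_dir, exist_ok=True)
    stem = os.path.splitext(os.path.basename(pdf_path))[0]
    prefix = os.path.join(out_dir, stem)
    # check=True raises CalledProcessError if pdfimages fails.
    subprocess.run(pdfimages_command(pdf_path, prefix), check=True)
    # pdfimages names its output <prefix>-000.png, <prefix>-001.png, ...
    return sorted(glob.glob(prefix + "-*.png"))
```

Using the PDF's file name as the prefix keeps the images of each reimbursement grouped when many PDFs are processed into the same folder.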
Following @talespaiva's suggestion in a tweet and recent replies to @RosieDaSerenata.
Docs for the Twitter API: https://python-twitter.readthedocs.io/en/latest/_modules/twitter/api.html?highlight=%22def%20PostUpdate%22
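A sketch of the posting step with python-twitter's Api.PostUpdate linked above, whose media argument accepts a local file path, a URL or a file-like object, and whose in_reply_to_status_id argument posts the tweet as a reply. The function name and the reply_to parameter are hypothetical; the api object is passed in so the function can be tested with a mock instead of real credentials.

```python
def tweet_receipt(api, status, image_path, reply_to=None):
    """Post `status` with the receipt image attached.

    `api` is expected to be a python-twitter `twitter.Api` instance built
    with the bot's consumer and access-token credentials. When `reply_to`
    is a tweet ID, the new tweet is posted as a reply to it.
    """
    # PostUpdate uploads the media and attaches it to the new tweet.
    return api.PostUpdate(status, media=image_path, in_reply_to_status_id=reply_to)
```

Posting the image as a reply would keep the original tweet's text unchanged while adding the receipt in the same thread.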