-
Notifications
You must be signed in to change notification settings - Fork 369
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Store screenshots as attachments in NDB. #4473
Conversation
Thanks for putting this PR together!
Could you describe the security problem again? I know you mentioned the separate domain part to me before. But will this separate domain be a separate deployment that needs to happen now? If so, will that second deployment restrict the usage of the other routes to reduce the load on that deployment? (Admittedly, only a malicious actor would be poking at the other APIs on the same server) After looking at the GCP link, I would be curious if we should look at your previous PR that used GCS instead. (I do see you are storing images that are 250x200) Also, if we continue with Datastore, we should add comments that we shouldn't increase the thumbnail size as it could have negative impacts on the reads. |
# Create and save a thumbnail too. | ||
thumb_content = None | ||
try: | ||
thumb_content = images.resize(content, THUMB_WIDTH, THUMB_HEIGHT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Question: Will this code maintain the aspect ratio of the original image if the user uploads an image with different proportions than 250x200? For example, if someone uploads a square image, will it be distorted when resized to 250x200?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. It will be somewhat distorted. After I get the backend PRs merged, I'll start experimenting with the frontend and coordination with Yann to see how some sample images will look. My thinking at this point is that all the thumbnails should be the same size so that they look nice when there are multiple, so we might end up with some combination of scaling and cropping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few quick alternatives to consider in case image dimensions change in the future.
- Resize with Padding: This will add a white/transparent background around the picture but will keep the original content without distortion
- Resize to Fit: Pick the largest dimension and resize it down to its limit (Width=250 or Height=200)
If the images already have the same aspect ratio as 250x200, option 2 is the simplest. These approaches will make our code more robust to future changes.
I just don't want the original images that any user uploaded to be served from our app's domain (the same one that has session cookies). The reason is that the uploaded file is user-controlled data that could be somehow malicious. The serving approach that I have in mind is simply to use one of the infinite number of other domains that already reach our app. Our session cookies are on chromestatus.com, but we can serve the images from imgXXX-dot-cr-status.appspot.com. The "imgXXX" can be anything, so I was thinking of making it different for every image. Any malicious image content that somehow runs JS would not have access to the main app domain cookies, and therefore can't do anything to our site that an anon user couldn't do. This does not require an additional GAE module or deployment because strings like imgXXX's that don't match any app version get routed to the live version of the default GAE app service. Using a different domain for each image would add another DNS lookup for each one. That's not ideal, but I don't know if performance will matter much at all for these images. The alternative of serving directly from GCS itself involves a REST API call, a redirect, and a DNS lookup. So, I wouldn't conclude anything about the relative performance of the two approaches yet.
Let's see how the performance goes. I haven't tested performance at all yet. If needed, we will have the ability to change the data storage layer before this launches, but I'd like to start with this simpler approach first. |
This makes sense.
Let me know if you need help with this. |
thumb_content = None | ||
try: | ||
thumb_content = images.resize(content, THUMB_WIDTH, THUMB_HEIGHT) | ||
except images.LargeImageError: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may want to handle large files at the handler level so that users can't upload large files into memory in the first place.
https://flask.palletsprojects.com/en/3.0.x/patterns/fileuploads/#improving-uploads
By default Flask will happily accept file uploads with an unlimited amount of memory, but you can limit that by setting the MAX_CONTENT_LENGTH config key
Or you can do both.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant to put this comment on the check_attachment_size method that currently checks the size.
# limitations under the License. | ||
|
||
import logging | ||
from google.appengine.api import images |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The images API is part of the legacy bundled services. I know we use existing appengine legacy services but this will be the first time we use the images API. Instead of adding another dependency on the legacy bundled service, we should consider using one of the libraries they mention in their migrating off legacy bundle services document
To resize, convert, and manipulate images, use an image processing library such as Pillow or a Python interface for ImageMagick.
At first glance, Pillow has a similar API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good suggestion. I'll try to follow up with a PR to switch to Pillow and use max-aspect.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after the considerations around the bundled services and image resizing.
Regarding the performance between GCS vs Datastore, let me know if you need help with that.
This should resolve #4459.
This is a simpler approach for storing screenshots. Rather than add a dependency on using GCS, we can store them in NDB, which we already use extensively. I am expecting performance to be acceptable because only thumbnails of the images will be loaded into the feature detail page.
The security concern around serving images from our app (rather than directly from GCS) will be mitigated in an upcoming PR that deals with serving issues. Basically, the image content will be served from a different origin than where we store user session cookies.
In this PR: