refactor: simplify file type checking from MIME to extension #342
@aymeric-roucher Can you review my PR?
Closes #566
Built-in `mimetypes` offers weak security, but this change makes it essentially non-existent. Wouldn't a dedicated library like python-magic work better? E.g. #569
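The weakness referred to here is that `mimetypes.guess_type` derives the type from the file name alone and never opens the file. A minimal illustration, assuming the made-up file names below (the files do not need to exist):

```python
import mimetypes

# A private MimeTypes instance, so the module-level database is left untouched.
db = mimetypes.MimeTypes()

# None of these files exist: guess_type only looks at the name's extension.
for name in ["report.pdf", "notes.txt", "letter.doc", "payload.exe.pdf"]:
    mime_type, encoding = db.guess_type(name)
    print(f"{name:<16} -> {mime_type}")
```

Because only the last suffix matters, `payload.exe.pdf` is reported as `application/pdf`, so a MIME check built on `guess_type` is no stronger than an extension check.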
No need to overcomplicate it; just determine the type from the file extension.
Then if you expose your setup via Gradio with
Security is undoubtedly important, but the current version of the code has more critical issues beyond security. The `mimetypes` detection logic in the current code is redundant and contains logical errors. Let's address these issues first. My pull request does not introduce new problems; on the contrary, it simplifies the logic and resolves bugs. The current version of
@kingdomad thank you for the submission! Could you go back to using `mimetypes` while fixing the other problems you pointed out?
My code maintains the original code's allowed file types for uploading. When the original code calls `mimetypes.guess_type`, the standard library runs:

```python
class MimeTypes:
    def guess_type(self, url, strict=True):
        ...
        if ext in types_map:
            return types_map[ext], encoding
        ...
```

The two relevant entries in the aforementioned `types_map` are:

```python
{
    ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    ".doc": "application/msword",
}
```

Therefore, after the original code is executed, only `.pdf`, `.docx`, and `.txt` files pass the check: a `.doc` file maps to `application/msword`, which is not in `allowed_file_types`.

```python
import mimetypes


class GradioUI:
    def upload_file(
        self,
        file,
        file_uploads_log,
        allowed_file_types=[
            "application/pdf",
            "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            "text/plain",
        ],
    ):
        """
        Handle file uploads, default allowed types are .pdf, .docx, and .txt
        """
        import gradio as gr

        if file is None:
            return gr.Textbox("No file uploaded", visible=True), file_uploads_log

        try:
            # Derives the MIME type from the file name's extension only.
            mime_type, _ = mimetypes.guess_type(file.name)
        except Exception as e:
            return gr.Textbox(f"Error: {e}", visible=True), file_uploads_log

        if mime_type not in allowed_file_types:
            return gr.Textbox("File type disallowed", visible=True), file_uploads_log
        ...
```

The core logic of `mimetypes.guess_type` is just an extension lookup:

```python
if ext in types_map:
    return types_map[ext], encoding
```

It would be more straightforward to directly filter by extension in the
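Filtering directly by extension could look like the minimal sketch below. The helper `is_allowed_file` and its `allowed_extensions` parameter are hypothetical names, not the API merged in this PR; the default extensions mirror the ".pdf, .docx, and .txt" stated in the original docstring.

```python
import os

# Hypothetical default, mirroring the ".pdf, .docx, and .txt" docstring.
DEFAULT_ALLOWED_EXTENSIONS = (".pdf", ".docx", ".txt")


def is_allowed_file(filename, allowed_extensions=DEFAULT_ALLOWED_EXTENSIONS):
    """Return True if the file's extension (compared case-insensitively) is allowed."""
    # os.path.splitext keeps the leading dot and only splits off the last suffix:
    # "Thesis.DOCX" -> ("Thesis", ".DOCX"), "archive.tar.gz" -> ("archive.tar", ".gz").
    _, ext = os.path.splitext(filename)
    return ext.lower() in {e.lower() for e in allowed_extensions}


for name in ["report.pdf", "Thesis.DOCX", "notes.txt", "letter.doc", "archive.tar.gz", "README"]:
    print(f"{name:<15} allowed={is_allowed_file(name)}")
```

This accepts the same three types as the MIME-based check without the detour through `types_map`. Like `guess_type`, it trusts the file name, so it is a convenience filter rather than a security check.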
@kingdomad thank you for the explanation! Being unfamiliar with the specific challenges that people will experience when using files with this functionality (because I didn't use it myself), I thought a "standard" solution like `mimetypes` would be more robust. But your explanations make total sense! So let's merge this! 😃
But before that, could you add a test for uploading files?
I would have kept proper validation instead, just picking a better validator. @aymeric-roucher I would have discussed it within the team.
Sure, I'll submit it shortly.
@aymeric-roucher Done.
@sysradium let's merge this for short-term functionality, since public usage is not yet widespread enough for security to be a huge concern for this functionality. But I see you've opened #569, which is great: checking the type with `filetype` might be a good solution to add!
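Libraries such as python-magic and `filetype` identify a file from its leading bytes (its "magic number") rather than its name. A minimal standard-library sketch of that idea, covering only the three default types; `sniff_type` is a hypothetical helper and far less thorough than those libraries:

```python
import io
import zipfile
from typing import Optional

DOCX_MIME = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def sniff_type(data: bytes) -> Optional[str]:
    """Guess a MIME type from file content only (hypothetical helper)."""
    if data.startswith(b"%PDF-"):
        return "application/pdf"
    if data.startswith(b"PK\x03\x04"):
        # .docx is a ZIP container, but so are many other formats; Word files
        # conventionally store the main document part at word/document.xml.
        try:
            with zipfile.ZipFile(io.BytesIO(data)) as zf:
                if "word/document.xml" in zf.namelist():
                    return DOCX_MIME
        except zipfile.BadZipFile:
            pass
        return None
    # Crude plain-text heuristic: no NUL bytes and valid UTF-8.
    if b"\x00" not in data:
        try:
            data.decode("utf-8")
            return "text/plain"
        except UnicodeDecodeError:
            pass
    return None


# Build an in-memory ZIP that looks like a minimal .docx.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("word/document.xml", "<w:document/>")
docx_bytes = buf.getvalue()

print(sniff_type(b"%PDF-1.7\n"))          # application/pdf
print(sniff_type(docx_bytes))             # the DOCX MIME type
print(sniff_type(b"hello world\n"))       # text/plain
print(sniff_type(b"MZ\x90\x00\x03\x00"))  # None (Windows executable header)
```

Content sniffing stops a renamed executable from passing as a `.pdf`, but it only checks the header and does not validate the rest of the file.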
@kingdomad you have to run
Done. |
The original `upload_file` method in `GradioUI` has several issues:

1. It uses the `mimetypes.guess_type` method, which internally matches file types based on file extensions. After determining the file type with `mimetypes.guess_type`, it uses a reverse dictionary constructed from `mimetypes.types_map` to match the file type back to a file extension. This round trip is redundant and overcomplicated.
2. It introduces bugs, because multiple file extensions can correspond to the same file type, so the reverse lookup can unintentionally change the original file extension.

My submitted code fixes both points by directly filtering valid files based on their extensions. This not only simplifies the logic but also makes the method more user-friendly.
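The round-trip bug in point 2 can be reproduced with a small, hand-built table. This is a hypothetical stand-in: Python's real `mimetypes.types_map` is much larger and its contents vary by version and platform. When two extensions share a MIME type, an inverted dictionary keeps only one of them, so extension → MIME type → extension can return a different extension:

```python
import os

# Hypothetical extension -> MIME type table; ".txt" and ".c" share "text/plain".
types_map = {
    ".pdf": "application/pdf",
    ".txt": "text/plain",
    ".c": "text/plain",
}

# Inverting the dict: for duplicate values, the last key inserted wins.
inverse = {mime: ext for ext, mime in types_map.items()}


def round_trip(filename):
    """Extension -> MIME type -> extension, mimicking the lookup described above."""
    stem, ext = os.path.splitext(filename)
    mime_type = types_map[ext]
    return stem + inverse[mime_type]


print(round_trip("report.pdf"))  # report.pdf
print(round_trip("notes.txt"))   # notes.c (extension silently changed)
```

Filtering on the extension itself never needs the inverse table, so this class of bug disappears.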
Below is the problematic code from the original method: