-
-
Notifications
You must be signed in to change notification settings - Fork 351
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Extend file type (updated) #603
Merged
sindresorhus
merged 36 commits into
sindresorhus:main
from
FredrikSchaefer:extend-file-type-updated
Nov 10, 2023
Merged
Changes from 12 commits
Commits
Show all changes
36 commits
Select commit
Hold shift + click to select a range
bda3f46
Allow specification of custom detectors + readme update
6007eff
Simplify logic in runCustomDetectors
c3dba6e
add custom detectors to fileTypeFromStream
fab97ae
fix linting issue
c7c3190
Execute custom detectors before default ones
4bcddff
add tests
733bfac
fix docs
37e1e57
compatibility with Node.js 14 and 16
bfd18b1
Remove blank space
FredrikSchaefer ee4cb2c
Wrap custom detectors into file type options
ad6d44f
Merge branch 'extend-file-type-updated' of github.com:FredrikSchaefer…
29930bf
Adjust fileTypeFromFile(...) to recent changes
FredrikSchaefer 7ea6efd
Moved custom detectors from function to constructor argument
748ffee
fix fileTypeStream (add back fileTypeOptions)
2adec69
Update documentation
0d1464c
add check for illegal tokenizer position change
6b6188c
Update core.d.ts
FredrikSchaefer 6806753
Update core.d.ts
FredrikSchaefer 61e052e
Update readme.md (move custom detectors section as suggested by revie…
eed198d
Remove fileType prefix from class member functions
ff84f3e
Make runCustomDetectors private
326ccd1
Add class based approach to fileTypeStream
011fa53
Change error handling for read operations of custom detectors
b346f7c
Remove obsolete @throws from documentation
9e24ed9
Make usage of FileTypeParser class consistent
a926bf2
Rename stream(...) to toDetectingStream(...)
5e2a0fd
Fix error handling
f38565d
Suggested changes to simplify code
Borewit e25c294
Merge pull request #2 from sindresorhus/extend-file-type-updated-sugg…
FredrikSchaefer 080ac75
Fix TypeScript declaration
de706c5
Remove comments from unit tests and redundant empty line
Borewit 331502d
Make code examples executable.
Borewit 9d85f05
Remove empty comment lines
Borewit ede94d9
Remove unused `fileTypeOptions` parameter from typings
Borewit ca6e449
Adjust number code and comment style suggestions
Borewit a50e37a
Update core.d.ts
sindresorhus File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
UNICORN FILE CONTENT |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if I agree with the "unless it's a stream". Essentially you can iterate to other detectors if you took a bite of the apple. Only peek is allowed, if read you you have consumed the tokenizer, which is very similar to a stream.
I fear this an area where we can expect a lot questions from users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hey, thanks for the comment.
Yeah, guess you're right about the lot of questions.
I just mindlessly took this information from this previous discussion.
Let me suggest a more detailed explanation here:
If the detector returns
undefined
, thetokenizer.position
should typically be 0. This allows easy parsing by other detectors, unless subsequent custom detectors specify otherwise. Additionally, the detector shouldn't consume the tokenizer; while peeking is non-consuming, reading is.What do you think of this?
I'm really open to anything here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See also my other comment: https://github.com/sindresorhus/file-type/pull/603/files#r1356979704
I suggest something like this:
If the detector returns no_match it is not allowed to read from the tokenizer (the tokeinzer.position must remain 0) otherwise following scanners will read from the wrong file offset.
If the detector return undefined the scanner is certain the file type cannot be determined, neither by other scanners.
no_match represents option 1 explained in here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with your point that custom detectors should be able to interrupt detection. However, I see two small downsides of the suggested approach:
The wording no_match does not really make clear to me whether it means no match for this detector, or no match at all.
The standard FileTypeParser returns undefined when no file type could be recognized. Therefore requiring the custom detectors to return something else is a bit counter intuitive.
I therefore suggest to do it the other way around:
If the detector returns undefined, it is not allowed to read from the tokenizer (the tokenizer.position must remain 0) otherwise following scanners will read from the wrong file offset.
If the detector returns file_type_undetectable, the detector is certain the file type cannot be determined, even by other scanners. The FileTypeParser interrupts the parsing and immediately returns undefined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, one could argue that file_type_undetectable also does not clearly say whether it means file type undetectable for this detector or for all detectors, but it still makes it a bit clearer in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good to me. The second case can be also be something like, detector started reading but for some reason failed to determine the file-type. Not ideally, but it can happen. If the detector starts reading, there is no way back.
We could also check the position after each custom scanner. It may not be 0 actually, there is also an iterated use case with ID3 header. The position should be remain unchanged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea! Just pushed a commit taking care of that check.