-
Notifications
You must be signed in to change notification settings - Fork 9.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add try/except to test-tokenizer-random.py #10276
base: master
Are you sure you want to change the base?
Conversation
Added blank lines for Lint test; Added sequence import from typing Removed 'free' call from Object
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot reviewed 1 out of 1 changed files in this pull request and generated no suggestions.
tests/test-tokenizer-random.py
Outdated
# return min(len(ids1), len(ids2)) | ||
# Rewritten to use zip() and next() instead of for loop | ||
def find_first_mismatch(ids1: Sequence[Any], ids2: Sequence[Any]) -> int: | ||
return next((i for i, (a, b) in enumerate(zip(ids1, ids2)) if a != b), -1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not equivalent with different list sizes.
Check if v3 is a good alternative:
def find_first_mismatch_v1(ids1, ids2):
for i, (a, b) in enumerate(zip(ids1, ids2)):
if a != b:
return i
if len(ids1) == len(ids2):
return -1
return min(len(ids1), len(ids2))
def find_first_mismatch_v2(ids1, ids2) -> int:
return next((i for i, (a, b) in enumerate(zip(ids1, ids2)) if a != b), -1)
def find_first_mismatch_v3(ids1, ids2) -> int:
index = next((i for i, (a, b) in enumerate(zip(ids1, ids2)) if a != b), -1)
if index < 0 and len(ids1) != len(ids2):
index = min(len(ids1), len(ids2))
return index
print( find_first_mismatch_v1( [1,2,3], [1,2,3,4] ) ) # output: 3
print( find_first_mismatch_v2( [1,2,3], [1,2,3,4] ) ) # output: -1
print( find_first_mismatch_v3( [1,2,3], [1,2,3,4] ) ) # output: 3
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for catching that/sorry for not catching it myself, will fix and push a change to my PR later today.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just replaced it with your modified method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. It was not my intention to suggest removing the types:
- def find_first_mismatch(ids1, ids2) -> int:
+ def find_first_mismatch(ids1: Sequence[Any], ids2: Sequence[Any]) -> int:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Re-added the types.
Updated `find_first_mismatch` from suggestion by jaime-m-p regarding incorrect checking.
What are you planning to do with The |
Re-added type declarations
Ah, I added them as I wasn't exactly sure of all intended use-cases, and figured adding them as optional args (with defaults that matched the initial settings) would allow for greater customizability down the line while not interfering with existing operations. |
Added try/except to main block;
Added (optional) additional cli args:
--max-errors
, default=10, "Maximum number of errors before stopping"--iterations
, default=100, help="Number of iterations for random generators"--tokenizers
, nargs="+", help="List of tokenizers to test", default=tokenizersrefactored
find_first_mismatch
function;Added
self.free()
to theLibLlama
class