-
Notifications
You must be signed in to change notification settings - Fork 524
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Script to automatically split off eval set #1525
Conversation
) | ||
if _is_empty_or_nonexistent(dirpath=dataset_name): | ||
log.error("Failed to safely load the dataset from HF Hub.") | ||
raise InvalidFileExtensionError( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved HF safe download logic into a separate function so I could reuse it. Code is basically unchanged.
Unrelated to meat of the PR, but per Daniel request, tried refactoring further to get rid of try-except block. I don't think it can be done without significant added complexity. You want to barrier/sync before throwing this InvalidFileExtensionError; however, this must be nested within this download logic so that only rank0 encounters it. So you can't both separate out this logic and have graceful exit without try-except. :'(
import numpy as np | ||
from typing import Optional | ||
|
||
import composer.utils as utils |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import is formatted this way due to mocking difficulties - https://bhfsteve.blogspot.com/2012/06/patching-tip-using-mocks-in-python-unit.html
Co-authored-by: v-chen_data <[email protected]>
Co-authored-by: v-chen_data <[email protected]>
…p in attention and lm_head logits. (#1374)
Co-authored-by: v-chen_data <[email protected]>
Co-authored-by: v-chen_data <[email protected]>
Co-authored-by: v-chen_data <[email protected]> Co-authored-by: Daniel King <[email protected]>
Co-authored-by: Eitan Turok <[email protected]>
Co-authored-by: Eitan Turok <[email protected]> Co-authored-by: Mihir Patel <[email protected]>
Co-authored-by: v-chen_data <[email protected]>
Co-authored-by: v-chen_data <[email protected]>
Co-authored-by: v-chen_data <[email protected]>
Co-authored-by: Eitan Turok <[email protected]>
What changes are proposed in this pull request?
See go/sweeps-eval for more context. Corresponding MAPI changes are contained in https://github.com/databricks-mosaic/mcloud/pull/4562.
How is this tested?
Added unit tests.
The predecessor to the unit tests was this bash script I used to manually inspect the file outputs. Tests are the same as unit tests, but it invokes the
scripts/...
file instead of thecommand_utils/...
one. Confirmed file outputs looked OK._test.txt
Everything passes: