Find a way to not load all the tasks infos. #709

Open
thomasw21 opened this issue Jan 3, 2022 · 1 comment

Comments

@thomasw21
Member

The import statement from promptsource.seqio_tasks import tasks takes a huge amount of time to run. One of the main reasons is that it queries the infos of every dataset:

dataset_splits = utils.get_dataset_splits(dataset_name, subset_name)
This is problematic for two reasons:

IMO both are unnecessary and should be fixed. Is there a reason why seqio tasks cannot be loaded dynamically, in the sense of fetching only what is necessary? Something along the lines of:

def add_seqio_task(task_name):
    seqio.TaskRegistry.add(...)
@craffel
Contributor

craffel commented Jan 4, 2022

In order to use the module import functionality of seqio, importing the module needs to add the task you want to use to the task registry without calling any additional code. So, we either need to have a separate file for each task or change the underlying functionality in HF datasets.
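
A rough sketch of the "separate file for each task" option, using only the standard library. Everything here is illustrative: `demo_registry.REGISTRY` is a stand-in for the seqio task registry, the generated `demo_<task>` modules and the task names are hypothetical, and the real task definitions would do more than store a string. The point is only that importing one module registers one task, with no extra call needed after the import.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Template for one generated file per task; REGISTRY stands in for the task registry.
MODULE_TEMPLATE = '''\
from demo_registry import REGISTRY
REGISTRY["{task}"] = "registered at import time"
'''

TASKS = ["task_a", "task_b", "task_c"]  # hypothetical task names

# Write a shared registry module plus one small module per task.
pkg_dir = Path(tempfile.mkdtemp())
(pkg_dir / "demo_registry.py").write_text("REGISTRY = {}\n")
for task in TASKS:
    (pkg_dir / f"demo_{task}.py").write_text(MODULE_TEMPLATE.format(task=task))

sys.path.insert(0, str(pkg_dir))
importlib.invalidate_caches()

# Importing one task module registers that task and nothing else.
importlib.import_module("demo_task_a")
registry = importlib.import_module("demo_registry").REGISTRY
```

The trade-off is a large number of small files, which would most likely have to be generated rather than written by hand.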
