Cannot run this #21

Open
alyosha-swamy opened this issue Jun 30, 2023 · 2 comments

Comments

@alyosha-swamy

I'm trying to download the dataset with Python and it doesn't seem to be working.

`
dataset = load_dataset("hazyresearch/evaporate")
Downloading and preparing dataset json/hazyresearch--evaporate to /root/.cache/huggingface/datasets/hazyresearch___json/hazyresearch--evaporate-f5023e4a47e5b45a/0.0.0/8bb11242116d547c741b2e8a1f18598ffdd40a1d4f2a2872c7a28b697434bc96...
Downloading data files: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1324.80it/s]
Extracting data files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 30.24it/s]
Traceback (most recent call last):
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 121, in _generate_tables
pa_table = paj.read_json(
File "pyarrow/_json.pyx", line 258, in pyarrow._json.read_json
File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
File "pyarrow/error.pxi", line 100, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: JSON parse error: Invalid value. in row 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1879, in _prepare_split_single
for _, table in generator:
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 144, in _generate_tables
dataset = json.load(f)
File "/root/miniconda3/envs/evaporate/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/root/miniconda3/envs/evaporate/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "", line 1, in
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/load.py", line 1809, in load_dataset
builder_instance.download_and_prepare(
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 909, in download_and_prepare
self._download_and_prepare(
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1004, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1767, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1912, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

`
Here's the error
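
A note on reading the traceback above: `'utf-8' codec can't decode byte 0x80 in position 0` means the file handed to the JSON loader does not start with text at all, so it is probably not a JSON file. One common format whose first byte is `0x80` is a Python pickle (protocol 2 or later), but that is only a guess about this dataset, not something the thread confirms. A minimal sketch for checking what the downloaded files actually are, assuming you have a local path to them (`sniff_format` is a hypothetical helper, not part of `datasets` or this repo):

```python
import sys

# Well-known leading byte sequences ("magic numbers") for a few binary formats.
BINARY_PREFIXES = [
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),
    (b"\x80", "possibly pickle (protocol 2+)"),
]


def sniff_format(path):
    """Guess a file's format from its first few bytes (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(8)
    for prefix, label in BINARY_PREFIXES:
        if head.startswith(prefix):
            return label
    # JSON documents start with an object or array after optional whitespace.
    if head.lstrip()[:1] in (b"{", b"["):
        return "JSON-like text"
    return "unknown"


if __name__ == "__main__":
    # Usage: python sniff.py <file> [<file> ...]
    for path in sys.argv[1:]:
        print(f"{path}: {sniff_format(path)}")
```

If a file comes back as something other than JSON-like text, the generic JSON builder that `load_dataset` picked here cannot parse it, and the file would need to be read with a loader that matches its real format.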

@alyosha-swamy
Author

I get this when I try to run it after downloading the data the other way:
`Traceback (most recent call last):
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1879, in _prepare_split_single
for _, table in generator:
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/packaged_modules/json/json.py", line 144, in _generate_tables
dataset = json.load(f)
File "/root/miniconda3/envs/evaporate/lib/python3.8/json/__init__.py", line 293, in load
return loads(fp.read(),
File "/root/miniconda3/envs/evaporate/lib/python3.8/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "", line 1, in
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/load.py", line 1809, in load_dataset
builder_instance.download_and_prepare(
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 909, in download_and_prepare
self._download_and_prepare(
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1004, in _download_and_prepare
self._prepare_split(split_generator, **prepare_split_kwargs)
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1767, in _prepare_split
for job_id, done, content in self._prepare_split_single(
File "/root/miniconda3/envs/evaporate/lib/python3.8/site-packages/datasets/builder.py", line 1912, in _prepare_split_single
raise DatasetGenerationError("An error occurred while generating the dataset") from e
datasets.builder.DatasetGenerationError: An error occurred while generating the dataset

(evaporate) root@Raghav:~/evaporate/evaporate# bash run.sh
Data lake
Traceback (most recent call last):
File "run_profiler.py", line 476, in <module>
main()
File "run_profiler.py", line 472, in main
run_experiment(profiler_args)
File "run_profiler.py", line 235, in run_experiment
_, _, _, _, args = get_structure(data_lake)
File "/root/evaporate/evaporate/utils.py", line 105, in get_structure
files = get_all_files(args.data_dir)
File "/root/evaporate/evaporate/utils.py", line 49, in get_all_files
for file in os.listdir(data_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/data/evaporate/fda-ai-pmas/510k/'
Data lake
Traceback (most recent call last):
File "run_profiler.py", line 476, in <module>
main()
File "run_profiler.py", line 472, in main
run_experiment(profiler_args)
File "run_profiler.py", line 235, in run_experiment
_, _, _, _, args = get_structure(data_lake)
File "/root/evaporate/evaporate/utils.py", line 105, in get_structure
files = get_all_files(args.data_dir)
File "/root/evaporate/evaporate/utils.py", line 49, in get_all_files
for file in os.listdir(data_dir):
FileNotFoundError: [Errno 2] No such file or directory: '/data/evaporate/fda-ai-pmas/510k/'`
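
On the second traceback: `os.listdir` fails because `/data/evaporate/fda-ai-pmas/510k/` does not exist on this machine, so the configured data directory needs to point at wherever the files were actually downloaded. A rough sketch of a pre-flight check that fails with a clearer message, assuming the directory path is passed in by the caller (`list_data_files` is a hypothetical stand-in, not the repo's `get_all_files`):

```python
import os


def list_data_files(data_dir):
    """Return sorted full paths of entries in data_dir, or fail with a hint."""
    if not os.path.isdir(data_dir):
        raise FileNotFoundError(
            f"Data directory {data_dir!r} does not exist. "
            "Download the data first, or point the data path at the "
            "directory where it was saved."
        )
    return sorted(os.path.join(data_dir, name) for name in os.listdir(data_dir))
```

Calling it on the real download location before `bash run.sh` would confirm the files are where the profiler expects them.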

@brando90

@alyosha-swamy try this: #21. Let me know if it works or not.
