Failing to use mp execution #4
Comments
Sorry for the delay, I'm just back from paternity leave. Did you solve your issue in the meantime? I've never seen such an error. Can you share more information about your execution environment: OS and specific Python version? Thanks.
I think the error is that a multiprocessing worker process can't start its own pool of processes. Can you try those?
Just type the command
Hi, I got a similar error when using mp execution:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/pool.py", line 121, in worker
result = (True, func(*args, **kwds))
File "/ssd2/dongzhe/cc_net/cc_net/execution.py", line 145, in global_fn
return f(*args[1:])
File "/ssd2/dongzhe/cc_net/cc_net/mine.py", line 218, in _hashes_shard
file=conf.get_cc_shard(shard),
File "/ssd2/dongzhe/cc_net/cc_net/jsonql.py", line 449, in run_pipes
for res in results:
File "/ssd2/dongzhe/cc_net/cc_net/jsonql.py", line 296, in map
for x in source:
File "/ssd2/dongzhe/cc_net/cc_net/process_wet_file.py", line 199, in __iter__
for doc in parse_warc_file(iter(f), self.min_len):
File "/ssd2/dongzhe/cc_net/cc_net/process_wet_file.py", line 117, in parse_warc_file
for doc in group_by_docs(lines):
File "/ssd2/dongzhe/cc_net/cc_net/process_wet_file.py", line 89, in group_by_docs
for warc in warc_lines:
File "/usr/lib/python3.7/gzip.py", line 300, in read1
return self._buffer.read1(size)
File "/usr/lib/python3.7/_compression.py", line 68, in readinto
data = self.read(len(byte_view))
File "/usr/lib/python3.7/gzip.py", line 493, in read
raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
"""
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/ssd2/dongzhe/cc_net/cc_net/__main__.py", line 24, in <module>
main()
File "/ssd2/dongzhe/cc_net/cc_net/__main__.py", line 20, in main
func_argparse.parse_and_call(parser)
File "/home/dongzhe/dz_venv_3.7/lib/python3.7/site-packages/func_argparse/__init__.py", line 72, in parse_and_call
return command(**parsed_args)
File "/ssd2/dongzhe/cc_net/cc_net/mine.py", line 509, in main
regroup(conf)
File "/ssd2/dongzhe/cc_net/cc_net/mine.py", line 364, in regroup
mine(conf)
File "/ssd2/dongzhe/cc_net/cc_net/mine.py", line 257, in mine
hashes_groups = list(jsonql.grouper(hashes(conf), conf.hash_in_mem))
File "/ssd2/dongzhe/cc_net/cc_net/mine.py", line 206, in hashes
ex(_hashes_shard, repeat(conf), *_transpose(missing_outputs))
File "/ssd2/dongzhe/cc_net/cc_net/execution.py", line 174, in __call__
global_fn, zip(itertools.repeat(f_name), *args)
File "/usr/lib/python3.7/multiprocessing/pool.py", line 748, in next
raise value
EOFError: Compressed file ended before the end-of-stream marker was reached
I got this when I ran the command:
python -m cc_net mine --config config/marathon.json
where marathon.json looks like:
{
"dump": "2019-09",
"num_shards": 1600,
"lang_whitelist": ["en", "ja", "zh"],
"lm_languages": ["en", "ja", "zh"],
"mine_num_processes": 1,
"execution": "mp",
"num_segments_per_shard": -1,
"task_parallelism": 96,
"target_size": "4G"
}
btw,
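
The EOFError in this traceback comes from Python's gzip module, which raises it when a compressed stream stops before its end-of-stream marker. That usually points to a truncated file, for example an interrupted download, rather than to the mp executor itself. A minimal sketch for checking this, assuming the downloaded segments are kept as .gz files in a local directory (the function names and the directory argument are hypothetical, not part of cc_net):

```python
import gzip
import zlib
from pathlib import Path


def is_complete_gzip(path, chunk_size=1 << 20):
    """Return True if the whole gzip stream at `path` decompresses cleanly."""
    # A missing file still raises FileNotFoundError here, on purpose.
    with gzip.open(path, "rb") as f:
        try:
            # Read to the end in chunks so memory use stays bounded.
            while f.read(chunk_size):
                pass
        except (EOFError, OSError, zlib.error):
            # EOFError: stream ended early; OSError/zlib.error: corrupt data.
            return False
    return True


def find_truncated(directory):
    """Yield every *.gz file under `directory` that fails the check."""
    for path in sorted(Path(directory).rglob("*.gz")):
        if not is_complete_gzip(path):
            yield path
```

Any file reported by find_truncated is a candidate for deletion and re-download before running the mine command again.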
I am trying to use the MPExecutor but I am getting the following error:
I am running the following command
And this is my config file: