
RecursionError occurs in Dask Distributed #8817

Open
naoyaikeda opened this issue Aug 6, 2024 · 3 comments

Comments

naoyaikeda commented Aug 6, 2024

Describe the issue:

I was creating a PoC for code that solves systems of simultaneous linear equations using matrices, and the following code raised a RecursionError.

Minimal Complete Verifiable Example:

import dask.array as da
from distributed import Client

remote_server = 'localhost:8786'

client = Client(remote_server)

rows, cols = 4123, 4123
chunk_rows, chunk_cols = 1024, 1024

matrix1 = da.random.random(size=(rows, cols), chunks=(chunk_rows, chunk_cols))
matrix2 = da.random.random(size=(rows, cols), chunks=(chunk_rows, chunk_cols))

print(matrix1.compute_chunk_sizes())
print(matrix2.compute_chunk_sizes())

sz = matrix1.shape
dim0 = sz[0]
dim1 = sz[1]
chunk_remain = dim1 % chunk_rows
lack_rows = chunk_rows - chunk_remain
lack_cols = lack_rows

# Find the missing component where splitting into chunks does not make the chunks square

lack_mat0 = da.zeros(shape=[lack_cols, dim1], chunks=(chunk_rows, chunk_cols))
lack_mat1 = da.zeros(shape=[dim0 + lack_cols, lack_rows], chunks=(chunk_rows, chunk_cols))

# Combine generated components

new_arr0 = da.append(matrix1, lack_mat0, axis=0)
new_arr1 = da.append(new_arr0, lack_mat1, axis=1)

new_matrix1 = new_arr1

sz = matrix2.shape
dim0 = sz[0]
dim1 = sz[1]
chunk_remain = dim0 % chunk_rows
lack_rows = chunk_rows - chunk_remain
lack_cols = lack_rows

# Find the missing component where splitting into chunks does not make the chunks square

lack_mat0 = da.zeros(shape=[lack_cols, dim1], chunks=(chunk_rows, chunk_cols))
lack_mat1 = da.zeros(shape=[dim0 + lack_cols, lack_rows], chunks=(chunk_rows, chunk_cols))

# Combine generated components

new_arr0 = da.append(matrix2, lack_mat0, axis=0)
new_arr1 = da.append(new_arr0, lack_mat1, axis=1)

new_matrix2 = new_arr1

# Unit matrix the added component parts

for i in range(rows, new_matrix1.shape[1]):
    new_matrix1[i, i] = 1.0
    new_matrix2[i, i] = 1.0

# Reorganize chunks

new_new_matrix1 = new_matrix1.rechunk((chunk_rows, chunk_cols))
new_new_matrix2 = new_matrix2.rechunk((chunk_rows, chunk_cols))

result_graph  = da.linalg.solve(new_new_matrix1, new_new_matrix2)

print(result_graph.compute_chunk_sizes())

result_future = client.compute(result_graph)

result_result = client.gather(result_future)

print(result_result)
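My guess (not a confirmed diagnosis) is that the per-element assignment loop is what makes the graph so deep: each `new_matrix1[i, i] = 1.0` wraps the previous array in another layer, so with roughly 4000 iterations the nested result may exceed the pickler's recursion limit. A minimal sketch of that pattern, with small stand-in sizes:

```python
import numpy as np
import dask.array as da

# Small stand-in for the padded matrix: rows 4..7 play the role of the
# padded region that should get 1.0 on the diagonal.
x = da.zeros((8, 8), chunks=(4, 4))
for i in range(4, 8):
    # Each __setitem__ builds a new graph on top of the previous one;
    # at thousands of iterations this nesting gets very deep.
    x[i, i] = 1.0

expected = np.zeros((8, 8))
expected[np.arange(4, 8), np.arange(4, 8)] = 1.0
assert np.array_equal(x.compute(), expected)
```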

Anything else we need to know?:

Based on the errors, I modified the code as follows and the errors no longer occur, but the graph rendered as SVG is quite complex and I wonder whether it will run stably.

import dask.array as da
from distributed import Client

remote_server = 'localhost:8786'

client = Client(remote_server)

rows, cols = 4123, 4123
chunk_rows, chunk_cols = 1024, 1024
padded_rows, padded_cols = 8192, 8192

matrix1 = da.random.random(size=(rows, cols), chunks=(chunk_rows, chunk_cols))
matrix2 = da.random.random(size=(rows, cols), chunks=(chunk_rows, chunk_cols))

print(matrix1.compute_chunk_sizes())
print(matrix2.compute_chunk_sizes())

sz = matrix1.shape
dim0 = sz[0]
dim1 = sz[1]

padremain_rows = padded_rows - dim0
padremain_cols = padded_cols - dim1

matrix1_padded = da.pad(matrix1, ((0, padremain_rows), (0, padremain_cols)), mode='constant')
matrix2_padded = da.pad(matrix2, ((0, padremain_rows), (0, padremain_cols)), mode='constant')

def set_diagonal_to_one(block, block_id=None):
    block = block.copy()

    i_start, i_stop = block_id[0] * block.shape[0], (block_id[0] + 1) * block.shape[0]
    j_start, j_stop = block_id[1] * block.shape[1], (block_id[1] + 1) * block.shape[1]

    for i in range(block.shape[0]):
        if i_start + i >= rows and i_start + i == j_start + i:
            block[i, i] = 1

    return block

matrix1_padded = matrix1_padded.map_blocks(set_diagonal_to_one, dtype=matrix1_padded.dtype)
matrix2_padded = matrix2_padded.map_blocks(set_diagonal_to_one, dtype=matrix2_padded.dtype)

new_matrix1 = matrix1_padded.rechunk((chunk_rows, chunk_cols))
new_matrix2 = matrix2_padded.rechunk((chunk_rows, chunk_cols))

result_graph  = da.linalg.solve(new_matrix1, new_matrix2)

print(result_graph.compute_chunk_sizes())

result_future = client.compute(result_graph)

result_result = client.gather(result_future)

print(result_result)
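An alternative I considered (a sketch only, variable names hypothetical): skip map_blocks entirely and build the padded diagonal with `da.eye` and a row mask, so the whole padding step is a fixed, small number of vectorized array operations rather than a per-block Python function:

```python
import dask.array as da

rows, padded, chunk = 4123, 8192, 1024

m = da.random.random((rows, rows), chunks=(chunk, chunk))
# Zero-pad up to a shape that divides evenly into square chunks.
mp = da.pad(m, ((0, padded - rows), (0, padded - rows)), mode='constant')

# Identity restricted to the padded rows: ones at (i, i) only for i >= rows.
pad_mask = (da.arange(padded, chunks=chunk) >= rows)[:, None]
mp = (mp + da.eye(padded, chunks=chunk) * pad_mask).rechunk((chunk, chunk))
```

If this is equivalent, `mp` should match `matrix1_padded` after the map_blocks step, but with a flat graph.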

Environment:

  • Dask version: 2024.6.0
  • Python version: 3.10
  • Operating System: Ubuntu 22.04 on Windows 11
  • Install method (conda, pip, source): pip
jacobtomlinson (Member) commented

Could you also share the error?

naoyaikeda (Author) commented Aug 7, 2024

Okay, here is the error:

2024-08-07 09:10:20,199 - distributed.scheduler - INFO - Receive client connection: Client-6e77b63a-5451-11ef-a738-00155d29fb9f
2024-08-07 09:10:20,260 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:39900
2024-08-07 09:10:24,947 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 1 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x77b6e8368340>
 0. 131627357574400
>.
Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
RecursionError: maximum recursion depth exceeded while pickling an object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1249, in dump
    raise pickle.PicklingError(msg) from e
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.
dask.array<random_sample, shape=(4123, 4123), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
dask.array<random_sample, shape=(4123, 4123), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
RecursionError: maximum recursion depth exceeded while pickling an object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 366, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 78, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1249, in dump
    raise pickle.PicklingError(msg) from e
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/solve-test-104.py", line 68, in <module>
    print(result_graph.compute_chunk_sizes())
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/dask/array/core.py", line 1500, in compute_chunk_sizes
    tuple(int(chunk) for chunk in chunks) for chunks in compute(tuple(c))[0]
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/dask/base.py", line 662, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 392, in serialize
    raise TypeError(msg, str_x) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 1 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x77b6e8368340>\n 0. 131627357574400\n>')
2024-08-07 09:10:24,968 - distributed.scheduler - INFO - Remove client Client-6e77b63a-5451-11ef-a738-00155d29fb9f
2024-08-07 09:10:24,969 - distributed.core - INFO - Received 'close-stream' from tcp://127.0.0.1:39900; closing.
2024-08-07 09:10:24,971 - distributed.scheduler - INFO - Remove client Client-6e77b63a-5451-11ef-a738-00155d29fb9f
2024-08-07 09:10:24,974 - distributed.scheduler - INFO - Close client connection: Client-6e77b63a-5451-11ef-a738-00155d29fb9f

If I remove the compute_chunk_sizes line, the error occurs later, at client.compute:

2024-08-07 09:12:42,641 - distributed.scheduler - INFO - Receive client connection: Client-c3ae6acc-5451-11ef-a993-00155d29fb9f
2024-08-07 09:12:42,643 - distributed.core - INFO - Starting established connection to tcp://127.0.0.1:60694
2024-08-07 09:12:45,762 - distributed.protocol.pickle - ERROR - Failed to serialize <ToPickle: HighLevelGraph with 2 layers.
<dask.highlevelgraph.HighLevelGraph object at 0x7b5482e20910>
 0. 135602911593152
 1. edc5d9263cb7446c2b0da59729f7e42c
>.
Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
RecursionError: maximum recursion depth exceeded while pickling an object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1249, in dump
    raise pickle.PicklingError(msg) from e
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.
dask.array<random_sample, shape=(4123, 4123), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
dask.array<random_sample, shape=(4123, 4123), dtype=float64, chunksize=(1024, 1024), chunktype=numpy.ndarray>
Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 63, in dumps
    result = pickle.dumps(x, **dump_kwargs)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 68, in dumps
    pickler.dump(x)
RecursionError: maximum recursion depth exceeded while pickling an object

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1245, in dump
    return super().dump(obj)
RecursionError: maximum recursion depth exceeded while pickling an object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 366, in serialize
    header, frames = dumps(x, context=context) if wants_context else dumps(x)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 78, in pickle_dumps
    frames[0] = pickle.dumps(
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/pickle.py", line 81, in dumps
    result = cloudpickle.dumps(x, **dump_kwargs)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1479, in dumps
    cp.dump(obj)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/cloudpickle/cloudpickle.py", line 1249, in dump
    raise pickle.PicklingError(msg) from e
_pickle.PicklingError: Could not pickle object as excessively deep recursion required.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/gorn/Projects/Tohoku-U/solvetest/solve-test-104a.py", line 68, in <module>
    result_future = client.compute(result_graph)
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/client.py", line 3675, in compute
    futures_dict = self._graph_to_futures(
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/client.py", line 3351, in _graph_to_futures
    header, frames = serialize(ToPickle(dsk), on_error="raise")
  File "/home/gorn/Projects/Tohoku-U/solvetest/venv/lib/python3.10/site-packages/distributed/protocol/serialize.py", line 392, in serialize
    raise TypeError(msg, str_x) from exc
TypeError: ('Could not serialize object of type HighLevelGraph', '<ToPickle: HighLevelGraph with 2 layers.\n<dask.highlevelgraph.HighLevelGraph object at 0x7b5482e20910>\n 0. 135602911593152\n 1. edc5d9263cb7446c2b0da59729f7e42c\n>')
2024-08-07 09:12:45,771 - distributed.scheduler - INFO - Remove client Client-c3ae6acc-5451-11ef-a993-00155d29fb9f
2024-08-07 09:12:45,771 - distributed.core - INFO - Received 'close-stream' from tcp://127.0.0.1:60694; closing.
2024-08-07 09:12:45,775 - distributed.scheduler - INFO - Remove client Client-c3ae6acc-5451-11ef-a993-00155d29fb9f
2024-08-07 09:12:45,777 - distributed.scheduler - INFO - Close client connection: Client-c3ae6acc-5451-11ef-a993-00155d29fb9f

jacobtomlinson (Member) commented

This may be a duplicate of #8378.
