import numpy as np
import tensorflow as tf
import keras

tpu = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
strategy = tf.distribute.TPUStrategy(tpu)

with strategy.scope():
    # Construct and compile a simple functional model inside the strategy scope
    inputs = keras.Input(shape=(32,))
    outputs = keras.layers.Dense(1)(inputs)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])

# Just use `fit` as usual
x = np.random.random((1000, 32))
y = np.random.random((1000, 1))
model.fit(x, y, epochs=3)
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1712289536.759567 13 device_compiler.h:186] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
Epoch 1/3
---------------------------------------------------------------------------
NotFoundError Traceback (most recent call last)
Cell In[6], line 11
9 x = np.random.random((1000,32))
10 y = np.random.random((1000,1))
---> 11 model.fit(x, y, epochs=3)
File /usr/local/lib/python3.10/site-packages/keras/src/utils/traceback_utils.py:123, in filter_traceback.<locals>.error_handler(*args, **kwargs)
120 filtered_tb = _process_traceback_frames(e.__traceback__)
121 # To get the full stack trace, call:
122 # `keras.config.disable_traceback_filtering()`
--> 123 raise e.with_traceback(filtered_tb) from None
124 finally:
125 del filtered_tb
File /usr/local/lib/python3.10/site-packages/tensorflow/python/eager/execute.py:53, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
51 try:
52 ctx.ensure_initialized()
---> 53 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
54 inputs, attrs, num_outputs)
55 except core._NotOkStatusException as e:
56 if name is not None:
NotFoundError: Graph execution error:
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
Detected at node TPUReplicate/_compile/_15189418723048853925/_4 defined at (most recent call last):
<stack traces unavailable>
9 root error(s) found.
(0) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
(1) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
(2) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
(3) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
(4) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
(5) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
(6) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_284]]
(7) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_284]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_236]]
(8) NOT_FOUND: XLA:TPU compile permanent error. Container localhost does not exist. (Could not find resource: localhost/tpu_mesh_common_state)
[[{{node TPUReplicate/_compile/_15189418723048853925/_4}}]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_316]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_255]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_271]]
[[cluster_one_step_on_iterator/control_after/_1/_387]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_220]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_284]]
[[TPUReplicate/_compile/_15189418723048853925/_4/_236]]
[[tpu_compile_succeeded_assert/_15801172523729505459/_5/_303]]
0 successful operations.
0 derived errors ignored. [Op:__inference_one_step_on_iterator_2865]
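Every replica fails because the TPU compile step cannot find `localhost/tpu_mesh_common_state`. That resource appears to be set up when the TPU system is initialized, and the snippet builds `TPUStrategy` directly from the resolver without calling `tf.config.experimental_connect_to_cluster` or `tf.tpu.experimental.initialize_tpu_system`, which the TensorFlow TPU guide runs first. A minimal sketch of that pattern follows, assuming a TPU VM where `tpu="local"` is the right target. `make_tpu_strategy` is a hypothetical helper name, and this is not confirmed as the fix for this issue.

```python
def make_tpu_strategy(tpu="local"):
    """Hypothetical helper: resolve, connect to and initialize the TPU, then build a TPUStrategy.

    TensorFlow is imported inside the function so the sketch can be defined on
    machines without TensorFlow; actually calling it requires a TPU.
    """
    import tensorflow as tf

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu)
    # Connect this process to the TPU workers described by the resolver.
    tf.config.experimental_connect_to_cluster(resolver)
    # Initialize the TPU system before any TPU computation is compiled.
    tf.tpu.experimental.initialize_tpu_system(resolver)
    return tf.distribute.TPUStrategy(resolver)


# Usage (on a TPU VM):
# strategy = make_tpu_strategy("local")
# with strategy.scope():
#     ...build and compile the model as before...
```

`tf.distribute.cluster_resolver.TPUClusterResolver.connect(tpu="local")` performs the connect and initialize steps in one call and returns the resolver, if a shorter form is preferred.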
I have tested this on the Colab TPU environment with TF 2.15 and Keras 3.1.1, and it seems to work fine, as per the attached gist. Could you please cross-check with Keras 3.1.1 and update us? Thanks!
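Since the comparison hinges on exact versions (TF 2.15 with Keras 3.1.1), it helps to record what is actually installed in the failing environment. A small sketch using the standard-library `importlib.metadata`; the list of distribution names, including `tensorflow-tpu` and `tf-keras`, is an assumption about what may be present on a TPU VM, and `installed_versions` is a hypothetical helper name.

```python
from importlib import metadata


def installed_versions(packages=("tensorflow", "tensorflow-tpu", "keras", "tf-keras")):
    """Return {distribution name: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # distribution not installed in this environment
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```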
There seems to be a conflict when using Keras 3 on a TPU VM; see Kaggle/docker-python#1370 (comment).
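If the conflict is with Keras 3 itself, one workaround sometimes used is to stay on Keras 2 through the `tf-keras` package. This is an assumption, not the resolution from the linked thread: it applies to TensorFlow 2.16 or later, where setting the `TF_USE_LEGACY_KERAS` environment variable to `1` before TensorFlow is imported makes `tf.keras` resolve to `tf_keras`. `use_legacy_keras` is a hypothetical helper name.

```python
import os
import sys


def use_legacy_keras():
    """Hypothetical helper: make tf.keras resolve to Keras 2 (the tf_keras package).

    Only takes effect if it runs before TensorFlow is first imported in the process.
    """
    if "tensorflow" in sys.modules:
        raise RuntimeError("TF_USE_LEGACY_KERAS must be set before importing tensorflow")
    os.environ["TF_USE_LEGACY_KERAS"] = "1"


use_legacy_keras()
# Then (assuming `pip install tf-keras` matching the installed TensorFlow version):
# import tensorflow as tf
# keras = tf.keras  # replaces the snippet's `import keras`, which would still load Keras 3
```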