Update DTensor docs, lint notebooks #2276
Changes from 2 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -37,7 +37,7 @@ | |
"id": "VcQIa1uG86Wh" | ||
}, | ||
"source": [ | ||
"# DTensor Concepts" | ||
"# DTensor concepts" | ||
] | ||
}, | ||
{ | ||
|
@@ -76,7 +76,7 @@ | |
"\n", | ||
"By decoupling the application from sharding directives, DTensor enables running the same application on a single device, multiple devices, or even multiple clients, while preserving its global semantics.\n", | ||
"\n", | ||
"This guide introduces DTensor concepts for distributed computing, and how DTensor integrates with TensorFlow. To see a demo of using DTensor in model training, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial) tutorial." | ||
"This guide introduces DTensor concepts for distributed computing, and how DTensor integrates with TensorFlow. For a demo of using DTensor in model training, refer to the [Distributed training with DTensor](../tutorials/distribute/dtensor_ml_tutorial.ipynb) tutorial." | ||
] | ||
}, | ||
{ | ||
|
@@ -87,7 +87,9 @@ | |
"source": [ | ||
"## Setup\n", | ||
"\n", | ||
"DTensor is part of TensorFlow 2.9.0 release, and also included in the TensorFlow nightly builds since 04/09/2022." | ||
"DTensor (`tf.experimental.dtensor`) has been part of TensorFlow since the 2.9.0 release.\n", | ||
"\n", | ||
"First, install or upgrade TensorFlow:" | ||
] | ||
}, | ||
{ | ||
|
@@ -98,7 +100,7 @@ | |
}, | ||
"outputs": [], | ||
"source": [ | ||
"!pip install --quiet --upgrade --pre tensorflow" | ||
@MarkDaoust removed the "--pre" |
||
"!pip install --quiet --upgrade tensorflow" | ||
We usually don't bother with installing tensorflow. It was only here because the pre-installed version was insufficient. |
||
] | ||
}, | ||
{ | ||
|
@@ -107,9 +109,9 @@ | |
"id": "O3pG29uZIWYO" | ||
}, | ||
"source": [ | ||
"Once installed, import `tensorflow` and `tf.experimental.dtensor`. Then configure TensorFlow to use 6 virtual CPUs.\n", | ||
@MarkDaoust full API with Later, "vCPU" -> "virtual CPU" |
||
"Then, import `tensorflow` and `dtensor`, and configure TensorFlow to use 6 virtual CPUs.\n", | ||
"\n", | ||
"Even though this example uses vCPUs, DTensor works the same way on CPU, GPU or TPU devices." | ||
"Even though this example uses virtual CPUs, DTensor works the same way on CPU, GPU or TPU devices." | ||
] | ||
}, | ||
{ | ||
|
@@ -343,7 +345,7 @@ | |
"id": "TTalu6M-ISYb" | ||
}, | ||
"source": [ | ||
"### Single-Client and Multi-Client Applications\n", | ||
"### Single-client and multi-client applications\n", | ||
"\n", | ||
"DTensor supports both single-client and multi-client applications. The colab Python kernel is an example of a single-client DTensor application, where there is a single Python process.\n", ||
"\n", | ||
|
@@ -365,7 +367,8 @@ | |
"source": [ | ||
"## DTensor as a sharded tensor\n", | ||
"\n", | ||
"Now let's start coding with `DTensor`. The helper function, `dtensor_from_array`, demonstrates creating DTensors from something that looks like a `tf.Tensor`. The function performs 2 steps:\n", | ||
"Now, start coding with `DTensor`. The helper function, `dtensor_from_array`, demonstrates creating DTensors from something that looks like a `tf.Tensor`. The function performs two steps:\n", | ||
"\n", | ||
" - Replicates the tensor to every device on the mesh.\n", | ||
" - Shards the copy according to the layout requested in its arguments." | ||
] | ||
|
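The two steps above can be sketched in plain Python. This is a conceptual model of replicate-then-shard only, not the DTensor implementation; the device names and both helper functions are hypothetical.

```python
# Conceptual sketch: replicate a global tensor to every device,
# then keep only each device's shard. Not the real DTensor API.

def replicate(tensor, devices):
    # Step 1: every device receives a full copy of the global tensor.
    return {d: list(tensor) for d in devices}

def shard_copies(copies, num_shards):
    # Step 2: each device keeps the slice assigned to it by the layout.
    shards = {}
    for i, (device, copy) in enumerate(sorted(copies.items())):
        shard_size = len(copy) // num_shards
        start = (i % num_shards) * shard_size
        shards[device] = copy[start:start + shard_size]
    return shards

devices = ['CPU:0', 'CPU:1']
copies = replicate([0, 1], devices)
components = shard_copies(copies, num_shards=2)
print(components)  # {'CPU:0': [0], 'CPU:1': [1]}
```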
@@ -410,7 +413,7 @@ | |
" - A `Layout`, which defines the `Mesh` the `Tensor` belongs to, and how the `Tensor` is sharded onto the `Mesh`.\n", | ||
" - A list of **component tensors**, one item per local device in the `Mesh`.\n", | ||
"\n", | ||
"With `dtensor_from_array`, you can create your first DTensor, `my_first_dtensor`, and examine its contents." | ||
"With `dtensor_from_array`, you can create your first DTensor, `my_first_dtensor`, and examine its contents:" | ||
] | ||
}, | ||
{ | ||
|
@@ -426,7 +429,7 @@ | |
"\n", | ||
"my_first_dtensor = dtensor_from_array([0, 1], layout)\n", | ||
"\n", | ||
"# Examine the dtensor content\n", | ||
"# Examine the DTensor content\n", | ||
"print(my_first_dtensor)\n", | ||
"print(\"global shape:\", my_first_dtensor.shape)\n", | ||
"print(\"dtype:\", my_first_dtensor.dtype)" | ||
|
@@ -440,7 +443,7 @@ | |
"source": [ | ||
"#### Layout and `fetch_layout`\n", | ||
"\n", | ||
"The layout of a DTensor is not a regular attribute of `tf.Tensor`. Instead, DTensor provides a function, `dtensor.fetch_layout` to access the layout of a DTensor." | ||
"The layout of a DTensor is not a regular attribute of `tf.Tensor`. Instead, DTensor provides a function, `dtensor.fetch_layout`, to access the layout of a DTensor:" ||
] | ||
}, | ||
{ | ||
|
@@ -499,7 +502,7 @@ | |
"source": [ | ||
"The inverse operation of `dtensor.unpack` is `dtensor.pack`. Component tensors can be packed back into a DTensor.\n", | ||
"\n", | ||
"The components must have the same rank and dtype, which will be the rank and dtype of the returned DTensor. However there is no strict requirement on the device placement of component tensors as inputs of `dtensor.unpack`: the function will automatically copy the component tensors to their respective corresponding devices.\n" | ||
"The components must have the same rank and dtype, which will be the rank and dtype of the returned DTensor. However, there is no strict requirement on the device placement of component tensors as inputs of `dtensor.pack`: the function will automatically copy the component tensors to their corresponding devices.\n", ||
] | ||
}, | ||
{ | ||
|
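As a rough mental model, unpacking splits a global value into per-device components and packing concatenates them back. A pure-Python sketch of that round trip (the helpers are hypothetical stand-ins, not `dtensor.pack`/`dtensor.unpack`):

```python
# Conceptual round trip: split a global value into equal components,
# then reassemble it. Not the real dtensor.pack/unpack.

def unpack(global_tensor, num_shards):
    # Split the global value into equally sized per-device components.
    size = len(global_tensor) // num_shards
    return [global_tensor[i * size:(i + 1) * size] for i in range(num_shards)]

def pack(components):
    # Concatenating the components restores the global view.
    packed = []
    for c in components:
        packed.extend(c)
    return packed

tensor = [1, 2, 3, 4, 5, 6]
components = unpack(tensor, num_shards=3)
print(components)        # [[1, 2], [3, 4], [5, 6]]
print(pack(components))  # [1, 2, 3, 4, 5, 6]
```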
@@ -528,7 +531,7 @@ | |
"\n", | ||
"So far you've worked with the `my_first_dtensor`, which is a rank-1 DTensor fully replicated across a dim-1 `Mesh`.\n", | ||
"\n", | ||
"Next create and inspect DTensors that are sharded across a dim-2 `Mesh`. The next example does this with a 3x2 `Mesh` on 6 CPU devices, where size of mesh dimension `'x'` is 3 devices, and size of mesh dimension`'y'` is 2 devices." | ||
"Next, create and inspect DTensors that are sharded across a dim-2 `Mesh`. The following example does this with a 3x2 `Mesh` on 6 CPU devices, where the size of mesh dimension `'x'` is 3 devices, and the size of mesh dimension `'y'` is 2 devices:" ||
] | ||
}, | ||
{ | ||
|
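The 3x2 mesh described above can be pictured as a grid of coordinates mapped to devices. A pure-Python sketch of that grid (the `build_mesh` helper is a hypothetical illustration, not `dtensor.create_mesh`):

```python
# Conceptual sketch of a 3x2 mesh: dimension 'x' spans 3 devices,
# dimension 'y' spans 2, giving 6 devices in total.
import itertools

def build_mesh(dims, devices):
    # dims: ordered {name: size}; devices are laid out in row-major order.
    coords = itertools.product(*(range(s) for s in dims.values()))
    return {c: d for c, d in zip(coords, devices)}

devices = [f'CPU:{i}' for i in range(6)]
mesh = build_mesh({'x': 3, 'y': 2}, devices)
print(mesh[(0, 0)], mesh[(2, 1)])  # CPU:0 CPU:5
```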
@@ -620,7 +623,7 @@ | |
" - 1st axis sharded along the `'x'` mesh dimension.\n", | ||
" - 2nd axis replicated along the `'y'` mesh dimension.\n", | ||
"\n", | ||
"To achieve this sharding scheme, you just need to replace the sharding spec of the 2nd axis from `'y'` to `dtensor.UNSHARDED`, to indicate your intention of replicating along the 2nd axis. The layout object will look like `Layout(['x', dtensor.UNSHARDED], mesh)`." | ||
"To achieve this sharding scheme, you just need to replace the sharding spec of the 2nd axis from `'y'` to `dtensor.UNSHARDED`, to indicate your intention of replicating along the 2nd axis. The layout object will look like `Layout(['x', dtensor.UNSHARDED], mesh)`:" | ||
] | ||
}, | ||
{ | ||
|
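The effect of `Layout(['x', dtensor.UNSHARDED], mesh)` can be sketched in plain Python: axis 0 is split three ways along `'x'`, while axis 1 is replicated, so both devices in each `'y'` column hold identical rows. A conceptual model only, not the DTensor API:

```python
# Conceptual effect of Layout(['x', dtensor.UNSHARDED], mesh) on a 3x2 mesh:
# axis 0 is sharded 3 ways along 'x'; axis 1 is fully replicated along 'y'.

global_tensor = [[1, 2], [3, 4], [5, 6]]  # global shape (3, 2)

components = {}
for x in range(3):          # 'x' mesh dimension: shard axis 0
    row_shard = global_tensor[x:x + 1]
    for y in range(2):      # 'y' mesh dimension: replicate
        components[(x, y)] = row_shard

print(components[(1, 0)] == components[(1, 1)])  # True: replicas match
print(components[(0, 0)])                        # [[1, 2]]
```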
@@ -659,7 +662,7 @@ | |
"source": [ | ||
"#### Tensor.numpy() and sharded DTensor\n", | ||
"\n", | ||
"Be aware that calling the `.numpy()` method on a sharded DTensor raises an error. The rationale for erroring is to protect against unintended gathering of data from multiple computing devices to the host CPU device backing the returned numpy array." | ||
"Be aware that calling the `.numpy()` method on a sharded DTensor raises an error. The rationale for erroring is to protect against unintended gathering of data from multiple computing devices to the host CPU device backing the returned NumPy array:" | ||
] | ||
}, | ||
{ | ||
|
@@ -704,8 +707,9 @@ | |
"Note: DTensor is still an experimental API which means you will be exploring and pushing the boundaries and limits of the DTensor programming model.\n", | ||
"\n", | ||
"There are 2 ways of triggering DTensor execution:\n", | ||
" - DTensor as operands of a Python function, e.g. `tf.matmul(a, b)` will run through DTensor if `a`, `b`, or both are DTensors.\n", | ||
" - Requesting the result of a Python function to be a DTensor, e.g. `dtensor.call_with_layout(tf.ones, layout, shape=(3, 2))` will run through DTensor because we requested the output of tf.ones to be sharded according to a `layout`." | ||
"\n", | ||
" - DTensor as operands of a Python function, such as `tf.matmul(a, b)`, will run through DTensor if `a`, `b`, or both are DTensors.\n", | ||
" - Requesting the result of a Python function to be a DTensor, such as `dtensor.call_with_layout(tf.ones, layout, shape=(3, 2))`, will run through DTensor because we requested the output of `tf.ones` to be sharded according to a `layout`." | ||
] | ||
}, | ||
{ | ||
|
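The two triggers above amount to a dispatch rule: an op takes the DTensor path if any operand is a DTensor, or if a DTensor output was explicitly requested. A toy illustration of that rule (the `DTensorLike` class and `matmul_like` function are hypothetical, not TensorFlow code):

```python
# Toy dispatch rule: run "through DTensor" when any operand is DTensor-like.

class DTensorLike:
    """Stand-in marker type for a DTensor operand."""
    def __init__(self, value):
        self.value = value

def matmul_like(a, b):
    # Mirrors the rule described in the text: any DTensor operand
    # routes the whole op through the DTensor path.
    if isinstance(a, DTensorLike) or isinstance(b, DTensorLike):
        return 'dtensor path'
    return 'regular path'

print(matmul_like(DTensorLike(1), 2))  # dtensor path
print(matmul_like(1, 2))               # regular path
```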
@@ -714,7 +718,7 @@ | |
"id": "urKzmqAoPssT" | ||
}, | ||
"source": [ | ||
"### DTensor as Operands\n", | ||
"### DTensor as operands\n", | ||
"\n", | ||
"Many TensorFlow API functions take `tf.Tensor` as their operands, and return `tf.Tensor` as their results. For these functions, you can express the intention to run a function through DTensor by passing in DTensor as operands. This section uses `tf.matmul(a, b)` as an example." ||
] | ||
|
@@ -755,7 +759,7 @@ | |
"print('Sharding spec:', dtensor.fetch_layout(c).sharding_specs)\n", | ||
"print(\"components:\")\n", | ||
"for component_tensor in dtensor.unpack(c):\n", | ||
" print(component_tensor.device, component_tensor.numpy())\n" | ||
" print(component_tensor.device, component_tensor.numpy())" | ||
] | ||
}, | ||
{ | ||
|
@@ -800,11 +804,10 @@ | |
"id": "IhD8yYgJiCEh" | ||
}, | ||
"source": [ | ||
"#### Additional Sharding\n", | ||
"#### Additional sharding\n", | ||
"\n", | ||
"You can perform additional sharding on the inputs, and they are appropriately carried over to the results. For example, you can apply additional sharding of operand `a` along its first axis to the `'y'` mesh dimension. The additional sharding will be carried over to the first axis of the result `c`.\n", | ||
"\n", | ||
"\n", | ||
"Total number of floating point mul operations is `6 devices * 2 result * 1 = 12`, an additional factor of 2 reduction compared to the case (24) above. The factor of 2 is due to the sharding along `y` mesh dimension with a size of `2` devices." | ||
] | ||
}, | ||
|
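The operation-count arithmetic above can be checked directly. This is a back-of-the-envelope sketch using only the numbers quoted in the text; the per-device count of 4 in the earlier case is inferred from 24 ÷ 6 devices.

```python
# Back-of-the-envelope check of the mul-op counts quoted above.
num_devices = 6

# Earlier case: 24 total multiplications across 6 devices,
# i.e. 4 per device (inferred from the text's total of 24).
previous_total = num_devices * 4

# With the extra sharding along the 'y' mesh dimension (size 2),
# each device's work halves: 6 devices * 2 result * 1 = 12.
sharded_total = num_devices * 2 * 1

print(previous_total, sharded_total)    # 24 12
print(previous_total // sharded_total)  # 2 (factor-of-2 reduction)
```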
@@ -837,11 +840,11 @@ | |
"id": "c-1NazCVmLWZ" | ||
}, | ||
"source": [ | ||
"### DTensor as Output\n", | ||
"### DTensor as output\n", | ||
"\n", | ||
"What about Python functions that do not take operands, but returns a Tensor result that can be sharded? Examples of such functions are\n", | ||
"What about Python functions that do not take operands, but return a Tensor result that can be sharded? Examples of such functions are:\n", ||
"\n", | ||
" - `tf.ones`, `tf.zeros`, `tf.random.stateless_normal`,\n", | ||
" - `tf.ones`, `tf.zeros`, `tf.random.stateless_normal`\n", | ||
"\n", | ||
"For these Python functions, DTensor provides `dtensor.call_with_layout` which eagerly executes a Python function with DTensor, and ensures that the returned Tensor is a DTensor with the requested `Layout`." | ||
] | ||
|
@@ -876,7 +879,7 @@ | |
"source": [ | ||
"#### APIs that emit a single TensorFlow Op\n", | ||
"\n", | ||
"If a function emits a single TensorFlow Op, you can directly apply `dtensor.call_with_layout` to the function." | ||
"If a function emits a single TensorFlow Op, you can directly apply `dtensor.call_with_layout` to the function:" | ||
] | ||
}, | ||
{ | ||
|
@@ -911,7 +914,7 @@ | |
"source": [ | ||
"#### APIs that emit multiple TensorFlow Ops\n", | ||
"\n", | ||
"If the API emits multiple TensorFlow Ops, convert the function into a single Op through `tf.function`. For example `tf.random.stateleess_normal`" | ||
"If the API emits multiple TensorFlow Ops, convert the function into a single Op through `tf.function`. For example, `tf.random.stateless_normal`:" ||
] | ||
}, | ||
{ | ||
|
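The wrapping idea can be sketched in plain Python: a function built from several primitive steps is packaged as one callable unit, analogous to tracing it with `tf.function` so that `call_with_layout` sees a single op. All helpers below are hypothetical illustrations, not TensorFlow code:

```python
# Conceptual sketch: package a multi-step Python function as one
# callable unit, analogous to wrapping it with tf.function.

def op_rng(shape, seed):            # primitive step 1 (toy "random" op)
    return [(seed + i) % 7 for i in range(shape)]

def op_scale(values, factor):       # primitive step 2
    return [v * factor for v in values]

def as_single_op(fn):
    # Stand-in for tf.function: expose a multi-step function
    # as a single traced unit.
    def traced(*args, **kwargs):
        return fn(*args, **kwargs)
    return traced

stateless_normal_like = as_single_op(
    lambda shape, seed: op_scale(op_rng(shape, seed), 2))
print(stateless_normal_like(3, seed=1))  # [2, 4, 6]
```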
@@ -1030,7 +1033,7 @@ | |
"id": "QxBdNHWSu-kV" | ||
}, | ||
"source": [ | ||
"You can also assign a DTensor to a DVariable.\n" | ||
"You can also assign a DTensor to a DVariable:\n" | ||
] | ||
}, | ||
{ | ||
|
@@ -1051,7 +1054,7 @@ | |
"id": "4fvSk_VUvGnj" | ||
}, | ||
"source": [ | ||
"Attempting to mutate the layout of a `DVariable`, by assigning a DTensor with an incompatible layout produces an error." | ||
"Attempting to mutate the layout of a `DVariable` by assigning a DTensor with an incompatible layout produces an error:" ||
] | ||
}, | ||
{ | ||
|
@@ -1081,7 +1084,7 @@ | |
"source": [ | ||
"## What's next?\n", | ||
"\n", | ||
"In this colab, you learned about DTensor, an extension to TensorFlow for distributed computing. To try out these concepts in a tutorial, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial)." | ||
"In this colab, you learned about DTensor, an extension to TensorFlow for distributed computing. To try out these concepts in a tutorial, check out [Distributed training with DTensor](../tutorials/distribute/dtensor_ml_tutorial.ipynb)." | ||
] | ||
} | ||
], | ||
|
@MarkDaoust introduced a relative link