Update DTensor docs, lint notebooks #2276

Merged: 3 commits, Sep 29, 2023
Changes from 2 commits
61 changes: 32 additions & 29 deletions site/en/guide/dtensor_overview.ipynb
@@ -37,7 +37,7 @@
"id": "VcQIa1uG86Wh"
},
"source": [
"# DTensor Concepts"
"# DTensor concepts"
]
},
{
@@ -76,7 +76,7 @@
"\n",
"By decoupling the application from sharding directives, DTensor enables running the same application on a single device, multiple devices, or even multiple clients, while preserving its global semantics.\n",
"\n",
"This guide introduces DTensor concepts for distributed computing, and how DTensor integrates with TensorFlow. To see a demo of using DTensor in model training, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial) tutorial."
Review comment (Contributor Author): @MarkDaoust introduced a relative link
"This guide introduces DTensor concepts for distributed computing, and how DTensor integrates with TensorFlow. For a demo of using DTensor in model training, refer to the [Distributed training with DTensor](../tutorials/distribute/dtensor_ml_tutorial.ipynb) tutorial."
]
},
{
@@ -87,7 +87,9 @@
"source": [
"## Setup\n",
"\n",
"DTensor is part of TensorFlow 2.9.0 release, and also included in the TensorFlow nightly builds since 04/09/2022."
"DTensor (`tf.experimental.dtensor`) has been part of TensorFlow since the 2.9.0 release.\n",
"\n",
"First, install or upgrade TensorFlow:"
]
},
{
@@ -98,7 +100,7 @@
},
"outputs": [],
"source": [
"!pip install --quiet --upgrade --pre tensorflow"
Review comment (Contributor Author): @MarkDaoust removed the "--pre"
"!pip install --quiet --upgrade tensorflow"
Review comment (Member): We usually don't bother with installing tensorflow. It was only here because the pre-installed version was insufficient.
]
},
{
@@ -107,9 +109,9 @@
"id": "O3pG29uZIWYO"
},
"source": [
"Once installed, import `tensorflow` and `tf.experimental.dtensor`. Then configure TensorFlow to use 6 virtual CPUs.\n",
Review comment (Contributor Author): @MarkDaoust full API with tf.experimental is mentioned in the beginning of the section, so shortened this.
Later, "vCPU" -> "virtual CPU"
"Then, import `tensorflow` and `dtensor`, and configure TensorFlow to use 6 virtual CPUs.\n",
"\n",
"Even though this example uses vCPUs, DTensor works the same way on CPU, GPU or TPU devices."
"Even though this example uses virtual CPUs, DTensor works the same way on CPU, GPU or TPU devices."
]
},
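The import and device-configuration cell is collapsed in this diff. A minimal sketch of what it can look like, assuming a helper named `configure_virtual_cpus` (the name is illustrative, not necessarily the notebook's):

```python
import tensorflow as tf
from tensorflow.experimental import dtensor

def configure_virtual_cpus(ncpu):
  # Split the first physical CPU into `ncpu` logical (virtual) CPU devices.
  phy_devices = tf.config.list_physical_devices('CPU')
  tf.config.set_logical_device_configuration(
      phy_devices[0],
      [tf.config.LogicalDeviceConfiguration()] * ncpu)

configure_virtual_cpus(6)
DEVICES = [f'CPU:{i}' for i in range(6)]
print(tf.config.list_logical_devices('CPU'))
```

The `DEVICES` list is reused by the mesh-creation sketches later in this diff.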
{
@@ -343,7 +345,7 @@
"id": "TTalu6M-ISYb"
},
"source": [
"### Single-Client and Multi-Client Applications\n",
"### Single-client and multi-client applications\n",
"\n",
"DTensor supports both single-client and multi-client applications. The colab Python kernel is an example of a single client DTensor application, where there is a single Python process.\n",
"\n",
@@ -365,7 +367,8 @@
"source": [
"## DTensor as a sharded tensor\n",
"\n",
"Now let's start coding with `DTensor`. The helper function, `dtensor_from_array`, demonstrates creating DTensors from something that looks like a `tf.Tensor`. The function performs 2 steps:\n",
"Now, start coding with `DTensor`. The helper function, `dtensor_from_array`, demonstrates creating DTensors from something that looks like a `tf.Tensor`. The function performs two steps:\n",
"\n",
" - Replicates the tensor to every device on the mesh.\n",
" - Shards the copy according to the layout requested in its arguments."
]
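The body of `dtensor_from_array` is collapsed in this diff. A sketch of a helper that performs those two steps with the public `dtensor` API (the argument names are assumptions):

```python
def dtensor_from_array(arr, layout, shape=None, dtype=None):
  """Convert something that looks like a tf.Tensor into a DTensor."""
  if shape is not None or dtype is not None:
    arr = tf.constant(arr, shape=shape, dtype=dtype)

  # Step 1: replicate the input onto every device of the layout's mesh.
  replicated = dtensor.copy_to_mesh(
      arr,
      layout=dtensor.Layout.replicated(layout.mesh, rank=layout.rank))

  # Step 2: shard the replicated copy according to the requested layout.
  return dtensor.relayout(replicated, layout=layout)
```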
@@ -410,7 +413,7 @@
" - A `Layout`, which defines the `Mesh` the `Tensor` belongs to, and how the `Tensor` is sharded onto the `Mesh`.\n",
" - A list of **component tensors**, one item per local device in the `Mesh`.\n",
"\n",
"With `dtensor_from_array`, you can create your first DTensor, `my_first_dtensor`, and examine its contents."
"With `dtensor_from_array`, you can create your first DTensor, `my_first_dtensor`, and examine its contents:"
]
},
{
@@ -426,7 +429,7 @@
"\n",
"my_first_dtensor = dtensor_from_array([0, 1], layout)\n",
"\n",
"# Examine the dtensor content\n",
"# Examine the DTensor content\n",
"print(my_first_dtensor)\n",
"print(\"global shape:\", my_first_dtensor.shape)\n",
"print(\"dtype:\", my_first_dtensor.dtype)"
@@ -440,7 +443,7 @@
"source": [
"#### Layout and `fetch_layout`\n",
"\n",
"The layout of a DTensor is not a regular attribute of `tf.Tensor`. Instead, DTensor provides a function, `dtensor.fetch_layout` to access the layout of a DTensor."
"The layout of a DTensor is not a regular attribute of `tf.Tensor`. Instead, DTensor provides a function, `dtensor.fetch_layout` to access the layout of a DTensor:"
]
},
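The accompanying code cell is collapsed here. A minimal usage sketch, assuming the `my_first_dtensor` created above:

```python
# Retrieve the layout attached to a DTensor.
fetched_layout = dtensor.fetch_layout(my_first_dtensor)
print(fetched_layout)

# A layout records one sharding spec per tensor axis, plus the mesh it targets.
print(fetched_layout.sharding_specs)
```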
{
@@ -499,7 +502,7 @@
"source": [
"The inverse operation of `dtensor.unpack` is `dtensor.pack`. Component tensors can be packed back into a DTensor.\n",
"\n",
"The components must have the same rank and dtype, which will be the rank and dtype of the returned DTensor. However there is no strict requirement on the device placement of component tensors as inputs of `dtensor.unpack`: the function will automatically copy the component tensors to their respective corresponding devices.\n"
"The components must have the same rank and dtype, which will be the rank and dtype of the returned DTensor. However, there is no strict requirement on the device placement of component tensors as inputs of `dtensor.unpack`: the function will automatically copy the component tensors to their respective corresponding devices.\n"
]
},
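The unpack/pack cell is collapsed in this diff. A short round-trip sketch, assuming `my_first_dtensor` and its `layout` from above:

```python
# Unpack the DTensor into its per-device component tensors.
components = dtensor.unpack(my_first_dtensor)
for component in components:
  print(component.device, component.numpy())

# Pack the component tensors back into a DTensor with the same layout.
repacked = dtensor.pack(components, layout)
print(repacked)
```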
{
@@ -528,7 +531,7 @@
"\n",
"So far you've worked with the `my_first_dtensor`, which is a rank-1 DTensor fully replicated across a dim-1 `Mesh`.\n",
"\n",
"Next create and inspect DTensors that are sharded across a dim-2 `Mesh`. The next example does this with a 3x2 `Mesh` on 6 CPU devices, where size of mesh dimension `'x'` is 3 devices, and size of mesh dimension`'y'` is 2 devices."
"Next, create and inspect DTensors that are sharded across a dim-2 `Mesh`. The following example does this with a 3x2 `Mesh` on 6 CPU devices, where size of mesh dimension `'x'` is 3 devices, and size of mesh dimension`'y'` is 2 devices:"
]
},
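The mesh-construction cell is collapsed here. A sketch of the 3x2 mesh and a fully sharded rank-2 DTensor on it, assuming the `DEVICES` list and the `dtensor_from_array` helper sketched earlier (variable names are illustrative):

```python
# A 3x2 mesh: mesh dimension 'x' spans 3 devices, mesh dimension 'y' spans 2.
mesh = dtensor.create_mesh([("x", 3), ("y", 2)], devices=DEVICES)

# Shard the first tensor axis along 'x' and the second along 'y'.
fully_sharded_layout = dtensor.Layout(["x", "y"], mesh)
my_dtensor = dtensor_from_array(tf.reshape(tf.range(6), (3, 2)), fully_sharded_layout)

for component in dtensor.unpack(my_dtensor):
  print(component.device, component.numpy())
```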
{
@@ -620,7 +623,7 @@
" - 1st axis sharded along the `'x'` mesh dimension.\n",
" - 2nd axis replicated along the `'y'` mesh dimension.\n",
"\n",
"To achieve this sharding scheme, you just need to replace the sharding spec of the 2nd axis from `'y'` to `dtensor.UNSHARDED`, to indicate your intention of replicating along the 2nd axis. The layout object will look like `Layout(['x', dtensor.UNSHARDED], mesh)`."
"To achieve this sharding scheme, you just need to replace the sharding spec of the 2nd axis from `'y'` to `dtensor.UNSHARDED`, to indicate your intention of replicating along the 2nd axis. The layout object will look like `Layout(['x', dtensor.UNSHARDED], mesh)`:"
]
},
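A sketch of that hybrid layout, reusing the 3x2 `mesh` and `dtensor_from_array` from the previous sketch:

```python
# Shard the first axis along 'x'; replicate the second axis.
hybrid_layout = dtensor.Layout(["x", dtensor.UNSHARDED], mesh)
hybrid_dtensor = dtensor_from_array(tf.reshape(tf.range(6), (3, 2)), hybrid_layout)

# Each component holds one row of the 3x2 tensor, replicated across 'y'.
for component in dtensor.unpack(hybrid_dtensor):
  print(component.device, component.numpy())
```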
{
@@ -659,7 +662,7 @@
"source": [
"#### Tensor.numpy() and sharded DTensor\n",
"\n",
"Be aware that calling the `.numpy()` method on a sharded DTensor raises an error. The rationale for erroring is to protect against unintended gathering of data from multiple computing devices to the host CPU device backing the returned numpy array."
"Be aware that calling the `.numpy()` method on a sharded DTensor raises an error. The rationale for erroring is to protect against unintended gathering of data from multiple computing devices to the host CPU device backing the returned NumPy array:"
]
},
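The cell demonstrating the error is collapsed in this diff. A small sketch of the behavior, assuming the replicated `my_first_dtensor` and the sharded `my_dtensor` from the sketches above (the exact exception type raised may differ):

```python
# .numpy() is fine on a fully replicated DTensor: every device holds the data.
print(my_first_dtensor.numpy())

# On a sharded DTensor it raises, because converting would silently gather
# shards from every device onto the host CPU.
try:
  my_dtensor.numpy()
except Exception as err:  # deliberately broad; the notebook may catch a narrower type
  print("numpy() on a sharded DTensor raised:", type(err).__name__)
```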
{
@@ -704,8 +707,9 @@
"Note: DTensor is still an experimental API which means you will be exploring and pushing the boundaries and limits of the DTensor programming model.\n",
"\n",
"There are 2 ways of triggering DTensor execution:\n",
" - DTensor as operands of a Python function, e.g. `tf.matmul(a, b)` will run through DTensor if `a`, `b`, or both are DTensors.\n",
" - Requesting the result of a Python function to be a DTensor, e.g. `dtensor.call_with_layout(tf.ones, layout, shape=(3, 2))` will run through DTensor because we requested the output of tf.ones to be sharded according to a `layout`."
"\n",
" - DTensor as operands of a Python function, such as `tf.matmul(a, b)`, will run through DTensor if `a`, `b`, or both are DTensors.\n",
" - Requesting the result of a Python function to be a DTensor, such as `dtensor.call_with_layout(tf.ones, layout, shape=(3, 2))`, will run through DTensor because we requested the output of `tf.ones` to be sharded according to a `layout`."
]
},
{
@@ -714,7 +718,7 @@
"id": "urKzmqAoPssT"
},
"source": [
"### DTensor as Operands\n",
"### DTensor as operands\n",
"\n",
"Many TensorFlow API functions take `tf.Tensor` as their operands, and returns `tf.Tensor` as their results. For these functions, you can express intention to run a function through DTensor by passing in DTensor as operands. This section uses `tf.matmul(a, b)` as an example."
]
@@ -755,7 +759,7 @@
"print('Sharding spec:', dtensor.fetch_layout(c).sharding_specs)\n",
"print(\"components:\")\n",
"for component_tensor in dtensor.unpack(c):\n",
" print(component_tensor.device, component_tensor.numpy())\n"
" print(component_tensor.device, component_tensor.numpy())"
]
},
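Most of this `tf.matmul` cell is collapsed above; only the printing lines are visible. A sketch of a setup consistent with the operation counts discussed below (a 2x3 times 3x2 product, with the contracted axis sharded over 'x'), reusing the 3x2 `mesh`:

```python
# Shard the contracted axis (a's columns, b's rows) over the 'x' mesh dimension.
a = dtensor_from_array(
    tf.ones((2, 3)), dtensor.Layout([dtensor.UNSHARDED, "x"], mesh))
b = dtensor_from_array(
    tf.ones((3, 2)), dtensor.Layout(["x", dtensor.UNSHARDED], mesh))

# Runs through DTensor because both operands are DTensors.
c = tf.matmul(a, b)
```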
{
@@ -800,11 +804,10 @@
"id": "IhD8yYgJiCEh"
},
"source": [
"#### Additional Sharding\n",
"#### Additional sharding\n",
"\n",
"You can perform additional sharding on the inputs, and they are appropriately carried over to the results. For example, you can apply additional sharding of operand `a` along its first axis to the `'y'` mesh dimension. The additional sharding will be carried over to the first axis of the result `c`.\n",
"\n",
"\n",
"Total number of floating point mul operations is `6 devices * 2 result * 1 = 12`, an additional factor of 2 reduction compared to the case (24) above. The factor of 2 is due to the sharding along `y` mesh dimension with a size of `2` devices."
]
},
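A sketch of that additional sharding, reusing `a`, `b`, and `mesh` from the previous sketch:

```python
# Additionally shard a's first axis over the 'y' mesh dimension.
a = dtensor_from_array(tf.ones((2, 3)), dtensor.Layout(["y", "x"], mesh))

c = tf.matmul(a, b)

# The 'y' sharding carries over to the first axis of the result.
print("Sharding spec:", dtensor.fetch_layout(c).sharding_specs)
```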
@@ -837,11 +840,11 @@
"id": "c-1NazCVmLWZ"
},
"source": [
"### DTensor as Output\n",
"### DTensor as output\n",
"\n",
"What about Python functions that do not take operands, but returns a Tensor result that can be sharded? Examples of such functions are\n",
"What about Python functions that do not take operands, but returns a Tensor result that can be sharded? Examples of such functions are:\n",
"\n",
" - `tf.ones`, `tf.zeros`, `tf.random.stateless_normal`,\n",
" - `tf.ones`, `tf.zeros`, `tf.random.stateless_normal`\n",
"\n",
"For these Python functions, DTensor provides `dtensor.call_with_layout` which eagerly executes a Python function with DTensor, and ensures that the returned Tensor is a DTensor with the requested `Layout`."
]
@@ -876,7 +879,7 @@
"source": [
"#### APIs that emit a single TensorFlow Op\n",
"\n",
"If a function emits a single TensorFlow Op, you can directly apply `dtensor.call_with_layout` to the function."
"If a function emits a single TensorFlow Op, you can directly apply `dtensor.call_with_layout` to the function:"
]
},
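The single-Op example cell is collapsed here. A sketch using `tf.ones`, matching the `shape=(3, 2)` example quoted earlier in this guide:

```python
ones_layout = dtensor.Layout(["x", "y"], mesh)

# tf.ones lowers to a single TensorFlow Op, so it can be passed directly.
ones = dtensor.call_with_layout(tf.ones, ones_layout, shape=(3, 2))
print(ones)
print("Sharding spec:", dtensor.fetch_layout(ones).sharding_specs)
```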
{
@@ -911,7 +914,7 @@
"source": [
"#### APIs that emit multiple TensorFlow Ops\n",
"\n",
"If the API emits multiple TensorFlow Ops, convert the function into a single Op through `tf.function`. For example `tf.random.stateleess_normal`"
"If the API emits multiple TensorFlow Ops, convert the function into a single Op through `tf.function`. For example, `tf.random.stateleess_normal`:"
]
},
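The multi-Op example cell is collapsed here. A sketch of wrapping `tf.random.stateless_normal` in `tf.function` before handing it to `dtensor.call_with_layout` (the shape and seed values are illustrative):

```python
layout = dtensor.Layout(["x", "y"], mesh)

# tf.random.stateless_normal emits several TensorFlow Ops, so wrap it in
# tf.function to hand call_with_layout a single callable graph.
random_dtensor = dtensor.call_with_layout(
    tf.function(tf.random.stateless_normal),
    layout,
    shape=(6, 4),
    seed=(1, 1))
print(random_dtensor)
```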
{
@@ -1030,7 +1033,7 @@
"id": "QxBdNHWSu-kV"
},
"source": [
"You can also assign a DTensor to a DVariable.\n"
"You can also assign a DTensor to a DVariable:\n"
]
},
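The DVariable cells are collapsed in this diff. A sketch of creating a `dtensor.DVariable` from a DTensor and assigning another DTensor with the same (here, fully replicated) layout; assigning one with an incompatible layout is what triggers the error described next:

```python
replicated_layout = dtensor.Layout([dtensor.UNSHARDED, dtensor.UNSHARDED], mesh)

# A DVariable's layout is fixed at construction time by its initial value.
v = dtensor.DVariable(
    initial_value=dtensor.call_with_layout(tf.ones, replicated_layout, shape=(4, 4)))

# Assigning a DTensor with a matching layout works.
v.assign(dtensor.call_with_layout(tf.zeros, replicated_layout, shape=(4, 4)))
print(v)
```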
{
@@ -1051,7 +1054,7 @@
"id": "4fvSk_VUvGnj"
},
"source": [
"Attempting to mutate the layout of a `DVariable`, by assigning a DTensor with an incompatible layout produces an error."
"Attempting to mutate the layout of a `DVariable`, by assigning a DTensor with an incompatible layout produces an error:"
]
},
{
@@ -1081,7 +1084,7 @@
"source": [
"## What's next?\n",
"\n",
"In this colab, you learned about DTensor, an extension to TensorFlow for distributed computing. To try out these concepts in a tutorial, see [Distributed training with DTensor](https://www.tensorflow.org/tutorials/distribute/dtensor_ml_tutorial)."
"In this colab, you learned about DTensor, an extension to TensorFlow for distributed computing. To try out these concepts in a tutorial, check out [Distributed training with DTensor](../tutorials/distribute/dtensor_ml_tutorial.ipynb)."
]
}
],