From 1ee41125aa143cf0a674387dca1c94b2c522602a Mon Sep 17 00:00:00 2001
From: The MediaPipe Authors
Date: Mon, 28 Aug 2023 14:37:07 -0700
Subject: [PATCH] Add a demo colab for customizing MediaPipe face stylizer.

PiperOrigin-RevId: 560821987
---
 examples/customization/face_stylizer.ipynb | 348 +++++++++++++++++++++
 1 file changed, 348 insertions(+)
 create mode 100644 examples/customization/face_stylizer.ipynb

diff --git a/examples/customization/face_stylizer.ipynb b/examples/customization/face_stylizer.ipynb
new file mode 100644
index 00000000..6f4cee6b
--- /dev/null
+++ b/examples/customization/face_stylizer.ipynb
@@ -0,0 +1,348 @@
+{
+ "cells": [
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "#@title License information\n",
+    "# Copyright 2023 The MediaPipe Authors.\n",
+    "#\n",
+    "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
+    "# you may not use this file except in compliance with the License.\n",
+    "# You may obtain a copy of the License at\n",
+    "#\n",
+    "# https://www.apache.org/licenses/LICENSE-2.0\n",
+    "#\n",
+    "# Unless required by applicable law or agreed to in writing, software\n",
+    "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
+    "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
+    "# See the License for the specific language governing permissions and\n",
+    "# limitations under the License."
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "The MediaPipe Model Maker package is a low-code solution for customizing on-device machine learning (ML) models.\n",
+    "\n",
+    "This notebook shows the end-to-end process of customizing a face stylizer model to transfer the style of a stylized reference face onto a raw human face."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "# Face stylizer model customization guide\n",
+    "\n",
+    "The MediaPipe face stylizer solution provides several models you can use immediately to transform faces in your application into predefined styles, such as cartoon or oil painting. However, if you need to apply a style that is not covered by the provided models, you can customize the pretrained model with your own data and MediaPipe [Model Maker](https://developers.google.com/mediapipe/solutions/model_maker). This model modification tool fine-tunes a portion of the model using data you provide. This method is faster than training a new model from scratch and can produce a model adapted to your specific application.\n",
+    "\n",
+    "The following sections show you how to use Model Maker to retrain a pre-built model for face stylization with your own data, which you can then use with the MediaPipe [Face Stylizer](https://developers.google.com/mediapipe/solutions/vision/face_stylizer). The example retrains the face stylizer model to adapt to the provided cartoon style."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## Setup\n",
+    "\n",
+    "This section describes key steps for setting up your development environment to retrain a model. These instructions describe how to update a model using [Google Colab](https://colab.research.google.com/), but you can also use Python in your own development environment. For general information on setting up your development environment for using MediaPipe, including platform version requirements, see the [Setup guide for Python](https://developers.google.com/mediapipe/solutions/setup_python).\n",
+    "\n",
+    "To install the libraries for customizing a model, run the following commands:"
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "!pip install mediapipe-model-maker"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "Use the following code to import the required Python classes:"
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "from google.colab import files\n",
+    "import os\n",
+    "import tensorflow as tf\n",
+    "assert tf.__version__.startswith('2')\n",
+    "\n",
+    "from mediapipe_model_maker import face_stylizer\n",
+    "from mediapipe_model_maker import image_utils\n",
+    "\n",
+    "import matplotlib.pyplot as plt"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## Prepare and review data\n",
+    "\n",
+    "Retraining the face stylizer model requires the user to provide a single\n",
+    "stylized face image. The stylized face is expected to be forward facing, with\n",
+    "the left and right eyes and the mouth visible. The face should only have minor\n",
+    "rotation, i.e. less than 30 degrees around the yaw, pitch, and roll axes.\n",
+    "\n",
+    "The following example shows how to provide a stylized face image and visualize\n",
+    "the image."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "style_image_path = 'color_sketch.jpg'\n",
+    "!wget -q -O {style_image_path} https://storage.googleapis.com/mediapipe-assets/face_stylizer_style_color_sketch.jpg"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "# @title Visualize the training image\n",
+    "import cv2\n",
+    "from google.colab.patches import cv2_imshow\n",
+    "\n",
+    "style_image_tensor = image_utils.load_image(style_image_path)\n",
+    "cv_image = cv2.cvtColor(style_image_tensor.numpy(), cv2.COLOR_RGB2BGR)\n",
+    "cv2_imshow(cv_image)"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "### Create dataset\n",
+    "\n",
+    "We need to create a dataset that wraps the single training image for training the face stylizer model.\n",
+    "\n",
+    "To create a dataset, use the `Dataset.from_image` method to load the image located at `style_image_path`. The face stylizer only supports a single style image."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "data = face_stylizer.Dataset.from_image(filename=style_image_path)"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## Retrain model\n",
+    "\n",
+    "Once you have completed preparing your data, you can begin retraining the face\n",
+    "stylizer model to adapt to the new style. This type of model modification is\n",
+    "called [transfer learning](https://www.wikipedia.org/wiki/Transfer_learning).\n",
+    "The instructions below use the data prepared in the previous section to retrain\n",
+    "a face stylizer model to apply a cartoon style to a raw human face.\n",
+    "\n",
+    "**Note:** For this type of model, the retraining process causes the model to\n",
+    "forget any style it could apply before. Once the retraining is complete, the\n",
+    "new model can *only* apply the new style defined by the new stylized image."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "### Set retraining options\n",
+    "\n",
+    "Aside from your training dataset, there are a few required settings for retraining:\n",
+    "\n",
+    "1. Model architecture: use the `SupportedModels` class to specify the model architecture. The only supported architecture is `face_stylizer.SupportedModels.BLAZE_FACE_STYLIZER_256`.\n",
+    "2. Swap layers: this parameter determines how to mix the latent code layers between the learned style and the raw face images. The latent code is represented as a tensor of shape [1, 12, 512]. The second dimension of the latent code tensor is called the layer. The face stylizer mixes the learned style and raw face images by generating a weighted sum of the two latent codes on the swap layers, so the swap layers are integer indices into this dimension. The more layers you include, the more style is applied to the output image. Although there is no explicit mapping between the style semantics and the layer index, the shallow layers, e.g. 8, 9, represent the global features of the face, while the deep layers, e.g. 10, 11, represent the fine-grained features of the face. The stylized output image is sensitive to the choice of swap layers. By default, it is set to [8, 9, 10, 11]. You can set it through `ModelOptions`.\n",
+    "3. Learning rate and epochs: Use the `HParams` attributes `learning_rate` and `epochs` to specify these two hyperparameters. `learning_rate` is set to 4e-4 by default. `epochs` defines the number of iterations to fine-tune the BlazeStyleGAN model and is set to 100 by default. The lower the learning rate, the more epochs are generally needed for the model to converge.\n",
+    "4. Batch size: Use the `HParams` attribute `batch_size` to specify it. The batch size defines the number of latent code samples drawn around the latent code that the encoder extracts from the input image. This batch of latent codes is used to fine-tune the decoder. A larger batch size usually yields better results, but it is limited by hardware memory. On an A100 GPU the maximum batch size is 8; on P100 and T4 GPUs the maximum batch size is 2.\n",
+    "\n",
+    "To set the required parameters, use the following code:"
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "face_stylizer_options = face_stylizer.FaceStylizerOptions(\n",
+    "    model=face_stylizer.SupportedModels.BLAZE_FACE_STYLIZER_256,\n",
+    "    model_options=face_stylizer.ModelOptions(swap_layers=[10, 11]),\n",
+    "    hparams=face_stylizer.HParams(\n",
+    "        learning_rate=8e-4, epochs=200, batch_size=2, export_dir=\"exported_model\"\n",
+    "    )\n",
+    ")"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
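+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "Retraining the face stylizer requires a GPU runtime. If you are running this notebook in Colab, select a GPU under **Runtime > Change runtime type**. The optional check below simply confirms that TensorFlow can see a GPU before you start the retraining step."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "# Optional sanity check: list the GPUs visible to TensorFlow.\n",
+    "gpus = tf.config.list_physical_devices('GPU')\n",
+    "print('GPUs visible to TensorFlow:', gpus)\n",
+    "if not gpus:\n",
+    "  print('No GPU found. Switch to a GPU runtime before retraining.')"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },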
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "### Run retraining\n",
+    "\n",
+    "With your training dataset and retraining options prepared, you are ready to start the retraining process. This process requires running on a GPU and can take a few minutes to a few hours, depending on your available compute resources. In a Google Colab environment with a GPU runtime, the example retraining below takes about 2 minutes.\n",
+    "\n",
+    "To begin the retraining process, use the `create()` method with the dataset and options you previously defined:"
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "face_stylizer_model = face_stylizer.FaceStylizer.create(\n",
+    "    train_data=data, options=face_stylizer_options\n",
+    ")"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "### Evaluate performance\n",
+    "\n",
+    "After retraining the model, you can perform a subjective evaluation on a test image.\n",
+    "\n",
+    "Load and visualize the test image as follows:"
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "test_image_path = 'test.png'\n",
+    "!wget -q -O {test_image_path} https://storage.googleapis.com/mediapipe-assets/face_stylizer_test_image.png\n",
+    "\n",
+    "test_image = image_utils.load_image(test_image_path)\n",
+    "test_cv_image = cv2.cvtColor(test_image.numpy(), cv2.COLOR_RGB2BGR)\n",
+    "cv2_imshow(test_cv_image)"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "To evaluate the retrained model, run it against the test image as follows:"
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "eval_dataset = face_stylizer.Dataset.from_image(filename=test_image_path)\n",
+    "eval_output = face_stylizer_model.stylize(eval_dataset)\n",
+    "eval_output_data = eval_output.gen_tf_dataset()\n",
+    "iterator = iter(eval_output_data)\n",
+    "\n",
+    "output_image = tf.squeeze(iterator.get_next()).numpy()\n",
+    "test_output_image = cv2.cvtColor(output_image, cv2.COLOR_RGB2BGR)\n",
+    "cv2_imshow(test_output_image)"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "**Caution:** While high accuracy is a common goal for machine learning models, you should be careful not to train to the point of [overfitting](https://en.wikipedia.org/wiki/Overfitting), which causes the model to perform extremely well on its training data but quite poorly on new data."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## Export model\n",
+    "\n",
+    "After retraining a model, you must export it to the TensorFlow Lite model format to use it with MediaPipe in your application. The export process generates the required model metadata.\n",
+    "\n",
+    "To export the retrained model for use in your application, use the following command:"
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "face_stylizer_model.export_model()"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "Use the following commands in Google Colab to list the exported files and download the model to your development environment. The `face_stylizer.task` file is a bundle of three TFLite models that are required to run the face stylizer task library."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "!ls exported_model\n",
+    "files.download('exported_model/face_stylizer.task')"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },
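+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "As an optional sanity check, you can try the exported `face_stylizer.task` bundle with the MediaPipe [Face Stylizer](https://developers.google.com/mediapipe/solutions/vision/face_stylizer) task. The cell below is only a sketch: it assumes the installed `mediapipe` package exposes the face stylizer task under `mediapipe.tasks.python.vision` (with `FaceStylizerOptions`, `FaceStylizer.create_from_options`, and `stylize`); check the Face Stylizer documentation for the exact API of your `mediapipe` version."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "# Sketch: run the exported bundle with the MediaPipe Face Stylizer task.\n",
+    "# The task API used here is an assumption; see the Face Stylizer docs for\n",
+    "# the exact interface of your mediapipe version.\n",
+    "import mediapipe as mp\n",
+    "from mediapipe.tasks import python\n",
+    "from mediapipe.tasks.python import vision\n",
+    "\n",
+    "base_options = python.BaseOptions(\n",
+    "    model_asset_path='exported_model/face_stylizer.task')\n",
+    "options = vision.FaceStylizerOptions(base_options=base_options)\n",
+    "\n",
+    "with vision.FaceStylizer.create_from_options(options) as stylizer:\n",
+    "  mp_image = mp.Image.create_from_file(test_image_path)\n",
+    "  stylized = stylizer.stylize(mp_image)\n",
+    "\n",
+    "# stylize() returns None if no face is detected in the input image.\n",
+    "if stylized is not None:\n",
+    "  cv2_imshow(cv2.cvtColor(stylized.numpy_view(), cv2.COLOR_RGB2BGR))"
+   ],
+   "outputs": [],
+   "execution_count": null
+  },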
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "## Model tuning\n",
+    "\n",
+    "You can use the MediaPipe Model Maker tool to further improve and adjust the model retraining with configuration options and performance techniques such as data quantization. *These steps are optional.* Model Maker uses reasonable default settings for all of the training configuration parameters, but if you want to further tune the model retraining, the instructions below describe the available options."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "### Retraining parameters\n",
+    "\n",
+    "You can further customize how the retraining process runs to adjust training time and potentially increase the retrained model's performance. *These parameters are optional*. Use the `FaceStylizerModelOptions` class and the `HParams` class to set these additional options, as shown in the example after the parameter descriptions below.\n",
+    "\n",
+    "Use the `FaceStylizerModelOptions` class parameters to customize the existing model. It has the following customizable parameters that affect model accuracy:\n",
+    "* `swap_layers`: A set of layer indices between 0 and the maximum layer index of the latent code layers. Since the latent code dimension of the default BLAZE_FACE_STYLIZER_256 is 12, the `swap_layers` must be between 0 and 11. This hyperparameter defines the layers at which the style latent code is interpolated with the raw face features. The more layers you choose, the more style features are applied. There is no explicit mapping between the layer index and semantic features. You are expected to tune the swap layers and evaluate the results subjectively to determine the best setting. The order does not matter.\n",
+    "* `alpha`: Weighting coefficient of the style latent code in the swap layer interpolation. Its valid range is [0, 1]. A greater weight means a stronger style is applied to the output image. Expect to set it to a small value, i.e. \u003c 0.1.\n",
+    "* `perception_loss_weight`: Weighting coefficients of the image perceptual quality loss. It contains three coefficients, `l1`, `content`, and `style`, which control the difference between the generated image and the raw input image, the content difference between the generated face and the raw input face, and how similar the style is between the generated image and the raw input image. You can increase the `style` weight to enforce a stronger style or the `content` weight to preserve more raw input face details.\n",
+    "* `adv_loss_weight`: Weighting coefficient of the adversarial loss versus the image perceptual quality loss. This hyperparameter controls the realism of the generated image. It expects a small value, i.e. \u003c 0.2.\n",
+    "\n",
+    "Use the `HParams` class to customize other parameters related to training and saving the model:\n",
+    "\n",
+    "* `learning_rate`: The learning rate to use for gradient descent training. Defaults to 8e-4.\n",
+    "* `batch_size`: Batch size for training. Defaults to 4.\n",
+    "* `epochs`: Number of training iterations over the dataset. Defaults to 100.\n",
+    "* `beta_1`: The `beta_1` parameter used in `tf.keras.optimizers.Adam`. Defaults to 0.0.\n",
+    "* `beta_2`: The `beta_2` parameter used in `tf.keras.optimizers.Adam`. Defaults to 0.99.\n",
+    "\n"
+   ]
+  },
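+  {
+   "metadata": {},
+   "cell_type": "markdown",
+   "source": [
+    "For example, the following configuration applies the style across more layers and sets the loss weights and optimizer parameters explicitly. The values are illustrative starting points rather than recommended settings; tune them against your own style image and judge the stylized results subjectively. The sketch assumes `ModelOptions` and `HParams` accept the parameters above as keyword arguments, as `swap_layers` and the earlier hyperparameters do; the `perception_loss_weight` coefficients can be adjusted in the same way."
+   ]
+  },
+  {
+   "metadata": {},
+   "cell_type": "code",
+   "source": [
+    "# Illustrative example of the optional parameters described above.\n",
+    "# The values below are starting points to experiment with, not\n",
+    "# recommended settings.\n",
+    "tuned_options = face_stylizer.FaceStylizerOptions(\n",
+    "    model=face_stylizer.SupportedModels.BLAZE_FACE_STYLIZER_256,\n",
+    "    model_options=face_stylizer.ModelOptions(\n",
+    "        swap_layers=[8, 9, 10, 11],  # apply the style across more layers\n",
+    "        alpha=0.05,                  # strength of the style latent code\n",
+    "        adv_loss_weight=0.1,         # realism of the generated image\n",
+    "    ),\n",
+    "    hparams=face_stylizer.HParams(\n",
+    "        learning_rate=8e-4,\n",
+    "        epochs=100,\n",
+    "        batch_size=2,\n",
+    "        beta_1=0.0,\n",
+    "        beta_2=0.99,\n",
+    "        export_dir='exported_model',\n",
+    "    ),\n",
+    ")\n",
+    "\n",
+    "# To retrain with these options, pass them to create() as before:\n",
+    "# face_stylizer.FaceStylizer.create(train_data=data, options=tuned_options)"
+   ],
+   "outputs": [],
+   "execution_count": null
+  }
+ ],
+ "metadata": {},
+ "nbformat": 4,
+ "nbformat_minor": 0
+}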