# Structure-emphasized Multimodal Style Transfer

This repository provides pre-trained models for you to generate your own images.

If you have any questions, please feel free to contact me. (English, Japanese, or Chinese are all fine!)



## Abstract
Neural style transfer refers to a class of algorithms that render an image with the characteristics of a style image while preserving the spatial arrangement of a content image.
Inspired by the power of convolutional neural networks (CNNs) for image recognition, Gatys et al. (2016) first studied how to use VGGNet to extract content and style representations and proposed an image-optimization-based method. Since then, neural style transfer has received increasing attention from both computer vision researchers and industry.
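
For reference, the style representation in this line of work is typically the Gram matrix of a VGG feature map. A minimal PyTorch sketch (the function name and normalization constant are our own illustrative choices, not code from this repository):

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Channel-wise correlation (Gram) matrix of a (B, C, H, W) feature map."""
    b, c, h, w = features.size()
    f = features.view(b, c, h * w)          # flatten the spatial dimensions
    gram = torch.bmm(f, f.transpose(1, 2))  # (B, C, C) channel correlations
    return gram / (c * h * w)               # normalize by feature map size
```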

Recently, significant effort has been devoted to ASPM (Arbitrary Style Per Model) methods, which aim to transfer arbitrary styles with a single model, especially those based on Gram matrix matching. However, most of them assume that an image's style can be represented as a unimodal distribution, so that style transfer reduces to matching a global statistic such as the Gram matrix or the covariance matrix. Zhang et al. (2019) first introduced a multimodal style transfer method, MST, which uses K-means to split the style image into style patterns and combines them with the content image via graph cut. However, because the feature space is high-dimensional and low-resolution, and because of the nature of the graph-cut method, MST cannot take the structural information of the content image into account, which may produce undesirable results.
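
To make "global statistic matching" concrete, the whitening and coloring transform (WCT) matches the covariance of content features to that of style features. A minimal sketch, assuming (C, N) feature matrices; the `eps` regularization and function name are our own choices, not this repository's implementation:

```python
import torch

def wct(fc: torch.Tensor, fs: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Whitening-and-coloring transform between (C, N) feature matrices:
    re-colors content features fc so their covariance matches that of fs."""
    # Center both feature sets
    mc, ms = fc.mean(1, keepdim=True), fs.mean(1, keepdim=True)
    fc_c, fs_c = fc - mc, fs - ms

    # Whiten the content features: E_c diag(d_c^{-1/2}) E_c^T fc_c
    cov_c = fc_c @ fc_c.t() / (fc_c.size(1) - 1) + eps * torch.eye(fc.size(0))
    dc, ec = torch.linalg.eigh(cov_c)
    whitened = ec @ torch.diag(dc.clamp_min(eps).rsqrt()) @ ec.t() @ fc_c

    # Color with the style statistics: E_s diag(d_s^{1/2}) E_s^T whitened + mean_s
    cov_s = fs_c @ fs_c.t() / (fs_c.size(1) - 1) + eps * torch.eye(fs.size(0))
    ds, es = torch.linalg.eigh(cov_s)
    return es @ torch.diag(ds.clamp_min(eps).sqrt()) @ es.t() @ whitened + ms
```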

In this work, we analyze the shortcomings of MST and propose Structure-Emphasized Multimodal Style Transfer (SEMST). Specifically, instead of working in VGGNet's high-dimensional feature space, we extract structural information in the CIELAB color space with a K-means-based clustering algorithm that automatically determines the optimal number of clusters from their areas. Next, we design an approach that flexibly matches content clusters to style clusters even when their numbers differ, using either automatic matching based on the cluster-center norm or manual matching specified by the user's demands and preferences. Finally, WCT is performed between the corresponding clusters, and the target image is generated by the decoder.
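
The clustering and matching steps might look like the following sketch. The `k_max` and `min_area` parameters, the shrink-k-until-all-areas-pass rule, and the modulo pairing for unequal cluster counts are all our illustrative assumptions, not necessarily the repository's exact logic:

```python
import numpy as np
from skimage import color
from sklearn.cluster import KMeans

def cluster_lab(image_rgb, k_max=5, min_area=0.05):
    """Cluster pixels in CIELAB space, shrinking k until every cluster
    covers at least `min_area` of the image (one plausible reading of
    'choosing k by cluster area'; the exact rule may differ)."""
    lab = color.rgb2lab(image_rgb).reshape(-1, 3)   # (H*W, 3) Lab pixels
    for k in range(k_max, 1, -1):
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(lab)
        areas = np.bincount(km.labels_, minlength=k) / lab.shape[0]
        if areas.min() >= min_area:                 # all clusters large enough
            break
    return km.labels_.reshape(image_rgb.shape[:2]), km.cluster_centers_

def match_by_center_norm(content_centers, style_centers):
    """Pair clusters by the rank of their center norms, cycling through the
    style clusters when the counts differ (the cycling rule is an assumption)."""
    c_order = np.argsort(np.linalg.norm(content_centers, axis=1))
    s_order = np.argsort(np.linalg.norm(style_centers, axis=1))
    return {int(c): int(s_order[i % len(s_order)])
            for i, c in enumerate(c_order)}
```

WCT, as sketched earlier, would then be applied between the VGG features of each matched content/style cluster pair before the decoder reconstructs the stylized image.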

Experiments confirm that the proposed method generates more beautiful and natural images thanks to explicit structure extraction and matching. Since it renders images in under one second and exposes various adjustable hyperparameters, the proposed method is both fast and highly flexible.

---
If you find this work useful, please cite it as follows in your paper. Thanks a lot.

```
@misc{Chen2020,
    author       = {Chen Chen},
    title        = {Structure-emphasized Multimodal Style Transfer},
    year         = {2020},
    month        = {2},
    publisher    = {GitHub},
    journal      = {GitHub repository},
    howpublished = {\url{https://github.com/irasin/Structure-emphasized-Multimodal-Style-Transfer}},
}
```
---


## Requirements

- Python 3.7+
## Results

Some results for different content images are shown here.
![image](https://github.com/irasin/Structure-emphasized-Multimodal-Style-Transfer/blob/master/result/5_6.png)
![image](https://github.com/irasin/Structure-emphasized-Multimodal-Style-Transfer/blob/master/result/5_7.png)
![image](https://github.com/irasin/Structure-emphasized-Multimodal-Style-Transfer/blob/master/result/5_8.png)
