Commit

Updating Demo Page!
robbizorg committed Oct 8, 2023
1 parent 7be8c0d commit d8ed176
Showing 3 changed files with 50 additions and 33 deletions.
73 changes: 45 additions & 28 deletions index.html
@@ -118,10 +118,10 @@

<html>
<head>
<title>This is my paper title</title>
<title>PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models</title>
<meta property="og:image" content="Path to my teaser.png"/> <!-- Facebook automatically scrapes this. Go to https://developers.facebook.com/tools/debug/ if you update and want to force Facebook to rescrape. -->
<meta property="og:title" content="Creative and Descriptive Paper Title." />
<meta property="og:description" content="Paper description." />
<meta property="og:title" content="PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models" />
<meta property="og:description" content="Proposes a latent diffusion model that modifies a voice given perceptual qualities." />

<!-- Get from Google Analytics -->
<!-- Global site tag (gtag.js) - Google Analytics -->
@@ -138,23 +138,28 @@
<body>
<br>
<center>
<span style="font-size:36px">Creative and Descriptive Paper Title</span>
<span style="font-size:36px">PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models</span>
<table align=center width=600px>
<table align=center width=600px>
<tr>
<td align=center width=100px>
<center>
<span style="font-size:24px"><a href="https://en.wikipedia.org/wiki/James_J._Gibson">First Author</a></span>
<span style="font-size:24px"><a href="">Robin Netzorg</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:24px"><a href="https://en.wikipedia.org/wiki/James_J._Gibson">Second Author</a></span>
<span style="font-size:24px"><a href="">Ajil Jalal</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:24px"><a href="https://en.wikipedia.org/wiki/James_J._Gibson">Third Author</a></span>
<span style="font-size:24px"><a href="">Luna McNulty</a></span>
</center>
</td>
<td align=center width=100px>
<center>
<span style="font-size:24px"><a href="">Gopala Anumanchipalli</a></span>
</center>
</td>
</tr>
@@ -166,11 +171,11 @@
<span style="font-size:24px"><a href=''>[Paper]</a></span>
</center>
</td>
<td align=center width=120px>
<!-- <td align=center width=120px>
<center>
<span style="font-size:24px"><a href='https://github.com/richzhang/webpage-template'>[GitHub]</a></span><br>
</center>
</td>
</td> -->
</tr>
</table>
</table>
@@ -181,18 +186,11 @@
<tr>
<td width=260px>
<center>
<img class="round" style="width:500px" src="./resources/teaser.png"/>
<img class="round" style="width:500px" src="./resources/figure.png"/>
</center>
</td>
</tr>
</table>
<table align=center width=850px>
<tr>
<td>
This was a template originally made for <a href="http://richzhang.github.io/colorization/">Colorful Image Colorization</a>. The code can be found in this <a href="https://github.com/richzhang/webpage-template">repository</a>.
</td>
</tr>
</table>
</center>

<hr>
@@ -201,13 +199,32 @@
<center><h1>Abstract</h1></center>
<tr>
<td>
This is my abstract.
Perceptual modification of voice is an elusive goal.
While non-experts can modify an image or sentence perceptually with available tools, it is not clear how to similarly modify speech along perceptual axes. Voice conversion does make it possible to convert one voice to another, but these modifications are handled by black-box models, and which perceptual qualities to modify and how to modify them remain unclear. Towards allowing greater perceptual control over voice, we introduce PerMod, a conditional latent diffusion model that takes in an input voice and a perceptual qualities vector, and produces a voice with the matching perceptual qualities. Unlike prior work, PerMod generates a new voice corresponding to perceptual modifications. Evaluating perceptual quality vectors with RMSE from both human and predicted labels, we demonstrate that PerMod produces voices with the desired perceptual qualities for typical voices, but performs poorly on atypical voices.
</td>
</tr>
</table>
<br>

<hr>
<table align=center width=850px>
<center><h1>Demo</h1></center>
<tr>
<th>Task</th>
<th>Model</th>
<th>Input Speech</th>
<th>Target Speech</th>
<th>Generated Speech</th>
</tr>

<tr>
<td>Typical-to-Typical</td>
<td>PerMod-Pretrained</td>
<td>VCTK 1</td>
<td>VCTK 2</td>
</tr>
</table>

<!-- <hr>
<center><h1>Talk</h1></center>
<p align="center">
<iframe width="660" height="395" src="https://www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen align="center"></iframe>
@@ -222,8 +239,8 @@
</center>
</tr>
</table>
<hr>

<hr> -->
<!--
<center><h1>Code</h1></center>
<table align=center width=420px>
@@ -258,24 +275,24 @@
<span style="font-size:28px">&nbsp;<a href='https://github.com/richzhang/webpage-template'>[GitHub]</a>
</center>
</span>
</table>
</table> -->
<br>
<hr>
<table align=center width=450px>
<!-- <table align=center width=450px>
<center><h1>Paper and Supplementary Material</h1></center>
<tr>
<td><a href=""><img class="layered-paper-big" style="height:175px" src="./resources/paper.png"/></a></td>
<td><span style="font-size:14pt">F. Author, S. Author, T. Author.<br>
<b>Creative and Descriptive Paper Title.</b><br>
In Conference, 20XX.<br>
(hosted on <a href="">ArXiv</a>)<br>
<!-- (<a href="./resources/camera-ready.pdf">camera ready</a>)<br> -->
<span style="font-size:4pt"><a href=""><br></a>
(<a href="./resources/camera-ready.pdf">camera ready</a>)<br>
<span style="font-size:4pt"><a href=""><br></a>
</span>
</td>
</tr>
</table>
<br>
<br> -->

<table align=center width=600px>
<tr>
@@ -293,12 +310,12 @@
<td width=400px>
<left>
<center><h1>Acknowledgements</h1></center>
This template was originally made by <a href="http://web.mit.edu/phillipi/">Phillip Isola</a> and <a href="http://richzhang.github.io/">Richard Zhang</a> for a <a href="http://richzhang.github.io/colorization/">colorful</a> ECCV project; the code can be found <a href="https://github.com/richzhang/webpage-template">here</a>.
This work was supported by the UC Noyce Initiative, Society of Hellman Fellows, NSF, NIH/NIDCD and the Schwab Innovation Fund.
</left>
</td>
</tr>
</table>

<!-- This template was originally made by <a href="http://web.mit.edu/phillipi/">Phillip Isola</a> and <a href="http://richzhang.github.io/">Richard Zhang</a> for a <a href="http://richzhang.github.io/colorization/">colorful</a> ECCV project; the code can be found <a href="https://github.com/richzhang/webpage-template">here</a>. -->
<br>
</body>
</html>
10 changes: 5 additions & 5 deletions resources/bibtex.txt
@@ -1,6 +1,6 @@
@inproceedings{author20XXtitle,
title={Please cite me},
author={Author, First and Author, Second and Author, Third},
booktitle={Conference},
year={20XX}
@inproceedings{netzorg2023permod,
title={PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models},
author={Netzorg, Robin and Jalal, Ajil and McNulty, Luna and Anumanchipalli, Gopala},
booktitle={Workshop on Automatic Speech Recognition and Understanding (ASRU)},
year={2023}
}
Binary file added resources/figure.png
