Updating Demo Page!

Berkeley-Speech-Group · Oct 8, 2023 · d8ed176
1 parent 7be8c0d

Showing 3 changed files with 50 additions and 33 deletions.
diff --git a/index.html b/index.html
@@ -118,10 +118,10 @@
 
 <html>
 <head>
-	<title>This is my paper title</title>
+	<title>PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models</title>
 	<meta property="og:image" content="Path to my teaser.png"/> <!-- Facebook automatically scrapes this. Go to https://developers.facebook.com/tools/debug/ if you update and want to force Facebook to rescrape. -->
-	<meta property="og:title" content="Creative and Descriptive Paper Title." />
-	<meta property="og:description" content="Paper description." />
+	<meta property="og:title" content="PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models" />
+	<meta property="og:description" content="Proposes a latent diffusion model that modifies a voice given perceptual qualities." />
 
 	<!-- Get from Google Analytics -->
 	<!-- Global site tag (gtag.js) - Google Analytics -->
@@ -138,23 +138,28 @@
 <body>
 	<br>
 	<center>
-		<span style="font-size:36px">Creative and Descriptive Paper Title</span>
+		<span style="font-size:36px">PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models</span>
 		<table align=center width=600px>
 			<table align=center width=600px>
 				<tr>
 					<td align=center width=100px>
 						<center>
-							<span style="font-size:24px"><a href="https://en.wikipedia.org/wiki/James_J._Gibson">First Author</a></span>
+							<span style="font-size:24px"><a href="">Robin Netzorg</a></span>
 						</center>
 					</td>
 					<td align=center width=100px>
 						<center>
-							<span style="font-size:24px"><a href="https://en.wikipedia.org/wiki/James_J._Gibson">Second Author</a></span>
+							<span style="font-size:24px"><a href="">Ajil Jalal</a></span>
 						</center>
 					</td>
 					<td align=center width=100px>
 						<center>
-							<span style="font-size:24px"><a href="https://en.wikipedia.org/wiki/James_J._Gibson">Third Author</a></span>
+							<span style="font-size:24px"><a href="">Luna McNulty</a></span>
+						</center>
+					</td>
+					<td align=center width=100px>
+						<center>
+							<span style="font-size:24px"><a href="">Gopala Anumanchipalli</a></span>
 						</center>
 					</td>
 				</tr>
@@ -166,11 +171,11 @@
 							<span style="font-size:24px"><a href=''>[Paper]</a></span>
 						</center>
 					</td>
-					<td align=center width=120px>
+					<!-- <td align=center width=120px>
 						<center>
 							<span style="font-size:24px"><a href='https://github.com/richzhang/webpage-template'>[GitHub]</a></span><br>
 						</center>
-					</td>
+					</td> -->
 				</tr>
 			</table>
 		</table>
@@ -181,18 +186,11 @@
 			<tr>
 				<td width=260px>
 					<center>
-						<img class="round" style="width:500px" src="./resources/teaser.png"/>
+						<img class="round" style="width:500px" src="./resources/figure.png"/>
 					</center>
 				</td>
 			</tr>
 		</table>
-		<table align=center width=850px>
-			<tr>
-				<td>
-					This was a template originally made for <a href="http://richzhang.github.io/colorization/">Colorful Image Colorization</a>. The code can be found in this <a href="https://github.com/richzhang/webpage-template">repository</a>.
-				</td>
-			</tr>
-		</table>
 	</center>
 
 	<hr>
@@ -201,13 +199,32 @@
 		<center><h1>Abstract</h1></center>
 		<tr>
 			<td>
-				This is my abstract.
+				Perceptual modification of voice is an elusive goal. 
+				While non-experts can modify an image or sentence perceptually with available tools, it is not clear how to similarly modify speech along perceptual axes. Voice conversion does make it possible to convert one voice to another, but these modifications are handled by black box models, and the specifics of what perceptual qualities to modify and how to modify them are unclear. Towards allowing greater perceptual control over voice, we introduce PerMod, a conditional latent diffusion model that takes in an input voice and a perceptual qualities vector, and produces a voice with the matching perceptual qualities. Unlike prior work, PerMod generates a new voice corresponding to perceptual modifications. Evaluating perceptual quality vectors with RMSE from both human and predicted labels, we demonstrate that PerMod produces voices with the desired perceptual qualities for typical voices, but performs poorly on atypical voices.
 			</td>
 		</tr>
 	</table>
 	<br>
 
-	<hr>
+	<table align=center width=850px>
+		<center><h1>Demo</h1></center>
+		<tr>
+			<th>Task</th>
+			<th>Model</th>
+			<th>Input Speech</th>
+			<th>Target Speech</th>
+			<th>Generated Speech</th>
+		</tr>
+
+		<tr>
+			<td>Typical-to-Typical</td>
+			<td>PerMod-Pretrained</td>
+			<td>VCTK 1</td>
+			<td>VCTK 2</td>
+		</tr>
+	</table>
+
+	<!-- <hr>
 	<center><h1>Talk</h1></center>
 	<p align="center">
 		<iframe width="660" height="395" src="https://www.youtube.com/embed/dQw4w9WgXcQ" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen align="center"></iframe>
@@ -222,8 +239,8 @@
 			</center>
 		</tr>
 	</table>
-	<hr>
-
+	<hr> -->
+<!-- 
 	<center><h1>Code</h1></center>
 
 	<table align=center width=420px>
@@ -258,24 +275,24 @@
 			<span style="font-size:28px">&nbsp;<a href='https://github.com/richzhang/webpage-template'>[GitHub]</a>
 			</center>
 		</span>
-	</table>
+	</table> -->
 	<br>
 	<hr>
-	<table align=center width=450px>
+	<!-- <table align=center width=450px>
 		<center><h1>Paper and Supplementary Material</h1></center>
 		<tr>
 			<td><a href=""><img class="layered-paper-big" style="height:175px" src="./resources/paper.png"/></a></td>
 			<td><span style="font-size:14pt">F. Author, S. Author, T. Author.<br>
 				<b>Creative and Descriptive Paper Title.</b><br>
 				In Conference, 20XX.<br>
 				(hosted on <a href="">ArXiv</a>)<br>
-				<!-- (<a href="./resources/camera-ready.pdf">camera ready</a>)<br> -->
-				<span style="font-size:4pt"><a href=""><br></a>
+				(<a href="./resources/camera-ready.pdf">camera ready</a>)<br>
+				 <span style="font-size:4pt"><a href=""><br></a> 
 				</span>
 			</td>
 		</tr>
 	</table>
-	<br>
+	<br> -->
 
 	<table align=center width=600px>
 		<tr>
@@ -293,12 +310,12 @@
 			<td width=400px>
 				<left>
 					<center><h1>Acknowledgements</h1></center>
-					This template was originally made by <a href="http://web.mit.edu/phillipi/">Phillip Isola</a> and <a href="http://richzhang.github.io/">Richard Zhang</a> for a <a href="http://richzhang.github.io/colorization/">colorful</a> ECCV project; the code can be found <a href="https://github.com/richzhang/webpage-template">here</a>.
+					This work was supported by the UC Noyce Initiative, Society of Hellman Fellows, NSF, NIH/NIDCD and the Schwab Innovation Fund.
 				</left>
 			</td>
 		</tr>
 	</table>
-
+	<!-- This template was originally made by <a href="http://web.mit.edu/phillipi/">Phillip Isola</a> and <a href="http://richzhang.github.io/">Richard Zhang</a> for a <a href="http://richzhang.github.io/colorization/">colorful</a> ECCV project; the code can be found <a href="https://github.com/richzhang/webpage-template">here</a>. -->
 <br>
 </body>
 </html>

diff --git a/resources/bibtex.txt b/resources/bibtex.txt
@@ -1,6 +1,6 @@
-@inproceedings{author20XXtitle,
-  title={Please cite me},
-  author={Author, First and Author, Second and Author, Third},
-  booktitle={Conference},
-  year={20XX}
+@inproceedings{netzorg2023permod,
+  title={PerMod: Perceptually Grounded Voice Modification with Latent Diffusion Models},
+  author={Netzorg, Robin and Jalal, Ajil and McNulty, Luna and Anumanchipalli, Gopala},
+  booktitle={Workshop on Automatic Speech Recognition and Understanding (ASRU)},
+  year={2023}
 }
diff --git a/resources/figure.png b/resources/figure.png