diff --git a/README.md b/README.md
index 5fd088a..44f2e7a 100644
--- a/README.md
+++ b/README.md
@@ -11,30 +11,29 @@
   <sup>*</sup> Equal Contribution
 </p>
 
-[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://compvis.github.io/CleanDIFT/)
-[![Paper](https://img.shields.io/badge/arXiv-PDF-b31b1b)](https://compvis.github.io/CleanDIFT/static/pdfs/cleandift.pdf)
+[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://compvis.github.io/cleandift/)
+[![Paper](https://img.shields.io/badge/arXiv-PDF-b31b1b)](https://compvis.github.io/cleandift/static/pdfs/cleandift.pdf)
 [![Weights](https://img.shields.io/badge/HuggingFace-Weights-orange)](https://huggingface.co/CompVis/cleandift)
 
-
-
 This repository contains the official implementation of the paper "CleanDIFT: Diffusion Features without Noise".
 
 We propose CleanDIFT, a novel method to extract noise-free, timestep-independent features by enabling diffusion models to work directly with clean input images. Our approach is efficient, training on a single GPU in just 30 minutes.
 
 ![teaser](./docs/static/images/teaser_fig.png)
 
-
 ## 🚀 Usage
+
 ### Setup
+
 Just clone the repo and install the requirements via `pip install -r requirements.txt`, then you're ready to go.
 
 ### Training
 
-In order to train a feature extractor on your own, you can run `python train.py`. The training script expects your data to be stored in `./data` with the following format: Single level directory with images named `filename.jpg` and corresponding json files `filename.json` that contain the key `caption`. 
+To train your own feature extractor, run `python train.py`. The training script expects your data in `./data` as a single-level directory of images named `filename.jpg`, each with a corresponding `filename.json` file that contains the key `caption`.
 
 ### Feature Extraction
 
-For feature extraction, please refer to one of the notebooks at [`notebooks`](https://github.com/CompVis/CleanDIFT/tree/main/notebooks). We demonstrate how to extract features and use them for semantic correspondence detection and depth prediction. 
+For feature extraction, please refer to the notebooks in [`notebooks`](https://github.com/CompVis/cleandift/tree/main/notebooks). They demonstrate how to extract features and use them for semantic correspondence detection and depth prediction.
 
 Our checkpoints are fully compatible with the `diffusers` library. If you already have a pipeline using SD 1.5 or SD 2.1 from `diffusers`, you can simply replace the U-Net state dict:
 
@@ -48,7 +47,6 @@ state_dict = load_file(ckpt_pth)
 unet.load_state_dict(state_dict, strict=True)
 ```
 
-
 ## 🎓 Citation
 
 If you use this codebase or otherwise found our work valuable, please cite our paper:
@@ -62,4 +60,4 @@ If you use this codebase or otherwise found our work valuable, please cite our p
   archivePrefix={arXiv},
   primaryClass={cs.CV}
 }
-```
\ No newline at end of file
+```
diff --git a/docs/index.html b/docs/index.html
index f68192e..0dcee60 100644
--- a/docs/index.html
+++ b/docs/index.html
@@ -6,18 +6,20 @@
   <!-- Meta tags for social media banners, these should be filled in appropriatly as they are your "business card" -->
   <!-- Replace the content tag with appropriate information -->
   <meta name="description" content="CleanDIFT enables extracting Noise-Free, Timestep-Independent Diffusion Features">
-  <meta property="og:title" content="CleanDIFT: Diffusion Features without Noise"/>
-  <meta property="og:description" content="CleanDIFT enables extracting Noise-Free, Timestep-Independent Diffusion Features"/>
-  <meta property="og:url" content="https://compvis.github.io/CleanDIFT"/>
+  <meta property="og:title" content="CleanDIFT: Diffusion Features without Noise" />
+  <meta property="og:description"
+    content="CleanDIFT enables extracting Noise-Free, Timestep-Independent Diffusion Features" />
+  <meta property="og:url" content="https://compvis.github.io/cleandift" />
 
-   <!-- Path to banner image, should be in the path listed below. Optimal dimenssions are 1200X630-->
-   <meta property="og:image" content="static/images/teaser_fig.png" />
-   <meta property="og:image:width" content="2700" />
-   <meta property="og:image:height" content="1324" />
+  <!-- Path to banner image, should be in the path listed below. Optimal dimensions are 1200X630-->
+  <meta property="og:image" content="static/images/teaser_fig.png" />
+  <meta property="og:image:width" content="2700" />
+  <meta property="og:image:height" content="1324" />
 
 
   <meta name="twitter:title" content="CleanDIFT: Diffusion Features without Noise">
-  <meta name="twitter:description" content="CleanDIFT enables extracting Noise-Free, Timestep-Independent Diffusion Features">
+  <meta name="twitter:description"
+    content="CleanDIFT enables extracting Noise-Free, Timestep-Independent Diffusion Features">
   <!-- Path to banner image, should be in the path listed below. Optimal dimenssions are 1200X600-->
   <meta name="twitter:image" content="static/images/teaser_fig.png">
   <meta name="twitter:card" content="CleanDIFT enables extracting Noise-Free, Timestep-Independent Diffusion Features">
@@ -25,10 +27,10 @@
   <meta name="keywords" content="Diffusion, Features, Noise-Free, Fine-Tuning">
   <meta name="viewport" content="width=device-width, initial-scale=1">
 
-  <link rel="icon" href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 32 32%22><text y=%221em%22 font-size=%2232%22>🧹</text></svg>">
+  <link rel="icon"
+    href="data:image/svg+xml,<svg xmlns=%22http://www.w3.org/2000/svg%22 viewBox=%220 0 32 32%22><text y=%221em%22 font-size=%2232%22>🧹</text></svg>">
   <title>CleanDIFT: Diffusion Features without Noise</title>
-  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro"
-  rel="stylesheet">
+  <link href="https://fonts.googleapis.com/css?family=Google+Sans|Noto+Sans|Castoro" rel="stylesheet">
 
 
   <link rel="stylesheet" href="static/css/bulma.min.css">
@@ -124,7 +126,7 @@ <h1 class="title is-1 publication-title">🧹 CleanDIFT: Diffusion Features with
               <div class="publication-links">
                 <!--  PDF link -->
                 <span class="link-block">
-                  <a href="https://compvis.github.io/CleanDIFT/static/pdfs/cleandift.pdf" target="_blank"
+                  <a href="https://compvis.github.io/cleandift/static/pdfs/cleandift.pdf" target="_blank"
                     class="external-link button is-normal is-rounded is-dark">
                     <span class="icon">
                       <i class="fas fa-file-pdf"></i>
@@ -136,7 +138,7 @@ <h1 class="title is-1 publication-title">🧹 CleanDIFT: Diffusion Features with
 
                 <!-- Github link -->
                 <span class="link-block">
-                  <a href="https://github.com/CompVis/CleanDIFT" target="_blank"
+                  <a href="https://github.com/CompVis/cleandift" target="_blank"
                     class="external-link button is-normal is-rounded is-dark">
                     <span class="icon">
                       <i class="fab fa-github"></i>
@@ -146,7 +148,7 @@ <h1 class="title is-1 publication-title">🧹 CleanDIFT: Diffusion Features with
                 </span>
 
                 <!-- ArXiv abstract Link -->
-                 <!--
+                <!--
                 <span class="link-block">
                   <a href="https://arxiv.org/abs/<ARXIV PAPER ID>" target="_blank"
                     class="external-link button is-normal is-rounded is-dark">
@@ -163,8 +165,8 @@ <h1 class="title is-1 publication-title">🧹 CleanDIFT: Diffusion Features with
                   <a href="https://huggingface.co/CompVis/cleandift" target="_blank"
                     class="external-link button is-normal is-rounded is-dark">
                     <span class="icon">
-                      <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" 
-                      alt="Hugging Face" style="height: 20px;">
+                      <img src="https://huggingface.co/front/assets/huggingface_logo-noborder.svg" alt="Hugging Face"
+                        style="height: 20px;">
                     </span>
                     <span>Weights</span>
                   </a>
@@ -187,23 +189,28 @@ <h1 class="title is-1 publication-title">🧹 CleanDIFT: Diffusion Features with
 
             <img src="static/images/teaser_fig.png" alt="" style="width: 100%; height: auto;">
           </div>
-          
-          </div>
-          
-          <div class="container is-max-desktop">
+
+        </div>
+
+        <div class="container is-max-desktop">
           <div class="columns is-centered has-text-centered">
             <div class="column is-four-fifths">
-          <div class="content has-text-justified">
-            <p style="margin-bottom: 20px; margin-top: 10px;">
-              <span style="font-weight: bold; font-size: 1.3em;">TL;DR:</span> Diffusion models learn powerful world representations that have proven valuable for tasks like semantic correspondence detection, depth estimation, semantic segmentation, and classification.
-              However, diffusion models require noisy input images, which destroys information and introduces the noise level as a hyperparameter that needs to be tuned for each task.
-              We propose a novel method to extract <span style="font-weight: bold;">noise-free, timestep-independent features</span> by enabling diffusion models to work directly with clean input images. Our approach is efficient, training on a single GPU in just 30 minutes.
-            </p>
+              <div class="content has-text-justified">
+                <p style="margin-bottom: 20px; margin-top: 10px;">
+                  <span style="font-weight: bold; font-size: 1.3em;">TL;DR:</span> Diffusion models learn powerful world
+                  representations that have proven valuable for tasks like semantic correspondence detection, depth
+                  estimation, semantic segmentation, and classification.
+                  However, diffusion models require noisy input images, which destroys information and introduces the
+                  noise level as a hyperparameter that needs to be tuned for each task.
+                  We propose a novel method to extract <span style="font-weight: bold;">noise-free, timestep-independent
+                    features</span> by enabling diffusion models to work directly with clean input images. Our approach
+                  is efficient, training on a single GPU in just 30 minutes.
+                </p>
 
+              </div>
+            </div>
           </div>
         </div>
-        </div>
-      </div>
       </div>
     </div>
   </section>
@@ -247,10 +254,10 @@ <h2 class="title is-3">Clean Features &rarr; Clean Predictions</h2>
           <div class="content">
 
             <p class="has-text-justified" style="font-size: 16px;">
-            We evaluate our features on a wide range of downstream tasks: unsupervised zero-shot semantic
-            correspondence, monocular depth estimation, semantic segmentation, and classification.
-            We compare our features against standard diffusion features, methods that combine diffusion features with
-            additional features, and non-diffusion-based approaches.
+              We evaluate our features on a wide range of downstream tasks: unsupervised zero-shot semantic
+              correspondence, monocular depth estimation, semantic segmentation, and classification.
+              We compare our features against standard diffusion features, methods that combine diffusion features with
+              additional features, and non-diffusion-based approaches.
             </p>
 
             <div class="container is-max-desktop">
@@ -262,17 +269,17 @@ <h2 class="title is-3">Clean Features &rarr; Clean Predictions</h2>
                       <h3 class="column-title small">Input Image</h3>
                     </div>
                     <div class="our-column depth">
-                        <h3 class="column-title small">Depth Estimation</h3>
+                      <h3 class="column-title small">Depth Estimation</h3>
                     </div>
                     <div class="our-column seg">
                       <h3 class="column-title small">Input Image</h3>
+                    </div>
+                    <div class="our-column seg">
+                      <h3 class="column-title small">Semantic Segmentation</h3>
+                    </div>
                   </div>
-                  <div class="our-column seg">
-                    <h3 class="column-title small">Semantic Segmentation</h3>
-                </div>
-                  </div>
-        
-        
+
+
                   <!-- First row: Image and Slider -->
                   <div class="our-container">
                     <div class="our-column depth">
@@ -302,7 +309,7 @@ <h3 class="column-title small">Semantic Segmentation</h3>
                       </div>
                     </div>
                   </div>
-        
+
                   <!-- Second row: Image and Slider -->
                   <div class="our-container">
                     <div class="our-column depth">
@@ -332,7 +339,7 @@ <h3 class="column-title small">Semantic Segmentation</h3>
                       </div>
                     </div>
                   </div>
-        
+
                   <!-- Third row: Image and Slider -->
                   <div class="our-container">
                     <div class="our-column depth">
@@ -365,56 +372,56 @@ <h3 class="column-title small">Semantic Segmentation</h3>
                 </div>
               </div>
             </div>
-        
-          <script>
-            $(window).on('load', function () {
-        
-              $(".first").twentytwenty({
-                before_label: 'SD',
-                after_label: 'Ours',
-                default_offset_pct: 0.75,
-              });
-              $(".second").twentytwenty({
-                before_label: 'SD',
-                after_label: 'Ours',
-                default_offset_pct: 0.35,
-              });
-              $(".third").twentytwenty({
-                before_label: 'SD',
-                after_label: 'Ours',
-                default_offset_pct: 0.5,
+
+            <script>
+              $(window).on('load', function () {
+
+                $(".first").twentytwenty({
+                  before_label: 'SD',
+                  after_label: 'Ours',
+                  default_offset_pct: 0.75,
+                });
+                $(".second").twentytwenty({
+                  before_label: 'SD',
+                  after_label: 'Ours',
+                  default_offset_pct: 0.35,
+                });
+                $(".third").twentytwenty({
+                  before_label: 'SD',
+                  after_label: 'Ours',
+                  default_offset_pct: 0.5,
+                });
               });
-            });
-          </script>
-        
-          <style>
-            .image-slider-row {
-              display: flex;
-              justify-content: space-between;
-              align-items: center;
-              margin-bottom: 20px;
-            }
-        
-            .input-image-container,
-            .slider-container {
-              flex: 1;
-              max-width: 48%;
-              /* Adjust this to control the width of the image/slider */
-            }
-        
-            .input-image-container img,
-            .slider-container img {
-              width: 100%;
-              height: auto;
-              display: block;
-              object-fit: cover;
-            }
-        
-            .twentytwenty-container {
-              width: 100%;
-              height: 100%;
-            }
-          </style>
+            </script>
+
+            <style>
+              .image-slider-row {
+                display: flex;
+                justify-content: space-between;
+                align-items: center;
+                margin-bottom: 20px;
+              }
+
+              .input-image-container,
+              .slider-container {
+                flex: 1;
+                max-width: 48%;
+                /* Adjust this to control the width of the image/slider */
+              }
+
+              .input-image-container img,
+              .slider-container img {
+                width: 100%;
+                height: auto;
+                display: block;
+                object-fit: cover;
+              }
+
+              .twentytwenty-container {
+                width: 100%;
+                height: 100%;
+              }
+            </style>
 
             <!--
             <img src="static/images/depth_pred.png" alt="">
@@ -422,14 +429,18 @@ <h3 class="column-title small">Semantic Segmentation</h3>
 
 
             <p class="has-text-justified" style="font-size: 16px;">
-              We compare Depth Estimation and Semantic Segmentation using linear probes on standard diffusion features and our CleanDIFT features.
+              We compare Depth Estimation and Semantic Segmentation using linear probes on standard diffusion features
+              and our CleanDIFT features.
               Note how the CleanDIFT features are far less noisy when compared to the standard diffusion features.
-              Depth probes are trained on NYUv2 dataset, Segmentation probes on PASCAL VOC. Standard diffusion features use t=100 for Semantic Segmentation and t=300 for depth prediction.
+              Depth probes are trained on the NYUv2 dataset, segmentation probes on PASCAL VOC. Standard diffusion features
+              use t=100 for semantic segmentation and t=300 for depth prediction.
             </p>
 
-            <img src="static/images/correspondences.png" alt="" style="max-width: 70%; height: auto; margin-bottom: 20px;">
+            <img src="static/images/correspondences.png" alt=""
+              style="max-width: 70%; height: auto; margin-bottom: 20px;">
             <p class="has-text-justified" style="font-size: 16px;">
-              Zero-Shot Semantic Correspondence matching using DIFT features with standard SD 2.1 (t=261) and our CleanDIFT
+              Zero-Shot Semantic Correspondence matching using DIFT features with standard SD 2.1 (t=261) and
+              our CleanDIFT
               features.
               Our clean features show significantly less incorrect matches than the standard diffusion features.
             </p>
@@ -442,7 +453,7 @@ <h3 class="column-title small">Semantic Segmentation</h3>
 
 
 
-  
+
 
 
   <section class="section hero is-light">
@@ -496,13 +507,16 @@ <h2 class="title is-3">Quantitative Comparison</h2>
 
 
             <h3 class="title">Zero-Shot Semantic Correspondence</h3>
-            <img src="static/images/correspondence_table.png" alt="" style="max-width: 50%; height: auto; margin-bottom: 20px;">
+            <img src="static/images/correspondence_table.png" alt=""
+              style="max-width: 50%; height: auto; margin-bottom: 20px;">
             <p class="has-text-justified" style="font-size: 16px;">
               Zero-shot unsupervised semantic correspondence matching performance comparison on SPair-71k. Our improved
-              features consistently lead to substantial improvements in matching performance. Numbers show our reproductions.
+              features consistently lead to substantial improvements in matching performance. Numbers show our
+              reproductions.
             </p>
 
-            <img src="static/images/correspondence_quantitative.png" alt="" style="max-width: 50%; height: auto; margin-bottom: 20px;">
+            <img src="static/images/correspondence_quantitative.png" alt=""
+              style="max-width: 50%; height: auto; margin-bottom: 20px;">
             <p class="has-text-justified" style="font-size: 16px;">
               We evaluate semantic correspondence matching accuracy for different noise levels. Our feature extractor
               outperforms the standard noisy diffusion features across all timesteps t.
@@ -518,7 +532,8 @@ <h3 class="title">Monocular Depth Estimation</h3>
               features can be reused for the clean features, but incur a smaller performance gain.
             </p>
             <h3 class="title">Semantic Segmentation</h3>
-            <img src="static/images/sem_seg_quantitative.png" alt="" style="max-width: 50%; height: auto; margin-bottom: 20px;">
+            <img src="static/images/sem_seg_quantitative.png" alt=""
+              style="max-width: 50%; height: auto; margin-bottom: 20px;">
 
             <p class="has-text-justified" style="font-size: 16px;">
               Performance on semantic segmentation for the PASCAL VOC dataset using linear probes. Our clean features