From 7436eafd66012962f3f0244a6d307c08471fac90 Mon Sep 17 00:00:00 2001 From: Malo OLIVIER Date: Thu, 2 Jan 2025 13:42:18 +0100 Subject: [PATCH] removed *.md --- GRU.md | 57 ---- LICENSE.md | 14 - README.md | 33 -- ablation_study.md | 317 ------------------ attention.md | 93 ------ ci_cd_ablation_study.md | 586 --------------------------------- generate_hnet_training_data.md | 48 --- rapport_221124.md | 69 ---- train_hnet.md | 74 ----- 9 files changed, 1291 deletions(-) delete mode 100644 GRU.md delete mode 100644 LICENSE.md delete mode 100644 README.md delete mode 100644 ablation_study.md delete mode 100644 attention.md delete mode 100644 ci_cd_ablation_study.md delete mode 100644 generate_hnet_training_data.md delete mode 100644 rapport_221124.md delete mode 100644 train_hnet.md diff --git a/GRU.md b/GRU.md deleted file mode 100644 index 51818b2..0000000 --- a/GRU.md +++ /dev/null @@ -1,57 +0,0 @@ -# Gated Recurrent Unit (GRU) - -A Gated Recurrent Unit (GRU) is a type of recurrent neural network (RNN) architecture designed to efficiently capture dependencies in sequential data. Introduced by Kyunghyun Cho et al. in 2014, GRUs address some of the limitations of traditional RNNs, particularly the issues of vanishing and exploding gradients, which hinder the learning of long-term dependencies. - -## Core Components of GRU - -A GRU cell comprises two primary gates: - -### Update Gate ($z_t$) - -- **Purpose**: Determines how much of the previous hidden state should be retained. -- **Functionality**: Balances the incorporation of new information with the preservation of existing knowledge. -- **Computation**: - \[ - z_t = \sigma(W_z \cdot [x_t, h_{t-1}]) - \] - Here, $\sigma$ represents the sigmoid activation function, $W_z$ is the weight matrix for the update gate, $x_t$ is the current input, and $h_{t-1}$ is the previous hidden state. - -### Reset Gate ($r_t$) - -- **Purpose**: Decides how much of the past information to forget. -- **Functionality**: Controls the extent to which the previous hidden state influences the candidate hidden state. -- **Computation**: - \[ - r_t = \sigma(W_r \cdot [x_t, h_{t-1}]) - \] - Similar to the update gate, $W_r$ is the weight matrix for the reset gate. - -### Candidate Hidden State ($\tilde{h}_t$) - -After determining the gates, the GRU computes a candidate hidden state that represents the new information to be potentially added to the model's memory. - -\[ -\tilde{h}_t = \tanh(W \cdot [x_t, (r_t * h_{t-1})]) -\] - -Here, $\tanh$ is the hyperbolic tangent activation function, and the reset gate $r_t$ modulates the influence of the previous hidden state $h_{t-1}$ on the candidate state. - -### Final Hidden State ($h_t$) - -The final hidden state, which will be passed to the next time step and potentially used for output, is a linear interpolation between the previous hidden state and the candidate hidden state, controlled by the update gate. - -\[ -h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t -\] - -This mechanism allows the GRU to retain information over long sequences selectively, mitigating the vanishing gradient problem and enabling the capture of more extended dependencies compared to traditional RNNs. - -## Advantages of GRUs - -- **Simplicity**: GRUs have a simpler architecture compared to other gated RNNs like Long Short-Term Memory (LSTM) networks, as they omit the output gate and have fewer parameters. -- **Efficiency**: Due to their streamlined structure, GRUs are computationally efficient and often faster to train. 
-- **Performance**: GRUs perform comparably to LSTMs on various tasks, making them a popular choice for applications involving sequential data, such as natural language processing, time-series forecasting, and speech recognition. - -## Summary - -In essence, GRUs enhance the ability of RNNs to model sequential data by incorporating gating mechanisms that control the flow of information. These gates enable the network to decide which information to retain and which to discard, allowing for effective learning of both short-term and long-term dependencies without the complexity inherent in other gated architectures like LSTMs. \ No newline at end of file diff --git a/LICENSE.md b/LICENSE.md deleted file mode 100644 index ec544fc..0000000 --- a/LICENSE.md +++ /dev/null @@ -1,14 +0,0 @@ ------------COPYRIGHT NOTICE STARTS WITH THIS LINE------------ Copyright (c) 2021 Tampere University and its licensors All rights reserved. - -Permission is hereby granted, without written agreement and without license or royalty fees, to use and copy the code for the deep Hungarian network method/architecture, present in the GitHub repository with the handle hungarian-net, (“Work”) described in the paper with title "Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers" and composed of files with code in the Python programming language. This grant is only for experimental and non-commercial purposes, provided that the copyright notice in its entirety appear in all copies of this Work, and the original source of this Work, Audio Research Group at Tampere University, is acknowledged in any publication that reports research using this Work. - -Any commercial use of the Work or any part thereof is strictly prohibited. Commercial use include, but is not limited to: - -selling or reproducing the Work -selling or distributing the results or content achieved by use of the Work -providing services by using the Work. -IN NO EVENT SHALL TAMPERE UNIVERSITY OR ITS LICENSORS BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OF THIS WORK AND ITS DOCUMENTATION, EVEN IF TAMPERE UNIVERSITY OR ITS LICENSORS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -TAMPERE UNIVERSITY AND ALL ITS LICENSORS SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE WORK PROVIDED HEREUNDER IS ON AN "AS IS" BASIS, AND THE TAMPERE UNIVERSITY HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS. - ------------COPYRIGHT NOTICE ENDS WITH THIS LINE------------ \ No newline at end of file diff --git a/README.md b/README.md deleted file mode 100644 index 7412f66..0000000 --- a/README.md +++ /dev/null @@ -1,33 +0,0 @@ -
-Python -PyTorch -Lightning -Config: hydra -Ray -Pytest -
- -Differentiable resolution of the assignment problem by a deep learning method. -
## Requirements
```pip install -r requirements.txt```

## Getting Started

* `generate_hnet_training_data.py` generates synthetic distance matrices and association matrices.
* Then run `run.py` to train HNet with the generated data.

---

> Sharath Adavanne*, Archontis Politis* and Tuomas Virtanen, "[Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers](https://arxiv.org/pdf/2111.00030.pdf)" in the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA 2021)

## License
The repository is licensed under the [TAU License](LICENSE.md).

diff --git a/ablation_study.md b/ablation_study.md
deleted file mode 100644
index e87d2ab..0000000
--- a/ablation_study.md
+++ /dev/null
@@ -1,317 +0,0 @@

Conducting an **ablation study** is an excellent way to understand the contribution of each component in your neural network model. An ablation study systematically removes or modifies parts of the model to observe the impact on performance, helping you identify which components are essential and which may be redundant or less impactful.

Below is a **step-by-step plan** to perform an ablation study on your `HNetGRU` model as defined in `train_hnet.py`.

---

### **1. Define the Objectives and Scope**

**Objective:**
- Determine the contribution of each component (e.g., GRU layers, Attention Layer, Fully Connected Layers) to the overall performance of the `HNetGRU` model.

**Scope:**
- Focus on major components such as GRU layers, Attention mechanisms, and any other significant architectural elements.
- Assess various hyperparameters if relevant (e.g., hidden size, number of GRU layers).

---

### **2. Establish a Baseline Model**

**Action:**
- **Use the existing `HNetGRU` model** as the baseline for comparison.
- Ensure that the baseline model includes all components: GRU layers, Attention Layer (if applicable), and Fully Connected Layers.

**Purpose:**
- Provides a reference point to measure the impact of removing or altering specific components.

**Code Example:**
```python
# Initialize the baseline model with all components
# (max_len and device are defined elsewhere in train_hnet.py)
baseline_model = HNetGRU(max_len=max_len).to(device)
baseline_model.load_state_dict(torch.load("data/hnet_model.pt", map_location=device))
baseline_model.eval()
```

---

### **3. Identify Components for Ablation**

**Components to Consider:**
1. **Attention Layer:**
   - Evaluate its impact by including and excluding it.
2. **GRU Layers:**
   - Vary the number of GRU layers (e.g., 1, 2, 3) to assess depth.
3. **Hidden Size:**
   - Test different hidden sizes (e.g., 32, 64, 128) to observe capacity effects.
4. **Fully Connected Layers:**
   - Modify or remove the FC layers to see their role in model performance.
5. **Activation Functions:**
   - Experiment with different activation functions (e.g., `tanh`, `ReLU`) after GRU layers.

**Purpose:**
- Determines which components significantly influence model accuracy and performance.

---

### **4. Develop Modified Models (Ablated Versions)**

**Action:**
- Create multiple versions of the `HNetGRU` model, each missing or modifying a specific component.

**Examples:**

1.
**Model Without Attention Layer:** - ```python - class HNetGRU_NoAttention(nn.Module): - def __init__(self, max_len=10, hidden_size = 128): - super().__init__() - self.nb_gru_layers = 1 - self.gru = nn.GRU(max_len, hidden_size, self.nb_gru_layers, batch_first=True) - # Attention layer removed - self.fc1 = nn.Linear(hidden_size, max_len) - - def forward(self, query): - out, _ = self.gru(query) - out = torch.tanh(out) - out = self.fc1(out) - out1 = out.view(out.shape[0], -1) - out2, _ = torch.max(out, dim=-1) - out3, _ = torch.max(out, dim=-2) - return out1.squeeze(), out2.squeeze(), out3.squeeze() - ``` - -2. **Model with Reduced GRU Layers:** - ```python - class HNetGRU_ReducedGRULayers(nn.Module): - def __init__(self, max_len=10, hidden_size = 128): - super().__init__() - self.nb_gru_layers = 1 # Reduced from 2 to 1 - self.gru = nn.GRU(max_len, hidden_size, self.nb_gru_layers, batch_first=True) - self.attn = AttentionLayer(hidden_size, hidden_size, hidden_size) - self.fc1 = nn.Linear(hidden_size, max_len) - - def forward(self, query): - out, _ = self.gru(query) - out = self.attn(out) - out = torch.tanh(out) - out = self.fc1(out) - out1 = out.view(out.shape[0], -1) - out2, _ = torch.max(out, dim=-1) - out3, _ = torch.max(out, dim=-2) - return out1.squeeze(), out2.squeeze(), out3.squeeze() - ``` - -3. **Model with Different Hidden Size:** - ```python - class HNetGRU_DifferentHiddenSize(nn.Module): - def __init__(self, max_len=10, hidden_size=64): - super().__init__() - self.nb_gru_layers = 1 - self.gru = nn.GRU(max_len, hidden_size, self.nb_gru_layers, batch_first=True) - self.attn = AttentionLayer(hidden_size, hidden_size, hidden_size) - self.fc1 = nn.Linear(hidden_size, max_len) - - def forward(self, query): - out, _ = self.gru(query) - out = self.attn(out) - out = torch.tanh(out) - out = self.fc1(out) - out1 = out.view(out.shape[0], -1) - out2, _ = torch.max(out, dim=-1) - out3, _ = torch.max(out, dim=-2) - return out1.squeeze(), out2.squeeze(), out3.squeeze() - ``` - -4. **Model Without Fully Connected Layer:** - ```python - class HNetGRU_NoFullyConnected(nn.Module): - def __init__(self, max_len=10, hidden_size = 128): - super().__init__() - self.nb_gru_layers = 1 - self.gru = nn.GRU(max_len, hidden_size, self.nb_gru_layers, batch_first=True) - self.attn = AttentionLayer(hidden_size, hidden_size, hidden_size) - # Fully connected layer removed - - def forward(self, query): - out, _ = self.gru(query) - out = self.attn(out) - out = torch.tanh(out) - # Skip FC layer - out1 = out.view(out.shape[0], -1) - out2, _ = torch.max(out, dim=-1) - out3, _ = torch.max(out, dim=-2) - return out1.squeeze(), out2.squeeze(), out3.squeeze() - ``` - -**Purpose:** -- Each modified model isolates the impact of a specific component, making it easier to attribute performance changes to that component. - ---- - -### **5. Prepare the Experimental Setup** - -**Action:** -- **Ensure Consistency:** Use the same training and evaluation datasets, hyperparameters, and training conditions across all models to ensure fair comparisons. -- **Set Random Seeds:** To ensure reproducibility, set random seeds for all relevant libraries (e.g., `torch`, `numpy`, `random`). - -**Code Example:** -```python -import random - -import numpy as np -import torch - - -def set_seed(seed=42): - torch.manual_seed(seed) - np.random.seed(seed) - random.seed(seed) - if torch.cuda.is_available(): - torch.cuda.manual_seed_all(seed) - -set_seed(42) -``` - ---- - -### **6. 
Train and Evaluate Each Model**

**Action:**
- **Training:**
  - Train each ablated model using the same training loop and hyperparameters as the baseline.
  - Ensure that each model is trained for the same number of epochs and under similar conditions.

- **Evaluation:**
  - After training, evaluate each model on the same validation or test set.
  - Collect relevant performance metrics such as weighted accuracy, F1-score, precision, recall, and loss values.

**Code Example:**
```python
def train_and_evaluate(model, train_loader, val_loader, criterion, optimizer, num_epochs=10):
    for epoch in range(num_epochs):
        model.train()
        for features, labels in train_loader:
            # Forward pass
            outputs = model(features)
            loss = criterion(outputs, labels)

            # Backward and optimize
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()

        # Evaluation
        model.eval()
        with torch.no_grad():
            total_loss = 0
            for features, labels in val_loader:
                outputs = model(features)
                loss = criterion(outputs, labels)
                total_loss += loss.item()

        avg_loss = total_loss / len(val_loader)
        print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {avg_loss:.4f}')

    # After training, evaluate on the test set
    # Compute metrics like weighted accuracy, F1-score, etc.
```

**Purpose:**
- Ensures that each model undergoes identical training and evaluation processes, making performance comparisons valid.

---

### **7. Analyze and Compare Results**

**Action:**
- **Compile Metrics:**
  - Create a table or dataset to record the performance metrics of the baseline and each ablated model (a compilation sketch follows this section).

- **Compare Performance:**
  - Identify which components' removal led to significant drops or improvements in performance.
  - Assess whether certain components are critical for maintaining high accuracy or if they contribute minimally.

**Example:**

| Model Version              | Weighted Accuracy | F1-Score | Precision | Recall | Notes                                         |
|----------------------------|-------------------|----------|-----------|--------|-----------------------------------------------|
| Baseline (All Components)  | 85.0%             | 0.80     | 0.82      | 0.78   |                                               |
| No Attention Layer         | 80.0%             | 0.75     | 0.78      | 0.72   | Attention significantly aids performance      |
| Reduced GRU Layers (1)     | 83.0%             | 0.78     | 0.80      | 0.75   | Slight performance drop                       |
| Reduced Hidden Size (64)   | 83.5%             | 0.78     | 0.81      | 0.75   | Halving the hidden size costs little accuracy |
| No Fully Connected Layer   | 75.0%             | 0.70     | 0.73      | 0.68   | FC layers crucial for mapping outputs         |

**Purpose:**
- Visualizes the impact of each component, facilitating informed decisions on model architecture optimizations.

---
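To make the comparison concrete, here is a minimal sketch of compiling the per-model metrics into a single table with pandas. The `results` records and their values are hypothetical placeholders; in practice they would be filled in from the evaluation loop above.

```python
import pandas as pd

# Hypothetical metrics collected after training each ablated variant
results = [
    {"model": "Baseline (All Components)", "weighted_accuracy": 0.850, "f1": 0.80},
    {"model": "No Attention Layer",        "weighted_accuracy": 0.800, "f1": 0.75},
    {"model": "Reduced GRU Layers (1)",    "weighted_accuracy": 0.830, "f1": 0.78},
    {"model": "Reduced Hidden Size (64)",  "weighted_accuracy": 0.835, "f1": 0.78},
]

df = pd.DataFrame(results).set_index("model")

# Subtracting the baseline row makes the per-component impact explicit
delta = df - df.loc["Baseline (All Components)"]
print(df.round(3))
print(delta.round(3))
```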
### **8. Interpret the Findings**

**Action:**
- **Identify Critical Components:**
  - Components whose removal causes significant degradation in performance are deemed essential.

- **Determine Non-Essential Components:**
  - Components whose removal has minimal or no impact might be candidates for simplification, potentially enhancing model efficiency without sacrificing accuracy.

- **Understand Trade-offs:**
  - Some components may offer benefits that go beyond the measured metrics, such as improved training stability or faster convergence.

**Purpose:**
- Provides actionable insights into model design, guiding future iterations and optimizations.

---

### **9. Document the Ablation Study**

**Action:**
- **Create a Detailed Report:**
  - Summarize the methodology, models tested, metrics collected, and the results.

- **Include Visualizations:**
  - Use graphs and charts to illustrate performance differences across models.

- **Provide Recommendations:**
  - Based on the findings, suggest which components to retain, modify, or remove in future model versions.

**Purpose:**
- Facilitates knowledge sharing within the team and informs decision-making for subsequent development phases.

---

### **10. Iterate Based on Insights**

**Action:**
- **Refine the Model:**
  - Incorporate the findings to enhance the model architecture, such as optimizing certain components or exploring alternative configurations.

- **Conduct Further Experiments:**
  - If necessary, perform additional ablation studies or experiments to explore other aspects of the model.

**Purpose:**
- Continuously improve the model, leveraging empirical evidence to drive enhancements and achieve better performance.

---

### **Additional Tips**

- **Maintain Clear Naming Conventions:**
  - Name each ablated model distinctly to avoid confusion during experiments and result analysis.

- **Automate Experiments:**
  - Use scripts or tools to automate the training and evaluation of different model versions, ensuring consistency and saving time.

- **Use Version Control:**
  - Track changes in your codebase using version control systems like Git, enabling easy rollback and comparison between different model versions.

- **Leverage Visualization Tools:**
  - Utilize tools like TensorBoard or Matplotlib to visualize training curves, loss distributions, and other relevant metrics for each model.

---

### **Conclusion**

Conducting an ablation study is a systematic approach to understanding the significance of each component in your `HNetGRU` model. By following this step-by-step plan, you can identify which parts of your model are crucial for achieving high performance and which may be simplified or optimized further. This process not only enhances model performance but also contributes to a deeper understanding of the underlying mechanisms driving your neural network's success.

diff --git a/attention.md b/attention.md
deleted file mode 100644
index af4aa3d..0000000
--- a/attention.md
+++ /dev/null
@@ -1,93 +0,0 @@

# The Attention Mechanism

The attention mechanism is a fundamental concept in modern neural network architectures, particularly in the fields of natural language processing (NLP), computer vision, and other areas involving sequential or relational data. Introduced to address specific limitations of traditional models, attention allows neural networks to dynamically focus on relevant parts of the input data, enhancing their ability to capture intricate dependencies and relationships. Here's a comprehensive overview of how attention fundamentally works and its significance in neural network models.

## 1. The Core Idea of Attention

At its essence, the attention mechanism enables a neural network to selectively concentrate on specific segments of the input data when making decisions or generating outputs. Instead of processing the entire input uniformly, attention allows the model to assign varying degrees of importance to different parts, ensuring that the most relevant information is emphasized while less crucial data is downplayed.
## 2. Motivation Behind Attention

Traditional neural network architectures, such as Recurrent Neural Networks (RNNs) and their variants (e.g., GRUs and LSTMs), process input sequences sequentially. While effective for capturing temporal dependencies, these models often struggle with long-range dependencies due to issues like vanishing gradients. Additionally, they treat all parts of the input with equal importance, which can be inefficient and less effective for complex tasks.

The attention mechanism was introduced to overcome these challenges by:

- **Enhancing Long-Range Dependency Capture**: By allowing the model to focus on relevant distant parts of the input, attention mitigates the limitations of RNNs in handling long sequences.
- **Improving Interpretability**: Attention provides insights into which parts of the input the model deems important, offering a window into the model's decision-making process.
- **Boosting Performance**: By emphasizing pertinent information, attention often leads to better performance in tasks like machine translation, text summarization, and image recognition.

## 3. How Attention Works: Key Components

The attention mechanism operates through the interplay of three fundamental components:

- **Queries (Q)**: Represent the current state or the element seeking information. In NLP, this could be the current word in a translation task.
- **Keys (K)**: Serve as identifiers for different parts of the input. Each key corresponds to a specific part of the input data.
- **Values (V)**: Contain the actual information or content associated with each key. When a query attends to a key, it retrieves the corresponding value.

The interaction between queries, keys, and values facilitates the attention process, allowing the model to weigh and integrate information effectively.

## 4. The Attention Process: Step-by-Step

Here's a simplified breakdown of how attention operates within a neural network:

### a. Computing Similarity Scores

For each query, the model computes a similarity score with every key. This score indicates how relevant each key (and its associated value) is to the current query. Common methods for computing similarity include:

- **Dot Product**: Takes the inner product of the query and key vectors; for normalized vectors this coincides with cosine similarity.
- **Scaled Dot Product**: Similar to the dot product but scaled by the square root of the key dimension to prevent extremely large values.
- **Additive Attention**: Applies a feed-forward network to the concatenated query and key vectors before computing the score.

### b. Generating Attention Weights

The raw similarity scores are then transformed into attention weights using a softmax function. This ensures that the weights are positive and sum up to one, effectively creating a probability distribution over the input elements. These weights determine the importance of each input part relative to the query.

\[
\text{Attention Weights}_i = \frac{\exp(\text{Score}_i)}{\sum_j \exp(\text{Score}_j)}
\]

### c. Combining Values

Each value vector is multiplied by its corresponding attention weight, and the results are summed to produce the attended output. This output is a weighted combination of the input values, emphasizing the most relevant information based on the query.

\[
\text{Output} = \sum_{i} (\text{Attention Weights}_i \times V_i)
\]
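Tying steps (a)-(c) together, here is a minimal, self-contained sketch of scaled dot-product attention in PyTorch; the function name and shapes are illustrative rather than taken from any particular codebase.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Q: (batch, n_queries, d_k), K: (batch, n_keys, d_k), V: (batch, n_keys, d_v)."""
    d_k = Q.size(-1)
    # (a) similarity scores between every query and every key, scaled by sqrt(d_k)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)  # (batch, n_queries, n_keys)
    # (b) softmax turns the scores into a probability distribution over the keys
    weights = torch.softmax(scores, dim=-1)            # each row sums to 1
    # (c) each output is a weighted combination of the values
    return weights @ V                                 # (batch, n_queries, d_v)

# Self-attention: queries, keys, and values all come from the same sequence
x = torch.randn(2, 5, 16)  # (batch=2, seq_len=5, d_model=16)
out = scaled_dot_product_attention(x, x, x)
print(out.shape)           # torch.Size([2, 5, 16])
```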
## 5. Types of Attention

Attention mechanisms come in various forms, each tailored to specific applications and requirements:

- **Soft Attention**: Differentiable and can be trained using standard gradient-based optimization. It considers all input elements with varying degrees of importance.
- **Hard Attention**: Non-differentiable and requires techniques like reinforcement learning for training. It selects specific input elements to focus on.
- **Self-Attention**: A form where the queries, keys, and values all come from the same source, allowing the model to relate different positions within a single sequence. This is a cornerstone of transformer architectures.
- **Multi-Head Attention**: Extends self-attention by allowing the model to attend to information from multiple representation subspaces at different positions. It enhances the model's ability to capture diverse relationships in the data.

## 6. Attention in Transformer Architecture

The Transformer model, introduced by Vaswani et al. in 2017, revolutionized the use of attention mechanisms. Unlike RNNs, Transformers rely entirely on attention for processing sequences, eliminating the need for recurrent structures. Key aspects include:

- **Encoder-Decoder Structure**: The Transformer consists of an encoder that processes the input and a decoder that generates the output, both utilizing multi-head self-attention mechanisms.
- **Layered Architecture**: Multiple layers of self-attention and feed-forward networks allow the model to capture complex relationships in the data.
- **Parallelization**: Attention mechanisms facilitate parallel processing of sequence elements, leading to significant speedups in training compared to sequential RNNs.

## 7. Benefits of Using Attention Mechanisms

- **Enhanced Expressiveness**: Attention allows models to dynamically focus on relevant parts of the input, making them more flexible and powerful in handling complex tasks.
- **Improved Long-Term Dependency Handling**: By directly modeling dependencies across disparate parts of the input, attention mitigates issues like vanishing gradients and enables effective learning of long-range relationships.
- **Better Interpretability**: Attention weights provide a transparent view of which input elements influence the output, aiding in model interpretability and trustworthiness.
- **Scalability**: Especially in Transformer models, attention mechanisms enable efficient parallel computation, making them suitable for large-scale tasks and datasets.

## 8. Practical Applications of Attention

Attention mechanisms have been pivotal in advancing various applications, including but not limited to:

- **Machine Translation**: Enhancing translation quality by effectively mapping words and phrases between languages.
- **Text Summarization**: Selecting and emphasizing key information to generate coherent and concise summaries.
- **Image Captioning**: Associating specific regions of an image with descriptive text.
- **Speech Recognition**: Focusing on relevant parts of audio signals to improve transcription accuracy.
- **Question Answering Systems**: Identifying pertinent information segments to provide accurate responses.

## 9. Conclusion

The attention mechanism fundamentally transforms how neural networks process data by introducing a dynamic, context-aware focus on input elements.
By enabling models to weigh the importance of different parts of the input, attention enhances their ability to capture complex dependencies, improve performance, and provide greater interpretability. Its integration into architectures like Transformers has set new standards in various domains, making it an indispensable tool in the arsenal of modern deep learning techniques. \ No newline at end of file diff --git a/ci_cd_ablation_study.md b/ci_cd_ablation_study.md deleted file mode 100644 index 1826f50..0000000 --- a/ci_cd_ablation_study.md +++ /dev/null @@ -1,586 +0,0 @@ -Integrating **Continuous Integration (CI)** and **Continuous Deployment (CD)** into your **ablation study** workflow enhances automation, ensures consistency, and streamlines the development-to-deployment pipeline. Below is a **comprehensive step-by-step guide** to implement CI/CD for your ablation study on HunNet using popular tools like **GitHub Actions**, **MLflow**, and **Docker**. - ---- - -## **1. Prerequisites** - -Before setting up CI/CD pipelines, ensure you have the following: - -- **Version Control:** Your project is hosted on a platform like **GitHub**. -- **MLOps Tooling:** **MLflow** is integrated for experiment tracking. -- **Containerization:** **Docker** is installed for creating consistent environments. -- **Access to CI/CD Tools:** Utilize **GitHub Actions** (integrated with GitHub) or other CI/CD platforms like **Jenkins**, **GitLab CI**, etc. - ---- - -## **2. Organize Your Repository** - -Ensure your repository is well-structured to support CI/CD. A typical structure might look like: - -``` -├── .github -│ └── workflows -│ └── ci-cd.yml -├── data -├── models -├── scripts -│ ├── train.py -│ └── ablation_study.py -├── Dockerfile -├── requirements.txt -├── mlflow_config.yaml -└── README.md -``` - ---- - -## **3. Containerize Your Application with Docker** - -Containerization ensures that your application runs consistently across different environments. - -### **a. Create a `Dockerfile`** - -```dockerfile -# Use an official Python runtime as a parent image -FROM python:3.9-slim - -# Set environment variables -ENV PYTHONDONTWRITEBYTECODE=1 -ENV PYTHONUNBUFFERED=1 - -# Set work directory -WORKDIR /app - -# Install dependencies -COPY requirements.txt . -RUN pip install --upgrade pip -RUN pip install -r requirements.txt - -# Copy project -COPY . . - -# Expose any ports if necessary (e.g., MLflow UI) -EXPOSE 5000 - -# Define the default command -CMD ["bash"] -``` - -### **b. Build and Test the Docker Image Locally** - -```bash -# Build the Docker image -docker build -t hunnet-ablation:latest . - -# Run the Docker container -docker run -it hunnet-ablation:latest -``` - ---- - -## **4. Set Up MLflow for Experiment Tracking** - -Ensure MLflow is properly configured to log experiments, models, and metrics. - -### **a. Configure MLflow Tracking URI** - -You can run an MLflow tracking server or use the local filesystem. - -```python -import mlflow - -# Set the tracking URI (local filesystem in this example) -mlflow.set_tracking_uri("file:///app/mlruns") -``` - -### **b. Update Training Scripts to Log with MLflow** - -```python -import mlflow -import mlflow.pytorch - - -def train_model(config): - with mlflow.start_run(): - # Log hyperparameters - mlflow.log_params(config['hyperparameters']) - - # Initialize and train your model - model = HunNetGRU(**config['model_params']) - # Training logic... 
        # Log metrics
        mlflow.log_metric("f1_score", f1_score)
        mlflow.log_metric("weighted_accuracy", weighted_accuracy)

        # Log the model
        mlflow.pytorch.log_model(model, "model")
```

---

## **5. Implement CI/CD with GitHub Actions**

GitHub Actions allows you to automate workflows directly from your GitHub repository.

### **a. Create a Workflow File**

Create a file at `.github/workflows/ci-cd.yml` with the following content:

```yaml
name: CI/CD Pipeline for HunNet Ablation Study

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.9'

      - name: Install Dependencies
        run: |
          pip install --upgrade pip
          pip install -r requirements.txt

      - name: Build Docker Image
        run: |
          docker build -t hunnet-ablation:latest .

      - name: Run Unit Tests
        run: |
          pytest tests/

      - name: Execute Ablation Study
        env:
          MLFLOW_TRACKING_URI: file:///app/mlruns
        run: |
          python scripts/ablation_study.py

      # Note: pushing requires a prior docker/login-action step with registry credentials (see section 11)
      - name: Push Docker Image to Docker Hub (Optional)
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        uses: docker/build-push-action@v5
        with:
          push: true
          tags: yourdockerhubusername/hunnet-ablation:latest

      - name: Deploy (Optional)
        if: github.ref == 'refs/heads/main' && github.event_name == 'push'
        run: |
          echo "Deploy steps go here"
```

### **b. Explanation of Workflow Steps**

1. **Triggering Events:**
   - The workflow runs on pushes and pull requests to the `main` branch.

2. **Jobs:**
   - **Build Job:**
     - **Checkout Repository:** Retrieves your code.
     - **Set up Python:** Configures the Python environment.
     - **Install Dependencies:** Installs required Python packages.
     - **Build Docker Image:** Builds the Docker container.
     - **Run Unit Tests:** Executes tests to ensure code integrity.
     - **Execute Ablation Study:** Runs your ablation study script, which includes training and evaluating models.
     - **Push Docker Image (Optional):** Pushes the Docker image to Docker Hub for deployment.
     - **Deploy (Optional):** Placeholder for deployment steps (e.g., deploying to a server or cloud platform).

---

## **6. Automate Model Versioning and Deployment**

Automate the versioning of models and their deployment using MLflow and CI/CD pipelines.

### **a. Versioning with MLflow**

MLflow automatically versions models when you log them. Ensure each run is uniquely identifiable.

### **b. Deployment Strategies**

Depending on your requirements, choose from the following deployment strategies:

- **Local Deployment:** Serve models locally using MLflow's built-in server.
- **Cloud Deployment:** Deploy models to cloud platforms like AWS SageMaker, Azure ML, or Google AI Platform.
- **Containerized Deployment:** Use Docker to create scalable and portable deployments.

### **c. Example: Deploying with MLflow and Docker**

1. **Serve the Model with MLflow:**

   ```bash
   mlflow models serve -m runs:/<RUN_ID>/model -p 5000
   ```

2. **Dockerize the MLflow Server:**

   ```dockerfile
   FROM python:3.9-slim

   WORKDIR /app

   RUN pip install mlflow

   EXPOSE 5000

   CMD ["mlflow", "models", "serve", "-m", "runs:/<RUN_ID>/model", "-p", "5000"]
   ```

3. **Build and Run the Docker Container:**

   ```bash
   docker build -t mlflow-server:latest .
   docker run -p 5000:5000 mlflow-server:latest
   ```

---
## **7. Implement Testing and Quality Assurance**

Ensure robustness and reliability through automated testing.

### **a. Write Unit and Integration Tests**

Use frameworks like **pytest** to write tests for your scripts.

```python
# tests/test_train.py

def test_train_model():
    config = {
        'hyperparameters': {
            'learning_rate': 0.001,
            'batch_size': 32
        },
        'model_params': {
            'hidden_size': 64,
            'nb_gru_layers': 2,
            'use_attention': True
        }
    }
    model, metrics = train_model(config)
    assert metrics['f1_score'] > 0.5
```

### **b. Integrate Tests into CI Pipeline**

Ensure tests run automatically in GitHub Actions as shown in the workflow file above.

---

## **8. Monitor and Visualize Experiments**

Use MLflow's UI or integrate with other visualization tools to monitor experiments.

### **a. Access MLflow UI**

Start the MLflow server to view your experiments.

```bash
mlflow ui
```

Access it at [http://localhost:5000](http://localhost:5000).

### **b. Integrate with Visualization Tools (Optional)**

For enhanced visualization, integrate MLflow with tools like **TensorBoard** or **Weights & Biases**.

---

## **9. Ensure Reproducibility**

Maintain consistency across experiments to ensure results can be replicated.

### **a. Set Random Seeds**

Ensure reproducibility by setting seeds in your scripts.

```python
import random

import numpy as np
import torch


def set_seed(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if torch.cuda.is_available():
        torch.cuda.manual_seed_all(seed)

set_seed(42)
```

### **b. Document Environment**

Use environment files and Docker for consistent environments.

- **`requirements.txt`:** List all dependencies.
- **Docker:** Encapsulates the environment.

---

## **10. Continuous Deployment (CD) for Model Updates**

Automate the deployment of new models as they become available.

### **a. Update Deployment Steps in CI/CD Workflow**

Enhance your GitHub Actions workflow to include deployment steps upon successful training and validation.

```yaml
# Add to .github/workflows/ci-cd.yml

- name: Deploy to Production
  if: github.ref == 'refs/heads/main' && success()
  run: |
    echo "Deploying the latest model to production..."
    # Add deployment commands, e.g., upload to a server, cloud platform, etc.
```

### **b. Use Infrastructure as Code (IaC)**

Manage your deployment infrastructure using tools like **Terraform** or **Ansible** for scalability and maintainability.

---

## **11. Security and Access Control**

Ensure that your CI/CD pipelines and MLOps tools are secure.

### **a. Manage Secrets**

Use GitHub Secrets to store sensitive information like API keys, Docker Hub credentials, etc.

- **Add Secrets:**
  - Navigate to your GitHub repository > Settings > Secrets > Actions.
  - Add necessary secrets (e.g., `DOCKER_USERNAME`, `DOCKER_PASSWORD`).

- **Use Secrets in Workflow:** Authenticate with a login step before pushing (`docker/build-push-action` does not read credentials from environment variables):

  ```yaml
  - name: Log in to Docker Hub
    uses: docker/login-action@v3
    with:
      username: ${{ secrets.DOCKER_USERNAME }}
      password: ${{ secrets.DOCKER_PASSWORD }}

  - name: Push Docker Image
    uses: docker/build-push-action@v5
    with:
      push: true
      tags: yourdockerhubusername/hunnet-ablation:latest
  ```

### **b. Limit Permissions**

Grant only necessary permissions to your CI/CD workflows and users to minimize security risks.

---

## **12. Documentation and Knowledge Sharing**

Maintain clear documentation to facilitate collaboration and future maintenance.
### **a. Update `README.md`**

Include instructions on setting up the environment, running experiments, and understanding the CI/CD pipeline.

### **b. Maintain Experiment Logs**

Leverage MLflow's logging capabilities to keep comprehensive records of each experiment.

---

## **13. Example Workflow Integration**

Here's how you can integrate all the above steps into your `train_hnet.py` and `ablation_study.py` scripts.

### **a. `train_hnet.py` Example Integration**

```python
import argparse

import mlflow
import mlflow.pytorch
import torch
from model import HunNetGRU

from data import get_data_loader

# set_seed, validate and load_config are assumed project helpers


def train(config):
    set_seed(config['seed'])
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

    # Initialize model
    model = HunNetGRU(**config['model_params']).to(device)

    # Define optimizer and loss
    optimizer = torch.optim.Adam(model.parameters(), lr=config['hyperparameters']['learning_rate'])
    criterion = torch.nn.BCELoss()

    # Data loaders
    train_loader, val_loader = get_data_loader(config['data'])

    best_f1 = 0.0
    best_model_path = "models/best_model.pth"

    with mlflow.start_run():
        # Log hyperparameters
        mlflow.log_params(config['hyperparameters'])
        mlflow.log_params(config['model_params'])

        for epoch in range(config['epochs']):
            model.train()
            for data, labels in train_loader:
                data, labels = data.to(device), labels.to(device)
                optimizer.zero_grad()
                outputs = model(data)
                loss = criterion(outputs, labels)
                loss.backward()
                optimizer.step()

            # Validation
            f1 = validate(model, val_loader, device)
            mlflow.log_metric("f1_score", f1, step=epoch)

            # Save best model
            if f1 > best_f1:
                best_f1 = f1
                torch.save(model.state_dict(), best_model_path)
                mlflow.log_metric("best_f1_score", best_f1, step=epoch)
                mlflow.pytorch.log_model(model, "best_model")

        mlflow.log_param("best_epoch", epoch)

def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--config', type=str, default='config.yaml')
    args = parser.parse_args()

    config = load_config(args.config)
    train(config)

if __name__ == "__main__":
    main()
```

### **b.
`ablation_study.py` Example Integration** - -```python -import argparse -import json - -import mlflow -import mlflow.pytorch -import torch -from model import HunNetGRU - -from data import get_data_loader - - -def ablation_experiment(config): - set_seed(config['seed']) - device = torch.device("cuda" if torch.cuda.is_available() else "cpu") - - # Initialize model based on ablation config - model = HunNetGRU(**config['model_params']).to(device) - - # Define optimizer and loss - optimizer = torch.optim.Adam(model.parameters(), lr=config['hyperparameters']['learning_rate']) - criterion = torch.nn.BCELoss() - - # Data loaders - train_loader, val_loader = get_data_loader(config['data']) - - best_f1 = 0.0 - best_model_path = f"models/best_model_{config['run_name']}.pth" - - with mlflow.start_run(run_name=config['run_name']): - # Log hyperparameters - mlflow.log_params(config['hyperparameters']) - mlflow.log_params(config['model_params']) - - for epoch in range(config['epochs']): - model.train() - for data, labels in train_loader: - data, labels = data.to(device), labels.to(device) - optimizer.zero_grad() - outputs = model(data) - loss = criterion(outputs, labels) - loss.backward() - optimizer.step() - - # Validation - f1 = validate(model, val_loader, device) - mlflow.log_metric("f1_score", f1, step=epoch) - - # Save best model - if f1 > best_f1: - best_f1 = f1 - torch.save(model.state_dict(), best_model_path) - mlflow.log_metric("best_f1_score", best_f1, step=epoch) - mlflow.pytorch.log_model(model, "best_model") - - mlflow.log_param("best_epoch", epoch) - -def main(): - parser = argparse.ArgumentParser() - parser.add_argument('--config', type=str, default='ablation_configs.json') - args = parser.parse_args() - - with open(args.config, 'r') as f: - ablation_configs = json.load(f) - - for config in ablation_configs: - ablation_experiment(config) - -if __name__ == "__main__": - main() -``` - ---- - -## **14. Additional Tips** - -- **Parallel Testing:** Configure your CI/CD pipeline to run multiple ablation experiments in parallel to save time. -- **Resource Management:** Monitor resource usage (CPU, GPU, memory) to optimize training processes. -- **Automated Notifications:** Integrate notifications (e.g., Slack, email) to receive updates on pipeline status. -- **Scalability:** Use cloud-based CI/CD runners or Kubernetes for handling large-scale experiments. -- **Logging and Monitoring:** Implement comprehensive logging within your scripts to facilitate debugging and performance analysis. - ---- - -## **15. Conclusion** - -Implementing CI/CD in your ablation study workflow for HunNet enhances automation, ensures consistency, and facilitates efficient experiment tracking and deployment. By leveraging tools like **GitHub Actions**, **MLflow**, and **Docker**, you can create a robust pipeline that supports continuous integration and seamless deployment of your machine learning models. - -This structured approach not only streamlines your development process but also fosters reproducibility and scalability, essential for advanced MLOps practices. - ---- - diff --git a/generate_hnet_training_data.md b/generate_hnet_training_data.md deleted file mode 100644 index 2464a1a..0000000 --- a/generate_hnet_training_data.md +++ /dev/null @@ -1,48 +0,0 @@ -# Overview - -The `generate_hnet_training_data.py` script is a comprehensive data generation tool designed to create synthetic datasets for training and testing a neural network model named HNetGRU. 
This model is aimed at associating directions of arrival (DOAs) in various applications, such as signal processing or audio source localization. Here's a detailed, non-technical overview of how the script functions and its intended purpose:

## Purpose of the Script

The primary objective of this script is to generate synthetic training and testing data that the HNetGRU model can learn from and be evaluated against. By creating a diverse set of data points with known associations, the script ensures that the model is well-equipped to recognize and predict associations in real-world scenarios. This synthetic data aids in training the model to accurately map predicted DOAs to their corresponding reference DOAs, enhancing its predictive capabilities.

## Key Functionalities

### Setting Up the Environment and Parameters

- **Libraries and Functions**: The script utilizes essential Python libraries such as NumPy for numerical operations, SciPy for advanced mathematical computations, and Pickle for data serialization. It also defines utility functions like `sph2cart` to convert spherical coordinates (azimuth and elevation angles) to Cartesian coordinates, which are easier for the model to process.
- **Configuration Parameters**: Key parameters like `max_doas` (maximum number of DOAs) and `sample_range` (number of samples to generate for each DOA combination) are initialized. These parameters dictate the breadth and depth of the generated dataset.
`max_doas` specifies the number of sound sources the Hungarian Net should be able to process (regardless of how well the _main_ neural network is able to predict DoAs from multiple sources).
`sample_range` specifies how many samples should be generated for each combination of reference and predicted DOAs. This parameter helps in controlling the size of the dataset and the diversity of the training and testing data.

### Generating Training and Testing Data

- **Iterating Over Angular Resolutions**: The script loops through various angular resolutions (e.g., 1°, 2°, up to 30°) to create data at different levels of granularity. This ensures that the model is exposed to data with varying degrees of precision.
- **Combining Reference and Predicted DOAs**: For each resolution, the script varies the number of reference DOAs (`nb_ref`) and predicted DOAs (`nb_pred`). It systematically generates combinations where the number of reference and predicted DOAs ranges from 0 up to the defined maximum (`max_doas`). This exhaustive approach ensures that the model encounters a wide array of association scenarios during training.

### Creating Synthetic DOA Data

- **Random Angle Generation**: For each combination of reference and predicted DOAs, the script randomly samples azimuth and elevation angles within specified ranges. These angles represent the directional beams in applications like audio source localization.
- **Handling Edge Cases**: To maintain data consistency and prevent unrealistic scenarios, the script initializes some DOAs to a fixed value (e.g., 10) under certain conditions. This strategy helps the model handle cases where predictions might initially be random or inaccurate, promoting robustness in learning.

### Converting to Cartesian Coordinates

- **Spherical to Cartesian Conversion**: The randomly generated spherical coordinates (azimuth and elevation) are converted into Cartesian coordinates using the `sph2cart` function.
This transformation simplifies the computation of distances between different DOAs, making it easier for the model to process spatial relationships.

### Computing Distance and Association Matrices

- **Distance Matrix Calculation**: The script computes a distance matrix using the Minkowski distance metric (specifically, the Euclidean distance with p=2). This matrix quantifies the spatial distances between every pair of reference and predicted DOAs.
- **Optimal Association via Hungarian Algorithm**: To establish the most accurate associations between reference and predicted DOAs, the script employs the Hungarian Algorithm (implemented via `linear_sum_assignment`). This algorithm finds the optimal pairing that minimizes the total distance, ensuring that each predicted DOA is matched to the most appropriate reference DOA.

### Shuffling for Data Diversity

- **Randomizing Data Order**: To prevent the model from learning any spurious patterns based on the order of the data, the script randomly shuffles the rows or columns of both the distance matrix and the association matrix. This randomization enhances the model's ability to generalize by ensuring that it doesn't become biased towards any specific ordering of DOAs.

### Storing and Saving the Data

- **Data Structuring**: For each generated sample, the script stores relevant information, including the number of reference and predicted DOAs, the shuffled distance and association matrices, and the Cartesian coordinates of both reference and predicted DOAs.
- **Serialization and Saving**: Once a substantial number of samples are generated, the script serializes the data using Pickle and saves it to designated files (`hung_data_train.pkl` for training and `hung_data_test.pkl` for testing). These files serve as the input datasets for training and evaluating the HNetGRU model.

### Output

Supervised training of HungarianNet is thus possible with pairs (feat, labels) = (distance matrix D, association matrix A*), where A* is obtained deterministically by running the Hungarian algorithm on the distance matrix D. A condensed sketch of this pipeline is shown below.
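The core of this (feat, labels) pipeline can be sketched in a few lines. This is a minimal illustration using SciPy: the names mirror the script's utilities where known (`sph2cart`, `linear_sum_assignment`), while the sampling ranges and sample sizes are arbitrary.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial import distance

def sph2cart(azimuth, elevation, r=1.0):
    """Convert spherical angles (radians) to Cartesian coordinates on a sphere of radius r."""
    return np.stack([r * np.cos(elevation) * np.cos(azimuth),
                     r * np.cos(elevation) * np.sin(azimuth),
                     r * np.sin(elevation)], axis=-1)

rng = np.random.default_rng(0)
nb_ref, nb_pred = 4, 3  # arbitrary counts of reference and predicted DOAs
ref = sph2cart(rng.uniform(-np.pi, np.pi, nb_ref), rng.uniform(-np.pi / 2, np.pi / 2, nb_ref))
pred = sph2cart(rng.uniform(-np.pi, np.pi, nb_pred), rng.uniform(-np.pi / 2, np.pi / 2, nb_pred))

# Feature: pairwise distance matrix D (Minkowski with p=2, i.e. Euclidean)
dist_mat = distance.cdist(ref, pred, metric="minkowski", p=2)

# Label: association matrix A* from the Hungarian algorithm
act = np.zeros_like(dist_mat)
row_ind, col_ind = linear_sum_assignment(dist_mat)
act[row_ind, col_ind] = 1

print(dist_mat.shape, int(act.sum()))  # (4, 3) and 3 matched pairs
```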
diff --git a/rapport_221124.md b/rapport_221124.md
deleted file mode 100644
index 37b8584..0000000
--- a/rapport_221124.md
+++ /dev/null
@@ -1,69 +0,0 @@

## What did I do this week?

### Literature review:
- Read E. GRINSTEIN 2023 - *The Neural-SRP method for universal robust multi-source tracking*
- Read M. SAAD AYUB 2023 - *Disambiguation of measurements for multiple acoustic source localization using deep multi-dimensional assignments*
- Read S. ADAVANNE 2021 - *Differentiable Tracking-Based Training of Deep Learning Sound Source Localizers*
- Read Y. XU 2020 - *How To Train Your Deep Multi-Object Tracker*

### Model analysis:
The **HungarianNet** model designed by ADAVANNE is a simplified architecture - and I think they ran an ablation study - of XU's **Deep Hungarian Net (DHN)** model.

**HunNet** is an architecture with three hidden layers:
- A GRU layer with 128 neurons
- A single-head self-attention layer with 128 hidden neurons
- A fully connected layer of size F

Each of the three outputs has a BCE loss. A linear combination of these three losses is backpropagated for the gradient descent on this network's weights. The learned task is a binary classification task.

The **DHN** model, by contrast, works with Bi-RNN networks and a focal loss. Its learned task is a 2D binary classification task.

### Questions and observations:
How was ADAVANNE's ablation study conducted to arrive at a model so different from XU's, which he claims as his inspiration?

XU claims that his model performs evenly across the two classes (0 and 1).

### Getting started with HungarianNet (ADAVANNE's version):
- Analyzed and understood the data generation steps, the network training, and the results
- Analyzed how HungarianNet works and how it was designed
- Moved from operating on 2 sources to 10 sources (more is possible)

### Data generation:
How does the data generation work? See the document `generate_hnet_training_data.md`.

Why generate combinations (`('nb_ref', 'nb_pred') = [(0, 0), (0, 1), (1, 0) ... (max_doas, max_doas)]`) for the supervised training of HunNet?

I also noticed that ADAVANNE's data generation algorithm produces A* labels with more occurrences of class 0 than of class 1 when I want to train in a 10-DoA setting.
- This is a class imbalance. But for 10 correctly predicted ground-truth DoAs, there are 10 ones in a 10x10 matrix, hence 90 zeros.
- And since HunNet is trained to reproduce the assignment-problem resolution of the Hungarian algorithm,
- in the vast majority of cases (99%), HunNet solves the assignment problem (since it has never seen a matrix with 100 ones).

### Training HunNet:
How does HunNet training work? See the document `train_hnet.md`.

What was ADAVANNE's intuition in designing his HunNet?

### Next step:
The next step is the ablation study of HunNet. It is above all a problem of organizing, managing, and tracking the study, because:
- Experiments need to be tracked.
- Models need to be versioned.
- The reproducibility of the experiments must be guaranteed.

To that end, we could take advantage of MLOps tools:
- Leverage MLflow, which is nearly identical to the TensorBoard visualization tool (Vibravox retraining program project).
- Available tools: MLflow, Weights & Biases, TensorBoard, Kubeflow, CometML.
- Integrating CI/CD into the ablation study and into the design of BeamLearning v2 could be considered (as Julien did): a workflow automating tests, guaranteeing model consistency and code integrity, and building a development-to-deployment pipeline for better efficiency.
- Containerize the training environment with Docker for better environment consistency, reproducibility, dependency isolation, integration with MLOps tools, and collaboration by sharing Docker images.
- Leverage the fast.ai, TorchScript, and PyTorch Lightning frameworks - DataParallel with PyTorch Lightning.
- Leverage tools such as SigOpt, Ray Tune, and W&B for hyperparameter search.

### Planned leverage: MLOps
See the documents `ablation_study.md` and `ci_cd_ablation_study.md`.
**Train of thought:**
1. Run an ablation study on HungarianNet, which is a small model.
2. An ablation study requires removing components from a model in order to understand how it works, with the goal of optimizing its performance.
3. It is therefore necessary to implement, train, and test several models.
4. It is thus also necessary to stay organized, to log, and to record things so as not to get lost, i.e., store the models and their metadata in a database (how were these models trained?) (F-score, epochs trained, inference time on CPU...).
5. Unit and integration tests must be written to guarantee the robustness of the model(s).
6. MLOps (DevOps for Machine Learning) can be a very powerful tool for this task and for the design of BeamLearning v2.

diff --git a/train_hnet.md b/train_hnet.md
deleted file mode 100644
index 6a9e450..0000000
--- a/train_hnet.md
+++ /dev/null
@@ -1,74 +0,0 @@

# Overview and Purpose

The `train_hnet.py` script is a comprehensive tool designed to train a neural network model, specifically the HNetGRU, for effectively associating Directions of Arrival (DOAs) in various applications such as signal processing and audio source localization. This script orchestrates the entire training workflow, from data preparation to model evaluation, ensuring that the neural network learns to accurately map predicted DOAs to their corresponding reference DOAs. Below is a detailed, non-technical overview of how this script functions and its intended purpose.

## 1. Overview and Purpose

The primary objective of `train_hnet.py` is to train the HNetGRU neural network model using synthetic datasets generated for associating DOAs. This model is engineered to handle complex spatial data, enabling it to discern and associate directional information effectively. The training process ensures that the model can generalize well to real-world scenarios, delivering reliable performance in tasks like audio source localization.

## 2. Data Preparation

### a. Dataset Loading

- **Training and Validation Datasets**: The script utilizes a custom `HungarianDataset` class to load both training and validation datasets. These datasets contain pre-generated samples that include reference and predicted DOAs, along with their associated distance and association matrices.
- **Weight Calculations**: For the training dataset, the script computes weights that balance the contribution of different parts of the loss function. These weights help the model prioritize learning from more significant data points, enhancing the overall training efficiency and effectiveness.

### b. DataLoader Configuration

- **Batch Processing**: Using PyTorch's `DataLoader`, the script organizes the data into manageable batches (`batch_size = 256`). Batch processing accelerates training by leveraging parallel computations, especially when utilizing GPUs.
- **Shuffling and Dropping Incomplete Batches**: The training data is shuffled to ensure that each batch is diverse, preventing the model from learning any order-based patterns. Incomplete batches are dropped to maintain consistency across training iterations. A loader sketch is shown below.
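The loader configuration just described can be sketched as follows. This is a minimal illustration: `HungarianDataset` is the project's own class, and the constructor arguments shown here are assumptions rather than its actual signature.

```python
from torch.utils.data import DataLoader

# Assumption: HungarianDataset is the project's Dataset subclass yielding
# (distance_matrix, association_targets) samples; its constructor may differ.
train_dataset = HungarianDataset(train=True)

train_loader = DataLoader(
    train_dataset,
    batch_size=256,   # batch size used by train_hnet.py
    shuffle=True,     # diversify batches, avoid order-based patterns
    drop_last=True,   # drop incomplete batches for consistent shapes
)
```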
## 3. Model Architecture

### a. Attention Mechanism

- **AttentionLayer Class**: The `AttentionLayer` is a custom neural network module that enables the model to focus on relevant parts of the data. It uses convolutional layers to generate query, key, and value representations, facilitating the attention mechanism that enhances the model's ability to process spatial relationships effectively.

### b. GRU Integration

- **HNetGRU Class**: Building upon the `AttentionLayer`, the `HNetGRU` class incorporates a Gated Recurrent Unit (GRU) layer, which is adept at handling sequential data. The GRU processes input sequences to capture temporal dependencies, while the attention mechanism ensures that the model attends to the most pertinent information within these sequences.

### c. Fully Connected Layers

- **Output Processing**: After the GRU and attention layers, a `tanh` non-linearity is applied and a fully connected layer (`fc1`) maps the hidden features to the output dimension.

## 4. Training Process

### a. Optimization and Loss Functions

- **Optimizer**: The script utilizes the Adam optimizer, a popular choice for training neural networks due to its adaptive learning rate capabilities. The optimizer adjusts the model's weights based on the computed gradients, facilitating efficient convergence during training.
- **Loss Functions**: Three separate Binary Cross-Entropy Loss functions (`criterion1`, `criterion2`, `criterion3`) are employed to evaluate different aspects of the model's predictions. These losses are combined using predefined weights (`criterion_wts = [1., 1., 1.]`) to form the total loss, guiding the optimizer in updating the model's parameters effectively (see the sketch below).
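As a minimal sketch of that weighted combination (assuming the three heads emit raw scores, hence `BCEWithLogitsLoss`; the exact reductions and variable names in `train_hnet.py` may differ):

```python
import torch.nn as nn

criterion1 = nn.BCEWithLogitsLoss()  # full association matrix, flattened
criterion2 = nn.BCEWithLogitsLoss()  # row-wise max of the matrix
criterion3 = nn.BCEWithLogitsLoss()  # column-wise max of the matrix
criterion_wts = [1., 1., 1.]

def combined_loss(outputs, targets):
    # outputs/targets: 3-tuples matching the model's three output heads
    l1 = criterion1(outputs[0], targets[0])
    l2 = criterion2(outputs[1], targets[1])
    l3 = criterion3(outputs[2], targets[2])
    # the weighted sum is the quantity that gets backpropagated
    return criterion_wts[0] * l1 + criterion_wts[1] * l2 + criterion_wts[2] * l3
```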
### b. Epoch Loop

- **Iterative Training**: The training loop runs for a specified number of epochs (`nb_epochs = 10`). In each epoch, the model processes all batches of training data, computes the loss, performs backpropagation to calculate gradients, and updates the model's weights accordingly.
- **Loss Accumulation and Averaging**: Throughout each epoch, the script accumulates the individual loss components (`train_l1`, `train_l2`, `train_l3`) and the total loss (`train_loss`) across all batches. These accumulated losses are then averaged to provide a clear indication of the model's performance during the epoch.

## 5. Evaluation and Testing

### a. Model Evaluation Mode

- **Switching to Evaluation Mode**: After each training epoch, the model is switched to evaluation mode (`model.eval()`). This mode disables certain layers like dropout, ensuring consistent and reliable performance during evaluation.

### b. Validation Loop

- **No Gradient Tracking**: During evaluation, the script disables gradient calculations (`torch.no_grad()`) to reduce memory consumption and speed up computations, as gradients are not needed for inference.
- **Performance Metrics**: The script computes the same loss components during validation (`test_l1`, `test_l2`, `test_l3`) and aggregates these to assess the model's performance on unseen data. Additionally, it calculates the F1 score (a measure of the model's accuracy that balances precision and recall) using predicted and reference labels.

## 6. Early Stopping and Model Saving

- **Monitoring Performance**: To prevent overfitting and ensure that the model generalizes well, the script tracks the best F1 score achieved during validation.
- **Saving the Best Model**: If the current epoch's F1 score surpasses the previously recorded best, the script saves the model's state (`hnet_model.pt`). This mechanism ensures that the most effective version of the model is retained for future use.

## 7. Logging and Output

- **Comprehensive Logging**: After each epoch, the script prints a detailed summary that includes the epoch number, training and testing times, loss values, F1 score, and information about the best epoch achieved so far. This logging provides clear insights into the model's training progress and performance improvements over time.
- **Final Summary**: Upon completion of all epochs, the script outputs the best epoch number and the corresponding best F1 score, offering a concise summary of the model's peak performance during training.

## 8. Execution Entry Point

- **Main Function**: The entire training workflow is encapsulated within the `main()` function, which is invoked when the script is run. This structure promotes modularity and ease of maintenance.

# Summary

In essence, `train_hnet.py` meticulously manages the end-to-end training process of the HNetGRU neural network. By systematically preparing data, defining a robust model architecture, executing an efficient training loop, and rigorously evaluating performance, the script ensures that the model becomes proficient at associating predicted DOAs with their reference counterparts. The incorporation of mechanisms like the attention layer, GRU integration, comprehensive loss functions, and early stopping further enhances the model's accuracy and reliability. Overall, this script serves as a foundational component in developing advanced systems for tasks that require precise association and localization of directional data.