Codebase documentation

Setup Instructions

These instructions walk you through setting up the project environment. Please follow each step carefully.

  1. Clone the Repository: Start by cloning the repository to your local machine.
    git clone git@github.com:YerevaNN/incontext_spurious.git
    cd incontext_spurious
    
  2. Configure Environment Variables:
    • Copy the sample environment file:
      cp .env.sample .env
      
    • Open the .env file and fill in the necessary environment variables as specified in the file.
  3. Create a Conda Environment:
    • Create a new Conda environment using the provided environment file:
      conda env create
      
    • Activate the new environment:
      conda activate incontext_spurious
      
  4. Initialize the Aim logger:
    aim init
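
After initialization, Aim stores run metadata in a local .aim repository in the project root. To browse logged runs in Aim's web UI later on, you can launch it from the repository root (this is the standard Aim CLI, not something specific to this project):

    aim up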
    

Data Download and Preparation Instructions

Overview:

The dataset required for this process will be automatically downloaded during the encoding extraction run, eliminating the need for manual downloading!

  1. Extracting and Saving Encodings

    To begin, run the script for extracting and saving encodings. By default, this uses the dinov2_vitb14 configuration.

    python run.py --config-name=extract_encodings
    
  2. Computing Average Norm and Generating Tokens

    This step computes the average norm of the encoding vectors and also generates fixed tokens. During training, you can either use these fixed tokens or generate new ones for each instance.

    python run.py --config-name=compute_encodings_avg_norm_and_generate_tokens
    
  3. Generating and Saving Validation Sets

    In this step, the script generates and saves validation sets.

    python run.py --config-name=generate_and_save_val_sets datamodule.inner_train_len=null datamodule.saved_val_sets_path=null
    

    Note: These command-line overrides are required: the datamodule's default settings are tailored for training and may cause errors if not adjusted for this script.

❗❗❗❗❗❗❗ Attention ❗❗❗❗❗❗❗

  • Steps 2 and 3 should only be executed by one person to ensure the validation sets stay consistent.
  • These steps have already been completed; the generated files (avg_norms and context_val_sets) are available, and their location is listed in the Notion documentation for team members.
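
If you are the one regenerating these artifacts from scratch (see the warning above), the three preparation steps are just the commands listed earlier, run in order from the repository root:

    python run.py --config-name=extract_encodings
    python run.py --config-name=compute_encodings_avg_norm_and_generate_tokens
    python run.py --config-name=generate_and_save_val_sets datamodule.inner_train_len=null datamodule.saved_val_sets_path=null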

Training

This section explains how to run the training script, which uses Hydra, a Python library, for configuration management. Hydra provides flexible and powerful configuration and lets you modify settings directly from the command line.

Running the Training Script

python run.py
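
Before launching a run, you can inspect the configuration Hydra composes without starting any training. These are standard Hydra command-line flags rather than options added by this repository:

    python run.py --help      # list the available config groups and the composed config
    python run.py --cfg job   # print the fully composed job configuration and exit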

Customizing Configurations with Hydra

Hydra configurations provide a flexible way to adjust training parameters. You can modify these configurations either directly in the configuration files or via the command line.

  • Reviewing Configurations in Files:

    • Configuration files are located in the configs folder.
    • The train.yaml file is the root configuration file for the training script.
  • Command Line Configuration Overrides (Recommended):

    • Hydra allows you to override configurations directly from the command line; this is the recommended approach.
    • This method is quick and avoids editing the configuration files.

    Example Command:

    python run.py optimizer=adam optimizer.lr=0.01

    In this example, the optimizer is set to adam, and the learning rate (lr) is set to 0.01.
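
Hydra can also sweep over several values in a single command via multirun (-m / --multirun), launching one run per value combination. Building on the example above, the following would start three runs with different learning rates:

    python run.py --multirun optimizer=adam optimizer.lr=0.001,0.01,0.1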