This project is an attempt to democratize BERT by reducing its number of trainable parameters, in turn making it faster to pre-train and fine-tune. Since the cosine-style similarity in BERT's attention (computed as the dot product between the Query and Key matrices, which then weights the Values) is prone to the convex-hull problem, we plan to replace cosine with other similarity metrics and check how each one behaves at reduced model dimensions.
We are benchmarking the model's performance on the following distance/similarity measures (a code sketch of each follows the list):
- Cosine
- Euclidean
- Gaussian softmax
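To make the candidates concrete, here is a minimal sketch of the three score functions as they might appear inside an attention layer. The function and the Gaussian bandwidth choice are illustrative assumptions; the project's actual implementations live in `src/modelling`.

```python
import torch

def attention_scores(q, k, metric="cosine"):
    """Toy sketch of the benchmarked score functions (illustrative only)."""
    if metric == "cosine":
        # scaled dot product -- the standard BERT similarity referred to
        # as cosine above
        return (q @ k.transpose(-2, -1)) / q.shape[-1] ** 0.5
    if metric == "euclidean":
        # negative squared Euclidean distance: nearer keys score higher
        return -torch.cdist(q, k, p=2) ** 2
    if metric == "gaussian":
        # Gaussian kernel log-scores; a softmax over these yields a
        # "Gaussian softmax" attention distribution
        sigma = q.shape[-1] ** 0.5  # bandwidth choice is an assumption
        return -torch.cdist(q, k, p=2) ** 2 / (2 * sigma ** 2)
    raise ValueError(f"unknown metric: {metric}")
```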
Due to compute constraints, we are currently validating our hypothesis on just 1% of the BookCorpus data. We intend to increase the training data in subsequent iterations.
- Clone the repo
```sh
git clone https://github.com/gaushh/optimized-bert.git
```
- Create and activate a virtual environment (e.g. `python -m venv .venv && source .venv/bin/activate`)
You can now set everything up either with the single shell script or by following the step-by-step instructions.
Run the shell script to set everything up with the default configs and start pre-training BERT:
```sh
sh setup.sh
```
Alternatively, here are the step-by-step instructions to set up the repo and, in turn, understand the process.
- Install the required packages
```sh
pip install -r requirements.txt
```
- Log in to Weights & Biases
```sh
wandb login --relogin 8c46e02a8d52f960fb349e009c5b6773c25b6957
```
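Once you are logged in, the training scripts can stream metrics to your W&B dashboard. A minimal sketch of the logging calls, assuming the scripts use the standard `wandb` Python API (the project and metric names here are hypothetical):

```python
import wandb

# Hypothetical run setup -- the actual project/run names are configured
# inside the training scripts, not here.
wandb.init(project="optimized-bert", name="cosine-baseline")
wandb.log({"train/loss": 2.31, "step": 100})
wandb.finish()
```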
- Write the config file
```sh
cd helper
python write_config.py
cd ..
```
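For orientation, the config presumably collects the knobs discussed above: the similarity metric, the reduced model dimensions, and the data fraction. The snippet below is a hypothetical illustration of such a file; the actual keys and values are defined in `helper/write_config.py`.

```python
import json

# Hypothetical config contents -- key names and values are illustrative,
# not copied from helper/write_config.py.
config = {
    "similarity": "cosine",    # cosine | euclidean | gaussian
    "hidden_size": 256,        # reduced model dimension under test
    "dataset_fraction": 0.01,  # 1% of BookCorpus, per the note above
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```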
- Prepare the dataset
```sh
cd src/data
python dataset.py
cd ../..
```
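If the script uses the Hugging Face `datasets` library (an assumption on our part), the 1% BookCorpus subset mentioned above can be expressed as a split slice; the authoritative logic lives in `src/data/dataset.py`.

```python
from datasets import load_dataset

# Load only the first 1% of BookCorpus, mirroring the compute-constrained
# setup described above.
dataset = load_dataset("bookcorpus", split="train[:1%]")
print(dataset)  # number of rows and column names
```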
- Train the tokenizer
```sh
cd src/modelling
python train_tokenizer.py
```
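BERT-style models conventionally use a WordPiece tokenizer. Below is a minimal sketch with the Hugging Face `tokenizers` library, assuming that is what the script does; the corpus path and vocabulary size are placeholders.

```python
from tokenizers import BertWordPieceTokenizer

# Train a WordPiece vocabulary on the prepared corpus; the file name and
# vocab_size are placeholders, not the project's actual settings.
tokenizer = BertWordPieceTokenizer(lowercase=True)
tokenizer.train(files=["corpus.txt"], vocab_size=30522)
tokenizer.save_model(".")  # writes vocab.txt for the next steps
```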
- Perform post-processing (run from `src/modelling`, where the previous step left you)
```sh
python preparation.py
```
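Post-processing plausibly means converting raw text into fixed-length model inputs; the sketch below illustrates that idea under that assumption, using `transformers` (the real steps are in `preparation.py`).

```python
from transformers import BertTokenizerFast

# Encode text into fixed-length inputs for masked-LM pre-training.
# vocab.txt comes from the tokenizer step; max_length is illustrative.
tokenizer = BertTokenizerFast(vocab_file="vocab.txt", do_lower_case=True)
batch = tokenizer(
    ["an example sentence from the corpus"],
    truncation=True,
    padding="max_length",
    max_length=128,
)
print(batch["input_ids"][0][:10])
```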
- Start model training
```sh
python train_bert.py
```
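To see the reduced-parameter goal in numbers, a shrunken BERT can be instantiated with Hugging Face `transformers` as below; the sizes are illustrative, and `train_bert.py` holds the project's actual configuration, training loop, and alternative similarity metrics.

```python
from transformers import BertConfig, BertForMaskedLM

# A deliberately small BERT to illustrate the reduced-parameter goal;
# these sizes are illustrative, not the project's actual settings.
config = BertConfig(hidden_size=256, num_hidden_layers=4, num_attention_heads=4)
model = BertForMaskedLM(config)
print(f"{model.num_parameters(only_trainable=True):,} trainable parameters")
```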
Distributed under the MIT License. See `LICENSE.txt` for more information.