Yihua Cheng, Yaning Zhu, Zongji Wang, Hongquan Hao, Yongwei Liu, Shiqing Cheng, Xi Wang, Hyung Jin Chang, CVPR 2024
This repository provides the official code of the paper "What Do You See in Vehicle? Comprehensive Vision Solution for In-Vehicle Gaze Estimation", accepted at CVPR 2024. Our contributions include:
- We provide IVGaze, a gaze dataset collected in vehicles, containing 44k images of 125 subjects.
- We propose the gaze pyramid transformer (GazePTR), which leverages transformer-based multi-level feature integration.
- We introduce the dual-stream gaze pyramid transformer (GazeDPTR). Using a perspective transformation, we rotate a virtual camera to normalize images, and we leverage the camera pose to fuse the normalized and original images for accurate gaze estimation (a minimal sketch of this normalization follows this list).
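The perspective-based normalization can be pictured as warping each frame into a virtual camera that faces the driver. The snippet below is a minimal sketch of that idea only, not the repository's implementation; the intrinsics `K_cam`, `K_virt`, the rotation `R`, and the output size are assumed placeholders.

```python
import cv2
import numpy as np

def normalize_image(image, K_cam, K_virt, R, size=(224, 224)):
    """Warp an image as if it were captured by a rotated virtual camera.

    K_cam:  3x3 intrinsics of the real in-vehicle camera (assumed known)
    K_virt: 3x3 intrinsics chosen for the virtual (normalized) camera
    R:      3x3 rotation aligning the virtual camera with the driver's face
    """
    # Homography that maps real-camera pixels to virtual-camera pixels.
    W = K_virt @ R @ np.linalg.inv(K_cam)
    return cv2.warpPerspective(image, W, size)
```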
Please visit our project page for details. The dataset is available on this page.
- Install PyTorch and torchvision. This code is written in `Python 3.8` and uses `PyTorch 1.13.1` with `CUDA 11.6` on an NVIDIA GeForce RTX 3090. While this environment is recommended, it is not mandatory; feel free to run the code in your preferred environment.
pip install torch==1.13.1+cu116 torchvision==0.14.1+cu116 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu116
- Install other packages.
pip install opencv-python PyYAML easydict warmup_scheduler
If you have any issues due to missing packages, please report them. I will update the requirements. Thank you for your cooperation.
Step 1: Choose the model file.
We provide three models: `GazePTR.py`, `GazeDPTR.py`, and `GazeDPTR_v2.py`. The pre-trained weights for all three models are available from the same link; please load the weights that correspond to the model you choose.
| | Name | Description | Input | Output | Accuracy | Pretrained Weights |
|---|---|---|---|---|---|---|
| 1 | GazePTR | This method leverages multi-level features. | Normalized images | Gaze directions | 7.04° | Link |
| 2 | GazeDPTR | This method integrates features from two images. | Normalized images, original images | Gaze directions | 6.71° | Link |
| 3 | GazeDPTR_V2 | This method contains a differential projection for gaze zone prediction. | Normalized images, original images | Gaze directions, gaze zone | 6.71°, 81.8% | Link |
Please choose one model and rename it to `model.py`, e.g.,
cp GazeDPTR.py model.py
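As a quick sanity check, you can load the copied model together with its pre-trained weights. The snippet below is a hedged sketch: the class name `Model`, its constructor arguments, and the checkpoint file name are assumptions; adjust them to the actual code in `model.py` and to the weights you downloaded.

```python
import torch
from model import Model  # the file copied above

# Hypothetical usage: the class name, constructor arguments, and checkpoint
# file name may differ from the repository's model.py; adjust accordingly.
net = Model()
state = torch.load("Iter_20_GazeDPTR.pt", map_location="cpu")
# If the checkpoint wraps the weights (e.g. {"model": ...}), unwrap it first.
net.load_state_dict(state)
net.eval()
```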
Step 2: Modify the config file
Please modify `config/train/config_iv.yaml` according to your environment settings.
- The `Save` attribute specifies the save path: the model will be stored at `os.path.join({save.metapath}, {save.folder})`, and each saved model is named `Iter_{epoch}_{save.model_name}.pt` (a sketch of how this path resolves follows this list).
- The `data` attribute indicates the dataset path. Update `image` and `label` to match your dataset location.
Step 3: Training models
Run the following command to initiate training. The argument `3` indicates that three-fold cross-validation will be performed automatically:
python trainer/leave.py config/train/config_iv.yaml 3
Once training is complete, you will find the weights saved at `os.path.join({save.metapath}, {save.folder})`.
Within the `checkpoint` directory, you will find three folders named `train1.txt`, `train2.txt`, and `train3.txt`, corresponding to the three folds of cross-validation. Each folder contains the model trained for that fold.
Run the following command for testing.
python tester/leave.py config/train/config_iv.yaml config/test/config_iv.yaml 3
Similarly,
- Update `image` and `label` in `config/test/config_iv.yaml` based on your dataset location.
- The `savename` attribute specifies the folder for saving prediction results, which will be stored at `os.path.join({save.metapath}, {save.folder})` as defined in `config/train/config_iv.yaml`.
- The code in `tester/leave.py` also produces gaze zone prediction results; remove this part if you do not require gaze zone prediction.
We provide the `evaluation.py` script to assess the accuracy of gaze direction estimation. Run the following command:
python evaluation.py {PATH}
Replace `{PATH}` with the path of `{savename}` as configured in your settings.
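For reference, the accuracy numbers in the table above are mean angular errors in degrees. The helper below is a minimal sketch of that metric (not the repository's `evaluation.py`), assuming predictions and labels are given as 3D gaze vectors.

```python
import numpy as np

def angular_error(pred, gt):
    """Angle in degrees between a predicted and a ground-truth 3D gaze vector."""
    pred = pred / np.linalg.norm(pred)
    gt = gt / np.linalg.norm(gt)
    cos = np.clip(np.dot(pred, gt), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Example: two nearly parallel gaze vectors -> small angular error
print(angular_error(np.array([0.0, 0.0, -1.0]), np.array([0.02, 0.0, -1.0])))
```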
Please find the visualization code in the issues.
Please send an email to [email protected] if you have any questions.