Related Work for Active Speaker Detection

Roth J, Chaudhuri S, Klejch O, et al. AVA active speaker: An audio-visual dataset for active speaker detection, ICASSP, 2020.
Sharma R, Somandepalli K, Narayanan S. Crossmodal learning for audio-visual speech event localization, arXiv preprint, 2020.
Alcázar J L, Caba F, Mai L, et al. Active speakers in context, CVPR, 2020.
León-Alcázar J, Heilbron F C, Thabet A, et al. MAAS: Multi-modal Assignation for Active Speaker Detection, arXiv preprint, 2021.
Huang C, Koishida K. Improved Active Speaker Detection based on Optical Flow, CVPR Workshops, 2020
Assunção G, Gonçalves N, Menezes P. Bio-Inspired Modality Fusion for Active Speaker Detection, Applied Sciences, 2021
Pouthier B, Pilati L, Gudupudi L K, et al. Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion, arXiv preprint, 2021
Köpüklü O, Taseska M, Rigoll G. How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild, arXiv preprint, 2021
Tao R, Pan Z, Das R K, et al. Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection, ACM Multimedia (MM), 2021
Zhang Y, Liang S, Yang S, et al. UniCon: Unified Context Network for Robust Active Speaker Detection, ACM Multimedia (MM), 2021

Chung J S. Naver at ActivityNet Challenge 2019--Task B Active Speaker Detection (AVA), 2019.
Zhang Y H, Xiao J, Yang S, et al. Multi-Task Learning for Audio-Visual Active Speaker Detection, 2019
Alcázar J L, Caba F, Mai L, et al. Universidad de los Andes at ActivityNet Challenge 2020 - Task B Active Speaker Detection (AVA), 2020
Köpüklü O, Taseska M, Rigoll G. ASDNet at ActivityNet Challenge 2021-Active Speaker Detection (AVA), 2021
Zhang Y, Liang S, Yang S, et al. ICTCAS-UCAS-TAL Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2021, 2021
Tao R, Pan Z, Das R K, et al. NUS-HLT Report for ActivityNet Challenge 2021 AVA (Speaker), 2021

Chakravarty P, Tuytelaars T. Cross-modal supervision for learning active speaker detection in video, ECCV, 2016
Chung J S, Zisserman A. Out of time: automated lip sync in the wild, ECCV, 2016
Shahid M, Beyan C, Murino V. Voice activity detection by upper body motion analysis and unsupervised domain adaptation, ICCV Workshops, 2019
Afouras T, Owens A, Chung J S, et al. Self-supervised learning of audio-visual objects from video, ECCV, 2020
Shahid M, Beyan C, Murino V. Comparisons of visual activity primitives for voice activity detection, ICIAP, 2019
Shahid M, Beyan C, Murino V. S-VVAD: Visual Voice Activity Detection by Motion, WACV, 2021
Beyan C, Shahid M, Murino V. RealVAD: A real-world dataset and a method for voice activity detection by body motion analysis, IEEE Transactions on Multimedia, 2020.

Kim Y J, Heo H S, Choe S, et al. Look Who’s Talking: Active Speaker Detection in the Wild, Interspeech, 2021
