- Roth J, Chaudhuri S, Klejch O, et al. Ava active speaker: An audio-visual dataset for active speaker detection, ICASSP, 2020
- Sharma R, Somandepalli K, Narayanan S. Crossmodal learning for audio-visual speech event localization, arXiv preprint, 2020
- Alcázar J L, Caba F, Mai L, et al. Active speakers in context, CVPR, 2020
- León-Alcázar J, Heilbron F C, Thabet A, et al. MAAS: Multi-modal Assignation for Active Speaker Detection, arXiv preprint, 2021
- Huang C, Koishida K. Improved Active Speaker Detection based on Optical Flow, CVPR Workshops, 2020
- Assunção G, Gonçalves N, Menezes P. Bio-Inspired Modality Fusion for Active Speaker Detection, Applied Sciences, 2021
- Pouthier B, Pilati L, Gudupudi L K, et al. Active Speaker Detection as a Multi-Objective Optimization with Uncertainty-based Multimodal Fusion, arXiv preprint, 2021
- Köpüklü O, Taseska M, Rigoll G. How to Design a Three-Stage Architecture for Audio-Visual Active Speaker Detection in the Wild, arXiv preprint, 2021
- Tao R, Pan Z, Das R K, et al. Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection, ACM Multimedia (MM), 2021
- Zhang Y, Liang S, Yang S, et al. UniCon: Unified Context Network for Robust Active Speaker Detection, ACM Multimedia (MM), 2021
- Chung J S. Naver at ActivityNet Challenge 2019--Task B Active Speaker Detection (AVA), 2019
- Zhang Y H, Xiao J, Yang S, et al. Multi-Task Learning for Audio-Visual Active Speaker Detection, 2019
- Alcázar J L, Caba F, Mai L, et al. Universidad de los Andes at ActivityNet Challenge 2020 - Task B Active Speaker Detection (AVA), 2020
- Köpüklü O, Taseska M, Rigoll G. ASDNet at ActivityNet Challenge 2021-Active Speaker Detection (AVA), 2021
- Zhang Y, Liang S, Yang S, et al. ICTCAS-UCAS-TAL Submission to the AVA-ActiveSpeaker Task at ActivityNet Challenge 2021, 2021
- Tao R, Pan Z, Das R K, et al. NUS-HLT Report for ActivityNet Challenge 2021 AVA (Speaker), 2021
- Chakravarty P, Tuytelaars T. Cross-modal supervision for learning active speaker detection in video, ECCV, 2016
- Chung J S, Zisserman A. Out of time: automated lip sync in the wild, ECCV, 2016
- Shahid M, Beyan C, Murino V. Voice activity detection by upper body motion analysis and unsupervised domain adaptation, ICCV Workshops, 2019
- Afouras T, Owens A, Chung J S, et al. Self-supervised learning of audio-visual objects from video, ECCV, 2020
- Shahid M, Beyan C, Murino V. Comparisons of visual activity primitives for voice activity detection, ICIAP, 2019
- Shahid M, Beyan C, Murino V. S-VVAD: Visual Voice Activity Detection by Motion, WACV, 2021
- Beyan C, Shahid M, Murino V. RealVAD: A real-world dataset and a method for voice activity detection by body motion analysis, IEEE Transactions on Multimedia, 2020
- Kim Y J, Heo H S, Choe S, et al. Look Who’s Talking: Active Speaker Detection in the Wild, Interspeech, 2021