midusi/LSA-T

LSA-T: The first continuous LSA dataset

LSA-T is the first continuous Argentinian Sign Language (LSA) dataset. It contains 14,880 sentence-level LSA videos extracted from the CN Sordos YouTube channel, with labels and keypoint annotations for each signer. Videos are full HD (1920x1080) at 30 FPS.

Format

Samples are organized in directories according to the playlist and video they belong to. For each sample i there are four files:

  • i.mp4: the clip corresponding to the ith line of subtitles.
  • i.json contains:
    • label: the line of subtitles corresponding to the clip.
    • start: time in seconds where the subtitle starts.
    • end: time in seconds where the subtitle ends.
    • video: title of the video which the clip belongs to.
    • playlist: title of the playlist which the clip belongs to.
  • i_ap.json: the raw AlphaPose results for the clip, using Halpe keypoints in AlphaPose's default output format.
  • i_signer.json contains:
    • scores: for each person in the clip, the amount of "movement" in their hands. It is used to infer who the signer is.
    • roi: the region of interest considered in the clip (the bounding box of the inferred signer).
    • keypoints: the list of keypoints of the inferred signer for each frame, in the same format as in i_ap.json.
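
As a rough illustration of the layout above, the following sketch reads the metadata and signer annotations of a single sample. The function name `load_sample` and the returned dictionary are hypothetical (they are not part of this repository); only the file names and JSON keys come from the list above, and nothing is assumed about the inner structure of `roi` or `keypoints`.

```python
import json
from pathlib import Path


def load_sample(sample_dir, i):
    """Read the metadata and signer annotations of sample `i` (hypothetical helper)."""
    sample_dir = Path(sample_dir)
    with open(sample_dir / f"{i}.json", encoding="utf-8") as f:
        meta = json.load(f)
    with open(sample_dir / f"{i}_signer.json", encoding="utf-8") as f:
        signer = json.load(f)
    return {
        "clip_path": sample_dir / f"{i}.mp4",      # the video itself is not decoded here
        "label": meta["label"],                    # subtitle line for this clip
        "duration": meta["end"] - meta["start"],   # seconds, from the subtitle timing
        "roi": signer["roi"],                      # passed through unchanged
        "keypoints": signer["keypoints"],          # one entry per frame
    }
```

Decoding the clip and parsing i_ap.json are left out on purpose, since both depend on the video library and AlphaPose version in use.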

Usage

This repository can be installed via pip and contains the LSA_Dataset class (in the lsat.dataset.LSA_Dataset module). This class inherits from the PyTorch Dataset class and implements all the methods needed to use it with a PyTorch DataLoader. It also manages downloading and extracting the dataset.

Useful transforms for the clips and keypoints are also provided in lsat.dataset.transforms.
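
For readers unfamiliar with what "the methods needed for a DataLoader" means, here is a minimal sketch of a map-style dataset over an already extracted copy of the data. It is not the repository's LSA_Dataset and does not reproduce its constructor or behaviour: the class name `ClipIndex`, the assumed layout `<root>/<playlist>/<video>/<i>.json`, and the assumption that sample indices are integers are all illustrative. It avoids importing torch so that it runs anywhere; subclassing `torch.utils.data.Dataset` would be the usual next step.

```python
import json
from pathlib import Path


class ClipIndex:
    """Hypothetical map-style dataset: implements __len__ and __getitem__."""

    def __init__(self, root):
        # Assumed layout: <root>/<playlist>/<video>/<i>.json
        self.meta_files = sorted(
            p for p in Path(root).glob("*/*/*.json")
            if p.stem.isdigit()  # skips i_ap.json and i_signer.json
        )

    def __len__(self):
        return len(self.meta_files)

    def __getitem__(self, idx):
        meta_path = self.meta_files[idx]
        with open(meta_path, encoding="utf-8") as f:
            label = json.load(f)["label"]
        # Return the clip path rather than decoded frames; decoding is left to the caller.
        return meta_path.with_suffix(".mp4"), label
```

Any object with these two methods can be indexed like a list (`ds[0]`, `len(ds)`), which is the protocol a map-style PyTorch dataset follows.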

Statistics and comparison with other DBs

| | LSA-T | PHOENIX* | SIGNUM | CSL | GSL | KETI |
|---|---|---|---|---|---|---|
| language | Spanish | German | German | Chinese | Greek | Korean |
| sign language | LSA | GSL | GSL | CSL | GSL | KLS |
| real life | Yes | Yes | No | No | No | No |
| signers | 103 | 9 | 25 | 50 | 7 | 14 |
| duration (h) | 21.78 | 10.71 | 55.3 | 100+ | 9.51 | 28 |
| # samples | 14,880 | 7096 | 33,210 | 25,000 | 10,295 | 14,672 |
| # unique sentences | 14,254 | 5672 | 780 | 100 | 331 | 105 |
| % unique sentences | 95.79% | 79.93% | 2.35% | 0.4% | 3.21% | 0.71% |
| vocab. size (w) | 14,239 | 2887 | N/A | 178 | N/A | 419 |
| # singletons (w) | 7150 | 1077 | 0 | 0 | 0 | 0 |
| % singletons (w) | 50.21% | 37.3% | 0% | 0% | 0% | 0% |
| vocab. size (gl) | - | 1066 | 450 | - | 310 | 524 |
| # singletons (gl) | - | 337 | 0 | - | 0 | 0 |
| % singletons (gl) | - | 31.61% | 0% | - | 0% | 0% |
| resolution | 1920x1080 | 210x260 | 776x578 | 1920x1080 | 848x480 | 1920x1080 |
| fps | 30 | 25 | 30 | 30 | 30 | 30 |

*Data was not available for the whole PHOENIX dataset, so the table shows its train set statistics.
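
The sentence and vocabulary rows of the table can be read as simple counts over the subtitle labels. The sketch below shows one way to compute them; the function name `corpus_stats` is hypothetical, and the lowercase, whitespace-based tokenization is an assumption, since the exact preprocessing used for the published figures is not stated here. The ratios themselves match the table: 14,254 / 14,880 gives 95.79% unique sentences, and 7150 / 14,239 gives 50.21% singletons.

```python
from collections import Counter


def corpus_stats(labels):
    """Sentence and word statistics for a list of subtitle labels."""
    unique = set(labels)
    # Assumption: words are lowercased and split on whitespace.
    counts = Counter(w for s in labels for w in s.lower().split())
    # A singleton is a word that occurs exactly once in the whole corpus.
    singletons = [w for w, c in counts.items() if c == 1]
    return {
        "samples": len(labels),
        "unique_sentences": len(unique),
        "pct_unique_sentences": 100 * len(unique) / len(labels),
        "vocab_size": len(counts),
        "singletons": len(singletons),
        "pct_singletons": 100 * len(singletons) / len(counts),
    }
```

Note that the singleton percentage is taken relative to the vocabulary size, not to the total number of word occurrences.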

Evaluation splits

| | LSA-T | Full version: Train | Full version: Test | Reduced version: Train | Reduced version: Test |
|---|---|---|---|---|---|
| signers | 103 | X | X | X | X |
| duration [h] | 21.78 | 17.49 | 4.29 | 15.85 | 3.89 |
| # sentences | 14,880 | 11,065 | 2735 | 3767 | 910 |
| % unique sentences | 95.79% | 96.64% | 92.78% | 96.88% | 98.35% |
| vocab. size | 14,239 | 12,385 | 5546 | 2694 | 1579 |
| % singletons | 50.21% | 52.01% | 61.9% | 23.2% | 48.83% |
| % sentences with singletons | 34.97% | 40.98% | 67.97% | 14.36% | 54.29% |
| % sentences with words not in train vocabulary | - | - | 59.2% | - | 84.5% |
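
The last row measures how many test sentences contain at least one word that never appears in the corresponding train split. A minimal sketch of that computation follows; the function name `pct_sentences_with_oov` is hypothetical and the lowercase, whitespace-based tokenization is again an assumption rather than the documented procedure.

```python
def pct_sentences_with_oov(train_labels, test_labels):
    """Percentage of test sentences with at least one word unseen in train."""
    # Assumption: words are lowercased and split on whitespace.
    train_vocab = {w for s in train_labels for w in s.lower().split()}
    hits = sum(
        any(w not in train_vocab for w in s.lower().split())
        for s in test_labels
    )
    return 100 * hits / len(test_labels)
```

For example, with train labels `["hola mundo"]` and test labels `["hola", "hola amigo"]`, only the second test sentence contains an unseen word, so the function returns 50.0.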

Citation

@article{bianco2022lsa,
  title={LSA-T: The first continuous Argentinian Sign Language dataset for Sign Language Translation}, 
  author={Bianco, Pedro Dal and R{\'\i}os, Gast{\'o}n and Ronchetti, Franco and Quiroga, Facundo and Stanchi, Oscar and Hasperu{\'e}, Waldo and Rosete, Alejandro},
  journal={arXiv preprint arXiv:2211.15481},
  year={2022}
}