Adapted Miami Corpus for Speaker Diarization

Dataset Overview

This dataset is derived from the Bangor Miami Corpus, a Spanish-English code-switching dataset. It includes 8.5 hours of annotated audio across 23 tracks, featuring 36 unique speakers. Some tracks have been adapted to be monolingual by excluding code-switching segments. Below is a breakdown of the minutes of Spanish and English monolingual segments versus Spanish-English code-switch segments.

Contents

Reference RTTM Files: Annotation files containing speaker diarization labels.
Audio Files: Hosted on OneDrive (see Links).
Transcription Files: The .tr files include speaker labels, timestamps, and language labels. They also contain transcriptions of the spoken content, but these should not be treated as accurate: removing audio segments has introduced discrepancies between the text and the spoken words.
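As an illustration of how the reference annotations might be read, here is a minimal sketch that assumes the files follow the standard RTTM layout (`SPEAKER <file> <channel> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>`). The track and speaker names in the sample lines are hypothetical, and the helper names `parse_rttm` and `speech_seconds_per_speaker` are illustrative, not part of this repository.

```python
from collections import defaultdict
from typing import NamedTuple


class Turn(NamedTuple):
    file_id: str
    onset: float      # start time in seconds
    duration: float   # length in seconds
    speaker: str


def parse_rttm(lines):
    """Parse SPEAKER records from RTTM-formatted lines (assumed standard layout)."""
    turns = []
    for line in lines:
        fields = line.split()
        # Skip blank lines, comments, and any non-SPEAKER record types.
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        turns.append(
            Turn(fields[1], float(fields[3]), float(fields[4]), fields[7])
        )
    return turns


def speech_seconds_per_speaker(turns):
    """Sum turn durations for each (file_id, speaker) pair."""
    totals = defaultdict(float)
    for turn in turns:
        totals[(turn.file_id, turn.speaker)] += turn.duration
    return dict(totals)


# Hypothetical sample lines; real file and speaker names will differ.
example = [
    "SPEAKER track01 1 0.00 2.50 <NA> <NA> spk_A <NA> <NA>",
    "SPEAKER track01 1 2.50 3.25 <NA> <NA> spk_B <NA> <NA>",
    "SPEAKER track01 1 6.00 1.50 <NA> <NA> spk_A <NA> <NA>",
]

turns = parse_rttm(example)
totals = speech_seconds_per_speaker(turns)
```

To use this on a real file, pass an open file handle in place of `example`, for instance `parse_rttm(open(path, encoding="utf-8"))`, where `path` points at one of the files in the reference folder.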

Access

The dataset is publicly available.

Links

Audio files: https://onedrive.live.com/?id=DD72E4A05B8E96B0%21609&cid=DD72E4A05B8E96B0
Bangor Miami Corpus source: http://bangortalk.org.uk/speakers.php?c=miami
