LabROSA Lyrics / Audio Repository
=================================
This is the documentation for LYRICAL: the Labrosa repositorY of Recorded Information Concerning Audio and Lyrics. It contains the lyrics to popular music tracks, annotated at the structure level (verse, chorus, etc.), with accompanying a cappella and polyphonic audio.
The database is intended to provide a common dataset for Automatic Lyric Transcription (ALT). It therefore also contains scripts for reproducing our baseline ALT performance using the Sphinx4 Automatic Speech Recognition system.
Installation
============
Lyrics
------
The easiest way to get the lyrics is to clone the repository:
git clone https://github.com/mattmcvicar/lyric_database.git
Audio
-----
For queries relating to audio, please email [email protected]
Sphinx4
-------
If you want to run Reproduce_baseline.py, you will need to set up a Sphinx4 project. The following sequence of steps was found to be effective on Mac OS X 10.6.8, revision 12489. If it doesn't work for you, it may be that the Sphinx4 structure has changed, in which case email [email protected] and I'll update this README.
1) Download the latest stable Sphinx4 *source* release from:
http://cmusphinx.sourceforge.net/wiki/download/
2) Unzip the archive and cd into the extracted directory:
jar xvf sphinx4-5prealpha-src.zip
cd sphinx4-5prealpha
3) build Sphinx4 and all demos
ant
4) We're going to adapt one of the demos for our own use. Open up
/src/apps/edu/cmu/sphinx/demo/transcriber/Transcriber.java
in an editor. First, note that Transcriber currently spits out quite a bit of info, so comment out everything in the while loop (lines 51-63) and replace it with the following:
System.out.format("%s\n", result.getHypothesis());
so that it just echoes the best hypothesis for each detected utterance to stdout.
5) Re-compile the demo:
ant
6) Run the demo with 512 MB of starting heap memory, growing up to 1 GB:
java -Xms512m -Xmx1024m -jar ./bin/Transcriber.jar
It's had a pretty good go at transcribing some digits. You should see:
what zero is zero zero one
nine oh two one oh
cyril one eight zero three
7) The demo currently reads the audio file stored in
src/apps/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav
and transcribes it using a fairly general-purpose model. Let's adapt it to take the audio file as a command-line argument. Change the line:
recognizer.startRecognition(new URL("file:src/apps/edu/cmu/sphinx/demo/aligner/10001-90210-01803.wav").openStream());
to be:
recognizer.startRecognition(new URL("file:" + args[0] ).openStream());
8) Re-compile the demo for the last time:
ant
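9) Now try running on your own audio, passing the path to the audio file as the first argument (the path below is a placeholder):
java -Xms512m -Xmx1024m -jar ./bin/Transcriber.jar path/to/your_audio.wav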
10) That's it! Note you can also change the acoustic model, language model and dictionary in Transcriber.java too. Just remember to re-compile after each change. To reproduce our baseline results, see the file
resources/Reproduce_baseline.py
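If you would rather drive the modified demo from Python, the command above can be wrapped as in the sketch below. This is only an illustration and is not the contents of Reproduce_baseline.py. It assumes Transcriber.java has been changed as in steps 4 and 7, so that it reads the audio path from args[0] and prints one hypothesis per line, and that nothing else is written to stdout. The function names build_command and transcribe are hypothetical.

```python
import subprocess


def build_command(audio_path, jar_path="./bin/Transcriber.jar"):
    """Build the java command used above, with the audio path appended."""
    return ["java", "-Xms512m", "-Xmx1024m", "-jar", jar_path, audio_path]


def transcribe(audio_path, jar_path="./bin/Transcriber.jar"):
    """Run the modified Transcriber demo and return its hypotheses.

    ASSUMPTION: the demo prints exactly one hypothesis per detected
    utterance on stdout, as arranged in step 4.
    """
    result = subprocess.run(
        build_command(audio_path, jar_path),
        capture_output=True,  # requires Python 3.7 or later
        text=True,
        check=True,           # raise if java exits with an error
    )
    # Drop blank lines and surrounding whitespace.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]
```

Run it from the Sphinx4 directory, for example `transcribe("path/to/your_audio.wav")`, so that the relative jar path resolves.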
Contents
========
Lyrics
------
The database is split into two disjoint subsets, "Train" and "Test", and two genres, "Rap" and "Sing". The "Train" set is to be used for training or adapting acoustic or language models, while the "Test" set is reserved for evaluation. We believe the acoustic and language models for rapped and sung audio to be quite different, so we have kept the two genres distinct.
Each file is written in plain text, with each line being one of:
- annotation line
- lyric line
- empty line
Annotation lines are enclosed within square brackets and contain structural information. Lyric lines contain the lyrics to the song, made up of the lower- and upper-case letters of the Latin alphabet, together with the following punctuation set:
,!?'-
(commas, exclamation marks, question marks, single quotes and hyphens). Empty lines are used purely for aesthetic purposes. Example file snippet (first verse of "30_Seconds_to_Mars_-_The_Kill.lyrics"):
-----------------------------------
[VERSE]
What if I wanted to break
Laugh it all off in your face
What would you do? Oh, oh oh oh
What if I fell to the floor
Couldn't take all this anymore
What would you do do, do do, do do?
-----------------------------------
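The three line types described above can be told apart with a few lines of Python. The following is a minimal sketch rather than a tool shipped with the database; the names classify_line and parse_lyrics are hypothetical, and lines that match none of the three types are reported as "unknown" and skipped when grouping.

```python
import re

# Lyric lines: Latin letters, spaces, and the punctuation set , ! ? ' -
LYRIC_LINE = re.compile(r"^[A-Za-z ,!?'\-]+$")


def classify_line(line):
    """Return 'empty', 'annotation', 'lyric', or 'unknown' for one line."""
    text = line.strip()
    if not text:
        return "empty"
    if text.startswith("[") and text.endswith("]"):
        return "annotation"
    if LYRIC_LINE.match(text):
        return "lyric"
    return "unknown"


def parse_lyrics(text):
    """Group lyric lines under the most recent annotation label."""
    sections = []
    for line in text.splitlines():
        kind = classify_line(line)
        if kind == "annotation":
            # Strip the enclosing square brackets to get the label.
            sections.append((line.strip()[1:-1], []))
        elif kind == "lyric":
            if not sections:
                # Lyrics before any annotation are filed under None.
                sections.append((None, []))
            sections[-1][1].append(line.strip())
        # 'empty' and 'unknown' lines are skipped.
    return sections
```

Applied to the snippet above, parse_lyrics returns a single section labelled "VERSE" holding the six lyric lines in order.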
Resources
---------
Since the database contains a considerable amount of slang, we also include a file
resources/Train_Test_Holdout.dict
which contains pronunciations from the CMU set of 39 unstressed phonemes for each word in the train, test, and holdout sets. These are not used in our baseline experiments (many will most likely not be in standard language models) except in evaluation, but will be essential in training new language models.
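A dictionary of this kind could be loaded into Python roughly as follows. The layout of the file is not described here, so this sketch ASSUMES one entry per line, with the word first and its phonemes after it, all separated by whitespace. Please check the actual file before relying on it. The function name parse_pronunciation_dict is hypothetical.

```python
def parse_pronunciation_dict(lines):
    """Map each word to a list of pronunciations (each a list of phonemes).

    ASSUMPTION: each line holds a word followed by its phonemes,
    separated by whitespace. This layout is a guess, not taken from
    the file itself.
    """
    pronunciations = {}
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries with no phonemes
        word, phonemes = fields[0], fields[1:]
        # A word may appear more than once; keep every pronunciation.
        pronunciations.setdefault(word, []).append(phonemes)
    return pronunciations
```

Since the function takes any iterable of lines, an open file handle on resources/Train_Test_Holdout.dict can be passed in directly.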
Reproduce_baseline.py
---------------------
The repository also contains a simple Python script, "Reproduce_baseline.py", used to reproduce the results reported in the accompanying paper (see below).
The script assumes you are familiar with Sphinx4. If you require assistance in installing and setting up Sphinx4, please see:
http://cmusphinx.sourceforge.net/wiki/tutorialsphinx4
Attribution
===========
If you make use of this database in your research, please consider citing the following paper:
Contact
=======
For all queries, please contact [email protected]