<!DOCTYPE html
PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<!--
This HTML was auto-generated from MATLAB code.
To make changes, update the MATLAB code and republish this document.
--><title>Robust Landmark-Based Audio Fingerprinting</title><meta name="generator" content="MATLAB 7.14"><link rel="schema.DC" href="http://purl.org/dc/elements/1.1/"><meta name="DC.date" content="2013-10-21"><meta name="DC.source" content="demo_fingerprint.m"><style type="text/css">
html,body,div,span,applet,object,iframe,h1,h2,h3,h4,h5,h6,p,blockquote,pre,a,abbr,acronym,address,big,cite,code,del,dfn,em,font,img,ins,kbd,q,s,samp,small,strike,strong,sub,sup,tt,var,b,u,i,center,dl,dt,dd,ol,ul,li,fieldset,form,label,legend,table,caption,tbody,tfoot,thead,tr,th,td{margin:0;padding:0;border:0;outline:0;font-size:100%;vertical-align:baseline;background:transparent}body{line-height:1}ol,ul{list-style:none}blockquote,q{quotes:none}blockquote:before,blockquote:after,q:before,q:after{content:'';content:none}:focus{outline:0}ins{text-decoration:none}del{text-decoration:line-through}table{border-collapse:collapse;border-spacing:0}
html { min-height:100%; margin-bottom:1px; }
html body { height:100%; margin:0px; font-family:Arial, Helvetica, sans-serif; font-size:10px; color:#000; line-height:140%; background:#fff none; overflow-y:scroll; }
html body td { vertical-align:top; text-align:left; }
h1 { padding:0px; margin:0px 0px 25px; font-family:Arial, Helvetica, sans-serif; font-size:1.5em; color:#d55000; line-height:100%; font-weight:normal; }
h2 { padding:0px; margin:0px 0px 8px; font-family:Arial, Helvetica, sans-serif; font-size:1.2em; color:#000; font-weight:bold; line-height:140%; border-bottom:1px solid #d6d4d4; display:block; }
h3 { padding:0px; margin:0px 0px 5px; font-family:Arial, Helvetica, sans-serif; font-size:1.1em; color:#000; font-weight:bold; line-height:140%; }
a { color:#005fce; text-decoration:none; }
a:hover { color:#005fce; text-decoration:underline; }
a:visited { color:#004aa0; text-decoration:none; }
p { padding:0px; margin:0px 0px 20px; }
img { padding:0px; margin:0px 0px 20px; border:none; }
p img, pre img, tt img, li img { margin-bottom:0px; }
ul { padding:0px; margin:0px 0px 20px 23px; list-style:square; }
ul li { padding:0px; margin:0px 0px 7px 0px; }
ul li ul { padding:5px 0px 0px; margin:0px 0px 7px 23px; }
ul li ol li { list-style:decimal; }
ol { padding:0px; margin:0px 0px 20px 0px; list-style:decimal; }
ol li { padding:0px; margin:0px 0px 7px 23px; list-style-type:decimal; }
ol li ol { padding:5px 0px 0px; margin:0px 0px 7px 0px; }
ol li ol li { list-style-type:lower-alpha; }
ol li ul { padding-top:7px; }
ol li ul li { list-style:square; }
.content { font-size:1.2em; line-height:140%; padding: 20px; }
pre, tt, code { font-size:12px; }
pre { margin:0px 0px 20px; }
pre.error { color:red; }
pre.codeinput { padding:10px; border:1px solid #d3d3d3; background:#f7f7f7; }
pre.codeoutput { padding:10px 11px; margin:0px 0px 20px; color:#4c4c4c; }
@media print { pre.codeinput, pre.codeoutput { word-wrap:break-word; width:100%; } }
span.keyword { color:#0000FF }
span.comment { color:#228B22 }
span.string { color:#A020F0 }
span.untermstring { color:#B20000 }
span.syscmd { color:#B28C00 }
.footer { width:auto; padding:10px 0px; margin:25px 0px 0px; border-top:1px dotted #878787; font-size:0.8em; line-height:140%; font-style:italic; color:#878787; text-align:left; float:none; }
.footer p { margin:0px; }
</style></head><body><a href="http://www.ee.columbia.edu/~dpwe/">Dan Ellis</a> : <a href="http://www.ee.columbia.edu/~dpwe/resources/">Resources</a>: <a href="http://www.ee.columbia.edu/~dpwe/resources/matlab/">Matlab</a>: <div class="content"> <IMG SRC="finger_thumb.png" ALIGN="LEFT" HSPACE="10"><h1>Robust Landmark-Based Audio Fingerprinting</h1><!--introduction--><p><i>Note: please see also <a href="http://labrosa.ee.columbia.edu/matlab/audfprint/">AUDFPRINT</a>, which is a more fully-featured fingerprint tool developed from this codebase.</i></p><p>These routines implement a landmark-based audio fingerprinting system that is very well suited to identifying short, noisy excerpts from a large number of items. It is based on the ideas used in the Shazam music matching service, which can identify seemingly any commercial music track from a short snippet recorded via a cellphone, even in very noisy conditions. I don't know if my algorithm is as good as theirs, but the approach, as described in the paper below, certainly seems to work:</p><p>Avery Wang, "An Industrial-Strength Audio Search Algorithm", Proc. 2003 ISMIR International Symposium on Music Information Retrieval, Baltimore, MD, Oct. 2003. <a href="http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf">http://www.ee.columbia.edu/~dpwe/papers/Wang03-shazam.pdf</a></p><p>The basic operation of this scheme is that each audio track is analyzed to find prominent onsets concentrated in frequency, since these onsets are the features most likely to be preserved in noise and distortion. These onsets are formed into pairs, parameterized by the frequencies of the two peaks and the time in between them. These values are quantized to give a relatively large number of distinct landmark hashes (about 1 million in my implementation). Parameters are tuned to give around 20-50 landmarks per second.</p><p>Each reference track is described by the (many hundreds of) landmarks it contains and the times at which they occur. This information is held in an inverted index which, for each of the 1 million distinct landmark hashes, lists the tracks in which it occurs (and when it occurs in those tracks).</p><p>To identify a query, it is similarly converted to landmarks. Then the database is queried to find all the reference tracks that share landmarks with the query, along with the relative time differences between where those landmarks occur in the query and where they occur in the reference tracks. Once a sufficient number of landmarks have been found to come from the same reference track with the same relative timing, a match can be confidently declared. Normally a small number of co-occurring landmarks (e.g. 5) is sufficient, since chance matches with consistent timing are very unlikely.</p><p>The beauty, and robustness, of this approach is that only a few of the maxima (or landmarks) have to be the same in the reference and query examples to allow a match. If the query example is noisy, or filtered strangely, or truncated, there's still a good chance that enough of the hashed landmarks will match for the track to be identified. In the examples below, a 5 second excerpt recorded from a very low-quality playback is successfully matched.</p><!--/introduction-->
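<p>To make the matching step concrete, here is a minimal sketch of the hash-and-offset voting idea. This is illustrative only, not the shipped <a href="match_query.m">match_query</a>: it assumes a hypothetical helper <tt>landmark_hashes_of(d,sr)</tt> that returns one [time, hash] row per landmark, and it uses a flat list of postings in place of the real hash table.</p><pre class="codeinput">% Sketch only: illustrates hash + time-offset voting, not the shipped code.
% landmark_hashes_of() is a hypothetical helper returning [time_frame, hash] rows.

% Build a toy inverted index as a plain matrix of [hash, track_id, time] rows.
postings = [];
for t = 1:length(tracks)                     % tracks{t}.d, tracks{t}.sr assumed
  L = landmark_hashes_of(tracks{t}.d, tracks{t}.sr);
  postings = [postings; L(:,2), repmat(t, size(L,1), 1), L(:,1)];
end

% Query: look up every query hash, and vote on (track, time offset) pairs.
Q = landmark_hashes_of(dq, srq);             % dq, srq = query waveform and rate
votes = [];                                  % rows of [track_id, offset]
for i = 1:size(Q,1)
  hits = postings(postings(:,1) == Q(i,2), :);
  votes = [votes; hits(:,2), hits(:,3) - Q(i,1)];
end

% The winning track is the one with the most votes at a single offset.
[u, ~, j] = unique(votes, 'rows');
counts = accumarray(j, 1);
[nmatch, best] = max(counts);
fprintf('best track %d at offset %d frames with %d aligned landmarks\n', ...
        u(best,1), u(best,2), nmatch);
</pre>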
<h2>Contents</h2><div><ul><li><a href="#1">Example use</a></li><li><a href="#2">Implementation notes</a></li><li><a href="#3">For Windows Users</a></li><li><a href="#4">For Octave Users</a></li><li><a href="#5">See also AUDFPRINT Compiled Binary</a></li><li><a href="#6">Download</a></li><li><a href="#7">Referencing</a></li><li><a href="#8">Acknowledgment</a></li></ul></div><h2>Example use<a name="1"></a></h2><p>In the example below, we'll load a small database of audio tracks over the internet. You will need to have the latest version of my mp3read installed, which supports reading files from URLs. See <a href="http://labrosa.ee.columbia.edu/matlab/mp3read.html">http://labrosa.ee.columbia.edu/matlab/mp3read.html</a> . <a href="myls.m">myls</a> also relies on curl being available (it should be fine on Linux/Mac; if you want to run it on Windows, you can download a version of curl from <a href="http://curl.haxx.se/download.html">http://curl.haxx.se/download.html</a> , but you'll have to modify myls.m to invoke it correctly).</p><pre class="codeinput"><span class="comment">% Get the list of reference tracks to add (URLs in this case, but</span>
<span class="comment">% filenames work too)</span>
tks = myls([<span class="string">'http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/*.mp3'</span>]);
<span class="comment">% Initialize the hash table database array</span>
clear_hashtable
<span class="comment">% Calculate the landmark hashes for each reference track and store</span>
<span class="comment">% it in the array (takes a few seconds per track).</span>
add_tracks(tks);
<span class="comment">% Load a query waveform (recorded from playback on a laptop)</span>
[dt,srt] = mp3read(<span class="string">'<A HREF="Q-full-circle.mp3">Q-full-circle.mp3</A>'</span>);
<span class="comment">% Run the query</span>
R = match_query(dt,srt);
<span class="comment">% R returns all the matches, sorted by match quality. Each row</span>
<span class="comment">% describes a match with three numbers: the index of the item in</span>
<span class="comment">% the database that matches, the number of matching hash landmarks,</span>
<span class="comment">% and the time offset (in 32ms steps) between the beggining of the</span>
<span class="comment">% reference track and the beggining of the query audio.</span>
R(1,:)
<span class="comment">% 5 18 1 18 means tks{5} was matched with 18 matching landmarks, at a</span>
<span class="comment">% time skew of 1 frame (query starts ~ 0.032s after beginning of</span>
<span class="comment">% reference track), and a total of 18 hashes matched that track at</span>
<span class="comment">% any time skew (meaning that in this case all the matching hashes</span>
<span class="comment">% had the same time skew of 1).</span>
<span class="comment">%</span>
<span class="comment">% Plot the matches</span>
illustrate_match(dt,srt,tks);
colormap(1-gray)
<span class="comment">% This re-runs the match, then plots spectrograms of both query and</span>
<span class="comment">% the matching part of the reference, with the landmark pairs</span>
<span class="comment">% plotted on top in red, and the matching landmarks plotted in</span>
<span class="comment">% green.</span>
</pre><pre class="codeoutput">Max entries per hash = 20
Target density = 10 hashes/sec
Adding #1 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/01-Nine_Lives.mp3 ...
Adding #2 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/02-Falling_In_Love.mp3 ...
Adding #3 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/03-Hole_In_My_Soul.mp3 ...
Adding #4 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/04-Taste_Of_India.mp3 ...
Adding #5 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/05-Full_Circle.mp3 ...
Adding #6 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/06-Something_s_Gotta_Give.mp3 ...
Adding #7 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/07-Ain_t_That_A_Bitch.mp3 ...
Adding #8 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/08-The_Farm.mp3 ...
Adding #9 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/09-Crash.mp3 ...
Adding #10 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/10-Kiss_Your_Past_Good-bye.mp3 ...
Adding #11 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/11-Pink.mp3 ...
Adding #12 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/12-Attitude_Adjustment.mp3 ...
Adding #13 http://labrosa.ee.columbia.edu/~dpwe/tmp/Nine_Lives/13-Fallen_Angels.mp3 ...
added 13 tracks (130 secs, 1194 hashes, 9.1846 hashes/sec)
landmarks 371 -> 306 hashes
ans =
5 17 0 17
landmarks 371 -> 306 hashes
</pre><img vspace="5" hspace="5" src="demo_fingerprint_01.png" alt="Spectrograms of the query and the matching reference excerpt, with landmarks overlaid"> <h2>Implementation notes<a name="2"></a></h2><p>The main work in finding the landmarks is done by <a href="find_landmarks.m">find_landmarks</a>. This contains a number of parameters which can be tuned to control the density of landmarks. In general, denser landmarks result in more robust matching (since there are more opportunities to match), but greater load on the database.</p><p>The scheme relies on just a few landmarks being common to both query and reference items. The greater the density of landmarks, the more likely this is to occur (meaning that shorter and noisier queries can be tolerated), but the greater the load on the database holding the hashes.</p><p>The factors influencing the number of landmarks returned are:</p><p>1. The number of local maxima found, which in turn depends on the spreading width applied to the masking skirt from each found peak, the decay rate of the masking skirt behind each peak, and the high-pass filter applied to the log-magnitude envelope.</p><p>2. The number of landmark pairs made with each peak. All maxima within a "target region" following the seed maximum are made into pairs, so the larger this region is (in time and frequency), the more pairs each peak will generate. The target region is defined by a frequency half-width and a time duration.</p><p>Landmarks, which are 4-tuples of start time, start frequency, end frequency, and time difference, are converted into hashes consisting of a start time and a 20-bit value describing the two frequencies and the time difference, by <a href="landmark2hash.m">landmark2hash</a>. <a href="hash2landmark.m">hash2landmark</a> undoes this quantization.</p><p>The matching relies on the "inverted index" hash table, implemented by <a href="clear_hashtable.m">clear_hashtable</a>, <a href="record_hashes.m">record_hashes</a>, and <a href="get_hash_hits.m">get_hash_hits</a>. Currently, it's implemented as a single Matlab array, HashTable, stored in a global of 20 x 2^20 uint32s (about 80 MB of core). This can record just 20 occurrences of each landmark hash, but as the reference database grows, this may fill up. You can change the size it is initialized to in clear_hashtable.m, but really it should be replaced with a real key/value database system.</p><p>Because the track IDs are stored in just 18 bits of the uint32, we can only handle 256k unique tracks in this implementation. Actually, I've only tried it with up to a few thousand.</p>
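<p>To illustrate the two mechanisms just described, here is a minimal sketch of how a landmark might be packed into a 20-bit hash and stored in the 20 x 2^20 table. The particular bit allocation (8 bits for the start frequency, 6 for the frequency difference, 6 for the time difference) and the packing of track ID and time into the stored uint32 are assumptions for illustration only; see <a href="landmark2hash.m">landmark2hash</a> and <a href="record_hashes.m">record_hashes</a> for the actual definitions.</p><pre class="codeinput">% Sketch only -- assumed bit layout, not necessarily the one used by the code.
% A landmark is [t1 f1 f2 dt] (start frame, start bin, end bin, frame gap).
lmk = [120 45 83 22];
f1 = mod(lmk(2), 2^8);                 % 8 bits: start frequency bin
df = mod(lmk(3) - lmk(2), 2^6);        % 6 bits: frequency difference
dt = mod(lmk(4), 2^6);                 % 6 bits: time difference
hash = f1 * 2^12 + df * 2^6 + dt;      % 20-bit hash value in 0 .. 2^20-1

% The table has one column per possible hash and 20 slots per column
% (20 x 2^20 uint32s is about 80 MB).
maxperhash = 20;
HashTable = zeros(maxperhash, 2^20, 'uint32');
HashCount = zeros(1, 2^20);            % how many slots are used per hash

% Store (track id, start time) under this hash: 18 bits of ID, 14 of time
% (assumed split; it must total 32 bits).
trackid = 5;  t1 = lmk(1);
slot = min(HashCount(hash+1) + 1, maxperhash);   % +1: Matlab is 1-indexed
HashTable(slot, hash+1) = uint32(trackid * 2^14 + mod(t1, 2^14));
HashCount(hash+1) = slot;

% Retrieval: unpack every stored entry for a query hash.
entries = double(HashTable(1:HashCount(hash+1), hash+1));
ids   = floor(entries / 2^14);         % which tracks contain this hash
times = mod(entries, 2^14);            % and where in those tracks it occurs
</pre>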
<p>Other functions included are <a href="add_tracks.m">add_tracks</a>, which simply ties together find_landmarks, landmark2hash, and save_hashes to enter a new reference track into the database, <a href="match_query.m">match_query</a>, which extracts landmarks from query audio and matches them against the reference database to return ranked matches, <a href="show_landmarks.m">show_landmarks</a>, a utility to plot the actual landmarks on top of the spectrogram of a sound, and <a href="illustrate_match.m">illustrate_match</a>, which uses match_query and show_landmarks to show exactly which landmarks matched for a particular query's top match. The utility function <a href="myls.m">myls</a> is used by this demo script, <a href="demo_fingerprint.m">demo_fingerprint</a>, to build a list of files, possibly over http.</p><p>Also included are <a href="gen_random_queries.m">gen_random_queries</a>, which generates random queries of a certain length from a list of tracks (possibly adding noise), <a href="eval_fprint.m">eval_fprint</a>, which queries the fingerprinter with a cell array of query waveforms such as gen_random_queries produces, and <a href="addprefixsuffix.m">addprefixsuffix</a>, which constructs a cell array of full file names from a cell array of unique ID strings.</p><h2>For Windows Users<a name="3"></a></h2><p>Robert Macrae of C4DM, Queen Mary University of London, sent me some notes on <a href="windows-notes.txt">running this code under Windows</a>.</p><h2>For Octave Users<a name="4"></a></h2><p>Joren Six has a nice page on <a href="http://tarsos.0110.be/artikels/lees/Dan_Ellis%2527_Robust_Landmark-Based_Audio_Fingerprinting_-_With_Octave">Porting this code to Octave</a>.</p><h2>See also AUDFPRINT Compiled Binary<a name="5"></a></h2><p>To make it easier to use in some large-scale experiments, I rewrote this code to function as a compiled Matlab stand-alone executable. See the <a href="http://labrosa.ee.columbia.edu/matlab/audfprint/">AUDFPRINT</a> page for more details.</p><h2>Download<a name="6"></a></h2><p>You can download all the code and data for these examples here: <a href="fingerprint.tgz">fingerprint.tgz</a>. For the demo code above, you will also need <a href="http://www.ee.columbia.edu/~dpwe/resources/matlab/mp3read.html">mp3 read/write</a>.</p><h2>Referencing<a name="7"></a></h2><p>If you use this code in your research and you want to make a reference to where you got it, you can use the following citation:</p><p>D. Ellis (2009), "Robust Landmark-Based Audio Fingerprinting", web resource, available: <a href="http://labrosa.ee.columbia.edu/matlab/fingerprint/">http://labrosa.ee.columbia.edu/matlab/fingerprint/</a> .</p><h2>Acknowledgment<a name="8"></a></h2><p>This material is based in part upon work supported by the National Science Foundation under Grant No. IIS-0713334. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF).</p><p>Last updated: $Date: 2012/05/14 19:45:56 $ Dan Ellis <a href="[email protected]">[email protected]</a></p><p class="footer"><br>
Published with MATLAB® 7.14<br></p></div>
</body></html>