
Commit

add files
HW-Lee committed Jul 26, 2016
1 parent 4d21227 commit 9f05d85
Showing 13 changed files with 518 additions and 1 deletion.
2 changes: 1 addition & 1 deletion .gitignore
@@ -20,7 +20,7 @@
 # *.pdf
 
 ## Bibliography auxiliary files (bibtex/biblatex/biber):
-*.bbl
+# *.bbl
 *.bcf
 *.blg
 *-blx.aux
39 changes: 39 additions & 0 deletions Makefile
@@ -0,0 +1,39 @@
# Thanks to Tz-Huan Huang [http://www.csie.ntu.edu.tw/~tzhuan] for building this script.

MAIN=thesis
LATEX=xelatex
BIBTEX=bibtex
RM=rm -f

.SUFFIXES: .tex

ifdef PASSWORD
all: $(MAIN).pdf $(MAIN)-with-pass.pdf
else
all: $(MAIN).pdf
endif

ifdef WATERMARK
TEXFLAG="\def\withwatermark{1}\input{$(MAIN)}"
else
TEXFLAG=
endif

$(MAIN).pdf: *.tex nthuthesis.cls
	$(LATEX) $(TEXFLAG) $(MAIN)
	$(BIBTEX) $(MAIN)
	$(LATEX) $(TEXFLAG) $(MAIN)
	$(LATEX) $(TEXFLAG) $(MAIN)

ifdef PASSWORD
$(MAIN)-with-pass.pdf: $(MAIN).pdf
	pdftk $^ output $@ owner_pw "$(PASSWORD)" allow Printing ScreenReaders
endif

clean:
	$(RM) *.log *.aux *.dvi *.lof *.lot *.toc *.bbl *.blg

clean-pdf:
	$(RM) $(MAIN).pdf $(MAIN)-with-pass.pdf

clean-all: clean clean-pdf

.PHONY: all clean clean-pdf clean-all
33 changes: 33 additions & 0 deletions abstract.tex
@@ -0,0 +1,33 @@
\begin{abstractzh}
在音樂應用上,人們發明樂譜原先是為了方便以圖像的方式紀錄一段音樂的資訊,
而光學音樂辨識旨在設計一套演算流程,讓電腦也能夠自動辨識原先是設計給人閱讀的樂譜。
一般而言,樂譜被存為電子檔的格式大多為圖片檔,因此光學音樂辨識的目的在於從一張圖片上取得其音樂資訊。
本論文主要探討兩個面向:樂譜的前處理以及對於單一組五線譜的辨識演算。
一個樂譜會先經過前處理將其分割成更小的單位來獨立運算以及處理一些印刷上所造成的雜訊或瑕疵,
讓後續的辨識能夠得到最好的輸入圖片。
辨識則是本論文的核心,本論文以樣本匹配法及支持向量機實作辨識演算法,在實際的樂譜圖片上都有不錯的結果。
除此之外,在演算法的設計上也與以往有所不同。
第一點,在前處理中使用隨機抽樣一致法,使其結果多了隨機性,每一次的結果在同一張圖上都會不一樣,因此讓重複執行變得有意義。
其不同次執行的結果,可以歸納出一個更好的結果,使一些原先穩定演算法無法辨識到的符號因為其隨機性而有機會被辨識。
第二點則是其演算法基於分治法的概念,意即其分割出來的子問題幾乎是完全獨立的,也因此讓此實作更適合平行處理來加快運算速度。
\end{abstractzh}

\begin{abstracten}
The purpose of optical music recognition is to develop computer programs that can understand musical scores, a notation invented for human beings to record music. Since a score is usually stored as an image, a recognition system must retrieve musical information from a set of pixels.
This dissertation deals with two major issues: preprocessing and recognition. Preprocessing divides the input image into slices that can be processed independently and handles defects introduced during printing, so as to simplify the subsequent recognition stage. Recognition on a single-staff image is the core of this dissertation and is implemented with template matching and a support vector machine (SVM). On real score images, the proposed algorithm performs well.
The design of the present algorithm brings a different perspective to optical music recognition. First, the preprocessing uses \emph{random sample consensus} (RANSAC) as part of staff detection. This randomness makes it meaningful to repeat the same operation: by comparing the results of different runs, consensus-based correction makes it possible to find symbols that existing deterministic algorithms miss. Second, the algorithm is based on the \emph{divide and conquer} concept, so the subtasks are nearly independent and the algorithm can be readily parallelized.
\end{abstracten}
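The abstract names RANSAC as part of staff detection without detailing how it is applied. As a minimal sketch under stated assumptions (not the implementation used in this dissertation), the Python function below fits one staff line, modeled as a straight line $y = ax + b$, to the $(x, y)$ coordinates of foreground pixels. The function name \texttt{ransac\_line} and the parameters \texttt{n\_iter} and \texttt{tol} are hypothetical. Each iteration draws two points at random, counts the points lying within \texttt{tol} pixels of the line through them, and keeps the best-supported line, which is finally refit to its inliers by least squares. Runs with different seeds can return different lines, which is the kind of randomness that makes repeating the operation meaningful.

```python
import random


def ransac_line(points, n_iter=200, tol=1.0, seed=None):
    """Fit y = a*x + b to (x, y) points with RANSAC (hypothetical helper).

    Returns (a, b, inliers). The vertical distance |y - (a*x + b)| is used as
    the residual, which is adequate for nearly horizontal staff lines.
    """
    rng = random.Random(seed)
    best_a, best_b, best_inliers = 0.0, 0.0, []
    for _ in range(n_iter):
        # Minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot describe a staff line
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points lying within tol pixels of the candidate line.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_a, best_b, best_inliers = a, b, inliers
    # Refine the winning line by ordinary least squares on its inliers.
    n = len(best_inliers)
    if n >= 2:
        mx = sum(x for x, _ in best_inliers) / n
        my = sum(y for _, y in best_inliers) / n
        sxx = sum((x - mx) ** 2 for x, _ in best_inliers)
        if sxx > 0:
            best_a = sum((x - mx) * (y - my) for x, y in best_inliers) / sxx
            best_b = my - best_a * mx
    return best_a, best_b, best_inliers


# Usage: a horizontal line at y = 10 plus three stray pixels.
pixels = [(x, 10) for x in range(100)] + [(5, 40), (50, 3), (70, 90)]
a, b, inliers = ransac_line(pixels, seed=0)
```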

% \keywords{Optical Music Recognition, Pattern Recognition, Music Technology}

\begin{comment}
\category{I2.10}{Computing Methodologies}{Artificial Intelligence --
Vision and Scene Understanding} \category{H5.3}{Information
Systems}{Information Interfaces and Presentation (HCI) -- Web-based
Interaction.}

\terms{Design, Human factors, Performance.}

\keywords{Region of interest, Visual attention model, Web-based
games, Benchmarks.}
\end{comment}
7 changes: 7 additions & 0 deletions acknowledgements.tex
@@ -0,0 +1,7 @@
\begin{acknowledgementszh}
感謝\ldots
\end{acknowledgementszh}

\begin{acknowledgementsen}
I'm glad to thank\ldots
\end{acknowledgementsen}
Binary file added figsrc/DnC.png
Binary file added figsrc/bookspine.png
Binary file added figsrc/watermark.pdf
Binary file not shown.
37 changes: 37 additions & 0 deletions introduction.tex
@@ -0,0 +1,37 @@
\chapter{Introduction}
\label{c:intro}

\section{Motivation}
\label{section:motivation}
High-tech tools are prevalent nowadays, and many of our daily tasks are now routinely performed with computers. People write articles with computers; people draw diagrams with computers; people, of course, design programs with computers. One of the many uses of computers is music composition. To store and visualize musicians' creations, the standard Western musical score, which encodes how a piece of music should be played, has been used around the globe for hundreds of years. However, scores are designed for human readers rather than computers, and most scores are scanned and stored as images, which to a computer are nothing but arrays of pixels. In other words, these scores are not yet symbolically represented. This dissertation therefore addresses \emph{optical music recognition} (OMR), the development of methods that automatically convert score images into their symbolic representation.

\section{Goal}
\label{section:goal}
The goal is to design software that converts a score image (.png, .jpeg, .bmp, or .pdf) into a symbolic representation encoded in a machine-readable format such as MusicXML.
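The following sketch illustrates what a machine-readable symbolic output might look like. It is an assumption for illustration only: recognized notes are given as hypothetical (step, octave, duration) tuples, and all of them are written into a single measure of a one-part MusicXML \texttt{score-partwise} document using Python's standard \texttt{xml.etree.ElementTree} module. A real converter would also need to emit clefs, key and time signatures, barlines, rests, and so on.

```python
import xml.etree.ElementTree as ET


def notes_to_musicxml(notes, divisions=1):
    """Serialize (step, octave, duration) tuples as a one-part MusicXML score.

    Hypothetical helper: every note goes into measure 1, and each duration is
    counted in units of 1/divisions of a quarter note.
    """
    score = ET.Element("score-partwise", version="3.1")
    # The part list declares the single part referenced below by id="P1".
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Music"
    part = ET.SubElement(score, "part", id="P1")
    measure = ET.SubElement(part, "measure", number="1")
    attributes = ET.SubElement(measure, "attributes")
    ET.SubElement(attributes, "divisions").text = str(divisions)
    for step, octave, duration in notes:
        note = ET.SubElement(measure, "note")
        pitch = ET.SubElement(note, "pitch")
        ET.SubElement(pitch, "step").text = step
        ET.SubElement(pitch, "octave").text = str(octave)
        ET.SubElement(note, "duration").text = str(duration)
    return ET.tostring(score, encoding="unicode")


# Usage: two quarter notes, C4 followed by E4.
print(notes_to_musicxml([("C", 4, 1), ("E", 4, 1)]))
```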

\section{Divide and Conquer}
\subsection{Definition}
\label{section:divide-and-conquer}

Fig.~\ref{fig:DnC} illustrates the concept of \emph{divide and conquer} (D\&C). D\&C is an algorithm design paradigm that breaks a complex problem into several simpler subproblems (\emph{divide}) and then solves each of them (\emph{conquer}). The division is applied recursively until each subproblem is simple enough to be solved directly. Finally, the solutions to the subproblems are merged into a solution to the original problem.

\begin{figure}[!htb]
\centering
\includegraphics[width=\textwidth]{figsrc/DnC.png}
\caption{A diagram showing how divide and conquer works.\label{fig:DnC}}
\end{figure}
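Merge sort is a textbook instance of the paradigm and is shown here only as a generic illustration in Python, not as code from this dissertation. The comments mark the divide, conquer, and merge steps; the base case is the point at which a subproblem is simple enough to be solved directly.

```python
def merge_sort(items):
    """Return a sorted copy of items using divide and conquer."""
    # Base case: a list of length 0 or 1 is already solved.
    if len(items) <= 1:
        return list(items)
    # Divide: split the problem into two halves.
    mid = len(items) // 2
    # Conquer: solve each half recursively.
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge: combine the two sorted halves into a solution of the whole.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```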

\subsection{Main Contribution of This Dissertation}
\label{subsec:advantages}

\subsubsection{Reducing the Difficulty of Problems}

Owing to the structure of D\&C, any problem that can be accurately split into tractable subproblems can be expected to be solved. In this dissertation in particular, if the staff detection is reliable, then arbitrarily complicated scores can be analyzed one staff at a time.

\subsubsection{Independence of Subproblems}

A score typically contains content that is irrelevant to recognition, such as the song's metadata, lyrics, and even printing defects. Partitioning the original image into subimages that each contain only one staff reduces the amount of noisy information and eliminates interference between staves. Therefore, the detection tasks on different staves are independent of each other.
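A minimal sketch of partitioning a page into single-staff subimages follows. It swaps in a simple horizontal-projection heuristic for illustration (it is not the RANSAC-based staff detection used in this dissertation) and assumes a binarized image stored as a list of rows with 1 for foreground, straight horizontal staves of five lines each, and no other long horizontal strokes. The names \texttt{split\_into\_staves}, \texttt{min\_fill}, and \texttt{margin} are hypothetical; in practice the margin must be large enough to include notes and ledger lines above and below each staff.

```python
def find_staff_rows(img, min_fill=0.5):
    """Rows whose fraction of foreground pixels is at least min_fill."""
    width = len(img[0])
    return [r for r, row in enumerate(img) if sum(row) >= min_fill * width]


def group_runs(rows):
    """Merge consecutive row indices into (first, last) runs (thick lines)."""
    runs = []
    for r in rows:
        if runs and r == runs[-1][1] + 1:
            runs[-1][1] = r
        else:
            runs.append([r, r])
    return [tuple(run) for run in runs]


def split_into_staves(img, lines_per_staff=5, margin=2, min_fill=0.5):
    """Hypothetical projection heuristic: one horizontal slice per staff.

    Every lines_per_staff consecutive detected lines form one staff; the
    slice spans them plus margin rows above and below, clipped to the image.
    """
    lines = group_runs(find_staff_rows(img, min_fill))
    slices = []
    for k in range(0, len(lines) - lines_per_staff + 1, lines_per_staff):
        top = max(lines[k][0] - margin, 0)
        bottom = min(lines[k + lines_per_staff - 1][1] + margin, len(img) - 1)
        slices.append(img[top:bottom + 1])
    return slices
```

Each returned slice can then be handed to the per-staff recognizer on its own.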

\subsubsection{Parallelism}

Modern processors usually have multiple cores, and many computational tasks are implemented as parallel programs. In a D\&C algorithm, the same function solves every split subproblem. Because the subproblems are highly independent and undergo the same operations, processing them simultaneously is a good strategy. In other words, the original problem is well suited to data-parallel processing in the style of \emph{SIMD (Single Instruction, Multiple Data)}, where the same operation is applied to many independent pieces of data.
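Because every single-staff subimage is handled by the same function, the conquer step maps naturally onto a worker pool. The sketch below uses Python's standard \texttt{concurrent.futures} module; \texttt{recognize\_staff} is a hypothetical placeholder (it merely counts foreground pixels) standing in for the real per-staff recognizer. \texttt{Executor.map} returns results in input order, so the per-staff results can be merged in staff order without extra bookkeeping.

```python
from concurrent.futures import ThreadPoolExecutor


def recognize_staff(staff_image):
    """Hypothetical placeholder for per-staff recognition.

    It only counts foreground pixels; a real recognizer would return the
    symbols found on the staff.
    """
    return sum(sum(row) for row in staff_image)


def recognize_score(staff_images, max_workers=None):
    """Conquer step: apply the same function to every staff concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, so staff order is preserved.
        return list(pool.map(recognize_staff, staff_images))
```

A thread pool keeps the example short; for CPU-bound pure-Python work, \texttt{ProcessPoolExecutor}, which offers the same \texttt{map} interface, sidesteps the global interpreter lock.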
178 changes: 178 additions & 0 deletions nthuthesis.cls
@@ -0,0 +1,178 @@
% This is a LaTeX thesis template for NTHU, modified by Hao-Wei Lee
% author: Tz-Huan Huang [http://www.csie.ntu.edu.tw/~tzhuan]

% ----------------------------------------------------------------------------
% "THE CHOCOLATE-WARE LICENSE":
% Tz-Huan Huang wrote this file. As long as you retain this notice you
% can do whatever you want with this stuff. If we meet some day, and you think
% this stuff is worth it, you can buy me a chocolate in return Tz-Huan Huang
% ----------------------------------------------------------------------------

\NeedsTeXFormat{LaTeX2e}
% \ProvidesClass{ntuthesis}[2013/04/23 Thesis template for National Taiwan University]
% Line 12 is intentionally kept (commented out) for reference.
\ProvidesClass{nthuthesis}[2016/07/25 Thesis template for National Tsing Hua University] % modified by HW Lee

% Derived from book class
\DeclareRobustCommand{\@typeen}{Master}
\DeclareRobustCommand{\@typezh}{碩士}
\DeclareRobustCommand{\@classen}{Thesis}
\DeclareRobustCommand{\@classzh}{論文}
\DeclareOption{phd}{\gdef\@typeen{Doctoral}\gdef\@typezh{博士}\gdef\@classen{Dissertation}}
\DeclareOption{proposal}{\gdef\@typeen{Proposal for Doctoral}\gdef\@typezh{博士論文計畫提案書}\gdef\@classen{Dissertation}\gdef\@classzh{}}
\DeclareRobustCommand{\@setspacing}{\doublespacing}
\DeclareOption{singlespacing}{\gdef\@setspacing{\singlespacing}}
\DeclareOption{onehalfspacing}{\gdef\@setspacing{\onehalfspacing}}
\DeclareOption*{\PassOptionsToClass{\CurrentOption}{book}}
\ProcessOptions\relax
\LoadClass[a4paper,12pt]{book}

% Required packages
\RequirePackage[top=1in,left=1.5in,bottom=1in,right=1in]{geometry}
\RequirePackage{xeCJK}
\RequirePackage{setspace}

% Declarations
\DeclareRobustCommand{\university}[2]{\gdef\@universityen{#1}\gdef\@universityzh{#2}}
\DeclareRobustCommand{\college}[2]{\gdef\@collegeen{#1}\gdef\@collegezh{#2}}
\DeclareRobustCommand{\institute}[2]{\gdef\@instituteen{#1}\gdef\@institutezh{#2}}
\DeclareRobustCommand{\division}[1]{\gdef\@divisionzh{#1}}
\DeclareRobustCommand{\title}[2]{\gdef\@titleen{#1}\gdef\@titlezh{#2}}
\DeclareRobustCommand{\author}[2]{\gdef\@authoren{#1}\gdef\@authorzh{#2}}
\DeclareRobustCommand{\studentid}[1]{\gdef\@studentid{#1}}
\DeclareRobustCommand{\advisor}[2]{\gdef\@advisoren{#1}\gdef\@advisorzh{#2}}
\DeclareRobustCommand{\defenseyear}[2]{\gdef\@yearen{#1}\gdef\@yearzh{#2}}
\DeclareRobustCommand{\defensemonth}[2]{\gdef\@monthen{#1}\gdef\@monthzh{#2}}
\DeclareRobustCommand{\defenseday}[1]{\gdef\@day{#1}}
\DeclareRobustCommand{\abstractname}[2]{\gdef\@abstractnameen{#1}\gdef\@abstractnamezh{#2}}
\DeclareRobustCommand{\acknowledgements}[2]{\gdef\@acknowledgementsen{#1}\gdef\@acknowledgementszh{#2}}

\abstractname{Abstract}{摘要}
\acknowledgements{Acknowledgements}{誌謝}

% The command \makecover has been redesigned to follow the cover format required by the Dept. of EE
\DeclareRobustCommand{\makecover}{
\begin{titlepage}
\begin{center}
\fontsize{30}{30}\selectfont
\makebox[9cm][s]{\textbf{\@universityzh}}\\
\underline{\makebox[5cm][s]{\textbf{\@typezh\@classzh}}}
\end{center}
\vspace{\fill}
\begin{center}
\fontsize{25}{25}\selectfont
\textbf{\@titlezh}\par
\textbf{\@titleen}\par
\end{center}
\vspace{\fill}
\begin{flushleft}
\fontsize{20}{30}\selectfont
\textbf{系所別:\underline{{\@institutezh\@typezh}班}
\hspace{\fill}組別:\underline{\@divisionzh 組}}\\
\textbf{學號姓名:\underline{\@studentid \@authorzh} (\@authoren)}\\
\textbf{指導教授:\underline{\@advisorzh\hspace{.3cm}博士} (Prof. \@advisoren)}
\end{flushleft}
\vspace{1.5cm}
\begin{center}
\fontsize{20}{15}\selectfont
\makebox[10cm][s]{\textbf{中華民國 \@yearzh\@monthzh}}
\end{center}
\end{titlepage}
}

% The command \makecopyright is created and designed by HW Lee
\DeclareRobustCommand{\makecopyright}{
\if@openright\cleardoublepage\else\clearpage\fi
\begin{singlespace}
\thispagestyle{empty}
\vspace*{\fill}
\begin{center}
\fontsize{14}{20}\selectfont
\textcopyright Copyright by \@authorzh (\@authoren), \@yearen\\
All Rights Reserved
\end{center}
\end{singlespace}
}

% stolen from CJKfntef
%
% myCJKfilltwosides environment:
% Align CJK characters to fill two sides.
%
% Usage:
% \begin{myCJKfilltwosides}{width}
% first line \\
% second line \\
% ... \\
% last line
% \end{myCJKfilltwosides}
%
\newif\ifmyCJK@fillbegin@
\global\myCJK@fillbegin@false
\newif\ifmyCJK@filltwosides@
\global\myCJK@filltwosides@false

\newenvironment{myCJKfilltwosides}[1]{
\leavevmode
\vbox\bgroup
\global\myCJK@filltwosides@true
\global\let\myCJK@filltwosidesSymbol \CJKsymbol

\def\myCJK@ftscr{
\egroup
\global\myCJK@fillbegin@false
\hbox to #1\bgroup
\ignorespaces}

\let\\ \myCJK@ftscr

\def\CJKsymbol##1{
\ifmyCJK@fillbegin@
\hfill
\myCJK@filltwosidesSymbol{##1}
\else
\myCJK@filltwosidesSymbol{##1}
\global\myCJK@fillbegin@true
\fi}

\hbox to #1\bgroup
\ignorespaces
}{
\egroup
\egroup

\global\let\CJKsymbol \myCJK@filltwosidesSymbol
\global\myCJK@fillbegin@false
\global\myCJK@filltwosides@false}


\DeclareRobustCommand{\CJKmove}[1]{\raisebox{.35em}{#1}}
\DeclareRobustCommand{\makespine}{
\noindent\rotatebox{-90}{
\CJKfamily{sidepagefont}
\begin{tabular}{m{3.5cm}m{0.1cm}m{2cm}m{0.1cm}m{10cm}m{1cm}m{2.5cm}m{0.1cm}m{1.5cm}}
\fontsize{8}{6}\selectfont
\begin{myCJKfilltwosides}{3cm}\CJKmove{\@universityzh}\end{myCJKfilltwosides}\newline
\begin{myCJKfilltwosides}{3cm}\CJKmove{\@institutezh}\end{myCJKfilltwosides} & &
\CJKmove{\@typezh\@classzh} & &
{\fontsize{14}{14}\selectfont\CJKmove{\@titlezh}} & &
{\fontsize{14}{14}\selectfont\CJKmove{\@authorzh{} 撰}} & &
\raisebox{-0.25em}{\rotatebox{90}{\@yearzh{}} \rotatebox{90}{\hspace{0.25em}\@monthzh{}}}
\end{tabular}}
}

\newenvironment{quotationpage}[1]
{\if@openright\cleardoublepage\else\clearpage\fi
\chapter*{\centerline{#1}}
\addcontentsline{toc}{chapter}{#1}
\quotation}
{\endquotation}

\newenvironment{abstracten}{\begin{quotationpage}{\@abstractnameen}}{\end{quotationpage}}
\newenvironment{abstractzh}{\begin{quotationpage}{\@abstractnamezh}}{\end{quotationpage}}
\newenvironment{acknowledgementsen}{\begin{quotationpage}{\@acknowledgementsen}}{\end{quotationpage}}
\newenvironment{acknowledgementszh}{\begin{quotationpage}{\@acknowledgementszh}}{\end{quotationpage}}

\setcounter{tocdepth}{2}
\pagestyle{plain}
\@setspacing
21 changes: 21 additions & 0 deletions nthuvars.tex
@@ -0,0 +1,21 @@
% author: Tz-Huan Huang [http://www.csie.ntu.edu.tw/~tzhuan]

% ----------------------------------------------------------------------------
% "THE CHOCOLATE-WARE LICENSE":
% Tz-Huan Huang wrote this file. As long as you retain this notice you
% can do whatever you want with this stuff. If we meet some day, and you think
% this stuff is worth it, you can buy me a chocolate in return Tz-Huan Huang
% ----------------------------------------------------------------------------

% Syntax: \var{English}{Chinese}
\university{National Tsing Hua University}{國立清華大學}
\college{College of Electrical Engineering and Computer Science}{電機資訊學院}
\institute{Department of Electrical Engineering}{電機工程學系}
\division{系統}
\title{Automatic Music Score Recognition}{自動樂譜辨識}
\author{HW Lee}{李豪韋}
\studentid{103061xxx}
\advisor{Yi-Wen Liu}{劉奕汶}
\defenseyear{2016}{105}
\defensemonth{July}{7}
\defenseday{10}
48 changes: 48 additions & 0 deletions omr-overview.tex
@@ -0,0 +1,48 @@
\chapter{Overview of OMR}
\label{c:overveiw-of-omr}

This chapter reviews previous work on OMR, covering preprocessing (binarization, staff profiling, staff detection, and staff removal) and recognition (symbol segmentation and symbol classification).

\section{Binarization}

\if 0
\bibliography{thesis}
\graphicspath{{./figsrc/}}
\fi

For printed scores, color information (RGB or RGBA vectors) is not useful for recognition. Only intensity is considered, so grayscale images are typically used as the raw input. Furthermore, each pixel is usually classified in advance as background (white) or foreground (black), which is why binarization is part of most OMR systems.
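The two preprocessing steps just described can be sketched in a few lines of Python. The text does not specify the color-to-gray conversion; the ITU-R BT.601 luma weights used below (0.299, 0.587, 0.114) are an assumption, an alpha channel is simply ignored, and images are plain nested lists. The fixed \texttt{threshold} argument is a placeholder for one of the threshold-selection methods discussed in this section.

```python
def to_grayscale(rgb):
    """Map rows of (R, G, B) or (R, G, B, A) tuples to 0-255 intensities.

    Assumes ITU-R BT.601 luma weights; any alpha component is ignored.
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b, *_ in row]
            for row in rgb]


def binarize(gray, threshold):
    """Return 1 for foreground (dark ink) and 0 for background (paper)."""
    return [[1 if v < threshold else 0 for v in row] for row in gray]
```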

In Pinto's research~\cite{Pinto:2011:MSB}, binarization methods were divided into two kinds depending on whether the threshold is locally adjustable. The simplest approach applies a single threshold to all pixels in the image, which is called \emph{global thresholding}. The global threshold can be chosen to maximize the between-class variance of foreground and background pixels~\cite{Otsu:1979:ATSMfGLH}, to preserve the most edge information~\cite{Chen:2008:ADTIBMboED}, or to maximize the similarity between the binarized image and the original image~\cite{Huang:1995:ITbMtMoF,Tsai:1995:AFTSPfMaUH}. However, the intensity of different regions cannot be expected to be constant over the whole document, and a single threshold may fail in regions at a different intensity level. In particular, near the spine of a bound book, a scanned page often shows a gradual shift in average intensity compared with regions far from the spine (Fig.~\ref{fig:bookspine}). To deal with such situations, the threshold should be determined from local information, namely nearby pixels~\cite{Bernsen:2005:DToGLI}; this is called \emph{local thresholding}. In general, global thresholding is easier to implement, while local thresholding is more adaptive and robust.
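As an illustration of global thresholding, the following Python sketch implements Otsu's criterion~\cite{Otsu:1979:ATSMfGLH}: for every candidate threshold $t$, pixels with intensity below $t$ form one class and the rest form the other, and the threshold maximizing the between-class variance $\sigma_B^2(t) = \omega_0\omega_1(\mu_0-\mu_1)^2$ is chosen. Pixel counts are used instead of class probabilities, which scales $\sigma_B^2$ by a constant and leaves the maximizer unchanged. The sketch assumes 8-bit integer intensities; the function name is hypothetical.

```python
def otsu_threshold(gray, levels=256):
    """Return t maximizing between-class variance; pixels < t are class 0.

    Hypothetical helper; gray is a list of rows of ints in [0, levels).
    """
    hist = [0] * levels
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0  # pixel count and intensity sum of class 0 (values < t)
    for t in range(1, levels):
        w0 += hist[t - 1]
        sum0 += (t - 1) * hist[t - 1]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: not a valid split
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        var_between = w0 * w1 * (mu0 - mu1) ** 2  # proportional to sigma_B^2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t
```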

\begin{figure}[ht]
\includegraphics[width=\textwidth]{bookspine}
\caption{Example of a grayscale image near the book spine.\label{fig:bookspine}}
\end{figure}
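Local thresholding can be sketched in the same style. The following Python function follows the idea of Bernsen's method~\cite{Bernsen:2005:DToGLI}: each pixel is compared with the midrange $(\min + \max)/2$ of its $(2r+1)\times(2r+1)$ neighborhood. Treating low-contrast neighborhoods as background is one common variant, and the default values of \texttt{radius} and \texttt{min\_contrast} are assumptions rather than values from this dissertation. The loop is deliberately naive and costs $O(r^2)$ per pixel.

```python
def bernsen_binarize(gray, radius=7, min_contrast=15):
    """Binarize with a Bernsen-style local threshold (1 = foreground).

    Each pixel is compared with the midrange (min + max) / 2 of its
    (2*radius + 1) x (2*radius + 1) neighbourhood, clipped at the borders.
    Neighbourhoods whose contrast max - min is below min_contrast are
    treated as background (one common variant; an assumption here).
    """
    h, w = len(gray), len(gray[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [gray[j][i]
                      for j in range(max(0, y - radius), min(h, y + radius + 1))
                      for i in range(max(0, x - radius), min(w, x + radius + 1))]
            lo, hi = min(window), max(window)
            if hi - lo >= min_contrast and gray[y][x] < (lo + hi) / 2:
                out[y][x] = 1  # darker than its local midrange: ink
    return out
```

Because the threshold follows the local midrange, a gradual darkening such as the one near a book spine shifts the threshold along with it.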

\section{Staff Detection and Removal}

Dalitz et al.~\cite{Dalitz:2008:CSoSRA} introduced a systematic way of evaluating staff removal algorithms: a test set is generated by applying the deformation methods listed in Table~\ref{table:deformation} to a set of ideal score images. The deformation algorithms and the CVC-MUSCIMA dataset have been made publicly available by Forn\'es et al.~\cite{Forns:2012:CVC-MUSCIMA}.

\begin{table}[ht]
\hspace{-.5in}
\begin{tabular}{|c|c|c|}
\hline
\textbf{Deformation} & \textbf{Type} & \textbf{Parameter Description} \\
\hline
Curvature & deterministic & height/width ratio of sine curve \\
\hline
Typeset Emulation & both & \parbox[c]{9cm}{gap width, maximal height and variance of vertical shift} \\
\hline
Line Interruptions & random & \parbox[c]{9cm}{interruption frequency, maximal width and variance of gap width} \\
\hline
Thickness Variation & random & \parbox[c]{9cm}{Markov chain stationary distribution and inertia factor} \\
\hline
$y$-variation & random & \parbox[c]{9cm}{Markov chain stationary distribution and inertia factor} \\
\hline
Degradation & random & \parbox[c]{9cm}{emulating local distortions suggested by Kanungo et al.~\cite{Kanungo:2000:Degradation}} \\
\hline
White Speckles & random & \parbox[c]{9cm}{speckle frequency, random walk length and smoothing factor} \\
\hline
\end{tabular}
\caption{Deformation Methods\label{table:deformation}.}
\end{table}
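To make the first row of the table concrete, the following Python sketch bends a binary score image along a sine curve by shifting each column vertically. How the curve is parametrized here is an assumption (amplitude equal to \texttt{ratio} times the image width, half a period across the width, pixels pushed out of the frame discarded); the function name is hypothetical, and this is not the reference implementation distributed by Dalitz et al.

```python
import math


def curvature(img, ratio):
    """Hypothetical curvature deformation of a binary image (list of rows).

    Column x is shifted down by round(ratio * width * sin(pi * x / width)),
    so the image bends into half a sine period across its width.
    """
    h, w = len(img), len(img[0])
    amplitude = ratio * w
    out = [[0] * w for _ in range(h)]
    for x in range(w):
        shift = round(amplitude * math.sin(math.pi * x / w))
        for y in range(h):
            ny = y + shift
            if 0 <= ny < h:  # pixels moved outside the frame are dropped
                out[ny][x] = img[y][x]
    return out
```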