Bark Scale (Domain) for Perceptual Audio Evaluation and Filter Design

Documentation by Matthew Cohen and Stephen Welch

###Introduction

The Bark scale is a psychoacoustic scale used in audio applications to represent audio and speech signals in a more perceptually accurate way. Traditionally, the Bark scale ranges from 1 to 24 Barks, corresponding to the first 24 critical bands of hearing. Critical bands are related to the bandwidth of the auditory filters that the cochlea in the inner ear creates; within a critical band, audio frequencies interfere such that the perception of one tone can be masked by another tone. The critical bands can be described by center frequencies and lower/upper cut-off frequencies, though the bandwidth of the critical band at a given frequency can also be used and is as important as precisely measured center and cut-off frequencies. The conversion from frequency in Hertz to Barks is nonlinear; common approximations use inverse trigonometric (arctangent) or inverse hyperbolic functions.
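As a rough illustration of the Hz-to-Bark conversion, the sketch below implements two commonly quoted approximations: the arctangent formula usually attributed to Zwicker and Terhardt, and an inverse hyperbolic sine form. The function names are hypothetical, and the default constants (6 and 600 Hz) are an assumption that should be checked against whichever reference implementation is being followed.

```python
import math

def hz_to_bark_zwicker(f_hz):
    """Arctangent approximation (commonly attributed to Zwicker and Terhardt)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)

def hz_to_bark_asinh(f_hz, scale=6.0, corner_hz=600.0):
    """Inverse hyperbolic sine approximation; the constants are assumptions."""
    return scale * math.asinh(f_hz / corner_hz)

def bark_to_hz_asinh(z, scale=6.0, corner_hz=600.0):
    """Inverse of hz_to_bark_asinh for the same constants."""
    return corner_hz * math.sinh(z / scale)

# Print both approximations side by side for a few frequencies.
for f in (100.0, 1000.0, 4000.0, 15500.0):
    print(f, hz_to_bark_zwicker(f), hz_to_bark_asinh(f))
```

Both mappings are roughly linear at low frequencies and compress high frequencies, which is why equal steps in Barks cover progressively wider ranges in Hertz.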

The importance of the Bark domain (we will refer to the Bark scale as the Bark domain, since we are treating it as a mathematical transform domain), and of related scales such as the Mel scale, lies in how it models the way the cochlea processes auditory signals. Mapping to the Bark domain provides a more perceptual representation than the Fourier (Hz) domain, which allows for better evaluation of signal quality through perceptual comparison of two signals. We encountered the scale while studying the Perceptual Evaluation of Audio Quality (PEAQ) standardized algorithm (PEAQ.md). Many of the Model Output Variables (MOVs), which represent important attributes of the signal and are used to compare signals, are computed from the Bark-domain representation of the signals.

The intuition acquired from the PEAQ exploration was that the Bark domain could potentially be used as a metric or objective function in a learning algorithm, such as a Neural Network (NN), or as a means of designing perceptual audio filters for improving audio quality. We decided to explore the Bark domain more to see how useful it could be as a tool for enhancing audio quality through filter design. The following sections detail this exploration process and what was learned from it.

###Bark Domain and Filter Design

We wanted to investigate the possibility of training learning algorithms and designing filters in the Bark domain. For these problems, convexity can be a concern, especially for certain learning approaches and constrained filter design methods. Filter design approaches that allow the pass-band/stop-band frequencies and ripples to be specified can produce well-behaved filters. However, these problems require very specific formulation to ensure that the problem is convex. We learned that the filter design problem can be formulated as a convex problem by working with the power spectral density (the Fourier transform of the autocorrelation) and then using spectral factorization to recover the desired filter. However, given the time constraints we were working under, the complexity of incorporating such an approach into our process forced us to set this technique aside for the time being.

Instead, we focused our attention on a [Frequency-Based Least Squares Approach to Filter Design](Frequency-Based Least Squares.ipynb). With this approach, the filter coefficients for a linear-phase type I (symmetric impulse response with odd length) FIR filter are computed using a least-squares/optimal square error design method. It is essentially a frequency-based least-squares modeling technique that can be useful for various other frequency modeling or comparison formulations. The technique works from Fourier-domain (DFT) representations of the desired magnitude response and of the assumed form of the approximated response, which places this approach primarily in the frequency domain. The mathematics results in a least squares (matrix system) formulation from which the filter coefficients can be computed by solving the system. This requires computing the inverse (in this case, the Moore-Penrose pseudo-inverse) of a cosine matrix, or of the product of a weighting term and the cosine matrix. As shown in the IPython notebook, this approach can successfully design good lowpass filters. However, our hope was to use this approach to design audio filters of arbitrary shape based on audio-signal transfer functions.
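The least squares formulation described above can be sketched as follows. This is a minimal illustration and not the code from the linked notebook: the function name, the grid size, the band edges, and the zero weights in the transition band are all assumptions. The zero-phase response of a type I filter is written as A(ω) = Σ a[k] cos(kω), so the cosine matrix has entries cos(k ω_i), and the coefficients come from the pseudo-inverse of the weighted cosine matrix.

```python
import numpy as np

def design_type1_fir_ls(desired, omega, num_cos, weights=None):
    """Fit A(w) = sum_k a[k] cos(k w) to `desired` on the grid `omega`.

    Returns the symmetric impulse response of odd length 2 * num_cos - 1.
    """
    k = np.arange(num_cos)
    C = np.cos(np.outer(omega, k))            # cosine matrix, one row per grid point
    w = np.ones_like(omega) if weights is None else np.asarray(weights, dtype=float)
    a = np.linalg.pinv(w[:, None] * C) @ (w * desired)
    M = num_cos - 1
    h = np.empty(2 * M + 1)
    h[M] = a[0]                               # center tap
    h[M + 1:] = a[1:] / 2.0                   # right half
    h[:M] = a[:0:-1] / 2.0                    # mirrored left half
    return h

# Hypothetical lowpass spec: pass up to 0.25*pi, stop from 0.35*pi,
# with the transition band given zero weight.
omega = np.linspace(0.0, np.pi, 512)
passband = omega <= 0.25 * np.pi
stopband = omega >= 0.35 * np.pi
desired = passband.astype(float)
weights = (passband | stopband).astype(float)
h = design_type1_fir_ls(desired, omega, num_cos=32, weights=weights)
print(len(h), h.sum())
```

The sum of the taps is the response at ω = 0, so it should come out close to 1 for this lowpass specification.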

To try out this filter design process in the Bark domain, we reformulated the problem with the addition of the Bark transform. This transform groups frequency (DFT) bins and performs weighted sums of these components to produce the corresponding Bark terms in the Bark domain. We formulated this process as a matrix transform operation, with two separate matrix forms: 1) the PEAQ form, introduced in the PEAQ standard; and 2) the RASTA form, used by Dan Ellis and LabROSA at Columbia University (http://labrosa.ee.columbia.edu/matlab/rastamat/). We created a [Bark Transform class](Bark Domain Transform.ipynb) and tested it for both forms in the linked IPython notebook. To incorporate this transform into the frequency-based least squares filter design approach, we assumed that the transform could be applied to both sides of the least squares system, creating a modified system to be solved. This [Bark domain filter design approach](Explore Bark Domain Filter Design.ipynb) is explained in detail in the linked IPython notebook. Ideally, this problem could be solved by simply computing the pseudo-inverse of the new matrix consisting of the product of the Bark transform matrix and the cosine matrix (ignoring a weighting matrix for now).
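To make the modified system concrete, here is a minimal sketch that applies a Bark grouping matrix B to both sides of the cosine-matrix system. It uses a simple rectangular grouping (each DFT bin assigned to exactly one band), which is an assumption made for brevity and is neither the PEAQ form nor the RASTA form; the function name, the 48 kHz sample rate, the grid size, and the asinh constants are likewise assumptions.

```python
import numpy as np

def bark_grouping_matrix(num_bins, sample_rate, num_bands):
    """Rectangular grouping of DFT bins into bands equally spaced in Barks."""
    freqs = np.linspace(0.0, sample_rate / 2.0, num_bins)
    z = 6.0 * np.arcsinh(freqs / 600.0)       # assumed Hz-to-Bark mapping
    edges = np.linspace(0.0, z[-1], num_bands + 1)
    band = np.clip(np.searchsorted(edges, z, side="right") - 1, 0, num_bands - 1)
    B = np.zeros((num_bands, num_bins))
    B[band, np.arange(num_bins)] = 1.0        # one nonzero entry per column
    return B

num_bins, num_bands, num_cos = 1025, 109, 64
B = bark_grouping_matrix(num_bins, 48000.0, num_bands)
omega = np.linspace(0.0, np.pi, num_bins)
C = np.cos(np.outer(omega, np.arange(num_cos)))   # cosine matrix
desired = (omega <= 0.25 * np.pi).astype(float)   # hypothetical target response

# Modified system: (B C) a = B d, solved with the pseudo-inverse of BC.
BC = B @ C
a = np.linalg.pinv(BC) @ (B @ desired)
print("BC shape:", BC.shape, "numerical rank:", np.linalg.matrix_rank(BC))
```

The printed rank depends on the grouping and grid used here, so it should not be expected to match the figures reported for the PEAQ or RASTA forms.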

###Issues

We encountered several issues when working on the Bark domain filter design approach. The main issue was that the characteristics of the matrix made the inversion process difficult. Matrix BC (the product of the Bark transform and the cosine matrix for the least squares portion) is a rank-deficient matrix, i.e. its rank is less than the number of columns (and rows). For this approach, we were using a total of 109 Barks, which is what the PEAQ method uses, instead of the traditional 24 Barks. To keep the system overdetermined, we tried to design filters with 64 coefficients. The rank of matrix BC of size 109 x 64 was 57, making BC rank deficient. To handle the rank deficiency when computing the pseudo-inverse, an SVD-based approach is needed. [Rank-Deficient Least Squares](Explore Rank-Deficient Least Squares.ipynb) is explored in more detail in this IPython notebook.

Even when using these alternate approaches for computing the pseudo-inverse and solving the least squares problem, the results were unusable. It appears that we were unable to find the right number of singular values to include in the SVD inverse computation, which caused the resulting inverse to take on very large values and eventually go to zero along the rows. The IPython notebook addressing [Bark domain issues](Explore Bark Domain Issues.ipynb) displays plots of certain rows of the SVD pseudo-inverse of BC, and the rows take on a chirp-like form. The reason for this is currently unclear to us. What we have been able to determine so far is that matrix BC takes on a form that is hard to work with. Figuring out how to properly handle this matrix and solve this problem will require more understanding of and insight into the problem.
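For reference, a truncated-SVD pseudo-inverse of the kind mentioned above can be sketched as follows. The function name and the relative tolerance are assumptions, and the test matrix is synthetic: it is a random 109 x 64 matrix constructed to have rank 57, chosen only to mirror the dimensions reported here, and it is not the actual BC matrix.

```python
import numpy as np

def truncated_svd_pinv(A, rel_tol=1e-10):
    """Pseudo-inverse that discards singular values below rel_tol * largest."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    rank = int(np.sum(s > rel_tol * s[0]))
    # A+ = V_r diag(1 / s_r) U_r^T, using only the retained singular triplets.
    A_pinv = (Vt[:rank].T / s[:rank]) @ U[:, :rank].T
    return A_pinv, rank, s

# Synthetic rank-deficient matrix (rank 57 by construction).
rng = np.random.default_rng(0)
A = rng.standard_normal((109, 57)) @ rng.standard_normal((57, 64))
b = rng.standard_normal(109)

A_pinv, rank, s = truncated_svd_pinv(A)
x = A_pinv @ b                                # minimum-norm least squares solution
print("retained singular values:", rank)
print("residual norm:", np.linalg.norm(A @ x - b))
```

The choice of `rel_tol` decides how many singular values are kept. A threshold that is too small lets near-zero singular values through, and their reciprocals produce very large entries in the pseudo-inverse, which is one plausible source of the behavior described above.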

###Conclusions

In conclusion, we explored the Bark domain/transform in the context of filter design and determined that incorporating the Bark transform into the frequency-based least squares (FLS) formulation makes the matrix system ill-behaved and rank deficient. Solving this problem will require three things: a better understanding of the effects of applying the Bark transform to this matrix system; a full understanding of the assumptions made in the FLS approach, along with better assumptions and constraints; and a successful method for solving rank-deficient least squares problems.