News about ISAD

ISMIR 2022

A paper from the ISAD 2 project has been accepted for the 23rd International Society for Music Information Retrieval Conference (ISMIR 2022) in Bengaluru, India.
  • The paper titled "Multi-pitch Estimation meets Microphone Mismatch: Applicability of Domain Adaptation" suggests that a U-net based multi-pitch estimation (MPE) model for automatic piano transcription might be robust to domain shift caused by a microphone mismatch, since various domain adaptation methods did not improve its performance significantly in our experiments.
More details will be provided after publication.

IEEE/ACM Transactions on Audio, Speech, and Language Processing 2021

A journal article from the ISAD project has been accepted for the IEEE/ACM Transactions on Audio, Speech, and Language Processing.
  • The article is titled "CTC-Based Learning of Chroma Features for Score–Audio Music Retrieval" and can be found here.
For more details please refer to our publications section.

Electronics: Special Issue Machine Learning Applied to Music/Audio Signal Processing 2021

Two journal articles from the ISAD project have been accepted for the MDPI Electronics Journal, Special Issue: Machine Learning Applied to Music/Audio Signal Processing.
  • The first article is titled "Jazz Bass Transcription Using a U-Net Architecture" and can be found here.
  • The second article is titled "Informing Piano Multi-Pitch Estimation with Inferred Local Polyphony Based on Convolutional Neural Networks" and can be found here.
For more details please refer to our publications section.

ISMIR 2020

Two papers from the ISAD project have been accepted for the 21st Conference of the International Society for Music Information Retrieval (ISMIR 2020) in Montréal, Canada.
  • The paper titled "Using Weakly Aligned Score–Audio Pairs to Train Deep Chroma Models for Cross-Modal Music Retrieval" shows how to use the Connectionist Temporal Classification (CTC) loss to train a deep learning model for computing an enhanced chroma representation, using weakly aligned score–audio pairs. We then apply this model to a cross-modal retrieval task, where we aim to find relevant audio recordings of Western classical music, given a short monophonic musical theme in symbolic notation as a query. We present systematic experiments that show improved state-of-the-art results for this theme-based retrieval task.
  • In our second paper, titled "Classifying Leitmotifs in Recordings of Operas by Richard Wagner", we approach the task of classifying leitmotifs in audio recordings. Such leitmotifs (short musical ideas referring to semantic entities such as characters, places, items, or feelings) are used by composers of Western opera for guiding the audience through the plot and illustrating the events on stage. Our findings demonstrate the possibilities and limitations of leitmotif classification in audio recordings and pave the way towards the fully automated detection of leitmotifs in music recordings.
For more details please refer to our publications section.

Applied Sciences 2020

An article from ISAD has been accepted for publication in the Special Issue on Digital Audio Effects of the Applied Sciences journal.
  • Cross-version music retrieval aims at identifying all versions of a given piece of music using a short query audio fragment. One previous approach, which is particularly suited for Western classical music, is based on a nearest neighbor search using short sequences of chroma features, also referred to as audio shingles. From the viewpoint of efficiency, indexing and dimensionality reduction are important aspects. In this paper, we extend previous work by adapting two embedding techniques; one is based on classical principal component analysis, and the other is based on neural networks with triplet loss. Furthermore, we report on systematically conducted experiments with Western classical music recordings and discuss the trade-off between retrieval quality and embedding dimensionality. As one main result, we show that, using neural networks, one can reduce the audio shingles from 240 to fewer than 8 dimensions with only a moderate loss in retrieval accuracy. In addition, we present extended experiments with databases of different sizes and different query lengths to test the scalability and generalizability of the dimensionality reduction methods. We also provide a more detailed view into the retrieval problem by analyzing the distances that appear in the nearest neighbor search.
For more details please refer to our publications section.
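As a rough illustration of the shingle-based retrieval idea described above, the following minimal sketch builds audio shingles from a toy chroma sequence and runs a brute-force nearest neighbor search. It is not the implementation from the article: the function names, the toy data, and the shingle length of 20 frames (chosen so that 20 × 12 chroma bins gives the 240 dimensions mentioned above) are assumptions made for this example, and the embedding step (PCA or a neural network) is omitted.

```python
import math


def make_shingles(chroma, length=20):
    """Flatten every run of `length` consecutive 12-bin chroma frames into one shingle.

    `length=20` is an assumption for this sketch (20 frames x 12 bins = 240 dims).
    """
    shingles = []
    for start in range(len(chroma) - length + 1):
        window = chroma[start:start + length]
        shingles.append([value for frame in window for value in frame])
    return shingles


def euclidean(a, b):
    """Euclidean distance between two equally long vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_neighbors(query, database, k=1):
    """Return the indices of the k database shingles closest to the query."""
    ranked = sorted(range(len(database)),
                    key=lambda i: euclidean(query, database[i]))
    return ranked[:k]


def toy_chroma(num_frames, offset=0):
    """Hypothetical toy data: one active pitch class per frame, rising by one bin per frame."""
    return [[1.0 if b == (t + offset) % 12 else 0.0 for b in range(12)]
            for t in range(num_frames)]


# Database: all shingles of a 30-frame toy "recording".
database = make_shingles(toy_chroma(30))

# Query: one shingle that starts five frames into the same pitch pattern.
query = make_shingles(toy_chroma(20, offset=5))[0]

best = nearest_neighbors(query, database, k=1)[0]
print(len(database), len(database[0]), best)
```

In a setup like the one described above, each 240-dimensional shingle would first be mapped to a lower-dimensional embedding, and the same nearest neighbor search would then run on the embedded vectors.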

IEEE/ACM TASLP 2019

An article from ISAD has been accepted for publication in the IEEE/ACM Transactions on Audio, Speech, and Language Processing (TASLP).
  • Ever wondered what deep neural networks learn when they are trained to isolate the singing voice? Our work entitled "Examining the Mapping Functions of Denoising Autoencoders in Singing Voice Separation" proposes a method to analyze deep models, with a signal processing twist. That twist brings in the notion of a well-established signal processing operation, namely filtering, which allows an intuitive examination of such deep models.
For more details please refer to our publications section.

ISMIR 2019

Two papers from the ISAD project have been accepted for the 20th Conference of the International Society for Music Information Retrieval (ISMIR 2019) in Delft, The Netherlands.
  • The paper titled "Investigating CNN-Based Instrument Family Recognition for Western Classical Music Recordings" analyzes the influence of data normalization, patch pre-processing and augmentation techniques on the generalization capabilities of CNN models. Experiments are conducted on three datasets covering different levels of timbral complexity (isolated notes, isolated melodies, polyphonic pieces), with a final cross-dataset experiment revealing model performance on unseen data. The results indicate that current CNN models need further optimization methods, i.e. domain adaptation, to increase generalization capability.
  • In our second paper, we introduce a novel collection of educational material for teaching and learning fundamentals of music processing (FMP) with a particular focus on the audio domain. This collection is referred to as FMP notebooks, which include open-source Python code, Jupyter notebooks, detailed explanations, as well as numerous audio and music examples for teaching and learning MIR and audio signal processing. The notebooks are accessible at https://www.audiolabs-erlangen.de/FMP.
For more details please refer to our publications section.

ICASSP 2019

We are glad to announce that three papers from the ISAD project have been accepted for the 44th International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2019) in Brighton (UK).
  • The first paper compares hand-crafted feature sets for fundamental frequency contour classification with automatically learnt feature representations based on convolutional neural networks. The evaluation scenarios include different tasks such as classifying music instruments, playing techniques, as well as music genres. The results show a comparable performance of both feature sets. Interestingly, the automatically learnt features show a higher degree of redundancy since multiple convolution kernels (filters) specialize on similar contour shapes.
  • Our second paper investigates transitions between subsequent chords within a piece, extracted from audio recordings. We propose novel mid-level features that capture chord transitions in a “soft” way. Our method exploits the Baum–Welch algorithm, which does not involve hard decisions on chord labels. In several experiments, we evaluate these features within a style classification scenario discriminating four historical periods of Western classical music.
  • In our third paper, we consider the cross-modal retrieval scenario of finding short monophonic musical themes in symbolic format within audio recordings of Western classical music. We propose to perform the cross-modal comparison on the basis of melody-enhanced salience representations. As the main contribution, we evaluate several conceptually different salience representations for our cross-modal retrieval scenario.
For more details please refer to our publications section.

ISMIR 2018

We are glad to announce that our three papers have been accepted for presentation at the 19th Conference of the International Society for Music Information Retrieval, which takes place in Paris.
  • The first paper reconsiders a method for analyzing the tonal complexity of music recordings within a whole corpus. We transfer such a study, which has shown interesting results for Western classical music, to a jazz music scenario using the Weimar Jazz Database (WJD). Based on the audio recordings as well as the high-quality transcriptions of the WJD, we investigate the influence of the input representation type on the corpus-level observations.
  • Our second paper investigates two approaches to improve a bass pitch detection algorithm based on deep neural networks. First, we show that by adding isolated recordings of the targeted instrument (upright bass) to the training set, we can improve the accuracy. Second, we increase the amount of training data by adding unannotated audio data, for which we obtain bass pitch annotations via label propagation.
  • In our third paper, we focus on predominant melody instrument recognition in ensemble recordings of popular and jazz music. First, we investigate how source separation techniques for separating harmonic/percussive signal components or for extracting the main melody instrument from a mixture can be used as a pre-processing step. Second, we evaluate transfer/continuous learning to obtain specialized models for tasks such as jazz solo instrument recognition from models that were trained on a larger set of instruments before.
For more details please refer to our publications section.