In music information retrieval (MIR), the development of computational methods for analyzing, segmenting, and classifying music signals is of fundamental importance. In the project's first phase (initial proposal), we explored core techniques for detecting characteristic sound events in a given music recording. Our focus was on informed approaches that exploit musical knowledge in the form of score information, instrument samples, or musically salient sections. We considered concrete tasks such as locating audio sections with a specific timbre or instrument, identifying monophonic themes in complex polyphonic music recordings, and classifying music genres or playing styles based on melodic contours. We tested our approaches in complex music scenarios, including instrumental Western classical music, jazz, and opera recordings.
In the project's second phase (renewal proposal), we will significantly extend our goals. First, we want to go beyond the music scenario by considering environmental sounds as a second challenging audio domain. As a central methodology, we plan to explore and combine the benefits of model-based and data-driven techniques to learn task-specific sound event representations. Furthermore, we will investigate hierarchical approaches that simultaneously capture and exploit sound events manifesting on different temporal scales and belonging to hierarchically ordered categories. An overarching goal of the project's second phase is to develop explainable deep learning models that provide a better understanding of the structural and acoustic properties of sound events.