A RESEARCHER IN AUDIO

Zafar Rafii

PhD in Electrical Engineering & Computer Science
Berkeley, CA, USA
zafarrafii@gmail.com

ABSTRACT

In this website, we present a researcher in audio. The proposed researcher has a PhD in electrical engineering and computer science from Northwestern University, with a focus on audio signal analysis. He has over 30 publications, including conference papers, journal articles, and patents, with more than 1200 citations overall. He is actively involved in the research community, as a reviewer for numerous conferences and journals, a member of the IEEE audio and acoustic signal processing technical committee, and an organizer of networking meetups in the San Francisco Bay Area. He is currently a senior research engineer at Gracenote, where he is working on a number of projects, including audio content recognition, audio encoding analysis, and audio beamforming.

Index Terms— audio, research, signal processing, source separation, content recognition

1. INTRODUCTION

Fig.1. Overview of the proposed researcher.

The proposed researcher is named Zafar Rafii. He received a PhD in electrical engineering and computer science from Northwestern University in 2014. He was with the Interactive Audio Lab, under the supervision of Professor Bryan Pardo. Before that, he was a research engineer at Audionamix in France. He is now a senior research engineer in the audio group of the Media Technology Lab at Gracenote.

The proposed researcher has interest and expertise in audio signal analysis, somewhere between signal processing, machine learning, and cognitive science. He has worked on a number of projects, including adaptive audio effects, blind source separation, audio fingerprinting, lossy audio compression identification, and efficient transform computation; a selection of these projects is presented below.

For more information on the proposed researcher, the reader is referred to the following materials:

For other relevant information related to the proposed researcher, such as the meetups he organizes, the mentoring program he is involved in, or the audio dataset he created, the reader is referred to the following links:

The rest of the website is organized as follows. In Section 2, we present a selection of projects the proposed researcher has worked on. In Section 3, we introduce his PhD thesis work on the REpeating Pattern Extraction Technique (REPET) for blind source separation. In Section 4, we share links to his GitHub repositories where some of his source code resides. In Section 5, we provide references to all of his publications, presentations, and other materials.

2. RESEARCH

2.1. Adaptive Reverberation Effects (2008)

Fig.2. A user rating a sound modified by a series of reverberation settings as to how well it fits the audio concept of "boomy" she/he has in mind.

People often think about sound in terms of subjective concepts which do not necessarily have known mappings onto the controls of existing audio tools. For example, a bass player may wish to use a reverberation effect to make her/his bass sound more "boomy", but unfortunately there is no "boomy" knob to be found. We developed a system that can quickly learn an audio concept from a user (e.g., a "boomy" effect) and generate a simple controller that can manipulate sounds in terms of that audio concept (e.g., make a sound more "boomy"), bypassing the bottleneck of technical knowledge of complex interfaces and individual differences in subjective terms.

For this study, we focused on reverberation effects. We developed a digital reverberator, mapping the parameters of the digital filters to measures of the reverberation effect, so that the reverberator can be controlled through meaningful descriptors such as "reverberation time" or "spectral centroid." In the learning process, a sound is first modified by a series of reverberation settings using the reverberator. The user then listens and rates each modified sound as to how well it fits the audio concept she/he has in mind. The ratings are finally mapped onto the controls of the reverberator and a simple controller is built with which the user is able to manipulate the degree of her/his audio concept on a sound. Several experiments conducted on human subjects showed that the system learns quickly (under 3 minutes), predicts user responses well (mean correlation of 0.75), and meets users' expectations (average human rating of 7.4 out of 10).
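As an illustration of the mapping step, here is a minimal Python sketch, not the exact weighting-function method of the actual system: the perceptual measures of each probe setting (e.g., reverberation time and spectral centroid, with made-up values here) are regressed against the user's ratings, and the resulting weights give a direction along which a one-knob controller can push a sound toward "more boomy".

import numpy as np

# Hypothetical probe settings: each row holds the perceptual measures of the
# reverberator for one probe (e.g., [reverberation time in s, spectral centroid in kHz]).
probe_measures = np.array([
    [0.2, 4.0],
    [0.5, 3.0],
    [1.0, 2.0],
    [1.5, 1.5],
    [2.0, 1.0],
])

# Hypothetical user ratings (how "boomy" each probe sounded, on a 0-10 scale).
ratings = np.array([1.0, 3.0, 6.0, 8.0, 9.0])

# Fit a linear model rating ~ w . measures + b by least squares.
X = np.column_stack([probe_measures, np.ones(len(ratings))])
w = np.linalg.lstsq(X, ratings, rcond=None)[0]
direction = w[:2] / np.linalg.norm(w[:2])

def more_boomy(current_measures, amount):
    """Move the reverberation measures along the learned 'boomy' direction."""
    return np.asarray(current_measures) + amount * direction

print(more_boomy([0.8, 2.5], amount=0.3))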

For more information about this project, the reader is referred to [32], [14], and [35].

2.2. DUET using the CQT (2011)

Fig.3. Blind separation of a stereo recording of Homer, Bart, and Lisa using DUET.

The Degenerate Unmixing Estimation Technique (DUET) is a blind source separation method that can separate an arbitrary number of unknown sources using a single stereo mixture. DUET builds a two-dimensional histogram from the amplitude ratio and phase difference between channels, where each peak indicates a source, with the peak location corresponding to the mixing parameters associated with that source. Provided that the time-frequency bins of the sources do not overlap too much (an assumption generally valid for speech mixtures), DUET partitions the time-frequency representation of the mixture by assigning each bin to the source with the closest mixing parameters. However, when the time-frequency bins of the sources overlap more, as is generally the case for music mixtures when using the common short-time Fourier transform (STFT), peaks start to fuse in the 2D histogram and DUET can no longer separate the sources effectively.

We proposed to improve peak/source separation in DUET by building the 2D histogram from an alternative time-frequency representation based on the constant-Q transform (CQT). Unlike the Fourier transform, the CQT has a logarithmic frequency resolution, mirroring the human auditory system and matching the geometrically spaced frequencies of the Western music scale, which makes it better suited to music mixtures. We also proposed other enhancements to DUET, such as adaptive boundaries for the 2D histogram to improve peak resolving when sources are spatially too close to each other, and Wiener filtering to improve source reconstruction. Experiments on mixtures of piano notes and harmonic sources showed that peak/source separation is overall improved, especially at low octaves (under 200 Hz) and for small mixing angles (under π/6 rad).
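The construction of the two-dimensional histogram can be sketched as follows in Python (a rough illustration, not the actual implementation): an STFT is used here for brevity, whereas the proposed variant would build the same histogram from a CQT; the analysis parameters and histogram ranges are arbitrary choices.

import numpy as np
from scipy.signal import stft

def duet_histogram(x_left, x_right, sample_rate, n_bins=50):
    """Build the DUET 2D histogram of inter-channel amplitude ratios and
    phase differences (an STFT is used here for brevity; the proposed
    variant would use a CQT instead)."""
    f, t, L = stft(x_left, sample_rate, nperseg=1024)
    _, _, R = stft(x_right, sample_rate, nperseg=1024)
    eps = np.finfo(float).eps
    # Symmetric attenuation and frequency-normalized delay estimates per bin.
    a = (np.abs(R) + eps) / (np.abs(L) + eps)
    alpha = a - 1 / a
    delta = -np.angle((R + eps) / (L + eps)) / (2 * np.pi * np.maximum(f, f[1])[:, None])
    # Weight each bin by the product of the channel magnitudes.
    hist, a_edges, d_edges = np.histogram2d(alpha.ravel(), delta.ravel(),
                                            bins=n_bins,
                                            range=[[-3, 3], [-3e-3, 3e-3]],
                                            weights=(np.abs(L) * np.abs(R)).ravel())
    return hist, a_edges, d_edges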

Unlike the classic DUET based on the Fourier transform, DUET combined with the CQT can resolve adjacent pitches in low octaves as well as in high octaves thanks to the log frequency resolution of the CQT:

Mixture of 3 piano notes
Estimated A2
Estimated Bb2
Estimated B2
Original A2
Original Bb2
Original B2

DUET combined with the CQT and adaptive boundaries helps to improve separation when sources have low pitches (for example, here, between the two cellos) and/or are spatially too close to each other:

Mixture of 4 instruments
Estimated cello 1
Estimated cello 2
Estimated flute
Estimated strings
Original cello 1
Original cello 2
Original flute
Original strings

For more information about this project, the reader is referred to [31].

2.3. Live Music Fingerprinting (2014)

Fig.4. Overview of the fingerprinting stage. The audio signal is first transformed into a log-frequency spectrogram by using the CQT. The CQT-based spectrogram is then transformed into a binary image by using an adaptive thresholding method.

Suppose that you are at a music festival checking out an artist, and you would like to quickly know more about the song that is being played (e.g., title, lyrics, album, etc.). If you have a smartphone, you could record a sample of the live performance and compare it against a database of existing recordings from the artist. Services such as Shazam or SoundHound will not work here, as this is not the typical setting for audio fingerprinting or query-by-humming systems: a live performance is neither identical to its studio version (e.g., variations in instrumentation, key, tempo, etc.) nor is it a hummed or sung melody. We propose an audio fingerprinting system that can deal with live version identification by using image processing techniques. Compact fingerprints are derived using a log-frequency spectrogram and an adaptive thresholding method, and template matching is performed using the Hamming similarity and the Hough transform.
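The fingerprinting stage can be sketched in a few lines of Python (a crude illustration only: the band pooling below is a stand-in for the actual CQT, comparing each bin to the median of its neighborhood is one possible form of adaptive thresholding, and all parameter values are made up).

import numpy as np
from scipy.signal import stft
from scipy.ndimage import median_filter

def binary_fingerprint(x, sample_rate, n_bands=96, neighborhood=(25, 25)):
    """Sketch of the fingerprinting stage: a log-frequency spectrogram is
    binarized by comparing each bin to the median of its neighborhood."""
    f, t, X = stft(x, sample_rate, nperseg=2048, noverlap=1024)
    S = np.abs(X)
    # Pool the linear frequency bins into logarithmically spaced bands
    # (a crude stand-in for the constant-Q transform).
    edges = np.geomspace(32, sample_rate / 2, n_bands + 1)
    bands = np.array([S[(f >= lo) & (f < hi)].sum(axis=0)
                      for lo, hi in zip(edges[:-1], edges[1:])])
    log_spec = np.log1p(bands)
    # Adaptive thresholding: keep the bins above their local median.
    return log_spec > median_filter(log_spec, size=neighborhood)

def hamming_similarity(fp1, fp2):
    """Fraction of matching bits between two equally sized fingerprints."""
    return np.mean(fp1 == fp2)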

For more information about this project, the reader is referred to [24].





2.4. Lossy Audio Compression Identification (2018)

Fig.5. Results for an audio example encoded with AC-3. The system identified traces of compression corresponding to AC-3, but not to other lossy coding formats such as MP3, AAC, Vorbis, or WMA.

We propose a system which, given an audio recording that has previously undergone lossy compression, can estimate the parameters used for the encoding and therefore identify the corresponding lossy coding format. The system analyzes the audio signal and searches for the compression parameters and framing conditions which match those used for the encoding. In particular, we propose a new metric for measuring traces of compression which is robust to variations in the audio content, and a new method for combining the estimates from multiple audio blocks which refines the results. We evaluated the system with audio excerpts from songs and movies, compressed into various coding formats, using different bit rates, and captured digitally as well as through analog transfer. Results showed that our system can identify the correct format in almost all cases, even at high bit rates and with distorted audio, with an overall accuracy of 0.96.
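To illustrate the general idea of searching for the encoder's framing, here is a rough, unoptimized Python sketch, not the actual system or its metric: for a candidate block size and window, an MDCT is computed at every possible frame offset, and the offset where the fraction of near-zero coefficients peaks suggests the framing, and hence a compatible coding format. The block size, window, and threshold below are illustrative assumptions.

import numpy as np

def mdct(frame, window):
    """MDCT of one frame of length 2*N with the given analysis window."""
    N = len(frame) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (window * frame)

def framing_scores(x, block_size=1152, threshold=1e-4):
    """For each candidate frame offset, compute the fraction of near-zero MDCT
    coefficients; the score tends to peak at the framing used by the encoder
    (illustration only; assumes a signal normalized to [-1, 1])."""
    window = np.sin(np.pi / (2 * block_size) * (np.arange(2 * block_size) + 0.5))
    scores = []
    for offset in range(block_size):
        frames = [x[i:i + 2 * block_size]
                  for i in range(offset, len(x) - 2 * block_size, block_size)]
        coefficients = np.array([mdct(frame, window) for frame in frames])
        scores.append(np.mean(np.abs(coefficients) < threshold))
    return np.array(scores)  # peaks at the encoder's frame offset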

For more information about this project, the reader is referred to [15].

2.5. Sliding DFT with Kernel Windowing (2018)

Fig.6. Kernels derived from the (a) Hanning, (b) Blackman, (c) triangular, (d) Parzen, (e) Gaussian (with α = 2.5), and (f) Kaiser (with β = 0.5) windows. The kernels were derived for an N-point DFT where N = 2,048 samples. Only the first 100 coefficients at the bottom-left corner of the N-by-N kernels are shown. The values are displayed in log of amplitude.

The sliding discrete Fourier transform (SDFT) is an efficient method for computing the N-point DFT of a given signal starting at a given sample from the N-point DFT of the same signal starting at the previous sample. However, the SDFT does not allow the use of a window function, generally incorporated in the computation of the DFT to reduce spectral leakage, as it would break its sliding property. We show how windowing can be included in the SDFT by using a kernel derived from the window function, while keeping the process computationally efficient. In addition, this approach allows for turning other transforms, such as the modified discrete cosine transform (MDCT), into efficient sliding versions of themselves.
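As an illustration, here is a minimal Python sketch of the idea for the particular case of the Hann window (the paper derives kernels for arbitrary windows; this special case only needs the three nonzero bins of the Hann window's DFT): the unwindowed N-point DFT is updated one sample at a time, and the windowing is applied in the frequency domain as a three-point circular convolution of the sliding spectrum.

import numpy as np

def sliding_dft_hann(x, N=2048):
    """Sliding DFT with Hann windowing applied in the frequency domain.
    The unwindowed N-point DFT is updated sample by sample, and the Hann
    window (0.5 - 0.5*cos) is applied as a 3-point circular convolution
    of the spectrum, since its DFT has only three nonzero bins."""
    k = np.arange(N)
    twiddle = np.exp(2j * np.pi * k / N)
    X = np.fft.fft(x[:N])               # DFT of the first frame
    spectra = [0.5 * X - 0.25 * (np.roll(X, 1) + np.roll(X, -1))]
    for n in range(N, len(x)):
        # Sliding update: drop the oldest sample, add the newest, rotate.
        X = (X - x[n - N] + x[n]) * twiddle
        # Frequency-domain Hann windowing (kernel [-0.25, 0.5, -0.25]).
        spectra.append(0.5 * X - 0.25 * (np.roll(X, 1) + np.roll(X, -1)))
    return np.array(spectra)

# Sanity check against a direct windowed FFT (up to numerical error):
x = np.random.randn(4096)
N = 2048
w = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann window
direct = np.fft.fft(w * x[1000:1000 + N])
sliding = sliding_dft_hann(x, N)[1000]
print(np.allclose(direct, sliding, atol=1e-6))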

For more information about this project, the reader is referred to [9].

3. REPET

Fig.7. Overview of REPET.

Repetition is a fundamental element in generating and perceiving structure. In audio, mixtures are often composed of structures where a repeating background signal is superimposed with a varying foreground signal. On this basis, we present the REpeating Pattern Extraction Technique (REPET), a simple approach for separating the repeating background from the non-repeating foreground in an audio mixture. The basic idea is to find the repeating elements in the mixture, derive the underlying repeating models, and extract the repeating background by comparing the models to the mixture. Unlike other separation approaches, REPET does not depend on special parameterizations, does not rely on complex frameworks, and does not require external information. Because it is only based on repetition, it has the advantage of being simple, fast, blind, and therefore completely and easily automatable.

3.1. Original REPET (2011)

Fig.8. Overview of the original REPET. Stage 1: calculation of the beat spectrum b and estimation of a repeating period p. Stage 2: segmentation of the mixture spectrogram V and calculation of the repeating segment S. Stage 3: calculation of the repeating spectrogram W and derivation of the time-frequency mask M.

The original REPET aims at identifying and extracting the repeating patterns in an audio mixture, by estimating a period of the underlying repeating structure and modeling a segment of the periodically repeating background.

Experiments on a data set of song clips showed that REPET can be effectively applied for music/voice separation. Experiments also showed that REPET can be combined with other methods to improve background/foreground separation; for example, it can be used as a preprocessor to pitch detection algorithms to improve melody extraction, or as a postprocessor to a singing voice separation algorithm to improve music/voice separation.

REPET can be easily extended to handle varying repeating structures, by simply applying the method along time, on individual segments or via a sliding window. Experiments on a data set of full-track real-world songs showed that this method can be effectively applied for music/voice separation.
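A minimal Python sketch of the three stages of Fig.8 could look as follows (an illustration only, with simplified period estimation and arbitrary analysis parameters, not the actual implementation).

import numpy as np
from scipy.signal import stft, istft

def repet(x, sample_rate, period_seconds=None):
    """Sketch of the original REPET: estimate a repeating period from the beat
    spectrum, take the element-wise median of the spectrogram segments to model
    the repeating background, and derive a soft time-frequency mask."""
    f, t, X = stft(x, sample_rate, nperseg=2048, noverlap=1536)
    V = np.abs(X)

    # Stage 1: beat spectrum (mean autocorrelation of the rows of V**2)
    # and repeating period (highest peak over a plausible range).
    P = V ** 2
    acf = np.fft.irfft(np.abs(np.fft.rfft(P, 2 * P.shape[1], axis=1)) ** 2, axis=1)
    beat_spectrum = acf[:, :P.shape[1]].mean(axis=0)
    hop = t[1] - t[0]
    if period_seconds is None:
        lo = int(round(0.8 / hop))                      # search between 0.8 s
        hi = min(int(round(8.0 / hop)), P.shape[1] // 2)  # and 8 s
        period = lo + int(np.argmax(beat_spectrum[lo:hi]))
    else:
        period = int(round(period_seconds / hop))

    # Stage 2: repeating segment = element-wise median over the segments.
    n_seg = V.shape[1] // period
    segments = V[:, :n_seg * period].reshape(V.shape[0], n_seg, period)
    S = np.median(segments, axis=1)

    # Stage 3: repeating spectrogram and soft time-frequency mask.
    W = np.minimum(np.tile(S, (1, n_seg + 1))[:, :V.shape[1]], V)
    M = W / (V + np.finfo(float).eps)
    background = istft(M * X, sample_rate, nperseg=2048, noverlap=1536)[1][:len(x)]
    return background, x[:len(background)] - background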

For more information about this project, the reader is referred to [30], [13], and [34].

Mixture
Estimated background
Estimated foreground
Original accompaniment
Original vocals
Fig.9. Music/voice separation using REPET. The mixture is a female singer (foreground) singing over a guitar accompaniment (background). The guitar has a repeating chord progression that is stable along the song. The spectrograms and the mask are shown for 5 seconds and up to 2.5 kHz.

3.2. Adaptive REPET (2012)

Fig.10. Overview of the adaptive REPET. Stage 1: calculation of the beat spectrogram B and estimation of the repeating periods pj’s. Stage 2: filtering of the mixture spectrogram V and calculation of an initial repeating spectrogram U. Stage 3: calculation of the refined repeating spectrogram W and derivation of the time-frequency mask M.

The original REPET works well when the repeating background is relatively stable (e.g., a verse or the chorus in a song); however, the repeating background can also vary over time (e.g., a verse followed by the chorus in the song). The adaptive REPET is an extension of the original REPET that can handle varying repeating structures, by estimating the time-varying repeating periods and extracting the repeating background locally, without the need for segmentation or windowing.
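The local modeling step can be sketched as follows in Python (an illustration under simplifying assumptions: the time-varying periods, expressed in frames, are assumed to have already been estimated, e.g., from a beat spectrogram, and the number of repetitions used for the median is an arbitrary choice).

import numpy as np

def adaptive_repeating_model(V, periods, order=5):
    """Sketch of the local modeling step of the adaptive REPET: for each frame
    j with local repeating period periods[j], the repeating background is the
    element-wise median over the frames at multiples of that period around j."""
    n_freq, n_time = V.shape
    W = np.empty_like(V)
    for j in range(n_time):
        p = periods[j]
        offsets = np.arange(-(order // 2), order // 2 + 1) * p
        indices = np.clip(j + offsets, 0, n_time - 1)
        W[:, j] = np.median(V[:, indices], axis=1)
    return np.minimum(W, V)  # repeating spectrogram, used to derive the mask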

Experiments on a data set of full-track real-world songs showed that the adaptive REPET can be effectively applied for music/voice separation.

For more information about this project, the reader is referred to [28] and [34].

Mixture
Estimated background
Estimated foreground
Original accompaniment
Original vocals
Fig.11. Music/voice separation using the adaptive REPET. The mixture is a male singer (foreground) singing over a guitar and drums accompaniment (background). The guitar has a repeating chord progression that changes around 15 seconds. The spectrograms and the mask are shown for 5 seconds and up to 2.5 kHz.

3.3. REPET-SIM (2012)

Fig.12. Overview of REPET-SIM. Stage 1: calculation of the similarity matrix S and estimation of the repeating indices jk’s. Stage 2: filtering of the mixture spectrogram V and calculation of an initial repeating spectrogram U. Stage 3: calculation of the refined repeating spectrogram W and derivation of the time-frequency mask M.

The REPET methods work well when the repeating background has periodically repeating patterns (e.g., jackhammer noise); however, the repeating patterns can also happen intermittently or without a global or local periodicity (e.g., frogs by a pond). REPET-SIM is a generalization of REPET that can also handle non-periodically repeating structures, by using a similarity matrix to identify the repeating elements.

Experiments on a data set of full-track real-world songs showed that REPET-SIM can be effectively applied for music/voice separation.

REPET-SIM can be easily implemented online to handle real-time computing, particularly for real-time speech enhancement. The online REPET-SIM simply processes the time frames of the mixture one after the other given a buffer that temporarily stores past frames. Experiments on a data set of two-channel mixtures of one speech source and real-world background noise showed that the online REPET-SIM can be effectively applied for real-time speech enhancement.
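A minimal Python sketch of the offline version could look as follows (an illustration only: the actual method also enforces a similarity threshold and a minimum distance between the selected frames, and the online version would restrict the search to a buffer of past frames).

import numpy as np

def repet_sim_mask(V, n_similar=20):
    """Sketch of REPET-SIM: build a cosine similarity matrix between the frames
    of the magnitude spectrogram V, pick for each frame its most similar frames,
    and take their element-wise median as the repeating background model."""
    eps = np.finfo(float).eps
    normalized = V / (np.linalg.norm(V, axis=0, keepdims=True) + eps)
    similarity = normalized.T @ normalized          # frames-by-frames matrix
    W = np.empty_like(V)
    for j in range(V.shape[1]):
        similar = np.argsort(similarity[j])[::-1][:n_similar]
        W[:, j] = np.median(V[:, similar], axis=1)
    W = np.minimum(W, V)
    return W / (V + eps)                            # soft time-frequency mask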

For more information about this project, the reader is referred to [27], [26], and [34].

Mixture
Estimated foreground
Estimated background
Original speech
Original noise
Fig.13. Noise/speech separation using REPET-SIM. The mixture is a female speaker (foreground) speaking in a town square (background). The square has repeating noisy elements (passers-by and cars) that happen intermittently. The spectrograms and the mask are shown for 5 seconds and up to 2 kHz.





3.4. uREPET (2015)

Repetition is a fundamental element in generating and perceiving structure in audio. Especially in music, structures tend to be composed of patterns that repeat through time (e.g., rhythmic elements in a musical accompaniment) and also through frequency (e.g., different notes of the same instrument). The auditory system has the remarkable ability to parse such patterns by identifying repetitions within the audio mixture. On this basis, we propose a simple user interface system for recovering patterns repeating in time and frequency in mixtures of sounds. A user selects a region in the log-frequency spectrogram of an audio recording from which she/he wishes to recover a repeating pattern covered by an undesired element (e.g., a note covered by a cough). The selected region is then cross-correlated with the spectrogram to identify similar regions where the underlying pattern repeats. The identified regions are finally averaged over their repetitions and the repeating pattern is recovered.
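The core of the system can be sketched as follows in Python (an illustration only, with a hypothetical region given by its frequency and time indices; the actual system constrains the search and filters the selected region rather than simply averaging).

import numpy as np
from scipy.signal import correlate2d

def recover_pattern(log_spec, f0, f1, t0, t1, n_repetitions=5):
    """Sketch of the core of uREPET: cross-correlate the user-selected region
    of the log-frequency spectrogram with the full spectrogram, pick the most
    similar regions, and average them to recover the repeating pattern."""
    # Standardize the selected patch and correlate it with the full spectrogram.
    patch = log_spec[f0:f1, t0:t1]
    patch = (patch - patch.mean()) / (patch.std() + np.finfo(float).eps)
    scores = correlate2d(log_spec, patch, mode='valid')
    # Ignore the trivial match with the selected region itself.
    scores[f0, t0] = -np.inf
    h, w = patch.shape
    flat = np.argsort(scores.ravel())[::-1][:n_repetitions]
    matches = [np.unravel_index(i, scores.shape) for i in flat]
    regions = [log_spec[i:i + h, j:j + w] for i, j in matches]
    # Average the identified repetitions to recover the underlying pattern.
    return np.mean(regions, axis=0)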

For more information about this project, the reader is referred to [20].

Melody covered by a cough
Recovered melody
Original melody
Original cough
Fig.14. Log-spectrogram of a melody with a cough covering the first note. The user selected the region of the cough (solid line) and the system identified similar regions where the underlying note repeats (dashed lines).


Fig.15. Log-spectrogram of the melody with the first note recovered. The system averaged the identified regions over their repetitions and filtered out the cough from the selected region.
Accompaniment covered by vocals
Recovered accompaniment
Original accompaniment
Original vocals
Fig.16. Log-spectrogram of a song with vocals covering an accompaniment. The user selected the region of the first measure (solid line) and the system identified similar regions where the underlying accompaniment repeats (dashed lines).
Fig.17. Log-spectrogram of the song with the first measure of the accompaniment recovered. The system averaged the identified regions over their repetitions and filtered out the vocals from the selected region.
Speech covering a noise
Recovered speech
Original speech
Original noise
Fig.18. Log-spectrogram of a speech covering a noise. The user selected the region of the first sentence (solid line) and the system identified similar regions where the underlying noise repeats (dashed lines).
Fig.19. Log-spectrogram of the first sentence of the speech extracted. The system averaged the identified regions over their repetitions and extracted the speech from the selected region.

3.5. PROJET-MAG (2017)

We propose a simple user-assisted method for the recovery of repeating patterns in time and frequency which can occur in mixtures of sounds. Here, the user selects a region in a log-frequency spectrogram from which they seek to recover the underlying pattern which is obscured by another interfering source, such as a chord masked by a cough. A cross-correlation is then performed between the selected region and the spectrogram, revealing similar regions. The most similar region is selected and a variant of the PROJET algorithm, termed PROJET-MAG, is used to extract the time-frequency components common to the two regions, as well as the components which are not common to both. The results are compared to those of another user-assisted method based on REPET, and PROJET-MAG is demonstrated to give improved results over this baseline.

For more information about this project, the reader is referred to [17].

Melody covered by a cough
Recovered melody uREPET
Recovered melody PROJET-MAG
Original melody
Original cough
Accompaniment covered by vocals
Recovered acc uREPET
Recovered acc PROJET-MAG
Original accompaniment
Original vocals
Speech covering a noise
Recovered speech uREPET
Recovered speech PROJET-MAG
Original speech
Original noise

4. CODES

4.1. Z

This repository includes a Matlab class, a Python module, a Jupyter notebook, and a Julia module which implement/illustrate several methods/functions for audio signal processing.

Fig.20. FT-spectrogram, CQT-spectrogram, CQT-chromagram, and MFCCs using the Z class.

4.2. REPET

This repository includes a Matlab class and a Python module which implement a number of methods/functions for the different algorithms of the REpeating Pattern Extraction Technique (REPET).

A second repository includes Matlab GUIs to demo the original REPET and REPET-SIM.

A third repository contains a Matlab GUI for uREPET, a simple user interface system for recovering patterns repeating in time and frequency in mixtures of sounds.

Fig.21. uREPET GUI.

For more information, the reader is referred to [30], [28], [27], [26], [13], [34], and [20].

4.3. Others

This repository contains a Matlab GUI which implements Zafar's audio player (Zap), featuring some practical functionalities such as a synchronized spectrogram, a select/drag tool, and a playback line.

Fig.22. Zap GUI.

Another repository contains Jupyter notebooks with Python coding problems (and solutions). These can be good exercises for beginners and more experienced users to improve and review their programming skills in Python. The problems are borrowed from the Internet and the solutions are given in Jupyter notebooks with detailed comments to help understand them. The proposed solutions are not necessarily optimized, so feel free to contact the author if you find anything wrong with them.

5. REFERENCES

5.1. Patents

[1]Robert Coover and Zafar Rafii. "Methods and Apparatus to Fingerprint an Audio Signal via Normalization," 16453654, March 2020. [url]
[2]Zafar Rafii, Markus Cremer, and Bongjun Kim. "Methods and Apparatus to Perform Windowed Sliding Transforms," 15942369, April 2019. [url]
[3]Zafar Rafii, Markus Cremer, and Bongjun Kim. "Methods, Apparatus and Articles of Manufacture to Identify Sources of Network Streaming Services," 15793543, April 2019. [url]
[4]Zafar Rafii. "Methods and Apparatus to Extract a Pitch-independent Timbre Attribute from a Media Signal," 15920060, January 2019. [url]
[5]Markus Cremer, Zafar Rafii, Robert Coover, and Prem Seetharaman. "Automated Cover Song Identification," 15698557, July 2018. [url]
[6]Zafar Rafii and Prem Seetharaman. "Audio Identification based on Data Structure," 15698532, March 2018. [url]
[7]Zafar Rafii. "Audio Matching based on Harmonogram," 14980622, July 2016. [url]
[8]Bryan Pardo and Zafar Rafii. "Acoustic Separation System and Method," 13612413, March 2013. [url]

5.2. Journal Articles

[9]Zafar Rafii. "Sliding Discrete Fourier Transform with Kernel Windowing," IEEE Signal Processing Magazine, vol. 35, no. 6, November 2018. [article]
[10]Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, Derry FitzGerald, and Bryan Pardo. "An Overview of Lead and Accompaniment Separation in Music," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 26, no. 8, August 2018. [article]
[11]Zafar Rafii, Zhiyao Duan, and Bryan Pardo. "Combining Rhythm-based and Pitch-based Methods for Background and Melody Separation," IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 12, December 2014. [article]
[12]Antoine Liutkus, Derry FitzGerald, Zafar Rafii, Bryan Pardo, and Laurent Daudet. "Kernel Additive Models for Source Separation," IEEE Transactions on Signal Processing, vol. 62, no. 16, August 2014. [article]
[13]Zafar Rafii and Bryan Pardo. "REpeating Pattern Extraction Technique (REPET): A Simple Method for Music/Voice Separation," IEEE Transactions on Audio, Speech, and Language Processing, vol. 21, no. 1, January 2013. [article]
[14]Andrew Todd Sabin, Zafar Rafii, and Bryan Pardo. "Weighting-Function-Based Rapid Mapping of Descriptors to Audio Processing Parameters," Journal of the Audio Engineering Society, vol. 59, no. 6, June 2011. [article]

5.3. Conference Articles

[15]Bongjun Kim and Zafar Rafii. "Lossy Audio Compression Identification," 26th European Signal Processing Conference, Rome, Italy, September 3-7, 2018. [article][poster]
[16]Prem Seetharaman and Zafar Rafii. "Cover Song Identification with 2d Fourier Transform Sequences," 42nd IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, USA, March 5-9, 2017. [article][poster]
[17]Derry FitzGerald, Zafar Rafii, and Antoine Liutkus. "User Assisted Separation of Repeating Patterns in Time and Frequency using Magnitude Projections," 42nd IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, USA, March 5-9, 2017. [article][poster]
[18]Antoine Liutkus, Fabian-Robert Stöter, Zafar Rafii, Daichi Kitamura, Bertrand Rivet, Nobutaka Ito, Nobutaka Ono, and Julie Fontecave. "The 2016 Signal Separation Evaluation Campaign," 13th International Conference on Latent Variable Analysis and Signal Separation, Grenoble, France, February 21-23, 2017. [article]
[19]Nobutaka Ono, Zafar Rafii, Daichi Kitamura, Nobutaka Ito, and Antoine Liutkus. "The 2015 Signal Separation Evaluation Campaign," 12th International Conference on Latent Variable Analysis and Signal Separation, Liberec, Czech Republic, August 25-28, 2015. [article]
[20]Zafar Rafii, Antoine Liutkus, and Bryan Pardo. "A Simple User Interface System for Recovering Patterns Repeating in Time and Frequency in Mixtures of Sounds," 40th IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, April 19-24, 2015. [article][poster]
[21]Antoine Liutkus, Derry FitzGerald, and Zafar Rafii. "Scalable Audio Separation with Light Kernel Additive Modelling," 40th IEEE International Conference on Acoustics, Speech and Signal Processing, Brisbane, Australia, April 19-24, 2015. [article][slides]
[22]Derry FitzGerald, Antoine Liutkus, Zafar Rafii, Bryan Pardo, and Laurent Daudet. "Harmonic/Percussive Separation using Kernel Additive Modelling," 25th IET Irish Signals and Systems Conference, Limerick, Ireland, June 26-27, 2014. [article]
[23]Antoine Liutkus, Zafar Rafii, Bryan Pardo, Derry FitzGerald, and Laurent Daudet. "Kernel Spectrogram Models for Source Separation," 4th Joint Workshop on Hands-free Speech Communication and Microphone Arrays, Nancy, France, May 12-14, 2014. [article][slides]
[24]Zafar Rafii, Bob Coover, and Jinyu Han. "An Audio Fingerprinting System for Live Version Identification using Image Processing Techniques," 39th IEEE International Conference on Acoustics, Speech and Signal Processing, Florence, Italy, May 4-9, 2014. [article][poster]
[25]Zafar Rafii, Francois G. Germain, Dennis L. Sun, and Gautham J. Mysore. "Combining Modeling of Singing Voice and Background Music for Automatic Separation of Musical Mixtures," 14th International Society for Music Information Retrieval, Curitiba, PR, Brazil, November 4-8, 2013. [article][poster]
[26]Zafar Rafii and Bryan Pardo. "Online REPET-SIM for Real-time Speech Enhancement," 38th IEEE International Conference on Acoustics, Speech and Signal Processing, Vancouver, BC, Canada, May 26-31, 2013. [article][poster]
[27]Zafar Rafii and Bryan Pardo. "Music/Voice Separation using the Similarity Matrix," 13th International Society for Music Information Retrieval, Porto, Portugal, October 8-12, 2012. [article][slides]
[28]Antoine Liutkus, Zafar Rafii, Roland Badeau, Bryan Pardo, and Gaël Richard. "Adaptive Filtering for Music/Voice Separation Exploiting the Repeating Musical Structure," 37th IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, March 25-30, 2012. [article][slides]
[29]Mark Cartwright, Zafar Rafii, Jinyu Han, and Bryan Pardo. "Making Searchable Melodies: Human vs. Machine," 3rd Human Computation Workshop, San Francisco, CA, USA, August 8, 2011. [article][poster]
[30]Zafar Rafii and Bryan Pardo. "A Simple Music/Voice Separation Method based on the Extraction of the Repeating Musical Structure," 36th IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 22-27, 2011. [article][poster]
[31]Zafar Rafii and Bryan Pardo. "Degenerate Unmixing Estimation Technique using the Constant Q Transform," 36th IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, Czech Republic, May 22-27, 2011. [article][poster]
[32]Zafar Rafii and Bryan Pardo. "Learning to control a Reverberator using Subjective Perceptual Descriptors," 10th International Society for Music Information Retrieval, Kobe, Japan, October 26-30, 2009. [article][poster]

5.4. Book Chapters

[33]Bryan Pardo, Zafar Rafii, and Zhiyao Duan. "Audio Source Separation in a Musical Context," Handbook of Systematic Musicology, Springer, Berlin, Heidelberg, 2018. [article]
[34]Zafar Rafii, Antoine Liutkus, and Bryan Pardo. "REPET for Background/Foreground Separation in Audio," Blind Source Separation, Springer, Berlin, Heidelberg, 2014. [article]

5.5. Technical Reports

[35]Zafar Rafii and Bryan Pardo. "A Digital Reverberator controlled through Measures of the Reverberation," Northwestern University, EECS Department Technical Report, NWU-EECS-09-08, 2009. [article]

5.6. Tutorials

[36]Josh McDermott, Bryan Pardo, and Zafar Rafii. "Leveraging Repetition to Parse the Auditory Scene," 13th International Society for Music Information Retrieval, Porto, Portugal, October 8-12, 2012. [slides]

5.7. Talks

[37]Zafar Rafii. "Identifying Video Sources by Identifying Audio Compression," Télécom ParisTech, Paris, France, April 13, 2018. [slides]
[38]Zafar Rafii. "Source Separation by Repetition," Center for Computer Research in Music and Acoustics, Stanford University, Stanford, CA, USA, July 15, 2015.
[39]Zafar Rafii. "Source Separation by Repetition," Center for New Music and Audio Technologies, University of California at Berkeley, Berkeley, CA, USA, May 11, 2015.
[40]Zafar Rafii. "An Audio Fingerprinting System for Live Version Identification using Image Processing Techniques," Midwest Music Information Retrieval Gathering, Northwestern University, Evanston, IL, USA, June 14, 2014. [slides]
[41]Zafar Rafii. "A Simple Music/Voice Separation Method based on the Extraction of the Repeating Musical Structure," Télécom ParisTech, Paris, France, July 29, 2011.
[42]Zafar Rafii. "REPET," Midwest Music Information Retrieval Gathering, Northwestern University, Evanston, IL, USA, June 24, 2011. [slides]
[43]Zafar Rafii, Raphael Blouet, and Antoine Liutkus. "Discriminant within Non-negative Matrix Factorization for Musical Components Recognition," DMRN+2: Digital Music Research Network One-day Workshop 2007, Queen Mary, University of London, London, UK, December 18, 2007. [poster]

5.8. Lectures

[44]Zafar Rafii. "Audio Fingerprinting," EECS 352: Machine Perception of Music and Audio, Northwestern University, 2014. [slides]
[45]Zafar Rafii. "REpeating Pattern Extraction Technique (REPET)," EECS 352: Machine Perception of Music and Audio, Northwestern University, 2014. [slides]
[46]Zafar Rafii. "Rhythm Analysis in Music," EECS 352: Machine Perception of Music and Audio, Northwestern University, 2014. [slides]
[47]Zafar Rafii. "Time-frequency Masking," EECS 352: Machine Perception of Music and Audio, Northwestern University, 2014. [slides]

5.9. Data Sets

[48]Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. "MUSDB18-HQ – an uncompressed version of MUSDB18," 2019. [url]
[49]Zafar Rafii, Antoine Liutkus, Fabian-Robert Stöter, Stylianos Ioannis Mimilakis, and Rachel Bittner. "The MUSDB18 corpus for music separation," 2017. [url]