Laboratory for the Recognition and Organization of Speech and Audio - LabROSA
Room 7LE4, 7th floor, Schapiro CEPSR, Columbia University New York NY 10027
Lab phone: (212) 854-0235
Contact: Dan Ellis, Asst. Prof. of Elec. Eng.
firstname.lastname@example.org - Room 718 Schapiro CEPSR - (212) 854-8928
The Laboratory for the Recognition and Organization of Speech and
Audio (LabROSA) conducts research into automatic means of extracting
useful information from sound. Our vision is of an intelligent
'machine listener', able to interpret live or recorded sound of any
type in terms of the descriptions and abstractions that would make
sense to a human listener.
Our research areas include:
- speech, to extract the words, prosodics, speaker characteristics, etc.
- music, including transcription, classification, and similarity estimation
- environmental sound, from everyday acoustic ambiences to atypical environments such as underwater recordings
- sound mixtures, composed of any or all of the above, where the challenge is extracting whatever information is available when observations are partial or obscured.
Applications to be developed for automatic high-level sound analysis include:
- indexing, summarization and searching within large audio archives,
such as recorded broadcasts, film catalogs, personal recording devices
- intelligent interaction technologies that have an 'awareness' of their
acoustic environment, and can react appropriately
- automatic monitoring devices, e.g. for rapid response to emergencies
in public complexes
- intelligent handling of audio and music content, including content-based
retrieval, annotation, and recommendation.
Our current projects include:
- MEAP - the Music Engineering Art Projects (a collaboration with Columbia's Computer Music Center)
- Data-driven Music Understanding - an NSF-funded project on mining for structure in music audio
- The Listening Machine - an NSF-funded project on techniques for separating and recognizing sounds in mixtures
- Consumer video classification by soundtrack - using features from the soundtrack of environmental-type recordings for classification
- Separating Speech from Speech Noise - an NSF-funded project on signal separation aimed specifically at speech
- Music similarity - including a database of popular music along with a collection of subjective similarity measurements for the artists involved.
- Beat tracking and tempo estimation - a simple dynamic programming approach to finding the regularly-spaced events in music audio.
- Cover Song Identification - trying to identify music audio that is based on the same melodic-harmonic content, despite changes in tempo, instrumentation, etc.
- Artist Identification by Timbral and Chroma Features - an implementation of a baseline artist ID system, but also trying out chroma features to see if they help.
- Chord recognition - including our submission to the 2008 MIREX Chord Recognition evaluation.
- artist20 dataset and baseline system for Artist ID - a dataset of 20 artists (1413 tracks) plus a baseline artist identification system
- Music melody extraction - including example data for training and evaluating systems for extracting melody from ensemble music recordings.
- Polyphonic piano transcription - including example data for training and evaluating piano transcription systems.
- Personal Audio Life Logs - our project investigating the use of continuous audio recordings of daily life.
- Speech recognition in noisy, overlapped conditions
- Music analysis to recover structure, and to navigate archives
- Rhythm analysis and modeling - John Arroyo's website describing the project to analyze pop music drum patterns
- Marine Mammal Sounds - whales and dolphins in natural environments
- Pitch contour stylization - using drastically simplified pitch contours with almost no effect on perceived naturalness.
- MESSL - a system for separating and localizing multiple sound sources even in reverberation.
- snreval - implementations of a collection of signal distortion measures for characterizing corrupted speech.
- renoiser - tool to decompose a noisy speech file into filtered speech and noise components, and to recombine them at a new SNR.
- skewview - tool to visualize timing skew between different audio files via short-time cross-correlation.
- chimefind - tool to locate "chime" sounds in long soundfiles.
- findNTs - tool to label "non transmission" (NT) regions in recorded radio signals.
- otanalyze - a cut-down version of renoiser that can decompose "over the air" recordings into a filtered version of a clean signal and a noise residual.
- SAcC - Noise-robust pitch detection based on trained classification of subband autocorrelation principal components.
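The dynamic-programming idea behind the beat tracker listed above can be sketched as follows. This is an illustrative reimplementation of the general approach, not the lab's released code: each frame's score is its onset strength plus the best predecessor score, penalized for deviating from a target beat period, with the beat sequence recovered by backtracing. The onset envelope here is synthetic.

```python
import numpy as np

def dp_beat_track(onset_env, period, alpha=100.0):
    """Pick beat frames from an onset-strength envelope by dynamic
    programming (an illustrative sketch, not the released tool):
    score[i] = onset_env[i] + max over predecessors of
    (score[j] + penalty for |i - j| deviating from `period`)."""
    n = len(onset_env)
    score = np.array(onset_env, dtype=float)
    backlink = -np.ones(n, dtype=int)
    # Allowed predecessor lags: roughly half to double the target period
    prange = np.arange(-2 * period, -period // 2)
    # Squared-log-time penalty for deviating from the ideal spacing
    txcost = -alpha * np.abs(np.log(-prange / period)) ** 2
    for i in range(2 * period, n):
        cands = txcost + score[i + prange]
        best = int(np.argmax(cands))
        score[i] += cands[best]
        backlink[i] = i + prange[best]
    # Backtrace from the best-scoring frame
    beats = [int(np.argmax(score))]
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return beats[::-1]

# Synthetic onset envelope: impulses every 50 frames plus noise
rng = np.random.default_rng(0)
env = 0.1 * rng.random(500)
env[::50] += 1.0
beats = dp_beat_track(env, period=50)
```

With a clean periodic envelope like this, the recovered beats land roughly one target period apart; the released LabROSA implementation adds tempo estimation and works on real onset envelopes.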
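The short-time cross-correlation idea behind skewview can also be illustrated in a few lines. This is a minimal sketch, not the released tool: for each analysis window, correlate one signal against lagged segments of the other and record the lag of the correlation peak, which traces timing skew over time. The function name `frame_skew` and all parameters are invented for the example.

```python
import numpy as np

def frame_skew(x, y, win=1024, hop=512, max_lag=64):
    """Estimate per-frame timing skew between two nominally aligned
    signals by locating the peak of the short-time cross-correlation
    in each window (an illustrative sketch of the idea, not skewview
    itself)."""
    lags = np.arange(-max_lag, max_lag + 1)
    skews = []
    n = min(len(x), len(y))
    # Start late enough that negative lags stay inside the signal
    for start in range(max_lag, n - win - max_lag, hop):
        xf = x[start:start + win]
        # Correlate this frame of x against y shifted by each candidate lag
        cc = [np.dot(xf, y[start + l:start + l + win]) for l in lags]
        skews.append(int(lags[int(np.argmax(cc))]))
    return skews

# Synthetic example: y is x delayed by 5 samples
rng = np.random.default_rng(1)
x = rng.standard_normal(8000)
y = np.concatenate([np.zeros(5), x])[:8000]
skews = frame_skew(x, y)
```

For a constant 5-sample delay every frame reports the same skew; on real recordings (e.g. clock drift between two devices) the per-frame estimates form the drifting skew trace that skewview visualizes.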
LabROSA + friends, December 2012.
Last updated: December 18, 2012
Dan Ellis <email@example.com>