Authors:
Jordi Janer and Esteban Maestre
Affiliation:
Music Technology Group, Pompeu Fabra University, Spain
Keyword(s):
Singing-driven interfaces, Phonetics, Mapping, Sound synthesis.
Related Ontology Subjects/Areas/Topics:
Applications; Audio and Speech Processing; Digital Signal Processing; Multimedia; Multimedia Signal Processing; Pattern Recognition; Software Engineering; Telecommunications
Abstract:
In voice-driven sound synthesis applications, phonetics convey musical information that can be related to the sound of the imitated musical instrument. Our initial hypothesis is that the phonetics employed are user- and instrument-dependent, but remain consistent for a given subject and instrument. Hence, a user-adapted system is proposed, where mappings depend on how a subject performs musical articulations, learned from a set of examples. The system consists of, first, a voice-imitation segmentation module that automatically determines note-to-note transitions and, second, a classifier that determines the type of musical articulation of each transition from a set of phonetic features. To validate our hypothesis, we ran an experiment in which a number of subjects imitated real instrument recordings with their voice. The instrument recordings consisted of short saxophone and violin phrases performed in three degrees of musical articulation, labeled staccato, normal, and legato. The results of a supervised classifier (user-dependent) are compared to those of a classifier based on heuristic rules (user-independent). Finally, using these results, we improve the quality of a sample-concatenation synthesizer by selecting the most appropriate samples.
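As a rough illustration of the comparison described above, the Python sketch below contrasts a user-independent heuristic rule classifier with a user-dependent supervised model trained on a subject's own labeled imitations. The feature names, thresholds, and the choice of a decision tree are hypothetical assumptions for the sketch, not details taken from the paper.

```python
# Hypothetical sketch: heuristic (user-independent) vs. supervised
# (user-dependent) articulation classification of note-to-note
# transitions. Features and thresholds are illustrative only.
from dataclasses import dataclass
from sklearn.tree import DecisionTreeClassifier

@dataclass
class Transition:
    silence_ms: float        # gap between notes in the voice imitation
    consonant_energy: float  # relative energy of the transition phoneme
    voiced_ratio: float      # fraction of voiced frames in the transition

def heuristic_classify(t: Transition) -> str:
    """User-independent rules: long silences suggest staccato,
    fully voiced transitions suggest legato (assumed thresholds)."""
    if t.silence_ms > 80:
        return "staccato"
    if t.voiced_ratio > 0.9:
        return "legato"
    return "normal"

def train_user_model(transitions: list[Transition], labels: list[str]) -> DecisionTreeClassifier:
    """User-dependent model fit on one subject's labeled examples."""
    X = [[t.silence_ms, t.consonant_energy, t.voiced_ratio] for t in transitions]
    clf = DecisionTreeClassifier(max_depth=3)
    clf.fit(X, labels)
    return clf

# Usage: train on a subject's imitation examples, then classify new transitions.
examples = [Transition(120, 0.8, 0.2), Transition(5, 0.1, 0.95), Transition(40, 0.4, 0.6)]
model = train_user_model(examples, ["staccato", "legato", "normal"])
print(model.predict([[100, 0.7, 0.3]]))  # e.g. ['staccato']
```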