
TADARAV Project

Technologies for automatic annotation of audio data and for the creation of automatic speech recognition interfaces

Project financed by the Ministry of Research and Innovation, Program PN-III-P1-1.2.-PCCDI, no. 73/2018, duration: 2018-2020

Component project of the complex project

RETEROM

...
Parallel Projects

COBILIRO, TEPROLIN, SINTERO

Description

Name: Technologies for automatic annotation of audio data and for the creation of automatic speech recognition interfaces (TADARAV)

General Objective
The project "Technologies for automatic annotation of audio data and for the creation of automatic speech recognition interfaces" (TADARAV) has as its main purpose the design, implementation and validation of technologies for the automatic annotation of speech units. The project primarily aims to develop a set of advanced technologies for generating transcriptions, correctly aligned with the speech signal, for the corpus collected in the COBILIRO component project. In addition, the project aims to improve the accuracy of SpeeD's current automatic speech recognition system (RAV, the Romanian acronym for automatic speech recognition) by retraining its acoustic model on the entire collected speech corpus and by using more powerful language models developed in the TEPROLIN component project.

Motivation
More than 7,000 languages and dialects are currently spoken around the world. While automatic speech recognition (RAV) systems for widely spoken international languages perform very well, building such systems is considerably harder for a large share of the remaining languages. The reason is simple: these languages lack sufficient annotated acoustic and linguistic resources, which is why they are called under-resourced (poorly endowed) languages. Developing a RAV system requires huge amounts of annotated data to train the acoustic, language and phonetic models. Building high-performing RAV systems for under-resourced languages through the usual development stages (resource acquisition, modeling, algorithm adaptation) is not achievable in the near future. This motivates the search for new methods of automatically annotating acoustic and linguistic data, which exist in abundance but are not yet annotated, in order to obtain RAV systems for under-resourced languages.
Thus, the specific objectives of the TADARAV project are:

Design, implementation and validation of methods for filtering approximate transcripts and aligning them with the speech signal

Analysis of various ways of computing RAV reliability scores, followed by the design, implementation and validation of methods for generating confidence scores that correlate well with transcription correctness

Design, implementation and validation of an automatic speech annotation method using multiple complementary RAV systems
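To make the idea behind annotation with complementary RAV systems more concrete, the following is a minimal Python sketch of one common approach, assuming two systems transcribe the same utterances: an utterance is kept, with its hypothesis used as the transcript, only when the two hypotheses agree closely enough. The function names and the `max_rate` threshold are hypothetical illustrations, not the project's published method. Agreement is measured here as a plain word-level edit distance, normalized by the length of the first hypothesis.

```python
from typing import List, Tuple


def word_errors(ref: List[str], hyp: List[str]) -> int:
    """Word-level Levenshtein distance (substitutions + insertions + deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def disagreement_rate(hyp_a: str, hyp_b: str) -> float:
    """Edit distance between two hypotheses, normalized by the length of hyp_a."""
    a, b = hyp_a.split(), hyp_b.split()
    if not a:
        return 0.0 if not b else 1.0
    return word_errors(a, b) / len(a)


def select_utterances(pairs: List[Tuple[str, str, str]],
                      max_rate: float = 0.0) -> List[Tuple[str, str]]:
    """Keep (utt_id, transcript) for utterances whose two hypotheses
    (from hypothetical systems A and B) disagree by at most max_rate."""
    kept = []
    for utt_id, hyp_a, hyp_b in pairs:
        if disagreement_rate(hyp_a, hyp_b) <= max_rate:
            kept.append((utt_id, hyp_a))
    return kept


if __name__ == "__main__":
    pairs = [("u1", "buna ziua", "buna ziua"),
             ("u2", "ce mai faci", "ce mai face")]
    print(select_utterances(pairs))  # [('u1', 'buna ziua')]
```

With `max_rate=0.0`, only utterances on which both systems output identical word sequences are kept. The same comparison could also be applied between a single RAV hypothesis and an approximate transcript. Raising the threshold keeps more data at the cost of less reliable annotations.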

Working Plan
The project is structured in three stages, corresponding to the reporting stages. Each stage builds on the results of the previous stages or studies in the project and on the results obtained in the parallel projects within the complex project, as follows:

Stage 1: State-of-the-art studies on the automatic annotation of speech corpora, based on a review of the specialized literature

Stage 2: Design and implementation of basic solutions for the automatic annotation of speech corpora using existing RAV systems

Stage 3: Implementation of prototypes/demonstrators for the automatic annotation of speech corpora using existing RAV systems

Expected Results
The expected results for each stage of the project (some of them also based on the results obtained in the parallel projects within the complex project) are as follows:

Stage 1:

State-of-the-art study - Methods for using complementary RAV systems to automatically generate annotations

State-of-the-art study - Methods for aligning approximate transcripts with speech signal

State-of-the-art study - Methods for generating RAV confidence scores

Basic solution for automatic annotation of speech signal using complementary RAV systems

Stage 2:

Basic solution for filtering approximate transcripts and aligning them with the speech signal

Basic solution for generating RAV confidence scores

Enhanced automatic speech annotation solution using complementary RAV systems

Stage 3:

Analysis report on the impact of complementary RAV use for generating annotations in the context of RAV systems improvement

Improved solution for generating reliable RAV scores

RAV system

Analysis report of the impact of using approximate transcripts for RAV retrainings

Analysis report of the impact of using the confidence scores for filtering RAV transcripts for retraining RAV systems

Consortium

Research Institute for Artificial Intelligence "Mihai Drăgănescu", Bucharest

Technical University of Cluj-Napoca

University Politehnica of Bucharest

"Alexandru Ioan Cuza" University of Iași

Team

University POLITEHNICA of Bucharest

Prof. Corneliu Burileanu

Prof. Dragoș Burileanu

Assoc. Prof. Horia Cucu

PhD Dan Oneață

PhD student Gheorghe Pop

PhD student Lucian Georgescu

Eng. Cristian Manolache

Research Institute for Artificial Intelligence "Mihai Drăgănescu", Bucharest

Acad. Dan Tufiș

PhD Verginica Mititelu

PhD Radu Ion

PhD Elena Irimia

Technical University of Cluj-Napoca

Prof. Mircea Giurgiu

PhD Adriana Stan

„Alexandru Ioan Cuza” University of Iași

Prof. Dan Cristea

PhD Anca Bibiri

PhD Ionuț Pistol

PhD Diana Trandabăț

Scientific Reports

State-of-the-art study - Methods for using complementary RAV systems to automatically generate annotations
State-of-the-art study - Methods for aligning approximate transcripts with speech signal
State-of-the-art study - Methods for generating RAV confidence scores
Analysis report on the impact of complementary RAV use for generating annotations in the context of RAV systems improvement
Analysis report of the impact of using approximate transcripts for RAV retrainings
Analysis report of the impact of using the confidence scores for filtering RAV transcripts for retraining RAV systems

Report stage 1/2018
Report stage 2/2019
Report stage 3/2020

Publications

D. Oneață, A. Caranica, A. Stan, H. Cucu, “An evaluation of word-level confidence estimation for end-to-end automatic speech recognition,” in the Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Virtual, 2021

A.-L. Georgescu, C. Manolache, D. Oneață, H. Cucu, C. Burileanu, “Data-filtering methods for self-training of automatic speech recognition systems,” in the Proceedings of the IEEE Spoken Language Technology Workshop (SLT), Virtual, 2021

G. Pop, H. Cucu, D. Burileanu, C. Burileanu, “Cough Sound Recognition in Respiratory Disease Epidemics,” in Romanian Journal of Information Science and Technology, vol. 23, no. S, pp. S77–S89, 2020, ISSN 1453-8245, ISI IF 0.661

D. Oneață, A.-L. Georgescu, H. Cucu, D. Burileanu, C. Burileanu, “Revisiting SincNet: An Evaluation of Feature and Network Hyperparameters for Speaker Recognition,” in the Proceedings of the 28th European Signal Processing Conference (EUSIPCO), Amsterdam, The Netherlands, 2020

C. Manolache, A.-L. Georgescu, A. Caranica, H. Cucu, “Automatic Annotation of Speech Corpora using Approximate Transcripts,” in the Proceedings of the 43rd International Conference on Telecommunications and Signal Processing (TSP), Milano, Italy, 2020

A.-L. Georgescu, H. Cucu, A. Buzo, C. Burileanu, “RSC: A Romanian Read Speech Corpus for Automatic Speech Recognition,” in the Proceedings of the 12th Language Resources and Evaluation Conference (LREC), Marseille, France, 2020, pp. 6606–6612

A.-L. Georgescu, H. Cucu, C. Burileanu, “Progress on automatic annotation of speech corpora using complementary ASR systems,” in the Proceedings of the 42nd International Conference on Telecommunications and Signal Processing (TSP), Budapest, Hungary, 2019

G. Pop, Ș. Mihalache, D. Burileanu, “Forensic Recognition of Narrowband AMR Signals,” in the Proceedings of the 10th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Timișoara, Romania, 2019

A.-L. Georgescu, H. Cucu, C. Burileanu, “Kaldi-based DNN architectures for speech recognition in Romanian,” in the Proceedings of the 10th Conference on Speech Technology and Human-Computer Dialogue (SpeD), Timișoara, Romania, 2019

G. Pop, D. Burileanu, “Speech Enhancement for Forensic Purposes,” in UPB Scientific Bulletin, Series C, vol. 81, no. 3, pp. 41–52, 2019

F. Iordache, A.-L. Georgescu, D. Oneață, H. Cucu, “Romanian Automatic Diacritics Restoration Challenge,” in the Proceedings of the 14th International Conference on Linguistic Resources and Tools for Natural Language Processing, Cluj-Napoca, Romania, 2019

A.-L. Georgescu, H. Cucu, “Automatic annotation of speech corpora using complementary GMM and DNN acoustic models,” in the Proceedings of the 41st International Conference on Telecommunications and Signal Processing (TSP), Athens, Greece, 2018

Contact

corneliu.burileanu@upb.ro

horia.cucu@upb.ro