TADARAV Project

Technologies for automatic annotation of audio data and for the creation of automatic speech recognition interfaces


Project financed by the Ministry of Research and Innovation, Program PN-III-P1-1.2.-PCCDI, no. 73/2018, duration: 2018-2020

Component project of the complex project

RETEROM

...
Parallel Projects

COBILIRO TEPROLIN SINTERO

Description

Name: Technologies for automatic annotation of audio data and for the creation of automatic speech recognition interfaces (TADARAV)

General Objective
The main purpose of the project "Technologies for automatic annotation of audio data and for the creation of automatic speech recognition interfaces" (TADARAV) is to design, implement and validate technologies for the automatic annotation of speech units. The project primarily aims to develop a set of advanced technologies for generating transcriptions that are correctly aligned with the speech signal, starting from the corpus collected in the COBILIRO component project. As a secondary goal, the project aims to increase the accuracy of SpeeD's current automatic speech recognition (RAV) system by retraining its acoustic model on the entire collected speech corpus and by using the more powerful language models generated in the TEPROLIN component project.

Motivation
At present, over 7,000 different languages or dialects are spoken around the world. While automatic speech recognition (RAV) systems perform very well for the major international languages, the task remains quite difficult for a large share of the other languages. The reason is simple: not enough annotated acoustic and linguistic resources exist for these languages, which is why they are called under-resourced languages. Developing a RAV system requires huge amounts of annotated data to train the acoustic, linguistic and phonetic models. Creating well-performing RAV systems for under-resourced languages through the normal development stages (resource acquisition, modeling, algorithm adaptation) is not feasible in the near future. This motivates the effort to identify new methods for automatically annotating acoustic and linguistic data (which exist in abundance but are not yet annotated) in order to obtain RAV systems for under-resourced languages.
Thus, the specific objectives of the TADARAV project are:
  • Design, implementation and validation of various methods for filtering approximate transcripts and aligning them with the speech signal
  • Analysis of various ways of calculating RAV confidence scores, followed by the design, implementation and validation of various methods for generating confidence scores that correlate well with the correctness of the transcription
  • Design, implementation and validation of an automatic speech annotation method that uses multiple complementary RAV systems
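
As a rough illustration of the third objective, the sketch below shows one simple way two complementary RAV systems could be used to annotate speech automatically: an utterance is kept only when the two systems' transcripts agree closely. This is an assumption made for illustration, not the method developed in the project; the function names, the agreement measure (word-level edit distance) and the threshold are all hypothetical.

```python
from typing import List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Levenshtein distance between two word sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a word from ref
                            curr[j - 1] + 1,      # insert a word from hyp
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def disagreement_rate(hyp_a: str, hyp_b: str) -> float:
    """Word-level disagreement between two transcripts, in [0, 1]."""
    a, b = hyp_a.lower().split(), hyp_b.lower().split()
    if not a and not b:
        return 0.0
    return word_edit_distance(a, b) / max(len(a), len(b))


def select_agreed(
    pairs: List[Tuple[str, str, str]],
    max_disagreement: float = 0.0,  # hypothetical threshold; 0.0 = exact match
) -> List[Tuple[str, str]]:
    """Keep (utterance id, transcript) where both systems agree closely.

    `pairs` holds (utterance id, transcript from system A, transcript from
    system B). Utterances with an empty transcript from system A are skipped.
    """
    return [
        (utt_id, hyp_a)
        for utt_id, hyp_a, hyp_b in pairs
        if hyp_a.strip() and disagreement_rate(hyp_a, hyp_b) <= max_disagreement
    ]


if __name__ == "__main__":
    # Toy transcripts, invented for this example.
    outputs = [
        ("utt1", "buna ziua", "buna ziua"),
        ("utt2", "ce mai faci", "ce mai taci"),
    ]
    print(select_agreed(outputs))
```

With the default threshold only utterances on which both systems produce identical word sequences are retained; raising `max_disagreement` trades annotation accuracy for a larger amount of automatically annotated data.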

Working Plan
The project is structured in three stages, corresponding to the reporting stages. Each stage builds on the results of the previous stages or studies in the project and on the results obtained in the parallel projects within the complex project, as follows:
  • Stage 1: State-of-the-art studies on the automatic annotation of speech corpora, based on a review of the specialized literature
  • Stage 2: Design and implementation of basic solutions for the automatic annotation of speech corpora using existing RAV systems
  • Stage 3: Implementation of prototypes/demonstrators for the automatic annotation of speech corpora using existing RAV systems

Expected Results
The expected results for each stage of the project (some of which also build on results obtained in the parallel projects within the complex project) are as follows:

Stage 1:

  • State-of-the-art study - Methods for using complementary RAV systems to automatically generate annotations
  • State-of-the-art study - Methods for aligning approximate transcripts with the speech signal
  • State-of-the-art study - Methods for generating RAV confidence scores

Stage 2:

  • Basic solution for automatic annotation of the speech signal using complementary RAV systems
  • Basic solution for filtering approximate transcripts and aligning them with the speech signal
  • Basic solution for generating RAV confidence scores
  • Enhanced automatic speech annotation solution using complementary RAV systems

Stage 3:

  • Analysis report on the impact of using complementary RAV systems to generate annotations in the context of RAV system improvement
  • Improved solution for generating RAV confidence scores
  • RAV system
  • Analysis report on the impact of using approximate transcripts for RAV retraining
  • Analysis report on the impact of using confidence scores to filter RAV transcripts for retraining RAV systems

Consortium

Research Institute for Artificial Intelligence "Mihai Drăgănescu", Bucharest
Technical University of Cluj-Napoca
University POLITEHNICA of Bucharest
"Alexandru Ioan Cuza" University of Iași

Team

University POLITEHNICA of Bucharest

Prof. Corneliu Burileanu

Prof. Dragoș Burileanu

Assoc. Prof. Horia Cucu

PhD Dan Oneață

PhD student Gheorghe Pop

PhD student Lucian Georgescu

Eng. Cristian Manolache


Research Institute for Artificial Intelligence "Mihai Drăgănescu", Bucharest

Acad. Dan Tufiș

PhD Verginica Mititelu

PhD Radu Ion

PhD Elena Irimia


Technical University of Cluj-Napoca

Prof. Mircea Giurgiu

PhD Adriana Stan


"Alexandru Ioan Cuza" University of Iași

Prof. Dan Cristea

PhD Anca Bibiri

PhD Ionuț Pistol

PhD Diana Trandabăț


Scientific Reports

  • State-of-the-art study - Methods for using complementary RAV systems to automatically generate annotations
  • State-of-the-art study - Methods for aligning approximate transcripts with the speech signal
  • State-of-the-art study - Methods for generating RAV confidence scores
  • Report stage 1/2018
  • Report stage 2/2019
  • Report stage 3/2020

Publications

Contact

corneliu.burileanu@upb.ro

horia.cucu@upb.ro