The SAVAS Project

The SAVAS project will collect spoken and textual resources in six European languages and build domain-specific Large Vocabulary Continuous Speech Recognition (LVCSR) systems to address the automated subtitling needs of the media industry. More specifically, the main objectives of the project are:

  1. To improve the effectiveness of acquiring and annotating audiovisual language resources produced by broadcasters and subtitling companies, for the development of LVCSR systems targeting automated subtitling
  2. To deploy a platform for sharing audiovisual language resources between the media industry and LVCSR developers, using the legal and business data-trading approaches best suited to the media industry
  3. To show the impact of feeding LVCSR technology with existing audiovisual language resources for automated subtitling purposes

In order to achieve these goals, SAVAS will:

  • collect spoken and textual resources in the languages addressed from the broadcasters and subtitling companies acting as data providers within the consortium;
  • transcribe and annotate the collected corpora into a form suitable for training the acoustic and language models of LVCSR systems, using a combination of automatic and collaborative approaches;
  • build a local META-SHARE repository containing the collected and annotated SAVAS language resources to allow their reuse;
  • adapt and train dictation and transcription LVCSR systems with the SAVAS language resources;
  • integrate the developed systems into several automated subtitling application scenarios and evaluate them there, in order to show the impact of audiovisual data sharing on automated subtitling.