MUNST Multilingual Natural Speech Technology ITC-irst

 
 
Courses  2002/2003 at ICT 
Statistical Spoken Language Processing 

Instructor 
Marcello Federico (ITC-irst)

Description
The course introduces statistical methods for speech and language processing through the analysis of a case study: the news-on-demand system currently under development at ITC-irst. Taking the audio/video stream of a broadcast news program as input, the course will cover the following issues:

  • audio partitioning: how to find and classify speech segments;
  • speech transcription: how to convert speech segments into the corresponding text;
  • named entity recognition: how to detect mentions of names of persons, places, etc.;
  • spoken document retrieval: how to search for news in an archive.

As each of the above items is a research topic in its own right, the purpose of the case study is to present the different problems under a common statistical framework.
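
The page does not spell out the common framework. One standard reading, assumed here rather than taken from the course material, is Bayes' decision rule: given an observed input X, choose the output Y with the highest posterior probability. The symbols X and Y are illustrative only.

```latex
\hat{Y} \;=\; \arg\max_{Y} P(Y \mid X) \;=\; \arg\max_{Y} P(X \mid Y)\, P(Y)
```

Under this reading, X would be the acoustic signal and Y the word sequence in speech transcription, while in named entity recognition X would be the word sequence and Y the sequence of entity labels.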

    Pre-requisites
    Elementary calculus and statistics. 

    TextBooks and Course Material
    Copies of the transparencies used during the lectures are required to follow the course. References to book chapters and papers will be made available during the lectures.

    Final examination
    The exam consists of an oral presentation and discussion of one or more research papers from the literature (assigned during the course), followed by questions on the contents of the lectures.

    Schedule

    ITC-irst, Sala Conferenze, Edificio Est
    via Sommarive,  18 - Povo

    2003 Jan  20,21,22,23,24     09:30 - 11:30
    2003 Jan  27,28,29,30,31     09:30 - 11:30 


    Statistical Machine Translation 

    Instructor
    Marcello Federico (ITC-irst)

    Description
    Machine Translation (MT) is one of the oldest challenges taken on by computer science, and it is still far from being solved. The course will start by briefly reviewing the history, approaches, progress, and difficulties of MT. The central topic of the course will be the statistical MT approach introduced in the early 1990s at IBM. In particular, the following issues will be covered: the statistical framework of MT, word alignment models, training and search algorithms, and performance evaluation. Finally, alternative MT approaches will be discussed, and current research trends in the field will be shown.
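
To give a flavour of what training a word alignment model involves, here is a minimal sketch of EM training for a lexical translation table t(f | e) in the spirit of IBM Model 1. It is an illustration under simplifying assumptions, not the course's own material: the NULL word is omitted, the function name `train_ibm_model1` and the three-sentence toy corpus are hypothetical, and no smoothing is applied.

```python
from collections import defaultdict


def train_ibm_model1(pairs, iterations=10):
    """Estimate lexical translation probabilities t(f | e) with EM.

    pairs: list of (f_words, e_words) sentence pairs.
    Simplification: no NULL word is added to the e side.
    Returns a dict-like table keyed by (f, e).
    """
    f_vocab = {f for f_sent, _ in pairs for f in f_sent}
    # Uniform initialisation over the f vocabulary.
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of f aligned to e
        total = defaultdict(float)  # expected count of e being aligned
        # E-step: distribute each f word over the e words of its sentence.
        for f_sent, e_sent in pairs:
            for f in f_sent:
                z = sum(t[(f, e)] for e in e_sent)
                for e in e_sent:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: renormalise the expected counts per e word.
        t = defaultdict(
            float, {(f, e): count[(f, e)] / total[e] for (f, e) in count}
        )
    return t


# Hypothetical toy corpus (f side first, e side second).
corpus = [
    (["das", "Haus"], ["the", "house"]),
    (["das", "Buch"], ["the", "book"]),
    (["ein", "Buch"], ["a", "book"]),
]

if __name__ == "__main__":
    table = train_ibm_model1(corpus, iterations=10)
    for f in ["das", "Haus", "Buch", "ein"]:
        print(f, "| the :", round(table[(f, "the")], 3))
```

On this toy data the probability mass of t(. | the) shifts towards "das" over the iterations, because "das" is the only word that co-occurs with "the" in both of its sentences.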

    Pre-requisites
    Elementary calculus and statistics. 

    TextBooks and Course Material
    Copies of the transparencies used during the lectures are required to follow the course. References to book chapters and papers will be made available during the lectures.

    Final examination
    The exam consists of an oral presentation and discussion of one or more research papers from the literature (assigned during the course), followed by questions on the contents of the lectures.

    Schedule

    ITC-irst, Sala Conferenze, Edificio Est
    via Sommarive,  18 - Povo

    2003 Feb  6,10,11,12,13      09:30 - 11:30
    2003 Feb  17,18,19,20,21     09:30 - 11:30

    References

    Peter F. Brown; John Cocke; Stephen A. Della Pietra; Vincent J. Della Pietra; Fredrick Jelinek; John D. Lafferty; Robert L. Mercer; Paul S. Roossin, "A Statistical Approach to Machine Translation",  Computational Linguistics, Volume 16, Number 2, June 1990. [pdf]

    Peter F. Brown; Stephen A. Della Pietra; Vincent J. Della Pietra; Robert L. Mercer, "The Mathematics of Statistical Machine Translation: Parameter Estimation", Computational Linguistics, Volume 19, Number 2, June 1993. [pdf]

    Ulrich Germann; Michael Jahr; Kevin Knight; Daniel Marcu; Kenji Yamada, "Fast Decoding and Optimal Decoding for Machine Translation", Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics, 2001. [pdf]

    Christoph Tillmann; Hermann Ney, "Word Re-ordering and DP-based Search in Statistical Machine Translation", Proceedings of COLING 2000, the 18th International Conference on Computational Linguistics, 2000. [pdf]

    C. Tillmann, "Word Re-Ordering and Dynamic Programming based Search Algorithm for Statistical Machine Translation", PhD dissertation, Aachen, Germany, May 2001. [ps]

    Interesting  material and alignment software

  • NIST, Automatic Evaluation of Machine Translation Quality using N-gram Co-Occurrence Statistics. [pdf]
  • Johns Hopkins University Summer Workshop on MT, 1999
  • Alignment Software:  GIZA++ (new version)
  • Parallel corpora and others: see Home Page of Kevin Knight (U. of Southern California)

    Contact:
    Marcello Federico, tel. (+39) 0461 314552, federico@fbk.eu