C-STAR Workplan

At the C-STAR consortium meeting held in Trento in December 2002,
the decision was taken to organize, on a regular basis, speech translation
evaluation campaigns and workshops, mainly focusing on speech translation
research and evaluation. Activities within C-STAR will also include the
development of a large multilingual parallel corpus to be used for common

Evaluation Campaign 2003

The first evaluation campaign and workshop will take place in May 2003 and September 2003,
respectively. This year, both events will be restricted to C-STAR members,
and the evaluation will be limited to written texts. In particular, training and
testing data will be based on the BTEC corpus developed by ATR and extended
by the partners to their respective languages.


- The first evaluation campaign will concentrate on assessing text translation
  algorithms in the tourism domain. Translation directions will be from Chinese,
  Italian, Japanese, and Korean into English for the primary condition, and any
  other direction for the secondary condition.

- Training data will consist of a fixed number of English sentences provided
  with translations into the respective source language. Participants will be allowed
  to use any additional monolingual resources, e.g. text corpora, grammars, word lists,
  and segmentation tools.

- Test data for the primary condition will consist of English sentences
  taken from phrase-books not included in the training data. Test data for the
  secondary condition will consist of manual translations of the English sentences
  into all the considered source languages.

- The primary condition will be mandatory for all participants. Participants will
  be invited to submit multiple runs for each condition, possibly corresponding to
  different translation directions.

Evaluation Protocol

- Automatic scoring will be carried out with the NIST/BLEU software. In particular, a
  server will be set up which will permit participants to remotely score the output of
  their system. For each translation direction, multiple translations will be used
  as references.

- Subjective evaluation on the primary condition will be distributed across the participant
  sites. Native English speakers will evaluate the output of each system against one
  gold-standard reference. Evaluation will follow guidelines similar to those applied by LDC
  in the NIST MT evaluation campaigns.

- While automatic evaluation will be applied to all submitted runs, subjective evaluation will
  be applied to only one run per participant, namely the first run submitted under the primary
  condition.

- Finally, participants are allowed to discuss their results without restriction. Disclosure of the
  results of other participants is not allowed without their permission.
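
As an illustration of the automatic scoring described above, the following is a minimal
sketch of a BLEU-style score computed against multiple reference translations. It is not
the NIST/BLEU software itself: the function names, the whitespace tokenization, the lack
of smoothing, and the tie-breaking rule for the reference length are all assumptions made
for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """BLEU-style corpus score with several references per sentence.

    hypotheses: list of token lists, one per test sentence.
    references: list of lists of token lists (the references for each sentence).
    """
    matches = [0] * max_n  # clipped n-gram matches, per n-gram order
    totals = [0] * max_n   # number of hypothesis n-grams, per n-gram order
    hyp_len = 0
    ref_len = 0
    for hyp, refs in zip(hypotheses, references):
        hyp_len += len(hyp)
        # Reference length closest to the hypothesis length
        # (assumption: ties are resolved in favour of the shorter reference).
        ref_len += min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
        for n in range(1, max_n + 1):
            hyp_counts = ngrams(hyp, n)
            # Clip each n-gram by its maximum count in any single reference.
            max_ref = Counter()
            for r in refs:
                for gram, count in ngrams(r, n).items():
                    max_ref[gram] = max(max_ref[gram], count)
            matches[n - 1] += sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    # Without smoothing, a missing n-gram order gives a score of zero.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


# Hypothetical system output and two references for one test sentence.
hyp = "could you tell me where the station is".split()
refs = ["could you tell me where the bus stop is".split(),
        "please tell me where the station is".split()]
print(corpus_bleu([hyp], [refs]))      # scored against both references
print(corpus_bleu([hyp], [refs[:1]]))  # scored against the first reference only
```

Scoring the same output against one reference and against both shows why multiple
references are used: a second reference adds further n-grams that count as matches.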

Important dates

  • Specifications:                  28 February 2003
  • Test set preparation:            31 March 2003
  • Release of development set:      26 May 2003
  • Automatic evaluation server:     08 June 2003
  • Primary Condition:
    - Test set release:              02 June 2003
    - Start of run submission:       09 June 2003
    - End of run submission:         20 June 2003  12:00 GMT
    - Subjective evaluation:         30 July 2003
  • Secondary Condition:
    - Test set release:              02 June 2003
    - Start of run submission:       23 June 2003
    - End of run submission:         04 July 2003  12:00 GMT
  • Workshop (provisional):          08-09 September 2003



Steering Committee

  H. Blanchon (CLIPS)
  M. Federico (ITC-irst)
  H. Nakaiwa (ATR)
  S. Oh (ETRI)
  A. Tribble (CMU/UKA)
  C. Zong (NLPR)

Action Plan and Responsibilities

- Coordination of evaluation campaign: ITC-irst
- Maintenance of BTEC corpus (training/test data): ATR
- Translation and production of multiple references in Chinese, Italian, Japanese, and Korean:
  NLPR, ITC-irst, ATR, ETRI (respectively)
- Responsible for subjective evaluation: CMU/UKA
- Subjective evaluators: one native English speaker per participant
- Responsible for automatic evaluation: ATR

Registered participants (as of 28 May 2003)

- ITC-irst, Italy:   Italian - English, Chinese - English
- ATR, Japan:        Japanese - English
- ETRI, Korea:       Korean - English
- NLPR, China:       Chinese - English
- UKA, Germany:      Chinese - English