Cross-Language Spoken Document Retrieval (CL-SDR)

After preliminary work carried out during 2002, we are pleased to propose a cross-language
spoken document retrieval track for CLEF 2003. The track is mostly based on existing resources,
kindly made available by NIST, which were used at TREC 8 and 9. Hence, the track is closer
to a benchmark than to a real evaluation.
 

Benchmark Track of 2003

The track derives from an extension of the evaluation data prepared by NIST for the TREC 8 and 9 SDR tracks:
- a collection of automatic transcripts (557 hours) of American-English news recordings
   broadcast by ABC, CNN, Public Radio International, and Voice of America between
   February and June 1998. Transcripts are provided both with unknown story boundaries
   and with known story boundaries (21,754 stories).
- a collection of 100 English topics, in either terse or short format.
- relevance assessments
- scoring software for the unknown story boundary condition

The TREC collection has been extended with translations of the short topics into five European
languages: Dutch, Italian, French, German, and Spanish.

A description of the last TREC SDR track can be found here.

Specifications

- Objective: the track aims to evaluate cross-language information retrieval (CLIR) systems
   on noisy automatic transcripts of spoken documents with known story boundaries.

- Development data (from TREC 8 SDR):
    a) Document collection: B1SK Baseline Transcripts, known bounds (download from NIST)
    b) Topics: Short topics in English, Dutch, French, German, Italian, and Spanish.
    c) Relevance assessments: Topics-074-123
    d) Parallel document collections (optional and only available through LDC): Textual resources

- Evaluation data (from TREC 9 SDR):
    a) Document collection: B1SK Baseline Transcripts, known bounds (download from NIST)
    b) Topics: Short topics in English, Dutch, French, German, Italian, and Spanish.
    c) Relevance assessments: Topics-124-173
    d) Parallel document collections (optional and only available through LDC): Textual resources

- Primary Conditions (mandatory for all participants):
    - Monolingual IR without using any parallel collection (contrastive condition)
    - Bilingual IR from French or German

- Secondary Conditions (optional):
    - Monolingual IR using any available parallel collections
    - Bilingual IR from other languages

- Submission of runs:
    - Maximum 12 runs per participant, with a limit of 3 runs per source language.
    - Runs must be submitted by e-mail, in the same format used for CLEF, to Marcello Federico (ITC-irst).
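
The run limits above can be checked mechanically before submission. The following is a minimal sketch, not part of the official track software; the function name `check_run_limits` and the representation of a submission as a list of source-language labels (one per run) are assumptions made for illustration.

```python
from collections import Counter

MAX_RUNS_TOTAL = 12        # limit stated in the track rules
MAX_RUNS_PER_LANGUAGE = 3  # limit stated in the track rules


def check_run_limits(run_languages):
    """Return a list of rule violations for a planned submission.

    run_languages holds one source-language label per run. This
    representation is hypothetical; the track does not prescribe one.
    """
    problems = []
    total = len(run_languages)
    if total > MAX_RUNS_TOTAL:
        problems.append(f"{total} runs exceed the limit of {MAX_RUNS_TOTAL}")
    # Count runs per source language and flag any language over the limit.
    for language, count in sorted(Counter(run_languages).items()):
        if count > MAX_RUNS_PER_LANGUAGE:
            problems.append(
                f"{count} runs for {language} exceed the limit of {MAX_RUNS_PER_LANGUAGE}"
            )
    return problems
```

An empty list means the planned submission respects both limits; for instance, three monolingual English runs plus three French runs would pass, while four French runs would be reported.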
 

Deadlines

Registration Open:                        15 January 2003
Data Release:                             30 January 2003
Topic and Relevance Assessment Release:   30 January 2003
Submission of Runs by Participants:       15 June 2003
Release of Individual Results:            30 June 2003
Submission of Paper for Working Notes:    20 July 2003
Workshop:                                 21-22 August 2003
 
 

To participate in CLEF 2003, a registration form must first be completed, signed, and sent
(by express mail) to Carol Peters at the address below.

Carol Peters - ISTI-CNR
Area della Ricerca di San Cataldo
56124 PISA (Italy)
Tel: +39 050 315 2897 - Fax: +39 050 315 3464
E-mail: carol@iei.pi.cnr.it
Registered participants should contact Marcello Federico (ITC-irst) to get access
to the topics and relevance assessments.

Track coordinators: Marcello Federico (ITC-irst) and Gareth Jones (U. Exeter, UK)
 
