

Benchmark Track of 2003
The benchmark derives from an extension of the evaluation data prepared by NIST for
the TREC 8-9 SDR tracks:
- a collection of automatic transcripts (557 hours)
of American-English news recordings
broadcast by ABC, CNN, Public Radio
International, and Voice of America between
February and June 1998. Transcripts are provided
both with unknown story boundaries,
and with known story boundaries (21,754 stories).
- a collection of 100 English topics, either in terse or
short format.
- relevance assessments
- scoring software for the unknown story boundary condition
The TREC collection has been extended with translations of the short
topics into five European
languages: Dutch, Italian, French, German, and Spanish.
A description of the last TREC SDR track can be found here.
Specifications
- Objective: the track aims at evaluating CLIR systems on noisy automatic
transcripts of
spoken documents with known story boundaries.
- Development data (from TREC 8 SDR):
a) Document collection: B1SK Baseline Transcripts, known bounds (download from NIST)
b) Topics: Short topics in English, Dutch, French, German, Italian, and Spanish.
c) Relevance assessments: Topics-074-123
d) Parallel document collections (optional and only available through LDC): Textual resources
- Evaluation data (from TREC 9 SDR):
a) Document collection: B1SK Baseline Transcripts, known bounds (download from NIST)
b) Topics: Short topics in English, Dutch, French, German, Italian, and Spanish.
c) Relevance assessments: Topics-124-173
d) Parallel document collections (optional and only available through LDC): Textual resources
- Primary Conditions (mandatory for all participants):
  - Monolingual IR without using any parallel collection (contrastive condition)
  - Bilingual IR from French or German
- Secondary Conditions (optional):
  - Monolingual IR using any available parallel collections
  - Bilingual IR from other languages
- Submission of runs:
  - Maximum 12 runs per participant, with a limit of 3 runs per source language.
  - Runs must be submitted in the same format as CLEF, by e-mail to Marcello Federico (ITC-irst).
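The run limits and mandatory conditions above can be checked mechanically before submitting. The following is only a minimal sketch, not an official tool: the function name `check_submission` and its input shape (one source-language name per planned run) are hypothetical, and it assumes that a monolingual run is one that uses the English topics. It does not distinguish runs that use parallel collections from runs that do not.

```python
from collections import Counter

MAX_RUNS_TOTAL = 12         # limit stated in the track specification
MAX_RUNS_PER_LANGUAGE = 3   # limit stated in the track specification

# Languages in which the short topics are provided.
TOPIC_LANGUAGES = {"English", "Dutch", "French", "German", "Italian", "Spanish"}
# The mandatory bilingual condition starts from one of these languages.
PRIMARY_BILINGUAL_SOURCES = {"French", "German"}


def check_submission(run_languages):
    """Return a list of problems found in a planned submission.

    run_languages: one source-language name per planned run
    (hypothetical input shape, chosen for illustration).
    """
    problems = []
    if len(run_languages) > MAX_RUNS_TOTAL:
        problems.append(
            f"{len(run_languages)} runs exceed the limit of {MAX_RUNS_TOTAL}"
        )

    counts = Counter(run_languages)
    for language, count in sorted(counts.items()):
        if language not in TOPIC_LANGUAGES:
            problems.append(f"no topics are provided in {language}")
        elif count > MAX_RUNS_PER_LANGUAGE:
            problems.append(
                f"{count} runs from {language} exceed the limit of "
                f"{MAX_RUNS_PER_LANGUAGE}"
            )

    # Assumption: the monolingual condition corresponds to English topics.
    if "English" not in counts:
        problems.append("missing the mandatory monolingual (English) run")
    if PRIMARY_BILINGUAL_SOURCES.isdisjoint(counts):
        problems.append("missing the mandatory bilingual run from French or German")
    return problems


if __name__ == "__main__":
    plan = ["English", "French", "French", "German"]
    print(check_submission(plan) or "plan satisfies the stated limits")
```

Both limits matter in practice: three runs in each of the six topic languages would give 18 runs, which is above the overall maximum of 12.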
Deadlines
Registration Open: 15 January 2003
Data Release: 30 January 2003
Topic and Relevance Assessment Release: 30 January 2003
Submission of Runs by Participants: 15 June 2003
Release of Individual Results: 30 June 2003
Submission of Paper for Working Notes: 20 July 2003
Workshop: 21-22 August 2003
In order to participate in CLEF 2003, a registration
form must first be completed, signed, and sent
(by express mail) to Carol Peters at the address below.
Carol Peters - ISTI-CNR
Area della Ricerca di San Cataldo
56124 PISA (Italy)
Tel: +39 050 315 2897 - Fax: +39 050 315 3464
E-mail: carol@iei.pi.cnr.it
Registered participants should contact Marcello Federico (ITC-irst) to get access
Track coordinators: Marcello Federico (ITC-irst) and Gareth Jones (U.
Exeter, UK)