TIMEX2 TIDES Standard for the Annotation of Temporal Expressions

TIMEX2 Home

 

Time Expression Recognition and Normalization Evaluation, April - September 2004

The TERN Evaluation requires systems to detect and normalize temporal expressions occurring in English and Chinese text. Participants may choose to forgo the normalization portion of the task if they wish.

 

TERN-2004 Evaluation Workshop

September 23, 2004

The 2004 TERN Evaluation is over. A workshop was held on September 23, 2004 to discuss the research and results. Four of the briefings are available for download below.

Briefings

TERN Evaluation Task Overview and Corpus

Annotating the TERN Corpus (includes a comparison to system results, with site names omitted)

TIMEX2 Guideline Issues: Markability and Extent

TIMEX2 Guideline Issues: Normalization

 


 

Call for Participation


Time Expression Recognition and Normalization (TERN) Evaluation
April-September 2004
Sponsored by the Automatic Content Extraction (ACE) program

Introduction

The objective of the Automatic Content Extraction (ACE) program is to develop natural language processing technology to support automatic understanding of textual data. This includes classification, filtering, and selection based on the meaning conveyed by the data. Thus, the ACE program requires the development of technologies that automatically detect and characterize this meaning.

The Time Expression Recognition and Normalization (TERN) evaluation is based on work that began in 1999 to establish a set of useful guidelines for text annotation and data interchange. The guidelines define a tag called TIMEX2, including attributes for expressing the normalized, intended meaning or value of a broad range of temporal expressions. The work extends the Message Understanding Conferences' definition of the TIMEX category of named entity to include a broader variety of expressions and to offer a normalization scheme.

TIMEX2 is influencing the definition of ACE tasks, in which temporal expressions covered by TIMEX2 will contribute to filling temporal attributes for extracted relations and events. Thus, the production of TIMEX2 annotations is viewed as an ACE component technology. The TERN evaluation is open to sites that want to develop this type of component technology. The evaluation will be offered in both English and Chinese.

Task Definition

The TIMEX2 task requires that temporal expressions mentioned in the source data be detected and normalized according to the “2003 Standard for the Annotation of Temporal Expressions” by Ferro et al., as updated and posted on the project website. Guidelines that are particular to Chinese are documented (with extensive examples) in a separate supplement.

Temporal expressions to be marked include both absolute expressions ("July 17, 1999", "12:00", "the summer of '69") and relative expressions ("yesterday", "last week", "the next millennium"). Also markable are durations ("one-hour", "two weeks"), event-anchored expressions ("two days before departure"), and sets of times ("every week"). The degree to which these expressions can be normalized given the current TIMEX2 guidelines varies according to the type and specificity of the expression.
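
To make these categories concrete, here is a minimal Python sketch of how a few of the expressions might be rendered as inline TIMEX2 tags. The helper name timex2, the reference date, and the specific VAL strings are illustrative assumptions based on ISO 8601-style values; the annotation guidelines remain the authority on the correct value for any given expression.

```python
from datetime import date, timedelta


def timex2(text, **attrs):
    """Wrap an expression in an inline TIMEX2 tag with the given attributes."""
    rendered = "".join(f' {name}="{value}"' for name, value in attrs.items())
    return f"<TIMEX2{rendered}>{text}</TIMEX2>"


# Hypothetical document date, needed to resolve relative expressions.
reference_date = date(2004, 4, 12)

examples = [
    # Absolute expression: the value is a full calendar date.
    timex2("July 17, 1999", VAL="1999-07-17"),
    # Relative expression: resolved against the assumed reference date.
    timex2("yesterday", VAL=(reference_date - timedelta(days=1)).isoformat()),
    # Durations: assumed ISO 8601-style period values.
    timex2("two weeks", VAL="P2W"),
    timex2("one-hour", VAL="PT1H"),
]

for line in examples:
    print(line)
```

Relative expressions are the reason a reference date is needed at all: the same word "yesterday" normalizes to a different value in every document.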

Data

Annotated training and test data are being prepared by the MITRE Corporation, under the supervision of the SPAWAR Systems Center. The text corpora to be used for evaluation are drawn from those selected for the basic ACE 2004 evaluation tasks (Entity Detection and Tracking, Relation Detection and Characterization). Both training and test sets are drawn from broadcast news and news wire sources. The Linguistic Data Consortium (LDC) is managing the distribution of the training materials to TERN participants.

Evaluation

Scores will be reported in terms of precision, recall, and F-measure, as well as in error-based terms of undergeneration, overgeneration, substitution and overall error. The National Institute of Standards and Technology (NIST) is responsible for administering the TERN evaluation, using scoring software prepared by the MITRE Corporation.
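
The exact formulas used by the scoring software are not given on this page. As a rough sketch, the measures named above can be computed from counts of correct, incorrect, missing, and spurious items, assuming the conventional MUC-style definitions. The class and function names below are hypothetical and do not come from the TERN scorer.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    correct: int
    incorrect: int
    missing: int
    spurious: int


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def score(c):
    """Compute summary measures, assuming MUC-style definitions."""
    possible = c.correct + c.incorrect + c.missing   # items in the reference
    actual = c.correct + c.incorrect + c.spurious    # items in the system output
    precision = safe_div(c.correct, actual)
    recall = safe_div(c.correct, possible)
    return {
        "precision": precision,
        "recall": recall,
        # Balanced F-measure (harmonic mean of precision and recall).
        "f_measure": safe_div(2 * precision * recall, precision + recall),
        "undergeneration": safe_div(c.missing, possible),
        "overgeneration": safe_div(c.spurious, actual),
        "substitution": safe_div(c.incorrect, c.correct + c.incorrect),
        "error": safe_div(
            c.incorrect + c.spurious + c.missing,
            c.correct + c.incorrect + c.spurious + c.missing,
        ),
    }


example = score(Counts(correct=8, incorrect=1, missing=1, spurious=2))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```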

Three aspects of TIMEX2 performance will be measured:

  • Detection (correct/missing/spurious): whether a markable expression is detected
    and given a TIMEX2 tag
  • Text (correct/incorrect): the byte offsets of the markable expression (extent)
  • Attributes (correct/incorrect/missing/spurious): the values assigned to each of the
    attributes (VAL, MOD, ANCHOR_DIR, ANCHOR_VAL, SET) within the TIMEX2 tag
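
The three aspects above can be illustrated with a small Python sketch that compares reference tags against system tags. The matching rule used here (a system tag is paired with the first unmatched reference tag it overlaps) is an assumption made for illustration and is not necessarily how score_timex2.pl aligns tags; the Tag class and compare function are likewise hypothetical.

```python
from dataclasses import dataclass, field

ATTRIBUTES = ("VAL", "MOD", "ANCHOR_DIR", "ANCHOR_VAL", "SET")


@dataclass
class Tag:
    start: int                     # offset of the first character of the extent
    end: int                       # offset just past the last character
    attrs: dict = field(default_factory=dict)


def overlaps(a, b):
    return a.start < b.end and b.start < a.end


def compare(reference, system):
    results = {
        "detection": {"correct": 0, "missing": 0, "spurious": 0},
        "text": {"correct": 0, "incorrect": 0},
        "attributes": {"correct": 0, "incorrect": 0, "missing": 0, "spurious": 0},
    }
    matched = set()
    for ref in reference:
        # Assumed alignment rule: first unmatched system tag that overlaps.
        idx = next(
            (i for i, s in enumerate(system)
             if i not in matched and overlaps(ref, s)),
            None,
        )
        if idx is None:
            results["detection"]["missing"] += 1
            continue
        matched.add(idx)
        sys_tag = system[idx]
        results["detection"]["correct"] += 1
        # Text: the extent must match exactly to count as correct.
        same_extent = (ref.start, ref.end) == (sys_tag.start, sys_tag.end)
        results["text"]["correct" if same_extent else "incorrect"] += 1
        # Attributes: compare each attribute value on the aligned pair.
        for name in ATTRIBUTES:
            r, s = ref.attrs.get(name), sys_tag.attrs.get(name)
            if r is None and s is None:
                continue
            if s is None:
                outcome = "missing"
            elif r is None:
                outcome = "spurious"
            else:
                outcome = "correct" if r == s else "incorrect"
            results["attributes"][outcome] += 1
    results["detection"]["spurious"] = len(system) - len(matched)
    return results


reference = [Tag(0, 13, {"VAL": "1999-07-17"}), Tag(20, 29, {"VAL": "2004-04-11"})]
system = [Tag(0, 13, {"VAL": "1999-07-17"}), Tag(40, 45, {"VAL": "P2W"})]
result = compare(reference, system)
print(result)
```

A site that skips normalization would simply ignore the "attributes" counts in such a comparison.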

Sites that are not prepared to undertake the Attributes (normalization) portion of the task may elect to be evaluated only on the Detection and Text aspects.

Schedule for 2004 Evaluation

Now: Interested sites may obtain evaluation information from this web site and http://www.nist.gov/speech/tests/ace/ace04/index.htm
April 12 - July 1: Increments of training data released by LDC to participants (see below).
June 30: Last day to register as evaluation participant (see below).
August 2 - 13: Evaluation text corpus (English/Chinese) available to participants. Participants must return results to NIST within 24 hours of receipt of the evaluation corpus.
August 13: Last day for participants to submit official results to NIST.
September (date TBD): Evaluation scores released by NIST to participants.
September 23: One-day meeting in conjunction with ACE workshop.

 

 

TERN Resources

Training Data

Training data were available to participants during the evaluation. The final release, TERN 2004 Training Data V1.3, contains resources for English and Chinese:

English: 862 files (306K words) annotated with TIMEX2 tags, drawn from LDC's TDT2, TDT4, Arabic Treebank (English translations), and Chinese Treebank (English translations) corpora.

Chinese: 503 files (158K words) annotated with TIMEX2 tags, drawn from LDC's TDT4 and Chinese Treebank corpora.

Annotation Guidelines

The English annotation guidelines. Last updated April 12, 2004

Chinese Supplement. Last updated June 04, 2004

Scoring Software

score_timex2.pl. This Perl script compares TIMEX2 tags from two input files, evaluates each on a tag-by-tag basis, and produces summary metrics. See the documentation within the .pl file for further information. Last updated August 23, 2004.

score_text_timex2.pl. This version of the scorer scores only the portion of the document between <TEXT>...</TEXT> tags. Last updated December 3, 2004.
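
The idea of restricting scoring to the <TEXT>...</TEXT> region can be sketched in Python as follows. This is not a port of the Perl scripts: the regular expressions, the function name, and the DOC and DATETIME elements in the sample are assumptions for illustration, and the simple pattern does not handle nested TIMEX2 tags.

```python
import re

TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
# Non-greedy match; nested TIMEX2 tags are not handled by this sketch.
TIMEX2_RE = re.compile(r"<TIMEX2\b([^>]*)>(.*?)</TIMEX2>", re.DOTALL | re.IGNORECASE)
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def timex2_in_text(document):
    """Return (expression, attributes) pairs found inside <TEXT> regions only."""
    tags = []
    for body in TEXT_RE.findall(document):
        for match in TIMEX2_RE.finditer(body):
            attrs = {k.upper(): v for k, v in ATTR_RE.findall(match.group(1))}
            tags.append((match.group(2), attrs))
    return tags


# Hypothetical document: one tag in the header, one in the body text.
sample = (
    '<DOC><DATETIME><TIMEX2 VAL="2004-12-03">December 3, 2004</TIMEX2></DATETIME>'
    '<TEXT>The workshop was held <TIMEX2 VAL="2004-09-23">September 23, 2004'
    '</TIMEX2>.</TEXT></DOC>'
)

found = timex2_in_text(sample)
print(found)
```

Only the tag inside the body text is returned; the one in the header element is skipped.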

Annotation Software

The annotation software used to create the TERN corpus is Callisto, the configurable annotation workbench. This Java-based software runs on Windows and Unix/Linux. It is available at no cost from callisto.mitre.org. Participants in the TERN evaluation who have received the training corpus can view the annotated .sgml files by importing the files into Callisto. (Go to the File menu, then select Import.)

Evaluation Plan

The TERN 2004 Evaluation Plan provides an overview of the data, task definition, and scoring metrics, and contains procedures for submitting results. Last updated April 30, 2004.

Contact Us

Questions about the TERN Evaluation should be directed to Lisa Ferro:

Last updated <TIMEX2 val="2004-12-03">December 3, 2004</TIMEX2>.
