TIMEX2 TIDES Standard for the Annotation of Temporal Expressions


Corpora

The following corpora have been annotated with TIMEX2 tags.

ACE Time Normalization (TERN) 2004 English Training Data

  • Available from LDC, catalog # LDC2005T07.
  • 306K words annotated for TIMEX2.
  • See attached README for more information.

TIDES Temporal Corpus, parts 1 and 2:

  • Part 1: A parallel corpus of 95 Spanish dialogs and their English translations, with temporal annotations of all the dialogs and translations (44,081 words of raw text in all). The raw Spanish dialogs are part of the Enthusiast corpus, collected earlier at Carnegie Mellon University.
  • Part 2: 193 documents from the Linguistic Data Consortium’s English TDT-2 corpus, with temporal annotations (171,535 words of raw text). This portion of the corpus cannot be released because of copyright restrictions, but the TIMEX2 annotation will soon be available in stand-off format.

ACE-2 Corpus. Available from LDC. This corpus was annotated with TIMEX2 tags according to a slightly updated 2001 version of the standard (the version that introduced the anchoring attributes). In preparation for the ACE evaluation, the TIMEX2 tags were renamed <rel_mention_time>, and some attributes were dropped. The original TIMEX2 annotation will soon be available in stand-off format.

Remedia Corpus. This is a collection of reading comprehension tests, each consisting of a short non-fiction story, five questions about the story, and an answer key. The stories and questions were annotated with a version of the guidelines intermediate between the 2001 and 2003 standards. The annotated corpus is freely available, but proof of ownership of the hardcopy version (available from Remedia Publications for $32.99) is required before the annotations can be distributed. Contact Lisa Ferro for more information:

Korean Corpus. This corpus consists of 200 Korean news articles annotated according to the 2001 standard. It was collected as a Korean government project and then annotated at Georgetown University. The Korean government permits use of the corpus for non-commercial purposes. For more information, contact Inderjeet Mani.


Last updated June 10, 2005.
