ISLRN

MultiTACRED

Full Official Name: MultiTACRED

Submission date: Oct. 15, 2024, 10:01 p.m.

MultiTACRED was developed by the Speech and Language Technology Lab of the German Research Center for Artificial Intelligence (DFKI). It is a machine translation of the TAC Relation Extraction Dataset (TACRED, LDC2018T24) into twelve languages, with projected entity annotations.

TACRED is a large-scale relation extraction dataset of 106,264 examples built over English newswire and web text used in the NIST TAC KBP English slot filling evaluations from 2009 to 2014. The training and evaluation data for the TAC KBP slot filling tasks were developed by the Linguistic Data Consortium.

The TACRED training, development, and test splits were translated into Arabic, Chinese, Finnish, French, German, Hindi, Hungarian, Japanese, Polish, Russian, Spanish, and Turkish using DeepL or Google Translate. The test split was also back-translated into English to produce machine-translated English test data.

TACRED annotations are specified as token offsets. For translation, the tokens of each example were joined with whitespace, and the entity offsets were converted into XML-style markers delimiting the relation arguments. The data is provided in JSON format, encoded in UTF-8.
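The offset-to-marker conversion described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the DFKI pipeline: it assumes TACRED-style JSON fields ("token", plus inclusive "subj_start", "subj_end", "obj_start", "obj_end" token indices) with non-overlapping subject and object spans, and the marker tags <H>, </H>, <T>, </T> are hypothetical placeholders. The second function shows how offsets could be recovered from marked text after translation, using a naive whitespace split.

```python
# Hypothetical marker tags; the tags actually used in MultiTACRED may differ.
HEAD_OPEN, HEAD_CLOSE = "<H>", "</H>"
TAIL_OPEN, TAIL_CLOSE = "<T>", "</T>"


def insert_markers(example: dict) -> str:
    """Join tokens with spaces and wrap the subject/object spans in markers.

    Assumes TACRED-style fields with inclusive end indices and
    non-overlapping subject and object spans.
    """
    opens = {example["subj_start"]: HEAD_OPEN, example["obj_start"]: TAIL_OPEN}
    closes = {example["subj_end"]: HEAD_CLOSE, example["obj_end"]: TAIL_CLOSE}
    out = []
    for i, tok in enumerate(example["token"]):
        if i in opens:
            out.append(opens[i])
        out.append(tok)
        # End indices are inclusive, so the closing marker follows token i.
        if i in closes:
            out.append(closes[i])
    return " ".join(out)


def extract_spans(marked: str) -> dict:
    """Recover tokens and inclusive subject/object offsets from marked text.

    Naive whitespace split; assumes markers are space-separated. Languages
    written without spaces (e.g. Japanese, Chinese) would need a tokenizer.
    """
    tokens, spans, start = [], {}, {}
    for piece in marked.split():
        if piece in (HEAD_OPEN, TAIL_OPEN):
            start[piece] = len(tokens)
        elif piece == HEAD_CLOSE:
            spans["subj"] = (start[HEAD_OPEN], len(tokens) - 1)
        elif piece == TAIL_CLOSE:
            spans["obj"] = (start[TAIL_OPEN], len(tokens) - 1)
        else:
            tokens.append(piece)
    return {"token": tokens, **spans}


if __name__ == "__main__":
    example = {
        "token": ["Bill", "Gates", "founded", "Microsoft", "."],
        "subj_start": 0, "subj_end": 1, "obj_start": 3, "obj_end": 3,
    }
    marked = insert_markers(example)
    print(marked)  # <H> Bill Gates </H> founded <T> Microsoft </T> .
    print(extract_spans(marked))
```

Inline markers travel through a translation system together with the surrounding text, which is why this approach lets entity spans be projected into the translated sentence without a separate word-alignment step.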

Creator(s)

Leonhard Hennig

Philippe Thomas

Sebastian Möller

Distributor(s)

Linguistic Data Consortium

Rights Holder(s)

Portions © 1994-1997, 2001-2010 Agence France Presse, © 2005 Aljazeera, © 1996-1997 American Broadcasting Corporation, © 1994-2010 The Associated Press, © 1994-1997 Cable News Network, LP, LLLP, © 1997-1999, 2001, 2003-2010 Central News Agency (Taiwan), © 2005 Dubai TV, © 2005-2006 National Broadcasting Company, Inc., © 1996-1997 National Cable Satellite Corporation, © 1994-1998, 2003-2009 Los Angeles Times - Washington Post News Service, Inc., © 1994-2010 New York Times, © 1996-1997 Public Radio International, © 1994-1995 Reuters America, Inc., © 1996 The University of California, USC Radio and Marketplace, © 2010 The Washington Post Service with Bloomberg News, © 1995-2010 Xinhua News Agency, © 2024 Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, © 2018 The Board of Trustees of the Leland Stanford Junior University, © 1996-1998, 2007, 2008, 2009, 2011, 2014, 2018, 2024 Trustees of the University of Pennsylvania

Status: Accepted

ISLRN:

754-937-284-790-9

Version

1.0

Source

https://catalog.ldc.upenn.edu/LDC2024T09

Resource Type

Primary Text

Media Type

Text

Language(s)

Arabic

English

Finnish

French

German

Hindi

Hungarian

Japanese

Mandarin

Polish

Russian

Spanish

Turkish

Access Medium

Web Download