SemEval-2007

Task #10: English Lexical Substitution

Organized by:
- Diana McCarthy, University of Sussex, UK
- Roberto Navigli, University of Rome "La Sapienza", Italy
Issues with the trial data
1. We are aware that there was an error in the DTD released with the trial dataset. The lines
  "<!ATTLIST instance
  id ID #REQUIRED>"
  should be replaced with
  "<!ATTLIST instance id CDATA #REQUIRED>"
  because our IDs are numeric. We will fix this in the test run version.
2. We are aware that there are XML errors in the input file (lexsub_trial.xml). For example, XML entities representing extended characters are sometimes corrupted. This particularly happens with quotes and apostrophes, and also with non-English characters. This is because we have left the original corpus data (http://corpus.leeds.ac.uk/internet.html) as it was, without any further cleaning of the sentences. This is how the annotators will have received it. If you wish, you can fix some of the XML issues with the Perl script available here, a simple tool for participants to use on the data. Thanks to Richard Wicentowski for drawing this to our attention and suggesting the fix.
3. Also thanks to Richard Wicentowski, we realise there are a few spelling mistakes (optimisitc, ascertail, ininformed, assesment, vicioous) in the following lines of the file "gold.trial":-
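
For issue 1 above, participants who want to validate the trial data before the corrected DTD is released could apply the change themselves. An attribute of type ID must be a valid XML name, and a name cannot begin with a digit, so numeric values fail validation; CDATA has no such restriction. The following is only a minimal sketch: the function name and the regular expression are illustrative assumptions and are not part of the task distribution.

```python
import re

# Matches the faulty declaration quoted in issue 1, tolerating any
# whitespace or line breaks between its tokens.
FAULTY_ATTLIST = re.compile(r"<!ATTLIST\s+instance\s+id\s+ID\s+#REQUIRED\s*>")

# The corrected declaration given in issue 1.
FIXED_ATTLIST = "<!ATTLIST instance id CDATA #REQUIRED>"


def fix_instance_id_decl(dtd_text: str) -> str:
    """Return dtd_text with the instance id attribute declared as CDATA.

    Hypothetical helper: text that does not contain the faulty
    declaration is returned unchanged.
    """
    return FAULTY_ATTLIST.sub(FIXED_ATTLIST, dtd_text)


if __name__ == "__main__":
    # The two faulty lines as quoted in issue 1.
    faulty = "<!ATTLIST instance\n  id ID #REQUIRED>"
    print(fix_instance_id_decl(faulty))
```

To patch a local copy, read the DTD file into a string, pass it through the function, and write the result back. Running the function on an already corrected DTD leaves it unchanged.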
Contact
If you have questions, please write to lexsub at sussex dot ac dot uk

For more information, visit the SemEval-2007 home page.