MANAGING NOISE IN THE SIGNAL: ERROR HANDLING IN NATURAL LANGUAGE PROCESSING
The language that real-world natural language processing systems have to deal with bears little resemblance to the perfectly grammatical examples often found in linguistics textbooks. Instead, it comes to us damaged in various ways: authors introduce spelling and grammatical errors into the texts they type, speakers produce incomplete or otherwise disfluent sentences, OCR systems misrecognize the characters on the printed page, and speech recognition systems produce inaccurate hypotheses as to what was actually said.
Noisy input is a fact of life: our systems ignore it at their peril. For some applications, we require mechanisms that are robust to error; for example, a spoken language dialog system may assign a low confidence to a hypothesis and, as a consequence, ask the user to repeat the utterance. For other applications, we need to make use of error correction techniques; for example, an OCR system might use contextual post-processing to validate the spellings of words.
This special issue aims to bring together work on error handling in natural language processing from a range of different application areas. Many subfields of NLP have a need to do something about noise in the signal, but rarely do researchers from these diverse areas have an opportunity to compare their methods and techniques. Our aim is to juxtapose work from these different areas in order to encourage cross-fertilization of ideas.
We consider in scope for this special issue any paper that describes and discusses techniques for processing linguistic data that are in some regard noisy. The most developed subfields here are spelling correction and, to a lesser extent, grammar correction; neither of these is a completely solved problem, and as far as errors at the stylistic, semantic, and discourse levels are concerned, automated textual error correction has barely scratched the surface. Robust processing regimes, where the aim is to extract something useful from a broken input, are also of interest, for both speech and text input; more broadly, repair and recovery techniques in dialog systems are also relevant.
We encourage submissions on any aspect of natural language processing related to the handling of errors, including in particular:
* automatic spelling and grammar correction
* semantic and logical errors
* stylistic and discourse-level correction
* automatic correction of machine-produced texts (OCRs, speech transcripts, etc.)
* spelling correction in web search
* errors in controlled language input
* acquisition, annotation and analysis of errors in real texts
* errors in language learning
* handling performance errors
* building error corpora
* text normalization issues
* robust NLP techniques
* handling disfluent speech
* handling errors in speech recognition
* confidence measure estimation
* managing noise in training corpora
* error metrics
* error as signatures; watermarking with errors
* measuring the seriousness of errors