Truecasing

Truecasing is the problem in natural language processing (NLP) of determining the proper capitalization of words where such information is unavailable. The problem commonly arises because, in English and many other languages, the first word of a sentence is capitalized automatically, which hides whether that word would be capitalized elsewhere. It also arises in badly cased or uncased text (for example, all-lowercase or all-uppercase text messages). Truecasing aids many other NLP tasks, such as named entity recognition, machine translation and Automatic Content Extraction.^[1]
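As an illustration of the task, the following is a minimal sketch of a frequency-based baseline: each word is restored to the casing it most often has in a reference corpus. It is not the method of the cited paper. The function names `train_truecaser` and `truecase`, the toy corpus, the whitespace tokenization and the lowercase fallback for unseen words are all assumptions made for this example.

```python
from collections import Counter, defaultdict

def train_truecaser(sentences):
    """Count the surface forms seen for each lowercased token.

    The first token of every sentence is skipped, because its capital
    letter may come from sentence position rather than from the word.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        for token in sentence.split()[1:]:
            counts[token.lower()][token] += 1
    return counts

def truecase(text, counts):
    """Restore each token to its most frequent observed casing.

    Tokens never seen in training fall back to lowercase (an assumption).
    """
    restored = []
    for token in text.split():
        seen = counts.get(token.lower())
        restored.append(seen.most_common(1)[0][0] if seen else token.lower())
    return " ".join(restored)

# Toy corpus, invented for this example.
corpus = [
    "Yesterday we flew to Paris with Anna",
    "Then Anna met us in Paris",
    "Later the train left Paris for Berlin",
]
model = train_truecaser(corpus)
print(truecase("ANNA WENT TO PARIS AND BERLIN", model))
# prints: Anna went to Paris and Berlin
```

This baseline ignores context, so it cannot separate a word that is a common noun in one sentence and a proper noun in another, and it does not re-capitalize the first word of a sentence or handle punctuation attached to tokens.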

Truecasing is unnecessary in languages whose scripts do not distinguish between uppercase and lowercase letters. This includes most languages not written in the Latin, Greek, Cyrillic or Armenian alphabets, such as Japanese, Chinese, Thai, Hebrew, Arabic, Hindi and Georgian.
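One rough way to check whether a piece of text contains any letters with a case distinction is to compare each character's lowercase and uppercase mappings. The sketch below uses Python's built-in `str.lower` and `str.upper`; the helper name `has_case_distinction` is an assumption for this example, and the result reflects the Unicode case mappings shipped with the Python build, which may differ from how a language is written in practice.

```python
def has_case_distinction(text):
    """Return True if any character maps differently under lower() and upper()."""
    return any(ch.lower() != ch.upper() for ch in text)

# Scripts with upper and lower case: truecasing can apply.
print(has_case_distinction("Truecasing"))   # Latin
print(has_case_distinction("Привет"))       # Cyrillic

# Scripts without case: there is nothing to restore.
print(has_case_distinction("こんにちは"))     # Japanese
print(has_case_distinction("שלום"))         # Hebrew
```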

References

[1] Lita, L. V.; Ittycheriah, A.; Roukos, S.; Kambhatla, N. (2003). "tRuEcasIng". Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. Sapporo, Japan. pp. 152–159.

This article is issued from Wikipedia - version of the 2/8/2016. The text is available under the Creative Commons Attribution/Share Alike but additional terms may apply for the media files.