This technical note describes Lexbe’s Auto-Language Detection+ service, which uses a Natural Language Processing (NLP) service to discover insights from text. It is offered as part of Lexbe’s AI Insights service.
The Auto-Language Detection+ service quickly and economically identifies foreign-language (non-English) documents. It provides machine identification of the language of native documents using AWS auto-language identification functionality. For best-quality auto-detection, the service requires native documents (not images or other documents requiring OCR). The service can identify the following languages:
Afrikaans, Albanian, Amharic, Arabic, Armenian, Assamese, Azerbaijani, Bashkir, Basque, Belarusian, Bengali, Bosnian, Bulgarian, Burmese, Catalan, Cebuano, Central Khmer, Chinese (Simplified), Chinese (Traditional), Chuvash, Croatian, Czech, Danish, Dutch, English, Esperanto, Estonian, Finnish, French, Galician, Georgian, German, Greek, Gujarati, Haitian, Hebrew, Hindi, Hungarian, Icelandic, Iloko, Indonesian, Irish, Italian, Japanese, Javanese, Kannada, Kazakh, Kirghiz, Korean, Kurdish, Latin, Latvian, Lithuanian, Luxembourgish, Macedonian, Malagasy, Malay, Malayalam, Marathi, Mongolian, Nepali, Newari, Norwegian, Oriya, Persian, Polish, Portuguese, Punjabi, Pushto, Quechua, Romanian, Russian, Sanskrit, Scottish Gaelic, Serbian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Spanish, Sundanese, Swahili, Swedish, Tagalog, Tajik, Tamil, Tatar, Telugu, Thai, Turkish, Turkmen, Uighur, Ukrainian, Urdu, Uzbek, Vietnamese, Welsh, Yiddish, Yoruba
Lexbe’s Auto-Language Detection+ service can be run on a case (or on an isolated document pool within a case) on request by Lexbe’s Professional Service team. Through automated processing, the dominant languages are recognized and stored in a custom document field that is available for searching and filtering.
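
For readers curious how dominant-language detection of this kind can work, the following is a minimal sketch and not Lexbe's implementation. It assumes the AWS functionality is Amazon Comprehend's DetectDominantLanguage operation (this note does not name the specific AWS service), called through a boto3-style client. The function names, the 0.5 score threshold, and the "; " field separator are hypothetical choices made for illustration.

```python
from typing import Any, List


def dominant_languages(text: str, client: Any, min_score: float = 0.5) -> List[str]:
    """Return detected language codes, highest confidence first.

    `client` is assumed to expose detect_dominant_language(Text=...), as the
    boto3 Amazon Comprehend client does. `min_score` is a hypothetical
    cutoff, not a documented Lexbe setting.
    """
    response = client.detect_dominant_language(Text=text)
    # Each entry in "Languages" has a "LanguageCode" and a confidence "Score".
    ranked = sorted(response["Languages"], key=lambda lang: lang["Score"], reverse=True)
    return [lang["LanguageCode"] for lang in ranked if lang["Score"] >= min_score]


def language_field_value(codes: List[str]) -> str:
    """Join codes into one string, as a custom document field might store them
    (the separator is an assumption)."""
    return "; ".join(codes)


def is_foreign_language(codes: List[str]) -> bool:
    """True if any detected language is something other than English ("en")."""
    return any(code != "en" for code in codes)
```

With boto3 installed and AWS credentials configured, the client would typically be created with boto3.client("comprehend") and passed in as the second argument. Passing the client in, rather than creating it inside the function, keeps the sketch testable without network access.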