Rosette Base Linguistics for English

Comprehensive morphological analysis of English text

The English language presents a number of challenges for automated analysis. Its large vocabulary contains a significant number of borrowed and recently invented words. Verb-particle constructions (e.g. “pick up”) pose a further difficulty: the two parts may be separated by other words (“John picked the newspaper up”) but must still be processed as a single dictionary entry. In addition, verbal auxiliaries such as “be,” “have,” and “do” act as stop words in some contexts and as main verbs in others, adding another layer of complexity to the analysis.

Basis Technology’s Rosette® Base Linguistics for English is a portable, high-performance linguistic analysis engine that addresses these challenges with sophisticated morphological analysis of English, including stemming and part-of-speech analysis. It is designed for integration into any application that needs to analyze large volumes of unstructured English text accurately.

Features

  • Normalization
    Separates contractions and converts tokens to lower case
  • Lemmatization and Stemming
    Removes word affixes and returns base forms, including irregular ones: for example, “walk” for “walked,” “go” for “went,” and “they” for “their”
  • Part-of-Speech Analysis
    Accurately identifies parts of speech such as nouns, proper nouns, verbs, and adjectives
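
To make the normalization and lemmatization features concrete, here is a minimal Python sketch of the general idea. It is not the Rosette API and does not reflect how the product is implemented; the function names (normalize, lemmatize) and the tiny lookup tables are hypothetical and chosen only to reproduce the examples listed above. A real engine would rely on a full lexicon and morphological rules rather than a hand-written table.

```python
import re

# Hypothetical lookup tables, for illustration only.
CONTRACTION_SUFFIXES = {"'re": "are", "'ll": "will", "'ve": "have"}
IRREGULAR_LEMMAS = {"went": "go", "their": "they"}


def normalize(text):
    """Lowercase the text and split contractions into separate tokens."""
    # Treat typographic apostrophes like plain ones before tokenizing.
    text = text.replace("\u2019", "'").lower()
    tokens = []
    for word in re.findall(r"[a-z]+(?:'[a-z]+)?", text):
        if word.endswith("n't"):
            # "didn't" -> "did", "not" (naive; mishandles "can't", "won't")
            tokens.extend([word[:-3], "not"])
        elif "'" in word:
            head, tail = word.split("'", 1)
            suffix = "'" + tail
            # Unknown suffixes (e.g. possessive 's) are kept as-is.
            tokens.extend([head, CONTRACTION_SUFFIXES.get(suffix, suffix)])
        else:
            tokens.append(word)
    return tokens


def lemmatize(token):
    """Return a base form: irregular lookup first, then a naive -ed rule."""
    if token in IRREGULAR_LEMMAS:
        return IRREGULAR_LEMMAS[token]
    if token.endswith("ed") and len(token) > 4:
        return token[:-2]
    return token


if __name__ == "__main__":
    print(normalize("They didn't walk"))
    print([lemmatize(t) for t in ["walked", "went", "their"]])
```

The irregular-form table is consulted before any suffix rule because forms such as “went” cannot be derived by stripping affixes, which is why a dictionary-backed analysis is needed in addition to simple stemming.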