Rosette Base Linguistics for English

Comprehensive morphological analysis of English text

The English language presents a number of challenges for automated analysis. For example, its large vocabulary contains a significant number of borrowed and recently invented words. Verb-particle constructions (e.g. “pick up”) add further difficulty: the two parts may be separated by other words (“John picked the newspaper up”) but must still be processed as a single dictionary entry. In addition, auxiliary verbs such as “be,” “have,” and “do” act as stop words in some contexts and as main verbs in others, adding another layer of complexity to the analysis process.
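The separated verb-particle case can be sketched in a few lines of Python. This is a toy illustration only, not Rosette's method or API: the tables PHRASAL_VERBS and VERB_LEMMAS, the MAX_GAP limit, and the function find_phrasal_verbs are all hypothetical names invented for this example.

```python
# Toy sketch: match a verb with a particle that may be separated by other words.
# All names and tables here are hypothetical, not part of any Rosette API.

PHRASAL_VERBS = {("pick", "up"): "pick up"}          # (verb lemma, particle) -> dictionary entry
VERB_LEMMAS = {"pick": "pick", "picks": "pick", "picked": "pick"}
MAX_GAP = 3                                          # assumed limit on intervening words


def find_phrasal_verbs(tokens):
    """Return (entry, verb_index, particle_index) for each match found."""
    found = []
    for i, tok in enumerate(tokens):
        verb = VERB_LEMMAS.get(tok.lower())
        if verb is None:
            continue
        # Look at the next token and up to MAX_GAP tokens beyond it.
        for j in range(i + 1, min(i + 2 + MAX_GAP, len(tokens))):
            entry = PHRASAL_VERBS.get((verb, tokens[j].lower()))
            if entry:
                found.append((entry, i, j))
                break
    return found


print(find_phrasal_verbs("John picked the newspaper up".split()))
```

A real analyzer would rely on part-of-speech and syntactic information rather than a fixed window, since a nearby “up” is not always the verb's particle.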

Basis Technology’s Rosette® Base Linguistics for English is a portable, high-performance linguistic analysis engine that provides sophisticated morphological analysis of English, including lemmatization, part-of-speech analysis, and base noun phrase extraction. It is designed for integration into any application that needs to accurately analyze large volumes of unstructured English text.

Features

  • Lemmatization
    Reduces each word to its lemma or “dictionary” form by removing affixes and mapping irregular forms, for example returning “walk” for “walked”, “go” for “went”, and “they” for “their”.
  • Part of Speech Analysis
    Accurately identifies parts of speech such as nouns, proper nouns, verbs, and adjectives.
  • Base Noun Phrase Extraction
    Extracts complete phrases that include the head noun and any associated modifiers.
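
To make the lemmatization examples above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the Rosette API or its algorithm: the IRREGULAR table, the SUFFIX_RULES list, and the lemmatize function are hypothetical, and a production lemmatizer would use a full dictionary and part-of-speech context.

```python
# Toy lemmatizer: exception table first, then simple suffix stripping.
# Hypothetical names throughout; this does not reflect Rosette's implementation.

IRREGULAR = {"went": "go", "their": "they"}   # irregular forms mapped directly
SUFFIX_RULES = ["ed", "ing", "s"]             # suffixes to strip, tried in order
MIN_STEM = 3                                  # assumed minimum stem length after stripping


def lemmatize(word: str) -> str:
    token = word.lower()
    if token in IRREGULAR:
        return IRREGULAR[token]
    for suffix in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= MIN_STEM:
            return token[: len(token) - len(suffix)]
    return token


for w in ["walked", "went", "their"]:
    print(w, "->", lemmatize(w))
```

The lookup table handles forms such as “went” that no suffix rule can recover, which is why affix removal alone is not enough for English.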