“Basis Technology’s linguistics software increases the ease, speed and precision of detecting critical information by assisting analysts in mining documents in their native language and identifying, disambiguating and clearly labeling the most important words and phrases. These capabilities are in demand throughout government and industry as the sheer volume of multilingual data collected continues to grow. Like industry, government faces a dramatically increasing need for automated processes to help users quickly find and act on the most important pieces of information.” - Greg Pepus
National security depends on the ability to extract critical information from documents written in foreign languages. That information must be filtered and analyzed in time to act on it.
The volumes of data are massive, the need is urgent, and the missions are complex.
These missions include:
- Open Source Intelligence (OSINT)
- Document and Media Exploitation (DOCEX)
- E-mail and instant message analysis
- Chat room and web site monitoring
- Terrorist and money laundering watch lists
Even with automated assistance, information extraction presents many challenges:
You may not know what is relevant. Until you see a specific word in context, you might not know that you should have looked for it. For example, you might not know ahead of time that specific names, organizations, locations, and dates might be relevant.
Written forms of the same name will vary. Will you recognize a name when you see it again? Names originating in Arabic can have several different spellings in the Latin alphabet. In addition, in one context "Fouad" may be the common noun "heart" or "mind," while in another it is the name of a person.
Search tools only know exact matches. In some languages, word forms and sentence structures vary widely in text. For example, Arabic words frequently include affixes, which are linguistic "subwords" that change or add meaning. An affixed word no longer matches its base form character for character, so an exact text match fails.
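The last two challenges can be illustrated with a minimal Python sketch. It is not Basis Technology's method and does not use any Rosette API; the function names, the vowel-dropping key, and the four-entry prefix list are all assumptions made for illustration, and real Arabic morphology and name matching are far more involved.

```python
import re
import unicodedata

# Hypothetical, deliberately short prefix list (longest first).
# Real Arabic affixation is much richer than these four entries.
ARABIC_PREFIXES = ("وال", "بال", "ال", "و")


def name_key(name: str) -> str:
    """Reduce a Latin-alphabet name to a crude key that ignores vowel choices."""
    # Split accented letters into base letter + combining mark, then keep
    # only the plain ASCII letters (this also drops apostrophes and hyphens).
    decomposed = unicodedata.normalize("NFKD", name)
    letters = "".join(
        ch for ch in decomposed if ch.isascii() and ch.isalpha()
    ).lower()
    if not letters:
        return ""
    # Keep the first letter, drop later vowels, then collapse doubled letters.
    key = letters[0] + re.sub(r"[aeiou]", "", letters[1:])
    return re.sub(r"(.)\1+", r"\1", key)


def strip_prefix(word: str) -> str:
    """Remove one leading prefix, but only if three or more characters remain."""
    for prefix in ARABIC_PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            return word[len(prefix):]
    return word


def contains_word(text: str, query: str) -> bool:
    """True if any whitespace-separated token equals the query after prefix stripping."""
    target = strip_prefix(query)
    return any(strip_prefix(token) == target for token in text.split())
```

With these definitions, `name_key("Fouad")`, `name_key("Fuad")`, and `name_key("Fu'ad")` all produce the same key, so the three spellings can be grouped. Likewise, `contains_word("قرأت الكتاب", "كتاب")` finds the query even though no token in the text equals it exactly, because the definite-article prefix is stripped first. The trade-off is over-matching: a key this coarse also merges unrelated names such as "Fadi" and "Fouad", and it cannot tell whether "Fouad" is a name or a common noun, which is why context-aware linguistic analysis is needed.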
Multilingual Information Extraction: A Multi-Tier Approach
Fast, accurate information extraction from high volumes of non-English text is a multi-tier problem. Basis Technology delivers a multi-tier set of advanced linguistic components whose functions can be called by your application as it needs them. Mix and match these interoperable software modules to meet your particular needs. Each is accessible at the program level via Basis-provided C++, .NET, or Java APIs and for the end user via a command line interface.
With 40+ government deployments (200+ commercial) and over a decade of experience, Basis Technology is a trusted source of multilingual information processing components for Federal government agencies.
Here’s a product overview, with problems solved:
| Product | Problem solved |
|---|---|
| Rosette Language Identifier | Identifies the language(s) in a document so that applications can properly categorize, search, process, and store its data |
| Rosette Base Linguistics | Discovers structure in unstructured text so that large scale document handling systems can identify, classify, analyze, index and search |
| Rosette Name Translator | Provides the precise, correct English version of a name |
| Rosette Name Indexer | Matches names written in English with the official name of a person or location in a foreign country |
| Rosette Entity Extractor | Locates important concepts (e.g., names, locations, dates) |

| Desktop Solutions | Problem solved |
|---|---|
| Transliteration Assistant | Automates the process of writing Arabic names according to officially mandated spellings |
| Arabic Editor | Powerful desktop environment for composition and analysis of Arabic language documents |
| GeoScope | Smart viewer for digital maps capable of searching multilingual place names |
Partial list of government customers:
- Department of Defense
- FBI
- U.S. Intelligence Community
- Commercial customers
Read about our work with the Intelligence Community
- The 451 Group - “Basis looks good to exploit changing text analysis market in any language”
- KMWorld - “Speaking in tongues: Foreign language KM Technologies”
- Economist
- EContent Magazine
- EContent Magazine
- Military Information Technology
- Military Information Technology
- Information Week
- CNET News.com
- Government Computer News
- Federal News Radio
- The Washington Post
Complete list of press coverage