Making Content Findable
Extractor is an agnostic text analytics technology that automatically,
without biased human intervention, parses content from any subject domain
(text, news, unstructured information, documents, email, web pages) into
relevant and contextually accurate key phrase highlights: in context,
accurate, and relevant.
Uniquely positioned for web services, Extractor and its xAIgent web
service can be deployed immediately to consume documents of any length and
subject matter, distilling information, news, and other textual content
into precise, contextual, personally meaningful keyword and key phrase
summaries. Extractor's patented technology delivers precise content
summaries from any subject domain automatically, without training or
human intervention.
Contextual
A unique feature of the patented Extractor technology is its ability to
summarize content by showing how keywords and key phrases are used in the
context of a document, which allows terminology and its use within the
subject to be defined accurately. The resulting summaries provide
unparalleled levels of subject relevance. In particular, they allow exact
analytical comparison of information and news items with other items that
are contextually different. Analyzing collections of documents with
contextual relevance is now possible.
Relevant Information
By design, Extractor is an objective provider of content summaries, in
contrast to traditional, human-influenced, subjective summary approaches
or approximate systems that use Bayesian and heuristic approaches.
Statistically proven, Extractor is 85% to 93% accurate across all subject domains.
The ability to quickly discern relevant and meaningful news and
information, in personal context, is the cornerstone of the Extractor
technology.
Definition of Key Phrase Extraction
Many journals ask their
authors to provide a list of key words for their articles. We call these
key phrases, rather than key words, because they are often phrases
of two or more words, rather than single words. We define a key phrase
list as a short list of phrases (typically five to fifteen phrases)
that capture the main topics discussed in a given document. We define
automatic key phrase extraction as the automatic selection of
important, topical phrases from within the body of a document. Automatic
key phrase extraction is a special case of the more general task of
automatic key phrase generation, in which the generated phrases do
not necessarily appear in the body of the given document.
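
To make the definition concrete, here is a minimal sketch of key phrase
extraction in Python. It is not Extractor's patented algorithm; it swaps in
a simple frequency heuristic, and the stop-word list, the scoring rule, and
the function names are all illustrative assumptions. The one property it
shares with the definition above is that every returned phrase is taken
from the body of the document.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real systems use larger lists).
STOP_WORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
              "is", "it", "of", "on", "or", "that", "the", "to", "with"}


def candidate_phrases(text, max_words=3):
    """Yield every 1..max_words-word phrase found inside runs of non-stop-words."""
    # Split at punctuation so candidates never cross a clause boundary.
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        run = []
        # The trailing None flushes the final run of each fragment.
        for word in fragment.split() + [None]:
            if word is not None and word not in STOP_WORDS:
                run.append(word)
                continue
            for n in range(1, max_words + 1):
                for i in range(len(run) - n + 1):
                    yield " ".join(run[i:i + n])
            run = []


def extract_key_phrases(text, top_k=5):
    """Return the top_k candidates, scored by frequency times phrase length."""
    counts = Counter(candidate_phrases(text))
    ranked = sorted(counts.items(),
                    key=lambda kv: (-kv[1] * len(kv[0].split()), kv[0]))
    return [phrase for phrase, _ in ranked[:top_k]]
```

Multiplying the count by the number of words is only a crude way to favour
multi-word phrases over single words, in line with the observation that key
phrases are often two or more words long.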
Key Phrases for Metadata
Many researchers
believe that metadata is essential to address the problems of document
management. Metadata is meta-information about a document or set of
documents. There are several standards for document metadata, including
the Dublin Core Metadata Element Set (championed by the US Online
Computer Library Center), the MARC (Machine-Readable Cataloging) format
(maintained by the US Library of Congress), the GILS (Government
Information Locator Service) standard (from the US Office of Social and
Economic Data Analysis), and the CSDGM (Content Standards for Digital
Geospatial Metadata) standard (from the US Federal Geographic Data
Committee). All of these standards include a field for key phrases
(although they have different names for this field).
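
As an illustration, extracted key phrases could be written into a metadata
field. The sketch below assumes the common convention of embedding Dublin
Core in HTML meta elements, where the element that holds key phrases is
named "subject"; the function name is hypothetical and the other standards
listed above are not covered.

```python
from html import escape


def dublin_core_subject_tags(key_phrases):
    """Render one <meta name="DC.subject"> element per key phrase."""
    return "\n".join(
        '<meta name="DC.subject" content="{}">'.format(escape(phrase, quote=True))
        for phrase in key_phrases
    )
```

Calling dublin_core_subject_tags(["text mining", "metadata"]) yields two
meta elements that can be placed in the head of an HTML page.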
Key Phrases for Highlighting
When we skim a document, we scan for key phrases, to quickly determine
the topic of the document. Highlighting is the practice of emphasizing
key phrases and key passages (e.g., sentences or paragraphs) by
underlining the key text, using a special font, or marking the key text
with a special colour. The purpose of highlighting is to facilitate
skimming. Automatic key phrase extraction can be used for highlighting
and also to enable text-to-speech software to provide audio skimming
capability.
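
A minimal sketch of automatic highlighting, assuming the key phrases have
already been extracted by some other step. The function name and the choice
of HTML mark tags as the "special colour" are assumptions made for the
example.

```python
import re


def highlight(text, key_phrases, open_tag="<mark>", close_tag="</mark>"):
    """Wrap each whole-word occurrence of a key phrase in the given tags."""
    if not key_phrases:
        return text
    # Try longer phrases first so "key phrase extraction" wins over "key phrase".
    ordered = sorted(key_phrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: open_tag + m.group(0) + close_tag, text)
```

The original capitalization of the text is preserved; only the markers are
added around each match.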
Key Phrases for Indexing
An alphabetical list of key phrases, taken from a collection of documents
or from parts of a single long document (chapters in a book), can serve
as an index.
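
Such an index could be assembled along the following lines. This is a
sketch only: it assumes key phrases are already available per document (or
per chapter), and the function name and input shape are hypothetical.

```python
from collections import defaultdict


def build_index(phrases_by_document):
    """Return an alphabetical list of (phrase, sorted document ids) pairs."""
    index = defaultdict(set)
    for doc_id, phrases in phrases_by_document.items():
        for phrase in phrases:
            # Fold case so "Metadata" and "metadata" share one entry.
            index[phrase.lower()].add(doc_id)
    return [(phrase, sorted(index[phrase])) for phrase in sorted(index)]
```

For a single long document, the document ids can simply be chapter names or
page ranges.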
Interactive Query Refinement
Using a search engine is often an iterative process. The user enters a
query, examines the resulting hit list, modifies the query, then tries
again. Most search engines do not have any special features that support
the iterative aspect of searching. One approach to interactive query
refinement is to take the user's query, fetch the first round of
documents, extract key phrases from them, and then display the first
round of documents to the user, along with suggested refinements to the
first query, based on combinations of the first query with the extracted
key phrases.
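
The refinement step described above can be sketched as follows. The search
engine and the key phrase extractor are outside the scope of the example:
`documents` stands for the first round of results and `extract` is any
callable that returns key phrases for a document. Both names, and the
ranking by number of documents, are assumptions.

```python
from collections import Counter


def suggest_refinements(query, documents, extract, max_suggestions=5):
    """Combine the query with key phrases extracted from first-round documents."""
    query_terms = set(query.lower().split())
    counts = Counter()
    for doc in documents:
        refinements = set()
        for phrase in extract(doc):
            # Keep only the words the query does not already contain.
            new_words = [w for w in phrase.lower().split() if w not in query_terms]
            if new_words:
                refinements.add(query + " " + " ".join(new_words))
        # Count each suggestion at most once per document.
        counts.update(refinements)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [refinement for refinement, _ in ranked[:max_suggestions]]
```

Suggestions backed by more first-round documents are listed first, so the
user sees the most common ways to narrow the original query.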
Key Phrases for Web Log Analysis
Web site managers often want to know what visitors to their site are
seeking. Most web servers have log files that record information about
visitors, including the Internet address of the client machine, the file
that was requested by the client, and the date and time of the request.
There are several commercial products that analyze these logs for web
site managers. Typically these tools will give a summary of general
traffic patterns and produce an ordered list of the most popular files
on the web site. A web log analysis program can use key phrases to
provide a deeper view of traffic. Instead of producing an ordered list
of the most popular files on the web site, a log analysis tool can
produce a list of the most popular key phrases on the site. This can give
web site managers insight into which topics on their web site are most
popular.
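
A rough sketch of this kind of log analysis is shown below. It assumes the
server writes the request line in quotes, as in the widely used Common Log
Format, and that key phrases for each file have been extracted in advance
into a mapping from path to phrases. The function and variable names are
hypothetical.

```python
import re
from collections import Counter

# Matches the quoted request, e.g. "GET /page.html HTTP/1.0", and captures the path.
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')


def popular_key_phrases(log_lines, phrases_by_path, top_k=10):
    """Count how often each key phrase is requested via the files that contain it."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that do not look like a request
        path = match.group(1).split("?", 1)[0]  # drop any query string
        counts.update(phrases_by_path.get(path, []))
    return counts.most_common(top_k)
```

Each request adds one count to every key phrase of the requested file, so a
topic that is spread across many moderately popular files can outrank the
key phrases of the single most popular file.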
Workforce Optimization
Relevant information is a critical tool for the success of any business
today, and providing relevant information in the right context is what
gives an organization a competitive advantage. Rather than working
through traditional, time-consuming, iterative search processes, engage
the text mining power of Extractor to give information workers relevant
and meaningful results in direct relation to their needs and those of
today's dynamic workforce.