Web Content Summarization
Extractor is a web content summarization utility incorporating patented technology
to summarize text, e-mail, and
HTML content, such as search engine results, into weighted lists of keywords and keyphrases.
Uniquely positioned for web services, Extractor is immediately capable
of consuming documents of any length and subject matter, distilling the
precise, contextual meaning of any content into keyword and keyphrase
summary formats. Extractor's unique patented technology delivers precise content summaries in
any subject domain without retraining and without human intervention.
Contextual
A unique feature of the patented Extractor technology is the ability
to summarize content by showing how keywords and keyphrases are used in the context of a document. The resulting summary provides an
unparalleled level of subject relevance. This feature of
Extractor allows, for instance, an analytical comparison of one document
against another, or against a collection of documents, to reveal similar or
dissimilar characteristics. It is well suited to portal content aggregation,
document indexing, keyword linking, and semantic-based information systems.
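One simple way to compare two documents through their weighted keyphrase lists is cosine similarity. The sketch below is only an illustration of that general idea and is not a description of how Extractor itself compares documents; the function name `keyphrase_similarity` and the sample weights are hypothetical.

```python
import math

def keyphrase_similarity(a, b):
    """Cosine similarity between two {keyphrase: weight} dictionaries.

    Hypothetical helper: returns 1.0 for identical weight profiles and
    0.0 when the two documents share no keyphrases.
    """
    dot = sum(weight * b.get(phrase, 0.0) for phrase, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty summary is treated as similar to nothing
    return dot / (norm_a * norm_b)

# Made-up weighted summaries for two documents.
doc_a = {"keyphrase extraction": 0.9, "metadata": 0.4}
doc_b = {"keyphrase extraction": 0.7, "search engine": 0.5}
score = keyphrase_similarity(doc_a, doc_b)
```

A score near 1 suggests that the two documents emphasize the same keyphrases, while a score near 0 suggests little topical overlap.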
Relevant Information
By design, Extractor is an objective provider of content summaries, in
contrast to traditional, human-influenced, subjective summary approaches.
Statistically proven, Extractor is 85% to 93% accurate regardless of subject domain.
The ability to quickly discern relevant and meaningful information is
the cornerstone of the Extractor technology.
Definition of Keyphrase Extraction
Many journals ask their
authors to provide a list of key words for their articles. We call these
keyphrases, rather than key words, because they are often phrases
of two or more words, rather than single words. We define a keyphrase
list as a short list of phrases (typically five to fifteen phrases)
that capture the main topics discussed in a given document. We define
automatic keyphrase extraction as the automatic selection of
important, topical phrases from within the body of a document. Automatic
keyphrase extraction is a special case of the more general task of
automatic keyphrase generation, in which the generated phrases do
not necessarily appear in the body of the given document.
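As a rough illustration of the definition above, the following sketch selects frequent word sequences from within the body of a text. It is a naive frequency baseline under stated assumptions, not the patented Extractor algorithm: the stopword list, the function name `extract_keyphrases`, and its parameters are all hypothetical.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, far from exhaustive).
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "that", "the", "to", "we", "with"}

def extract_keyphrases(text, max_words=3, top_n=5):
    """Return up to top_n (phrase, count) pairs, most frequent first."""
    counts = Counter()
    # Split at punctuation and digits so phrases never span a sentence break.
    for segment in re.split(r"[^a-z\s]+", text.lower()):
        words = segment.split()
        for n in range(1, max_words + 1):
            for i in range(len(words) - n + 1):
                gram = words[i:i + n]
                # Candidates may not start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

sample = "Keyphrase extraction selects phrases. Keyphrase extraction is automatic."
phrases = extract_keyphrases(sample, top_n=50)
```

Because every candidate is taken from a run of adjacent words in the text, this is extraction rather than generation in the sense defined above. A practical system would also need to weight candidates by more than raw frequency.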
Keyphrases for Metadata
Many researchers
believe that metadata is essential to address the problems of document
management. Metadata is meta-information about a document or set of
documents. There are several standards for document metadata, including
the Dublin Core Metadata Element Set (championed by the US Online
Computer Library Center), the MARC (Machine-Readable Cataloging) format
(maintained by the US Library of Congress), the GILS (Government
Information Locator Service) standard (from the US Office of Social and
Economic Data Analysis), and the CSDGM (Content Standards for Digital
Geospatial Metadata) standard (from the US Federal Geographic Data
Committee). All of these standards include a field for keyphrases
(although they have different names for this field).
Keyphrases for Highlighting
When we skim a document, we scan for keyphrases, to quickly determine
the topic of the document. Highlighting is the practice of emphasizing
keyphrases and key passages (e.g., sentences or paragraphs) by
underlining the key text, using a special font, or marking the key text
with a special colour. The purpose of highlighting is to facilitate
skimming. Automatic keyphrase extraction can be used for highlighting
and also to enable text-to-speech software to provide audio skimming
capability.
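Highlighting can be sketched as wrapping each keyphrase occurrence in a pair of markers. The example below assumes the keyphrases have already been extracted; the function name `highlight` and the `**` markers are hypothetical choices, and a real page would more likely emit font or colour markup.

```python
import re

def highlight(text, keyphrases, open_mark="**", close_mark="**"):
    """Wrap every whole-word occurrence of a keyphrase in markers."""
    # Try longer phrases first so "keyphrase extraction" wins over "keyphrase".
    ordered = sorted((p for p in keyphrases if p), key=len, reverse=True)
    if not ordered:
        return text
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    # Keep the original casing of the matched text.
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)

marked = highlight(
    "Automatic keyphrase extraction helps skimming.",
    ["keyphrase extraction", "skimming"],
)
```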
Keyphrases for Indexing
An alphabetical list of keyphrases, taken from a collection of documents
or from parts of a single long document (chapters in a book), can serve
as an index.
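A minimal sketch of such an index, assuming each document or chapter already has a list of keyphrases, might look like this. The function name `build_index` and the chapter labels are hypothetical.

```python
from collections import defaultdict

def build_index(doc_keyphrases):
    """Map {document: [keyphrases]} to an alphabetical list of
    (keyphrase, [documents]) entries."""
    index = defaultdict(set)
    for doc, phrases in doc_keyphrases.items():
        for phrase in phrases:
            # Fold case so "Metadata" and "metadata" share one entry.
            index[phrase.lower()].add(doc)
    return [(phrase, sorted(index[phrase])) for phrase in sorted(index)]

book_index = build_index({
    "chapter 1": ["Metadata", "indexing"],
    "chapter 2": ["metadata", "highlighting"],
})
```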
Keyphrases for Interactive Query Refinement
Using a search engine is often an iterative process. The user enters a
query, examines the resulting hit list, modifies the query, then tries
again. Most search engines do not have any special features that support
the iterative aspect of searching. One approach to interactive query
refinement is to take the user's query, fetch the first round of
documents, extract keyphrases from them, and then display the first
round of documents to the user, along with suggested refinements to the
first query, based on combinations of the first query with the extracted
keyphrases.
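The refinement step described above could be sketched as follows, assuming the first round of documents has already been fetched and summarized into keyphrase lists. The function name `suggest_refinements` and the ranking rule (number of documents in which a keyphrase appears) are assumptions made for illustration.

```python
from collections import Counter

def suggest_refinements(query, doc_keyphrases, max_suggestions=3):
    """Combine the query with keyphrases that recur across the first-round hits.

    doc_keyphrases holds one list of keyphrases per retrieved document.
    """
    query_terms = set(query.lower().split())
    doc_counts = Counter()
    for phrases in doc_keyphrases:
        # Count each keyphrase at most once per document.
        for phrase in {p.lower() for p in phrases}:
            # Skip phrases that add nothing to the original query.
            if set(phrase.split()) <= query_terms:
                continue
            doc_counts[phrase] += 1
    ranked = sorted(doc_counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{query} {phrase}" for phrase, _ in ranked[:max_suggestions]]

suggestions = suggest_refinements(
    "jaguar",
    [["jaguar", "big cats", "rainforest"],
     ["big cats", "sports car"],
     ["Big Cats"]],
    max_suggestions=2,
)
```

The suggestions would be displayed alongside the first round of hits, so the user can pick a narrower query instead of retyping one.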
Keyphrases for Web Log Analysis
Web site managers often want to know what visitors to their site are
seeking. Most web servers have log files that record information about
visitors, including the Internet address of the client machine, the file
that was requested by the client, and the date and time of the request.
There are several commercial products that analyze these logs for web
site managers. Typically these tools will give a summary of general
traffic patterns and produce an ordered list of the most popular files
on the web site. A web log analysis program can use keyphrases to
provide a deeper view of traffic. Instead of producing an ordered list
of the most popular files on the web site, a log analysis tool can
produce a list of the most popular keyphrases on the site. This can give
web site managers insight into which topics on their web site are most
popular.
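The idea can be sketched by counting keyphrases instead of files. The example assumes log lines in a Common Log Format style, where the request appears in quotes, and a precomputed mapping from each file to its keyphrases; the function name `popular_keyphrases`, the sample paths, and the sample log lines are hypothetical.

```python
import re
from collections import Counter

# Pulls the requested path out of a quoted request such as "GET /page.html HTTP/1.0".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)')

def popular_keyphrases(log_lines, page_keyphrases, top_n=10):
    """Return (keyphrase, hits) pairs, most requested first."""
    counts = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue  # not a request line this sketch understands
        # Credit every keyphrase of the requested file with one hit.
        for phrase in page_keyphrases.get(match.group(1), []):
            counts[phrase] += 1
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

sample_log = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /pricing.html HTTP/1.0" 200 2326',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /faq.html HTTP/1.0" 200 1045',
    '192.0.2.3 - - [10/Oct/2000:13:57:12 -0700] "GET /pricing.html HTTP/1.0" 200 2326',
]
page_phrases = {
    "/pricing.html": ["licensing", "pricing"],
    "/faq.html": ["licensing", "installation"],
}
ranking = popular_keyphrases(sample_log, page_phrases)
```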
Workforce Optimization
Delegating responsibility is a Management 101 lesson. Providing the tools to
appropriately empower a workforce for their delegated responsibilities
has changed dramatically since Management 101 was written. Relevant
information is a critical tool for the success of any business, and
providing relevant information in exact context is what gives an
organization the ultimate competitive advantage. Rather than requiring users to work
through the normal, time-consuming, iterative search engine process,
Extractor presents corporate information in a relevant and meaningful
form, in direct relation to the changing needs of today's
dynamic workforce.
Keyphrase definitions and examples are
courtesy of Dr. Peter Turney