Web Content Summarization

Extractor is a web content summarization utility that uses patented technology to summarize text, e-mail, and HTML content, such as search engine results, into weighted lists of keywords and keyphrases. Designed for web services, Extractor accepts documents of any length and subject matter and distills the contextual meaning of the content into keyword and keyphrase summaries. It delivers these summaries in any subject domain without retraining and without human intervention.


Contextual

A distinguishing feature of the patented Extractor technology is its ability to summarize content by showing how keywords and keyphrases are used in the context of a document, which keeps the summary closely tied to the document's subject. This makes it possible, for instance, to compare one document against another, or against a collection of documents, to find similar or dissimilar characteristics. The feature is well suited to portal content aggregation, document indexing, keyword linking, and semantic-based information systems.
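As an illustration of comparing two documents through their weighted keyphrase lists, here is a minimal sketch. It assumes each summary is available as a mapping from keyphrase to weight, and it uses cosine similarity as a stand-in measure; the function name and the choice of measure are assumptions for this example, not a description of how Extractor itself compares documents.

```python
import math

def keyphrase_similarity(summary_a, summary_b):
    """Cosine similarity between two {keyphrase: weight} summaries (hypothetical helper)."""
    shared = set(summary_a) & set(summary_b)
    dot = sum(summary_a[p] * summary_b[p] for p in shared)
    norm_a = math.sqrt(sum(w * w for w in summary_a.values()))
    norm_b = math.sqrt(sum(w * w for w in summary_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty summary is treated as similar to nothing
    return dot / (norm_a * norm_b)

doc_one = {"keyphrase extraction": 0.9, "metadata": 0.4}
doc_two = {"keyphrase extraction": 0.7, "web log analysis": 0.5}
print(round(keyphrase_similarity(doc_one, doc_two), 3))
```

A score near 1 suggests the two documents share their most heavily weighted keyphrases; a score of 0 means they share none.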


Relevant Information

By design, Extractor provides objective content summaries, in contrast to traditional, subjective summaries written by people. Statistically proven, Extractor is 85% to 93% accurate regardless of subject domain. The ability to quickly discern relevant and meaningful information is the cornerstone of the Extractor technology.


Definition of Keyphrase Extraction

Many journals ask their authors to provide a list of key words for their articles. We call these keyphrases, rather than key words, because they are often phrases of two or more words, rather than single words. We define a keyphrase list as a short list of phrases (typically five to fifteen phrases) that capture the main topics discussed in a given document. We define automatic keyphrase extraction as the automatic selection of important, topical phrases from within the body of a document. Automatic keyphrase extraction is a special case of the more general task of automatic keyphrase generation, in which the generated phrases do not necessarily appear in the body of the given document.
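To make the definition concrete, the sketch below selects phrases that occur verbatim in a document and ranks them with a naive frequency score. It is only a toy baseline: the stopword list, the scoring rule, and the function name are assumptions made for this example, and it does not represent the patented Extractor algorithm.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a much fuller one.
STOPWORDS = {"a", "an", "and", "are", "as", "by", "for", "from", "in",
             "is", "it", "of", "on", "or", "that", "the", "this", "to", "with"}

def extract_keyphrases(text, max_words=3, top_n=5):
    """Return up to top_n (phrase, score) pairs drawn from the text itself."""
    counts = Counter()
    # Split on punctuation so candidate phrases never span a sentence break.
    for segment in re.split(r"[.!?;:,\n]+", text.lower()):
        tokens = re.findall(r"[a-z0-9]+", segment)
        for n in range(1, max_words + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # A candidate may not start or end with a stopword.
                if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Score = frequency x phrase length, so repeated phrases outrank their parts.
    scored = [(p, c * len(p.split())) for p, c in counts.items()]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]

sample = ("Keyphrase extraction selects phrases. Keyphrase extraction is automatic. "
          "Automatic keyphrase extraction helps indexing.")
for phrase, score in extract_keyphrases(sample):
    print(score, phrase)
```

Because every candidate is a run of words taken from the text, this is extraction in the sense defined above, not the more general keyphrase generation.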


Keyphrases for Metadata

Many researchers believe that metadata is essential to address the problems of document management. Metadata is meta-information about a document or set of documents. There are several standards for document metadata, including the Dublin Core Metadata Element Set (championed by the US Online Computer Library Center), the MARC (Machine-Readable Cataloging) format (maintained by the US Library of Congress), the GILS (Government Information Locator Service) standard (from the US Office of Social and Economic Data Analysis), and the CSDGM (Content Standards for Digital Geospatial Metadata) standard (from the US Federal Geographic Data Committee). All of these standards include a field for keyphrases (although they have different names for this field).
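As one example of filling such a field, the sketch below writes a keyphrase list into HTML meta elements using the Dublin Core "subject" element. The helper name is hypothetical, and the "DC.subject" naming follows the common convention for embedding Dublin Core in HTML; other standards would need their own field names.

```python
from html import escape

def dublin_core_subject_tags(keyphrases):
    """Render each keyphrase as a Dublin Core subject meta element (hypothetical helper)."""
    return [
        '<meta name="DC.subject" content="{}">'.format(escape(phrase, quote=True))
        for phrase in keyphrases
    ]

for tag in dublin_core_subject_tags(["keyphrase extraction", "document management"]):
    print(tag)
```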


Keyphrases for Highlighting

When we skim a document, we scan for keyphrases, to quickly determine the topic of the document. Highlighting is the practice of emphasizing keyphrases and key passages (e.g., sentences or paragraphs) by underlining the key text, using a special font, or marking the key text with a special colour. The purpose of highlighting is to facilitate skimming. Automatic keyphrase extraction can be used for highlighting and also to enable text-to-speech software to provide audio skimming capability. 
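A minimal sketch of highlighting, assuming the keyphrases have already been extracted: each occurrence is wrapped in marker strings, which a renderer could translate into underlining, a special font, or a colour. The function name and the default markers are assumptions for this example.

```python
import re

def highlight(text, keyphrases, open_mark="[[", close_mark="]]"):
    """Wrap every whole-word occurrence of a keyphrase in the given markers."""
    if not keyphrases:
        return text
    # Try longer phrases first so "keyphrase extraction" wins over "keyphrase".
    ordered = sorted(keyphrases, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(p) for p in ordered) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda m: open_mark + m.group(0) + close_mark, text)

print(highlight("Automatic keyphrase extraction helps skimming.",
                ["keyphrase", "keyphrase extraction"]))
```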


Keyphrases for Indexing

An alphabetical list of keyphrases, taken from a collection of documents or from parts of a single long document (chapters in a book), can serve as an index.
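The idea can be sketched as follows, assuming keyphrases have already been extracted for each document or chapter. The function name and the input layout (a mapping from document identifier to keyphrase list) are assumptions for this example.

```python
from collections import defaultdict

def build_index(doc_keyphrases):
    """Return an alphabetical list of (keyphrase, [document ids]) entries."""
    index = defaultdict(set)
    for doc_id, phrases in doc_keyphrases.items():
        for phrase in phrases:
            index[phrase.lower()].add(doc_id)
    return [(phrase, sorted(index[phrase])) for phrase in sorted(index)]

chapters = {
    "chapter 1": ["Metadata", "indexing"],
    "chapter 2": ["indexing", "query refinement"],
}
for phrase, locations in build_index(chapters):
    print(phrase + ": " + ", ".join(locations))
```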


Keyphrases for Interactive Query Refinement

Using a search engine is often an iterative process. The user enters a query, examines the resulting hit list, modifies the query, then tries again. Most search engines do not have any special features that support the iterative aspect of searching. One approach to interactive query refinement is to take the user's query, fetch the first round of documents, extract keyphrases from them, and then display the first round of documents to the user, along with suggested refinements to the first query, based on combinations of the first query with the extracted keyphrases.
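The refinement step described above can be sketched as follows. The sketch assumes the first round of documents has already been fetched and that a keyphrase list is available for each one; the function name, the ranking by number of documents, and the way suggestions are formed (query plus keyphrase) are assumptions for this example.

```python
from collections import Counter

def suggest_refinements(query, keyphrases_per_doc, max_suggestions=5):
    """Combine the query with keyphrases that recur across the first-round hits."""
    query_terms = set(query.lower().split())
    doc_counts = Counter()
    for phrases in keyphrases_per_doc:
        # Count each keyphrase once per document.
        for phrase in {p.lower() for p in phrases}:
            doc_counts[phrase] += 1
    suggestions = []
    for phrase, _ in sorted(doc_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if set(phrase.split()) <= query_terms:
            continue  # the phrase adds nothing the query does not already say
        suggestions.append(query + " " + phrase)
        if len(suggestions) == max_suggestions:
            break
    return suggestions

first_round = [["jaguar", "big cat"], ["big cat", "sports car"], ["sports car", "big cat"]]
print(suggest_refinements("jaguar", first_round, max_suggestions=2))
```

The suggestions would be displayed next to the first-round hit list so the user can pick one instead of rewriting the query by hand.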


Keyphrases for Web Log Analysis

Web site managers often want to know what visitors to their site are seeking. Most web servers have log files that record information about visitors, including the Internet address of the client machine, the file that was requested by the client, and the date and time of the request. There are several commercial products that analyze these logs for web site managers. Typically these tools will give a summary of general traffic patterns and produce an ordered list of the most popular files on the web site. A web log analysis program can use keyphrases to provide a deeper view of traffic. Instead of producing an ordered list of the most popular files on the web site, a log analysis tool can produce a list of the most popular keyphrases on the site. This can give web site managers insight into which topics on their web site are most popular.
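A minimal sketch of keyphrase-based log analysis, assuming the server writes entries in the Common Log Format and that keyphrases have already been extracted for each page. The function name and the page-to-keyphrase mapping are assumptions for this example.

```python
import re
from collections import Counter

# Pulls the requested path out of a quoted request such as "GET /a.html HTTP/1.0".
REQUEST_RE = re.compile(r'"(?:GET|HEAD|POST) (\S+)[^"]*"')

def popular_keyphrases(log_lines, page_keyphrases, top_n=10):
    """Rank keyphrases by how many requests hit pages that carry them."""
    hits = Counter()
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if not match:
            continue  # skip lines that are not in the expected format
        for phrase in page_keyphrases.get(match.group(1), []):
            hits[phrase] += 1
    return sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]

log = [
    '10.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET /a.html HTTP/1.0" 200 2326',
    '10.0.0.2 - - [10/Oct/2000:13:56:01 -0700] "GET /b.html HTTP/1.0" 200 1045',
    '10.0.0.3 - - [10/Oct/2000:13:57:12 -0700] "GET /a.html HTTP/1.0" 200 2326',
]
pages = {"/a.html": ["metadata", "indexing"], "/b.html": ["indexing"]}
print(popular_keyphrases(log, pages))
```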


Workforce Optimization

Delegating responsibility is a Management 101 lesson, but the tools needed to equip a workforce for its delegated responsibilities have changed dramatically since Management 101 was written. Relevant information is a critical tool for any business, and providing it in the right context gives an organization a competitive advantage. Rather than making staff work through the usual time-consuming, iterative search engine process, Extractor presents corporate information in a relevant and meaningful form that tracks the changing needs of today's dynamic workforce.


Keyphrase definitions and examples are courtesy of Dr. Peter Turney.


Features

     Evaluate
          online demonstration
          sample application
          software development kit

     Platform
          operating system
               Windows
               Solaris
               Linux
               Mac OS
               HP-UX
               ...
          development
               C / C#
               Java
               Perl
               Python
               Visual Basic

     API Functions

     Great for...
          workforce optimization
          web log tagging
          refined search
          knowledge management (KM)
          information retrieval (IR)
          semantic web development
          indexing
          categorization
          cataloguing
          inference engines
          document management
          portal services

     Examples:
          Research
          Internet Communications
          Homeland Security
          Contextual Web Search
          Document Management
          Indexing
          Knowledge Management
          Intellectual Property Filter
          Intelligent Search
          Text Summarization
          Wireless Push Technology

