The Extractor API may be used in many different ways, depending on the intended application. The API is designed to allow maximum flexibility for a wide variety of applications. Here are a few examples of how you might use it.

One Document, One Set of Stop Words

This is a sketch of how to use the API to process a single text file. This example assumes that there is no need to customize the stop words.

Note that Extractor does not manage the text buffer. Extractor reads the text buffer, but it does not change the state of the text buffer in any way. The text buffer must be allocated and freed outside of Extractor. This sketch is essentially what is implemented in the API test wrapper, test_api.c.
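The C sketch below illustrates this pattern. Only ExtrCreateStopMemory and ExtrSetNumberPhrases are named in this documentation; the other Extr* calls, the extractor.h header, the handle types, and the load_whole_file helper are assumptions made for illustration, and the real SDK signatures may differ.

```c
/* Sketch: one document, one (default) set of stop words.
 * Only ExtrCreateStopMemory and ExtrSetNumberPhrases appear in the
 * documentation above; every other Extr* name and "extractor.h" are
 * assumed names used for illustration. */
#include <stdio.h>
#include <stdlib.h>
#include "extractor.h"                       /* assumed SDK header */

/* Caller-owned buffer: read a whole file into a NUL-terminated string.
 * Extractor never allocates, modifies, or frees this buffer. */
char *load_whole_file(const char *path)
{
    FILE *fp = fopen(path, "rb");
    if (!fp) return NULL;
    fseek(fp, 0, SEEK_END);
    long size = ftell(fp);
    rewind(fp);
    char *buf = malloc((size_t)size + 1);
    if (buf) {
        size_t got = fread(buf, 1, (size_t)size, fp);
        buf[got] = '\0';
    }
    fclose(fp);
    return buf;
}

int main(void)
{
    char *text = load_whole_file("report.txt");
    if (!text) return 1;

    void *stop = ExtrCreateStopMemory();     /* default stop words, no customization */
    void *keys = ExtrCreateKeyMemory();      /* assumed: holds the extraction results */

    ExtrSetNumberPhrases(keys, 10);          /* request up to 10 keyphrases (range 3 to 30) */
    ExtrText(keys, stop, text);              /* assumed: extract keyphrases from the buffer */

    for (int i = 0; i < ExtrKeyCount(keys); i++)
        printf("%s\n", ExtrKeyPhrase(keys, i));   /* assumed accessors */

    ExtrFreeKeyMemory(keys);
    ExtrFreeStopMemory(stop);
    free(text);                              /* the text buffer is freed outside Extractor */
    return 0;
}
```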
===============================

Many Documents, One Set of Stop Words

This is a sketch of how to use the API to process many documents. This example assumes that there is no need to customize the stop words.

In this example, all of the documents share the same set of stop words. Therefore the stop word memory is only created once. This is more efficient than putting ExtrCreateStopMemory inside the for-each-document loop.
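A minimal sketch of this pattern, under the same assumptions as the first sketch (only ExtrCreateStopMemory and ExtrSetNumberPhrases are documented names; the other Extr* calls and load_whole_file are placeholders):

```c
/* Sketch: many documents, one shared set of stop words.
 * The stop word memory is created once, outside the document loop. */
#include <stdio.h>
#include <stdlib.h>
#include "extractor.h"                            /* assumed SDK header */

char *load_whole_file(const char *path);          /* hypothetical helper from the first sketch */

void process_documents(const char *paths[], int n)
{
    void *stop = ExtrCreateStopMemory();          /* created once, shared by every document */

    for (int i = 0; i < n; i++) {
        char *text = load_whole_file(paths[i]);
        if (!text) continue;

        void *keys = ExtrCreateKeyMemory();       /* assumed: fresh result memory per document */
        ExtrSetNumberPhrases(keys, 10);
        ExtrText(keys, stop, text);               /* assumed extraction call */

        for (int k = 0; k < ExtrKeyCount(keys); k++)
            printf("%s: %s\n", paths[i], ExtrKeyPhrase(keys, k));

        ExtrFreeKeyMemory(keys);
        free(text);                               /* the caller owns each text buffer */
    }

    ExtrFreeStopMemory(stop);                     /* freed once, after the last document */
}
```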
===============================

Many Documents, Many Sets of Stop Words

This is a sketch of how to use the API to process many documents. In this example, each document is processed with its own set of stop words.

If the application is a server with many different users, then the users could each have their own personal list of stop words. For example, if the server processes e-mail, then the users might want their own names to be stop words.
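The sketch below creates and frees a fresh, customized stop word memory for each document. ExtrAddStopWord is an assumed name for adding a personal stop word; only ExtrCreateStopMemory and ExtrSetNumberPhrases are confirmed by this documentation.

```c
/* Sketch: many documents, many sets of stop words.
 * Each document (e.g. each user's e-mail) gets its own stop word list. */
#include <stdio.h>
#include <stdlib.h>
#include "extractor.h"                        /* assumed SDK header */

char *load_whole_file(const char *path);      /* hypothetical helper from the first sketch */

void process_user_mail(const char *paths[], const char *user_names[], int n)
{
    for (int i = 0; i < n; i++) {
        /* A fresh stop word memory per document, customized for this user. */
        void *stop = ExtrCreateStopMemory();
        ExtrAddStopWord(stop, user_names[i]); /* assumed: suppress the user's own name */

        char *text = load_whole_file(paths[i]);
        if (!text) { ExtrFreeStopMemory(stop); continue; }

        void *keys = ExtrCreateKeyMemory();   /* assumed */
        ExtrSetNumberPhrases(keys, 10);
        ExtrText(keys, stop, text);           /* assumed extraction call */

        for (int k = 0; k < ExtrKeyCount(keys); k++)
            printf("%s: %s\n", user_names[i], ExtrKeyPhrase(keys, k));

        ExtrFreeKeyMemory(keys);
        ExtrFreeStopMemory(stop);             /* stop words freed per document */
        free(text);
    }
}
```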
===============================

Process a Document in Sections

This is a sketch of how to use the API to process a large document, one section at a time. This example assumes that the same stop words are used for all sections.

This could be useful for producing an annotated table of contents for a book. Each section in the book could be annotated by a list of keyphrases, where the keyphrases are extracted from that section alone. This could also be useful for producing an index. Extractor generates a list of three to thirty keyphrases for each document that it processes (depending on ExtrSetNumberPhrases). Thirty keyphrases is not enough to make an index for a book. However, if the book is processed in blocks of about one to five pages per block, then Extractor will generate up to thirty keyphrases for each block. A two-hundred page book, processed one page per block at thirty keyphrases per block, could then yield up to six thousand keyphrases. This will be more than enough to make a good index.

Note that Extractor can efficiently handle very large documents without requiring the documents to be split into smaller chunks. Splitting a document into sections is not necessary to increase the speed or capacity of Extractor.
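A sketch of the index-building pattern follows, under the same caveats as the earlier sketches: splitting the book into page-sized section buffers is left to the caller, and every Extr* name except ExtrCreateStopMemory and ExtrSetNumberPhrases is assumed for illustration.

```c
/* Sketch: a large document processed one section at a time,
 * with one shared set of stop words, e.g. to build a book index. */
#include <stdio.h>
#include "extractor.h"                          /* assumed SDK header */

void index_book(char *sections[], int n_sections)
{
    void *stop = ExtrCreateStopMemory();        /* same stop words for every section */

    for (int s = 0; s < n_sections; s++) {
        void *keys = ExtrCreateKeyMemory();     /* assumed */
        ExtrSetNumberPhrases(keys, 30);         /* ask for the maximum of thirty phrases per block */
        ExtrText(keys, stop, sections[s]);      /* assumed: extract from this section only */

        /* Tag each phrase with its section number, ready for an index entry. */
        for (int k = 0; k < ExtrKeyCount(keys); k++)
            printf("section %d: %s\n", s + 1, ExtrKeyPhrase(keys, k));

        ExtrFreeKeyMemory(keys);
    }

    ExtrFreeStopMemory(stop);
    /* The section buffers are owned by the caller and freed elsewhere. */
}
```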