Package ca.cgjennings.algo
Class TextIndexer
- java.lang.Object
 - ca.cgjennings.algo.TextIndexer
- All Implemented Interfaces:
 MonitoredAlgorithm
public class TextIndexer extends java.lang.Object implements MonitoredAlgorithm
Creates a file that can be used to create a TextIndex by indexing the contents of a number of source texts. The indexer uses a TextIndexer.TextMapper to locate source texts from a set of identifiers. The resulting index
- Since:
 - 3.0
- Author:
 - Chris Jennings
 
 
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static class | TextIndexer.DefaultTextMapper | A default text mapper implementation that assumes that the source IDs represent URLs.
static interface | TextIndexer.TextMapper | A text mapper maps an identifier to a source text to be indexed.
Constructor Summary
Constructors
Constructor | Description
TextIndexer() | Creates a new text indexer.
Method Summary
Modifier and Type | Method | Description
static void | createIndex(java.io.File indexFile, java.lang.String[] sourceURLs, java.lang.String[] indexIDs) | A convenience method that creates an index using the default configuration.
java.text.BreakIterator | getBreakIterator() | Returns the break iterator used to split the document into words.
TextIndexer.TextMapper | getTextMapper() | Returns the text mapper used to map source identifiers to texts.
TextIndex | makeIndex(java.util.Collection<java.lang.String> sourceIDs) | Generates a TextIndex in memory.
void | setBreakIterator(java.text.BreakIterator it) | Sets the break iterator used to split the document into words.
ProgressListener | setProgressListener(ProgressListener li) | Sets the progress listener that will listen for progress on this algorithm, replacing the existing listener (if any).
void | setTextMapper(TextIndexer.TextMapper mapper) | Sets the text mapper used to map source identifiers to texts.
void | write(java.io.File f, java.util.Collection<java.lang.String> sourceIDs) | Creates an index for a collection of sources, writing that index to a file.
void | write(java.io.OutputStream stream, java.util.Collection<java.lang.String> sourceIDs) | Creates an index for a collection of sources, writing that index to a stream.
 
Method Detail
getTextMapper
public TextIndexer.TextMapper getTextMapper()
Returns the text mapper used to map source identifiers to texts.
- Returns:
 - the current mapper
 
 
setTextMapper
public void setTextMapper(TextIndexer.TextMapper mapper)
Sets the text mapper used to map source identifiers to texts.
- Parameters:
 mapper - the mapper to use to locate source texts
 
getBreakIterator
public java.text.BreakIterator getBreakIterator()
Returns the break iterator used to split the document into words. Each word will become a searchable word in the index entry unless it is on the stop word list.
- Returns:
 - the break iterator used to find words in the source texts
 
 
setBreakIterator
public void setBreakIterator(java.text.BreakIterator it)
Sets the break iterator used to split the document into words.
- Parameters:
 it - the break iterator that tokenizes the source texts
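
As a rough illustration of how a java.text.BreakIterator splits a text into words, the sketch below walks the boundaries reported by a word-instance iterator and keeps the tokens that contain a letter or digit. The class name WordSplitSketch and the helper words are hypothetical and are not part of TextIndexer; how the indexer itself filters tokens or applies its stop word list is not shown here.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration only; not the TextIndexer implementation.
class WordSplitSketch {

    // Collects the word-like tokens found between successive boundaries.
    static List<String> words(String text, BreakIterator it) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // Assumption: treat a token as a word if it has at least one
            // letter or digit, which skips whitespace and punctuation runs.
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        System.out.println(words("The quick brown fox.", it));
    }
}
```

An iterator built this way, or one for a different locale, is the kind of object that could be passed to setBreakIterator.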
 
setProgressListener
public ProgressListener setProgressListener(ProgressListener li)
Description copied from interface: MonitoredAlgorithm
Sets the progress listener that will listen for progress on this algorithm, replacing the existing listener (if any). A listener should only be set before the algorithm begins executing, not while it is already in progress.
- Specified by:
 setProgressListener in interface MonitoredAlgorithm
- Parameters:
 li - the listener to set (may be null)
- Returns:
 - the previous listener, or null
 
makeIndex
public TextIndex makeIndex(java.util.Collection<java.lang.String> sourceIDs)
Generates a TextIndex in memory. This has a similar effect to writing the index to a file and then immediately creating a TextIndex instance from the file, but without actually creating the file.
- Parameters:
 sourceIDs - the IDs of the documents to include in the index
- Returns:
 - a searchable index
 
 
write
public void write(java.io.File f, java.util.Collection<java.lang.String> sourceIDs) throws java.io.IOException
Creates an index for a collection of sources, writing that index to a file.
- Parameters:
 f - the file to write the index to
 sourceIDs - the IDs to index
- Throws:
 java.io.IOException - if an I/O error occurs
 
write
public void write(java.io.OutputStream stream, java.util.Collection<java.lang.String> sourceIDs) throws java.io.IOException
Creates an index for a collection of sources, writing that index to a stream.
- Parameters:
 stream - the output stream to write the index to
 sourceIDs - the IDs to index
- Throws:
 java.io.IOException - if an I/O error occurs
 
createIndex
public static void createIndex(java.io.File indexFile, java.lang.String[] sourceURLs, java.lang.String[] indexIDs) throws java.io.IOException
A convenience method that creates an index using the default configuration.
- Parameters:
 indexFile - the file to write the index to
 sourceURLs - an array of source URLs
 indexIDs - an array of identifiers to use in the index for the source URL at the same index, or null to use the sourceURLs
- Throws:
 java.io.IOException - if an error occurs while writing the file
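
The pairing between sourceURLs and indexIDs described above can be sketched with a small stand-alone helper. The class IndexIdPairingSketch and the method pairIds are hypothetical names used only for illustration; this is not how createIndex is implemented, and the sketch assumes that a non-null indexIDs array has the same length as sourceURLs. The example URLs and IDs are likewise made up.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the documented parameter contract only.
class IndexIdPairingSketch {

    // Maps each source URL to the identifier it would be indexed under:
    // the entry at the same position in indexIDs, or the URL itself
    // when indexIDs is null.
    static Map<String, String> pairIds(String[] sourceURLs, String[] indexIDs) {
        Map<String, String> pairs = new LinkedHashMap<>();
        for (int i = 0; i < sourceURLs.length; i++) {
            String id = (indexIDs == null) ? sourceURLs[i] : indexIDs[i];
            pairs.put(sourceURLs[i], id);
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] urls = { "file:/docs/a.html", "file:/docs/b.html" };
        // Explicit identifiers, matched to the URLs by position.
        System.out.println(pairIds(urls, new String[] { "doc-a", "doc-b" }));
        // With null, each URL serves as its own identifier.
        System.out.println(pairIds(urls, null));
    }
}
```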
 