Package ca.cgjennings.algo
Class TextIndexer
- java.lang.Object
  - ca.cgjennings.algo.TextIndexer
- All Implemented Interfaces:
  MonitoredAlgorithm
public class TextIndexer extends java.lang.Object implements MonitoredAlgorithm
Creates a file that can be used to create a TextIndex by indexing the contents of a number of source texts. The indexer uses a TextIndexer.TextMapper to locate source texts from a set of identifiers. The resulting index
- Since:
  3.0
- Author:
  Chris Jennings
-
Nested Class Summary
Nested Classes
- static class TextIndexer.DefaultTextMapper: A default text mapper implementation that assumes that the source IDs represent URLs.
- static interface TextIndexer.TextMapper: A text mapper maps an identifier to a source text to be indexed.
-
Constructor Summary
Constructors
- TextIndexer(): Creates a new text indexer.
-
Method Summary
Methods
- static void createIndex(java.io.File indexFile, java.lang.String[] sourceURLs, java.lang.String[] indexIDs): A convenience method that creates an index using the default configuration.
- java.text.BreakIterator getBreakIterator(): Returns the break iterator used to split the document into words.
- TextIndexer.TextMapper getTextMapper(): Returns the text mapper used to map source identifiers to texts.
- TextIndex makeIndex(java.util.Collection<java.lang.String> sourceIDs): Generates a TextIndex in memory.
- void setBreakIterator(java.text.BreakIterator it): Sets the break iterator used to split the document into words.
- ProgressListener setProgressListener(ProgressListener li): Sets the progress listener that will listen for progress on this algorithm, replacing the existing listener (if any).
- void setTextMapper(TextIndexer.TextMapper mapper): Sets the text mapper used to map source identifiers to texts.
- void write(java.io.File f, java.util.Collection<java.lang.String> sourceIDs): Creates an index for a collection of sources, writing that index to a file.
- void write(java.io.OutputStream stream, java.util.Collection<java.lang.String> sourceIDs): Creates an index for a collection of sources, writing that index to a stream.
-
Method Detail
-
getTextMapper
public TextIndexer.TextMapper getTextMapper()
Returns the text mapper used to map source identifiers to texts.
- Returns:
  the current mapper
-
setTextMapper
public void setTextMapper(TextIndexer.TextMapper mapper)
Sets the text mapper used to map source identifiers to texts.
- Parameters:
  mapper - the mapper to use to locate source texts
-
getBreakIterator
public java.text.BreakIterator getBreakIterator()
Returns the break iterator used to split the document into words. Each word will become a searchable word in the index entry unless it is on the stop word list.
- Returns:
  the break iterator used to find words in the source texts
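The word-splitting step described here can be illustrated with the standard java.text.BreakIterator alone. The following is a minimal sketch, not the indexer's actual implementation: the class name WordSplitSketch, the STOP_WORDS set, and the lowercasing of each word are assumptions made for illustration, since the indexer's real stop word list and normalization are not shown in this documentation.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class WordSplitSketch {
    // Hypothetical stop word list; the indexer's real list is not documented here.
    static final Set<String> STOP_WORDS = Set.of("the", "a", "of");

    // Splits text at the iterator's boundaries and keeps the non-stop words.
    static List<String> words(String text, BreakIterator it) {
        List<String> out = new ArrayList<>();
        it.setText(text);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String token = text.substring(start, end);
            // A word iterator also yields runs of whitespace and punctuation; skip those.
            if (!Character.isLetterOrDigit(token.codePointAt(0))) {
                continue;
            }
            // Assumption: words are compared in lower case.
            String word = token.toLowerCase(Locale.ROOT);
            if (!STOP_WORDS.contains(word)) {
                out.add(word);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        BreakIterator it = BreakIterator.getWordInstance(Locale.ENGLISH);
        System.out.println(words("The index of a deck, sorted.", it));
    }
}
```

An iterator configured this way (for example, one created for a different locale) is the kind of object that would be passed to setBreakIterator.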
-
setBreakIterator
public void setBreakIterator(java.text.BreakIterator it)
Sets the break iterator used to split the document into words.
- Parameters:
  it - the break iterator that tokenizes the source texts
-
setProgressListener
public ProgressListener setProgressListener(ProgressListener li)
Description copied from interface: MonitoredAlgorithm
Sets the progress listener that will listen for progress on this algorithm, replacing the existing listener (if any). A listener should only be set before the algorithm begins executing, not while it is already in progress.
- Specified by:
  setProgressListener in interface MonitoredAlgorithm
- Parameters:
  li - the listener to set (may be null)
- Returns:
  the previous listener, or null
-
makeIndex
public TextIndex makeIndex(java.util.Collection<java.lang.String> sourceIDs)
Generates a TextIndex in memory. This has a similar effect to writing the index to a file and then immediately creating a TextIndex instance from the file, but without actually creating the file.
- Parameters:
  sourceIDs - the IDs of the documents to include in the index
- Returns:
  a searchable index
-
write
public void write(java.io.File f, java.util.Collection<java.lang.String> sourceIDs) throws java.io.IOException
Creates an index for a collection of sources, writing that index to a file.
- Parameters:
  f - the file to write the index to
  sourceIDs - the IDs to index
- Throws:
  java.io.IOException - if an I/O error occurs
-
write
public void write(java.io.OutputStream stream, java.util.Collection<java.lang.String> sourceIDs) throws java.io.IOException
Creates an index for a collection of sources, writing that index to a stream.
- Parameters:
  stream - the output stream to write the index to
  sourceIDs - the IDs to index
- Throws:
  java.io.IOException - if an I/O error occurs
-
createIndex
public static void createIndex(java.io.File indexFile, java.lang.String[] sourceURLs, java.lang.String[] indexIDs) throws java.io.IOException
A convenience method that creates an index using the default configuration.
- Parameters:
  indexFile - the file to write the index to
  sourceURLs - an array of source URLs
  indexIDs - an array of identifiers to use in the index for the source URL at the same index, or null to use the sourceURLs
- Throws:
  java.io.IOException - if an error occurs while writing the file
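The relationship between the sourceURLs and indexIDs parameters can be sketched in plain Java. This only illustrates the documented contract (IDs are matched to URLs by array position, and a null indexIDs means the URLs serve as their own identifiers); it does not call or reproduce createIndex. The class IndexIdSketch and the method pairIds are hypothetical names, and rejecting arrays of different lengths is an assumption, since the documentation does not say how a mismatch is handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class IndexIdSketch {
    // Pairs each source URL with the identifier it would be indexed under.
    // A null indexIDs array means each URL doubles as its own identifier.
    static Map<String, String> pairIds(String[] sourceURLs, String[] indexIDs) {
        // Assumption: mismatched lengths are rejected; the documentation does not specify this.
        if (indexIDs != null && indexIDs.length != sourceURLs.length) {
            throw new IllegalArgumentException("indexIDs must match sourceURLs in length");
        }
        Map<String, String> idToUrl = new LinkedHashMap<>();
        for (int i = 0; i < sourceURLs.length; i++) {
            String id = (indexIDs == null) ? sourceURLs[i] : indexIDs[i];
            idToUrl.put(id, sourceURLs[i]);
        }
        return idToUrl;
    }

    public static void main(String[] args) {
        // Example URLs are made up for illustration.
        String[] urls = {"file:/docs/a.html", "file:/docs/b.html"};
        System.out.println(pairIds(urls, new String[] {"a", "b"}));
        System.out.println(pairIds(urls, null));
    }
}
```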