Package ca.cgjennings.algo
Class TextIndexer.DefaultTextMapper
java.lang.Object
    ca.cgjennings.algo.TextIndexer.DefaultTextMapper

All Implemented Interfaces:
TextIndexer.TextMapper
Enclosing class:
TextIndexer
public static class TextIndexer.DefaultTextMapper extends java.lang.Object implements TextIndexer.TextMapper
A default text mapper implementation that assumes that the source IDs represent URLs. The returned index IDs are identical to the source IDs.
Constructor Summary
Constructor            Description
DefaultTextMapper()
-
Method Summary
Modifier and Type             Method and Description
java.lang.String              getIndexID(java.lang.String sourceID)
                              Maps a source identifier to an index identifier.
java.lang.String              getText(java.lang.String sourceID)
                              Given a source ID, return the text associated with that ID.
protected java.lang.String    preprocess(java.lang.String sourceID, java.net.URL url, java.lang.String text)
                              Preprocesses the text after it is read but before it is returned to the caller of getText(java.lang.String).
protected java.lang.String    read(java.lang.String sourceID, java.net.URL url, java.lang.String encodingHint)
                              Reads the source document from the URL and returns it as a string of indexable words.
protected java.net.URL        toURL(java.lang.String sourceID)
                              Return a URL for the source ID.
Method Detail
-
getIndexID
public java.lang.String getIndexID(java.lang.String sourceID)
Description copied from interface: TextIndexer.TextMapper
Maps a source identifier to an index identifier. If the source ID should be identified differently in the index, this returns the version to include in the index.
Specified by:
getIndexID in interface TextIndexer.TextMapper
Parameters:
sourceID - the ID used to locate the text during indexing
Returns:
the ID used to locate the text when using the index
-
getText
public java.lang.String getText(java.lang.String sourceID) throws java.io.IOException
Given a source ID, return the text associated with that ID. The default mapper does this by calling toURL(java.lang.String) on the source ID, reading and then preprocessing the result.
Specified by:
getText in interface TextIndexer.TextMapper
Parameters:
sourceID - an identifier that the mapper uses to locate the text
Returns:
the text mapped to by the ID
Throws:
java.io.IOException - if an I/O error occurs while fetching the document
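The three-step flow described for getText (convert the ID to a URL, read, preprocess) can be sketched as follows. This is a minimal, self-contained illustration and not the library's source: PipelineSketch is a hypothetical stand-in class, and the null encoding hint and the UTF-8 fallback are assumptions, since the documentation does not say which hint getText passes or which default encoding read uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in: mirrors the documented method names only.
// It is NOT the actual TextIndexer.DefaultTextMapper implementation.
class PipelineSketch {

    // Step order taken from the getText description: toURL, read, preprocess.
    public String getText(String sourceID) throws IOException {
        URL url = toURL(sourceID);
        String text = read(sourceID, url, null); // assumption: no encoding hint
        return preprocess(sourceID, url, text);
    }

    // Documented default: "as if by new URL(sourceID)".
    protected URL toURL(String sourceID) throws IOException {
        return new URL(sourceID);
    }

    // Assumption: a null hint falls back to UTF-8 in this sketch.
    protected String read(String sourceID, URL url, String encodingHint) throws IOException {
        Charset charset = (encodingHint == null)
                ? StandardCharsets.UTF_8
                : Charset.forName(encodingHint);
        try (InputStream in = url.openStream()) {
            return new String(in.readAllBytes(), charset);
        }
    }

    // Documented default: the text is returned unchanged.
    protected String preprocess(String sourceID, URL url, String text) {
        return text;
    }
}
```

Because each step is a separate protected method, a subclass can presumably replace one step (for example, how IDs become URLs) without touching the others.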
-
toURL
protected java.net.URL toURL(java.lang.String sourceID) throws java.io.IOException
Returns a URL for the source ID. The default implementation simply returns a new URL using the source ID, as if by new URL(sourceID).
Parameters:
sourceID - the source ID to convert to a URL
Returns:
a URL to use to read the source text
Throws:
java.io.IOException - if an error occurs while creating the URL
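Since toURL is protected, a subclass could map source IDs that are not complete URLs. The sketch below shows one possible body for such an override, resolving relative IDs against a base URL. RelativeIdSketch, its base parameter, and the example URL in the note are hypothetical; only the java.net.URL constructors are real API.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper showing what an overriding toURL might do.
// The "base" parameter is an assumption; the documented method takes only sourceID.
class RelativeIdSketch {

    // Resolves sourceID against base using the two-argument URL constructor.
    static URL toURL(URL base, String sourceID) throws MalformedURLException {
        return new URL(base, sourceID);
    }
}
```

With a base of https://example.com/docs/ and a source ID of guide/intro.html, this resolves to https://example.com/docs/guide/intro.html. MalformedURLException is a subclass of IOException, so it fits the documented throws clause.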
-
read
protected java.lang.String read(java.lang.String sourceID, java.net.URL url, java.lang.String encodingHint) throws java.io.IOException
Reads the source document from the URL and returns it as a string of indexable words.
Parameters:
sourceID - the identifier of the document
url - the URL to read the document from
encodingHint - the name of an encoding, or null to use a default encoding
Returns:
the document text
Throws:
java.io.IOException - if an error occurs while reading the document
-
preprocess
protected java.lang.String preprocess(java.lang.String sourceID, java.net.URL url, java.lang.String text)
Preprocesses the text after it is read but before it is returned to the caller of getText(java.lang.String). The default implementation returns the text unchanged.
Parameters:
sourceID - the identifier of the document
url - the URL that the document was read from
text - the original text
Returns:
the modified text
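As an illustration of what a non-default preprocess step might do, the sketch below strips tag-like markup and collapses whitespace before the text is indexed. MarkupStrippingSketch is a hypothetical class, and the regular-expression approach is a simplification chosen for brevity; it is not a real HTML parser and is not taken from the library.

```java
import java.net.URL;

// Hypothetical example of a preprocess body; in a real subclass this logic
// would presumably live in an overridden preprocess(sourceID, url, text).
class MarkupStrippingSketch {

    static String preprocess(String sourceID, URL url, String text) {
        // Replace anything that looks like a tag with a space so that
        // adjacent words do not run together.
        String withoutTags = text.replaceAll("<[^>]*>", " ");
        // Collapse runs of whitespace and trim the ends.
        return withoutTags.replaceAll("\\s+", " ").trim();
    }
}
```

For the input `<p>Hello <b>world</b></p>` this returns `Hello world`. The sourceID and url parameters are unused here, but they would let an implementation vary its processing by document, for example by file extension.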