Class MarkovText
java.lang.Object
    ca.cgjennings.text.MarkovText
public class MarkovText extends java.lang.Object
Generates random text using a Markov model. The generator must be supplied with model text on which to base the text it produces. The model text must contain at least one word and may contain any number of words. A word is any uninterrupted sequence of non-whitespace characters. Whitespace characters have special meaning to the generator: any run of whitespace is treated as if it were a single plain space.
The generator can produce either letter sequences or word sequences. When generating a letter sequence, the caller requests a specific number of letters; for word sequences, the caller requests a specific number of words. When producing word sequences, the generator can either choose whole words (in which case every generated word actually occurs in the model text) or create a sequence of pseudo-words, which are generated one letter at a time and may include "words" that do not appear in the model text.
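The whitespace rule can be illustrated with a small helper that collapses every run of whitespace to one plain space. This is a sketch, not the library's code: the class and method names are hypothetical, it relies on Java's default \s character class (ASCII whitespace only), and trimming leading and trailing whitespace is an assumption the description does not state.

```java
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the whitespace rule; not part of MarkovText. */
public final class WhitespaceNormalizer {
    // One or more whitespace characters; Java's default \s is [ \t\n\x0B\f\r]
    private static final Pattern RUN_OF_WHITESPACE = Pattern.compile("\\s+");

    /** Collapses each run of whitespace to a single space, then trims both ends (assumed). */
    public static String normalize(CharSequence text) {
        return RUN_OF_WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        String model = "the  cat\n\tsat   on\r\nthe mat ";
        System.out.println("[" + normalize(model) + "]"); // [the cat sat on the mat]
    }
}
```

After normalization, tabs, newlines and repeated spaces are indistinguishable, so a letter-level model only ever sees one kind of separator.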
Markov modelling produces more realistic text than simply selecting items (letters or words) at random from the model text. A Markov model has an order, and the next item chosen depends on the previous order items. When the order is 0, no previous items are taken into account: the next item depends only on how often each item occurs in the model. If A appears twice as often as B, then A is twice as likely to be chosen. When the order is 1, the single previous item is considered. If the previous item is C, and A follows C three times as often as B follows C, then A is three times as likely as B to be chosen next.
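The frequency reasoning above can be made concrete with a sketch that builds letter-level count tables for a given order and then picks the next letter with probability proportional to its count. The names are hypothetical and this is not the library's implementation. The sample text CACACACB is chosen so that, at order 1, A follows C three times and B follows C once, matching the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

/** Sketch of order-k frequency tables and weighted selection (hypothetical, not MarkovText's code). */
public final class MarkovCounts {
    /** Maps each context (the previous `order` letters) to counts of the letters that follow it. */
    public static Map<String, Map<Character, Integer>> count(String text, int order) {
        if (order < 0) throw new IllegalArgumentException("order < 0");
        Map<String, Map<Character, Integer>> table = new HashMap<>();
        for (int i = order; i < text.length(); i++) {
            String context = text.substring(i - order, i); // always "" when order is 0
            table.computeIfAbsent(context, k -> new TreeMap<>())
                 .merge(text.charAt(i), 1, Integer::sum);
        }
        return table;
    }

    /** Picks a letter with probability proportional to its count. */
    public static char choose(Map<Character, Integer> counts, Random rand) {
        int total = 0;
        for (int c : counts.values()) total += c;
        int r = rand.nextInt(total); // uniform in [0, total)
        for (Map.Entry<Character, Integer> e : counts.entrySet()) {
            r -= e.getValue();
            if (r < 0) return e.getKey();
        }
        throw new IllegalStateException("unreachable when counts is non-empty");
    }

    public static void main(String[] args) {
        String text = "CACACACB";
        System.out.println("order 0: " + count(text, 0).get(""));  // {A=3, B=1, C=4}
        System.out.println("after C: " + count(text, 1).get("C")); // {A=3, B=1}
    }
}
```

With order 0 there is a single empty context, so the table is just the overall letter frequencies; with order 1, choosing from the "C" row makes A three times as likely as B.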
Higher orders are increasingly likely to reproduce long passages of the source text verbatim, because the number of possible choices drops rapidly as the order increases. For a given sequence of previous order items, there will often be only one item that ever follows it in the model text, so that item is the only choice that can be made.
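The drop in choices can be measured directly. The following sketch (hypothetical names, letter level, not the library's code) reports the average number of distinct letters that follow each context of a given order; as the order grows this average falls toward 1, which is the point where generation can only copy the source.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Sketch: how many distinct successors each context has, on average, for a given order. */
public final class SuccessorDiversity {
    public static double averageDistinctSuccessors(String text, int order) {
        if (order < 0 || order >= text.length()) {
            throw new IllegalArgumentException("need 0 <= order < text length");
        }
        Map<String, Set<Character>> successors = new HashMap<>();
        for (int i = order; i < text.length(); i++) {
            successors.computeIfAbsent(text.substring(i - order, i), k -> new HashSet<>())
                      .add(text.charAt(i));
        }
        int sum = 0;
        for (Set<Character> s : successors.values()) sum += s.size();
        return (double) sum / successors.size();
    }

    public static void main(String[] args) {
        String text = "the cat sat on the mat and the rat sat on the hat";
        for (int order = 0; order <= 6; order++) {
            System.out.printf("order %d: %.2f distinct successors per context%n",
                    order, averageDistinctSuccessors(text, order));
        }
    }
}
```

At order 0 the single empty context can be followed by every distinct letter in the text; at the largest possible order there is exactly one context with exactly one successor.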
Author:
    Chris Jennings
Constructor Summary
MarkovText()
MarkovText(java.lang.CharSequence text)
MarkovText(java.lang.CharSequence text, java.util.Random rand)
Method Summary
Modifier and Type    Method                                  Description
java.lang.String     generateCharacters(int charCount)       Generate charCount letters of text.
java.lang.String     generatePseudowords(int wordCount)      Generate wordCount words of text, one letter at a time.
java.lang.String     generateWords(int wordCount)            Generate wordCount words of text, one word at a time.
int                  getOrder()                              Returns the currently set Markov order.
void                 setOrder(int order)                     Set the Markov order to use for text generation.
void                 setText(java.lang.CharSequence text)    Set the text used as a model for generating new text.
Method Detail
setText
public void setText(java.lang.CharSequence text)
Set the text used as a model for generating new text. The text will be broken into words, where a word is any uninterrupted sequence of non-whitespace characters. An exception is thrown if the text does not contain at least one word.
Parameters:
    text - the model text to base generated text upon
Throws:
    java.lang.IllegalArgumentException - if the text does not contain any words
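The documented contract of setText (split into maximal non-whitespace runs, reject text with no words) can be sketched as follows. The class and method names are hypothetical; this only mirrors the behavior described here and is not the library's source.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper mirroring the setText contract; not part of MarkovText. */
public final class ModelWords {
    // A word is a maximal run of non-whitespace characters
    private static final Pattern WORD = Pattern.compile("\\S+");

    /** Returns the words of the text, or throws if there are none. */
    public static List<String> words(CharSequence text) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        if (words.isEmpty()) {
            throw new IllegalArgumentException("text does not contain any words");
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(words("  Call me\tIshmael.\n")); // [Call, me, Ishmael.]
        try {
            words(" \n\t ");
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Punctuation stays attached to its word ("Ishmael."), since only whitespace separates words.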
getOrder
public int getOrder()
Returns the currently set Markov order. The Markov order determines how many previous words or letters are used for context when choosing the next word or letter.
Returns:
    the current Markov order
setOrder
public void setOrder(int order)
Set the Markov order to use for text generation. The Markov order determines how many previous words or letters are used for context when choosing the next word or letter.
Parameters:
    order - the Markov order to use when generating text
Throws:
    java.lang.IllegalArgumentException - if order < 0
generateCharacters
public java.lang.String generateCharacters(int charCount)
Generate charCount letters of text.
Parameters:
    charCount - the number of letters to generate
Returns:
    the generated text
Throws:
    java.lang.IllegalArgumentException - if charCount < 0
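Letter-by-letter generation can be sketched with a table that maps each context of order letters to every letter that follows it. Keeping duplicates in that list makes a uniform pick frequency-weighted. Several details are assumptions, not taken from the library: the model text is already whitespace-normalized, it is treated as circular so every context has a successor, generation starts at a random position, and the class name LetterChain is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Sketch of order-k letter generation (hypothetical; not MarkovText's implementation). */
public final class LetterChain {
    // context -> every letter that follows it; duplicates kept so picks are frequency-weighted
    private final Map<String, List<Character>> successors = new HashMap<>();
    // every context occurrence, used to choose a starting point
    private final List<String> starts = new ArrayList<>();
    private final int order;

    public LetterChain(String text, int order) {
        if (text.isEmpty()) throw new IllegalArgumentException("empty model text");
        if (order < 0) throw new IllegalArgumentException("order < 0");
        this.order = order;
        int n = text.length();
        for (int i = 0; i < n; i++) {
            StringBuilder ctx = new StringBuilder(order);
            for (int j = 0; j < order; j++) {
                ctx.append(text.charAt((i + j) % n)); // assumed: wrap around the end of the text
            }
            String key = ctx.toString();
            successors.computeIfAbsent(key, k -> new ArrayList<>()).add(text.charAt((i + order) % n));
            starts.add(key);
        }
    }

    public String generateCharacters(int charCount, Random rand) {
        if (charCount < 0) throw new IllegalArgumentException("charCount < 0");
        StringBuilder out = new StringBuilder(charCount);
        String context = starts.get(rand.nextInt(starts.size()));
        while (out.length() < charCount) {
            List<Character> options = successors.get(context);
            char next = options.get(rand.nextInt(options.size()));
            out.append(next);
            if (order > 0) {
                context = context.substring(1) + next; // slide the window by one letter
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        LetterChain chain = new LetterChain("the cat sat on the mat ", 2);
        System.out.println(chain.generateCharacters(40, new Random(42)));
    }
}
```

Treating the text as circular is one simple way to guarantee that every sliding context has been seen before; a real implementation might instead restart from a random position when it reaches a dead end.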
generatePseudowords
public java.lang.String generatePseudowords(int wordCount)
Generate wordCount words of text, one letter at a time.
Parameters:
    wordCount - the number of words to generate
Returns:
    the generated text
Throws:
    java.lang.IllegalArgumentException - if wordCount < 0
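Producing pseudo-words one letter at a time raises the question of when to stop. One possible approach, sketched here under stated assumptions, is to draw letters from any letter-level source and count a word each time a space ends a run of non-space letters. The IntSupplier standing in for the Markov letter source and the class name are hypothetical, and the sketch assumes the source eventually produces spaces (otherwise it would loop forever).

```java
import java.util.function.IntSupplier;

/** Sketch: collect letters from a letter source until wordCount pseudo-words are complete. */
public final class PseudowordCollector {
    public static String collect(IntSupplier nextLetter, int wordCount) {
        if (wordCount < 0) throw new IllegalArgumentException("wordCount < 0");
        StringBuilder out = new StringBuilder();
        int words = 0;
        boolean inWord = false;
        // Assumes the source eventually yields a space; otherwise this never terminates
        while (words < wordCount) {
            char c = (char) nextLetter.getAsInt();
            if (c != ' ') {
                out.append(c);
                inWord = true;
            } else if (inWord) {
                words++; // a space ends the current pseudo-word
                inWord = false;
                if (words < wordCount) out.append(' ');
            }
            // spaces outside a word (leading or repeated) are skipped
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String src = "ab cd  ef ";
        int[] pos = {0};
        IntSupplier letters = () -> src.charAt(pos[0]++ % src.length());
        System.out.println(collect(letters, 3)); // ab cd ef
    }
}
```

Because the letters come from a letter-level model, the words it yields can be combinations that never occur in the model text.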
generateWords
public java.lang.String generateWords(int wordCount)
Generate wordCount words of text, one word at a time.
Parameters:
    wordCount - the number of words to generate
Returns:
    the generated text
Throws:
    java.lang.IllegalArgumentException - if wordCount < 0
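Whole-word generation uses the same Markov idea with words as the items, so every output word is guaranteed to come from the model text. In this sketch the context is the list of the previous order words. As with the letter-level sketch, treating the word sequence as circular and starting at a random position are assumptions, and WordChain is a hypothetical name rather than part of the library.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Sketch of order-k word generation (hypothetical; not MarkovText's implementation). */
public final class WordChain {
    private static final Pattern WORD = Pattern.compile("\\S+");
    // previous `order` words -> every word that follows them (duplicates kept for weighting)
    private final Map<List<String>, List<String>> successors = new HashMap<>();
    private final List<List<String>> starts = new ArrayList<>();
    private final int order;

    public WordChain(CharSequence text, int order) {
        if (order < 0) throw new IllegalArgumentException("order < 0");
        this.order = order;
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(text);
        while (m.find()) words.add(m.group());
        if (words.isEmpty()) throw new IllegalArgumentException("text does not contain any words");
        int n = words.size();
        for (int i = 0; i < n; i++) {
            List<String> ctx = new ArrayList<>(order);
            for (int j = 0; j < order; j++) ctx.add(words.get((i + j) % n)); // assumed circular
            successors.computeIfAbsent(ctx, k -> new ArrayList<>()).add(words.get((i + order) % n));
            starts.add(ctx);
        }
    }

    public String generateWords(int wordCount, Random rand) {
        if (wordCount < 0) throw new IllegalArgumentException("wordCount < 0");
        List<String> out = new ArrayList<>(wordCount);
        List<String> context = starts.get(rand.nextInt(starts.size()));
        while (out.size() < wordCount) {
            List<String> options = successors.get(context);
            String next = options.get(rand.nextInt(options.size()));
            out.add(next);
            if (order > 0) {
                List<String> shifted = new ArrayList<>(context.subList(1, order));
                shifted.add(next); // drop the oldest word, append the newest
                context = shifted;
            }
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        WordChain chain = new WordChain("the cat sat on the mat and the dog sat on the rug", 1);
        System.out.println(chain.generateWords(10, new Random(7)));
    }
}
```

Words are joined with single spaces, consistent with the rule that any run of whitespace in the model is treated as one plain space.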