A C++ implementation of a Markov Chain Text Generator that works with both English and German texts.
- Generates pseudo-random text based on input samples
- Supports Unicode characters, including German umlauts and ß
- Works with the Windows console for proper Unicode display
- MarkovChain: Main class for the Markov Chain implementation
- splitIntoWords: Processes input text into individual words
- generateText: Produces new text based on the built Markov Chain
Text Processing:
- Split input text into words
- Convert words to lowercase
- Handle special characters (e.g., German umlauts)
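The text-processing steps above could look roughly like the following. This is a minimal sketch, not the project's actual code: the signature of `splitIntoWords` is assumed, and `toLowerGerman` and `isWordChar` are hypothetical helpers. The uppercase umlauts are mapped by hand so that the result does not depend on the active locale.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper: lowercase one character. The German capitals are
// handled explicitly; everything else is delegated to std::towlower.
wchar_t toLowerGerman(wchar_t c) {
    switch (c) {
        case L'\u00C4': return L'\u00E4'; // A-umlaut
        case L'\u00D6': return L'\u00F6'; // O-umlaut
        case L'\u00DC': return L'\u00FC'; // U-umlaut
        default:
            return static_cast<wchar_t>(std::towlower(static_cast<wint_t>(c)));
    }
}

// Hypothetical helper: whitespace and ASCII punctuation separate words;
// every other character (including umlauts and sharp s) belongs to a word.
bool isWordChar(wchar_t c) {
    if (std::iswspace(static_cast<wint_t>(c))) return false;
    if (c < 128 && std::iswpunct(static_cast<wint_t>(c))) return false;
    return true;
}

// Assumed signature: split the input into lowercase words.
std::vector<std::wstring> splitIntoWords(const std::wstring& text) {
    std::vector<std::wstring> words;
    std::wstring current;
    for (wchar_t c : text) {
        if (isWordChar(c)) {
            current += toLowerGerman(c);
        } else if (!current.empty()) {
            words.push_back(current);
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

Under these assumptions, `splitIntoWords(L"Hello, World!")` yields the two words `hello` and `world`.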
Building the Markov Chain:
- Create transitions between consecutive words
- Store transitions in a map structure
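One way to store the transitions is a map from each word to the list of words that followed it in the input. The sketch below assumes that representation; the names `TransitionMap` and `buildChain` are illustrative and not taken from the project.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed representation: word -> every word observed directly after it.
// Duplicates are kept on purpose, so frequent successors appear more often.
using TransitionMap = std::map<std::wstring, std::vector<std::wstring>>;

// Hypothetical helper: record one transition per pair of consecutive words.
TransitionMap buildChain(const std::vector<std::wstring>& words) {
    TransitionMap chain;
    for (std::size_t i = 0; i + 1 < words.size(); ++i) {
        chain[words[i]].push_back(words[i + 1]);
    }
    return chain;
}
```

Keeping duplicates in the successor list means a uniform random pick from that list already follows the observed transition frequencies, so no separate probability table is needed.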
Text Generation:
- Start with a random or specified word
- Repeatedly select next words based on transition probabilities
- Handle dead ends by restarting from a random word
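The generation loop described above can be sketched as follows, assuming the chain is a map from each word to its observed successors. The signature of `generateText`, the `TransitionMap` alias, and the `pickRandom` helper are assumptions made for illustration.

```cpp
#include <cstddef>
#include <map>
#include <random>
#include <string>
#include <vector>

// Assumed representation: word -> every word observed directly after it.
using TransitionMap = std::map<std::wstring, std::vector<std::wstring>>;

// Hypothetical helper: uniform pick from a non-empty list.
const std::wstring& pickRandom(const std::vector<std::wstring>& items,
                               std::mt19937& rng) {
    std::uniform_int_distribution<std::size_t> dist(0, items.size() - 1);
    return items[dist(rng)];
}

// Assumed signature: produce `wordCount` words, starting at `current` if the
// chain knows that word and at a random known word otherwise.
std::wstring generateText(const TransitionMap& chain, std::wstring current,
                          std::size_t wordCount, std::mt19937& rng) {
    if (chain.empty() || wordCount == 0) return L"";

    std::vector<std::wstring> keys;
    for (const auto& entry : chain) keys.push_back(entry.first);

    if (chain.find(current) == chain.end()) current = pickRandom(keys, rng);

    std::wstring result = current;
    for (std::size_t i = 1; i < wordCount; ++i) {
        auto it = chain.find(current);
        if (it == chain.end() || it->second.empty()) {
            current = pickRandom(keys, rng);       // dead end: restart
        } else {
            current = pickRandom(it->second, rng); // follow a transition
        }
        result += L' ';
        result += current;
    }
    return result;
}
```

Passing the random engine in as a parameter keeps the function reproducible: seeding `std::mt19937` with a fixed value gives the same text on every run, which is convenient for testing.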
Unicode Handling:
- Use wide strings (std::wstring) for Unicode support
- Set Windows console to UTF-16 mode for proper display
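On Windows, switching the console to UTF-16 mode is commonly done with the C runtime's `_setmode` and the `_O_U16TEXT` flag. The sketch below assumes that approach; the wrapper name `enableUtf16Console` and the `kIsWindows` constant are hypothetical, and the call is guarded so the code still compiles on other platforms.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <fcntl.h> // _O_U16TEXT
#include <io.h>    // _setmode
#endif

#ifdef _WIN32
constexpr bool kIsWindows = true;
#else
constexpr bool kIsWindows = false;
#endif

// Hypothetical wrapper: put stdout into UTF-16 text mode on Windows.
// Returns false on other platforms or if the mode switch fails.
bool enableUtf16Console() {
#ifdef _WIN32
    return _setmode(_fileno(stdout), _O_U16TEXT) != -1;
#else
    return false;
#endif
}
```

After the switch, output should go through wide streams such as `std::wcout`; mixing in narrow output on the same stream is best avoided.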
Compile (using MinGW-w64 on Windows):
g++ markovChain.cpp -o textgen
Run:
./textgen.exe
- First-order Markov Chain (considers only the current word)
- Input text is currently hardcoded in the program
- Optimized for Windows console output
- Implement file I/O for larger input texts
- Enhance to higher-order Markov Chains for more coherent output
- Add save/load functionality for Markov Chains
- Create a user interface for easier interaction