DON MINER
MATT RODATUS
CMSC 443

Genetic algorithm to solve simple substitution ciphers.

README

There are several options at the top of gasimple.py:

MIN_NGRAM - The minimum n-gram length to use when guessing characters. This setting also appears in train.py, and the two values must match. For example, if it is set to 2 when train.py generates the serialized object files, gasimple.py must also use 2 to read those files correctly.

MAX_NGRAM - The maximum n-gram length to use when guessing characters. This setting also appears in train.py, and the two values must match. See MIN_NGRAM above for details.

CHARACTER_SET - The alphabet that is encoded. All other characters are considered not encoded.

POPULATION_SIZE - The size of the population in each generation. If it is too small, the genetic algorithm will converge too quickly to an incorrect solution. If it is too large, each run will take an unnecessary amount of time.

SURVIVAL_RATE - The top percentile of entities that survive each generation.

MUTATION_RATE - The probability that a random mutation will occur instead of an actual gene splice.

OUTPUT_EVERY - Show output once every OUTPUT_EVERY generations.

LAST_GENERATION - The number of generations to be calculated.

USAGE

train.py
Training script that generates the Markov chains for gasimple.py by reading sample text from standard input. The sample text should be large and representative of the decoded text. train.py writes serialized object files to the current directory, which gasimple.py then reads. train.py must be run before gasimple.py can be used.

gasimple.py
Genetic algorithm to solve simple substitution ciphers. It uses the serialized object files created by train.py and reads the cipher text from standard input. It writes the best entity of each generation to standard output.
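
EXAMPLE SKETCH

The following Python 3 sketch illustrates how the options above could fit together in one generation of a genetic algorithm. It is not the code in gasimple.py or train.py. The function names (train, random_key, decode, fitness, mutate, splice, next_generation), the constant values, and the use of plain n-gram counts in place of the serialized Markov chains are all assumptions made for illustration.

```python
import random
import string
from collections import Counter

# Illustrative values only; the real defaults live in gasimple.py.
CHARACTER_SET = string.ascii_lowercase
MIN_NGRAM = 2
MAX_NGRAM = 3
POPULATION_SIZE = 50
SURVIVAL_RATE = 0.2   # assumed here to be a fraction between 0 and 1
MUTATION_RATE = 0.3


def train(sample_text):
    """Count every n-gram of length MIN_NGRAM..MAX_NGRAM in the sample text."""
    text = sample_text.lower()
    counts = Counter()
    for n in range(MIN_NGRAM, MAX_NGRAM + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    return counts


def random_key(rng):
    """Return a random permutation of CHARACTER_SET as a string."""
    letters = list(CHARACTER_SET)
    rng.shuffle(letters)
    return "".join(letters)


def decode(key, ciphertext):
    """Map CHARACTER_SET[i] to key[i]; other characters pass through."""
    return ciphertext.translate(str.maketrans(CHARACTER_SET, key))


def fitness(key, ciphertext, counts):
    """Score a key by how often the decoded n-grams occurred in training."""
    text = decode(key, ciphertext)
    return sum(counts[text[i:i + n]]
               for n in range(MIN_NGRAM, MAX_NGRAM + 1)
               for i in range(len(text) - n + 1))


def mutate(key, rng):
    """Random mutation: swap two positions of the key."""
    i, j = rng.sample(range(len(key)), 2)
    letters = list(key)
    letters[i], letters[j] = letters[j], letters[i]
    return "".join(letters)


def splice(a, b, rng):
    """Gene splice: a prefix of a, then the unused letters in b's order."""
    head = a[:rng.randrange(1, len(a))]
    return head + "".join(c for c in b if c not in head)


def next_generation(population, ciphertext, counts, rng):
    """Keep the top SURVIVAL_RATE of keys and refill with their children."""
    ranked = sorted(population,
                    key=lambda k: fitness(k, ciphertext, counts),
                    reverse=True)
    survivors = ranked[:max(2, int(len(ranked) * SURVIVAL_RATE))]
    children = list(survivors)
    while len(children) < len(population):
        if rng.random() < MUTATION_RATE:
            children.append(mutate(rng.choice(survivors), rng))
        else:
            parent_a, parent_b = rng.sample(survivors, 2)
            children.append(splice(parent_a, parent_b, rng))
    return children
```

In this sketch the survivors are copied unchanged into the next generation, so the best score never decreases from one generation to the next. Both mutate and splice return a valid permutation of CHARACTER_SET, so every entity remains a usable substitution key.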