timpalpant / littleboxes Goto Github PK

View Code? Open in Web Editor NEW

1.0 1.0 0.0 91 KB

A crossword solver

License: GNU General Public License v3.0

Python 99.28% Shell 0.72%

littleboxes's People

Contributors

Stargazers

Watchers

littleboxes's Issues

Separate tests and test fixtures from main code

Move the tests into a separate directory like tests/ and make some small test fixtures for ClueDB, Dictionary, etc. in tests/fixtures so that the tests run quickly.

Decompose crossword into mostly independent subgraphs

Break down the entire crossword into smaller blocks where fills can be enumerated more exhaustively, then combine the best solutions from these blocks.

For example, in this past Sunday's crossword, you can isolate a block in the top-left corner by making only two cuts:

The two long words that are cut are most-likely phrases, and have no possible words in the dictionary anyway.

This could be a Solver that first decomposes the puzzle into blocks by finding minimum node cuts, passes each block to a sub-solver, then merges the results from the sub-solvers into a complete solution.

Loading databases from text files is slow

During startup we load several databases into memory to use for solving, such as the clue DB and a dictionary. Loading these from text files is slow (~45s).

Pre-process these resources into a format that can be loaded more quickly (maybe just a Pickle file). It will save time in the long run.

Implement DictionarySolver

Use the Dictionary class to look up potential answers for a given clue. This solver is intended to be used after some letters have been filled in by the ClueDB solver, constraining the set of possible words.

Implement N-gram letter model

Implement a simple model that estimates N-gram probabilities from a provided set of input words.

This model will be used to generate fills for boxes that have not been filled by either ClueDB or Dictionary solvers.

Add benchmark driver to run solver and score its performance

Add a script that runs a solver against puzzles from the last ~6 months and reports statistics about its performance on them. The score for each puzzle should be a number >= 0 representing the deviation from the correct solution. A perfect solution will have score == 0, and imperfect solutions have score > 0. Some tunable parameters should probably be: 1) penalty for unfilled squares, 2) penalty for incorrect squares.

Implement N-gram solver

Use the N-gram model to try to fill in squares that could not be filled by the ClueDB + Dictionary solvers.

Derive ClueDB from *.puz files

Add a script that generates a ClueDB (in *.mpk) format from a set of puzzle files.

Log the solution trajectory as the puzzle is being solved

Implement a plugin that writes out the current state of the puzzle as it is being solved, so that we can visualize how the solver is working and discover deficiencies that might be improved.

This may require some changes to the solver API, or some kind of "hooks" or wrapper around the Crossword class.

Visualize puzzle trajectories

Create a very simple web page or other UI that allows us to load see how puzzle solution trajectories evolve.

Blocked on #12.

Implement puzzle score function

This should be a callable that takes a Crossword and returns a float.

Scale can be arbitrary, lower scores are better. There are some ideas in the Dr. Fill paper for components that are informative. To start, we want something that:

Assigns lowest score to things in the ClueDB,
Assigns lower scores to real words/phrases,
Assigns lower scores to more probable sequences of N-grams

It can have parameters to weight these components, and maybe a script to fit the parameters.

This will be useful for 1) selecting the best solutions generated by the solvers and/or 2) directing the search.

Fuzzy and quantitative ClueDB lookup

The ClueDB currently provides an API for getting potential answers of a given length for an exact clue match. The length is a hard constraint, but matching clues should not be.

A more general version would be useful that allows fuzzy / partial matches of some kind, and returns an estimated confidence/likelihood for each potential answer (based on the number of times that answer has occurred, the number of possible choices, the closeness of the clue match). Ideally this more general ClueDB would have a tuning parameter that controls the strictness of "matches", with one extreme being "exact matches only" (the current behavior) and the other extreme being "all words of the correct length".

Brute-force lookup for clues within a certain Levenshtein distance might work. This paper may or may not have some other useful ideas:

Abernathy, C., & Bao, L. (2012). CRIS: Natural Language Processing of Crossword Clues, 1–8. (https://github.com/lisabao/cris)

Implement simulated annealing solver

It should take a set of "moves" (puzzle modifications), a score function (#27), and a temperature schedule. The solver will randomly select moves, and accept them according to the Metropolis-Hastings criterion.

The trick will be coming up with a set of moves that allows effective search of the puzzle landscape (all possible letters in all possible boxes). Some potential ideas for moves:

Fill a word with a compatible entry from the ClueDB / Dictionary.
Erase a word or some letters.
Fill some letters based on the N-gram model (#10).

Some initial sketches in the mc_solver branch. If this turns out to be ineffective, an alternative is a refined branch search as described in the Dr. Fill paper.

Implement Dictionary with word phrases

Add an option, when loading a Dictionary, to also include word phrases up to a certain length, since these commonly occur in crosswords.

When solving, we'll want to load words and phrases up to the maximum length answer in the puzzle.

Test fixture puzzle not empty

Running run_tests.sh fails several tests because the puzzle in tests/fixtures/test.puz already has solutions in it. This means that calling get_fill on that puzzle will not return a list of None values.

timpalpant / littleboxes Goto Github PK

littleboxes's People

Contributors

Stargazers

Watchers

littleboxes's Issues

Recommend Projects

Recommend Topics

Recommend Org