timpalpant / littleboxes Goto Github PK
View Code? Open in Web Editor NEWA crossword solver
License: GNU General Public License v3.0
A crossword solver
License: GNU General Public License v3.0
Move the tests into a separate directory like tests/
and make some small test fixtures for ClueDB, Dictionary, etc. in tests/fixtures
so that the tests run quickly.
Break down the entire crossword into smaller blocks where fills can be enumerated more exhaustively, then combine the best solutions from these blocks.
For example, in this past Sunday's crossword, you can isolate a block in the top-left corner by making only two cuts:
The two long words that are cut are most-likely phrases, and have no possible words in the dictionary anyway.
This could be a Solver that first decomposes the puzzle into blocks by finding minimum node cuts, passes each block to a sub-solver, then merges the results from the sub-solvers into a complete solution.
During startup we load several databases into memory to use for solving, such as the clue DB and a dictionary. Loading these from text files is slow (~45s).
Pre-process these resources into a format that can be loaded more quickly (maybe just a Pickle file). It will save time in the long run.
Use the Dictionary class to look up potential answers for a given clue. This solver is intended to be used after some letters have been filled in by the ClueDB solver, constraining the set of possible words.
Implement a simple model that estimates N-gram probabilities from a provided set of input words.
This model will be used to generate fills for boxes that have not been filled by either ClueDB or Dictionary solvers.
Add a script that runs a solver against puzzles from the last ~6 months and reports statistics about its performance on them. The score for each puzzle should be a number >= 0 representing the deviation from the correct solution. A perfect solution will have score == 0, and imperfect solutions have score > 0. Some tunable parameters should probably be: 1) penalty for unfilled squares, 2) penalty for incorrect squares.
Use the N-gram model to try to fill in squares that could not be filled by the ClueDB + Dictionary solvers.
Add a script that generates a ClueDB (in *.mpk) format from a set of puzzle files.
Implement a plugin that writes out the current state of the puzzle as it is being solved, so that we can visualize how the solver is working and discover deficiencies that might be improved.
This may require some changes to the solver API, or some kind of "hooks" or wrapper around the Crossword class.
Create a very simple web page or other UI that allows us to load see how puzzle solution trajectories evolve.
Blocked on #12.
This should be a callable that takes a Crossword and returns a float.
Scale can be arbitrary, lower scores are better. There are some ideas in the Dr. Fill paper for components that are informative. To start, we want something that:
It can have parameters to weight these components, and maybe a script to fit the parameters.
This will be useful for 1) selecting the best solutions generated by the solvers and/or 2) directing the search.
The ClueDB currently provides an API for getting potential answers of a given length for an exact clue match. The length is a hard constraint, but matching clues should not be.
A more general version would be useful that allows fuzzy / partial matches of some kind, and returns an estimated confidence/likelihood for each potential answer (based on the number of times that answer has occurred, the number of possible choices, the closeness of the clue match). Ideally this more general ClueDB would have a tuning parameter that controls the strictness of "matches", with one extreme being "exact matches only" (the current behavior) and the other extreme being "all words of the correct length".
Brute-force lookup for clues within a certain Levenshtein distance might work. This paper may or may not have some other useful ideas:
It should take a set of "moves" (puzzle modifications), a score function (#27), and a temperature schedule. The solver will randomly select moves, and accept them according to the Metropolis-Hastings criterion.
The trick will be coming up with a set of moves that allows effective search of the puzzle landscape (all possible letters in all possible boxes). Some potential ideas for moves:
Some initial sketches in the mc_solver
branch. If this turns out to be ineffective, an alternative is a refined branch search as described in the Dr. Fill paper.
Add an option, when loading a Dictionary, to also include word phrases up to a certain length, since these commonly occur in crosswords.
When solving, we'll want to load words and phrases up to the maximum length answer in the puzzle.
Running run_tests.sh fails several tests because the puzzle in tests/fixtures/test.puz already has solutions in it. This means that calling get_fill on that puzzle will not return a list of None values.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.