Git Product home page Git Product logo

littleboxes's People

Contributors

jpalpant avatar timpalpant avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar

littleboxes's Issues

Decompose crossword into mostly independent subgraphs

Break down the entire crossword into smaller blocks where fills can be enumerated more exhaustively, then combine the best solutions from these blocks.

For example, in this past Sunday's crossword, you can isolate a block in the top-left corner by making only two cuts:

xword2

The two long words that are cut are most-likely phrases, and have no possible words in the dictionary anyway.

This could be a Solver that first decomposes the puzzle into blocks by finding minimum node cuts, passes each block to a sub-solver, then merges the results from the sub-solvers into a complete solution.

Loading databases from text files is slow

During startup we load several databases into memory to use for solving, such as the clue DB and a dictionary. Loading these from text files is slow (~45s).

Pre-process these resources into a format that can be loaded more quickly (maybe just a Pickle file). It will save time in the long run.

Implement DictionarySolver

Use the Dictionary class to look up potential answers for a given clue. This solver is intended to be used after some letters have been filled in by the ClueDB solver, constraining the set of possible words.

Implement N-gram letter model

Implement a simple model that estimates N-gram probabilities from a provided set of input words.

This model will be used to generate fills for boxes that have not been filled by either ClueDB or Dictionary solvers.

Add benchmark driver to run solver and score its performance

Add a script that runs a solver against puzzles from the last ~6 months and reports statistics about its performance on them. The score for each puzzle should be a number >= 0 representing the deviation from the correct solution. A perfect solution will have score == 0, and imperfect solutions have score > 0. Some tunable parameters should probably be: 1) penalty for unfilled squares, 2) penalty for incorrect squares.

Implement N-gram solver

Use the N-gram model to try to fill in squares that could not be filled by the ClueDB + Dictionary solvers.

Log the solution trajectory as the puzzle is being solved

Implement a plugin that writes out the current state of the puzzle as it is being solved, so that we can visualize how the solver is working and discover deficiencies that might be improved.

This may require some changes to the solver API, or some kind of "hooks" or wrapper around the Crossword class.

Implement puzzle score function

This should be a callable that takes a Crossword and returns a float.

Scale can be arbitrary, lower scores are better. There are some ideas in the Dr. Fill paper for components that are informative. To start, we want something that:

  1. Assigns lowest score to things in the ClueDB,
  2. Assigns lower scores to real words/phrases,
  3. Assigns lower scores to more probable sequences of N-grams

It can have parameters to weight these components, and maybe a script to fit the parameters.

This will be useful for 1) selecting the best solutions generated by the solvers and/or 2) directing the search.

Fuzzy and quantitative ClueDB lookup

The ClueDB currently provides an API for getting potential answers of a given length for an exact clue match. The length is a hard constraint, but matching clues should not be.

A more general version would be useful that allows fuzzy / partial matches of some kind, and returns an estimated confidence/likelihood for each potential answer (based on the number of times that answer has occurred, the number of possible choices, the closeness of the clue match). Ideally this more general ClueDB would have a tuning parameter that controls the strictness of "matches", with one extreme being "exact matches only" (the current behavior) and the other extreme being "all words of the correct length".

Brute-force lookup for clues within a certain Levenshtein distance might work. This paper may or may not have some other useful ideas:

Implement simulated annealing solver

It should take a set of "moves" (puzzle modifications), a score function (#27), and a temperature schedule. The solver will randomly select moves, and accept them according to the Metropolis-Hastings criterion.

The trick will be coming up with a set of moves that allows effective search of the puzzle landscape (all possible letters in all possible boxes). Some potential ideas for moves:

  1. Fill a word with a compatible entry from the ClueDB / Dictionary.
  2. Erase a word or some letters.
  3. Fill some letters based on the N-gram model (#10).

Some initial sketches in the mc_solver branch. If this turns out to be ineffective, an alternative is a refined branch search as described in the Dr. Fill paper.

Implement Dictionary with word phrases

Add an option, when loading a Dictionary, to also include word phrases up to a certain length, since these commonly occur in crosswords.

When solving, we'll want to load words and phrases up to the maximum length answer in the puzzle.

Test fixture puzzle not empty

Running run_tests.sh fails several tests because the puzzle in tests/fixtures/test.puz already has solutions in it. This means that calling get_fill on that puzzle will not return a list of None values.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.