tedmiston / spelling-bee-answers

An automated archive of NYTimes Spelling Bee puzzle answers 🐝

Home Page: https://www.nytimes.com/puzzles/spelling-bee

License: MIT License

Makefile 1.48% Python 97.09% Shell 1.42%

nytimes nytimes-spelling-bee puzzle archive data python

spelling-bee-answers's Introduction

Spelling Bee Answers

An automated archive of NYTimes Spelling Bee puzzle answers

New puzzles are released at 3 am ET.

Puzzles

See Days.md.

Pangrams

See Pangrams.md.

Words

See Words.md.

spelling-bee-answers's Issues

Info : Alpha sort words lists

Currently they are unsorted which makes them mirror the order in which each word first occurs across the daily puzzles.
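
A minimal sketch of the sort, assuming the words live in a plain Python list in first-seen order. The `words` variable and its contents are illustrative and not taken from the repo.

```python
# Hypothetical input: words in the order they first appeared across puzzles.
words = ["tomato", "Motto", "atom", "moat"]

# str.casefold makes the sort case-insensitive. sorted() returns a new list,
# so the original first-seen order in `words` is left untouched.
alpha_words = sorted(words, key=str.casefold)

print(alpha_words)
```

Keeping the first-seen list intact and sorting only at render time would let both orderings be generated from the same data.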

CI : Tests

Add running tests to CI.

Unit tests
Integration tests

Data : Backfill historical data

The first puzzle date that I ran this code on was 2023-01-01.

NYT themselves do not provide historical puzzles or any puzzles beyond the current day's.

However, the Spelling Bee goes back to at least May 2018 - https://www.sbsolver.com/archive/2018/05.

On the other hand, the first valid forum URL appears to be 2021-09-20.

It may be possible to gather this historical data by various means, such as pages on web archivers.

Create SBSolver scraper (via #47)
...

Quality : Test coverage

Add test coverage report.

Add coverage.py
? Add pytest-cov
? Add readme badge
Setup pytest exclusions - https://stackoverflow.com/questions/64611388/exclude-a-function-from-coverage

Maybe later:

Try statement coverage (instead of line coverage)
- https://www.codewithc.com/pytest-coverage-how-to-use-code-coverage-in-python-with-pytest/
- https://breadcrumbscollector.tech/how-to-use-code-coverage-in-python-with-pytest/
Add parametrization to tests

Info : Related projects

https://nytbee.com

https://www.sbsolver.com

https://spellingbeegrid.com/

https://www.reddit.com/r/nytspellingbee/

https://www.wordplays.com/crossword-solver/

https://spelling-bee-assistant.app/

Info : Word complexity

Add complexity column to all words table.

Add complexity / reading level per word, e.g., https://www.wordcalc.com/readability/.

CI : Security scan

Add a security scan to CI.

Dependency Review or similar from https://github.com/tedmiston/spelling-bee-answers/actions/new?category=security.

~~Already using CodeQL default setup.~~ Update: Disabled because it just burns and burns Actions runner minutes on every commit.

Info : Pangrams list

Create an ongoing list of pangrams like the all words list.

Core : Pydantic model(s)

Create Pydantic model(s) as needed.

Currently all logic interacts with the JSON data files directly without any model / validation / (de)serialization layer. Migrating to models enables cleaner separation of concerns and quality.

Create ~~Day~~ Puzzle model
- https://github.com/tedmiston/spelling-bee-answers/blob/main/days/2023-01-01.json
- ~~Make expiration / freeExpiration keys both optional~~ omit them because they're not useful ¯\_(ツ)_/¯

Draft ideas for potential future models:

? Create Table model
- Fields: headers, rows
? Create ~~DocTemplate~~ TaggedDoc model
- It's not really a template since we're replacing existing contents when we update them vs just render / substitute
? Create Word model
- Include boolean field for is_pangram
?? Create WordList model
- Fields: title, description, words, table

Core : Use timezone-aware datetimes

Use timezone-aware datetimes to reduce timezone-related bugs and make runs on CI consistent with local runs.

Add pendulum (https://pendulum.eustace.io/)
Update days.py
Update tests/tests_integration.py
Test on CI
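
As a rough illustration of the idea, here is a sketch that uses the standard library's `zoneinfo` as a stand-in for pendulum (a deliberate swap, so the example needs no extra dependency). The function name `current_puzzle_date` is hypothetical, and the 3 am ET release time comes from the readme.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "ET" in the readme means the America/New_York zone.
NYT_TZ = ZoneInfo("America/New_York")
RELEASE_HOUR = 3  # new puzzles are released at 3 am ET


def current_puzzle_date(now: datetime) -> date:
    """Return the date of the puzzle that is live at the aware instant `now`."""
    if now.tzinfo is None:
        # Naive datetimes are rejected so CI (usually UTC) and local runs agree.
        raise ValueError("now must be timezone-aware")
    local = now.astimezone(NYT_TZ)
    # Before 3 am local time the previous day's puzzle is still the live one.
    return (local - timedelta(hours=RELEASE_HOUR)).date()


# 07:59 UTC on Jan 1 is 02:59 in New York, so the Dec 31 puzzle is still live.
print(current_puzzle_date(datetime(2023, 1, 1, 7, 59, tzinfo=timezone.utc)))
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the logic testable with fixed instants.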

Info : Merge Definition URL into Word column

(As a link.)
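
One way this could look, as a sketch: the helper name `word_cell` is hypothetical, and the Wordnik URL pattern is an assumption based on Wordnik's public word pages rather than something confirmed by the repo.

```python
from urllib.parse import quote

# Assumed URL pattern for a word's page on Wordnik.
WORDNIK_BASE = "https://www.wordnik.com/words/"


def word_cell(word: str) -> str:
    """Render a word as a markdown link to its definition page."""
    # quote() percent-encodes any characters that are unsafe in a URL path.
    return f"[{word}]({WORDNIK_BASE}{quote(word)})"


print(word_cell("atom"))
```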

UX : Embolden primary column in tables

Make primary column bold in words / days tables for readability.

Embolden words in word lists
Embolden dates in days lists
Re-gen docs
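
A small sketch of how the table generation might embolden the first column. `render_table` and its arguments are hypothetical names, not the repo's actual generator.

```python
def render_table(headers, rows):
    """Build a markdown table whose first (primary) column is bold."""
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for first, *rest in rows:
        # Only the primary cell is wrapped in ** **; other cells pass through.
        cells = [f"**{first}**", *(str(cell) for cell in rest)]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


print(render_table(["Word", "Count"], [("atom", 3), ("moat", 1)]))
```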

CI : Update All Words list nightly

This can be added to the Answers pipeline and run right after the readme table generation step.

CI : Lint

Add linting to CI.

https://github.com/tedmiston/spelling-bee-answers/actions/new?category=continuous-integration&query=lint

black dry run
pylint
flake8

Maybe others / more later - https://smirnov-am.github.io/python-linters-for-better-code-quality/

Info : Word popularity

Add word popularity / commonality / frequency to all words table.

Need to find a good source for this one.

~~blocked by #11~~

Source ideas:

Info : Aggregate words list

Add list / table of all words across all puzzles.

As a markdown doc
Link from main readme
Use the stats counter script to generate
~~Update nightly~~ → moved to #24

Quality : Scraper interface

Create a simple common interface for the two Spelling Bee scrapers to improve quality.
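
One possible shape for such an interface, sketched with `typing.Protocol`. All names here (`PuzzleScraper`, `fetch`, `FakeScraper`, `archive`) and the returned dict keys are hypothetical and do not reflect the repo's actual scrapers.

```python
from typing import Protocol


class PuzzleScraper(Protocol):
    """Anything with a matching fetch() method satisfies this interface."""

    def fetch(self, puzzle_date: str) -> dict:
        ...


class FakeScraper:
    """Stand-in scraper for illustration; performs no network access."""

    def fetch(self, puzzle_date: str) -> dict:
        return {"date": puzzle_date, "answers": ["atom", "moat"]}


def archive(scraper: PuzzleScraper, puzzle_date: str) -> dict:
    # Callers depend only on the protocol, so scrapers are interchangeable.
    return scraper.fetch(puzzle_date)


print(archive(FakeScraper(), "2023-01-01"))
```

A structural protocol means the existing scrapers would not need a shared base class, only a method with the same signature.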

Info : Word definitions

#13 Added links to definitions on Wordnik. It would be nice to have the definitions inline, say scrape the first occurrence from Wordnik or some heuristic, perhaps?

CI : Cache Poetry binary

Cache Poetry binary itself in CI (not deps installed via Poetry).

The Gr1N/setup-poetry action runs every time on every run which adds ~15–20s.

This is the slowest step in the entire pipeline!

It does not seem to have any built-in feature to cache the Poetry binary itself.

Maybe I can achieve that via actions/cache?

Alternatively, the setup-python docs mention just using pipx install poetry. How does the performance of that compare to setup-poetry? [How] can that be cached?

Info : Puzzle Editors

Is it always the same puzzle editor?

Create Editors.md with columns: "Name", "Puzzle Count"
...

Core : Package refactor

Refactor core Python code from disparate modules into one cohesive package.

Info : Day Word Counts

Add word count by day to the readme table.

Info : Word definition links

Add definition column to all words table.

Link to the word's definition on Wordnik.

Data : SBSolver scraper

A PoC scraper to parse puzzle data from the SBSolver archive, e.g., https://www.sbsolver.com/s/1.

In the future this can be used for historical data backfilling a la #42. This should allow retrieving (at least some of) the data from 2018–2022.

CI : Core pipeline

Add core CI pipeline.

Lint, tests, etc (see sub-issues):

Info : Puzzle difficulty analysis

Is there a way to assess which days' puzzles are harder or easier?

For example, with the full-size crossword, Monday puzzles are easiest and difficulty progressively increases throughout the week through Saturday; Sunday is a bit different though. Anecdotally, I suspect the Bee follows a similar pattern.

Perhaps just using score / points and/or word count from the puzzle directly could be a first pass? There's also the "points needed for Genius level" metric for each puzzle.

Note: I am not currently tracking the points info in the puzzle JSON files. Can I acquire that historical data?
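
Since points are not stored, a first pass could recompute them from the answers. The sketch below assumes the game's usual scoring rules (1 point for a 4-letter word, 1 point per letter for longer words, plus a 7-point pangram bonus). The function names and sample words are illustrative only.

```python
PANGRAM_BONUS = 7  # assumed bonus for a word that uses all seven letters


def word_points(word, pangrams):
    # Assumed rule: 4-letter words score 1, longer words score their length.
    base = 1 if len(word) == 4 else len(word)
    return base + (PANGRAM_BONUS if word in pangrams else 0)


def puzzle_points(answers, pangrams):
    """Total points available in one puzzle, as a rough difficulty proxy."""
    pangram_set = set(pangrams)
    return sum(word_points(word, pangram_set) for word in answers)


# Hypothetical puzzle data, not taken from the archive.
print(puzzle_points(["able", "table", "blanket"], ["blanket"]))
```

Totals computed this way could then be grouped by weekday to check the suspected Monday-to-Saturday pattern.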

Info : Daily puzzle markdown pages

Generate daily markdown doc pages, like the All Words page, but for the letters, pangrams, and answers of that day.
