orchard-street-wordlists's Introduction

Orchard Street Wordlists

Fresh wordlists for all your passphrase-creation needs. Use these wordlists to create strong, secure passphrases, either with dice or a password manager/generator.

  • Made up of common words found in (English) Wikipedia and Google Books
  • Uniquely decodable, and thus safe to combine words in passphrases without a delimiter
  • Free of profane words, abbreviations, and British spellings
  • Available in a variety of lengths for different use-cases
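
To illustrate what "uniquely decodable" means here, the sketch below uses the Sardinas-Patterson test, one standard way to decide whether every concatenation of words can be split back into words in only one way. This is not necessarily how these lists were checked; the function name and the toy word lists are made up for illustration, and the naive pairwise loops are slow on a full-size list.

```python
def is_uniquely_decodable(words):
    """Sardinas-Patterson test (naive, quadratic version for illustration)."""
    code = set(words)
    if "" in code:
        return False

    def dangling(prefixes, targets):
        # Non-empty suffixes left over when a string in `prefixes`
        # is a proper prefix of a string in `targets`.
        out = set()
        for p in prefixes:
            for t in targets:
                if len(t) > len(p) and t.startswith(p):
                    out.add(t[len(p):])
        return out

    current = dangling(code, code)
    seen = set()
    while current:
        # A dangling suffix that is itself a word means two different
        # splits of the same string exist.
        if current & code:
            return False
        key = frozenset(current)
        if key in seen:  # the sets started repeating, so no conflict can appear
            return True
        seen.add(key)
        current = dangling(code, current) | dangling(current, code)
    return True


# "add" is a prefix of "address", yet the pair is still uniquely decodable.
print(is_uniquely_decodable(["add", "address"]))             # True
# "newspaper" can also be read as "news" + "paper".
print(is_uniquely_decodable(["news", "newspaper", "paper"]))  # False
```

The first toy list shows why a list can contain prefix words and still be safe to use without a delimiter.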

NOTE: These lists are occasionally edited. If you want a static, unchanging copy of any of the word lists, feel free to download the lists as they are currently, download the latest tag/release, or fork this repository at any time. See licensing information below.

Orchard Street Long List

The Orchard Street Long List is a 17,576-word list. It provides a hefty 14.1 bits of entropy per word, meaning a 7-word passphrase gives almost 99 bits of entropy.

List length               : 17576 words
Mean word length          : 7.98 characters
Length of shortest word   : 3 characters (add)
Length of longest word    : 15 characters (troubleshooting)
Free of prefix words?     : false
Uniquely decodable?       : true
Entropy per word          : 14.101 bits
Efficiency per character  : 1.767 bits
Above brute force line?   : true
Mean edit distance        : 7.915
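
The entropy figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes every word is picked uniformly at random and independently, and that "efficiency per character" means entropy per word divided by mean word length (an assumption that happens to match the published numbers). The function names are illustrative.

```python
import math


def entropy_per_word(list_length):
    # Bits of entropy contributed by one uniformly chosen word.
    return math.log2(list_length)


def efficiency_per_character(list_length, mean_word_length):
    # Assumed definition: entropy per word spread over the mean word length.
    return entropy_per_word(list_length) / mean_word_length


bits = entropy_per_word(17576)
print(round(bits, 3))                                   # 14.101
print(round(7 * bits, 2))                               # 98.71 bits for 7 words
print(round(efficiency_per_character(17576, 7.98), 3))  # 1.767
```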

Word samples
------------
plank billionaire evaluated punched proficiency positioned
symptom commensurate spit connector misguided royalties
brokerage losers policy diagram graceful publishing
successors redesigned companions intrusion alternatives cleaned
rationalism coupons cosmos clarification translation blaming

Orchard Street Medium List

The Orchard Street Medium List has 8,192 (2^13) words. This length is optimized for binary computers and their random number generators. It gives a nice round 13.00 bits of entropy per word, which makes entropy calculations a bit easier for us humans.
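
One reason a power-of-two length suits binary random number generators: a fixed number of random bits maps directly onto a word index, with no leftover values to reject. Here is a rough sketch using Python's `secrets` module; the `pick_words` helper and the placeholder word list are hypothetical, and a real program would load the actual list file instead.

```python
import secrets


def pick_words(wordlist, count):
    size = len(wordlist)
    # Require a power-of-two length (at least 2) so that every
    # bit pattern corresponds to exactly one word.
    if size < 2 or size & (size - 1):
        raise ValueError("word list length must be a power of two")
    bits = size.bit_length() - 1  # 13 for an 8,192-word list
    return [wordlist[secrets.randbits(bits)] for _ in range(count)]


# Placeholder entries standing in for the real 8,192 words.
placeholder_list = [f"word{i}" for i in range(8192)]
print(" ".join(pick_words(placeholder_list, 6)))
```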

List length               : 8192 words
Mean word length          : 7.07 characters
Length of shortest word   : 3 characters (add)
Length of longest word    : 10 characters (worthwhile)
Free of prefix words?     : false
Uniquely decodable?       : true
Entropy per word          : 13.000 bits
Efficiency per character  : 1.839 bits
Above brute force line?   : true
Mean edit distance        : 6.966

Word samples
------------
adding pilots maximal website opponent attraction
dispatched confirms chapter eagle brains arising
brethren nations palms vaccine relocation basis
motorway tidal jewelry warn alleged courtesy
impacts nature gauge quartz provisions exam

This list is used by the Buttercup password manager.

Orchard Street Diceware List

The Orchard Street Diceware List is our version of the classic Diceware list. With this list, you can use dice to create a secure passphrase. This list's 7,776 words give a traditional 12.925 bits of entropy per word, the same as the EFF long word list.

This list is also available without corresponding dice roll numbers prepended.
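
For the plain version of the list, a dice roll can be converted to a line number by hand or in code. The sketch below assumes five rolls of a six-sided die per word (6^5 = 7,776) and assumes the words are ordered by roll from 11111 to 66666, as in the classic Diceware layout. The `roll_to_index` helper is hypothetical.

```python
def roll_to_index(roll):
    # `roll` is a string of five die faces, e.g. "35216".
    if len(roll) != 5 or any(face not in "123456" for face in roll):
        raise ValueError("expected five digits, each from 1 to 6")
    index = 0
    for face in roll:
        # Treat the roll as a base-6 number with digits 0 to 5.
        index = index * 6 + (int(face) - 1)
    return index  # zero-based position in the list


print(roll_to_index("11111"))  # 0
print(roll_to_index("11121"))  # 6
print(roll_to_index("66666"))  # 7775
```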

List length               : 7776 words
Mean word length          : 7.05 characters
Length of shortest word   : 3 characters (add)
Length of longest word    : 10 characters (worthwhile)
Free of prefix words?     : false
Uniquely decodable?       : true
Entropy per word          : 12.925 bits
Efficiency per character  : 1.832 bits
Above brute force line?   : true
Mean edit distance        : 6.954

Word samples
------------
believing drawing advocate mechanism slaves panel
lecturer institutes encourages assists rovers injected
checked liberals thirteen posting frigate mayo
monitored ruler mean renewal liquid requiring
polished cardiac injuries challenge coherence legs

This list is an option for users of the Strongbox password manager.

Orchard Street Short Lists

Orchard Street Alpha and Orchard Street QWERTY lists both have 1,296 words and are optimized for entering the resulting passphrases on devices like smart TVs or video game consoles. Each word adds 10.34 bits of entropy to a passphrase.
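
Because each word from a short list carries fewer bits, a passphrase needs more words to reach the same strength. As a rough sketch, the snippet below compares how many words each list length needs to reach a target; the 80-bit target is an arbitrary figure chosen for illustration, not a recommendation from this project, and `words_needed` is a made-up helper.

```python
import math


def words_needed(list_length, target_bits):
    # Smallest number of uniformly chosen words whose total entropy
    # meets or exceeds the target.
    return math.ceil(target_bits / math.log2(list_length))


for name, length in [("Short", 1296), ("Diceware", 7776),
                     ("Medium", 8192), ("Long", 17576)]:
    print(name, words_needed(length, 80))
# Short 8
# Diceware 7
# Medium 7
# Long 6
```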

The difference between these two lists is which keyboard layout they are optimized for. Use the Alpha list if your device's keyboard is laid out alphabetically; use the QWERTY list if it is closer to the QWERTY layout.

Orchard Street Alpha List

List length               : 1296 words
Mean word length          : 4.12 characters
Length of shortest word   : 3 characters (add)
Length of longest word    : 7 characters (stopped)
Free of prefix words?     : false
Uniquely decodable?       : true
Entropy per word          : 10.340 bits
Efficiency per character  : 2.509 bits
Above brute force line?   : true
Mean edit distance        : 4.043

Word samples
------------
deity jazz cad bay beg lest
fees kind fell sell toys shoots
hints new stops food tell ideas
toad died must road net feet
die sold leg done peer tour

Orchard Street QWERTY List

List length               : 1296 words
Mean word length          : 4.24 characters
Length of shortest word   : 3 characters (add)
Length of longest word    : 8 characters (referred)
Free of prefix words?     : false
Uniquely decodable?       : true
Entropy per word          : 10.340 bits
Efficiency per character  : 2.441 bits
Above brute force line?   : true
Mean edit distance        : 4.170

Word samples
------------
pine mod polo egg three whip
zen ties cadet wars sweat tier
unity jam tire egg idea hull
sent kiss open fife reader will
mute mecca drugs rent turn den

For more information, see this GitHub repo and/or this blog post.

FAQ

Check our FAQ for answers to frequently asked questions.

Licensing

This work is licensed under the Creative Commons Attribution-ShareAlike 4.0 International License.

Sources of words and other legal notes

The words contained in these word lists are primarily taken from two sources: Google Books Ngram data (2012 data) and Wikipedia, via a Wikipedia word frequency project, taken in June 2023.

This project has no association with Google, Wikipedia, or the creators of the Wikipedia frequency project cited above. To my knowledge, neither Google, Wikipedia, nor the creators of the Wikipedia word frequency project cited above endorse this project.

At the time the words were pulled from Wikipedia, Wikipedia text was licensed under the Creative Commons Attribution-ShareAlike 4.0 International License ("CC BY-SA 4.0"), and thus I am using that license for this project.

orchard-street-wordlists's People

Contributors

sts10

orchard-street-wordlists's Issues

Should exact homophones be removed from the lists?

Here's the list I have so far. The rule is, if both words are present, the second one is removed.

bite,byte
bites,byte
born,borne
boulder,bolder
breached,breeched
bread,bred
bridal,bridle
capital,capitol
ceiling,sealing
cell,sell
cells,sells
cents,scents
cents,sense
century,sentry
chair,chaire
chairs,chaires
chairs,cherres
chalk,chock
chased,caste
checks,cheques
citing,sighting
cocoa,coco
cot,caught
deer,dear
duel,dual
dues,dews
fairer,farer
fairs,fares
ferry,fairy
flew,flu
foul,fowl
fraught,frot
freeze,frees
freeze,frieze
gate,gait
gated,gaited
gourd,gored
gray,grey
hair,hare
handmade,handmaid
hanger,hangar
heart,hart
heel,heal
heels,heals
hertz,hurts
holy,wholly
hours,ours
idle,idol
illicit,elicit
inns,ins
jeans,genes
knot,not
knots,nots
leaky,leeky
lessons,lessens
maid,made
mantle,mantel
martial,marshal
maze,maise
maze,maize
meat,meet
meat,mete
meats,meets
meats,metes
metal,meddle
metals,meddles
mind,mined
morning,mourning
nose,knows
oar,ore
oars,ores
ordinance,ordnance
oriental,orientale
ought,aught
outcast,outcaste
overdue,overdo
oversight,overcite
packs,pacts
packs,pax
pedal,peddle
phrases,fraises
piece,peace
planes,plains
pole,poll
poles,polls
pray,prey
premiere,premier
principal,principle
profits,prophets
right,rite
right,wright
right,write
rights,rites
rights,wrights
rights,writes
rings,wrings
roll,role
rolling,rowling
rolls,roles
rooted,routed
rot,wrought
seams,seems
seams,semes
seeds,cedes
seeds,sedes
seeks,sikhs
seems,seams
sink,sync
sinks,syncs
socks,sox
soles,souls
sonny,sunny
soul,sole
souls,soles
sown,sone
speck,spec
steak,stake
steaks,stakes
steel,steal
stocking,stalking
story,storey
summary,summery
surely,shirley
tail,tale
tails,tales
teeming,teaming
therefore,therefor
threw,thru
throes,throws
throne,thrown
time,thyme
toad,toed
toad,towed
vane,vain
vane,vein
vascular,bascular
veil,vale
veils,vails
veils,vales
verses,versus
versus,verses
vial,vile
wave,waive
waved,waived
weak,week
weaker,weeker
wearing,waring
weather,whether
weekly,weakly
weight,wait
weighted,waited
weights,waits
whale,wail
whales,wails
whales,wales
wine,whine
wrapped,rapped
wrapped,rapt
wrapper,rapper
wrath,wroth
wrote,rote
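
The rule above could be applied along these lines. This is only a sketch: it assumes the pairs are processed in order, one at a time, so a word that has already been removed can no longer trigger a later removal (which matters for reversed pairs such as seams,seems and seems,seams). The function name and sample words are illustrative.

```python
def remove_homophones(words, pairs):
    kept = set(words)
    for first, second in pairs:
        # Only drop the second word if both are still on the list.
        if first in kept and second in kept:
            kept.discard(second)
    # Preserve the original order of the surviving words.
    return [w for w in words if w in kept]


pair_lines = ["seams,seems", "seems,seams", "meat,meet", "meat,mete"]
pairs = [tuple(line.split(",")) for line in pair_lines]
words = ["seams", "seems", "meat", "meet", "mete", "toad"]
print(remove_homophones(words, pairs))  # ['seams', 'meat', 'toad']
```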

How can/should I license these word lists?

It's my understanding that all of the text that makes up Wikipedia is dual-licensed as both Creative Commons Attribution-ShareAlike 3.0 Unported License (“CC BY-SA”), and GNU Free Documentation License (“GFDL”). Note that technically this project uses data from dumps.wikimedia.org, which seems to have the same licensing terms.

Since all of the Orchard Street Wordlists use Wikipedia text data (they are partially composed of frequently used words from Wikipedia), and since CC BY-SA 3.0 is a "ShareAlike" license, I have licensed all of the Orchard Street Wordlists under CC BY-SA 3.0. (I've heard mixed things about GFDL...)

However, I'd like to license the lists under the newer CC BY-SA version 4.0 license, to take advantage of the improvements in the 4.0 suite of licenses, as Creative Commons recommends.

My question is: Would licensing these word lists under CC BY-SA 4.0 violate Wikipedia's CC BY-SA 3.0 licensing? Creative Commons has a guide to upgrading to 4.0, which seems to imply that I can't just bump from 3.0 to 4.0, but I'm not sure. I may be able to dual-license under both 3.0 and 4.0? I welcome input here!

Feature request: Orchard Street word list for Dvorak and Colemak typists

I'm a Dvorak typist myself, so I would be especially interested in what such a word list would look like. Also knowing that Colemak has found a strong following of typists, it would be interesting to see that list as well.

Of course, both of us are niche typists compared to the larger typing world, so it's understandable if you make this request very low priority.
