Comments (14)

berzerk0 commented on May 20, 2024

Duplication - this is me getting caught by the classic invisible newline difference between Windows and Linux.
Rev 1.1 will have this fixed in the main files; the Chunk files will take longer.

WPA-formatted sources - I have found wordlists that include "WPA" in the title, but that isn't much of a guarantee that they come exclusively from router/wifi sources.

It is also possible (and equally not possible, as I am asserting this with zero evidence) that the trends for common passwords do not change dramatically whether they are used for a router or for an email address. It seems just as likely to me that people see it as a generic "password" rather than "the Wifi password."

I'll see if I can find some sources with more background, but I have doubts.

EDIT
Of course, today I went somewhere where the Guest Wifi password was "wireless guest"

from probable-wordlists.

iancnorden commented on May 20, 2024

PR from me shortly for de-dupe. Great work.

Miserlou commented on May 20, 2024

Good project though!

Would love to see a list of WPA-formatted passwords that come just from router/wifi sources, not user-passwords.

WiseNerd commented on May 20, 2024

An easy fix for the dupes that worked for me was issuing :%s/^M\+// in vim to kill the trailing blank space artifacts from Windows, and then issuing uniq -u passfile.txt > cleanpassfile.txt. Cool project.

 avatar commented on May 20, 2024

@WiseNerd So if you already fixed it, why not make a PR?

berzerk0 commented on May 20, 2024

@iancnorden You're gonna beat me to the punch!
I have the desktop chugging away, but won't be back to upload changes for a half day or so.

iancnorden commented on May 20, 2024

Now it's a race! I had not realized the size; git clone is still chugging away!

WiseNerd commented on May 20, 2024

@blobgo Well, my MacBook's limited DDR2 memory would be neutered by sanitizing that entire thing. I fixed a small part, mostly out of curiosity, but was hoping to save somebody some time nonetheless :)

iancnorden commented on May 20, 2024

De-dupes still running.

berzerk0 commented on May 20, 2024

Initial De-Dupes (up to ~30 Million Non-Spec and WPA) are done. It looks like I can't do the big ones in parallel, so they will probably be done by tomorrow.

Or so I thought, they didn't come out right.

@WiseNerd I was using

awk '!seen[$0]++' hasDupes > doesntHaveDupes 

which I assumed started at the top and worked its way down, but then for one of the files it popped "password" out of the 2nd slot. No way.

uniq 

only works if two lines are next to one another, unfortunately.

I might just have to compile again from sources, unless @iancnorden's experience comes up with a solid de-duping method.
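
To illustrate the difference between the two tools mentioned above, here is a minimal sketch. The `sample` function and its four-entry list are hypothetical, made up purely for this example. It shows that `uniq` leaves non-adjacent repeats in place, while the `awk '!seen[$0]++'` filter keeps the first occurrence of each line and preserves the original order.

```shell
# Hypothetical sample list: "password" appears twice, but not on adjacent lines.
sample() { printf '%s\n' 123456 password qwerty password; }

# uniq compares each line only with the line directly before it,
# so the second "password" survives and all four lines are printed.
sample | uniq

# awk prints a line only the first time it is seen (seen[$0] is 0),
# dropping later repeats while keeping the original order.
sample | awk '!seen[$0]++'
```

Sorting first (`sort -u`) would also remove the repeat, but at the cost of the frequency-based ordering of the list.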

iancnorden commented on May 20, 2024

Chewing on the folder with Top2Bill*

164/958 completed, started around 1400 eastern.

If curious, thanks to https://github.com/ltdenard ... and this will have to continue overnight at this rate.

for f in $(ls -lha . | tail -n+4 | awk '{print $10}'); do sort -u "${f}" > /tmp/tmp1 && mv /tmp/tmp1 "./${f}"; done

palexhorse commented on May 20, 2024

Can all unique combinations be put into a new file, or do you just want the duplicates removed?

berzerk0 commented on May 20, 2024

For Rev 1.1 we aim to just remove the duplicates while otherwise preserving order.
The "duplicates" are likely illusory: invisible newline characters probably make otherwise identical entries look different.
Removing them will have some effect on overall accuracy.

Rev 2.0 will have the newlines weeded out at the source, so this problem will not carry over.
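
As a rough sketch of the Rev 1.1 goal described above (not necessarily the method used for the release), the following assumes the invisible characters are Windows carriage returns at the end of lines. The helper name `dedupe_keep_order` and the sample input are hypothetical. It strips a trailing carriage return from each line and then keeps only the first occurrence, in the original order.

```shell
# Hypothetical helper: remove a trailing carriage return (\r) from each line,
# then print a line only the first time it is seen, preserving order.
dedupe_keep_order() {
  awk '{ sub(/\r$/, "") } !seen[$0]++'
}

# "password" with a Windows line ending (\r\n) and "password" with a Unix
# line ending (\n) collapse into a single entry.
printf 'password\r\n123456\npassword\n' | dedupe_keep_order
```

Because awk holds every distinct line in memory, this approach may struggle on the largest files.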

berzerk0 commented on May 20, 2024

De-Duped Rev 1.1 is live now, but does not contain the largest files.

Rev 1.2 will, in torrents with compression.

Closing this in light of the release of 1.1 and the impending release of 1.2.
