rephraser's Introduction

RePhraser

A Python-based reimagining of Phraser using Markov-chains for linguistically-correct password cracking.

In its current state, it can easily kick out more than 3 million candidates per second on a modern quad-core CPU. In terms of hit rate, combined with travco_princev8.rule, a single GPU cracked 45 passwords in under an hour that no other attack had cracked. But, you know, it is definitely still in development. :)

The idea behind linguistic cracking is that people forced to adhere to long password requirements (e.g. a 15-character minimum) commonly create passphrases out of related words. More specifically, they pick words that are in a logical order, often choosing sentences or sentence fragments. By training a model to observe which words generally follow other words in a source text, one can produce candidates that make some amount of linguistic sense and better mimic human choices. This drastically lowers the number of possible choices and the time it takes to crack four- and five-word phrases, not to mention making even longer phrases feasible to enumerate.
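
To make the "what words generally follow other words" idea concrete, here is a minimal sketch in plain Python. It is not RePhraser's code: it is a first-order chain (one word of context) built with the standard library only, and the names `train`, `candidates` and the `BEGIN` marker are assumptions made for the illustration. RePhraser itself builds its model with markovify and stores it in keyvi.

```python
from collections import Counter, defaultdict

BEGIN = "___BEGIN__"  # hypothetical start-of-sentence marker for this sketch


def train(sentences):
    """Count, for every word, which words follow it in the training text."""
    follows = defaultdict(Counter)
    for sentence in sentences:
        prev = BEGIN
        for word in sentence.lower().split():
            follows[prev][word] += 1
            prev = word
    return follows


def candidates(follows, length, state=BEGIN, prefix=()):
    """Yield phrases of `length` words, trying the most frequent successors first."""
    if length == 0:
        yield " ".join(prefix)
        return
    # Words that were never followed by anything produce no candidates.
    for word, _count in follows.get(state, Counter()).most_common():
        yield from candidates(follows, length - 1, word, prefix + (word,))


corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the dog sat on the rug",
]
model = train(corpus)
for phrase in candidates(model, 3):
    print(phrase)
```

Only word sequences that were actually observed in the corpus are ever emitted, which is where the reduction in search space comes from.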

Usage:

As with most sane programs, python3 rephraser.py -h will list arguments it can take.

  1. Get a corpus of sentences (or just a large text, like a text-format ebook) to work with. I recommend choosing a file from https://wortschatz.uni-leipzig.de/en/download/ and extracting the "sentences" file inside it. There should only be sentences in that file before you give it to RePhraser, so remove the line numbers first: sed -r 's/^[0-9]+[[:blank:]]*//' FILE

  2. Then call RePhraser on it, piping the output to hashcat (I hope you have sufficient RAM; Markov models are memory-heavy to generate). E.g.:
     python3 rephraser.py --corpus sentences-eng_wikipedia_2016_1M --model ./wiki1M_model.keyvi | hashcat -m 1000 -O -D 2 -w 4 --status --status-timer=60 --username -r ./rephraserBasic8.rule -r ~/travco_princev8.rule ./NTLMHashfile

  3. Profit
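
If sed is not at hand, the line-number cleanup from step 1 can be sketched in Python. This is only an equivalent of the sed expression above under the assumption that each line starts with a number followed by spaces or tabs; the function name `strip_line_numbers` is made up for the example.

```python
import re

# Same pattern as the sed expression in step 1: a run of digits at the start
# of the line, followed by any number of spaces or tabs.
LINE_NUMBER = re.compile(r"^[0-9]+[ \t]*")


def strip_line_numbers(lines):
    """Yield each non-empty line with its leading line number removed."""
    for line in lines:
        cleaned = LINE_NUMBER.sub("", line).strip()
        if cleaned:
            yield cleaned


sample = [
    "1\tThe quick brown fox.",
    "2\tJumps over the lazy dog.",
    "",
    "37   No tab here.",
]
print(list(strip_line_numbers(sample)))
```

To clean a real file, pass an open file object instead of `sample` and write the yielded lines to a new file. Like the sed version, this also strips digits from a sentence that itself begins with a number if the file has no line numbers to begin with.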

For future runs, you can reuse the Markov chain model RePhraser had to make (saved to the path specified with --model): call rephraser with just --model ./wiki1M_model.keyvi. Don't worry, the finished/compiled models are a fraction (as little as 1/40th) of the size the model takes up while it is being constructed in markovify and compiled into a keyvi dictionary.

Two paragraphs to use in Company Policy / Password Guidelines:

Passphrases that are easy to remember, but exceedingly difficult to crack, are unlikely or illogical phrases of five or more words. Adding special characters, numbers or other complexity, although not required, is better done between or inside words (dozens of possible positions) instead of only at the beginning or end of a passphrase.

Combinations of thematically-unrelated words like:

"diabetic unicorn TRAPPED chicago tunnel"

can be easy to remember if you say them to yourself as a sentence ("A diabetic unicorn is trapped in a Chicago tunnel"), and they force an attacker to have a large dictionary with multiple types of words (cities, structures, fantasy creatures, verbs, medical conditions) to even have a chance of producing your password. Using words that wouldn't typically come after each other in a sentence also forces an attacker to use brute force to guess combinations of words. Although it may seem complex, a logical phrase like "Docking With The International Space Station" (six fairly uncommon words, right? holy cow!) is much simpler to crack for an attacker using a natural language model, because "Docking" and "Space Station" are thematically related and very likely to appear together in a sentence. If an attacker is using "Docking" as the first word (through pure brute force) to generate a phrase, the rest of the words are a likely, if not the most likely, combination to follow it.
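
To put rough numbers on the argument above, here is a back-of-the-envelope sketch. The vocabulary size (20,000 words) and the number of likely successors per word (50) are assumptions picked purely for illustration, not measurements; the rate of 3 million candidates per second is the generation figure quoted in the introduction.

```python
def brute_force_space(vocabulary, words):
    """Every word position can hold any word from the dictionary."""
    return vocabulary ** words


def guided_space(vocabulary, branching, words):
    """Any first word, then only `branching` likely successors per position."""
    return vocabulary * branching ** (words - 1)


def hours(candidates, rate_per_second):
    """Time to enumerate `candidates` at a fixed generation rate."""
    return candidates / rate_per_second / 3600


VOCABULARY = 20_000   # assumed dictionary size
BRANCHING = 50        # assumed number of likely next words per word
RATE = 3_000_000      # candidates per second, from the introduction

for n in (4, 5):
    unrelated = hours(brute_force_space(VOCABULARY, n), RATE)
    logical = hours(guided_space(VOCABULARY, BRANCHING, n), RATE)
    print(f"{n} words: unrelated {unrelated:.3g} h, logical {logical:.3g} h")
```

Under these assumptions a logical phrase is cheaper to enumerate by a factor of (20,000 / 50) for every word after the first, which is why unrelated words are the better choice.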

Dependencies:

  • Markovify (pip3 install markovify) for initial creation and compilation of Markov chains
  • Keyvi (pip3 install keyvi) for efficient storage and shared-memory access by multiple processes
    • cmake (apt package is usually cmake)
    • libboost (apt package is usually libboost-all-dev)
    • snappy (apt package is usually libsnappy-dev)
  • Python 3.5+ is strongly recommended (keyvi dropped support for Python 2.7 and 3.4)

Other files in this repo:

  • rephraserBasic8.rule: eight hashcat rules that alter rephraser's output into different common patterns of capitalization and spacing
  • travco_princev8.rule: manually written hashcat ruleset for use against passphrases (4000+ rules)
  • commonprefixes_sorted_50k.rule: ruleset made by running zxcvbn-python on the 37+ million passphrases of 15 or more characters in the haveibeenpwnedv2 list, sorted with the most common prefixes first
  • commonsuffixes_sorted_50k.rule: ruleset made by running zxcvbn-python on the 37+ million passphrases of 15 or more characters in the haveibeenpwnedv2 list, sorted with the most common suffixes first
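
For a feel of what "patterns of capitalization and spacing" means, here is a small Python sketch of that kind of transformation. It does not reproduce the contents of rephraserBasic8.rule; the six forms below are guesses at typical patterns, and the function name `variations` is invented for the example. In practice hashcat applies the real rules to each candidate on the GPU side.

```python
def variations(phrase):
    """Return some common case/spacing variants of a lowercase phrase."""
    words = phrase.split()
    forms = [
        " ".join(words),                          # as generated
        "".join(words),                           # spaces removed
        " ".join(w.capitalize() for w in words),  # Title Case
        "".join(w.capitalize() for w in words),   # TitleCase, no spaces
        " ".join(words).capitalize(),             # first letter only
        " ".join(words).upper(),                  # ALL CAPS
    ]
    # dict.fromkeys drops duplicates while keeping the original order.
    return list(dict.fromkeys(forms))


for form in variations("docking with the space station"):
    print(form)
```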

rephraser's People

Contributors

travco, ysaxon

rephraser's Issues

Keyvi changed the layout of their code.

This diff was enough to get it working for me.

index 3d47b35..5862a38 100644
--- a/rephraser.py
+++ b/rephraser.py
@@ -1,5 +1,7 @@
 import markovify
 import keyvi
+import keyvi.compiler
+import keyvi.dictionary
 import os
 import sys
 from signal import signal, SIGINT, SIG_IGN
@@ -168,32 +170,32 @@ if __name__ == '__main__':
               combined_model = mmodel
       del mmodel
       combined_model.compile(inplace = True)
-      keyvicompiler = keyvi.JsonDictionaryCompiler()
+      keyvicompiler = keyvi.compiler.JsonDictionaryCompiler()
       for key in combined_model.chain.model:
         keyvicompiler.Add(' '.join(key), json.dumps(combined_model.chain.model[key]))
       del combined_model
       keyvicompiler.Compile()
       keyvicompiler.WriteToFile(args.model)
       del keyvicompiler
-      dct = keyvi.Dictionary(args.model)
+      dct = keyvi.dictionary.Dictionary(args.model)
     elif os.path.isfile(args.corpus):
       # Load single-file corpus from --corpus
       with open(args.corpus) as f:
         mmodel = markovify.Text(f, retain_original=False, state_size=args.ngrams)
       mmodel.compile(inplace = True)
-      keyvicompiler = keyvi.JsonDictionaryCompiler()
+      keyvicompiler = keyvi.compiler.JsonDictionaryCompiler()
       for key in mmodel.chain.model:
         keyvicompiler.Add(' '.join(key), json.dumps(mmodel.chain.model[key]))
       del mmodel
       keyvicompiler.Compile()
       keyvicompiler.WriteToFile(args.model)
       del keyvicompiler
-      dct = keyvi.Dictionary(args.model)
+      dct = keyvi.dictionary.Dictionary(args.model)
   elif args.model != '':
     # Load a saved model in a keyvi file
     if os.path.isfile(args.model):
       with open(args.model) as f:
-        dct = keyvi.Dictionary(args.model)
+        dct = keyvi.dictionary.Dictionary(args.model)
     else:
       sys.stderr.write('[REPHRASER] Couldn\'t find model at ' + args.model + '\n[REPHRASER] Exiting!\n')
       sys.exit(1)
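
If the script has to run against both the old and the new keyvi layout, one option is to look the classes up at runtime instead of hard-coding either import path. The following is a sketch, not part of the diff above: `first_available` and `load_keyvi_classes` are hypothetical helper names, and the only keyvi names used are the ones that already appear in the diff.

```python
import importlib


def first_available(candidates):
    """Return the first attribute importable from a list of (module, name) pairs."""
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing in this installation, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError("none of these could be imported: %r" % (candidates,))


def load_keyvi_classes():
    """Find the compiler and dictionary classes under either keyvi layout."""
    compiler_cls = first_available([
        ("keyvi.compiler", "JsonDictionaryCompiler"),  # newer layout, per the diff
        ("keyvi", "JsonDictionaryCompiler"),           # older layout
    ])
    dictionary_cls = first_available([
        ("keyvi.dictionary", "Dictionary"),            # newer layout, per the diff
        ("keyvi", "Dictionary"),                       # older layout
    ])
    return compiler_cls, dictionary_cls
```

With this in place, the call sites would use the returned classes (for example `compiler_cls()` and `dictionary_cls(args.model)`) instead of `keyvi.JsonDictionaryCompiler()` and `keyvi.Dictionary(args.model)`.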

Clarification on Command to execute - No password candidates received in stdin mode, aborting..

python3 rephraser.py --corpus sentences-eng_wikipedia_2016_1M --model ./wiki1M_model.keyvi | hashcat -m 1000 -O -D 2 -w 4 --status --status-timer=60 --username -r ./rephraserBasic8.rule -r ~/travco_princev8.rule ./NTLMHashfile

I ran the above command after adding the sentences file as you mentioned and removing the line numbers (I also had to remove the first lines, which started with numerical values).

I notice that there is no attack mode. I'm assuming "-a 0", as you are using rules? (I guess there is no need if using stdin?)

When running the command, no words ever get passed to hashcat, and therefore I receive the message:

"No password candidates received in stdin mode, aborting..."

Can you confirm please?

Traceback shows:
Traceback (most recent call last):
  File "/opt/rephraser/rephraser.py", line 281, in <module>
    traverselikely(mpqueue, (BEGIN,BEGIN), args.words, args.batchdepth, [])
  File "/opt/rephraser/rephraser.py", line 135, in traverselikely
    traverselikely(mpqueue, nextstate, depthremaining - 1, batchdepth, prefix + [nextword])
  File "/opt/rephraser/rephraser.py", line 111, in traverselikely
    cstate_model = dct[' '.join(state)].GetValue()
TypeError: sequence item 1: expected str instance, bytes found
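
The TypeError itself says that the state tuple being joined on line 111 contains a bytes object where a str is expected. The traceback does not show where the bytes come from, so the following is only a sketch of the failure and of one possible workaround, not a confirmed fix: `as_text` and `state_key` are hypothetical helpers, and UTF-8 is an assumed encoding.

```python
def as_text(item, encoding="utf-8"):
    """Decode bytes to str; leave str values untouched."""
    return item.decode(encoding) if isinstance(item, bytes) else item


def state_key(state):
    """Build the space-joined lookup key from a state tuple of words."""
    return " ".join(as_text(word) for word in state)


mixed_state = ("___BEGIN__", b"the")  # hypothetical state holding one bytes word

try:
    " ".join(mixed_state)             # same kind of call as on line 111
except TypeError as err:
    print("join failed:", err)

print(state_key(mixed_state))         # works once every item is a str
```

Decoding at the point where the next word is read from the model, rather than at every join, would probably be the cleaner place for such a change.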
