lemmeknow's Introduction

Hi there 👋, I'm Swanand (swanandx)

Hacker / Developer

_swanandx

struct Swanand;

impl Swanand {
    fn whoami() -> &'static str {
        "Hey 👋, I'm swanandx and I'm a hacker who loves to build cool stuff"
    }
}

impl Developer for Swanand {
    fn code() -> &'static str {
        r#"I build _blazingly fast_ tools for cyber security,
           create games, websites, apps etc.

           Languages - Rust, Python, C, C++, JS/TS, Assembly/WebAssembly
           Technologies - Actix, Yew, React, GoDot, Flutter
        "#
    }
}

impl Hacker for Swanand {
    fn hack() -> &'static str {
        r#"I love to hack on TryHackMe and play various CTFs.
           Apart from that, I make software to help others!
           
           RE & PWN is <3
        "#
    }
}

Support:

swanandx



lemmeknow's People

Contributors

0xvjay, br1ght0ne, interrrp, skeletaldemise, swanandx

lemmeknow's Issues

Add support for filtering output

The database has Rarity and Tags entries for each identification. We need to implement a filter so that users can filter the output by rarity and/or tags.

For example, lemmeknow --rarity 0.2:0.6 --tags Credentials TEXT.

Making it a module would be a nice idea 😄: src/output/filter.rs
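A minimal sketch of what such a filter module might look like, using only the standard library. The `Entry` and `Filter` types, their field names, and the "empty tag list means no tag filtering" rule are assumptions made for illustration, not lemmeknow's actual API; the `min:max` rarity format is taken from the example command above.

```rust
/// Hypothetical stand-in for one database entry; only the fields used here.
#[derive(Debug, Clone, PartialEq)]
pub struct Entry {
    pub name: String,
    pub rarity: f32,
    pub tags: Vec<String>,
}

/// Hypothetical filter: an inclusive rarity range plus an optional tag list.
#[derive(Debug, Clone, PartialEq)]
pub struct Filter {
    pub min_rarity: f32,
    pub max_rarity: f32,
    /// Assumption: an empty list means "do not filter on tags".
    pub tags: Vec<String>,
}

impl Filter {
    /// Parse a `min:max` string such as "0.2:0.6".
    /// Returns None for malformed input or when min > max.
    pub fn parse_rarity(range: &str) -> Option<(f32, f32)> {
        let (min, max) = range.split_once(':')?;
        let min: f32 = min.trim().parse().ok()?;
        let max: f32 = max.trim().parse().ok()?;
        if min <= max {
            Some((min, max))
        } else {
            None
        }
    }

    /// True when the entry is inside the rarity range and shares at least
    /// one tag with the filter (compared case-insensitively).
    pub fn matches(&self, entry: &Entry) -> bool {
        let rarity_ok = entry.rarity >= self.min_rarity && entry.rarity <= self.max_rarity;
        let tags_ok = self.tags.is_empty()
            || entry
                .tags
                .iter()
                .any(|t| self.tags.iter().any(|w| w.eq_ignore_ascii_case(t)));
        rarity_ok && tags_ok
    }
}

/// Keep only the entries accepted by the filter.
pub fn apply_filter(entries: Vec<Entry>, filter: &Filter) -> Vec<Entry> {
    entries.into_iter().filter(|e| filter.matches(e)).collect()
}
```

With `--rarity 0.2:0.6 --tags Credentials`, an entry with rarity 0.5 and the Credentials tag would be kept, while one with rarity 0.9 would be dropped.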

Phone number regex problem

Detecting phone numbers for every country is no easy task. Some Python libraries do a great job of it using a mix of regex and machine learning. The current implementation in lemmeknow produces a lot of obvious false positives:

[image: screenshot of the false positives]

rewrite regex without using look-around

The following regexes won't compile, because the regex crate doesn't support look-around.

If possible, can we rewrite them so that they don't use look-around?

  • Internet Protocol (IP) Address Version 6
  • Bitcoin (₿) Wallet Address
  • American Social Security Number
  • Date of Birth
  • JSON Web Token (JWT)
  • Amazon Web Services Access Key
  • Amazon Web Services Secret Access Key
  • YouTube Video ID

Update them in src/data/regex.json

PS: You can run the following command in the repo to see the exact syntax error for each regex (or just use the regex crate to compile them one by one):

cargo test validate_regex_examples -- --show-output

Use languages that compile to RegEx for regexes

Would it make sense to use languages like Pomsky and Melody to compile the regexes? I think it would make the list of regexes easier to maintain, easier to contribute to, and probably also easier to test and read.

calculate min and max length for regex, if any

A TryHackMe flag has a minimum length of 3 (as it must have thm in it) and no maximum.

A YouTube Video ID is exactly 11 characters long (n92YrzELBJU).

We need a list with this information for every regex:

  • if there is no fixed length, put *
  • if there is a min but no max, use 3-*
  • if it can only be 8 or 10, use 8/10
  • if the exact length is 11, use 11

Many regexes identify text that has a fixed size range. If we filter on length first, we might be able to optimize our algorithm.
Suggestions for other ways to optimize are welcome.
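The four notations above could be parsed and applied roughly as follows. This is only a sketch: `LengthSpec`, `parse_length_spec`, and `allows` are hypothetical names, and treating an exact length as a one-element list is a simplification chosen here.

```rust
/// Hypothetical parsed form of the length notation proposed above.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum LengthSpec {
    /// "*": no length restriction.
    Any,
    /// "3-*": a minimum but no maximum.
    AtLeast(usize),
    /// "8/10" or "11": one of a fixed set of lengths.
    OneOf(Vec<usize>),
}

/// Parse "*", "N-*", "A/B/..." or "N". Returns None for anything else.
pub fn parse_length_spec(spec: &str) -> Option<LengthSpec> {
    let spec = spec.trim();
    if spec == "*" {
        return Some(LengthSpec::Any);
    }
    if let Some(min) = spec.strip_suffix("-*") {
        return min.parse::<usize>().ok().map(LengthSpec::AtLeast);
    }
    // Every '/'-separated part must be a number, otherwise the spec is rejected.
    let lengths: Option<Vec<usize>> = spec
        .split('/')
        .map(|part| part.parse::<usize>().ok())
        .collect();
    lengths.map(LengthSpec::OneOf)
}

impl LengthSpec {
    /// Cheap pre-check: can a text of this length possibly match the regex?
    pub fn allows(&self, len: usize) -> bool {
        match self {
            LengthSpec::Any => true,
            LengthSpec::AtLeast(min) => len >= *min,
            LengthSpec::OneOf(lengths) => lengths.contains(&len),
        }
    }
}
```

The idea would be to call `allows(text.len())` before running a regex and skip the regex entirely when it returns false. Whether that actually saves time would need to be benchmarked.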

typo in regex.json

"Exploit": "There is a change this could be a Google Maps API key, so could try using 'gmapapiscanner'[1] or 'gap'[2]\nto check which Google Maps service it is valid for and generate a PoC that you can show in your report. To\nget a better understanding on the severity of having the Google Maps API key exposed, make sure to to to\nread \"Unauthorized Google Maps API Key Usage Cases, and Why You Need to Care\"[3] written by Ozgur Alp (@ozguralp)\n\nReferences:\n [1] https://github.com/ozguralp/gmapsapiscanner\n [2] https://github.com/joanbono/gap\n [3] https://ozguralp.medium.com/unauthorized-google-maps-api-key-usage-cases-and-why-you-need-to-care-1ccb28bf21e\n\nAPI Documentation: https://developers.google.com/maps/documentation/javascript/get-api-key",

change → chance

There are no unit tests

I am trying to work out whether my program or LemmeKnow is broken. There are no unit tests in LemmeKnow, so I cannot prove that it works. Please add unit tests for the API :)

Use `&str` for tags

    /// Only include the Data which have at least one of the specified `tags`
    pub tags: Vec<String>,
    /// Only include Data which doesn't have any of the `excluded_tags`
    pub exclude_tags: Vec<String>,

String is a growable buffer. We do not expect the tags to grow, so we should use &str instead.
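A rough sketch of what the borrowed version might look like. `TagFilter` and `accepts` are hypothetical names, and the rule that an empty `tags` list accepts everything is an assumption, not something the quoted code confirms.

```rust
/// Hypothetical borrowed variant of the two fields quoted above.
/// The lifetime 'a ties the filter to the strings it borrows.
#[derive(Debug, Default, Clone)]
pub struct TagFilter<'a> {
    /// Only include the Data which have at least one of the specified `tags`
    pub tags: Vec<&'a str>,
    /// Only include Data which doesn't have any of the `exclude_tags`
    pub exclude_tags: Vec<&'a str>,
}

impl<'a> TagFilter<'a> {
    /// Assumption: an empty `tags` list places no restriction on inclusion.
    pub fn accepts(&self, data_tags: &[&str]) -> bool {
        let included =
            self.tags.is_empty() || data_tags.iter().any(|t| self.tags.contains(t));
        let excluded = data_tags.iter().any(|t| self.exclude_tags.contains(t));
        included && !excluded
    }
}
```

The trade-off is that the struct now carries a lifetime parameter, so callers must keep the tag strings alive for as long as the filter is in use.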

benchmark latest code with v0.8

As regex 1.9 speeds up the use of regexes in threads, this should boost our performance as well.

We need to benchmark the latest lemmeknow against v0.8.

need some tests to validate the regexes from JSON file

The regex.json file also has Examples for some regexes:

{
      "Name": "Capture The Flag (CTF) Flag",
      "Regex": "(?i)^(flag\\{.*\\}|ctf\\{.*\\}|ctfa\\{.*\\})$",
      "plural_name": false,
      "Description": null,
      "Rarity": 1,
      "URL": null,
      "Tags": [
         "CTF Flag"
      ],
      "Examples": {
         "Valid": [
            "FLAG{hello}"
         ],
         "Invalid": []
      }
   },

We need to check if the regex is matching those examples correctly to validate it! For that, we can create a file under tests like lemmeknow/tests/validate_regexes.rs and just parse the JSON file and validate it there.
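The core of such a test could be sketched as below. To keep the example free of external crates, the matcher is passed in as a closure; in a real test it would presumably wrap a compiled regex, and the entries would be parsed from regex.json. `Examples` and `check_examples` are hypothetical names.

```rust
/// Hypothetical mirror of the "Examples" object shown above.
pub struct Examples<'a> {
    pub valid: &'a [&'a str],
    pub invalid: &'a [&'a str],
}

/// Run every example through `is_match` and describe each wrong result.
/// An empty return value means the regex agrees with all of its examples.
pub fn check_examples<F>(name: &str, examples: &Examples, is_match: F) -> Vec<String>
where
    F: Fn(&str) -> bool,
{
    let mut failures = Vec::new();
    for &text in examples.valid {
        if !is_match(text) {
            failures.push(format!("{}: expected a match for {:?}", name, text));
        }
    }
    for &text in examples.invalid {
        if is_match(text) {
            failures.push(format!("{}: unexpected match for {:?}", name, text));
        }
    }
    failures
}
```

Collecting all failures instead of panicking on the first one makes the test output more useful when several regexes are broken at once.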

Allow querying the online lemmeknow by URL

When opening the lemmeknow webpage with a URL such as this:
https://swanandx.github.io/lemmeknow-frontend/?q=search+term
It should use this as input and try to figure out what the search term could be.

Why?

Browser Search engines.

With this feature you could register lemmeknow as a search engine for your browser. You could (for example) use the alias lmk to search lemmeknow.

Then typing lmk dQw4w9WgXcQ would lead you to find out what the ID stands for.
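Reading the search term from the URL might look something like this. It is a standard-library-only sketch with hypothetical function names; a real frontend would more likely rely on the browser's own URL parsing.

```rust
/// Value of a single hex digit, or None if the byte is not one.
fn hex_val(byte: u8) -> Option<u8> {
    (byte as char).to_digit(16).map(|d| d as u8)
}

/// Decode one query-string component: `+` becomes a space and `%XX`
/// becomes the byte with hex value XX. Malformed escapes are kept as-is.
fn decode_component(raw: &str) -> String {
    let bytes = raw.as_bytes();
    let mut out = Vec::with_capacity(bytes.len());
    let mut i = 0;
    while i < bytes.len() {
        match bytes[i] {
            b'+' => {
                out.push(b' ');
                i += 1;
            }
            b'%' if i + 2 < bytes.len() => {
                match (hex_val(bytes[i + 1]), hex_val(bytes[i + 2])) {
                    (Some(hi), Some(lo)) => {
                        out.push(hi * 16 + lo);
                        i += 3;
                    }
                    _ => {
                        out.push(b'%');
                        i += 1;
                    }
                }
            }
            other => {
                out.push(other);
                i += 1;
            }
        }
    }
    String::from_utf8_lossy(&out).into_owned()
}

/// Return the decoded value of the first `key=value` pair in the URL's
/// query string, ignoring any `#fragment`.
pub fn query_param(url: &str, key: &str) -> Option<String> {
    let (_, rest) = url.split_once('?')?;
    let query = rest.split('#').next().unwrap_or(rest);
    query.split('&').find_map(|pair| {
        let (k, v) = pair.split_once('=').unwrap_or((pair, ""));
        if decode_component(k) == key {
            Some(decode_component(v))
        } else {
            None
        }
    })
}
```

The decoded value of `q` would then be used as the input text, exactly as if the user had typed it into the page.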

proposal: use `SmallVec` instead of `Vec` for buffer

The idea is to use smallvec for the buffer while extracting strings from a file.
We only consider strings that are longer than 4 characters, so for the other strings, which we are going to reject anyway, we can avoid the heap allocation caused by the buffer vector.

let mut buffer: SmallVec<[u8; 4]> = smallvec![]; // TODO: change 4 to more optimal number

So, what to do?

  • Change Vec to SmallVec
  • Experiment with different sizes (at least 4 would be good imho, but we should try others using a quasi-doubling strategy, i.e. 4, 8, 16, 32)
  • Benchmark the code to see if it actually improves performance.
  • Choose the one with the best performance (post the results here or when making the PR)

It would be amazing if you could post benchmarks for all sizes, so we can choose the most optimal one.
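To illustrate the idea without pulling in the smallvec crate, here is a simplified standard-library stand-in. `InlineBuf` and `extract_strings` are hypothetical, and this is not how SmallVec is implemented (SmallVec moves all of its contents to the heap once it overflows, whereas this sketch only spills the extra bytes). It does show the same effect: short strings that will be rejected never touch the heap.

```rust
/// Hypothetical buffer: the first N bytes are stored inline, and only the
/// bytes beyond that go into a Vec. `Vec::new()` itself does not allocate.
pub struct InlineBuf<const N: usize> {
    inline: [u8; N],
    inline_len: usize,
    spill: Vec<u8>,
}

impl<const N: usize> InlineBuf<N> {
    pub fn new() -> Self {
        InlineBuf { inline: [0; N], inline_len: 0, spill: Vec::new() }
    }

    pub fn push(&mut self, byte: u8) {
        if self.inline_len < N {
            self.inline[self.inline_len] = byte;
            self.inline_len += 1;
        } else {
            self.spill.push(byte);
        }
    }

    pub fn len(&self) -> usize {
        self.inline_len + self.spill.len()
    }

    /// True once at least one byte had to go to the heap-backed Vec.
    pub fn spilled(&self) -> bool {
        !self.spill.is_empty()
    }

    pub fn clear(&mut self) {
        self.inline_len = 0;
        self.spill.clear();
    }

    pub fn to_string_lossy(&self) -> String {
        let mut bytes = self.inline[..self.inline_len].to_vec();
        bytes.extend_from_slice(&self.spill);
        String::from_utf8_lossy(&bytes).into_owned()
    }
}

/// Collect runs of printable ASCII that are at least `min_len` bytes long.
pub fn extract_strings<const N: usize>(data: &[u8], min_len: usize) -> Vec<String> {
    let mut found = Vec::new();
    let mut buf = InlineBuf::<N>::new();
    for &byte in data {
        if byte.is_ascii_graphic() || byte == b' ' {
            buf.push(byte);
        } else {
            if buf.len() >= min_len {
                found.push(buf.to_string_lossy());
            }
            buf.clear();
        }
    }
    // Flush a trailing run that was not followed by a separator byte.
    if buf.len() >= min_len {
        found.push(buf.to_string_lossy());
    }
    found
}
```

Calling `extract_strings::<4>(data, 5)`, `extract_strings::<8>(data, 5)` and so on corresponds to trying the different inline sizes listed above; only a benchmark can tell which one wins.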

Configure release profile

All we need to do is add the following at the end of our Cargo.toml file:

[profile.release]
lto = true
panic = "abort"

That is it!

Add Doctests

There are currently no doctests for the API, and I find it confusing to read :( It'd be lovely to have some!

Unable to identify base64

Hi, I'm trying lemmeknow and I have to say that it works quite well with links (e.g. YouTube channels, wallets and so on), but it fails to detect the easiest things. For example, it is unable to recognize Base64-encoded text.

Nice work though.

Show Exploits in cli output

We have an Exploit field for some identifications. It would be great to show it when the user passes -v, i.e. the verbose flag, on the CLI.

{
      "Name": "Mailchimp API Key",
      ...
      "Description": null,
      "Exploit": "Use the command below to verify that the API key is valid (substitute <dc> for your datacenter, i. e. us5):\n  $ curl --request GET --url 'https://<dc>.api.mailchimp.com/3.0/' --user 'anystring:API_KEY_HERE' --include\n",
      "Rarity": 0.8,
      "URL": null,
     ...
   },
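One possible shape for this, as a sketch only: the `Identification` struct, the `render` function, and the output layout are all hypothetical and do not reflect lemmeknow's real output code.

```rust
/// Hypothetical view of one match; field names follow the JSON keys above.
pub struct Identification<'a> {
    pub name: &'a str,
    pub description: Option<&'a str>,
    pub exploit: Option<&'a str>,
}

/// Build the text for one identification. The Exploit line is only
/// included when `verbose` is true and the entry actually has one.
pub fn render(id: &Identification, verbose: bool) -> String {
    let mut out = format!("Identified as: {}", id.name);
    if let Some(description) = id.description {
        out.push_str(&format!("\nDescription: {}", description));
    }
    if verbose {
        if let Some(exploit) = id.exploit {
            out.push_str(&format!("\nExploit: {}", exploit));
        }
    }
    out
}
```

Keeping the exploit text behind the verbose flag leaves the default output compact, since some Exploit entries run to several lines.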

use `onig` crate for matching on strings

We can use the onig crate instead of the regex crate for the strings API. We can't fully replace the regex crate, as onig doesn't provide a way to match on bytes, nor does it compile for wasm32 (it might, but that would be a lot of work).

Suggestion

  • Use onig for matching on strings ( not on wasm32 target )
  • Use regex for bytes and strings for wasm32

Pros

  • onig takes performance from 130ms to 28ms!! It's blazingly fast

Cons

  • Variance in performance for matching on strings and bytes for API users, because we have no other choice than using regex crate for bytes
  • Extra dependency

Workaround

  • We can add a feature called bytes to enable bytes support. That way users explicitly opt in to adding the regex crate as a dependency and are aware of the (comparatively) slower performance.

regex vs onig crate benchmark!

[image: regex vs onig crate benchmark]

Identifies JWT as LTC/Ripple/BCH-Wallet-Address

Input: eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJzdWIiOiIxMjM0NTY3ODkwIiwibmFtZSI6IkpvaG4gRG9lIiwiaWF0IjoxNTE2MjM5MDIyfQ.SflKxwRJSMeKKF2QT4fwpMeJf36POk6yJV_adQssw5c

Output:
Found Possible Identifications :)

Matched text | Identified as | Description
eyJhbGciOiJIUzI1..._adQssw5c | Litecoin (LTC) Wallet Address | URL: https://live.blockcypher.com/ltc/address/eyJhbGciOiJIUzI1..._adQssw5c

Expected "Identified as" would be JWT.
Shortened the JWT for better visibility.


Another example:

Input: eyJhbGciOiJIUzI1NiJ9.eyJmb28iOiJiYXIifQ.JTvQIxZOL_-00JdKfTAEmhV-a6KUlB6OUWM8NuN7MN8
Output: No Possible Identifications :(

Expected "Identified as" would be JWT.
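For comparison, a simple structural check (not a regex, and not lemmeknow's actual matching logic) accepts both inputs above. It assumes a compact JWT in its common signed form: three non-empty base64url segments separated by dots, with a header that starts with eyJ, which is how a JSON object opening with `{"` followed by a letter encodes. `looks_like_jwt` is a hypothetical helper name.

```rust
/// True when the segment is non-empty and uses only the unpadded
/// base64url alphabet: A-Z, a-z, 0-9, '-' and '_'.
fn is_base64url_segment(segment: &str) -> bool {
    !segment.is_empty()
        && segment
            .bytes()
            .all(|b| b.is_ascii_alphanumeric() || b == b'-' || b == b'_')
}

/// Heuristic check for a compact signed JWT: header.payload.signature.
/// Assumption: unsigned tokens with an empty signature are not accepted.
pub fn looks_like_jwt(text: &str) -> bool {
    let parts: Vec<&str> = text.split('.').collect();
    parts.len() == 3
        && parts.iter().all(|part| is_base64url_segment(part))
        && parts[0].starts_with("eyJ")
}
```

Wallet addresses contain no dots, so a check like this would not confuse them with a JWT. It only inspects the shape of the token and does not decode or verify it.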

Add some benchmarks!

This project uses the same regex database as PyWhat, so we need some performance benchmarks against it <3!

  • For identifying single text
  • For analyzing strings from a file
  • Calling the function multiple times through the API, i.e. lemmeknow::what_is("text here") for lemmeknow and Identifier.identify("text") for PyWhat.

API documentation is available here 😸 :-
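For the third case, a very small timing helper built on the standard library could serve as a starting point. `time_calls` and `average` are hypothetical names; a dedicated benchmarking crate would give more reliable numbers, so treat this as a rough sketch.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Call `f` `iterations` times and return the total elapsed time.
/// `black_box` discourages the compiler from optimizing the call away.
pub fn time_calls<F, R>(iterations: u32, mut f: F) -> Duration
where
    F: FnMut() -> R,
{
    let start = Instant::now();
    for _ in 0..iterations {
        black_box(f());
    }
    start.elapsed()
}

/// Average time per call; guards against division by zero.
pub fn average(total: Duration, iterations: u32) -> Duration {
    total / iterations.max(1)
}
```

The closure passed to `time_calls` would wrap the call being measured, for example the lemmeknow::what_is("text here") call mentioned above, and the same number of iterations would be used on the PyWhat side.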
