ioc-extractor's Introduction

IoC extractor

IoC extractor is an npm package for extracting common IoCs (Indicators of Compromise) from a block of text.

Note: the package is highly influenced by cacador.

Installation

npm install -g ioc-extractor
# or if you want to use ioc-extractor as a library in your JS/TS project
npm install ioc-extractor

Usage

As a CLI

$ ioc-extractor --help
Usage: ioc-extractor [options]

Options:
  -ns, --no-strict  Disable strict option
  -nr, --no-refang  Disable refang option
  -p, --punycode    Enable punycode option
  -h, --help        display help for command
$ echo "1.1.1.1 8.8.8.8 example.com" | ioc-extractor | jq
{
  "asns": [],
  "btcs": [],
  "cves": [],
  "domains": [
    "example.com"
  ],
  "emails": [],
  "eths": [],
  "gaPubIDs": [],
  "gaTrackIDs": [],
  "ipv4s": [
    "1.1.1.1",
    "8.8.8.8"
  ],
  "ipv6s": [],
  "macAddresses": [],
  "md5s": [],
  "sha1s": [],
  "sha256s": [],
  "sha512s": [],
  "ssdeeps": [],
  "urls": [],
  "xmrs": []
}

As a library

import { extractIOC } from "ioc-extractor";

const input = "1.1.1[.]1 google(.)com f6f8179ac71eaabff12b8c024342109b";
const ioc = extractIOC(input);
console.log(ioc.md5s);
// => ['f6f8179ac71eaabff12b8c024342109b']
console.log(ioc.ipv4s);
// => ['1.1.1.1']
console.log(ioc.domains);
// => ['google.com']

extractIOC takes the options described in the Options section below.

If you want to extract a specific type of IoC, you can use the corresponding extract function.

import {
  refang,
  extractDomains,
  extractIPv4s,
  extractMD5s,
} from "ioc-extractor";

const input = "1.1.1[.]1 google(.)com f6f8179ac71eaabff12b8c024342109b";
const refanged = refang(input);
// => 1.1.1.1 google.com f6f8179ac71eaabff12b8c024342109b

const ipv4s = extractIPv4s(refanged);
// => ['1.1.1.1']

const domains = extractDomains(refanged);
// => ['google.com']

const md5s = extractMD5s(refanged);
// => ['f6f8179ac71eaabff12b8c024342109b']

Network-related extract functions (e.g. extractDomains) also accept options; see the Options section below.

See the docs for more details.

IoC Types

This package supports the following IoCs:

  • Hashes: MD5, SHA1, SHA256, SHA512, SSDEEP
  • Networks: domain, email, IPv4, IPv6, URL, ASN
  • Hardware: MAC address
  • Utilities: CVE (CVE ID)
  • Cryptocurrencies: BTC (BTC address), ETH (ETH address), XMR (XMR address)
  • Trackers: GA track ID (Google Analytics tracking ID), GA pub ID (Google AdSense Publisher ID)

Refang Techniques

For Networks IoCs, the following refang techniques are supported:

| Technique | Defanged | Refanged |
| --- | --- | --- |
| . in spaces | 1.1.1 . 1 | 1.1.1.1 |
| . in brackets, parentheses, etc. | 1.1.1[.]1 | 1.1.1.1 |
| dot in brackets, parentheses, etc. | example[dot]com | example.com |
| Backslash before . | example\.com | example.com |
| / in brackets, parentheses, etc. | http://example.com[/]path | http://example.com/path |
| :// in brackets, parentheses, etc. | http[://]example.com | http://example.com |
| : in brackets, parentheses, etc. | http[:]//example.com | http://example.com |
| @ in brackets, parentheses, etc. | test[@]example.com | test@example.com |
| at in brackets, parentheses, etc. | test[at]example.com | test@example.com |
| hxxp | hxxps://example.com | https://example.com |
| Partial | 1.1.1[.1 | 1.1.1.1 |
| Any combination | hxxps[:]//test\.example[.)com[/]path | https://test.example.com/path |
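
To make the techniques above concrete, here is a minimal sketch of how such a refang step could be written with plain regular expressions. The function name refangSketch and every pattern in it are illustrative assumptions; this is not the package's actual implementation and it covers only the listed cases in a simplified way.

```typescript
// Illustrative refang sketch: NOT the package's implementation.
// Each replace handles one technique from the list above with a simplified pattern.
function refangSketch(text: string): string {
  return text
    .replace(/\bhxxp/gi, "http") // hxxp(s) -> http(s)
    .replace(/[\[({]:\/\/[\])}]/g, "://") // http[://]example.com
    .replace(/[\[({]:[\])}]/g, ":") // http[:]//example.com
    .replace(/[\[({]\/[\])}]/g, "/") // example.com[/]path
    .replace(/[\[({](?:@|at)[\])}]/gi, "@") // test[@]example.com, test[at]example.com
    .replace(/[\[({]dot[\])}]/gi, ".") // example[dot]com
    .replace(/[\[({]\.[\])}]?|\.[\])}]/g, ".") // 1.1.1[.]1 and partial 1.1.1[.1
    .replace(/\\\./g, ".") // example\.com
    .replace(/\s\.\s/g, "."); // 1.1.1 . 1
}

console.log(refangSketch("hxxps[:]//test\\.example[.)com[/]path"));
// https://test.example.com/path
```

The order matters in this sketch: the bracketed :// rule runs before the bracketed : rule so that the longer token is restored first.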

Options

strict

Whether to do strict TLD matching or not. Defaults to true.

refang

Whether to do refang or not. Defaults to true (the CLI flag --no-refang disables it).

punycode

Whether to do Punycode conversion or not. Defaults to false.
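
As a rough illustration of what strict TLD matching means, the sketch below compares a pattern restricted to a list of known TLDs with one that accepts any alphabetic TLD. The function extractDomainsSketch, the three-entry TLD list, and the patterns are hypothetical and much simpler than whatever the package uses internally.

```typescript
// Hypothetical sketch of strict vs. non-strict TLD matching.
// knownTlds is a tiny stand-in list chosen for this example only.
const knownTlds = ["com", "org", "net"];

function extractDomainsSketch(text: string, strict: boolean): string[] {
  // strict: the final label must be one of the known TLDs
  // non-strict: any alphabetic label of two or more characters is accepted
  const tld = strict ? `(?:${knownTlds.join("|")})` : "[a-z]{2,}";
  const re = new RegExp(`\\b(?:[a-z0-9-]+\\.)+${tld}\\b`, "gi");
  return text.match(re) || [];
}

console.log(extractDomainsSketch("example.com and internal.corp", true));
// [ 'example.com' ]
console.log(extractDomainsSketch("example.com and internal.corp", false));
// [ 'example.com', 'internal.corp' ]
```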

Alternatives

ioc-extractor's People

Contributors

dependabot[bot], ninoseki, pemontto, renovate-bot

ioc-extractor's Issues

Dependency Dashboard

This issue lists Renovate updates and detected dependencies. Read the Dependency Dashboard docs to learn more.

Awaiting Schedule

These updates are awaiting their schedule. Click on a checkbox to get an update now.

  • chore(deps): update dependency eslint-plugin-prettier to v5.0.1
  • chore(deps): update dependency ts-loader to v9.5.0
  • chore(deps): update dependency typedoc to v0.25.2
  • chore(deps): update dependency typescript to v5.2.2
  • fix(deps): update dependency commander to v11.1.0
  • chore(deps): update actions/checkout action to v4
  • chore(deps): update dependency lint-staged to v15

Open

These updates have all been created already. Click a checkbox below to force a retry/rebase of any.

Detected dependencies

github-actions
.github/workflows/test.yml
  • actions/checkout v3
  • actions/setup-node v3
npm
package.json
  • commander ^11.0.0
  • get-stdin 9.0.0
  • @types/get-stdin ^7.0.0
  • @types/jest ^29.5.3
  • @types/node 20.4.10
  • @typescript-eslint/eslint-plugin 6.3.0
  • @typescript-eslint/parser 6.3.0
  • benny 3.7.1
  • coveralls 3.1.1
  • eslint 8.47.0
  • eslint-config-prettier 9.0.0
  • eslint-plugin-jest 27.2.3
  • eslint-plugin-prettier 5.0.0
  • eslint-plugin-simple-import-sort 10.0.0
  • husky 8.0.3
  • jest 29.6.2
  • lint-staged 13.2.3
  • prettier 3.0.1
  • ts-jest 29.1.1
  • ts-loader 9.4.4
  • ts-node 10.9.1
  • tsup ^7.2.0
  • typedoc 0.24.8
  • typescript 5.1.6
  • node >=18


URLs that do not start with https?://

I have noticed in my usage that researchers may simply report a url in the form: domain[.]tld/someURI

The extractIOC function does not return these in the urls array; only the domain shows up in the domains array.

One option would be to add a strictURLs setting that makes the https?:// part of the regex optional when it is set to false. Alternatively, the scheme could simply be made optional without a setting.
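
The proposal could look roughly like the sketch below. The strictURLs flag, the function name buildUrlRegexSketch, and the simplified host and path patterns are all hypothetical; the input is assumed to be refanged already. When the scheme is missing, the sketch requires a path so that a bare domain is not reported as a URL.

```typescript
// Hypothetical sketch of a strictURLs option; patterns are simplified.
function buildUrlRegexSketch(strictURLs: boolean): RegExp {
  const scheme = "https?:\\/\\/";
  const host = "(?:[a-z0-9-]+\\.)+[a-z]{2,}";
  const path = "(?:\\/[^\\s]*)";
  const pattern = strictURLs
    ? `${scheme}${host}${path}?` // scheme required, path optional
    : `(?:${scheme}${host}${path}?|${host}${path})`; // scheme optional if a path follows
  return new RegExp(pattern, "gi");
}

const urlInput = "see domain.tld/someURI and https://example.com and plain.org";
console.log(urlInput.match(buildUrlRegexSketch(true)));
// [ 'https://example.com' ]
console.log(urlInput.match(buildUrlRegexSketch(false)));
// [ 'domain.tld/someURI', 'https://example.com' ]
```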

Add an executable

The cacador executable is quite useful in a CLI environment.

pbpaste | cacador | jq . 

So adding an executable is a good idea IMHO.

URL regex issue

The URL regex matches a non-URL value.

var iocExtractor = require("ioc-extractor")

const input = 'www.hoge.com';
const ioc = iocExtractor.getIOC(input);
console.log(ioc.networks.urls); // => ["www.hoge.com"];

www.hoge.com is a domain, not a URL.

URL Regex not matching some valid URLs

Hello, I really like this library, however recently I've come across a problem.

I noticed that some valid URLs are truncated when extracting IoCs:

Input: (Valid URL)
https://1.1.1.1.domain.com:443

Expected output:
https://1.1.1.1.domain.com:443

Actual output:
https://1.1.1.1

I looked around and I believe that the url regex is causing the issue, specifically:

const url = `(?:${protocol})${auth}(?:localhost|${ipv4}|${domain})${port}${path}`;

Once localhost or ipv4 match is found, the group drops the rest.
This means this also produces unexpected output:

Input: (Valid URL)
https://localhost.domain.com:443

Expected output:
https://localhost.domain.com:443

Actual output:
https://localhost

Edit: Just to test my theory:

Input: (Valid URL)
https://1.1.1.domain.com:443

Expected output:
https://1.1.1.domain.com:443

Actual output: (this is correct)
https://1.1.1.domain.com:443

This shows that once ipv4 does not match, domain match is used instead.

Thank you for your time!

Edit 2:
I found a simple fix for the bug:

const url = `(?:${protocol})${auth}(?:${domain}|localhost|${ipv4})${port}${path}`;
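
The effect of the alternation order can be reproduced in isolation. The sketch below uses simplified stand-ins for the ipv4, domain, and port fragments (they are assumptions, not the library's real patterns) and omits auth and path. A regex alternation takes the first branch that lets the overall match succeed, so putting domain first lets the longer host win.

```typescript
// Simplified stand-ins for the library's regex fragments (assumptions).
const ipv4Part = "(?:\\d{1,3}\\.){3}\\d{1,3}";
const domainPart = "(?:[a-z0-9-]+\\.)+[a-z]{2,}";
const portPart = "(?::\\d{1,5})?";

// Original order: localhost | ipv4 | domain
const beforeFix = new RegExp(`https?:\\/\\/(?:localhost|${ipv4Part}|${domainPart})${portPart}`, "i");
// Proposed order: domain | localhost | ipv4
const afterFix = new RegExp(`https?:\\/\\/(?:${domainPart}|localhost|${ipv4Part})${portPart}`, "i");

function firstMatch(re: RegExp, s: string): string | null {
  const m = s.match(re);
  return m ? m[0] : null;
}

console.log(firstMatch(beforeFix, "https://1.1.1.1.domain.com:443"));
// https://1.1.1.1
console.log(firstMatch(afterFix, "https://1.1.1.1.domain.com:443"));
// https://1.1.1.1.domain.com:443
console.log(firstMatch(afterFix, "https://1.1.1.1:443"));
// https://1.1.1.1:443
```

With these stand-in patterns, plain IPv4 and localhost hosts still match after the reorder, because the domain branch fails on them and the later branches are tried.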

Folder search

Can the tool be used to search in depth of a folder with subfolders of many files?
Ty

FR: Add tracker type IOC support

It would be great to add support for tracker type IOC (e.g. Google Analytics tracker)

  • Google Analytics tracker ID format: UA-XXXXXXX-X
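
A minimal sketch of matching that format follows. The digit-count ranges are an assumption made for illustration (the request above only gives the shape UA-XXXXXXX-X), and the names gaTrackIdSketch and extractGaTrackIdsSketch are hypothetical.

```typescript
// Hypothetical pattern for the UA-XXXXXXX-X shape; the digit ranges are assumed.
const gaTrackIdSketch = /\bUA-\d{4,10}-\d{1,4}\b/g;

function extractGaTrackIdsSketch(text: string): string[] {
  return text.match(gaTrackIdSketch) || [];
}

console.log(extractGaTrackIdsSketch("Tracker UA-1234567-1 seen; also ua-12 and UA-7654321-12"));
// [ 'UA-1234567-1', 'UA-7654321-12' ]
```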

Action Required: Fix Renovate Configuration

There is an error with this repository's Renovate configuration that needs to be fixed. As a precaution, Renovate will stop PRs until it is resolved.

Error type: undefined. Note: this is a nested preset so please contact the preset author if you are unable to fix it yourself.

Which npm version / Dockerfile?

I’m trying to install ioc-extractor in a docker container and tested several npm versions…
I did not find one that does not return any error… :(
Which version do you recommend? Somebody has a Dockerfile to share?
Here is the most common error across the tested versions:
Syntax error: Unexpected token export
Tx!

Email Regex does not include dashes

For both emailRegex and nonStrictEmailRegex they do not account for dashes in the email addresses. By adding a dash like: [A-Za-z0-9_.-] it should capture more addresses.

export const emailRegex = new RegExp(`[A-Za-z0-9_.]+@${domain}`, "gi");
export const nonStrictEmailRegex = new RegExp(
`[A-Za-z0-9_.]+@${nonStrictDomain}`,
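
The difference the extra dash makes can be checked with a small self-contained sketch. The domainSketch pattern below is a simplified stand-in for the library's domain fragment and is an assumption; only the local-part character class comes from the snippet above.

```typescript
// Simplified stand-in for the library's domain fragment (assumption).
const domainSketch = "(?:[a-z0-9-]+\\.)+[a-z]{2,}";

// Local part without and with a dash in the character class.
const emailWithoutDash = new RegExp(`[A-Za-z0-9_.]+@${domainSketch}`, "gi");
const emailWithDash = new RegExp(`[A-Za-z0-9_.-]+@${domainSketch}`, "gi");

const emailInput = "contact first-last@example.com";
console.log(emailInput.match(emailWithoutDash));
// [ 'last@example.com' ]
console.log(emailInput.match(emailWithDash));
// [ 'first-last@example.com' ]
```

Without the dash, the address is not missed entirely but silently truncated at the last dash, which is arguably worse than no match.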

Cannot deal with `http[:]` type of defang technique

It cannot deal with a http[:] type of defang technique.

var iocExtractor = require("ioc-extractor");

const input = 'http[:]//example1.com http://example2.com https[:]//example3.com';
const ioc = iocExtractor.getIOC(input);
console.log(ioc.networks.urls); // => ["http://example2.com"]

It would be good if it supports this type of defang/refang technique.

Remove Files type

In my use cases, I never use Files type IoCs (doc, exe, flash, img, mac, web and zip).
Removing Files type support means performance improvement so I'd like to drop the support.

If you need this kind of capability, I recommend using the following.

Filter unicode quotation marks

I am using your great library in my community project and found that when working on PDF plain-text it would be great if your tool could filter out the following two Unicode remnants:

U+201C : LEFT DOUBLE QUOTATION MARK {double turned comma quotation mark}
U+201D : RIGHT DOUBLE QUOTATION MARK {double comma quotation mark}

As you can see from the example, these have been recognized as part of the domain names.

Example

https://orkl.eu/libraryEntry/637b8178-1bb0-4e7e-a1aa-d00c18bbcdd4

--> click on "IOCs"

“cloudsafety.online
“drive-globalordnance.com
“landof-service.com
“network-storage-ltd.com
“nonviolent-conflict-service.com
“proxycrioisolation.com
“share-drive-ua.com
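
Until the library filters these characters itself, one workaround is to preprocess the text before extraction. The helper below is a hypothetical sketch, not part of the package: it replaces U+201C and U+201D with spaces so they act as separators instead of being glued to a neighbouring domain.

```typescript
// Hypothetical preprocessing helper: replace curly double quotes with spaces.
function stripCurlyDoubleQuotes(text: string): string {
  return text.replace(/[\u201C\u201D]/g, " ");
}

console.log(stripCurlyDoubleQuotes("\u201Ccloudsafety.online\u201D").trim());
// cloudsafety.online
```

The cleaned string can then be passed to the extractor as usual.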
