
weakfilescan's Introduction

weakfilescan

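A multithreaded scanner for sensitive-information leaks. It crawls the target, collects information related to it, reorganizes that information into dictionary rules, and then scans with those dynamically generated rules. It supports many customization options, including:

  • Varied rule-dictionary definitions (regex, integer, character, and date rules)
  • Domain scanning strategy (full domain name, main domain, name part of the domain)
  • Custom HTTP status codes
  • Dynamically configurable HTTP script extensions
  • Custom regex for deciding whether a file exists
  • False-positive cleaning option for the result set
  • HTTPS server certificate verification
  • Thread count
  • HTTP request timeout
  • Whether to allow URL redirects
  • Whether to enable Session support, which keeps cookies across all requests sent
  • Whether to use a random User-Agent
  • Whether to use a random X-Forwarded-For
  • Dynamic proxy list configuration (TOR supported)
  • Custom HTTP headers

See /config.py for more usage details.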

Quick start

python wyspider.py http://wuyun.org php

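Rules supported in dictionaries

Rule usage overview

To use the rule engine in a dictionary, a rule must begin with { and end with }$. After the type, # gives the length of the generated data, $ gives the step value, and start-end sets the range of the data.

{rule=type#length$step:start-end}$

Rule    Description
re      Regex engine
int     Integer
str     Character
date    Date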
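The rule shape described above can be split into its parts with a single regular expression. This is a minimal sketch, not the tool's own parser: the names `RULE_RE` and `parse_rule` are hypothetical, and the assumption is that `#length` and `$step` are both optional and appear in that order.

```python
import re

# Hypothetical pattern for the documented shape {rule=type#length$step:start-end}$.
# Assumption: "#length" and "$step" are optional and appear in this order.
RULE_RE = re.compile(
    r"^\{(?P<rule>re|int|str|date)=(?P<type>[A-Za-z_]+)"
    r"(?:#(?P<length>\d+))?"
    r"(?:\$(?P<step>\d+))?"
    r":(?P<body>.+)\}\$$"
)


def parse_rule(text):
    """Return the parts of a dictionary rule as a dict, or None for a plain entry."""
    match = RULE_RE.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    # Convert the numeric fields when they are present.
    for key in ("length", "step"):
        if parts[key] is not None:
            parts[key] = int(parts[key])
    return parts


if __name__ == "__main__":
    print(parse_rule("{int=series$2:0-10}$"))
    print(parse_rule("{re=exrex:[aA]dmin[1-5]}$"))
    print(parse_rule("backup.zip"))
```

The `body` field is left as a raw string because its meaning depends on the rule: a regular expression for `re`, a start-end range for the other rules.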

Regex engine type

Usage example: {re=engine_name:regular_expression}$

{re=exrex:[0-9]}$
[u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9']
{re=exrex:[aA]dmin[1-5]}$
[u'admin1', u'admin2', u'admin3', u'admin4', u'admin5', u'Admin1', u'Admin2', u'Admin3', u'Admin4', u'Admin5']

Integer rules

Type                                      Usage example
Sequential progression (handles step)     {int=series$step:start_number-end_number}$
{int=series$2:0-10}$
[0, 2, 4, 6, 8, 10]
Type                                      Usage example
Consecutive digits                        {int=digits#length:start_number-end_number}$
{int=digits#3:0-9}$
[123, 234, 345, 456, 567, 678, 789]
Type                                      Usage example
Repeated digits                           {int=overlap#length:start_number-end_number}$
{int=overlap#4:0-9}$
[1111, 2222, 3333, 4444, 5555, 6666, 7777, 8888, 9999]
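The three integer rules can be reproduced with a few lines of Python. This is a sketch under stated assumptions, not the tool's implementation: the function names are hypothetical, and skipping values with a leading zero is a guess at why 012 and 0000 are absent from the sample outputs above.

```python
def int_series(step, start, end):
    """Numbers from start to end (inclusive), advancing by step."""
    return list(range(start, end + 1, step))


def int_digits(length, start, end):
    """Sliding windows of consecutive digits, each `length` digits long."""
    digits = "".join(str(d) for d in range(start, end + 1))
    windows = [digits[i:i + length] for i in range(len(digits) - length + 1)]
    # Assumption: windows with a leading zero (e.g. "012") are dropped.
    return [int(w) for w in windows if not w.startswith("0")]


def int_overlap(length, start, end):
    """Each digit repeated `length` times."""
    # Assumption: 0 is skipped, since 0000 would collapse to 0 as an integer.
    return [int(str(d) * length) for d in range(start, end + 1) if d != 0]


if __name__ == "__main__":
    print(int_series(2, 0, 10))   # [0, 2, 4, 6, 8, 10]
    print(int_digits(3, 0, 9))    # [123, 234, 345, 456, 567, 678, 789]
    print(int_overlap(4, 0, 9))   # [1111, 2222, ..., 9999]
```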

Character rules

Type                      Usage example
Sequential progression    {str=letters#length:start_char-end_char}$
{str=letters#3:a-g}$
['abc', 'bcd', 'cde', 'def', 'efg']
Type                      Usage example
Repeated letters          {str=overlap#length:start_char-end_char}$
{str=overlap#4:a-g}$
['aaaa', 'bbbb', 'cccc', 'dddd', 'eeee', 'ffff', 'gggg']
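The character rules follow the same windowing idea as the integer ones, applied to a run of letters. A minimal sketch with hypothetical function names, assuming the start and end characters are single letters in ascending code-point order:

```python
def str_letters(length, start, end):
    """Sliding windows of consecutive letters, each `length` characters long."""
    letters = "".join(chr(c) for c in range(ord(start), ord(end) + 1))
    return [letters[i:i + length] for i in range(len(letters) - length + 1)]


def str_overlap(length, start, end):
    """Each letter in the range repeated `length` times."""
    return [chr(c) * length for c in range(ord(start), ord(end) + 1)]


if __name__ == "__main__":
    print(str_letters(3, "a", "g"))  # ['abc', 'bcd', 'cde', 'def', 'efg']
    print(str_overlap(4, "a", "g"))  # ['aaaa', 'bbbb', ..., 'gggg']
```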

Date rules

Type              Usage example
Year              {date=year:start_year-end_year}$
{date=year:2010-2015}$
[2010, 2011, 2012, 2013, 2014, 2015]
Type              Usage example
Month             {date=mon:start_month-end_month}$
{date=mon:01-12}$
[1, 01, 2, 02, 3, 03, '...', 9, 09]
Type              Usage example
Day               {date=day:start_day-end_day}$
{date=day:01-31}$
[1, 01, 2, 02, 3, 03, 4, 04, 5, 05, '...', 31]
Type              Usage example
Year-month        {date=year_mon:start_year_month-end_year_month}$
{date=year_mon:201501-201504}$
[201501, 20151, 201502, 20152, '...', 201504]
Type              Usage example
Month-day         {date=mon_day:start_month_day-end_month_day}$
{date=mon_day:0501-0531}$
[0501, 51, 0502, 52, 0506, 56, 0511, 511, '...', 0530, 530]
Type              Usage example
Year-month-day    {date=year_mon_day:start_date-end_date}$
{date=year_mon_day:20150101-20150401}$
[20150101, 201511, 20150112, 2015112, '...', 20150401]
Type              Usage example
Month-day-year    {date=mon_day_year:start_date-end_date}$
{date=mon_day_year:01012015-04012015}$
[01012015, 112015, 01122015, 1122015, '...', 04012015]
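The date rules appear to emit each date in two spellings, one zero-padded and one without padding. The sketch below illustrates that idea for the year_mon rule only. The function name is hypothetical, and the ordering (padded first) and the use of strings rather than integers are assumptions, so the result may not match the tool's output element for element.

```python
def year_mon(start, end):
    """Year-month values from start to end (YYYYMM strings, inclusive).

    Each month is emitted zero-padded (201501) and, when it differs,
    unpadded (20151). Ordering is an assumption, not taken from the tool.
    """
    year, month = int(start[:4]), int(start[4:])
    end_year, end_month = int(end[:4]), int(end[4:])
    values = []
    while (year, month) <= (end_year, end_month):
        padded = "%04d%02d" % (year, month)
        plain = "%d%d" % (year, month)
        values.append(padded)
        if plain != padded:
            values.append(plain)
        # Advance one month, rolling over into the next year after December.
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return values


if __name__ == "__main__":
    print(year_mon("201501", "201504"))
```

Strings are used so that the zero-padded forms survive; as integers, 0501 and 501 would be indistinguishable.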

Installation

CentOS 6.* 7.* Linux

Install setuptools and pip

wget https://bootstrap.pypa.io/ez_setup.py -O - | python
wget https://pypi.python.org/packages/source/p/pip/pip-6.0.8.tar.gz
tar zvxf pip-6.0.8.tar.gz
cd pip-6.0.8
python setup.py install

Install the lxml parser and beautifulsoup4

yum install python-devel libxml2-devel libxslt-devel
pip install lxml beautifulsoup4

weakfilescan's People

Contributors

ring04h


weakfilescan's Issues

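Hello 猪猪侠, I don't know whether this tool is still being updated; I'd like to report 2 fairly important bugs.

First:
When the common.py script extracts link resources from the target, it does not account for letter case, so the crawler collects inaccurate data. You can test with this site: http://www.shijihulian.com/. I read the source, and find_all does not seem to take case into account, which is something of a pitfall.
Second:
The FuzzEnginer analysis engine's check for backup files is flawed. I believe a file will be missed in 2 cases: 1. the backup file exists and is larger than 20K (defined on line 39 of the code); 2. the backup file exists and is smaller than 20K, but its content contains error strings such as 404 or not Found (defined on lines 40-42 of the code).
Two hypothetical scenarios in which a file would be missed:
1. The target site has a full-site backup file web.rar, which easily exceeds the 20K limit.
2. The target site has key.rar, which packs a few very simple files: 404.html, config.php, password.txt. 404.html easily triggers the "error definition" detection, so the file is missed.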
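One way to avoid both misses would be to look at the first bytes of the response instead of its size or the strings it contains. The sketch below is only an illustration of that idea and is not taken from FuzzEnginer: the names `ARCHIVE_MAGIC` and `sniff_archive` are hypothetical, and the list of formats is deliberately short.

```python
# Well-known leading signatures ("magic bytes") of common archive formats.
ARCHIVE_MAGIC = {
    b"Rar!\x1a\x07": "rar",
    b"PK\x03\x04": "zip",
    b"\x1f\x8b": "gzip",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def sniff_archive(head):
    """Return the archive type suggested by the first bytes of a body, or None."""
    for magic, kind in ARCHIVE_MAGIC.items():
        if head.startswith(magic):
            return kind
    return None


if __name__ == "__main__":
    # An archive is recognised regardless of its size, or of file names
    # such as "404.html" that appear later in the data.
    print(sniff_archive(b"Rar!\x1a\x07\x00" + b"404.html" * 10))  # rar
    # An HTML error page is not mistaken for an archive.
    print(sniff_archive(b"<html>404 Not Found</html>"))           # None
```

Only the first few bytes are needed, so a check like this does not depend on downloading the whole file.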

After installing as instructed, the following error is reported

[root@87a5585bcce3 weakfilescan]# python wyspider.py http://wuyun.org php

* scan http://wuyun.org start

Traceback (most recent call last):
  File "wyspider.py", line 17, in <module>
    print json.dumps(start_wyspider(sys.argv[1]), indent=2)
  File "/weakfilescan/controller.py", line 41, in start_wyspider
    link_datas = GetAllLink(siteurl).start()
  File "/weakfilescan/libs/GetAllLink.py", line 47, in start
    response_obj = LinksParser(http_request_get(self.siteurl))
  File "/weakfilescan/common.py", line 170, in __init__
    self.baseurl = get_baseurl(self.url)
  File "/weakfilescan/common.py", line 31, in get_baseurl
    netloc = urlparse.urlparse(link).netloc
  File "/usr/lib64/python2.7/urlparse.py", line 142, in urlparse
    tuple = urlsplit(url, scheme, allow_fragments)
  File "/usr/lib64/python2.7/urlparse.py", line 181, in urlsplit
    i = url.find(':')
AttributeError: 'NoneType' object has no attribute 'find'

no encoding

Traceback (most recent call last):
  File "wyspider.py", line 12, in <module>
    import libs.requests as requests
  File "/media/ddos/DDOS/PenTest/weakfilescan/libs/requests/__init__.py", line 53, in <module>
    from .packages.urllib3.contrib import pyopenssl
  File "/media/ddos/DDOS/PenTest/weakfilescan/libs/requests/packages/__init__.py", line 3, in <module>
    from . import urllib3
  File "/media/ddos/DDOS/PenTest/weakfilescan/libs/requests/packages/urllib3/__init__.py", line 10, in <module>
7K{����.ื7�
  File "/media/ddos/DDOS/PenTest/weakfilescan/libs/requests/packages/urllib3/connectionpool.py", line 33, in <module>
    from .connection import (
  File "/media/ddos/DDOS/PenTest/weakfilescan/libs/requests/packages/urllib3/connection.py", line 1
SyntaxError: Non-ASCII character '\xe1' in file /media/ddos/DDOS/PenTest/weakfilescan/libs/requests/packages/urllib3/connection.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

Start error

hi, thanks for your tool. I love it!

can you help me fix this error please?

/root/weakfilescan/libs/tldextract.py:42: UserWarning: Module chardet was already imported from /usr/local/lib/python2.7/dist-packages/chardet/__init__.pyc, but /usr/lib/python2.7/dist-packages is being added to sys.path
  import pkg_resources
/root/weakfilescan/libs/tldextract.py:42: UserWarning: Module bs4 was already imported from /usr/local/lib/python2.7/dist-packages/bs4/__init__.pyc, but /usr/lib/python2.7/dist-packages is being added to sys.path
  import pkg_resources
/root/weakfilescan/libs/tldextract.py:42: UserWarning: Module lxml was already imported from /usr/local/lib/python2.7/dist-packages/lxml/__init__.pyc, but /usr/lib/python2.7/dist-packages is being added to sys.path
  import pkg_resources
/root/weakfilescan/libs/tldextract.py:42: UserWarning: Module certifi was already imported from /usr/local/lib/python2.7/dist-packages/certifi/__init__.pyc, but /usr/lib/python2.7/dist-packages is being added to sys.path
  import pkg_resources

Also, is it possible to use more extensions like php, asp, html, etc. in the same scan?

Thanks
