
jsfinder's Introduction

JSFinder

JSFinder is a tool for quickly extracting URLs and subdomains from JS files on a website.

The regular expression used to extract URLs comes from LinkFinder.

How JSFinder obtains URLs and subdomains:

[image]

Blog: https://threezh1.com/

Changelog

Usage

  • Simple crawl
python JSFinder.py -u http://www.mi.com

This command crawls every JS link on the single page http://www.mi.com and looks for URLs and subdomains inside those files.

Sample output:

url:http://www.mi.com
Find 50 URL:
http://api-order.test.mi.com
http://api.order.mi.com
http://userid.xiaomi.com/userId
http://order.mi.com/site/login?redirectUrl=
... (omitted)

Find 26 Subdomain:
api-order.test.mi.com
api.order.mi.com
userid.xiaomi.com
order.mi.com
... (omitted)
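
The crawl described above comes down to two steps: pull URL-like strings out of JavaScript text, then keep the hostnames that belong to the target domain. The following is a minimal sketch of that idea and not JSFinder's actual code. The regex is a deliberately simplified stand-in for the LinkFinder pattern, `extract_urls` and `find_subdomains` are hypothetical helper names, and the sample JavaScript (including `cdn.example.net`) is made up.

```python
import re
from urllib.parse import urlparse

# Simplified stand-in for the LinkFinder regex (assumption): it only matches
# quoted absolute http(s) URLs, not relative paths.
URL_PATTERN = re.compile(r"""["'](https?://[^"'\s]+)["']""")


def extract_urls(js_text):
    """Return the unique absolute URLs quoted in js_text, in order of appearance."""
    seen = []
    for match in URL_PATTERN.finditer(js_text):
        url = match.group(1)
        if url not in seen:
            seen.append(url)
    return seen


def find_subdomains(urls, main_domain):
    """Return the unique hostnames in urls that equal or end with main_domain."""
    hosts = []
    for url in urls:
        host = urlparse(url).hostname
        if host and (host == main_domain or host.endswith("." + main_domain)):
            if host not in hosts:
                hosts.append(host)
    return hosts


# Made-up JavaScript snippet used only for illustration.
sample_js = """
var api = "http://api.order.mi.com/v1";
var login = 'http://order.mi.com/site/login?redirectUrl=';
var cdn = "https://cdn.example.net/lib.js";
"""

urls = extract_urls(sample_js)
subdomains = find_subdomains(urls, "mi.com")
print(urls)
print(subdomains)
```

This sketch matches hostnames strictly on a `.mi.com` suffix. The sample output lists userid.xiaomi.com for a mi.com target, so the real tool evidently matches more loosely than this.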

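  • Deep crawl
python JSFinder.py -u http://www.mi.com -d

Crawls JS one level deeper into the site's pages, which takes longer.

Using -ou and -os to name the files that store the URLs and subdomains is recommended. For example:

python JSFinder.py -u http://www.mi.com -d -ou mi_url.txt -os mi_subdomain.txt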
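  • Batch mode: a list of URLs or JS files

List of URLs:

python JSFinder.py -f text.txt

List of JS files:

python JSFinder.py -f text.txt -j

You can crawl the site with Burp Suite, extract the URLs or JS links, and save them to a txt file, one per line.

When a list of URLs or JS files is given, deep crawl is not needed; a single page is enough.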
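
Preparing the one-link-per-line input file can be scripted. The sketch below assumes you already have a raw list of exported links. It drops blanks and duplicates and separates page URLs from JS links, so each group can be saved to its own file. `split_links` is a hypothetical helper, and the example paths are made up.

```python
from urllib.parse import urlparse


def split_links(raw_lines):
    """Split links into (page_urls, js_urls), dropping blanks and duplicates."""
    pages, scripts = [], []
    for line in raw_lines:
        link = line.strip()
        if not link.startswith(("http://", "https://")):
            continue  # skip blank lines and anything without a scheme
        target = scripts if urlparse(link).path.endswith(".js") else pages
        if link not in target:
            target.append(link)
    return pages, scripts


# Made-up export used only for illustration.
exported = [
    "http://www.mi.com/",
    "http://www.mi.com/static/app.js?v=2",
    "",
    "http://www.mi.com/",
    "https://www.mi.com/about",
]
pages, scripts = split_links(exported)

# One link per line, matching the file layout described above.
pages_text = "\n".join(pages) + "\n"
scripts_text = "\n".join(scripts) + "\n"
print(pages_text, end="")
print(scripts_text, end="")
```

Writing `pages_text` to one file and `scripts_text` to another gives inputs for `-f` and `-f ... -j` respectively.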

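  • Other options

-c sets the cookie used when crawling pages. Example:

python JSFinder.py -u http://www.mi.com -c "session=xxx"

-ou sets the name of the file that stores the URLs. Example:

python JSFinder.py -u http://www.mi.com -ou mi_url.txt

-os sets the name of the file that stores the subdomains. Example:

python JSFinder.py -u http://www.mi.com -os mi_subdomain.txt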
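  • Notes

Do not put quotes around the URL.

The URL must start with http:// or https://.

When crawling from a list of JS files, the returned URLs are relative URLs.

When crawling from a list of URLs, relative URLs are converted to absolute URLs using the domain of the first link in the list.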
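
The last note, resolving relative URLs against the domain of the first link, can be illustrated with the standard library. This is a sketch of the general technique using `urllib.parse.urljoin`. Whether JSFinder resolves URLs this way internally is an assumption, `to_absolute` is a hypothetical helper, and the relative paths are made up.

```python
from urllib.parse import urljoin, urlparse


def to_absolute(relative_urls, first_link):
    """Resolve each URL against the scheme and host of first_link."""
    parsed = urlparse(first_link)
    base = f"{parsed.scheme}://{parsed.netloc}/"
    return [urljoin(base, rel) for rel in relative_urls]


# Made-up findings: two relative paths, one scheme-relative URL, one absolute URL.
found = [
    "/site/login",
    "api/v1/user",
    "//userid.xiaomi.com/userId",
    "http://order.mi.com/",
]
absolute = to_absolute(found, "http://www.mi.com/index.html")
print(absolute)
```

Already-absolute URLs pass through unchanged, and scheme-relative ones inherit the scheme of the first link.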

  • Screenshots

Simple crawl test:

python3 JSFinder.py -u https://www.jd.com/

URL:

[screenshot: 02.jpg]

[screenshot: 03.jpg]

Subdomain:

[screenshot: 01.jpg]

Deep crawl test:

python3 JSFinder.py -u https://www.jd.com/ -d -ou jd_url.txt -os jd_domain.txt

[screenshot: 05.jpg]

[screenshot: 06.jpg]

Test results:

http://www.oppo.com
URLs: 4426
Subdomains: 24

http://www.mi.com
URLs: 1043
Subdomains: 111

http://www.jd.com
URLs: 3627
Subdomains: 306

jsfinder's People

Contributors

atikrahman1, cclauss, threezh1


jsfinder's Issues

What does this error mean?

Traceback (most recent call last):
File "D:\python\JSFinder-master\JSFinder.py", line 6, in <module>
import requests, argparse, sys, re
ModuleNotFoundError: No module named 'requests'
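
This error means the third-party `requests` package is not installed for the interpreter that runs the script. A small check like the one below can confirm which imports are missing before running the tool. It is a sketch using only the standard library, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# requests is the only third-party module on the import line in the traceback;
# argparse and re ship with Python.
missing = missing_modules(["requests", "argparse", "re"])
if missing:
    print("Missing:", ", ".join(missing))
    print("Try:", sys.executable, "-m pip install", " ".join(missing))
else:
    print("All imports are available.")
```

Installing with the same interpreter that runs JSFinder.py (the `-m pip` form printed above) avoids the common case where the package lands in a different Python installation.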

Cannot use it

After downloading and extracting, there are only two files. I opened cmd but cannot get it to work. What is going on?

Keeps failing with a gbk error. How can this be fixed?

Running py -3 JSFinder.py -u https://www.mi.com -d -ou mi_url.txt -os mi_subdomain.txt
keeps producing:
Traceback (most recent call last):
File "JSFinder.py", line 247, in <module>
giveresult(urls, args.url)
File "JSFinder.py", line 221, in giveresult
print(url)
UnicodeEncodeError: 'gbk' codec can't encode character '\u200b' in position 81: illegal multibyte sequence
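
The traceback shows `print` failing because the console encoding (gbk) cannot represent a zero-width space (U+200B) found in one of the URLs. One general workaround is to replace unencodable characters before printing. The sketch below shows it with a hypothetical `printable_for` helper and a made-up URL; it is not a patch taken from JSFinder.

```python
def printable_for(text, encoding):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# Made-up URL containing a zero-width space, the character named in the traceback.
url = "http://www.mi.com/\u200bpath"

try:
    url.encode("gbk")
except UnicodeEncodeError as exc:
    print("gbk cannot encode:", repr(exc.object[exc.start:exc.end]))

safe_url = printable_for(url, "gbk")
print(safe_url)
```

Two alternatives leave the script untouched: set the `PYTHONIOENCODING=utf-8` environment variable before running it, or call `sys.stdout.reconfigure(encoding="utf-8")` on Python 3.7 or later.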

Plugin issue

I use it as a browser extension, but often, once jsfinder is enabled, clicks on the pages I browse stop working properly.

HTML Parser error

TypeError: HTMLParser.__init__() got an unexpected keyword argument 'strict'
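
For context, the `strict` keyword of `html.parser.HTMLParser` was deprecated in Python 3.3 and removed in Python 3.5, so any code that still passes it fails with this TypeError. The report does not say which code makes the call. An outdated HTML-parsing dependency is one plausible cause, but that is an assumption. The sketch below reproduces the error and shows a parser subclass that works without `strict`. `ScriptSrcParser` is a hypothetical example class.

```python
from html.parser import HTMLParser


class ScriptSrcParser(HTMLParser):
    """Collect the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()  # no strict= argument on Python 3.5+
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


# Passing strict= raises TypeError on Python 3.5 and later.
try:
    HTMLParser(strict=False)
    strict_supported = True
except TypeError:
    strict_supported = False
print("strict supported:", strict_supported)

parser = ScriptSrcParser()
parser.feed('<script src="/static/app.js"></script><script>var a = 1;</script>')
print(parser.sources)
```

If the call comes from an installed library rather than your own code, upgrading that library is usually the remedy.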

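Inaccurate subdomain extraction

Extraction is accurate for baidu.com, jd.com, and similar sites, but not for government websites. Under the government naming convention, 123.xxx.gov.cn is the site of a department, while xxx.gov.cn is the site of its province. When crawling for a department's subdomains, the tool treats xxx.gov.cn as the main domain and returns the sites of every department in the province instead of the subdomains of that one department.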
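
The behaviour requested here amounts to scoping results to a caller-chosen root host instead of an automatically derived main domain. A minimal sketch of that filter follows. `in_scope` is a hypothetical helper, the hostnames are placeholders following the pattern in the report, and this is not how JSFinder currently decides scope.

```python
def in_scope(host, root):
    """Return True if host is root itself or a subdomain of root."""
    host, root = host.lower().rstrip("."), root.lower().rstrip(".")
    return host == root or host.endswith("." + root)


# Placeholder hostnames following the naming pattern described in the issue.
found = ["a.123.xxx.gov.cn", "123.xxx.gov.cn", "456.xxx.gov.cn", "www.xxx.gov.cn"]

province_scope = [h for h in found if in_scope(h, "xxx.gov.cn")]
department_scope = [h for h in found if in_scope(h, "123.xxx.gov.cn")]
print(province_scope)
print(department_scope)
```

With the province root, all four hosts match; with the department root, only the department and its own subdomain remain.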

Run fails: js_url is not defined

Running python3 JSFinder.py -u https://www.jd.com on both Windows 10 and Kali produces the following:

url:https://www.jd.com
Traceback (most recent call last):
  File "JSFinder.py", line 242, in <module>
    urls = find_by_url(args.url)
  File "JSFinder.py", line 124, in find_by_url
    temp_urls = extract_URL(script_array[script])
  File "JSFinder.py", line 50, in extract_URL
    return [match.group().strip('"').strip("'") for match in result
  File "JSFinder.py", line 51, in <listcomp>
    if match.group() not in js_url]
NameError: name 'js_url' is not defined

Could you help fix this?

Out of range

Getting an error while scanning a file with 1000 links to JS files.

ALL Find 510 links
Traceback (most recent call last):
  File "/JSFinder/JSFinder.py", line 264, in <module>
    giveresult(urls, urls[0])
IndexError: list index out of range
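
The IndexError comes from `urls[0]` being evaluated on an empty list. Why the list is empty at that point is not clear from the report, but the crash itself can be avoided by checking before indexing. The sketch below uses a hypothetical `pick_base_url` helper and made-up URLs; it is not the project's fix.

```python
def pick_base_url(urls):
    """Return the first URL to use as the base, or None when the list is empty."""
    return urls[0] if urls else None


# Made-up inputs: one non-empty list and one empty list.
for urls in (["http://www.mi.com/a.js", "http://www.mi.com/b.js"], []):
    base = pick_base_url(urls)
    if base is None:
        print("No URLs found; nothing to report.")
    else:
        print("Base URL:", base)
```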
