
jav-scrapy's People

Contributors

birdyg, dependabot[bot], hongjie104, newdolphintime, pminmax945, qiusli, raawaa, rchee

jav-scrapy's Issues

Error: SyntaxError: Use of const in strict mode.

I get an error when running this on my Linode VPS (Ubuntu x64). I'm not an IT person, so any guidance would be appreciated. Thanks!

root@ubuntu:~/jav-scrapy# jav -s ipz-634 -o ~/magnet.txt

/root/jav-scrapy/jav.js:16
const baseUrl = 'https://www.javbus.me';
^^^^^
SyntaxError: Use of const in strict mode.
at Module._compile (module.js:439:25)
at Object.Module._extensions..js (module.js:474:10)
at Module.load (module.js:356:32)
at Function.Module._load (module.js:312:12)
at Function.Module.runMain (module.js:497:10)
at startup (node.js:119:16)
at node.js:902:3
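
This message is typical of older Node.js releases, where `const` was not accepted in strict mode. The issue does not state the Node version, so that diagnosis is an assumption. A minimal sketch of a startup guard, assuming Node.js 4 or later is needed; the `isSupportedNode` helper and the threshold `4` are illustrative and not part of jav.js:

```javascript
// Hypothetical helper: true if a Node version string such as "v7.2.1"
// has a major version greater than or equal to minMajor.
function isSupportedNode(versionString, minMajor) {
  var major = parseInt(String(versionString).replace(/^v/, '').split('.')[0], 10);
  return !isNaN(major) && major >= minMajor;
}

// Assumed minimum of 4; written with `var` so the check itself
// still parses on very old interpreters.
if (!isSupportedNode(process.version, 4)) {
  console.error('Node.js ' + process.version + ' is too old; please upgrade.');
}
```

Running `node -v` on the VPS shows which version is actually installed.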

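Adding a method to download video screenshots

The small-cover download method has been removed (see #23 for it). In getItemPage, after getItemMagnet(link, meta, callback);, I added code that downloads the video screenshots for the current ID (fanhao):

// Links to all screenshots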
var snapshots = []
$('a.sample-box').each(function (i, e) {
	let $e = $(e);

	snapshots.push($e.attr("href"))
})
getSnapshots(link, snapshots);

The getSnapshots method:

function getSnapshots(link, snapshots) {
    // https://pics.dmm.co.jp/digital/video/118abp00454/118abp00454jp-1.jpg
    for (var i = 0; i < snapshots.length; i++){
        getSnapshot(link, snapshots[i])
    }
}

The getSnapshot method:

function getSnapshot(link, snapshotLink) {
    // The ID (fanhao) is the last path segment of the item link
    let fanhao = link.split('/').pop();
    let itemOutput = output + "/" + fanhao
    mkdirp.sync(itemOutput);

    let snapshotName = snapshotLink.split('/').pop();
    let fileFullPath = path.join(itemOutput, snapshotName)
    fs.access(fileFullPath, fs.F_OK, function (err) {
        if (err) {
            // File does not exist yet: download to a .part file, then rename
            var snapshotFileStream = fs.createWriteStream(fileFullPath + '.part');
            var finished = false;
            request.get(snapshotLink)
                .on('end', function () {
                    if (!finished) {
                        fs.renameSync(fileFullPath + '.part', fileFullPath);
                        finished = true;
                        // console.error(('[' + fanhao + ']').green.bold.inverse + '[screenshot]'.yellow.inverse, fileFullPath);
                    }
                })
                .on('error', function (err) {
                    if (!finished) {
                        finished = true;
                        // console.error(('[' + fanhao + ']').red.bold.inverse + '[screenshot]'.yellow.inverse, err.message.red);
                        errorCount++;
                    }
                })
                .pipe(snapshotFileStream);
        } else {
            // console.log(('[' + fanhao + ']').green.bold.inverse + '[screenshot]'.yellow.inverse, 'file already exists, skip!'.yellow);
        }
    })
}

I'm not familiar with Node.js, so these methods are adapted from the author's original ones.

Hello, why is the site being fetched https://www.3ubdxu00l1lkcjoz5n.com/?

y> jav -b http://www.javbus.in/genre/28
========== Resource site: https://www.3ubdxu00l1lkcjoz5n.com/ ==========
Parallel connections: 2 Connection timeout: 30 s
Magnet link save location: C:\Users\taw\magnets
Proxy server: none
Failed to fetch page 1: connect ETIMEDOUT 67.228.126.62:80
...attempt 2...
Failed to fetch page 1: connect ETIMEDOUT 67.228.126.62:80
...attempt 3...
Failed to fetch page 1: connect ETIMEDOUT 67.228.126.62:80
...attempt 4...

bug

jav-scrapy/jav.js:370
                            mag_sizes = _.orderBy(mag_sizes, 'size', 'desc');
                                          ^

TypeError: _.orderBy is not a function
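
`_.orderBy` was added in lodash 4, so this error usually means an older lodash is installed. As a stopgap, the same ordering can be done without lodash. A minimal sketch, assuming `mag_sizes` is an array of objects with a numeric `size` field (the issue does not show its shape); `sortBySizeDesc` is a hypothetical name:

```javascript
// Hypothetical stand-in for _.orderBy(items, 'size', 'desc').
// Copies the array first so the caller's array is left untouched.
function sortBySizeDesc(items) {
  return items.slice().sort(function (a, b) {
    return b.size - a.size;
  });
}
```

Upgrading the installed lodash to version 4 or later would make the original call work unchanged.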

Installation error on Mac

iOSdeMac-mini:jav-scrapy Jafar$ npm link
npm ERR! Darwin 16.0.0
npm ERR! argv "/usr/local/bin/node" "/usr/local/bin/npm" "link"
npm ERR! node v7.2.1
npm ERR! npm v3.10.10
npm ERR! path /Users/ios/jav-scrapy
npm ERR! code EACCES
npm ERR! errno -13
npm ERR! syscall symlink

npm ERR! Error: EACCES: permission denied, symlink '/Users/ios/jav-scrapy' -> '/usr/local/lib/node_modules/jav-scarpy'
npm ERR! { Error: EACCES: permission denied, symlink '/Users/ios/jav-scrapy' -> '/usr/local/lib/node_modules/jav-scarpy'
npm ERR! errno: -13,
npm ERR! code: 'EACCES',
npm ERR! syscall: 'symlink',
npm ERR! path: '/Users/ios/jav-scrapy',
npm ERR! dest: '/usr/local/lib/node_modules/jav-scarpy' }
npm ERR!
npm ERR! Please try running this command again as root/Administrator.

npm ERR! Please include the following file with any support request:
npm ERR! /Users/ios/jav-scrapy/npm-debug.log
iOSdeMac-mini:jav-scrapy Jafar$ jav
-bash: jav: command not found
iOSdeMac-mini:jav-scrapy Jafar$ jav -h
-bash: jav: command not found
iOSdeMac-mini:jav-scrapy Jafar$ jav --help
-bash: jav: command not found
iOSdeMac-mini:jav-scrapy Jafar$ cd
iOSdeMac-mini:~ Jafar$ jav --help
-bash: jav: command not found
iOSdeMac-mini:~ Jafar$ ls
Applications Documents Library Music Public ijkplayer-ios
Desktop Downloads Movies Pictures bin jav-scrapy
iOSdeMac-mini:~ Jafar$ cd jav-scrapy/
iOSdeMac-mini:jav-scrapy Jafar$ jav --help
-bash: jav: command not found
iOSdeMac-mini:jav-scrapy Jafar$
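How do I fix this?

TypeError: Cannot read property '1' of null, possibly caused by a change in the site's page structure

Node: v5.10.0
Possibly because the site's page structure changed, script is empty, as shown in the screenshot.
(Screenshot: qq 20160613235115)
The next statement, let meta = parse(script);, then calls parse: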

function parse(script) {
  let gid_r = /gid\s+=\s+(\d+)/g.exec(script); // script is empty, so this is null
  let gid = gid_r[1]; // and this line then throws
  let uc_r = /uc\s+=\s(\d+)/g.exec(script);
  let uc = uc_r[1];
  let img_r = /img\s+=\s+\'(\http.+\.jpg)/g.exec(script);
  let img = img_r[1];
  return {
    gid: gid,
    img: img,
    uc: uc,
    lang: 'zh'
  };
}
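
A guard against the null match avoids the crash, although it does not recover the missing data. A minimal sketch, assuming the caller can skip an item when parsing fails; `parseSafe` is a hypothetical name, and the regular expressions are taken from the snippet above with the stray `\h` escape written as a plain `h`:

```javascript
// Hypothetical null-safe variant of parse(): returns null instead of
// throwing when the script text is missing or lacks any expected field.
function parseSafe(script) {
  if (typeof script !== 'string') {
    return null;
  }
  const gid = /gid\s+=\s+(\d+)/.exec(script);
  const uc = /uc\s+=\s(\d+)/.exec(script);
  const img = /img\s+=\s+'(http.+\.jpg)/.exec(script);
  if (!gid || !uc || !img) {
    return null;
  }
  return { gid: gid[1], uc: uc[1], img: img[1], lang: 'zh' };
}
```

The caller would then check for `null` before using `meta`, for example by logging the ID and moving on to the next item.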

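Runtime error on Ubuntu 20.04

I recently did a fresh install of the system and then reinstalled this tool, but it now fails at runtime. This never happened on Ubuntu 18.04, and I can't tell whether javbus changed its page structure or whether my system is at fault. The Node.js version is currently 10.19.0. Any help would be appreciated. The error output is as follows:

$ jav -s KTB -p 20 -o /home/twinfish/movie/Japan/foto/KTB -x http://127.0.0.1:8118
========== Resource site: https://www.javbus.com/ ==========
Parallel connections: 20        Connection timeout: 30 s
Magnet link save location:  /home/twinfish/movie/Japan/foto/KTB
Proxy server:  http://127.0.0.1:8118
Processing the following titles...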
KTB-054,KTB-053,KTB-052,KTB-051,KTB-050,KTB-049,KTB-048,KTB-047,KTB-046,KTB-045,KTB-044,KTB-043,KTB-042,KTB-041,KTB-040,KTB-039,KTB-038,KTB-037,KTB-036,KTB-035,KTB-034,KTB-033,KTB-032,KTB-031,KTB-030,KTB-029,KTB-028,KTB-026,KTB-027,KTB-025
/media/data2/download/software/net/jav-scrapy/jav.js:204
    let img = img_r[1];
                   ^

TypeError: Cannot read property '1' of null
    at parse (/media/data2/download/software/net/jav-scrapy/jav.js:204:20)
    at Request._callback (/media/data2/download/software/net/jav-scrapy/jav.js:234:28)
    at Request.self.callback (/media/data2/download/software/net/jav-scrapy/node_modules/request/request.js:185:22)
    at Request.emit (events.js:198:13)
    at Request.<anonymous> (/media/data2/download/software/net/jav-scrapy/node_modules/request/request.js:1161:10)
    at Request.emit (events.js:198:13)
    at IncomingMessage.<anonymous> (/media/data2/download/software/net/jav-scrapy/node_modules/request/request.js:1083:12)
    at Object.onceWrapper (events.js:286:20)
    at IncomingMessage.emit (events.js:203:15)
    at endReadableNT (_stream_readable.js:1145:12)

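I wrote a GUI

I'm not sure whether a GUI like this would end up being abused or passed around, and it isn't very well written anyway. I've forgotten all my VC, so I wrote it in Easy Programming Language (易语言), which is what I had on hand. When I have some free time I'll port it to VC and upload that; for now I don't plan to release it. Anyone who needs it can take the .e file and compile it themselves; please don't redistribute it (Pull #36
Building on the author's code, the GUI adds some extra features: conveniently scraping one particular star, label, studio, series, or genre; switching between censored and uncensored; changing the domain to get around blocking; reading and writing the configuration; and so on.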

TypeError: Cannot read property '1' of null

I ran into the following error:
D:\Develop\nodejs\jav-scrapy\jav.js:200
let gid = gid_r[1];
^

TypeError: Cannot read property '1' of null
at parse (D:\Develop\nodejs\jav-scrapy\jav.js:200:18)
at Request._callback (D:\Develop\nodejs\jav-scrapy\jav.js:235:20)
at Request.self.callback (D:\Develop\nodejs\jav-scrapy\node_modules\request\request.js:200:22)
at emitTwo (events.js:87:13)
at Request.emit (events.js:172:7)
at Request.<anonymous> (D:\Develop\nodejs\jav-scrapy\node_modules\request\request.js:1067:10)
at emitOne (events.js:82:20)
at Request.emit (events.js:169:7)
at IncomingMessage.<anonymous> (D:\Develop\nodejs\jav-scrapy\node_modules\request\request.js:988:12)
at emitNone (events.js:72:20)

bug

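When running jav -l 500 (a value larger than the number of items on the first page), only the first page gets parsed. This seems to have been introduced by your commit "Fix a small control-flow issue" (修正控制流上的小问题); reverting it fixes the problem. I couldn't work out the cause.

Next page starts processing before the current page's titles are finished

When the number of titles left to process is smaller than the number of titles on the current page, a network request error while fetching a magnet link or cover makes the scraper skip the rest of the current page and move straight on to the next page.

For example, with jav -l 2 the scraper first fetches two titles from page 1. If one of them fails, it ends processing of that page immediately and starts fetching two titles from the next page.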

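========== Resource site: http://www.javbus.in ==========
Parallel connections: 2        Connection timeout: 1 s
Magnet link save location:  /home/raawaa/magnets.txt
Fetching title links on page 1 ( http://www.javbus.in )...
Processing the following titles...
HIHL-012,INBA-004
===== Page 1 done =====

Fetching title links on page 2 ( http://www.javbus.in/page/2 )...
Processing the following titles...
HVG-021,RABS-015
Overall progress (1/2): [=========================-------------------------]
===== Page 2 done =====

Fetching title links on page 3 ( http://www.javbus.in/page/3 )...
Processing the following titles...
RBD-724,RBD-721
Overall progress (2/2): [==================================================]
Fetched 2 magnet links; this run is finished, waiting for the other crawlers to come home...

As a result, many titles are skipped and never scraped.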
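
One way to keep a single failed request from ending a page early is to record the error per title and finish the page only once every title has reported back. A minimal callback-style sketch of that idea; `processPage` and the `worker(item, callback)` shape are hypothetical and not taken from jav.js, and the sketch applies no concurrency limit:

```javascript
// Hypothetical page runner: calls worker(item, cb) for every item and
// invokes done(results) exactly once, after all items have called back.
// A failing item is recorded in results.failed instead of aborting the page.
function processPage(items, worker, done) {
  const results = { ok: [], failed: [] };
  let pending = items.length;
  if (pending === 0) {
    done(results);
    return;
  }
  items.forEach(function (item) {
    worker(item, function (err) {
      if (err) {
        results.failed.push(item);
      } else {
        results.ok.push(item);
      }
      pending -= 1;
      if (pending === 0) {
        done(results);
      }
    });
  });
}
```

With this shape, the decision to move to the next page lives in `done`, which can also retry or report the IDs listed in `results.failed`.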

img_r could be null

jav-scrapy/jav.js:203
    let img = img_r[1];
                   ^

TypeError: Cannot read property '1' of null

Client network socket disconnected before secure TLS connection was established

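I keep getting the errors below. What's the problem?

========== Resource site: https://www.3ubdxu00l1lkcjoz5n.com/ ==========
Parallel connections: 10 Connection timeout: 3 s
Magnet link save location: /Users/qihangchuangfu/magnets
Proxy server: http://127.0.0.1:1087
Fetching title links on page 1 ( https://www.3ubdxu00l1lkcjoz5n.com/ )...
Processing the following titles...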
NKKD-124,BBAN-230,HUNTA-591,AP-653,BBAN-231,BBAN-229,BBAN-228,HUNTA-589,HUNTA-592,BBSS-019,BBAN-227,AP-654,NKKD-127,FIV-040,DNW-032,KFNE-015,BURI-003,SCR-216,TUE-088,DPGD-003,KRU-038,MMB-234,KRU-021,SGA-128,ABP-856,JUY-845,JUY-848,JUY-838,JUY-843,JUY-841
[BBAN-229][screenshot] tunneling socket could not be established, statusCode=500
[BBAN-229][screenshot] tunneling socket could not be established, statusCode=500
[BBAN-229][screenshot] tunneling socket could not be established, statusCode=500
[BBAN-231][screenshot] Client network socket disconnected before secure TLS connection was established
[BBAN-231][screenshot] Client network socket disconnected before secure TLS connection was established
[BBAN-231][screenshot] Client network socket disconnected before secure TLS connection was established
[BBAN-231][screenshot] Client network socket disconnected before secure TLS connection was established
[BBAN-231][screenshot] Client network socket disconnected before secure TLS connection was established
[BBAN-229] read ECONNRESET
[BBAN-229][screenshot] read ECONNRESET
[BBAN-229][screenshot] read ECONNRESET
[BBAN-229][screenshot] read ECONNRESET

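Error 800A03F6 when running on Windows

My environment:

  • Windows 10 Pro x64
  • node v4.2.1
  • npm 2.14.7

After running npm install, I get the warning npm WARN package.json [email protected] No repository field.

When I run jav -h, Windows Script Host pops up an error:

Script: G:/jav_scrapy/jav.js
Line: 1
Char: 1
Error: Invalid character
Code: 800A03F6
Source: Microsoft JScript compilation error

What causes this? I couldn't find a suitable solution on Google.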

TypeError: Cannot read properties of null (reading '1')

TypeError: Cannot read properties of null (reading '1')
at parse (D:!Software\jav-scrapy-0.7.0\jav.js:204:20)
at Request._callback (D:!Software\jav-scrapy-0.7.0\jav.js:235:28)
at Request.self.callback (D:!Software\jav-scrapy-0.7.0\node_modules\request\request.js:185:22)
at Request.emit (node:events:390:28)
at Request.<anonymous> (D:!Software\jav-scrapy-0.7.0\node_modules\request\request.js:1154:10)
at Request.emit (node:events:390:28)
at IncomingMessage.<anonymous> (D:!Software\jav-scrapy-0.7.0\node_modules\request\request.js:1076:12)
at Object.onceWrapper (node:events:509:28)
at IncomingMessage.emit (node:events:402:35)
at endReadableNT (node:internal/streams/readable:1343:12)
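This is probably caused by the image not being found, right? Also, there doesn't seem to be an option to skip fetching images.

Titles containing the / character cause an error

Background: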
function getItemCover(link, meta, done) {
var fanhao = link.split('/').pop();
var filename = fanhao + 'l.jpg';
I changed it to:
function getItemCover(link, meta, done) {
var fanhao = link.split('/').pop();
var filename = meta.title + '.jpg';
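so that the cover image is saved with the title as its file name.

However, some titles contain the / character, which makes the file naming fail.
(Screenshot: 20180609a)
For example, MUDR-034 in the screenshot.
I'd like to ask the author: is there a way to strip the / character from the title, or to replace it with something that is allowed in a file name, such as . or a space?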
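
One possible answer is to replace every character that is unsafe in file names before building `filename`. A minimal sketch; `sanitizeFilename` is a hypothetical helper, and the character set is an assumption covering `/` plus the characters Windows rejects in file names:

```javascript
// Hypothetical helper: replaces / \ : * ? " < > | in a title with
// `replacement` (a space by default) so it can be used as a file name.
function sanitizeFilename(title, replacement) {
  var safe = replacement === undefined ? ' ' : replacement;
  return String(title).replace(/[\\\/:*?"<>|]/g, safe);
}
```

In getItemCover this could be used as `var filename = sanitizeFilename(meta.title) + '.jpg';`, or with `'.'` or `''` as the second argument to pick a different replacement.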

How to add the title, date, and actress to the image file name

Edit the js file (on Windows 10 the path is C:\Users\XXX\AppData\Roaming\npm\node_modules\jav-scarpy\jav.js):

function getItemCover(link, meta, done) {
var fanhao = link.split('/').pop();
var filename = fanhao + 'l.jpg';
Change it to:
function getItemCover(link, meta, done) {
var fanhao = link.split('/').pop();
var temptitle = meta.title;
var shorttitle = temptitle.substring(0, 200);
var finaltitle = shorttitle.replace(/[\/*?:"|<>]/g, '');
var filename = finaltitle + ' -' + meta.actress + '['+ meta.date + '].jpg';

Line 204, let img = img_r[1];, throws TypeError: Cannot read property '1' of null

jav -l 50 -p 10 -t 3000
========== Resource site: http://www.javbus.me ==========
Parallel connections: 10 Connection timeout: 3 s
Magnet link save location: C:\Users\shengjianbin\magnets
Proxy server: none
Fetching title links on page 1 ( http://www.javbus.me )...
Processing the following titles...
ABP-435,ABP-439,CHN-099,ABP-437,ABP-438,ABP-436,JUX-790,SNIS-602,PGD-846,GVG-257,GVG-256,GVG-255,GVG-259,GVG-258,GVG-261,RVG-020,GVG-260,DAVK-001,RBD-744,BF-437,BF-438,BF-439,BF-436,ATID-265,ADN-088,ADN-086,ATID-266,SNIS-597,SNIS-598,SNIS-599
D:\Develop\nodejs\GitHub\jav-scrapy\jav.js:204
let img = img_r[1];
^

TypeError: Cannot read property '1' of null
at parse (D:\Develop\nodejs\GitHub\jav-scrapy\jav.js:204:18)
at Request._callback (D:\Develop\nodejs\GitHub\jav-scrapy\jav.js:235:20)
at Request.self.callback (D:\Develop\nodejs\GitHub\jav-scrapy\node_modules\request\request.js:199:22)
at emitTwo (events.js:87:13)
at Request.emit (events.js:172:7)
at Request.<anonymous> (D:\Develop\nodejs\GitHub\jav-scrapy\node_modules\request\request.js:1036:10)
at emitOne (events.js:82:20)
at Request.emit (events.js:169:7)
at IncomingMessage.<anonymous> (D:\Develop\nodejs\GitHub\jav-scrapy\node_modules\request\request.js:963:12)
at emitNone (events.js:72:20)
