NAME

Novel::Robot

Downloader for novels and BBS (forum) threads.

site

Supported novel and forum sites:

Novel::Robot::Parser

type

Supported output file types:

Novel::Robot::Packer

install

For example, on Debian:

$ apt-get install parallel sendemail ansible calibre cpanminus
$ cpanm Novel::Robot

run_novel.pl

download url and convert

Download a novel, or process a local file, and convert it to an ebook.

Download the whole novel and convert it to an ebook:

run_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -T mobi
run_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -o mytest.epub

Download chapters 1-3 only; the output file is named abc.mobi:

run_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -o abc.mobi -G "-i 1-3"

process txt and convert

Convert a txt file to an ebook:

run_novel.pl -f 飘灯-风尘叹.txt -T mobi
run_novel.pl -f fct.txt -w 飘灯 -b 风尘叹 -T mobi

download by site-writer-book and convert

run_novel.pl -s lofter -T mobi -w chuweizhiyu -b 时之足
run_novel.pl -s lofter -T mobi -w chuweizhiyu -b 时之足 -G "-i 3-"
run_novel.pl -s lofter -T mobi -w chuweizhiyu -b 时之足 -G "-i 3-5"

send novel by email

Download or convert a novel, then use sendEmail to send the mobi file to the email address [email protected].

With a local SMTP service installed:

run_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -T mobi -t "[email protected]" -S "-f [email protected]"
run_novel.pl -f fct.txt -w 飘灯 -b 风尘叹 -T mobi -t "[email protected]" -S "-f [email protected]"

With a remote SMTP service:

run_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -T mobi -t "[email protected]" -G "-i 1-3"  -S "-f [email protected] -s smtp.src.com -xu xxx -xp somepwd"
run_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -T mobi -t "[email protected]" -S "-f [email protected] -s smtp.qq.com:587 -o tls=yes -xu yyy -xp 'aaaaaaaaaaaaagga'"
run_novel.pl -f fct.txt -w 飘灯 -b 风尘叹 -T mobi -t "[email protected]" -S "-f [email protected] -s smtp.src.com -xu xxx -xp somepwd"
run_novel.pl -f fct.txt -w 飘灯 -b 风尘叹 -T mobi -t "[email protected]" -S "-f [email protected] -s smtp.qq.com:587 -o tls=yes -xu yyy -xp 'aaaaaaaaaaaaagga'"

push ebook to a remote host with ansible, then sendEmail

Use ansible to upload the ebook to a remote server, then call sendEmail on that server so the mail is delivered over SMTP directly from there.

run_novel.pl -h remote.vps.com -u "http://www.jjwxc.net/onebook.php?novelid=14838" -T mobi -t "[email protected]" -S "-f [email protected]"
run_novel.pl -h remote.vps.com -f fct.txt -w 飘灯 -b 风尘叹 -T mobi -t "[email protected]" -S "-f [email protected]"

ARG

-u : book url
-w : writer name
-b : book name
-f : txt file or directory

-t : email address to send to

-o : output ebook filename
-T : ebook type

-G : get_novel.pl args
-C : conv_novel.pl args
-S : sendEmail args

-h : remote host to run on

get_novel.pl

download

Download a novel from its url. The url is the novel's index (table of contents) page, and the cookie carries login information.

get_novel.pl -u [url] -t [type] -i [min_item_num-max_item_num] --cookie [cookie] -o [dst_file/dst_dir]

get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -t txt
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -t html
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -t web -o some_dir

get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -t html -i 3
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -t html -i 3-4
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -t html -i 3-
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838" -t html -i -3
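
The four -i forms above (3, 3-4, 3-, -3) can be read as a min-max pair with either side optional. The following is a minimal sketch of that reading, not the code get_novel.pl actually uses. The function name parse_item_range is hypothetical, and treating a bare number as "only that item" is an assumption.

```shell
# Illustration only: split an "-i" style range spec into "min max".
# An empty side means "unbounded" and is printed as "-".
parse_item_range() {
    local spec="$1" min max
    case "$spec" in
        *-*)
            min="${spec%%-*}"   # text before the first "-"
            max="${spec#*-}"    # text after the first "-"
            ;;
        *)
            # Assumption: a bare number selects just that one item.
            min="$spec"
            max="$spec"
            ;;
    esac
    printf '%s %s\n' "${min:--}" "${max:--}"
}

parse_item_range 3-4
parse_item_range 3-
parse_item_range -3
parse_item_range 3
```

Under these assumptions, "3-" leaves the upper bound open (from item 3 to the end) and "-3" leaves the lower bound open (from the start to item 3).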

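jjwxc vip: in Firefox, log in to m.jjwxc.net, then export cookies.txt with the cookies.txt extension; this makes it possible to download the novels purchased by the logged-in account. You can also point directly at the cookies.sqlite file in the Firefox profile directory, or pass the m.jjwxc.net cookie string directly.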

get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=217747" -i 33-34 --cookie cookies.txt
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=217747" -i 33-34 --cookie ~/.mozilla/firefox/*/cookies.sqlite
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=217747" -i 33-34 --cookie "name1=value1; name2=value2"

convert txt

Convert txt files; the regex used to match chapter titles can be specified with -r.

get_novel.pl -w [writer] -b [book] -f [txt_file/directory] -t [type] -r [chapter_regex]
get_novel.pl -w 牵机 -b 断情逐妖记 -f dq1.txt -t html
get_novel.pl -w 牵机 -b 断情逐妖记 -f dq1.txt,dq2.txt,dir1 -r "第[ \\t\\d]+章" -t html
get_novel.pl -f 飘灯-像妖怪一样自由.txt -t html

only print info

Only print the novel's info, without downloading:

get_novel.pl -u [url] -D 1
get_novel.pl -u "http://www.jjwxc.net/onebook.php?novelid=14838"  -D 1

ARG

-s : site

-u : book url

-w : writer name
-b : book name
-f : txt file / txt file dir, the source text (a single file or directory)
-r : chapter regex, used to split the text into chapters (for example: "第[ \\t\\d]+章")
-c : firefox cookies.sqlite / netscape HTTP cookie file / cookie string, details in Novel::Robot::Browser

-t : save type, for example txt/html
-o : output filename

-p : min_page_num-max_page_num, only fetch pages x-y
-i : min_item_num-max_item_num, only fetch chapters/posts x-y

-C : with_toc, whether to generate a table of contents when saving (default: yes)
-G : grep_content, extract by keyword
-F : filter_content, filter by keyword
-A : only_poster, for threads, only keep posts by the original poster
-N : min_content_word_num, for threads, minimum word count per post

--novel_list_path : xpath to extract novel_list
--content_path    : xpath to extract content
--writer_path     : xpath to extract writer
--book_path       : xpath to extract book
--content_regex   : regex to extract content
--writer_regex    : regex to extract writer
--book_regex      : regex to extract book

-D : only print info, not download
-v : verbose, show progress bar (shown by default)

-P : max_process_num, number of processes

conv_novel.pl

Use calibre to convert a downloaded novel file in html format into another ebook format such as epub or mobi. The default filename format is [writer]-[bookname].[type]; if the -w and -b options are not given, the html source file must be named [writer]-[book].

conv_novel.pl -f [src_file] -t [type (lowercase)] -w [writer] -b [book]

conv_novel.pl -f 天平-风起阿房.html -t mobi
conv_novel.pl -f mxj.html -w 施定柔 -b 迷侠记 -t epub
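
To illustrate the [writer]-[book].[type] naming convention, here is a minimal sketch that recovers the writer and book from such a filename using plain shell parameter expansion. The function name split_novel_filename is hypothetical, and splitting at the first "-" is an assumption; conv_novel.pl may parse the name differently.

```shell
# Illustration only: recover writer and book from "[writer]-[book].[type]".
split_novel_filename() {
    local name base
    name="${1##*/}"     # drop any leading directory
    base="${name%.*}"   # drop the extension
    # Assumption: the first "-" separates writer from book.
    printf '%s\n%s\n' "${base%%-*}" "${base#*-}"
}

split_novel_filename 天平-风起阿房.html
```

A filename without a "-" yields the same string for both fields, which is a hint that -w and -b should be passed explicitly, as in the mxj.html example.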

bulk_novel.pl

Get bulk novel info as <writer,book,url>; by default nothing is downloaded.

get a writer's novels info, not download

Get info on the novels of a board or a writer's column in bulk; the default is -D=1, which does not download.

bulk_novel.pl -b "http://www.jjwxc.net/oneauthor.php?authorid=14644"
bulk_novel.pl -b "http://www.jjwxc.net/oneauthor.php?authorid=14644" -i 1-3

get board's threads info, not download

Query a board's thread info; the default is -D=1, which does not download.

bulk_novel.pl -b "http://bbs.jjwxc.net/board.php?board=153&page=0" -P 1-2
bulk_novel.pl -b "http://bbs.jjwxc.net/board.php?board=153&page=0" -i 1-20

query novels info, not download

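Bulk query, without downloading. The -q values below (作者 = writer, 作品 = work, 贴子主题 = thread subject) are passed as-is.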

bulk_novel.pl -s jjwxc -q 作者 -k 牵机 -i 1-10
bulk_novel.pl -s jjwxc -q 作品 -k 网王 -P 1-3
bulk_novel.pl -s hjj -b 153 -q 贴子主题 -k 迷侠记

bulk download novels

Download all novels in the writer's column:

bulk_novel.pl -b "http://www.jjwxc.net/oneauthor.php?authorid=14644" -D 0

manually select some novels, use parallel for multiple novels download

Manually select a subset of the novels, then use parallel to call run_novel.pl on each of them and produce mobi files:

bulk_novel.pl -s jjwxc -q 作者 -k 牵机 -P 1 > raw_booklist.txt
awk -F, '$1=="牵机"' raw_booklist.txt > refine_booklist.txt
parallel --colsep , run_novel.pl -u "{3}" -T mobi :::: refine_booklist.txt
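
Before launching parallel, it can help to check what the filter selected. The sketch below is a dry run over a hypothetical booklist: the sample rows and example.com urls are made up, and only the <writer,book,url> layout comes from the text above. It prints the run_novel.pl commands instead of executing them.

```shell
# Work in a scratch directory so no real booklist is overwritten.
tmp=$(mktemp -d)

# Hypothetical sample in the <writer,book,url> layout; the rows are made up.
cat > "$tmp/raw_booklist.txt" <<'EOF'
牵机,断情逐妖记,http://example.com/book/1
飘灯,风尘叹,http://example.com/book/2
EOF

# Keep only rows whose first comma-separated field is exactly 牵机.
awk -F, '$1=="牵机"' "$tmp/raw_booklist.txt" > "$tmp/refine_booklist.txt"

# Dry run: print one command per selected row instead of executing it.
while IFS=, read -r writer book url; do
    echo run_novel.pl -u "$url" -T mobi
done < "$tmp/refine_booklist.txt"
```

The loop is a sequential stand-in for the parallel call: the third column read here as url corresponds to {3} in the parallel command.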

ARG

-s : site

-b : board_url / board_id

-q : query type
-k : query keyword

-P : min_page_num-max_page_num, only fetch pages x-y
-i : min_item_num-max_item_num, only fetch items x-y

-D : only print info, not download

FUNCTION

new

Initialize with the source site (parser engine) and the destination file type.

my $xs = Novel::Robot->new(
site => 'jjwxc',
type => 'html', 
);

set_parser

Set the source site (parser engine).

$xs->set_parser('jjwxc');

set_packer

Set the destination type (packer engine).

$xs->set_packer('html');

get_item

Download one whole novel or thread.

$xs->set_parser('jjwxc');
my $index_url = 'http://www.jjwxc.net/onebook.php?novelid=2456';
$xs->get_item($index_url);


$xs->set_parser('txt');
$xs->get_item([ '/somepath/somefile.txt' ],
        writer => '牵机', book => '断情逐妖记',
        );

bulk novels and download

bulk download a writer's novels (the writer's column)

$xs->set_parser('jjwxc');
my $writer_url = 'http://www.jjwxc.net/oneauthor.php?authorid=14644';
my ($writer_name, $books_ref) = $xs->{parser}->get_board_ref($writer_url, %opt);
$xs->get_item($_, %opt) for @$books_ref;

bulk download a board's threads

$xs->set_parser('hjj');
my $board_url = "http://bbs.jjwxc.net/showmsg.php?board=153";
my ($info, $tiezis_ref) = $xs->{parser}->get_board_ref($board_url, %opt);
$xs->get_item($_, %opt) for @$tiezis_ref;

query novels/threads by keyword and download

my $query_type = '作者';
my $query_keyword='牵机';
my ($info, $items_ref) = $xs->{parser}->get_query_ref($query_keyword, query_type => $query_type, %opt);
$xs->get_item($_, %opt) for @$items_ref;

Contributors

abbypan
