csdn-blog-export's Introduction

#CSDNBlogExport
A tool for exporting CSDN blogs

I had long wanted to import my CSDN blog posts into my own website, but I have quite a lot of posts. Later, inspired by my friend Lao Guo, I found some time and wrote this small tool in Java.

#Usage
Download CSDNBlogExport.7z, extract it, and run it.
No environment setup is needed.

In testing, 667 blog posts were exported to files in about 54 seconds using 50 threads.

Exported blog files are stored according to this rule:
program run directory/blog/year-month/year-month-day blog title.markdown
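
The storage rule above can be sketched in Java roughly as follows. This is only an illustration, not the tool's actual code: the class name `BlogPathSketch`, the methods `exportPath` and `sanitize`, the zero-padded month and day, and the replacement of characters that Windows forbids in file names are all assumptions.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class BlogPathSketch {
    // Assumed formats: zero-padded "year-month" and "year-month-day".
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyy-MM");
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    // Hypothetical helper: replace characters that are invalid in Windows
    // file names with '_'. The real tool may handle titles differently.
    public static String sanitize(String title) {
        return title.replaceAll("[\\\\/:*?\"<>|]", "_").trim();
    }

    // Builds <runDir>/blog/<year-month>/<year-month-day> <title>.markdown
    public static Path exportPath(Path runDir, LocalDate published, String title) {
        return runDir.resolve("blog")
                .resolve(published.format(MONTH))
                .resolve(published.format(DAY) + " " + sanitize(title) + ".markdown");
    }

    public static void main(String[] args) {
        Path p = exportPath(Paths.get("out"), LocalDate.of(2017, 8, 16), "Hello: CSDN?");
        System.out.println(p);
    }
}
```

Grouping by month keeps each directory small even for blogs with several hundred posts, and the date prefix makes files sort chronologically within a month.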

#Development
The CSDNBlogExport directory contains the complete source code.
The tool uses the WebMagic crawler framework. I could have written my own HttpURLConnection utility class instead, but that would have taken longer, so I took a shortcut: if someone else already has a better tool, why not use it?
There is not much technical depth here, but it still took me most of a day.
Along the way I found that some users' blogs could not be exported, because CSDN names user links in two different ways. I did not notice this when I first wrote the code; it only showed up when I tested against other people's blogs, and it took about half an hour to fix.

I would very much like to polish this program, but limited time means I cannot do much more.
I cannot guarantee that this version will keep working. If one day it stops working (most likely because CSDN has changed the data it returns or added access control), please leave a message or contact me on QQ: 619699629 or by email: [email protected]
I will use my spare time to keep up with CSDN's changes to its blog platform so the tool stays usable.

This is version 1.0. If you run into a bug, leave a message here or contact me and I will fix it promptly.

Friends are also welcome to join me in improving this program.
Future updates will be maintained in this project, so star it if you need it.

This small program can export the blog of any CSDN user, but it is for learning purposes only. Disclaimer: if exporting a blog infringes on the rights of others and causes a dispute, I bear no responsibility.

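#Bug fix log
2017.7.31:
Fixed a bug where posts written with the HTML editor were exported incompletely.
Current version: v1.1
Thanks to CSDN blogger [三名狂客] for reporting the bug.

2017.8.16:
Version upgrade. CSDN fixed a pagination bug on its side, so the previous version no longer works. Please download the latest version, 2.0.
Current version: v2.0
Thanks to CSDN blogger [龙腾四海365] for reporting the bug.

After this fix, exporting takes a little longer, because I hard-coded the number of paging threads at 50. Anyone interested can extend this themselves.
In testing, with 50 threads entered, crawling 668 posts took 92 seconds.

2017.9.6:
Version upgrade. Some CSDN users who have two ids could not back up their blogs. This has been fixed; please download the latest version, 2.1.
Current version: v2.1
Thanks to CSDN blogger [沐雨浩] for pointing out the bug.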
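
The 2.0 entry mentions that the paging threads are hard-coded at 50 and left open for extension. As a rough sketch of one way to make that count configurable, the code below fetches list pages on a fixed-size pool from `java.util.concurrent`. It does not use WebMagic and is not the tool's actual code: `PagingPoolSketch`, `collectIds`, and the `fetchPage` callback are hypothetical names, and the page fetch is stubbed out.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.IntFunction;

public class PagingPoolSketch {
    // The changelog says 50 is hard-coded; here it is only a default.
    public static final int DEFAULT_THREADS = 50;

    // Fetches pages 1..pageCount concurrently and returns all post ids in page order.
    // fetchPage is a hypothetical callback that returns the ids found on one list page.
    public static List<String> collectIds(int pageCount, int threads,
                                          IntFunction<List<String>> fetchPage) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<String>>> pending = new ArrayList<>();
            for (int page = 1; page <= pageCount; page++) {
                final int current = page;
                Callable<List<String>> task = () -> fetchPage.apply(current);
                pending.add(pool.submit(task));
            }
            List<String> ids = new ArrayList<>();
            for (Future<List<String>> f : pending) {
                ids.addAll(f.get()); // waits for each page in submission order
            }
            return ids;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching pages", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a page fetch failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stub fetcher: pretends every page holds exactly one post id.
        List<String> ids = collectIds(3, DEFAULT_THREADS,
                page -> Collections.singletonList("id-" + page));
        System.out.println(ids); // prints [id-1, id-2, id-3]
    }
}
```

Passing the thread count as a parameter lets a caller trade speed against load on the remote site without touching the paging logic.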

csdn-blog-export's People

Contributors

chenhaoxiang

csdn-blog-export's Issues

Cannot retrieve blog information after entering a username

The error output is as follows:

[INFO] 2018-09-03 11:39:47 [cn.chenhaoxiang.CSDNBlogExport.startGetBlogID(CSDNBlogExport.java:167)] -> 用户名:loveer0 [AWT-EventQueue-0] [chx]
[INFO] 2018-09-03 11:39:48 [us.codecraft.webmagic.Spider.run(Spider.java:306)] -> Spider blog.csdn.net started! [AWT-EventQueue-0] [chx]
[INFO] 2018-09-03 11:39:49 [us.codecraft.webmagic.downloader.HttpClientDownloader.download(HttpClientDownloader.java:88)] -> downloading page success http://blog.csdn.net/loveer0 [pool-1-thread-1] [chx]
[INFO] 2018-09-03 11:39:49 [cn.chenhaoxiang.CSDNBlogExport.process(CSDNBlogExport.java:65)] -> 开始获取loveer0的博客文章ID... [pool-1-thread-1] [chx]
[ERROR] 2018-09-03 11:39:49 [us.codecraft.webmagic.Spider$1.run(Spider.java:324)] -> process request Request{url='http://blog.csdn.net/loveer0', method='null', extras=null, priority=0, headers={}, cookies={}} error [pool-1-thread-1] [chx]
java.lang.NullPointerException
    at cn.chenhaoxiang.CSDNBlogExport.process(CSDNBlogExport.java:94)
    at us.codecraft.webmagic.Spider.onDownloadSuccess(Spider.java:414)
    at us.codecraft.webmagic.Spider.processRequest(Spider.java:406)
    at us.codecraft.webmagic.Spider.access$000(Spider.java:61)
    at us.codecraft.webmagic.Spider$1.run(Spider.java:320)
    at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[INFO] 2018-09-03 11:39:49 [us.codecraft.webmagic.Spider.run(Spider.java:338)] -> Spider blog.csdn.net closed! 1 pages downloaded. [AWT-EventQueue-0] [chx]
[INFO] 2018-09-03 11:39:49 [cn.chenhaoxiang.CSDNBlogExport.startGetBlogID(CSDNBlogExport.java:176)] -> 本次查询的文章ID数量:0 [AWT-EventQueue-0] [chx]
