zhihulink's Introduction

Languages

Deep Learning

Books

zhihulink's People

Contributors

oovm, wjxway


zhihulink's Issues

The overall logic of the full project

Login: ZhihuLinkLogin

Obtain cookies either through a simulated login or by entering them manually.

Then use the cookies to build HTTP requests that fetch page content.
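A minimal sketch of the cookie-to-request step, written in Python for illustration rather than in the project's own language. The cookie names (`session`, `token`), the helper `build_request`, and the target URL are assumptions made for this example; the source does not say which cookies or endpoints are used.

```python
from urllib.request import Request


def build_request(url, cookies):
    """Build a GET request that carries the given cookies in a Cookie header."""
    cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": cookie_header})


# Placeholder values only; real cookies would come from the login step.
request = build_request(
    "https://www.zhihu.com/",
    {"session": "XXXXXXXXXXXXXXXXXXXXXX", "token": "XXXXXXXXXXXXXXXXXXXXXX"},
)
# urllib.request.urlopen(request) would send it; omitted so the sketch runs offline.
```

Keeping request construction separate from sending makes it possible to inspect the headers without touching the network.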

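Front end: ZhihuLinkGet

The raw content returned by requests is stored locally in ApplicationData/ZhihuLink.

Users should not edit the local content by hand, though; all main operations go through the FrontEnd.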

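Data processing: ZhihuLinkData

The raw data is too messy, so it needs to be cleaned, and the cache is cleared at the same time.

The cleaned data is then saved in a structure with higher information density.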
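One way to picture the cleaning step is a filter that keeps only the fields of interest and drops empty values. The sketch below is in Python for illustration only; the field names in `KEEP_FIELDS` and the one-JSON-object-per-line cache layout are assumptions, since the source does not describe the raw format.

```python
import json

# Hypothetical field names; the real raw layout is not given in the source.
KEEP_FIELDS = ("id", "title", "content")


def is_empty(value):
    """Treat None and empty strings/containers as carrying no information."""
    return value is None or value == "" or value == [] or value == {}


def clean_record(raw):
    """Keep only the wanted fields, skipping any that are missing or empty."""
    return {key: raw[key] for key in KEEP_FIELDS if not is_empty(raw.get(key))}


def clean_cache(raw_lines):
    """Parse each non-blank JSON line and return the cleaned records."""
    return [clean_record(json.loads(line)) for line in raw_lines if line.strip()]


raw_lines = [
    '{"id": 1, "title": "T", "content": "", "tracking": "x"}',
    "",
    '{"id": 2, "content": "hi", "extra": {"a": 1}}',
]
cleaned = clean_cache(raw_lines)
```

Clearing the cache is left out here; it would follow once the cleaned records have been saved.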

Data analysis: ZhihuLinkAnalysis

Analyze and visualize the data that has already been cleaned.

Other tools

Other tools rely on either the raw requested data or the cleaned data.

XMLObject appears in the converted document

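If XMLObject appears in a converted file, a conversion rule is conflicting or missing.

Paste the entire XMLObject[ ] expression here, from the opening bracket to the closing bracket.

After I update the rules, I will tell you which new version to use.

Replace any sensitive strings with "XXXXXXXXXXXXXXXXXXXXXX".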

Manage the cookies

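Yesterday's closed beta showed that the current workflow is poor: it is too complicated, and most users will not tolerate this many steps.

Meanwhile, #2 has hit a major obstacle, and adding cookies manually is still the only working solution.

  • A cookie management scheme is needed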

Format the raw answer data

  • Transform the answer from HTML to Markdown

Steps:

  • Format the HTML
  • Parse the HTML
  • Finish the transformation to Markdown
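The parse-then-emit steps above can be sketched with an event-based HTML parser. This is a minimal Python illustration under stated assumptions, not the project's converter: the class `MarkdownConverter` and function `html_to_markdown` are hypothetical names, and only a small tag subset (p, br, b/strong, i/em, a) is handled.

```python
from html.parser import HTMLParser


class MarkdownConverter(HTMLParser):
    """Convert a small subset of HTML (p, br, b/strong, i/em, a) to Markdown."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []       # Markdown fragments in document order
        self.href_stack = []  # link targets for currently open <a> tags

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a":
            self.href_stack.append(dict(attrs).get("href") or "")
            self.parts.append("[")
        elif tag == "br":
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in ("b", "strong"):
            self.parts.append("**")
        elif tag in ("i", "em"):
            self.parts.append("*")
        elif tag == "a" and self.href_stack:
            self.parts.append(f"]({self.href_stack.pop()})")
        elif tag == "p":
            self.parts.append("\n\n")  # blank line between paragraphs

    def handle_data(self, data):
        self.parts.append(data)


def html_to_markdown(html):
    """Feed the HTML through the converter and join the emitted fragments."""
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.parts).strip()


sample = '<p>Hello <b>world</b>, see <a href="https://example.com">this</a>.</p><p>Second<br>line</p>'
markdown = html_to_markdown(sample)
```

Tags outside the handled subset are dropped while their text is kept, so an unknown element degrades to plain text instead of leaking markup into the output.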

Function::Get user following

Modules

  • ZhihuLinkFollowee
  • ZhihuLinkFollower
  • ZhihuLinkFollowingQuestion
  • ZhihuLinkFollowingTopic
  • ZhihuLinkFollowingColumn
  • ZhihuLinkFollowingFavlist

Simulated login

  • Log in with username and password
  • Automatically log in again when the cookies have expired
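The automatic re-login item can be sketched as an expiry check in front of every session use. The Python below is illustrative only: it assumes each cookie is a dictionary with an optional `expires` field holding a Unix timestamp, and `login` stands for any hypothetical callable that performs the actual login and returns fresh cookies.

```python
import time


def cookies_expired(cookies, now=None):
    """Return True if there are no cookies or any cookie's expiry time has passed."""
    now = time.time() if now is None else now
    return not cookies or any(c.get("expires", float("inf")) <= now for c in cookies)


def ensure_session(cookies, login, now=None):
    """Return usable cookies, calling login() again only when needed."""
    if cookies_expired(cookies, now):
        return login()
    return cookies


# Demonstration with a fake login that records how often it is called.
login_calls = []


def fake_login():
    login_calls.append("login")
    return [{"name": "session", "expires": 9000}]


stored = [{"name": "session", "expires": 2000}]
still_valid = ensure_session(stored, fake_login, now=1000)  # before expiry
refreshed = ensure_session(stored, fake_login, now=3000)    # after expiry
```

Passing `now` explicitly keeps the check deterministic, which makes the expiry logic easy to exercise without waiting for real cookies to age.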

Overall Data Structure

These features SHOULD NOT be introduced until v3.0

data structure

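Data is stored in two forms:

  1. Wrapped form: the form users work with, which can include ZhihuCookieData, ZhihuUserData, ZhihuPostData, ZhihuBlogData, and so on.
  2. Plain-data form: the form used for storage, mainly cookies kept in a single file and all other data kept in a database.

Ideally, cookies should be stored on disk in the following form: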

<|userid-><|"Cookies"->{<|cookie1|>,<|cookie2|>,...},"BasicInfo"-><|"Username"->***,...|>|>,...|>

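The first part holds the full information of the real cookies, and the second part stores the extra information needed to build a ZhihuCookieData. This way, building a ZhihuCookieData object requires no network access, which would be too slow. Without "BasicInfo", users would have trouble choosing a cookie.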
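To make the layout concrete, here is a rough Python equivalent of that nested structure, serialized as JSON to stand in for the single cookie file. The user ids, cookie fields, and the helper `list_accounts` are all hypothetical; only the `"Cookies"`, `"BasicInfo"`, and `"Username"` keys come from the structure above.

```python
import json

# Hypothetical contents; keys mirror the userid -> {"Cookies", "BasicInfo"} layout.
cookie_store = {
    "user-1": {
        "Cookies": [{"name": "session", "value": "XXXXXXXXXXXXXXXXXXXXXX"}],
        "BasicInfo": {"Username": "alice"},
    },
    "user-2": {
        "Cookies": [{"name": "session", "value": "XXXXXXXXXXXXXXXXXXXXXX"}],
        # No "BasicInfo": this entry is hard to identify in a picker.
    },
}


def list_accounts(store):
    """Return (userid, username) pairs for a cookie picker, using only local data."""
    return [
        (userid, entry.get("BasicInfo", {}).get("Username", "<unknown>"))
        for userid, entry in store.items()
    ]


serialized = json.dumps(cookie_store)  # what would be written to the single file
restored = json.loads(serialized)      # what would be read back, with no network access
accounts = list_accounts(restored)
```

The second entry shows the problem the text describes: without "BasicInfo" there is nothing readable to show the user when choosing a cookie.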

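All other data is kept in one copy in the Database and one copy in memory. The Database stores it in whatever way suits it; when data is read out, it is wrapped directly into the corresponding ***Data form, and all data analysis should be based on the ***Data form.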
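A small sketch of the wrap-on-read idea, using Python with an in-memory SQLite database purely for illustration. The name ZhihuUserData comes from the list above, but its fields, the `users` table, and the `load_users` helper are assumptions; the source does not specify a schema or a database engine.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class ZhihuUserData:
    """Wrapped form of one user row; the fields are hypothetical."""
    user_id: str
    name: str
    follower_count: int


def load_users(conn):
    """Read rows from the database and wrap each one as ZhihuUserData."""
    rows = conn.execute(
        "SELECT user_id, name, follower_count FROM users ORDER BY user_id"
    )
    return [ZhihuUserData(*row) for row in rows]


# Plain-data form: rows in a database (in memory here so the sketch is self-contained).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id TEXT PRIMARY KEY, name TEXT, follower_count INTEGER)"
)
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [("u1", "alice", 10), ("u2", "bob", 25)],
)

# Wrapped form: analysis code only ever sees ZhihuUserData objects.
users = load_users(conn)
total_followers = sum(user.follower_count for user in users)
```

Because analysis touches only the wrapped objects, the storage layout can change without affecting any analysis code.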
