Comments (18)
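I'm not sure how to implement this feature, because I don't know how to save this kind of content. Perhaps disabling some of the page's JavaScript would be enough, but that needs testing. I'll try it when I have time.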
from translators_cn.
/*!
 * Prototype of Zotero web translator for 知乎 (Zhihu)
 *
 * Language: TypeScript 4.0; ES2020
 * Date: 2020-10-15
 *
 * Lemming <https://github.com/Lemmingh>
 * Licensed under the ISC License.
 */

/// <reference path="types-zotero/index.d.ts" />
/// <reference path="types-zhihu/index.d.ts" />

'use strict';

/* Main. */
async function doWeb(_doc: Document, url: string): Promise<void> {
    const newZoteroItem = new Zotero.Item('document');

    // Remove unnecessary query and fragment string from `url`, before performing other operations.
    url = url.split(/\?|#/)[0];

    // If there is any exception, let it propagate to terminate the process.
    const item: ZhihuItem = await getZhihuItem(url);

    // Generate the snapshot of the Zhihu item.
    const snapshot: HTMLDocument = getSnapshot(item);
    newZoteroItem.attachments.push({
        title: 'Snapshot',
        document: snapshot
    });

    // TODO: Set metadata.
    newZoteroItem.title = item.title;

    // Save.
    newZoteroItem.complete();
}
/**
 * Returns a `Promise<ZhihuItem>`.
 * @param {string} url - URL of the answer or Zhuanlan article.
 */
async function getZhihuItem(url: string): Promise<ZhihuItem> {
    // ! Warning:
    // ! RegExp lookbehind assertions are not available until ECMAScript 2018.
    // ! https://github.com/tc39/proposal-regexp-lookbehind

    // Zhuanlan article URL examples:
    // https://zhuanlan.zhihu.com/p/59589298
    // https://zhuanlan.zhihu.com/p/35295235
    const ZhuanlanArticleIdRegexp = /(?<=p\/)\d+/;

    // Answer URL examples:
    // https://www.zhihu.com/answer/984072342
    // https://www.zhihu.com/question/24952084/answer/984072342
    // https://www.zhihu.com/question/392313958/answer/1198915276
    const AnswerIdRegexp = /(?<=answer\/)\d+/;

    if (url.startsWith('https://zhuanlan.zhihu.com/p/')) {
        return getAsArticle();
    } else if (url.includes('/answer/')) {
        return getAsAnswer();
    } else {
        throw new Error("Not supported");
    }

    /* Sends an HTTP GET request to Zhihu API. */
    // Important:
    // Many Zhihu services, including API, lack cross-origin headers.
    // Well, it doesn't seem to matter, since Zotero web translator seems to run in the same context as the active document in the browser.
    // And `fetch()` can be easily converted to `Zotero.HTTP.request()`.
    // ! A rejected `Promise` surfaces as an exception at `await`; handle it with try-catch or `.catch()`.
    function fetchJson(aUrl: RequestInfo): Promise<Zhihu.Answer | Zhihu.Article> {
        return fetch(aUrl).then(res => {
            // `fetch()` does not reject on HTTP error statuses, so check explicitly.
            if (!res.ok) {
                throw new Error(`Request failed with status ${res.status}`);
            }
            return res.json();
        });
    }

    async function getAsAnswer(): Promise<ZhihuItem> {
        // Construct API URL.
        const idStr = url.match(AnswerIdRegexp)![0];
        const apiUrl = `https://www.zhihu.com/api/v4/answers/${idStr}?include=content,excerpt`;

        const res = await fetchJson(apiUrl) as Zhihu.Answer;
        return {
            id: res.id,
            type: res.type as ZhihuItemType,
            author: {
                id: res.author.id,
                name: res.author.name,
            },
            content: res.content!,
            excerpt: res.excerpt!,
            title: res.question.title,
        };
    }

    async function getAsArticle(): Promise<ZhihuItem> {
        // Construct API URL.
        const idStr = url.match(ZhuanlanArticleIdRegexp)![0];
        const apiUrl = `https://zhuanlan.zhihu.com/api/articles/${idStr}`;

        const res = await fetchJson(apiUrl) as Zhihu.Article;
        return {
            id: res.id,
            type: res.type as ZhihuItemType,
            author: {
                id: res.author.id,
                name: res.author.name,
            },
            content: res.content,
            // The article excerpt is HTML; reduce it to plain text.
            excerpt: new DOMParser().parseFromString(res.excerpt, 'text/html').body.innerText,
            title: res.title,
        };
    }
}
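/**
 * Generates the snapshot of a ZhihuItem.
 * @param {ZhihuItem} item
 */
function getSnapshot(item: ZhihuItem): HTMLDocument {
    // `HTMLDocument` has no public constructor (`new HTMLDocument()` throws
    // "Illegal constructor"), so build the document by parsing the content.
    // Assuming that response from Zhihu is safe.
    const doc = new DOMParser().parseFromString(item.content, 'text/html') as HTMLDocument;
    doc.title = item.title;
    // TODO: Inject other elements if necessary.
    return doc;
}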
/* High-level representations of entities on Zhihu, with only properties we are interested in. */

/**
 * A unified representation of items (Zhihu answer or Zhuanlan article) on Zhihu.
 */
interface ZhihuItem {
    /**
     * Item ID.
     */
    id: number;
    /**
     * Item type.
     */
    type: ZhihuItemType;
    author: ZhihuAuthor;
    /**
     * Content.
     * An HTML string.
     */
    content: string;
    /**
     * The first few sentences of the content. Can be used as abstract.
     * A **plain text** string.
     */
    excerpt: string;
    /**
     * The name.
     *
     * For an article, use its title.
     * For an answer, use question title.
     */
    title: string;
}

const enum ZhihuItemType {
    Answer = 'answer',
    Article = 'article',
    People = 'people',
    Question = 'question',
}

/**
 * A user on Zhihu.
 */
interface ZhihuAuthor {
    /**
     * User ID.
     */
    id: string;
    /**
     * Item type.
     */
    // type: ZhihuItemType.People;
    /**
     * User name.
     */
    name: string;
}
from translators_cn.
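Zhihu has probably put some restrictions on cross-origin requests.
I don't have a Zhihu translator on my side at the moment; the one currently in use is presumably maintained upstream by the official Zotero project.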
from translators_cn.
Is this feature feasible?
There are some quite useful technical posts on Zhihu that I would like to manage with Zotero.
from translators_cn.
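> Zhihu has probably put some restrictions on cross-origin requests.
I don't think this is Zhihu's problem.
CORS (now part of the Fetch standard) is strict in itself. A CORS request is an HTTP request (only the HTTP and HTTPS schemes are allowed), the standard specifies which headers are required and which are permitted, and a number of other constraints apply. On top of that, some browsers add restrictions of their own beyond the standard.
That screenshot clearly shows a local HTML file opened directly, so the scheme is file (a local resource, which leaves the request without a usable Origin), and that cannot work. Try setting up an HTTP server.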
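The point about the file scheme can be checked with a small sketch. `hasUsableOrigin` is a hypothetical helper name introduced only for illustration; it relies on the standard `URL` class and assumes, as browsers and Node.js typically behave, that a `file:` URL has an opaque origin serialized as the string "null".

```typescript
// Sketch: decide whether a page URL gives requests a usable origin for CORS.
// `hasUsableOrigin` is a hypothetical helper, not part of any library.
function hasUsableOrigin(pageUrl: string): boolean {
    const parsed = new URL(pageUrl);
    // Only HTTP and HTTPS pages qualify. A page opened from disk uses the
    // `file:` scheme, whose origin is opaque and typically serializes as "null".
    const isHttp = parsed.protocol === 'http:' || parsed.protocol === 'https:';
    return isHttp && parsed.origin !== 'null';
}

// A local file fails the check; the same file served over HTTP passes it.
console.log(hasUsableOrigin('file:///C:/test/index.html'));
console.log(hasUsableOrigin('http://localhost:8080/index.html'));
```

Serving the test page from any local HTTP server, instead of opening it from disk, moves it from the first case to the second.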
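> How to implement this feature
A while ago, 牛岱 released Zhihu On VSCode, which implements browsing Zhihu. So fetching Zhihu pages is certainly feasible.
You could follow his approach and access Zhihu's API.
Others online have also looked into this.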
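As a rough illustration of the API approach, the sketch below maps a page URL to an API endpoint. The two endpoint formats are copied from the prototype earlier in this thread; `toApiUrl` is a hypothetical helper name, and whether these endpoints remain available is an assumption. It uses capture groups rather than lookbehind assertions, so it does not depend on ES2018 RegExp support.

```typescript
// Sketch: map a Zhihu page URL to the API endpoint used by the prototype.
// `toApiUrl` is a hypothetical helper; the endpoint formats come from the
// prototype in this thread and are not verified against current Zhihu services.
function toApiUrl(pageUrl: string): string | null {
    // `pathname` excludes the query and fragment, so no manual stripping is needed.
    const { hostname, pathname } = new URL(pageUrl);

    if (hostname === 'zhuanlan.zhihu.com') {
        // Zhuanlan article, e.g. https://zhuanlan.zhihu.com/p/59589298
        const article = /^\/p\/(\d+)/.exec(pathname);
        return article ? `https://zhuanlan.zhihu.com/api/articles/${article[1]}` : null;
    }

    if (hostname === 'www.zhihu.com') {
        // Answer, e.g. https://www.zhihu.com/question/24952084/answer/984072342
        const answer = /\/answer\/(\d+)/.exec(pathname);
        return answer
            ? `https://www.zhihu.com/api/v4/answers/${answer[1]}?include=content,excerpt`
            : null;
    }

    // Anything else is not supported by this sketch.
    return null;
}
```

Returning `null` for unsupported URLs lets the caller decide whether to throw, as the prototype does, or to fall back to another strategy.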
from translators_cn.
@Lemmingh Thanks for providing such detailed material. I'll take a look when I have time.
from translators_cn.
@DansYU What is your requirement here? Saving answers for offline use?
from translators_cn.
What I need is to be able to save Zhihu column (Zhuanlan) pages and answers. I would be very grateful if this could be done!
from translators_cn.
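> @DansYU What is your requirement here? Saving answers for offline use?
Wow, this is still under active development. The Zotero open-source ecosystem is really nice, and the owner is impressive. It would be great to be able to do this myself someday. The links at the end of the project's README.MD are very helpful.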
from translators_cn.
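The <head> of a Zhuanlan article carries Open Graph metadata. The div.ContentItem.AnswerItem of an answer contains Microdata metadata. These can be used as well, and should be more convenient than going through the API.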
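A minimal sketch of reading Open Graph metadata follows. `readOpenGraph`, `MetaSource`, and `MetaElement` are hypothetical names introduced here; the structural types exist only so the sketch compiles without DOM typings, and a browser `Document` should satisfy them. Which `og:` properties Zhihu actually emits is not verified here.

```typescript
// Minimal structural types so the sketch compiles without DOM typings.
// Both names are hypothetical; a browser `Document` matches `MetaSource`.
interface MetaElement {
    getAttribute(name: string): string | null;
}
interface MetaSource {
    querySelectorAll(selector: string): ArrayLike<MetaElement>;
}

// Collects every <meta property="og:..." content="..."> pair into a plain
// object keyed by the property name without its "og:" prefix.
function readOpenGraph(doc: MetaSource): Record<string, string> {
    const result: Record<string, string> = {};
    const metas = doc.querySelectorAll('meta[property^="og:"]');
    for (let i = 0; i < metas.length; i++) {
        const property = metas[i].getAttribute('property');
        const content = metas[i].getAttribute('content');
        // Skip elements that lack either attribute.
        if (property !== null && content !== null) {
            result[property.slice('og:'.length)] = content;
        }
    }
    return result;
}
```

In a translator, the result could feed fields such as the item title, for instance by reading `readOpenGraph(doc).title` when that property is present.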
from translators_cn.
@Lemmingh I don't quite understand: which translator controls the item passed to this snapshot?
from translators_cn.
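Do you mean this:
function getSnapshot(item: ZhihuItem): HTMLDocument
That is a function I made up myself. The data format of item is ZhihuItem, which is defined in the block of type declarations that follows it.
The whole prototype is written in TypeScript for convenience.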
from translators_cn.
@Lemmingh I see. But Zotero doesn't seem to support custom item types at the moment. I'm not yet sure which item type Zhihu content should use. Or it could simply use the webpage type.
from translators_cn.
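Have you perhaps not used TypeScript before? 😅
TypeScript can be thought of as JavaScript with static type checking; it has to be compiled to JavaScript before it can run.
For Zhihu, the Zotero item type can simply be document:
let newZoteroItem = new Zotero.Item('document');
The type of snapshot is HTMLDocument. The Zotero documentation does not state this explicitly, but it can be observed. The types of other objects can be inferred in the same way.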
from translators_cn.
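> What I need is to be able to save Zhihu column (Zhuanlan) pages and answers. I would be very grateful if this could be done!
I think you could use other tools to save the page and let Zotero handle only the metadata extraction, then link the attachment to the Zotero item at the end.
Some Chrome extensions can help: 简悦 (SimpRead) has Zhihu support and can save offline Markdown, PDF, HTML, and so on;
SingleFile can save a faithful offline HTML copy.
That is how I have been managing Bilibili videos lately: download the video manually, then link it to the item.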
from translators_cn.
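@Lemmingh I followed the API links you gave earlier and tested them: content from Zhihu can be saved into a note, and images, links, and the formatting all display well.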
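The code behind this test was not posted, so the following is only a guess at one way to do it. `NoteSource`, `escapeHtml`, and `buildNoteHtml` are hypothetical names; the sketch assumes the API returns the body as an HTML string, as in the prototype earlier in this thread, and trusts that HTML as-is.

```typescript
// Hypothetical input shape: the fields the prototype already extracts.
interface NoteSource {
    title: string;
    authorName: string;
    /** HTML string returned by the API; assumed to be safe. */
    content: string;
}

// Escapes plain text so it can be embedded in HTML.
function escapeHtml(text: string): string {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;');
}

// Builds note HTML: an escaped heading and author line, then the body as-is.
function buildNoteHtml(source: NoteSource): string {
    return `<h1>${escapeHtml(source.title)}</h1>`
        + `<p>${escapeHtml(source.authorName)}</p>`
        + source.content;
}

// In a translator, the result could be attached with something like:
//     newZoteroItem.notes.push({ note: buildNoteHtml(source) });
```

Keeping the body unescaped preserves images and links, while escaping the title and author name prevents stray markup in those fields from breaking the note.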
from translators_cn.