Qian Cao's Projects
A repo for the AI course at RUC in the fall semester of 2020.
Aman-4-Real/Aman-4-Real is a ✨special✨ repository that I can use to add a README.md to my GitHub profile.
A beautiful, simple, clean, and responsive Jekyll theme for academics
GitHub Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
http://nlp.seas.harvard.edu/2018/04/03/attention.html
An arXiv paper search engine based on Elasticsearch and FastAPI.
A toolkit for daily arXiv paper reading. The script crawls arXiv papers in custom areas every day and displays their key information.
A list of papers, datasets, and code for multimodal dialogue.
An implementation of Boolean search based on MapReduce.
This repo contains a pipeline to download, clean and process CommonCrawl data into a corpus.
One-click deployment of your own ChatGPT web UI.
A database of 100,000 Chinese song lyrics.
Useful code templates, utils, and some toolkits.
An introductory tutorial for ElasticSearch.
GPT2 for Chinese chitchat: a GPT2 model for casual Chinese conversation (implements the MMI approach from DialoGPT).
Easily turn large sets of image URLs into an image dataset. Can download, resize, and package 100M URLs in 20 hours on one machine.
A repo for the Math_Foundation_of_AI course at RUC in the fall semester of 2020.
[ACM MM 2022]: Multi-Modal Experience Inspired AI Creation
Open domain Chinese dialogue corpus and datasets.
PL0 Compiler: a PL/0 compiler for a compiler-principles course, implemented in C with flex & bison.
[ACM MM 2024] See or Guess: Counterfactually Regularized Image Captioning
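Converts the Chinese Wikipedia corpus from Traditional to Simplified Chinese, extracts the text content, and organizes it into JSON files.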
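The Boolean search entry above pairs an inverted index with MapReduce. As a rough, single-process sketch (not that repository's code; the function names, the whitespace tokenizer, and the in-memory "shuffle" are assumptions for illustration), the map phase emits (term, doc_id) pairs, a shuffle groups them by term, the reduce phase builds sorted postings lists, and Boolean AND/OR/NOT queries become set operations over those lists:

```python
from collections import defaultdict
from itertools import chain


def map_phase(doc_id, text):
    # Hypothetical mapper: emit one (term, doc_id) pair per distinct lowercase token
    for term in set(text.lower().split()):
        yield term, doc_id


def reduce_phase(term, doc_ids):
    # Hypothetical reducer: collapse all doc_ids for a term into a sorted postings list
    return term, sorted(set(doc_ids))


def build_index(docs):
    # In-memory stand-in for the shuffle step: group mapper output by term
    grouped = defaultdict(list)
    pairs = chain.from_iterable(map_phase(d, t) for d, t in docs.items())
    for term, doc_id in pairs:
        grouped[term].append(doc_id)
    return dict(reduce_phase(term, ids) for term, ids in grouped.items())


def query_and(index, *terms):
    # AND: documents containing every term (intersection of postings lists)
    sets = [set(index.get(t.lower(), [])) for t in terms]
    return sorted(set.intersection(*sets)) if sets else []


def query_or(index, *terms):
    # OR: documents containing at least one term (union of postings lists)
    return sorted(set().union(*(index.get(t.lower(), []) for t in terms)))


def query_not(index, all_ids, term):
    # NOT: every known document except those containing the term
    return sorted(set(all_ids) - set(index.get(term.lower(), [])))


if __name__ == "__main__":
    docs = {1: "map reduce boolean search", 2: "boolean logic", 3: "map of the world"}
    index = build_index(docs)
    print(query_and(index, "map", "boolean"))
    print(query_or(index, "reduce", "logic"))
    print(query_not(index, docs.keys(), "map"))
```

In a real MapReduce job the mapper and reducer would run on separate workers and the framework would perform the grouping; keeping postings lists sorted is what lets a distributed version merge them efficiently instead of materializing full sets.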