amphi by lbrndnr

amphi's Issues

Authentication

Use phx.gen.auth to refactor authentication

Profile page

A page that shows the user, their liked and written posts/comments, their publications and maybe their collaborators.

PDF Reader

It should be super comfy to read a PDF, read the comments alongside and write new ones.

Load/Render PDF in a memory efficient way (release pages that are not being displayed, for example)
Display comment threads in some way
Make it possible to highlight text and comment that section
Display cited papers
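For the memory-efficient rendering item, here is a minimal sketch that assumes the viewer knows which pages are currently visible. It keeps only the visible pages plus a small buffer rendered and releases everything else. `PageWindow`, `render` and `release` are hypothetical names. With pdf.js, `release` could for example drop the page's canvas and call `cleanup()` on the corresponding `PDFPageProxy`, but that wiring is an assumption and is not shown here.

```typescript
// Hypothetical sketch: keep only the visible PDF pages (plus a buffer) rendered.
type PageNumber = number;

class PageWindow {
  private readonly rendered = new Set<PageNumber>();
  private readonly buffer: number;
  private readonly render: (page: PageNumber) => void;
  private readonly release: (page: PageNumber) => void;

  constructor(
    buffer: number, // extra pages kept above and below the viewport
    render: (page: PageNumber) => void,
    release: (page: PageNumber) => void, // e.g. free the canvas, call page.cleanup() in pdf.js
  ) {
    this.buffer = buffer;
    this.render = render;
    this.release = release;
  }

  /** Call whenever the visible range changes, e.g. on scroll. Pages are 1-based. */
  update(firstVisible: PageNumber, lastVisible: PageNumber, pageCount: number): void {
    const lo = Math.max(1, firstVisible - this.buffer);
    const hi = Math.min(pageCount, lastVisible + this.buffer);

    // Release pages that moved out of the window (iterate a copy: we mutate the set).
    for (const page of Array.from(this.rendered)) {
      if (page < lo || page > hi) {
        this.release(page);
        this.rendered.delete(page);
      }
    }
    // Render pages that entered the window and are not rendered yet.
    for (let page = lo; page <= hi; page++) {
      if (!this.rendered.has(page)) {
        this.render(page);
        this.rendered.add(page);
      }
    }
  }

  /** Currently rendered pages in ascending order. */
  renderedPages(): PageNumber[] {
    return Array.from(this.rendered).sort((a, b) => a - b);
  }
}
```

With a buffer of 1 in a 10-page document, scrolling from pages 1 and 2 to pages 5 and 6 releases pages 1, 2 and 3 and renders pages 4 to 7, so memory use stays bounded by the window size rather than the document length.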

Infrastructure

Once we have a basic web crawler and website running, we should deploy them to some service. Not sure which service suits our needs best. Maybe we're fine with just getting a database service first and running the website locally only. I've also heard of fly.io, which might be cool.

Web Crawler

The crawler should have the following features:

Fetch new articles and crawl through the arXiv database (later also other providers like PubMed)
Extract text, authors (name, email, affiliation), publication date, citations, keywords, ccs
Save all related DOIs in order to avoid duplicates in the db (providers use different DOIs for the "same" paper)
Save the entries in the database
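One way to handle the duplicate DOIs, sketched under the assumption that each crawled record comes with a list of all the DOIs it is known by: treat DOIs that appear on the same record as aliases and merge them with a union-find structure, so any later record sharing one of them is recognized as the same paper. `CrawledRecord`, `DoiIndex` and `normalizeDoi` are hypothetical names. The normalization only lowercases (DOI names are case-insensitive) and strips a `doi.org` or `doi:` prefix.

```typescript
// Hypothetical sketch: group crawled records whose DOI sets overlap.
interface CrawledRecord {
  title: string;
  dois: string[];
}

// DOI names are case-insensitive; also strip a resolver URL or "doi:" prefix.
function normalizeDoi(doi: string): string {
  return doi
    .trim()
    .replace(/^https?:\/\/(dx\.)?doi\.org\//i, "")
    .replace(/^doi:/i, "")
    .toLowerCase();
}

class DoiIndex {
  // Union-find: each DOI points to a parent; a root points to itself.
  private readonly parent = new Map<string, string>();

  private add(doi: string): void {
    if (!this.parent.has(doi)) this.parent.set(doi, doi);
  }

  // Precondition: `doi` has been added.
  private find(doi: string): string {
    let root = doi;
    while (this.parent.get(root) !== root) root = this.parent.get(root)!;
    // Path compression: point every node on the path directly at the root.
    let cur = doi;
    while (cur !== root) {
      const next = this.parent.get(cur)!;
      this.parent.set(cur, root);
      cur = next;
    }
    return root;
  }

  /** True if any DOI of the record is already known, i.e. it is a duplicate. */
  isKnown(rec: CrawledRecord): boolean {
    return rec.dois.some((d) => this.parent.has(normalizeDoi(d)));
  }

  /** Registers a record and merges all of its DOIs into one group. */
  addRecord(rec: CrawledRecord): void {
    const dois = rec.dois.map(normalizeDoi);
    dois.forEach((d) => this.add(d));
    for (let i = 1; i < dois.length; i++) {
      const a = this.find(dois[0]);
      const b = this.find(dois[i]);
      if (a !== b) this.parent.set(b, a);
    }
  }

  /** Canonical DOI of the group a DOI belongs to, or undefined if unseen. */
  canonical(doi: string): string | undefined {
    const d = normalizeDoi(doi);
    return this.parent.has(d) ? this.find(d) : undefined;
  }
}
```

The value returned by `canonical()` could serve as the paper's key in the db. Which DOI ends up as the root of a group is arbitrary in this sketch.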

I implemented a basic web crawler in JS so that we can use pdf.js, which seemed to make reading PDFs easier than, for example, pdfplumber. Getting a clean copy of the text content is quite difficult, but it might not be necessary.
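If a somewhat cleaner text turns out to be useful after all, a rough cleanup pass could look like the sketch below. It assumes items shaped like the `str`/`hasEOL` fields that recent pdf.js versions return from `page.getTextContent()`; that is worth double-checking against the pdf.js version the crawler uses. The interface is declared locally so the sketch runs without pdf.js, and `TextPiece` and `cleanText` are hypothetical names. It joins pieces into lines, undoes end-of-line hyphenation, collapses whitespace and treats blank lines as paragraph breaks.

```typescript
// Assumed to match the str/hasEOL fields of pdf.js text items; declared locally.
interface TextPiece {
  str: string;
  hasEOL: boolean;
}

function cleanText(pieces: TextPiece[]): string {
  // 1. Join pieces into physical lines using the end-of-line flags.
  const lines: string[] = [];
  let current = "";
  for (const piece of pieces) {
    current += piece.str;
    if (piece.hasEOL) {
      lines.push(current);
      current = "";
    }
  }
  if (current !== "") lines.push(current);

  // 2. Merge lines into paragraphs; blank lines separate paragraphs.
  const paragraphs: string[] = [];
  let para = "";
  for (const raw of lines) {
    const line = raw.replace(/\s+/g, " ").trim();
    if (line === "") {
      if (para !== "") paragraphs.push(para);
      para = "";
    } else if (para.endsWith("-") && /^[a-z]/.test(line)) {
      // Undo hyphenation: "implemen-" + "tation" -> "implementation".
      // Heuristic: it also merges real compounds that happen to break at a line end.
      para = para.slice(0, -1) + line;
    } else {
      para = para === "" ? line : para + " " + line;
    }
  }
  if (para !== "") paragraphs.push(para);
  return paragraphs.join("\n\n");
}
```

The hyphenation rule is deliberately naive: a compound such as "memory-" at the end of one line followed by "efficient" on the next becomes "memoryefficient", which may or may not matter depending on what the text is used for.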

Feed

There should be a feed with relevant/new papers that could be interesting to the user. No idea how to implement this, though. The order of posts in the feed should probably take the following data points into account:

publication date
number of likes
topic (keywords, ccs)
number of reads
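A minimal sketch of how these data points could be combined into a single score, loosely modeled on the time-decay ranking often attributed to Hacker News (engagement divided by a power of the age). `Paper`, `RankWeights`, `feedScore`, `rankFeed` and every weight value are hypothetical placeholders that would need tuning, and topic relevance is reduced to counting keywords the user follows.

```typescript
interface Paper {
  publishedAt: Date;
  likes: number;
  reads: number;
  keywords: string[];
}

interface RankWeights {
  readWeight: number; // a read counts for less than a like
  topicBoost: number; // bonus per keyword the user follows
  gravity: number; // how quickly older papers sink
}

// Placeholder values, not tuned on any data.
const defaultWeights: RankWeights = { readWeight: 0.1, topicBoost: 5, gravity: 1.8 };

const MS_PER_HOUR = 60 * 60 * 1000;

function feedScore(paper: Paper, userTopics: Set<string>, now: Date, w: RankWeights = defaultWeights): number {
  const ageHours = Math.max(0, (now.getTime() - paper.publishedAt.getTime()) / MS_PER_HOUR);
  // userTopics is expected to contain lowercase keywords.
  const topicMatches = paper.keywords.filter((k) => userTopics.has(k.toLowerCase())).length;
  const engagement = paper.likes + w.readWeight * paper.reads + w.topicBoost * topicMatches;
  // Time decay: the same engagement is worth less the older the paper gets.
  return engagement / Math.pow(ageHours + 2, w.gravity);
}

// Highest score first; does not mutate the input array.
function rankFeed(papers: Paper[], userTopics: Set<string>, now: Date): Paper[] {
  return papers.slice().sort((a, b) => feedScore(b, userTopics, now) - feedScore(a, userTopics, now));
}
```

With these placeholder weights, a two-hour-old paper with a few likes in a followed topic can outrank a month-old paper with far more likes and reads; raising `gravity` makes that effect stronger.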

This means that first we have to implement things like:

post likes
mechanism to see different feeds (e.g. all, new, new in AI)
user history to count the number of reads of a paper
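For the different feeds, a sketch assuming a feed is just a filter over papers, with ordering left to whatever ranking is chosen. `FeedKind`, `FeedItem`, `selectFeed` and the seven-day window for "new" are hypothetical; "new in AI" becomes `{ kind: "newInTopic", topic: "ai" }`.

```typescript
type FeedKind =
  | { kind: "all" }
  | { kind: "new" }
  | { kind: "newInTopic"; topic: string };

interface FeedItem {
  id: string;
  publishedAt: Date;
  keywords: string[];
}

const NEW_WINDOW_DAYS = 7; // placeholder definition of "new"
const MS_PER_DAY = 24 * 60 * 60 * 1000;

function isNew(item: FeedItem, now: Date): boolean {
  return now.getTime() - item.publishedAt.getTime() <= NEW_WINDOW_DAYS * MS_PER_DAY;
}

// Selects the papers belonging to a feed; keeps the input order.
function selectFeed(items: FeedItem[], feed: FeedKind, now: Date): FeedItem[] {
  switch (feed.kind) {
    case "all":
      return items.slice();
    case "new":
      return items.filter((item) => isNew(item, now));
    case "newInTopic": {
      const topic = feed.topic.toLowerCase();
      return items.filter(
        (item) => isNew(item, now) && item.keywords.some((k) => k.toLowerCase() === topic),
      );
    }
    default: {
      // Compile-time check that every FeedKind variant is handled.
      const unhandled: never = feed;
      return unhandled;
    }
  }
}
```

Modeling feeds as a tagged union keeps adding a new feed type (e.g. "followed authors") to one new variant plus one `case`, and the `never` check makes the compiler flag any variant that was forgotten.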

Database Setup

Eventually, it should be possible to do a fuzzy text search on the contents of all the papers in the db. As far as I understand, this is the perfect use case for NoSQL. However, NoSQL databases are relatively weak at relating data to one another, so loading all the publications of one specific author might be slow.
I'm not sure how to tackle this. Maybe it's possible to use Postgres for data like comments/users and MongoDB for the papers?
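Relevant to the Postgres question: Postgres provides the `pg_trgm` extension for trigram-based similarity matching. The sketch below illustrates the idea: split text into padded three-character chunks and score two strings by the share of chunks they have in common. It approximates pg_trgm's approach rather than reproducing it exactly (word splitting here is ASCII-only), `trigrams`, `similarity` and `fuzzySearch` are hypothetical names, and the 0.3 cutoff mirrors pg_trgm's default similarity threshold.

```typescript
// Trigram set of a string, padding each word like pg_trgm does:
// two spaces before and one after, so "cat" -> "  c", " ca", "cat", "at ".
function trigrams(text: string): Set<string> {
  const result = new Set<string>();
  const words = text.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
  for (const word of words) {
    const padded = "  " + word + " ";
    for (let i = 0; i + 3 <= padded.length; i++) {
      result.add(padded.slice(i, i + 3));
    }
  }
  return result;
}

// Shared trigrams divided by all distinct trigrams of both strings (0..1).
function similarity(a: string, b: string): number {
  const ta = trigrams(a);
  const tb = trigrams(b);
  if (ta.size === 0 && tb.size === 0) return 0;
  let shared = 0;
  ta.forEach((t) => {
    if (tb.has(t)) shared++;
  });
  return shared / (ta.size + tb.size - shared);
}

// Documents scoring at least `threshold`, best match first.
function fuzzySearch(query: string, docs: string[], threshold = 0.3): string[] {
  return docs
    .map((doc) => ({ doc, score: similarity(query, doc) }))
    .filter((r) => r.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .map((r) => r.doc);
}
```

For example, `similarity("transformer", "transfromer")` returns 0.5 (8 shared out of 16 distinct trigrams), so the typo still matches. Scanning every paper like this does not scale, which is why pg_trgm pairs the same idea with GIN/GiST indexes.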
