biotorrents / gazelle

BioTorrents.de’s version of Gazelle

Home Page: https://torrents.bio

License: ISC License

PHP 76.82% JavaScript 2.70% SCSS 2.68% Twig 17.78% Shell 0.02%

biology biological-data biological-sequences bittorrent bittorrent-tracker website api

gazelle's Issues

Test all user account emails

Move all social features to a Discourse backend

Replace huge swaths of garbage homebrew code for nonessential features with a Discourse API backend running in a Docker container. There are several steps to this migration:

set up the Discourse Connect SSO with automatic forum login
finish mocking up the forums, wiki, torrent comments, news/blog, user profile, and private message interfaces
support the full CRUD operations of the relevant Discourse features
lock all social stuff behind an authentication challenge ("you must be logged in to view the forums")
migrate the existing data to Discourse
proxy the Discourse API through the BioTorrents.de one
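
For the first step, a minimal sketch of the provider side of the DiscourseConnect handshake, written in Python for brevity rather than the project's PHP. Discourse sends a Base64 `sso` payload and a hex HMAC-SHA256 `sig`; the site checks the signature, echoes the nonce back with the user's details, and signs the reply with the same shared secret. The function names and the shape of the `user` dict are hypothetical.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode


def sign(payload_b64: str, secret: str) -> str:
    """Hex HMAC-SHA256 of the Base64 payload, keyed with the shared secret."""
    return hmac.new(secret.encode(), payload_b64.encode(), hashlib.sha256).hexdigest()


def build_sso_redirect(sso: str, sig: str, secret: str, user: dict) -> str:
    """Validate the incoming request and return the URL to redirect the user to."""
    if not hmac.compare_digest(sign(sso, secret), sig):
        raise ValueError("bad DiscourseConnect signature")

    # The decoded payload is a query string carrying the nonce and return URL.
    request = parse_qs(base64.b64decode(sso).decode())
    nonce = request["nonce"][0]
    return_url = request["return_sso_url"][0]

    # `user` is a hypothetical dict describing the logged-in site account.
    response = urlencode({
        "nonce": nonce,
        "external_id": user["id"],
        "email": user["email"],
        "username": user["username"],
    })
    payload = base64.b64encode(response.encode()).decode()
    query = urlencode({"sso": payload, "sig": sign(payload, secret)})
    return f"{return_url}?{query}"
```

The constant-time comparison via `hmac.compare_digest` avoids leaking signature bytes through timing.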

This will free up a lot of time to focus on the actual torrent features. There are some longstanding forum bugs, as well as huge SQLi potential, that I don't want to fix (e.g., locking and moving threads has never worked right).

This should be reasonably feature complete compared to what currently exists, except with a better frontend and overall cleaner backend logic.

Bearer token scopes

Can be pretty simple, sliced by section or HTTP method. Whatever works and is easiest.
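
One way this could look, sketched in Python rather than the project's PHP: scopes sliced by both section and HTTP method. The `section:action` scope format and the `*` wildcard are assumptions for illustration, not an existing convention in the codebase.

```python
READ_METHODS = {"GET", "HEAD", "OPTIONS"}


def scope_allows(scopes: set[str], section: str, method: str) -> bool:
    """Check a token's scopes against the section and HTTP method of a request.

    Scopes are hypothetical strings such as "torrents:read" or "requests:write";
    "*:read" grants read access to every section.
    """
    action = "read" if method.upper() in READ_METHODS else "write"
    return f"{section}:{action}" in scopes or f"*:{action}" in scopes
```

In this sketch a write scope does not imply read access; that is a design choice that could go either way.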

Fix RSS

Support CRUD with the API

The API and web interface need equal functionality. Uploading torrents, etc.

Buy and test a U2F hardware key

I recently received a YubiKey 5C NFC essentially for free, so I can now develop and test the FIDO2 authentication feature. It will be implemented using the native browser WebAuthn specification, of course, and not rely on the couple of deprecated libraries that OT Gazelle used.

Add OpenAI output to requests

It's not readily apparent whether a collection is locked or unlocked

Long term: upgrade to anniemaybytes/chihaya

https://github.com/anniemaybytes/chihaya

Redirect to requested page on login

Rewrite author pages and add SemanticScholar API data

While I'm at it, a good thing to do is use the SemanticScholar API to make Sci-Hub links not suck (right now they're just raw HTML links without any context).

Remove the Sphinx dependency entirely

High priority because the request pages and probably the torrent search API are both broken.

All integer primary keys should be a bigint and most database tables should have a UUID v7 unique key

Branching off the work in creatorObjects to position the database for scale. I've been meaning to implement some kind of basic sharding and replication since the beginning, which relies on not having key collisions. UUID v7 stored as binary(16) as a unique key, while maintaining the standard auto-increment id bigint columns, seems to be the way to go.

The database class is already set up to transparently handle UUID binary to string conversion so, e.g., select uuid, name from creators order by created desc limit 10 will return UUIDs in the form of 01877b4a-b27c-70db-9522-149e9a40ef59.
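
To make the binary(16) layout concrete, here is a small Python sketch (not the project's PHP, which uses ramsey/uuid per the links below) that builds a UUID v7 by hand and converts it to the string form. The function names are hypothetical; the bit layout is the standard one: a 48-bit millisecond timestamp, the version nibble, the variant bits, and random filler.

```python
import os
import time
import uuid
from typing import Optional


def uuid7_bytes(unix_ms: Optional[int] = None) -> bytes:
    """Return the 16 raw bytes of a UUID v7, suitable for a binary(16) column."""
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    raw = bytearray(unix_ms.to_bytes(6, "big") + os.urandom(10))
    raw[6] = (raw[6] & 0x0F) | 0x70  # version 7 in the high nibble of byte 6
    raw[8] = (raw[8] & 0x3F) | 0x80  # variant bits "10" in the top of byte 8
    return bytes(raw)


def to_string(binary: bytes) -> str:
    """Convert the stored binary value to the familiar hyphenated form."""
    return str(uuid.UUID(bytes=binary))
```

Because the timestamp occupies the leading bytes, values generated in later milliseconds sort after earlier ones, which keeps index inserts roughly append-only.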

UUID documentation:
https://uuid.ramsey.dev/en/stable/rfc4122/version7.html
https://uuid.ramsey.dev/en/stable/database.html

Sharding documentation:
https://aws.amazon.com/what-is/database-sharding/
https://www.linode.com/docs/guides/sharded-database/

Misc documentation:
https://emmer.dev/blog/why-you-should-use-uuids-for-your-primary-keys/
https://itnext.io/laravel-the-mysterious-ordered-uuid-29e7500b4f8
https://stackoverflow.com/questions/52414414/best-practices-on-primary-key-auto-increment-and-uuid-in-sql-databases
https://tomharrisonjr.com/uuid-or-guid-as-primary-keys-be-careful-7b2aa3dcb439
https://vladmihalcea.com/uuid-database-primary-key/
https://www.mysqltutorial.org/mysql-uuid/
https://www.percona.com/blog/store-uuid-optimized-way/

Migrate to a non-loser database connection

The database class sucks and it's too verbose with not enough prepared queries.

Longer term: upgrade Sphinx to Manticore

The Sphinx backend is reaching EoL. Manticore is a fork of Sphinx, maybe not a drop-in replacement, but something mostly compatible with existing APIs.

https://manual.manticoresearch.com/Introduction

Links aren't truncated to the path when linking on-site resources in Markdown mode

Thanks eva!

Rethink bonus points and user classes to act more like a hostile bank account

The way that user classes and bonus points currently work: you rank up by having upload activity (or buying upload with BP) and you get a large amount of BP for your seed size. This should be reversed, where user ranks depend on a minimum average seed size and BP are negatively compounded. I know "hostile bank account" is a tautology.

Add OpenAI output to collections

Long term: upgrade to OPSnet/bencode-torrent

https://github.com/OPSnet/bencode-torrent

Soft deletes for torrents

Would be pretty useful: DMCA request comes in, we soft delete it, turns out the request is abusive, nothing is lost.
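
A minimal sketch of the idea, using Python and an in-memory SQLite database instead of the project's PHP and MySQL. The `torrents` table and `deleted_at` column are hypothetical names; the point is that a delete only stamps the row, and restoring clears the stamp.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "create table torrents (id integer primary key, name text, deleted_at text)"
)
db.execute("insert into torrents (name) values ('genome.fa.gz')")


def soft_delete(db: sqlite3.Connection, torrent_id: int) -> None:
    """Hide the torrent without losing the row (e.g., on a DMCA request)."""
    db.execute(
        "update torrents set deleted_at = datetime('now') "
        "where id = ? and deleted_at is null",
        (torrent_id,),
    )


def restore(db: sqlite3.Connection, torrent_id: int) -> None:
    """Undo a soft delete, e.g., when the request turns out to be abusive."""
    db.execute("update torrents set deleted_at = null where id = ?", (torrent_id,))


def visible(db: sqlite3.Connection) -> list:
    """Names of torrents that normal queries should still show."""
    rows = db.execute(
        "select name from torrents where deleted_at is null order by id"
    )
    return [row[0] for row in rows]
```

Every normal read then needs the `deleted_at is null` filter, which is the main cost of this approach.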

Adding multiple authors to new requests doesn't work

Everywhere authors can be added to a form, the input should just be a textarea with entries separated by newlines.
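
Parsing such a textarea is short. The sketch below is Python rather than the project's PHP, and the function name is hypothetical; trimming whitespace and dropping case-insensitive duplicates are assumptions about the desired behavior.

```python
def parse_authors(textarea: str) -> list[str]:
    """Split a newline-separated textarea into a clean list of author names."""
    seen = set()
    authors = []
    for line in textarea.splitlines():
        name = " ".join(line.split())  # trim and collapse inner whitespace
        if name and name.casefold() not in seen:
            seen.add(name.casefold())
            authors.append(name)
    return authors
```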

Turn authors (creators) into first-class objects

Currently, the artist tables in the database are all linking tables. Torrent creators / study authors / artists / etc. should be their own object in the logical schema similar to a torrent group, that can be independently indexed and searched.

Move everything over to clean routes

I'm done with /foo.php?bar=baz in the web interface and API. Flight support is coming along. The best part is, it breaks all the leetcode and enforces strict standards.

RSS feeds are broken, need to implement routes

Refactor the JavaScript: IIFEs and event listeners

Currently, a lot of the site JavaScript uses raw functions dumped into a file. This causes problems with Google Closure Compiler, which aggressively rewrites function names in its advanced optimization mode, so I can't use anything more than simple optimizations. The solution is to encapsulate all JavaScript in self-executing arrow functions, (() => { /* ... */ })();, whose contents attach event listeners for things like clicking a widget.

Update the install guide

plz be my ai gf

https://github.com/biotorrents/gazelle/blob/openai/app/OpenAI.php

OpenAI API integration for tl;dr torrent group summaries and keywords. Need to get as much production database coverage as possible before my free trial credits expire in April 2023 or so. This is largely done and will be merged into the authentication branch soon.

Pardon the delay! It turns out that rewriting the whole authentication, template, database, and a lot of other stuff became essentially a full application rewrite. Once everything is tested, I'll just merge it, even if it means the forums and wiki might go away for a while.

Site header universal search

Login is broken on dev

Can't log into the dev instance (whoops). Good time to just rewrite the crazy system to use a secure library with sensible paths.

Full API CRUD support

Currently, the API only supports GET requests. It should use controllers for all the major objects of the site with simple methods like create(), read(), update(), and delete().
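
A rough sketch of that controller shape, in Python rather than the project's PHP. The class name, the in-memory dict standing in for the database, and the HTTP method mapping are all illustrative assumptions.

```python
class Controller:
    """One controller per major object; a dict stands in for real storage."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1

    def create(self, data: dict) -> dict:
        row = {"id": self.next_id, **data}
        self.rows[row["id"]] = row
        self.next_id += 1
        return row

    def read(self, row_id: int):
        return self.rows.get(row_id)

    def update(self, row_id: int, data: dict):
        if row_id not in self.rows:
            return None
        self.rows[row_id].update(data)
        return self.rows[row_id]

    def delete(self, row_id: int) -> bool:
        return self.rows.pop(row_id, None) is not None


# Hypothetical mapping from HTTP verbs to controller methods.
METHODS = {
    "POST": "create",
    "GET": "read",
    "PUT": "update",
    "PATCH": "update",
    "DELETE": "delete",
}


def dispatch(controller: Controller, http_method: str, *args):
    """Route a request to the matching CRUD method."""
    return getattr(controller, METHODS[http_method])(*args)
```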

Full stack upload form rewrite

Make sure the essential staff tools work

The format and archive autofill is still broken on the upload form

Scheduled bonus point purchases

Encourage spending by letting people set and forget some bonus points purchases.

Update collections and requests to use the new Manticore search backend

The data are already indexed every minute. These two areas would let me finally remove the Sphinx dependency and the old query language classes.

Resetting the advanced search parameters doesn't work. Also, searching for a specific leech status finds the results but doesn't display them

Add top10 tags data to the torrent stats page

Fix better.php

Implement database replication in the Gazelle codebase

The new database class should transparently pull data from a replica if the methods single, row, column, or multi are called, and write data to the source if do is called. Both scenarios should support an array of database instances, but realistically, there's only one of each and that's way overkill.
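
The routing itself can be sketched as follows, in Python with SQLite connections standing in for the real source and replica (the project's class is PHP). The method names come from the description above; the constructor signature and random instance selection are assumptions.

```python
import random
import sqlite3


class Database:
    """Send reads to a replica and writes to the source."""

    def __init__(self, sources: list, replicas: list):
        self.sources = sources
        self.replicas = replicas or sources  # fall back if no replica exists

    def _read(self, sql: str, params=()):
        return random.choice(self.replicas).execute(sql, params)

    def do(self, sql: str, params=()) -> int:
        """Write to the source and return the affected row count."""
        conn = random.choice(self.sources)
        with conn:  # commits on success, rolls back on error
            return conn.execute(sql, params).rowcount

    def single(self, sql: str, params=()):
        """First column of the first row, or None."""
        row = self._read(sql, params).fetchone()
        return row[0] if row else None

    def row(self, sql: str, params=()):
        """The first row, or None."""
        return self._read(sql, params).fetchone()

    def column(self, sql: str, params=()) -> list:
        """The first column of every row."""
        return [r[0] for r in self._read(sql, params).fetchall()]

    def multi(self, sql: str, params=()) -> list:
        """Every row."""
        return self._read(sql, params).fetchall()
```

One thing this sketch ignores is replication lag: a read issued right after `do` may not see the write yet on a real replica.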

Use a PhpRedis cache backend

Simple PhpRedis class that's not so crazy.

https://github.com/biotorrents/gazelle/blob/development/app/CacheRedis.php

Get rid of PHP 8 warnings, e.g., function calls on array key null

The dev site has been on PHP 8 with all reporting enabled. There are warnings everywhere. This mostly involves adding a lot of null coalescing to make sure that obscure type errors won't occur in later PHP versions.

Grouped freeleech search breaks

Example:
https://biotorrents.de/torrents.php?freetorrent=1&order_by=time&order_way=desc&group_results=1&action=advanced&searchsubmit=1

Thanks eva!

Twitter announce bot

https://twitter.com/bio_ebooks
https://developer.twitter.com/en/docs/twitter-api/tools-and-libraries/v2#php
https://github.com/biotorrents/gazelle/blob/development/app/Announce.php

This has been planned since before Elon Musk bought Twitter. Should be pretty simple with some kind of library. All it needs to do is post a tweet on upload/request creation with the title, maybe 3 tags, and a link to the content.
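
The text formatting is the only part that doesn't depend on a Twitter library, so here is a small sketch of it in Python (the project's code is PHP). The function name, the dot-to-underscore hashtag conversion, and counting raw characters against a 280 limit are assumptions for illustration.

```python
def format_announcement(title: str, tags: list, url: str, limit: int = 280) -> str:
    """Build "title #tag1 #tag2 #tag3 url", truncating the title if needed."""
    # Use at most three tags; dots are not valid inside a hashtag.
    hashtags = " ".join("#" + tag.replace(".", "_") for tag in tags[:3])
    suffix = " ".join(part for part in (hashtags, url) if part)

    # Reserve room for the suffix plus the space that separates it from the title.
    room = limit - len(suffix) - 1
    if len(title) > room:
        title = title[: room - 1].rstrip() + "…"
    return f"{title} {suffix}"
```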

Rewrite the torrent search backend

The frontend is mostly done, screenshot attached. The backend has always been a mess. I don't like messy logic in the core of the app (the whole point is to efficiently index and serve data), so I'm gonna rewrite the Sphinx backend with something simpler and cleaner. I've had a look at the library source code, and it seems to be okay. With time, we can probably index collections, requests, and maybe even Top 10 history.

Namespace the damn app already

First class collision occurred with OpenAI. Will start with purely static classes (most of them) and work my way toward the other classes. PSR-4 support has been a thing in composer.json for a while, probably deleted because JSON doesn't support comments. No use statements because I like to know what's going on.

Use a cached PDO database wrapper

Simple cached class with universal prepared statements.
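
As an illustration of the pattern only (the linked class below is PHP and PDO, and may work differently), here is a Python sketch with SQLite: every query is parameterized, read results are cached under a hash of the SQL and parameters, and any write clears the cache. The class name and the crude clear-everything invalidation are assumptions.

```python
import hashlib
import json
import sqlite3


class CachedDatabase:
    """Parameterized queries with a naive in-process result cache."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        self.cache = {}
        self.hits = 0

    def _key(self, sql: str, params) -> str:
        # Same SQL with the same parameters maps to the same cache entry.
        return hashlib.sha256(json.dumps([sql, list(params)]).encode()).hexdigest()

    def multi(self, sql: str, params=()) -> list:
        """Return all rows, served from the cache when possible."""
        key = self._key(sql, params)
        if key in self.cache:
            self.hits += 1
            return self.cache[key]
        rows = self.conn.execute(sql, params).fetchall()
        self.cache[key] = rows
        return rows

    def do(self, sql: str, params=()) -> None:
        """Run a write, then drop all cached reads (crude but safe)."""
        with self.conn:
            self.conn.execute(sql, params)
        self.cache.clear()
```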

https://github.com/biotorrents/gazelle/blob/development/app/Database.php

Use models for at least some core objects

A set of basic models that extend a subset of Laravel Eloquent's features, e.g., find(), save(), delete() (soft), etc., would go a huge way toward cleaning up the code. Each major artifact such as a torrent, group, collection, request, etc., should be its own "thing" that can be loaded, displayed, and manipulated. Also has implications for API CRUD support: load the model, change some stuff, and save it.