fanzub

This repository contains the source code for Fanzub.com, a Usenet search engine for Japanese media. I don't really download much these days (partially due to watching more and more anime on DVD instead of as fansubs) and have kind of lost interest in the site.

For this reason I've decided to make the source code of the site available through GitHub. Feel free to fork it and start your own Fanzub-like site. Don't expect me to respond to pull requests though.

To get this code working you need some experience as a (PHP) web developer and Linux server admin, as there is no manual beyond this readme. I'm willing to answer questions about the code, but I won't hold your hand all the way. The source code includes my own crappy attempt at a PHP framework as well as lots of regular expression magic, so beware.

Requirements

  • PHP 5+
  • MySQL 5+
  • Memcached
  • Sphinx Search
  • Adequate versions of the above software are included with Ubuntu 12.04 LTS; newer versions will probably work too (no guarantees)
  • The database dump is approximately 10 GB and has a table with 10 million rows. You probably don't want to host the site on shared hosting. Use at least a VPS with enough space for both the SQL dump and the restored database.

Configuration

  • Apache
    The Apache configuration (lib/fanzub-apache2.conf) is pretty straightforward except for several "alias" commands that allow URLs like fanzub.com/help without a trailing .php extension. Adapting the configuration for nginx or another web server shouldn't be difficult.

  • Sphinx Search
    Any search operation on Fanzub requires Sphinx Search to function. You can find the config file in lib/sphinx-config.conf. Be sure to set the database password there too and fix the paths if necessary. Don't forget to configure cron jobs to update the Sphinx Search index frequently: I used the Ubuntu defaults (/etc/cron.d/sphinxsearch) of a daily index rebuild and updates every 5 minutes.

  • Usenet servers
    If you want to index new posts you'll need one or more Usenet providers to download headers from. I used four: Hitnews, Astraweb, Giganews and Newshosting. You can certainly use fewer (even just one) if you want. Out of the four, Astraweb might be a good choice as they offer block accounts, which means you don't need to pay a monthly subscription (and headers typically don't count towards your download limit). The configuration file for the servers is lib/usenet.ini.php

  • File permissions
    The following folders (and contents) need to be writeable by the same user as the webserver (on Debian/Ubuntu this is the "www-data" user):
    data/
    www/logs/

  • PHP exceptions
    Exceptions are stored in a small SQLite database in www.fanzub.com/data/journal.db. Make sure this file exists and is writeable by the webserver. You can create the database using the journal.sql file:
    sqlite3 journal.db < ../sql/journal.sql

  • Cron jobs
    If you want to index new posts you need to run one script (scripts/headers) often. Make sure this file is executable. See scripts/fanzub.cron for an example. You may need to change it to reflect the number of servers you use and to prevent high load on your server.

  • Database
    The database schema can be found in sql/fanzub-schema.sql. The database data dump can be downloaded from: https://fanzub.com/dump/. The data dump contains all tables except the "downloads" table, which is related to a feature I never finished and contains hashed IP addresses (hence why I'm not including it). To prevent errors you might first want to restore the fanzub-schema.sql file before restoring the data dump so that your database will include an empty "downloads" table.

    As mentioned before, please note that the database is huge. You'll need at least 30 to 50 GB of free space to restore it (which will take a while).
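
The restore order described above (schema first, then the data dump) can be sketched as a pair of shell functions. This is only a rough sketch, not the author's procedure: the function names, the database name, the MySQL user and the dump file name are all hypothetical placeholders, and it assumes the dump has already been downloaded as a single uncompressed SQL file. Only the path sql/fanzub-schema.sql comes from this readme.

```shell
# Sketch only: has_free_gb and restore_fanzub are hypothetical helpers,
# not part of the Fanzub repository.

# Succeeds if the filesystem holding directory $1 has at least $2 GB free.
has_free_gb() {
    # df -P prints one line per filesystem; column 4 is available space in KB.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Restore the schema first (this creates the empty "downloads" table),
# then load the data dump. Run from the repository root.
# Arguments: database name, MySQL user, path to the downloaded dump.
restore_fanzub() {
    db_name=$1
    db_user=$2
    dump_file=$3
    # -p makes the mysql client prompt for the password each time.
    mysql -u "$db_user" -p "$db_name" < sql/fanzub-schema.sql &&
    mysql -u "$db_user" -p "$db_name" < "$dump_file"
}
```

For example, assuming MySQL stores its data under /var/lib/mysql and the placeholder names below match your setup, you could run `has_free_gb /var/lib/mysql 50 && restore_fanzub fanzub fanzub_user fanzub-dump.sql`, so the restore only starts when enough space is available.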
