artexin's People

Contributors

n0phx

Forkers

foxbunny mp52

artexin's Issues

Simplify auth

The whole site will be accessed via SSH tunnel, so no need for convoluted auth.

Remove fixtures

Clean up fixtures from the repository (and don't use them in the tests).

Add support for image files

When an image URL is passed, generate an HTML page that contains just that image, along with whatever metadata is available to us.
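A rough sketch of what the generated page could look like. The function name `image_page`, the shape of the `metadata` dict, and the markup itself are assumptions made for illustration, not existing ArtExIn code.

```python
import html


def image_page(url, metadata=None):
    """Return a minimal HTML page that shows one image and its metadata.

    Hypothetical helper: `metadata` is assumed to be a flat dict of
    whatever key/value pairs were collected for the image.
    """
    metadata = metadata or {}
    # Fall back to the URL when no title is known.
    title = html.escape(str(metadata.get('title', url)))
    items = ''.join(
        '<dt>{}</dt><dd>{}</dd>'.format(html.escape(str(k)),
                                        html.escape(str(v)))
        for k, v in sorted(metadata.items()))
    return (
        '<!DOCTYPE html>\n'
        '<html><head><meta charset="utf-8"><title>{title}</title></head>\n'
        '<body><img src="{src}" alt="{title}">\n'
        '<dl>{items}</dl></body></html>\n'
    ).format(title=title, src=html.escape(url), items=items)
```

All values are escaped before they reach the markup, since both the URL and the metadata come from untrusted pages.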

Simplify page ID

Use either lowercase letters only (a-z) or a mix of lowercase and uppercase letters (a-zA-Z).

Add web UI for collecting pages

Add a basic UI that takes a list of URLs and starts collecting them into zip files.

The software should check for available space before starting the operation, and should consider each URL as taking up 3MB of space in total during processing. The software should warn the user when sufficient space is not available.

The figure of 3MB per URL comes from the average size of a web page today (1.5MB) plus the additional storage needed to create the zip file.

The maximum size per page should be configurable, and there should be a safety margin as well (i.e., collecting should always leave at least X megabytes of the total free space untouched).

To prevent race conditions, there should be a lock file which the web UI creates before attempting to reserve space for its operation. If the web UI finds an existing lock file, it should wait for its removal before attempting to calculate the available storage space.

There should be a file, called 'reservations', that contains the total space reserved by all processes (rounded up to the nearest integer). Each process should:

  • install the lock file first
  • read the file contents
  • calculate physically available space
  • take into account the space registered in the reservations
  • add to reservations the space it needs
  • overwrite the reservations file with updated data
  • release the lock

If the process cannot reserve enough space, it should release the lock immediately and inform the user.
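The steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the file name `lock`, the function name `reserve_space`, the 100MB default margin, and the polling timeout are all made up here, and the reservations file is assumed to hold a single integer number of megabytes.

```python
import math
import os
import shutil
import time

MB = 1024 * 1024


def reserve_space(workdir, url_count, mb_per_url=3, margin_mb=100,
                  poll=0.1, timeout=5.0):
    """Try to reserve space for url_count URLs; return True on success."""
    lock_path = os.path.join(workdir, 'lock')          # assumed name
    res_path = os.path.join(workdir, 'reservations')
    needed = math.ceil(url_count * mb_per_url)         # round up to whole MB
    deadline = time.monotonic() + timeout
    # Install the lock file. O_EXCL makes creation atomic, so only one
    # process can hold the lock; others wait until it is removed.
    while True:
        try:
            os.close(os.open(lock_path,
                             os.O_CREAT | os.O_EXCL | os.O_WRONLY))
            break
        except FileExistsError:
            if time.monotonic() > deadline:
                raise TimeoutError('lock file was not released in time')
            time.sleep(poll)
    try:
        # Read what other processes have already reserved (0 if no file).
        try:
            with open(res_path) as f:
                reserved = int(f.read().strip() or 0)
        except FileNotFoundError:
            reserved = 0
        # Physically free space, minus reservations and the safety margin.
        free_mb = shutil.disk_usage(workdir).free // MB
        if free_mb - reserved - margin_mb < needed:
            return False                   # caller should inform the user
        with open(res_path, 'w') as f:
            f.write(str(reserved + needed))
        return True
    finally:
        os.remove(lock_path)               # always release the lock
```

A process that finishes collecting would also need to subtract its share from the reservations file under the same lock; that part is left out of the sketch.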

Simple tool for testing content

Ideally, we should be able to run something like bundle.py http://whatever.com/or/the/other.html and get an unencrypted zipball. This tool would be used by site owners to test ArtExIn output. It should also generate enough debug data for meaningful bug reports to be filed. An .exe version should be provided for Windows users as well.

Failing test for `urlutils.split()`

======================================================================
FAIL: Doctest: artexin.urlutils.split
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.4/doctest.py", line 2193, in runTest
    raise self.failureException(self.format_failure(new.getvalue()))
AssertionError: Failed doctest test for artexin.urlutils.split
  File "/vagrant/artexin/urlutils.py", line 61, in split

----------------------------------------------------------------------
File "/vagrant/artexin/urlutils.py", line 78, in artexin.urlutils.split
Failed example:
    split('http://localhost?foo=bar')
Expected:
    ('http://localhost', '/?foo=bar')
Got:
    '/?foo=bar'


----------------------------------------------------------------------
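For reference, a minimal sketch of a `split()` that returns the tuple the doctest expects. It is written from scratch on top of `urllib.parse.urlsplit` and is not the actual code in `artexin/urlutils.py`; the behavior for anything beyond the failing example is an assumption.

```python
from urllib.parse import urlsplit


def split(url):
    """Split a URL into (base, path) parts.

    Sketch only: the base is scheme plus host, and the second element
    always starts with '/', even when the URL has no explicit path.
    """
    parts = urlsplit(url)
    base = '{}://{}'.format(parts.scheme, parts.netloc)
    rest = parts.path or '/'
    if parts.query:
        rest += '?' + parts.query
    if parts.fragment:
        rest += '#' + parts.fragment
    return base, rest
```

The failing output above suggests that the current implementation returns only the second element, so the fix may be as small as returning both values.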

Support for proper job queue

Instead of processing URLs in child processes, we need a proper job queue (like Celery).

ArtExIn should provide a UI for monitoring the queue and reporting on success or failure.

The user who creates a new task should receive an email notification on success, and it should be possible to send notifications to admins as well.

Passing processes as bytestring crashes app

Traceback (most recent call last):
  File "/usr/local/lib/python3.4/dist-packages/bottle.py", line 862, in _handle
    return route.call(**args)
  File "/usr/local/lib/python3.4/dist-packages/bottle.py", line 1729, in wrapper
    rv = callback(*a, **ka)
  File "/srv/code/artexin_webui/auth.py", line 272, in wrapped
    return f(*args, **kwargs)
  File "/srv/code/artexin_webui/app.py", line 80, in collections_process
    max_procs=request.app.config['artex.processes'])
  File "/srv/code/artexin_webui/schema.py", line 141, in process_urls
    results = batch(urls, **kwargs)  # WARNING: batch() has many children!
  File "/srv/code/artexin/batch.py", line 48, in batch
    pool = multiprocessing.Pool(max_procs)
  File "/usr/lib/python3.4/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.4/multiprocessing/pool.py", line 160, in __init__
    if processes < 1:
TypeError: unorderable types: str() < int()
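The traceback shows the `artex.processes` config value reaching `multiprocessing.Pool` as a string. One possible fix is to coerce the value before it is used; the helper below is a hypothetical sketch (the name `parse_max_procs` and the handling of empty values are assumptions).

```python
def parse_max_procs(value, default=None):
    """Coerce a config value (str, bytes, or int) to a positive int.

    Returns `default` when the value is missing or empty; a default of
    None lets multiprocessing.Pool pick the CPU count on its own.
    """
    if isinstance(value, bytes):
        value = value.decode('ascii')
    if value is None or value == '':
        return default
    procs = int(value)  # raises ValueError for non-numeric input
    if procs < 1:
        raise ValueError('number of processes must be at least 1')
    return procs
```

With this in place, the call in `collections_process` could pass `parse_max_procs(request.app.config['artex.processes'])` instead of the raw config value.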

Add two-step verification and session handling

  1. Type in email and password
  2. Receive login link via email
  3. Click on the link to access the site

The link should contain a one-time code that expires after 3 minutes.
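A minimal in-memory sketch of such a one-time code. The class name `LoginCodes` and the storage in a plain dict are assumptions made for illustration; a real deployment would presumably keep the codes in the database and build the emailed link around the returned value.

```python
import secrets
import time

CODE_TTL = 3 * 60  # seconds; the 3-minute lifetime from this issue


class LoginCodes:
    """Issue and redeem single-use login codes (illustrative only)."""

    def __init__(self, ttl=CODE_TTL, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._codes = {}  # code -> (email, expiry time)

    def issue(self, email):
        """Create an unguessable code to embed in the login link."""
        code = secrets.token_urlsafe(32)
        self._codes[code] = (email, self._clock() + self._ttl)
        return code

    def redeem(self, code):
        """Return the email for a valid code, or None.

        The code is removed on the first attempt, so it cannot be
        reused even if it has not expired yet.
        """
        entry = self._codes.pop(code, None)
        if entry is None:
            return None
        email, expires = entry
        if self._clock() > expires:
            return None
        return email
```

The clock is injectable so that expiry can be tested without waiting three minutes.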

Sessions should expire when the browser window is closed (ideally). An opt-in 'remember me' feature should extend the session for a maximum of 14 days.

Channels

Each channel should have a human-readable name and a folder name (a single word consisting of alphanumerics and underscores). Each channel folder should be created when a new channel is created in the UI, and should contain a .name file that contains the human-readable name as a single-line string.

Each piece of content should be assigned a channel when processed in a batch. There should be a UI for assigning the channel to each URL as well as choosing the default channel for the entire batch.
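The folder layout could be created along these lines. The function name `create_channel`, the `root` argument, and the choice to reject invalid input with `ValueError` are assumptions, not an agreed design.

```python
import os
import re

# A single word made of alphanumerics and underscores.
FOLDER_RE = re.compile(r'[A-Za-z0-9_]+')


def create_channel(root, folder, name):
    """Create the channel folder and its .name file; return the path."""
    if not FOLDER_RE.fullmatch(folder):
        raise ValueError('invalid channel folder name: %r' % folder)
    if '\n' in name or '\r' in name:
        raise ValueError('channel name must be a single line')
    path = os.path.join(root, folder)
    os.makedirs(path)  # raises FileExistsError if the channel exists
    with open(os.path.join(path, '.name'), 'w', encoding='utf-8') as f:
        f.write(name)
    return path
```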

Blogger headers get cut off

The Simple template on Blogger places the post title outside the DIV that contains the article text, so the title gets cut off during extraction. This needs to be handled at least for *.blogspot.com URLs.
