ifarchive-unbox's People

Contributors

curiousdannii, dfabulich, erkyrath

ifarchive-unbox's Issues

Can't handle zipped files with [, ], or ^ in the filename

You know how I said that I'd leave

await exec(`unzip -p ${zip_path} '${escape_shell_single_quoted(file_path)}' | file -i -`)

as-is until it caused a problem? It causes a problem.

In the logs, I see two examples:

Error: unzip|file error: caution: filename not matched:  platypus/options/Icon^M
Error: unzip|file error: caution: filename not matched:  4th1hrComp/agent_4F[1].A.taf

I believe both are caused by the shell getting confused by filenames.

I'm not interested in playing whack-a-mole with shell escapes. We need to use execFile().

(Reading the data and then writing it into a separate execFile('file') is okay.)
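A minimal sketch of what the execFile() version could look like, using only Node's built-in child_process module. The names get_file_type and escape_unzip_pattern are hypothetical, and the maxBuffer value is a guess. One caveat: unzip does its own wildcard matching on member names, so [ and ] may stay special even with no shell involved. The sketch assumes unzip honours backslash escapes for them, which should be verified against the real archives.

```javascript
const { execFile } = require('node:child_process')
const { promisify } = require('node:util')
const exec_file = promisify(execFile)

// Hypothetical helper: backslash-escape the characters unzip treats as
// wildcards in member names (assumption: unzip honours backslash escapes).
function escape_unzip_pattern(file_path) {
    return file_path.replace(/[\\*?\[\]]/g, '\\$&')
}

// Hypothetical replacement for the exec() pipeline. No shell is involved,
// so quotes, spaces and control characters reach unzip unchanged.
async function get_file_type(zip_path, file_path) {
    const unzipped = await exec_file('unzip', ['-p', zip_path, escape_unzip_pattern(file_path)], {
        encoding: 'buffer',
        maxBuffer: 64 * 1024 * 1024, // assumption: large enough for one member
    })
    // Write the extracted bytes to a separate `file` process on stdin
    return new Promise((resolve, reject) => {
        const child = execFile('file', ['-i', '-'], (err, stdout) => {
            if (err) reject(err)
            else resolve(stdout.trim())
        })
        // `file` may stop reading early; ignore the resulting EPIPE
        child.stdin.on('error', () => {})
        child.stdin.end(unzipped.stdout)
    })
}
```

Buffering the whole member in memory is the simple option; piping unzip's stdout straight into file's stdin would avoid that at the cost of a little more plumbing.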

Rewrite links in HTML

We can't catch everything, so server redirects are still essential, but we can rewrite basic HTML links, script inclusions, image sources, etc., so that they point to the main domain rather than the subdomain.

Add script to purge file from cache

Make a CLI script that

  1. gets the list of individual file URLs
  2. shuts down the docker containers
  3. purges the local cache entries (for both the app and nginx)
  4. asks cloudflare to purge its cache
  5. starts the containers

Tar and unzip seem to fail sometimes

list_contents() is failing on some files. Examples that I see:

infocom/compilers/inform6/library/old/inform_library61.tar.gz

Error: tar error: tar: A lone zero block at 533

games/pc/hallowee.zip

Error: Command failed: unzip -Z1 /home/data/cache/2obskzspcc.zip warning [/home/data/cache/2obskzspcc.zip]: 128 extra bytes at beginning or within zipfile (attempting to process anyway)

In both cases, messages appear on stderr but the files unpack correctly anyhow.

I think the correct path is to rely on the exit status rather than stderr. Messages on stderr should be logged, but should not throw errors.

There's a nuisance factor in that tar's exit status is 0 for success, 1 for error. unzip has a big list of exit statuses (see man page); it boils down to 0 for success, 1 for success-with-warnings, higher values for error. So we have to check those values separately for tar vs unzip.
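The per-tool check could be sketched roughly like this. The helper names command_failed and check_result are hypothetical, and the thresholds simply follow the description above (tar: only 0 is success; unzip: 0 or 1 is success).

```javascript
// Highest exit status that still counts as success, per tool
const MAX_OK_STATUS = new Map([
    ['tar', 0],
    ['unzip', 1], // 1 means success with warnings
])

// Hypothetical helper: decide from the exit status alone whether a run failed
function command_failed(command, exit_status) {
    if (!MAX_OK_STATUS.has(command)) {
        throw new Error(`Unknown command: ${command}`)
    }
    // A null status means the process was killed by a signal
    return exit_status === null || exit_status > MAX_OK_STATUS.get(command)
}

// Hypothetical wrapper: stderr is logged but never throws on its own
function check_result(command, exit_status, stderr, log = console.warn) {
    if (stderr) {
        log(`${command} stderr: ${stderr.trim()}`)
    }
    if (command_failed(command, exit_status)) {
        throw new Error(`${command} failed with exit status ${exit_status}`)
    }
}
```

With this, the hallowee.zip case (warning on stderr, exit status 1) would be logged and then treated as a success.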

Get file contents in batches when app starts

When the app first starts it spawns zip processes to get all the contents of all files in the cache. If there are lots of cached files, the processes fail. Could be running out of memory or something?

The app starts when there are fewer (80 works), so I'm guessing that spawning the zip processes in batches will work. The server is only single core anyway, so while a little bit of parallel processing might help, it's not like running all of these at once was helping in the first place. It was just simple code to Promise.all(files.map(...))
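A minimal sketch of the batching idea, assuming a hypothetical helper called map_in_batches. The batch size is a parameter because the right value for the server is not known yet.

```javascript
// Hypothetical helper: run `worker` over `items`, with at most `batch_size`
// calls in flight at once. Results come back in the original order.
async function map_in_batches(items, batch_size, worker) {
    const results = []
    for (let i = 0; i < items.length; i += batch_size) {
        const batch = items.slice(i, i + batch_size)
        // Wait for the whole batch before starting the next one
        const batch_results = await Promise.all(batch.map(item => worker(item)))
        results.push(...batch_results)
    }
    return results
}
```

The existing Promise.all(files.map(...)) call would then become something like map_in_batches(files, 10, ...), where 10 is only a placeholder.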

Semi-smart "Start" button?

When showing a file list, if there's an index.html, we could have a prominent "Start" button that redirects to it. Similarly if there's exactly one .html file.

IFDB won't need this, but it would smooth out the experience of ifarchive.org links.
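One way the selection rule could look, as a sketch. find_start_file is a hypothetical name, and treating an index.html in a subdirectory as a candidate (as long as it is the only one) is an assumption, not something decided above.

```javascript
// Hypothetical helper: return the path a "Start" button should open,
// or null if there is no unambiguous choice.
function find_start_file(contents) {
    // Assumption: an index.html at any depth counts, if it is the only one
    const indexes = contents.filter(path => path === 'index.html' || path.endsWith('/index.html'))
    if (indexes.length === 1) {
        return indexes[0]
    }
    // Otherwise, accept the archive's single .html file, if there is exactly one
    const html_files = contents.filter(path => /\.html$/i.test(path))
    if (html_files.length === 1) {
        return html_files[0]
    }
    return null
}
```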

Set up caching

  • Set caching headers
  • Set up nginx cache

I'm not sure what a good caching time is - 1 day? more?

Try Cache-control: no-transform for Cloudflare

Cloudflare doesn't compress any of our IF storyfile formats, which is non-ideal, and there doesn't seem to be any way to add custom types to their compression list.

But adding a Cache-control: no-transform header might result in Cloudflare serving our gzipped files. We wouldn't get to take advantage of Cloudflare's brotli compression, but if it works that would definitely be a worthwhile trade-off.

Failure if path in the zip to be opened contains a space

Compression

Add compression support. But in nginx or Node?

Way to indicate that Master-Index.xml has changed

Having thought about #48 for a while, I think it is worth having a way for the Archive to push a "please refetch" message to Unbox.

This will make life easier for the volunteers; they won't have to think about a five-minute polling delay.

I don't think this has to involve cache headers at all. Just a request we can make that triggers check_for_update() immediately. (The request doesn't have to wait for check_for_update() to finish though.)

I want at least a little bit of mischief protection, so this request should be a POST with a shared secret key in the form field. On the Archive side this will be launched from curl or wget running as root, immediately after a new Master-Index.xml is written.
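A rough sketch of the request handling, independent of whichever web framework the app uses. The function names, the form field name key, and the status codes are all assumptions made for illustration. Only check_for_update() comes from the discussion above, and it is passed in as a parameter here.

```javascript
const crypto = require('node:crypto')

// Compare secrets in constant time. Hashing first gives both buffers the
// same length, which crypto.timingSafeEqual() requires.
function secret_matches(supplied, expected) {
    const digest = value => crypto.createHash('sha256').update(String(value)).digest()
    return crypto.timingSafeEqual(digest(supplied), digest(expected))
}

// Hypothetical handler: returns the HTTP status code to send back
function handle_refetch(method, form_body, expected_secret, check_for_update) {
    if (method !== 'POST') {
        return 405
    }
    // Assumption: the secret arrives in a form field called "key"
    const supplied = new URLSearchParams(form_body).get('key') ?? ''
    if (!secret_matches(supplied, expected_secret)) {
        return 403
    }
    // Start the check but do not wait for it to finish
    Promise.resolve().then(check_for_update).catch(err => console.error(err))
    return 202
}
```

On the Archive side, the matching call would be a plain form POST from curl or wget carrying that field.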

Incorrect index links to files whose names contain `#`

https://unbox.ifarchive.org/?url=/if-archive/games/pc/spanish/zoo.zip

This zip file contains files where the name contains #, e.g. M#ROCKY.1.

To repro: Go to https://unbox.ifarchive.org/?url=/if-archive/games/pc/spanish/zoo.zip and click on the link to M#ROCKY.1

Actual: The link points to https://unbox.ifarchive.org/0impc5w62r/M#ROCKY.1 i.e. https://unbox.ifarchive.org/0impc5w62r/M with a "fragment identifier" of #ROCKY.1

Expected: https://unbox.ifarchive.org/?url=/if-archive/games/pc/spanish/zoo.zip should link to https://unbox.ifarchive.org/0impc5w62r/M%23ROCKY.1 (which does work)
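A possible fix when generating the index links is to percent-encode each path segment separately, so that / stays a separator but #, ?, spaces and so on are escaped. A minimal sketch, with encode_path as a hypothetical name:

```javascript
// Hypothetical helper: percent-encode a path inside an archive for use in a
// link, keeping "/" as the segment separator.
function encode_path(file_path) {
    return file_path
        .split('/')
        .map(segment => encodeURIComponent(segment))
        .join('/')
}

console.log(encode_path('M#ROCKY.1')) // prints "M%23ROCKY.1"
```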

Tar paths starting with ./ confuse the app

Depending on how a tar file was created, the tar -tf output might look like

./README.txt
./image.jpeg

list_contents() has no problem with this, and the index page gets generated with links to /HASH/./README.txt etc. However, because of browser URL resolution, when the user clicks on the link, the request comes in as /HASH/README.txt. The contents list does not contain README.txt so we return an error.

I can see a couple of ways to deal with this:

  • list_contents() could strip initial ./ off the path.
  • When looking up the file (the details.contents.indexOf(file_path) call in app.js), we could do a fallback check for './'+file_path if the first lookup fails.

(This is a rare problem -- low priority. I first noticed it with http://ifarchive.org/if-archive/games/pc/mansion-19.2.tar.gz. That was when testing my fix for #38, so you won't be able to observe this until that fix is in. I haven't hunted for other cases.)
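The first option (stripping the prefix in list_contents()) might look something like this. Both function names are hypothetical, and the sketch assumes the tar -tf output has already been captured as a string.

```javascript
// Hypothetical helper: drop any leading "./" from a tar member path
function normalize_tar_path(path) {
    return path.replace(/^(\.\/)+/, '')
}

// Hypothetical helper: turn raw `tar -tf` output into a contents list
function parse_tar_listing(tar_output) {
    return tar_output
        .split('\n')
        .map(normalize_tar_path)
        // Drops blank lines and the "./" entry for the root directory itself
        .filter(path => path !== '')
}
```

If the stored path is later passed back to tar to extract the file, the original ./ form may still be needed there, so the second option (a fallback lookup) could turn out to be the smaller change.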

Support .tgz

And any other prevalent compressed files in the archive?

War of the Willows percent encoding

Enable log rotating

I see that when you do docker-compose up --build, log info goes to stdout. This should go to /var/log/unbox/unbox.log.

Then we'd have to set up logrotate. Except I don't know exactly how to do that. For Apache, logrotate is configured to do "/etc/init.d/apache2 reload" after rotation so that the server closes and reopens its logging file handle. What is the equivalent here?

find endsWith returns incorrect results

https://unbox.ifarchive.org/?url=https%3A%2F%2Fifarchive.org%2Fif-archive%2Fgames%2Fspringthing%2F2014%2FBearCreek.zip&find=Bear%20Creek.gblorb

This zip file contains two similar file names:

  • BearCreek/Bear Creek.gblorb
  • __MACOSX/BearCreek/._Bear Creek.gblorb

Expected: Since only the first file, Bear Creek.gblorb, matches the find parameter exactly, find should match that and redirect to it.
Actual: Find thinks that this is an ambiguous case and suggests both files as options
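A sketch of a stricter match: compare on whole path segments first, and only fall back to a plain endsWith() when that finds nothing. find_matches is a hypothetical name, and keeping the loose fallback at all is an assumption about the desired behaviour.

```javascript
// Hypothetical helper: return the archive paths that match a `find` value
function find_matches(contents, find) {
    // Exact match on the full path, or on a trailing run of whole segments
    const exact = contents.filter(path => path === find || path.endsWith('/' + find))
    if (exact.length > 0) {
        return exact
    }
    // Assumption: fall back to the looser suffix match if nothing matched exactly
    return contents.filter(path => path.endsWith(find))
}

const contents = [
    'BearCreek/Bear Creek.gblorb',
    '__MACOSX/BearCreek/._Bear Creek.gblorb',
]
console.log(find_matches(contents, 'Bear Creek.gblorb')) // prints [ 'BearCreek/Bear Creek.gblorb' ]
```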

Mobile styles

Even though I'm using the same stylesheet as the main archive site, it is not responsive on mobile. Might have to change how the page is structured?
