iftechfoundation / ifarchive-unbox
IF Archive Unboxing service
Home Page: https://unbox.ifarchive.org
License: MIT License
We should use full timestamps rather than just the day a file was updated, as there's a small chance that a file updated twice in one day would not be detected as changed.
I'm running into a locales issue. Asked about it here: https://stackoverflow.com/q/70388840/2854284
You know how I said that I'd leave
await exec(`unzip -p ${zip_path} '${escape_shell_single_quoted(file_path)}' | file -i -`)
as-is until it caused a problem? It causes a problem.
In the logs, I see two examples:
Error: unzip|file error: caution: filename not matched: platypus/options/Icon^M
Error: unzip|file error: caution: filename not matched: 4th1hrComp/agent_4F[1].A.taf
I believe both are caused by the shell getting confused by filenames.
I'm not interested in playing whack-a-mole with shell escapes. We need to use execFile().
(Reading the data and then writing it into a separate execFile('file') is okay.)
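A minimal sketch of the execFile() approach, assuming Node's child_process module: unzip is run with an argument array (no shell), its output is read into a Buffer, and that Buffer is then written to the stdin of a separate "file -i -" process. The helper names exec_file_buffer, run_with_input and detect_mime_type are hypothetical, and the 256 MB maxBuffer is an arbitrary choice.

```javascript
const { execFile, spawn } = require('child_process');

// Run a program with an argument array and no shell, so characters such as
// $, quotes or carriage returns in file names reach the program verbatim.
// Resolves with stdout as a Buffer. (The maxBuffer value is arbitrary.)
function exec_file_buffer(cmd, args) {
  return new Promise((resolve, reject) => {
    execFile(cmd, args, { encoding: 'buffer', maxBuffer: 256 * 1024 * 1024 }, (err, stdout, stderr) => {
      if (err) {
        err.message += `: ${stderr.toString()}`;
        reject(err);
      } else {
        resolve(stdout);
      }
    });
  });
}

// Write `input` to a program's stdin and collect its stdout (no shell involved).
function run_with_input(cmd, args, input) {
  return new Promise((resolve, reject) => {
    const child = spawn(cmd, args);
    const out = [];
    let stderr = '';
    child.stdout.on('data', chunk => out.push(chunk));
    child.stderr.on('data', chunk => { stderr += chunk; });
    // `file` may stop reading once it has seen enough bytes; ignore the
    // resulting EPIPE rather than treating it as a failure.
    child.stdin.on('error', err => { if (err.code !== 'EPIPE') reject(err); });
    child.on('error', reject);
    child.on('close', code => {
      if (code === 0) resolve(Buffer.concat(out));
      else reject(new Error(`${cmd} exited with ${code}: ${stderr}`));
    });
    child.stdin.end(input);
  });
}

// Hypothetical replacement for: unzip -p ZIP 'FILE' | file -i -
async function detect_mime_type(zip_path, file_path) {
  const data = await exec_file_buffer('unzip', ['-p', zip_path, file_path]);
  const out = await run_with_input('file', ['-i', '-'], data);
  return out.toString().trim();
}
```

One caveat: unzip applies its own wildcard matching to the member names it is given (see the unzip man page), so a name like agent_4F[1].A.taf might still fail to match even without a shell and may need its brackets escaped in unzip's own syntax.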
We can't catch everything, so server redirects are still essential, but we can rewrite basic HTML links, script inclusions, image sources, etc., so that they point to the main domain, not the subdomain.
Make a CLI script that
list_contents() is failing on some files. Examples that I see:
infocom/compilers/inform6/library/old/inform_library61.tar.gz
Error: tar error: tar: A lone zero block at 533
Error: Command failed: unzip -Z1 /home/data/cache/2obskzspcc.zip warning [/home/data/cache/2obskzspcc.zip]: 128 extra bytes at beginning or within zipfile (attempting to process anyway)
In both cases, messages appear on stderr but the files unpack correctly anyhow.
I think the correct path is to rely on the exit status rather than stderr. Messages on stderr should be logged, but should not throw errors.
There's a nuisance factor in that tar's exit status is 0 for success and nonzero for error, while unzip has a long list of exit statuses (see the man page); it boils down to 0 for success, 1 for success with warnings, and higher values for error. So we have to check those values separately for tar and unzip.
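A sketch of "trust the exit status, log stderr", assuming the listers are run through Node's execFile(). The names exit_status_ok and run_lister are hypothetical; kind selects which exit-status rule applies, separately from the command actually run.

```javascript
const { execFile } = require('child_process');

// Whether an exit status means success for the given tool.
// tar: 0 is success, anything else is an error.
// unzip: 0 is success, 1 is success with warnings, 2 and up are errors.
function exit_status_ok(kind, code) {
  if (kind === 'unzip') return code === 0 || code === 1;
  return code === 0;
}

// Hypothetical wrapper for list_contents(): run the lister, log whatever it
// printed on stderr, and reject only if the exit status says it failed.
// `kind` is 'tar' or 'unzip'; `command` is the program actually run.
function run_lister(kind, command, args, log = console.warn) {
  return new Promise((resolve, reject) => {
    execFile(command, args, { maxBuffer: 64 * 1024 * 1024 }, (err, stdout, stderr) => {
      if (stderr) log(`${kind} stderr: ${stderr.trim()}`);
      // For a non-zero exit, err.code is the numeric status; if the program
      // could not start, was killed by a signal, or overflowed maxBuffer,
      // it is not a number and we treat it as a failure.
      const code = err ? err.code : 0;
      if (typeof code === 'number' && exit_status_ok(kind, code)) resolve(stdout);
      else reject(err);
    });
  });
}
```

Usage would be along the lines of run_lister('unzip', 'unzip', ['-Z1', zip_path]), so the "128 extra bytes" warning ends up in the log instead of as a thrown error.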
When the app first starts, it spawns zip processes to list the contents of every file in the cache. If there are lots of cached files, the processes fail. Could it be running out of memory or something?
The app starts fine when there are fewer files (80 works), so I'm guessing that spawning the zip processes in batches will work. The server is only single-core anyway, so while a little parallel processing might help, running all of these at once wasn't helping in the first place. It was just simple code: Promise.all(files.map(...))
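Instead of fixed batches, a small worker pool keeps a constant number of processes running until the list is exhausted. This is a sketch of that variant; map_with_limit is a hypothetical helper, and the limit would be tuned to the server.

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once,
// returning results in the same order as `items`.
async function map_with_limit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;
  // Each runner repeatedly claims the next unprocessed index.
  async function run() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }
  const runners = [];
  for (let k = 0; k < Math.min(limit, items.length); k++) runners.push(run());
  await Promise.all(runners);
  return results;
}
```

At startup this would replace Promise.all(files.map(...)) with something like map_with_limit(files, 4, f => list_contents(f)).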
When showing a file list, if there's an index.html, we could have a prominent "Start" button that redirects to it. Similarly if there's exactly one .html file.
IFDB won't need this, but it would smooth out the experience of ifarchive.org links.
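A sketch of how the "Start" target could be chosen from a contents list. The rule (a single index.html anywhere wins, otherwise a single .html file) is an assumption based on the description above, and choose_start_file is a hypothetical name.

```javascript
// Hypothetical helper: pick a file for a "Start" button from an archive's
// contents list (paths as returned by list_contents()).
// Assumed rule: exactly one index.html anywhere wins; otherwise, if there is
// exactly one .html file, use that; otherwise there is no obvious start file.
function choose_start_file(contents) {
  const basename = p => p.split('/').pop().toLowerCase();
  const index_files = contents.filter(p => basename(p) === 'index.html');
  if (index_files.length === 1) return index_files[0];
  const html_files = contents.filter(p => basename(p).endsWith('.html'));
  if (html_files.length === 1) return html_files[0];
  return null;
}
```

Archives with several index.html files (like the Ishmael.zip case) fall through to null here, so no button is shown rather than guessing.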
A file like this is identified as us-ascii, when it really needs to be UTF-8: https://2p287be0si.unbox.ifarchive.org/2p287be0si/IFComp2015/Games/Cape/dist/index.html
It doesn't get identified as UTF-8 because the HTML page itself doesn't contain any non-ASCII characters, but the JS does (or non-ASCII characters get inserted by JS; I'm not sure exactly which).
For HTML files we should check for a <meta charset> tag and use it if present.
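A regex sketch of the <meta charset> check, not a full implementation of the HTML spec's encoding sniffing. It looks only at the first 1024 bytes (the window the spec's prescan uses) and handles both the charset attribute and the http-equiv form. find_meta_charset is a hypothetical name.

```javascript
// Return the charset declared by a <meta> tag near the start of an HTML file,
// lower-cased, or null if none is found. Matches both
//   <meta charset="utf-8">
//   <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
function find_meta_charset(buffer) {
  // latin1 maps every byte to one character, so decoding can't fail.
  const head = buffer.subarray(0, 1024).toString('latin1');
  const pattern = /<meta\b[^>]*?\bcharset\s*=\s*["']?\s*([A-Za-z0-9_.:-]+)/i;
  const match = head.match(pattern);
  return match ? match[1].toLowerCase() : null;
}
```

When it returns a value, the served type would become text/html; charset= plus that value, overriding what file reported.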
I'm not sure what a good caching time is - 1 day? more?
Also document
https://unbox.ifarchive.org/?url=https%3A%2F%2Fifarchive.org%2Fif-archive%2Fgames%2Fagt%2Fnmr1.zip
https://unbox.ifarchive.org/b7iw91c5w/NMR1%20Orignal%20Play%20Distribution/NMR.D$$
500 error
Error: unzip|file error: caution: filename not matched: NMR1 Orignal Play Distribution/NMR.D1518
Cloudflare doesn't compress any of our IF storyfile formats, which is non-ideal, and there doesn't seem to be any way to add custom types to their compression list.
But adding a Cache-Control: no-transform header might result in Cloudflare serving our gzipped files. We wouldn't get to take advantage of Cloudflare's brotli compression, but if it works, that would definitely be a worthwhile trade-off.
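If compression is done in Node, a sketch could look like the following. It assumes the storyfile is already in memory as a Buffer. build_compressed_response is a hypothetical name, and whether Cloudflare actually passes the gzipped body through is exactly the untested assumption above.

```javascript
const zlib = require('zlib');

// Hypothetical helper: gzip a response body ourselves and mark it
// Cache-Control: no-transform, in the hope that the CDN forwards the gzipped
// bytes untouched. Falls back to the raw body when gzip isn't accepted.
// (This simple check ignores q-values such as "gzip;q=0".)
function build_compressed_response(body, accept_encoding = '') {
  const headers = { 'Cache-Control': 'no-transform', 'Vary': 'Accept-Encoding' };
  if (/\bgzip\b/i.test(accept_encoding)) {
    headers['Content-Encoding'] = 'gzip';
    return { headers, body: zlib.gzipSync(body) };
  }
  return { headers, body };
}
```

In practice no-transform would be merged into whatever Cache-Control value (max-age etc.) the response already carries.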
For the file https://ifarchive.org/if-archive/games/twine/Paintball_Wizard.zip the main HTML file inside it is "Paintball Wizard/info.html". But the link
fails with "NotFoundError: Unknown file: https://ifarchive.org/if-archive/games/twine/Paintball_Wizard.zip". Interestingly, the link
does work.
Add compression support. But in Nginx or Node?
Having thought about #48 for a while, I think it is worth having a way for the Archive to push a "please refetch" message to Unbox.
This will make life easier for the volunteers; they won't have to think about a five-minute polling delay.
I don't think this has to involve cache headers at all. Just a request we can make that triggers check_for_update() immediately. (The request doesn't have to wait for check_for_update() to finish though.)
I want at least a little bit of mischief protection, so this request should be a POST with a shared secret key in a form field. On the Archive side, this will be launched from curl or wget running as root, immediately after a new Master-Index.xml is written.
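A sketch of the request handling, kept independent of the web framework. The form field name "key", the function names, and the 202 status are assumptions; trigger would be check_for_update(), which is started but not awaited.

```javascript
const crypto = require('crypto');

// Compare two secrets in constant time. Hashing first gives both sides the
// same length, which crypto.timingSafeEqual requires.
function secrets_match(given, expected) {
  const a = crypto.createHash('sha256').update(String(given)).digest();
  const b = crypto.createHash('sha256').update(String(expected)).digest();
  return crypto.timingSafeEqual(a, b);
}

// Hypothetical handler for a "please refetch" request. `body` is the raw
// application/x-www-form-urlencoded request body. Returns the HTTP status to
// send; the update runs in the background so the response isn't delayed.
function handle_refetch(method, body, secret, trigger) {
  if (method !== 'POST') return 405;
  const key = new URLSearchParams(body).get('key');
  if (key === null || !secrets_match(key, secret)) return 403;
  Promise.resolve()
    .then(trigger)
    .catch(err => console.error('check_for_update failed:', err));
  return 202;
}
```

On the Archive side, curl -d "key=SECRET" sends exactly this kind of form-encoded POST, aimed at whatever path Unbox exposes for it (none exists yet; the path would be chosen when implementing this).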
https://unbox.ifarchive.org/?url=/if-archive/games/pc/spanish/zoo.zip
This zip file contains files whose names contain #, e.g. M#ROCKY.1.
To repro: Go to https://unbox.ifarchive.org/?url=/if-archive/games/pc/spanish/zoo.zip and click on the link to M#ROCKY.1
Actual: The link points to https://unbox.ifarchive.org/0impc5w62r/M#ROCKY.1 i.e. https://unbox.ifarchive.org/0impc5w62r/M with a "fragment identifier" of #ROCKY.1
Expected: https://unbox.ifarchive.org/?url=/if-archive/games/pc/spanish/zoo.zip should link to https://unbox.ifarchive.org/0impc5w62r/M%23ROCKY.1 (which does work)
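The fix amounts to percent-encoding each path segment when building links, while leaving the slashes between segments alone. A sketch; file_href is a hypothetical name for wherever the index page builds its hrefs.

```javascript
// Build the link to a file inside an unboxed archive, percent-encoding each
// path segment so characters such as #, ?, %, $ and spaces stay part of the
// path instead of being read as URL syntax. `hash` is the archive's id.
function file_href(hash, file_path) {
  const encoded = file_path.split('/').map(segment => encodeURIComponent(segment)).join('/');
  return `/${hash}/${encoded}`;
}
```

encodeURIComponent leaves ' unescaped, so the result should go into a double-quoted href attribute.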
Invalid HTML documents like the following currently receive a text/plain MIME type. There's probably no harm in setting them to text/html.
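One way to do this is to post-process the detected type using the file extension. This sketch assumes the detected value is in the form file -i reports (type, then optional parameters); fix_html_mime is a hypothetical name.

```javascript
// Hypothetical post-processing step: if the file name ends in .html or .htm
// but detection fell back to text/plain, serve it as text/html, keeping any
// parameters (such as charset) that were detected.
function fix_html_mime(file_path, detected) {
  const is_html_name = /\.html?$/i.test(file_path);
  const [type, ...params] = detected.split(';').map(s => s.trim());
  if (is_html_name && type === 'text/plain') {
    return ['text/html', ...params].join('; ');
  }
  return detected;
}
```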
Redirects don't seem to be getting cached, even though I thought nginx would cache them by default.
https://unbox.ifarchive.org/?url=https%3A%2F%2Fifarchive.org%2Fif-archive%2Fgames%2Fspringthing%2F2011%2FMMA.zip links to https://unbox.ifarchive.org/27mjsmtnuh/MMA/Stiffy%20Makane-%20Apocolocyntosis.gblorb but I get a 500 error trying to view it.
RangeError [ERR_CHILD_PROCESS_STDIO_MAXBUFFER]: stdout maxBuffer length exceeded
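The maxBuffer error means the whole extracted file was being buffered in memory. A sketch of streaming it instead, assuming Node 15 or later for stream/promises; stream_command is a hypothetical name.

```javascript
const { spawn } = require('child_process');
const { pipeline } = require('stream/promises');

// Stream a program's stdout into `dest` (for example an HTTP response)
// instead of buffering it, so output size is not limited by maxBuffer.
// Resolves once the output is written and the program has exited with 0.
async function stream_command(cmd, args, dest) {
  const child = spawn(cmd, args, { stdio: ['ignore', 'pipe', 'pipe'] });
  let stderr = '';
  child.stderr.on('data', chunk => { stderr += chunk; });
  const exited = new Promise((resolve, reject) => {
    child.on('error', reject);
    child.on('close', code => resolve(code));
  });
  // Promise.all observes both promises, so neither rejection goes unhandled.
  const [, code] = await Promise.all([pipeline(child.stdout, dest), exited]);
  if (code !== 0) throw new Error(`${cmd} exited with ${code}: ${stderr}`);
}
```

For a file like the .gblorb above this would be something like stream_command('unzip', ['-p', zip_path, file_path], res) after the headers are set. Since the body is already being sent, a late failure can only be logged, not turned into a 500; and unzip's exit status 1 (warnings) may need to be accepted here too.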
Depending on how a tar file was created, the tar -tf output might look like:
./README.txt
./image.jpeg
list_contents() has no problem with this, and the index page gets generated with links to /HASH/./README.txt, etc. However, because of browser URL resolution, when the user clicks on the link, the request comes in as /HASH/README.txt. The contents list does not contain README.txt, so we return an error.
I can see a couple of ways to deal with this:
- Strip ./ off the path.
- In the lookup (the details.contents.indexOf(file_path) call in app.js), do a fallback check for './'+file_path if the first lookup fails.
(This is a rare problem -- low priority. I first noticed it with http://ifarchive.org/if-archive/games/pc/mansion-19.2.tar.gz. That was when testing my fix for #38, so you won't be able to observe this until that fix is in. I haven't hunted for other cases.)
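Both options are small. This sketch shows the fallback lookup that would stand in for the details.contents.indexOf(file_path) call, plus a strip helper for the first option; both function names are hypothetical.

```javascript
// Option 2: look up a requested path, falling back to the './'-prefixed form
// some tar files use. Returns the index in `contents`, or -1 if absent.
function find_entry_index(contents, file_path) {
  const index = contents.indexOf(file_path);
  if (index !== -1) return index;
  return contents.indexOf('./' + file_path);
}

// Option 1: normalise paths when listing, removing any leading './'.
function strip_dot_slash(path) {
  return path.replace(/^(\.\/)+/, '');
}
```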
From slack:
Hmm. This Ishmael.zip has two index.html at different levels, the top-level one of which is what you want to launch the game. But doing the obvious thing in IFDB causes Unbox to say "BadRequestError: Filename is not unique".
Is there a way round this, with the things that you can do in an IFDB record? I didn't see one reading the unbox spec.
And any other prevalent compressed files in the archive?
Error: Command failed: curl https://ifarchive.org/if-archive/games/competition2020/Games/Jay Schillings Edge of Chaos/Chaos (Offline Play).zip -o /home/data/cache/8yypxconu.zip -s -S -D -
curl: (3) URL using bad/illegal format or missing URL
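The command above breaks because the unencoded spaces split the URL into several shell words. A sketch of building curl's argument list for execFile() instead; curl_args is a hypothetical name, and the flags simply mirror the failing command. Node's URL parser percent-encodes the spaces in the path, so curl receives one well-formed URL.

```javascript
// Hypothetical helper: build curl's argument list for execFile() rather than
// interpolating the URL into a shell command string.
function curl_args(url, dest_path) {
  const href = new URL(url).href; // spaces in the path become %20
  return ['-s', '-S', '-D', '-', '-o', dest_path, href];
}
```

This would then be used as execFile('curl', curl_args(url, cache_path), callback).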
This URL contains percent-encoded spaces https://ifarchive.org/if-archive/games/competition2015/The%20War%20of%20the%20Willows/willows-1.1.zip
When I copy and paste it to unbox, https://unbox.ifarchive.org/?url=https%3A%2F%2Fifarchive.org%2Fif-archive%2Fgames%2Fcompetition2015%2FThe%2520War%2520of%2520the%2520Willows%2Fwillows-1.1.zip returns 400 “BadRequestError: Unknown file”
This link works, with plusses instead of percent encoding: https://unbox.ifarchive.org/?url=https%3A%2F%2Fifarchive.org%2Fif-archive%2Fgames%2Fcompetition2015%2FThe+War+of+the+Willows%2Fwillows-1.1.zip
But the indexes page https://ifarchive.org/indexes/if-archive/games/competition2015/The%20War%20of%20the%20Willows/ links to the percent-encoded version, so I think Unbox should support it, too.
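A sketch of normalising the url parameter before looking it up, so percent-encoded and plain forms map to the same key. It assumes the lookup is keyed by the decoded path; canonical_archive_path is a hypothetical name.

```javascript
// Hypothetical normalisation step: reduce an Archive URL (absolute or
// site-relative, percent-encoded or not) to a decoded path, so that
// "The%20War%20of%20the%20Willows" and "The War of the Willows" match.
// Returns null for unparseable URLs or malformed percent-escapes.
function canonical_archive_path(url) {
  let parsed;
  try {
    parsed = new URL(url, 'https://ifarchive.org');
  } catch {
    return null;
  }
  try {
    return decodeURIComponent(parsed.pathname);
  } catch {
    return null;
  }
}
```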
I see that when you do docker-compose up --build, log info goes to stdout. This should go to /var/log/unbox/unbox.log.
Then we'd have to set up logrotate, except I don't know exactly how to do that. For Apache, logrotate is configured to run "/etc/init.d/apache2 reload" after rotation so that the server closes and reopens its log file handle. What is the equivalent here?
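The usual equivalent is to have the app reopen its log file on a signal, commonly SIGHUP. This is a sketch of a logger that supports that, not how Unbox currently logs; ReopenableLog and install_sighup_handler are hypothetical names.

```javascript
const fs = require('fs');

// Minimal append-only logger that can reopen its file, so logrotate can
// rename the file and then tell the process to start a fresh one.
class ReopenableLog {
  constructor(path) {
    this.path = path;
    this.fd = fs.openSync(path, 'a');
  }
  write(line) {
    fs.writeSync(this.fd, `${new Date().toISOString()} ${line}\n`);
  }
  // Close the (possibly renamed) old file and open a new one at `path`.
  reopen() {
    fs.closeSync(this.fd);
    this.fd = fs.openSync(this.path, 'a');
  }
}

// Hypothetical wiring: reopen the log whenever the process receives SIGHUP.
function install_sighup_handler(log) {
  process.on('SIGHUP', () => log.reopen());
}
```

logrotate's postrotate script would then send HUP to the app (with Docker, docker kill --signal=HUP on the container). Alternatively, logrotate's copytruncate directive avoids signalling entirely, at the cost of possibly losing a few lines written during the copy.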
https://ifarchive.org/if-archive/games/mini-comps/spanish/retrocomp2004/orfeo2.zip exists
but https://unbox.ifarchive.org/?url=https%3A%2F%2Fifarchive.org%2Fif-archive%2Fgames%2Fmini-comps%2Fspanish%2Fretrocomp2004%2Forfeo2.zip shows an error:
BadRequestError: Unknown file: https://ifarchive.org/if-archive/games/mini-comps/spanish/retrocomp2004/orfeo2.zip
This zip file contains two similar file names:
Expected: Since only the first file, Bear Creek.gblorb, matches the find parameter exactly, find should match that and redirect to it.
Actual: Find thinks that this is an ambiguous case and suggests both files as options.
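A sketch of "exact match wins" layered on top of a looser match. The looser rule used here (file name contains the term, case-insensitively) is only a stand-in for whatever Unbox's find actually does, and resolve_find is a hypothetical name.

```javascript
// Hypothetical find step: gather candidates with a loose rule, then prefer
// a unique exact file-name match before declaring the result ambiguous.
function resolve_find(contents, term) {
  const name_of = p => p.split('/').pop();
  const needle = term.toLowerCase();
  const candidates = contents.filter(p => name_of(p).toLowerCase().includes(needle));
  const exact = candidates.filter(p => name_of(p) === term);
  if (exact.length === 1) return { match: exact[0] };
  if (candidates.length === 1) return { match: candidates[0] };
  return { candidates };
}
```

Two exact matches in different directories (as in the Ishmael.zip report) would still come back as ambiguous under this rule.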
Even though I'm using the same stylesheet as the main Archive site, the page is non-responsive on mobile. Might we have to change how the page is structured?