Comments (8)
@Popolechien I think I have a ticket for that somewhere... but we definitely need to set up a quality assurance system. The idea is to add this validation step once the files are uploaded to the warehouse.
@automactic The Docker container itself should be really simple: monitor a directory, check the new files with zim-check and, if it returns no error, move the file to make it actually available for download. Otherwise: "to be defined".
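A rough sketch of that check-then-move step, in Python. Nothing here is existing zimfarm code: `check_and_publish`, `public_dir` and `check_cmd` are hypothetical names, and the checker command is passed in by the caller because the exact zim-check invocation is not specified in this thread. Failure handling is still "to be defined", so a failing file is simply left where it is.

```python
import shutil
import subprocess
from pathlib import Path


def check_and_publish(zim_path: Path, public_dir: Path, check_cmd: list[str]) -> bool:
    """Run the checker on one file and move it to public_dir only on exit code 0.

    check_cmd is the checker command without the file argument (an assumption:
    the real zim-check command line is not given in the thread).
    """
    result = subprocess.run([*check_cmd, str(zim_path)], capture_output=True)
    if result.returncode != 0:
        # What to do on failure is "to be defined"; leave the file untouched.
        return False
    public_dir.mkdir(parents=True, exist_ok=True)
    shutil.move(str(zim_path), str(public_dir / zim_path.name))
    return True
```

The return value only says whether the file was published; logging or reporting the checker output would be a separate decision.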
from zimfarm.
@kelson42, the zimfarm warehouse is an SFTP server. It cannot do things like monitoring a directory and testing new files.
What might be a good idea is to introduce the concept of staging. The SFTP server moves files from the workers to staging, then a dedicated monitor kicks off testing jobs for new files in staging. After the tests pass, the files are moved to production.
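To make the staging idea a bit more concrete, here is a minimal sketch of the "dedicated monitor" part only: one polling pass that reports files it has not seen before. The function name, the `*.zim` pattern and the in-memory `seen` set are all assumptions for illustration; a real monitor would probably need persistent state and a way to dispatch the testing jobs.

```python
from pathlib import Path


def new_staged_files(staging_dir: Path, seen: set[str]) -> list[Path]:
    """Return .zim files in staging_dir that are not yet in `seen`, and record them.

    `seen` is mutated in place so that the next polling pass skips these files.
    """
    fresh = sorted(p for p in staging_dir.glob("*.zim") if p.name not in seen)
    seen.update(p.name for p in fresh)
    return fresh
```

Calling this in a loop with a sleep between passes would give a simple polling monitor; each returned path would then be handed to a testing job.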
from zimfarm.
@kelson42 How are we planning to test ZIM files? For situations like the one above, there doesn't seem to be an obvious way to automate the test.
from zimfarm.
@automactic We'll need to have a human step in there. For these wikis I'm also thinking of contacting the mods to ask if they'd be OK with us having a simplified landing page (like we already do on a few Wikipedias).
from zimfarm.
@Popolechien Given the human resource bottleneck, it seems quite unrealistic to me to have a human review many thousands of new ZIM files a month. On top of that, this is something that can be automated, so it is probably something we could/should do.
@automactic I was not talking about the "warehouse" container. IMO the warehouse container is fine for receiving the ZIM files from the distributed workers. Just make sure we have an easy way to know, on the filesystem, whether a file is fully uploaded or not. We need that because once a file is uploaded, the "sanity check" container (still to be built) will run zim-check against that file and then move it to its final destination. To conclude, the warehouse and the sanity-check container will share a Docker volume.
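One possible way to tell whether a file on the shared volume is fully uploaded, sketched in Python. This is only a heuristic and not something decided in this thread: it assumes an upload in progress keeps changing the file's size or modification time, and `is_upload_complete` and `wait_seconds` are hypothetical names.

```python
import time
from pathlib import Path


def is_upload_complete(path: Path, wait_seconds: float = 5.0) -> bool:
    """Heuristic: the file counts as complete if its size and mtime
    are unchanged after waiting wait_seconds."""
    before = path.stat()
    time.sleep(wait_seconds)
    after = path.stat()
    return (before.st_size, before.st_mtime_ns) == (after.st_size, after.st_mtime_ns)
```

A stalled upload would fool this check. A stricter convention, if the uploading side can be changed, would be to upload under a temporary name and rename the file once the transfer is done, so that the sanity-check container only ever sees finished files.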
from zimfarm.
@kelson42 Of course not every ZIM, but just the new ones for their very first deployment. I don't know how many new contents we publish yearly, but I'd be surprised at this stage that it's more than a handful.
from zimfarm.
@Popolechien OK, then I do that already.
from zimfarm.
This will be handled outside the zimfarm project. See kiwix/container-images#30
from zimfarm.