Kiwix Kubernetes Cluster
kiwix / operations
Home Page: http://charts.k8s.kiwix.org/
As a result, mirrors have even more to download, including things we do not actually want mirrored.
Kiwix-JS based extensions are available for download at http://download.kiwix.org/release/browsers/
At least some of them cannot be installed directly by users, who should go through the platform's app store instead.
We should offer a README in this folder/listing (probably one per browser) explaining what to expect, with pointers to at least the store link and maybe the repository.
@mossroy @Jaifroid what do you think? If that works for you, just share the texts here and I'll install them.
https://library.kiwix.org/?lang=&q=Mathoverflow still delivers a 2021 version although we published a new version yesterday.
Currently, during maintenance, the service is simply unavailable for a few minutes or longer. This leads to:
It would be better to have a proper system that lets us put a service in maintenance mode on demand, avoiding the two bad consequences listed above.
Running the update_mirrorbrain_db.sh script is very slow.
Some of its log is confusing to me:
2022-03-31T16:54:42.152549791+02:00 Scanning mirror 'dotsrc.org' at zim
2022-03-31T17:10:17.248399094+02:00 Thu Mar 31 14:55:14 2022 dotsrc.org: starting
2022-03-31T17:10:17.248434727+02:00 Thu Mar 31 14:55:16 2022 dotsrc.org: files in 'zim' before scan: 7957
2022-03-31T17:10:17.248443037+02:00 Thu Mar 31 15:10:16 2022 dotsrc.org: scanned 7957 files (8/s) in 900s
2022-03-31T17:10:17.248449878+02:00 Thu Mar 31 15:10:16 2022 dotsrc.org: files to be purged: 0
2022-03-31T17:10:17.248457167+02:00 Thu Mar 31 15:10:16 2022 dotsrc.org: total files in 'zim' after scan: 7957 (delta: 0)
2022-03-31T17:10:17.248463516+02:00 Thu Mar 31 15:10:16 2022 dotsrc.org: purged old files in 0s.
2022-03-31T17:10:17.248469968+02:00 Thu Mar 31 15:10:17 2022 dotsrc.org: done.
2022-03-31T17:10:17.252725480+02:00 Completed in 15.3 minutes
2022-03-31T17:10:18.447814836+02:00 Scanning mirror 'dotsrc.org' at zim/wikivoyage
2022-03-31T17:10:58.248454610+02:00 Thu Mar 31 15:10:50 2022 dotsrc.org: starting
2022-03-31T17:10:58.248491578+02:00 Thu Mar 31 15:10:55 2022 dotsrc.org: files in 'zim/wikivoyage' before scan: 90
2022-03-31T17:10:58.248538672+02:00 Thu Mar 31 15:10:58 2022 dotsrc.org: scanned 90 files (33/s) in 2s
2022-03-31T17:10:58.248548227+02:00 Thu Mar 31 15:10:58 2022 dotsrc.org: files to be purged: 0
2022-03-31T17:10:58.248554563+02:00 Thu Mar 31 15:10:58 2022 dotsrc.org: total files in 'zim/wikivoyage' after scan: 90 (delta: 0)
2022-03-31T17:10:58.248560832+02:00 Thu Mar 31 15:10:58 2022 dotsrc.org: purged old files in 0s.
2022-03-31T17:10:58.248566572+02:00 Thu Mar 31 15:10:58 2022 dotsrc.org: done.
2022-03-31T17:10:58.448330017+02:00 Completed in 25 seconds
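To compare scan speeds across folders, the summary lines in the log above can be parsed. This is only a sketch: it assumes the `scanned N files (R/s) in Ts` wording stays exactly as shown in this log, and `scan_rates` is a hypothetical helper name, not part of MirrorBrain.

```python
import re

# Matches summary lines like the ones in the log above, e.g.
# "dotsrc.org: scanned 7957 files (8/s) in 900s"
SCAN_RE = re.compile(
    r"(?P<mirror>\S+): scanned (?P<files>\d+) files \(\d+/s\) in (?P<secs>\d+)s"
)


def scan_rates(log_text):
    """Return (mirror, files, seconds, files_per_second) per scan summary line."""
    results = []
    for match in SCAN_RE.finditer(log_text):
        files = int(match["files"])
        secs = int(match["secs"])
        # The rate is recomputed from the whole-second duration in the log.
        rate = files / secs if secs else float("inf")
        results.append((match["mirror"], files, secs, rate))
    return results


sample = """\
Thu Mar 31 15:10:16 2022 dotsrc.org: scanned 7957 files (8/s) in 900s
Thu Mar 31 15:10:58 2022 dotsrc.org: scanned 90 files (33/s) in 2s
"""

for mirror, files, secs, rate in scan_rates(sample):
    print(f"{mirror}: {files} files in {secs}s ({rate:.1f} files/s)")
```

The recomputed rate can differ from the one printed by the scanner, since the log only reports durations rounded to whole seconds.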
zim/ folder?
zim/ then individually for each sub folder. Fortunately our tree is mostly flat so it would be done only twice for most but still.
ftp.acc.umu.se:
2022-03-31T16:35:18.965149845+02:00 Thu Mar 31 14:32:30 2022 ftp.acc.umu.se: Error 302 occured
This happens for all videos, for instance. It seems to redirect to another of their mirrors, while the listing is still served from that mirror…
curl -I https://ftp.acc.umu.se/mirror/kiwix.org/zim/videos/aimhi_en_english-website_2022-01.zim
HTTP/1.1 302 Found
Date: Thu, 31 Mar 2022 15:58:24 GMT
Server: Apache/2.4.51 (Unix)
Location: https://laotzu.ftp.acc.umu.se/mirror/kiwix.org/zim/videos/aimhi_en_english-website_2022-01.zim
Might be better to register that other server as an independent (videos-only?) mirror…
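A quick way to spot which files a mirror hands off to another host is to compare the requested host with the `Location` header. The sketch below works on an already-fetched status code and header value (no network access); `redirect_host` is a hypothetical helper name.

```python
from urllib.parse import urlsplit


def redirect_host(request_url, status, location):
    """Return the host a 3xx response points to if it differs from the
    requested host, otherwise None."""
    if not (300 <= status < 400) or not location:
        return None
    requested = urlsplit(request_url).hostname
    target = urlsplit(location).hostname
    # A relative Location has no hostname and stays on the same server.
    if target is None or target == requested:
        return None
    return target


# Values taken from the curl output above.
url = "https://ftp.acc.umu.se/mirror/kiwix.org/zim/videos/aimhi_en_english-website_2022-01.zim"
loc = "https://laotzu.ftp.acc.umu.se/mirror/kiwix.org/zim/videos/aimhi_en_english-website_2022-01.zim"
print(redirect_host(url, 302, loc))
```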
I strongly suspect that the stats about campusafrica are still inside.
Just for reference, once the cronjob terminates, a Warning event is emitted on the Pod that says
MountVolume.SetUp failed for volume "kube-api-access-xxxxx" : object "cluster-backup"/"kube-root-ca.crt" not registered
This is due to upstream kubernetes/kubernetes#105204
Looks like geolocation of IPs does not work anymore.
apt update && apt upgrade
k get pods -A -o wide|grep Error
k get pods -A -o wide | pyp -i 'print("\n".join([line for line in l if re.split(r"\s+", line)[4] != "0"]))'
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER|jq ".version,.upgrade_available"
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/versions|jq ".versions[].name"
Note: this is an automatic reminder intended for the assignee(s).
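For readers without `pyp`, the restart filter in the checklist above can be written as a plain Python function. This is a sketch of the same idea: it keeps every line whose fifth whitespace-separated column (RESTARTS when `-A` is used) is not `0`, which also keeps the header line. The sample output is made up for illustration, and unlike the one-liner this version skips lines with fewer than five columns instead of failing on them.

```python
import re


def pods_with_restarts(kubectl_output):
    """Keep lines of `kubectl get pods -A -o wide` output whose 5th column
    (RESTARTS) is not "0". The header line is kept as well."""
    kept = []
    for line in kubectl_output.splitlines():
        columns = re.split(r"\s+", line.strip())
        if len(columns) > 4 and columns[4] != "0":
            kept.append(line)
    return kept


# Illustrative output only, not taken from the real cluster.
sample_output = """\
NAMESPACE   NAME      READY   STATUS    RESTARTS   AGE   IP         NODE
demo        web-abc   1/1     Running   0          3d    10.0.0.1   node-a
demo        db-xyz    1/1     Running   4          3d    10.0.0.2   node-b
"""

print("\n".join(pods_with_restarts(sample_output)))
```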
With explanation on how library_zim.xml consumers should use OPDS API instead
This was done in our custom reverse proxy.
df -h / && df -h /data
apt update && apt upgrade
k get pods -A -o wide|grep Error
k get pods -A -o wide | pyp -i 'print("\n".join([line for line in l if re.split(r"\s+", line)[4] != "0"]))'
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/clusters/$KIWIX_PROD_CLUSTER|jq ".version,.upgrade_available"
curl -s -H "X-Auth-Token: $SCW_SECRET_KEY" https://api.scaleway.com/k8s/v1/regions/fr-par/versions|jq ".versions[].name"
Note: this is an automatic reminder intended for the assignee(s).
Nice to have: test kiwix-serve directly
All our node servers are IPv6 ready and were communicating fine with it.
It seems that since installing/running kubelet, IPv6 communication between those servers is not working anymore.
Might be because of the iptables rules that the node creates or the kilo network implementation.
It's not a requirement for it to work at this time but we should at least understand exactly how and why.
$ curl -I https://wiki.openzim.org
HTTP/2 302
date: Sat, 09 Apr 2022 14:08:22 GMT
content-type: text/html
content-length: 145
location: http://wiki.openzim.org/wiki/
strict-transport-security: max-age=15724800; includeSubDomains
See https://www.mediawiki.org/wiki/Manual:HTTPS#Running_a_HTTPS-only_wiki
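The response above redirects an HTTPS request to a plain `http://` URL. A small check like the following could flag such downgrades from a status line and `Location` header that were already fetched; it is a sketch, and `is_https_downgrade` is a hypothetical helper name.

```python
from urllib.parse import urljoin, urlsplit


def is_https_downgrade(request_url, location):
    """True if an HTTPS request is redirected to a plain-HTTP URL."""
    # Resolve relative Location values against the requested URL first.
    target = urljoin(request_url, location)
    return (
        urlsplit(request_url).scheme == "https"
        and urlsplit(target).scheme == "http"
    )


# Values from the curl output above.
print(is_https_downgrade("https://wiki.openzim.org", "http://wiki.openzim.org/wiki/"))
```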
In a generic way, as all projects are backed up similarly. It should include instructions for both files and databases.
Every 5 days or so, the metrics pod gets evicted.
Reason: Evicted
Message: Pod The node had condition: [DiskPressure].
The node itself has plenty (50GB+) free space though. I am not sure exactly but I believe this may be related to the large disk storage used inside the container, in non-volume-declared locations, thus not mounted.
Here's a list of some space-consuming locations, after about 1h.
--- /root --------------------------------------
7.5 GiB [##########] /.perceval
7.5 GiB [######### ] /.graal
--- /var/lib ------------------------------------
879.3 MiB [##########] /elasticsearch
122.2 MiB [# ] /mysql
--- /home ------------------------------------
719.4 MiB [##########] /grimoirelab
--- /logs -------------------------------------
99.4 MiB [##########] all.log
Only elasticsearch is mounted on the host. I believe the log is what gets out of control. This should probably involve fixing the metrics image as well.
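To put the listing above in perspective, the sizes can be summed while leaving out the one mounted location. The figures below are copied from the listing; the full paths are assembled from its section headers, and the helper names are hypothetical.

```python
# Conversion factors to MiB for the units used in the listing above.
UNITS = {"KiB": 1 / 1024, "MiB": 1.0, "GiB": 1024.0}


def to_mib(size):
    """Convert a string such as '7.5 GiB' to a number of MiB."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


# Sizes as reported after about one hour of runtime.
usage = {
    "/root/.perceval": "7.5 GiB",
    "/root/.graal": "7.5 GiB",
    "/var/lib/elasticsearch": "879.3 MiB",
    "/var/lib/mysql": "122.2 MiB",
    "/home/grimoirelab": "719.4 MiB",
    "/logs/all.log": "99.4 MiB",
}
# Only elasticsearch is mounted on the host, per the report.
mounted = {"/var/lib/elasticsearch"}


def unmounted_total_mib(usage, mounted):
    """Sum the sizes of all locations that are not backed by a volume."""
    return sum(to_mib(size) for path, size in usage.items() if path not in mounted)


total = unmounted_total_mib(usage, mounted)
print(f"{total / 1024:.1f} GiB written outside declared volumes")
```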
There are several providers in this field and I haven't invested time in comparing them but I like Grafana Loki:
$ wget http://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64.tar.gz
--2022-04-13 12:47:23-- http://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64.tar.gz
Resolving download.openzim.org (download.openzim.org)... 51.159.108.60
Connecting to download.openzim.org (download.openzim.org)|51.159.108.60|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2022-04-13 12:47:23 ERROR 404: Not Found.
kelson@camber:/tmp$ curl -I http://download.openzim.org/release/zim-tools/zim-tools_linux-x86_64.tar.gz
HTTP/1.1 404 Not Found
Date: Wed, 13 Apr 2022 10:47:29 GMT
Content-Type: text/html
Content-Length: 153
Connection: keep-alive
Can Kiwix consider helping QA Automation?
As requested by @deldesir in Haiti who wants to help modernize QA with CI scripts to help with CI integration testing using kiwix-tools, libzim and others — to surface critical issues efficiently for https://Internet-in-a-Box.org field communities and others!
That's CI in every sense (Continuous Improvement not just Continuous Integration !) allowing ongoing semi-automated smoke testing & functional testing of https://download.kiwix.org/nightly and https://download.openzim.org/nightly daily (i.e. nightly) builds.
QUESTION: Could this be solved the same way Kiwix's "release" channel solves it today, below...?
EXAMPLE: https://www.kiwix.org/en/downloads/kiwix-serve/ links to the latest release builds such as...
(So that manual work digging up version numbers like 3.3.0-1 and dates like 2022-10-04 is completely avoided — allowing community QA feedback loops to become MUCH tighter & MUCH more efficient !)
SUMMARY: The request is that Kiwix consider allowing QA Test Automation — by assisting with permalink direct links to nightly builds — so the "latest nightly" channel can become much more useful for Community QA / Test Automation 🙏
CLARIFICATION: Nightly builds break regularly and that is perfectly normal & understandable! Anybody who takes QA seriously knows this is perfectly normal on dev branches! No Worries At All
ASIDE: WordPress solves this same need for permalinks in a slightly different way, with URL Redirects from things like...
https://wordpress.org/latest.zip
Which today happens to redirect to...
https://wordpress.org/wordpress-6.0.2.zip
(i.e. just like file-level symlinks / permalinks, but redirecting to another URL showing the actual version number in the URL right away! Either solution works great [filesystem-level or URL-level permalinks] if Kiwix can hopefully consider either idea!)
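Either kind of permalink needs a rule for picking "the latest" build. The sketch below assumes, purely for illustration, that nightly builds sit in sub-directories named `YYYY-MM-DD`; that layout and the `latest_nightly` name are assumptions, not a description of how the download server is organised.

```python
import re
from datetime import date

# Hypothetical layout: one directory per night, named YYYY-MM-DD.
DATE_DIR = re.compile(r"^(\d{4})-(\d{2})-(\d{2})/?$")


def latest_nightly(entries):
    """Return the most recent YYYY-MM-DD entry name, or None if there is none."""
    dated = []
    for entry in entries:
        match = DATE_DIR.match(entry)
        if not match:
            continue
        try:
            day = date(*(int(part) for part in match.groups()))
        except ValueError:
            # Looks like a date but is not a valid one (e.g. 2022-13-40).
            continue
        dated.append((day, entry.rstrip("/")))
    return max(dated)[1] if dated else None


print(latest_nightly(["2022-10-03/", "2022-10-04/", "README"]))
```

A redirect or symlink named `latest` would then simply point at whatever this function returns.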
Based on https://github.com/kiwix/container-images/tree/master/bittorrent-tracker-docker and with both UDP+TCP
With the new library deployed, our library.xml should be generated by a new (python) script.
Due to upstream openzim/python-libzim#139, we cannot use the output of the script, as the metadata is truncated for a number of books.
We are thus now serving a non-evolving library.
In https://wiki.kiwix.org/wiki/Content_in_all_languages/de, it should be for example:
https://download.kiwix.org/zim/wikipedia_en_all_maxi.zim.torrent
in place of
https://download.kiwix.org/zim/wikipedia_en_all_maxi_2022-05.zim.torrent
AFAIK this problem impacts all links, not only the BitTorrent one
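The rewrite asked for here amounts to dropping the `_YYYY-MM` suffix that precedes `.zim`. A minimal sketch, assuming dated file names always follow the pattern of the example above (`unversioned` is a hypothetical helper name):

```python
import re

# A "_YYYY-MM" suffix immediately followed by ".zim".
DATED = re.compile(r"_\d{4}-\d{2}(?=\.zim)")


def unversioned(url):
    """Drop the date suffix so the link stays valid across releases."""
    return DATED.sub("", url, count=1)


print(unversioned("https://download.kiwix.org/zim/wikipedia_en_all_maxi_2022-05.zim.torrent"))
```

Because the pattern only anchors on `.zim`, the same function covers the direct download link as well as the `.torrent` one.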
Delete everything and remove credit card
At least the blocking system doesn't seem to work off actual users' IPs, as blocking one user's IP would block others.
It's probably a reverse proxy issue but given MW doesn't display IPs, it's not confirmed.
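If the reverse proxy is indeed the cause, the application would be seeing the proxy's address instead of the visitor's. The sketch below shows the usual way a client address is recovered from an `X-Forwarded-For` header when the list of trusted proxies is known. It is generic Python, not MediaWiki configuration; the addresses are documentation examples and `client_ip` is a hypothetical helper name.

```python
import ipaddress


def client_ip(remote_addr, forwarded_for, trusted_proxies):
    """Return the first address, walking X-Forwarded-For from the right,
    that does not belong to a trusted proxy network."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)
        return any(ip in net for net in trusted)

    # Only believe the header if the direct peer is one of our proxies.
    if not is_trusted(remote_addr):
        return remote_addr
    hops = [hop.strip() for hop in (forwarded_for or "").split(",") if hop.strip()]
    for hop in reversed(hops):
        if not is_trusted(hop):
            return hop
    return remote_addr


# Example addresses only (10.0.0.0/8 stands in for the proxy network).
print(client_ip("10.0.0.5", "203.0.113.7, 10.0.0.9", ["10.0.0.0/8"]))
```

Without such a step every visitor appears to come from the proxy, so blocking that one address would block everyone behind it, which is consistent with the behaviour reported above.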
As a consequence, the size of this directory grows without bound. Mirrors are starting to complain about it.
The image serving drives is a Node app (Surfer), which is not efficient at serving large files. We've also had download errors on very large ka-like video files.
A simple fix would be to add an nginx companion to Surfer that would serve the static files (everything outside _admin and _webdav).
All containers' logs are stored on their node's filesystem, growing forever.
This can quickly become problematic.
To implement this, we need to define how much log we want to keep.
@kelson42 what do you think?
See the logrotate man page; we'll have to configure log rotation for each container (as that's what we get: one file per container).
But only the catalogue part
Tested on https://wiki.kiwix.org with https://www.whatsmyip.org/http-compression-test/
AFAIK this is a regression in the reverse proxy.
Uptime Robot reports a daily downtime of about 12 minutes around 4am every day. It is reported as a ConnectionTimeout.
Pods were running.
It seems to match the end of the matomo-db backup cron job.
To avoid inadvertently filling the disk or saturating the network. A mirror was taking all the bandwidth... and we did not notice it.
I can't find a way to load developer.mozilla.org_en_all_2022-03.zim in library.kiwix.org. It appears to be missing, but it is available on download.kiwix.org (zimit directory).
Can it be added? It's very useful!