fedora-infra / mirrormanager2

Rewrite of the MirrorManager application in Flask and SQLAlchemy

License: GNU General Public License v2.0

Python 86.20% Shell 0.73% HTML 7.91% Perl 3.74% Mako 0.06% CSS 1.36%

mirrormanager2's Introduction

Fedora MirrorManager

MirrorManager2 is a rewrite of mirrormanager using flask and SQLAlchemy.

MirrorManager is the application that keeps track of the nearly 400 public mirrors, and over 300 private mirrors, that carry Fedora, EPEL, and RHEL content, and is used by rpmfusion.org, a third party repository. It automatically selects the "best" mirror for a given user based on a set of fallback heuristics.

The complete MirrorManager functionality requires generate-mirrorlist-cache and mirrorlist-server which can be found at https://github.com/adrianreber/mirrorlist-server.

Mailing list for announcements and discussions: https://lists.fedoraproject.org/archives/list/[email protected]/

Hacking

Using Tinystage

MirrorManager2 authenticates using OpenID Connect. For this, it requires an OIDC provider, and the tiny-stage environment provides that.

Download tiny-stage from GitHub with:

$ git clone https://github.com/fedora-infra/tiny-stage
$ cd tiny-stage

Now install Ansible, Vagrant, and the vagrant-libvirt and vagrant-sshfs plugins from the official Fedora repos, and start up tiny-stage:

$ sudo dnf install ansible vagrant vagrant-libvirt vagrant-sshfs
$ vagrant up ipa auth

It takes a bit of time, but tiny-stage will now be installed, with dummy users and groups.

Hacking with Vagrant

Getting started hacking on mirrormanager2 is simple with the Vagrant setup that is included in the repo.

From within the main directory (the one with the Vagrantfile in it) of your git checkout of mirrormanager2, run the vagrant up command to provision your dev environment:

$ vagrant up

When this command has completed (it may take a while), you will be able to run the following command to start the mirrormanager server:

$ vagrant ssh -c "sudo systemctl restart mirrormanager2"

Once that is running, simply go to https://mirrormanager2.tinystage.test/ in your browser on your host to see your running mirrormanager test instance.

Manual Setup

Here are some preliminary instructions about how to stand up your own instance of mirrormanager2. All required packages for MirrorManager2 are part of Fedora or RHEL/CentOS/EPEL. In the following example we will, however, use a virtualenv and a sqlite database, and we will install our dependencies from the Python Package Index (PyPI).

Note: this setup still needs tiny-stage running.

First, install development dependencies:

$ sudo dnf install poetry tox

Next, install MirrorManager's dependencies:

$ poetry install

Tinystage uses a self-signed certificate, which needs to be added to the known certificates. After that, register the application with the OIDC provider:

$ curl -k https://ipsilon.tinystage.test/ca.crt >> $(poetry run python -m certifi)
$ poetry run oidc-register https://ipsilon.tinystage.test/idp/openidc/ http://localhost:5000/authorize

You should then create your own sqlite database for your development instance of mirrormanager2:

$ poetry run ./createdb.py

If all goes well, you can start a development instance of the server by running:

$ poetry run ./runserver.py

Open your browser and visit http://localhost:5000 to check it out.

Once you have made your changes, please run the test suite to verify that nothing covered by tests has been broken:

$ tox

mirrormanager2's Issues

SAWarning: DELETE statement on table 'host_category_dir' expected to delete 1 row(s); 0 were matched.

Seen on mm-frontend01 error_log:

[:error] [pid 22412] /usr/lib64/python2.7/site-packages/sqlalchemy/orm/persistence.py:117: SAWarning: DELETE statement on table 'host_category_dir' expected to delete 1 row(s); 0 were matched.  Please set confirm_deleted_rows=False within the mapper configuration to prevent this warning.
[:error] [pid 22412]   cached_connections, mapper, table, delete)

Offer the possibility to gracefully stop the crawler

Even with all the existing timeouts in the crawler there are situations where the crawler just hangs for some unknown reason. For hours. Right now the only option is to kill the crawler and interrupt all open network connections to the mirrors and the database.

It would be good if the crawler could be shut down gracefully with a signal (or something similar) which would then end all open crawling threads and update the database accordingly.
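
One way such a graceful stop could look is sketched below. This is only an illustration, not the crawler's actual code: the names stop_requested, request_stop, crawl_worker and install_stop_handler are hypothetical, and the sleep stands in for the real work of crawling one directory. The idea is that a signal handler only sets a flag, and each worker thread checks the flag between units of work so it can finish cleanly and record its state.

```python
import signal
import threading
import time

# Hypothetical names, for illustration only.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: ask all workers to finish instead of killing them."""
    stop_requested.set()


def install_stop_handler():
    """Install the handler; must be called from the main thread."""
    signal.signal(signal.SIGTERM, request_stop)


def crawl_worker(directories, results):
    """Process directories until done or until a stop has been requested."""
    for directory in directories:
        if stop_requested.is_set():
            # A real crawler would update the database here before returning.
            results.append(("aborted", directory))
            return
        time.sleep(0.01)  # stand-in for crawling one directory
        results.append(("crawled", directory))
```

A worker that is stuck inside a blocking network call would still need its own timeout; the flag only helps at the points where the worker checks it.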

Handle OSError: [Errno 12] Cannot allocate memory

In situations where not much memory is available, the crawler cannot spawn rsync processes to crawl a certain mirror. Currently this fails like this:

WARNING - Failed to run rsync.
Traceback (most recent call last):
  File "/usr/bin/mm2_crawler", line 704, in try_per_category
    result, listing = run_rsync(url, params, logger)
  File "/usr/lib/python2.7/site-packages/mirrormanager2/lib/sync.py", line 46, in run_rsync
    bufsize=-1
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1205, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

This is not handled at all and needs proper handling, logging and reporting.
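
A minimal sketch of what such handling might look like follows. It assumes a hypothetical helper run_command (not the real run_rsync in mirrormanager2/lib/sync.py) and simply catches the OSError raised when the child process cannot be spawned, logs it, and returns a value the caller can test for.

```python
import errno
import logging
import subprocess

logger = logging.getLogger("crawler")


def run_command(cmd):
    """Run cmd and return (returncode, output).

    Returns (None, None) if the process could not be spawned at all.
    run_command is a hypothetical helper used for illustration.
    """
    try:
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
        )
    except OSError as err:
        if err.errno == errno.ENOMEM:
            logger.error("Not enough memory to spawn %s", cmd[0])
        else:
            logger.error("Failed to spawn %s: %s", cmd[0], err)
        return None, None
    output, _ = proc.communicate()
    return proc.returncode, output
```

The caller could then report the host as "not crawled" rather than as failed, so that a temporary memory shortage on the crawler does not mark a healthy mirror as broken.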

AttributeError: 'list' object has no attribute 'startswith'

In staging:

Traceback (most recent call last):
File "/usr/share/mirrormanager2/mirrorlist_client.wsgi", line 156, in application
results = keep_only_http_results(results)
File "/usr/share/mirrormanager2/mirrorlist_client.wsgi", line 129, in keep_only_http_results
if url.startswith(u'http'):
AttributeError: 'list' object has no attribute 'startswith'

This request triggers the error: https://mirrors.stg.fedoraproject.org/mirrorlist?path=pub/fedora/linux/&redirect=1

Everything using redirect=1 has been broken since 0.7.2 was installed in staging.

Add preferred netblock filter for hosts

In the original mirrormanager it was not possible to add netblocks larger than /16 to a host. This filter does not seem to exist anymore. Right now it seems possible to add RFC1918 networks (which does not make much sense in most cases, except for Fedora internal infrastructure systems) and even 0.0.0.0/0.

Looking at the database we have lots of private networks added as preferred netblocks to different hosts.
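
Such a filter could be sketched with Python's standard ipaddress module, as below. The function name netblock_problem and the allow_private switch are hypothetical; the /16 limit is the one mentioned above and is applied to IPv4 only, since the appropriate limit for IPv6 is not stated here.

```python
import ipaddress

MIN_PREFIX_IPV4 = 16  # netblocks larger than /16 are rejected


def netblock_problem(netblock, allow_private=False):
    """Return a reason string if the netblock should be rejected, else None.

    Hypothetical helper for illustration; not part of MirrorManager2.
    """
    try:
        network = ipaddress.ip_network(netblock, strict=False)
    except ValueError as err:
        return "not a valid netblock: %s" % err
    if network.version == 4 and network.prefixlen < MIN_PREFIX_IPV4:
        return "netblock larger than /%d" % MIN_PREFIX_IPV4
    if network.is_private and not allow_private:
        return "private netblock"
    return None
```

Note that is_private covers more than the RFC1918 ranges (for example loopback and documentation networks), which is probably what is wanted for a public mirror. Admins of internal infrastructure hosts could be allowed through with allow_private=True.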

Bad Request 7860

Saw several (but not a large number, around 5 per mirrorlist server) today:

Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # Bad Request 7860
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # {'repo': u'epel-6', 'IP': IP('103.7.56.6'), 'client_ip': u'103.7.56.6', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: Traceback (most recent call last):
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: r = do_mirrorlist(d)
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: s = hcurl_cache[hcurl_id]
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: KeyError: 7860
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: # Bad Request 7860
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: # {'repo': u'epel-6', 'IP': IP('66.85.22.1'), 'client_ip': u'66.85.22.1', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: Traceback (most recent call last):
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: r = do_mirrorlist(d)
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: s = hcurl_cache[hcurl_id]
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: KeyError: 7860
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # Bad Request 7860
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # {'repo': u'epel-6', 'IP': IP('66.85.22.1'), 'client_ip': u'66.85.22.1', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: Traceback (most recent call last):
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: r = do_mirrorlist(d)
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: s = hcurl_cache[hcurl_id]
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: KeyError: 7860
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # Bad Request 7860
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # {'repo': u'epel-5', 'IP': IP('192.245.195.4'), 'client_ip': u'192.245.195.4', 'metalink': False, 'arch': u'x86_64'}
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: Traceback (most recent call last):
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: r = do_mirrorlist(d)
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: s = hcurl_cache[hcurl_id]
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: KeyError: 7860
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # Bad Request 7860
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # {'repo': u'epel-5', 'IP': IP('199.30.133.99'), 'client_ip': u'199.30.133.99', 'metalink': False, 'arch': u'x86_64'}
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: Traceback (most recent call last):
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: r = do_mirrorlist(d)
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: s = hcurl_cache[hcurl_id]
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: KeyError: 7860

Document exactly what is expected from report_mirror

I have been working on a tool I call quick-fedora-mirror: https://pagure.io/quick-fedora-mirror

When it runs, it has a pretty complete picture of the contents of the local mirror already stored away and could easily send those to mirrormanager. Except that I'm not sure exactly what mirrormanager will take, besides a base64 encoded bzip2 compressed version of some data structure.

Could you tell me what it's supposed to look like? It would be really great if I could generate it without using python (because I'm trying to minimize client dependencies).

Graph mirror propagation in collectd

Check out @adrianreber's cool graph here: https://adrian.fedorapeople.org/repomd-propagation.pdf

We should collect those statistics over time and graph them in collectd.

HTTP crawl skips most files

@pypingou your commit ced912a broke HTTP crawling. Most directories are now skipped. I have seen this now on multiple hosts but was only able to track it down today. This needs to be reverted and the problem which this tried to fix needs to be fixed in some other way.

An up to date host which only provides HTTP for scanning has only the following directories listed as up to date: only repodata directories, and no directories with actual content.

4/i386/debug/repodata
4/i386/repodata
4/ppc/debug/repodata
4/ppc/repodata
4/SRPMS/repodata
4/x86_64/debug/repodata
4/x86_64/repodata
5/i386/debug/repodata
5/i386/repodata
5/ppc/debug/repodata
5/ppc/repodata
5/SRPMS/repodata
5/x86_64/debug/repodata
5/x86_64/repodata
6/i386/debug/repodata
6/i386/repodata
6/ppc64/debug/repodata
6/ppc64/repodata
6/SRPMS/repodata
6/x86_64/debug/repodata
6/x86_64/repodata
7/ppc64/debug/repodata
7/ppc64/repodata
7/SRPMS/repodata
7/x86_64/debug/repodata
7/x86_64/repodata
testing/4/i386/debug/repodata
testing/4/i386/repodata
testing/4/ppc/debug/repodata
testing/4/ppc/repodata
testing/4/SRPMS/repodata
testing/4/x86_64/debug/repodata
testing/4/x86_64/repodata
testing/5/i386/debug/repodata
testing/5/i386/repodata
testing/5/ppc/repodata
testing/5/SRPMS/repodata
testing/5/x86_64/debug/repodata
testing/5/x86_64/repodata
testing/6/i386/debug/repodata
testing/6/i386/repodata
testing/6/ppc64/debug/repodata
testing/6/ppc64/repodata
testing/6/SRPMS/repodata
testing/6/x86_64/debug/repodata
testing/6/x86_64/repodata
testing/7/ppc64/debug/repodata
testing/7/x86_64/debug/repodata
testing/7/x86_64/repodata

Introduce timeout check during run_rsync()

Currently the crawler timeout is ignored when crawling via rsync. The possibility to specify an rsync timeout value on the command line exists and is used, but the rsync timeout option has a different meaning:

 --timeout=TIMEOUT
              This option allows you to set a maximum I/O timeout in seconds.
              If no data is transferred for the specified time then rsync will exit.
              The default is 0, which means no timeout.

The crawling, however, should stop with an error if it did not finish during the specified timeout like it does with HTTP or FTP.
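
An overall deadline, as opposed to rsync's I/O timeout, could be enforced from the calling side. The sketch below is an assumption about how that might be done, not the crawler's code: run_with_deadline is a hypothetical helper, and it relies on the timeout argument of Popen.communicate(), which requires Python 3.3 or newer.

```python
import subprocess


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd, killing it if it has not finished after deadline_seconds.

    Returns (returncode, output); returncode is None if the deadline was hit.
    Hypothetical helper for illustration.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    try:
        output, _ = proc.communicate(timeout=deadline_seconds)
    except subprocess.TimeoutExpired:
        proc.kill()
        # Reap the killed process and collect whatever output it produced.
        output, _ = proc.communicate()
        return None, output
    return proc.returncode, output
```

rsync's own --timeout option can still be passed in cmd to catch stalled transfers early; the deadline here only bounds the total run time.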

See #53 and #55

Add a way to specify you want only https urls from metalink

Some folks want all their traffic to use SSL. We should offer an option in the metalink URL that makes it return only mirrors offering https, something like ?method=https in the URL.

Along with this we might consider mailing mirror admins and asking if they would update to https where they have https available.

Once enough mirrors offered https we could make it the default perhaps.
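
The filtering itself is simple. The sketch below uses a hypothetical function filter_by_protocol and example URLs; how the requested protocol would reach the mirrorlist code (for instance via the proposed method=https parameter) is left open.

```python
from urllib.parse import urlsplit


def filter_by_protocol(urls, allowed=("https",)):
    """Keep only URLs whose scheme is in allowed, preserving their order.

    Hypothetical helper for illustration.
    """
    allowed = {scheme.lower() for scheme in allowed}
    return [url for url in urls if urlsplit(url).scheme.lower() in allowed]
```

Calling it with allowed=("http", "https") would give the behaviour asked for in the separate request to drop ftp:// URLs from metalinks.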

Add a script to check all the metalink urls

We had some metalink doom over the last couple days and a couple things were painful:

Rebuilding the umdl took forever (6-8 hours?). The script in #93 was added to try and make that much faster when we know what's wrong.
Checking that all the metalinks actually work after we rebuilt and synced out the pickle. We have to manually ssh around to f22, f21, f20, epel, etc, boxes and try to 'yum update' for both updates and updates-testing. It would be excellent if we had a script that just tried to download the metalink and validate it (somehow). We could then easily loop over all the metalinks and verify them from one box. We could do this automatically from nagios.
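
A basic validation step for such a script might look like the sketch below. It only checks that the document is well-formed XML and lists at least one mirror URL; the function name metalink_urls is hypothetical, and what else counts as "valid" (checksums, timestamps) is not decided here.

```python
import xml.etree.ElementTree as ET


def metalink_urls(xml_text):
    """Return the mirror URLs found in a metalink document.

    Raises ValueError if the document is not well-formed XML or if it
    lists no URLs. Hypothetical helper for illustration.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        raise ValueError("metalink is not well-formed XML: %s" % err)
    urls = [
        element.text.strip()
        for element in root.iter()
        # Match <url> elements regardless of the XML namespace in use.
        if element.tag.rsplit("}", 1)[-1] == "url" and element.text
    ]
    if not urls:
        raise ValueError("metalink contains no mirror URLs")
    return urls
```

A wrapper could fetch each metalink URL (for example with urllib.request.urlopen), pass the body to this function, and exit non-zero on the first ValueError, which would make it easy to run from nagios.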

Simpler checkin endpoint(s)

I would like to be able to do a checkin from a shell script. I know the current xmlrpc checkin endpoint is there for compatibility with existing report_mirror checkins, but it would be nice to have something that's a bit simpler to use.

I propose to add two API endpoints to api.py: checkin-json and checkin-text. These will take json or a text file and parse that into a structure suitable for passing to read_host_config. Compressed endpoints could be added, too. The json should be trivial to generate from the regular report_mirror data structure that gets pickled. The text file should be easy to generate from a shell script.

Given the fact that you really shouldn't unpickle untrusted data, I'd figure that this would be something you wouldn't mind adding. I think that doing this should be pretty easy for me, but I'm not sure how I'm going to test it. Any hints as to how you'd want to see this done would be appreciated, of course.

Confirm that the mirror manager badge still works with mm2

There's a badge for being a mirror admin awarded by this script. We should confirm that it still works against mm2 and if not, then fix it.

What should report_mirror check?

Currently report_mirror only transfers the list of directories on the mirror. It does not look at the content at all. This has been a problem multiple times in the past. Mirrors which are out of sync, for whatever reason, are marked as not being up to date by the crawler. The mirror, however, runs report_mirror, which reports that all directories are on the mirror, and the mirror is then marked as being up to date. So the state of the mirror keeps flipping, depending on what ran last: crawler or report_mirror.
This leads to mirrors being offered to users which are out of date but only for a few hours per day and that makes it difficult to debug problem reports from users.

For private mirrors, however, report_mirror is kind of required. It is the only information we have from a mirror which cannot be crawled and a broken or out of sync mirror is not that problematic as only a limited number of users will be hitting that mirror.

Possible solutions:

Only allow report_mirror for private mirrors. That would be a pity, as public mirrors could then no longer influence their status, which is currently a good thing because we know faster when a mirror is up to date again.
Include information about the files report_mirror found in the directories (maybe only repomd.xml checksums). This would increase I/O on the mirror moderately to dramatically and thus reduce the acceptance of report_mirror.

I am filing this issue mainly as a means to document the current behaviour, and maybe someone has a good idea how this can be solved.

Consider support for distributing docker images in conjunction with Pulp

In particular in conjunction with Pulp crane. @maxamillion and fedora releng are the stakeholders here.

I believe the way the production is going to work is that:

someone will kick off a container build with fedpkg
that will call koji
which will in turn call a container build plugin.
which will in turn call OSBS to build the container
OSBS will then contact Pulp Crane to tell it about the image (and upload it)

... that's all I know at this point. People could then list the images in pulp and download what they want.

A missing piece of the puzzle is that we would like to somehow leverage our mirror infrastructure to distribute those images. It would be nice to have pulp crane redirect download requests to the right place.

Hide certain fields for non-admins

MM1 did not display all possible fields to non-admin users. At least site->admin_active and host->admin_active, as well as Peer ASNs for example, should be neither visible nor changeable. The complete list of items to hide from non-admin users has to be looked up in the MM1 source code.

Description (help_text) for Host/Site entries missing

MM1 used to have detailed descriptions of what each field the mirror admin has to fill out means.

This has been lost in the MM2 transition as this information has been stored in

https://git.fedorahosted.org/cgit/mirrormanager.git/tree/server/mirrormanager/controllers.py

(see help_text). These descriptions are missing at multiple places.

This has led to mirror admins entering country names instead of "2-letter ISO country code", which makes the mirrors unusable. I have also seen that some of the text entries had validators which are missing from MM2.

Heading missing in mirror overview

There is currently no heading in the mirror overview at:

https://admin.fedoraproject.org/mirrormanager/mirrors

Especially the Yes/No for "Internet2" enabled is hard to guess.

Remove support for pickle checkins in a few weeks

Follow up from PR #173.
We want to disable checkin via pickle once we have given enough time for admins to move to json encoding.

New hosts/sites have admin_active disabled

New sites/hosts are created with admin_active disabled. @pypingou, as you have authored 764ac68 maybe you could check how to have admin_active enabled by default.

Move create fedora-install-N functionality to umdl

As discussed in PR #58, the functionality provided by the script mm2_create_install_repo could be integrated into umdl, and thus these repositories could be created automagically.

Create unit tests for the mirrorlist server

To verify that changes to the mirrorlist code do not break anything, unit tests for the mirrorlist server are needed.

The following could be implemented:

Generate a minimal pkl
Start the mirrorlist server
Query mirrorlist/metalink with all possible options

Also see #179

Provide a URL to check for basic functionality

In environments where MM2 is running behind proxies it would be good to provide a URL which makes a simple DB connection and then returns. The MM2 start page (as opposed to the MM1 start page) does a relatively complex database query which can lead to a high memory and CPU consumption just by multiple proxies checking the availability.

There was the proposal to provide a URL like /ping which does exactly this and which then just returns OK.
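
What such an endpoint does can be sketched as a plain WSGI application, independent of any web framework. Everything here is an assumption made for illustration: ping_app is a hypothetical name, and an in-memory sqlite database stands in for the real MM2 database connection.

```python
import sqlite3

DB_PATH = ":memory:"  # assumption: stand-in for the real database


def ping_app(environ, start_response):
    """Minimal WSGI app: /ping runs 'SELECT 1' and answers OK."""
    headers = [("Content-Type", "text/plain")]
    if environ.get("PATH_INFO") != "/ping":
        start_response("404 Not Found", headers)
        return [b"Not Found"]
    try:
        connection = sqlite3.connect(DB_PATH)
        try:
            # The cheapest possible query that still touches the database.
            connection.execute("SELECT 1").fetchone()
        finally:
            connection.close()
    except sqlite3.Error:
        start_response("503 Service Unavailable", headers)
        return [b"DB unavailable"]
    start_response("200 OK", headers)
    return [b"OK"]
```

In MM2 itself the same idea would be a small Flask view that runs a trivial query through the existing database session; the point is only that the health check avoids the expensive start page query.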

logs kept forever and outside logrotate

Some recent logging changes have resulted in mirrormanager keeping logs in /var/log/mirrormanager/, but it never seems to expire or remove them. They are not controlled by logrotate; they seem to rotate every day at 00:00 UTC outside logrotate.

So, how many of these logs are useful to keep? Should it be configurable?

Should mirrormanager let logrotate handle them instead of doing it internally itself?

Should logs be compressed to save space?

Note that in Fedora ansible I added a logrotate configuration for these before I realized they are controlled directly by mirrormanager. We will want to adjust that based on what we do here.

Add possibility to filter mirrorlist by product

MM1 had the ability to filter the mirrors by product. It was possible to show all EPEL mirrors regardless of version and/or architecture. MM2 does not have this possibility. This breaks some external links.

http://mirrors.fedoraproject.org/publiclist/EPEL/

http://mirrors.fedoraproject.org/publiclist/Fedora/

Admin_active was disabled on a newly created host

I have seen that on a newly created host (by a non admin user) the admin_active flag was not set. The site was behaving correctly but not the host.

Remove trailing slashes from URLs

MM2 basically requires that all URLs do not have a trailing slash. There are some places in the code which check whether there is a trailing slash, but at least the metalink generation does not verify the URL, which led to:

https://fedorahosted.org/fedora-infrastructure/ticket/4881

If all trailing slashes were removed when a URL is added, this should be resolved.
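
The normalization could be as small as the following sketch, applied at the point where a URL is stored. The function name normalize_url is hypothetical.

```python
def normalize_url(url):
    """Strip surrounding whitespace and all trailing slashes from a URL.

    Hypothetical helper for illustration.
    """
    return url.strip().rstrip("/")
```

Existing rows would still need a one-time cleanup, since this only affects URLs added after the change.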

crawler: KeyError: 'unreadable'

ERROR:crawler:Hosts(43/144):Threads(30/30):1763:mirror.nonstop.co.il:Failure in thread '594d454', host <Host(1763 - mirror.nonstop.co.il)>
Traceback (most recent call last):
File "/usr/bin/mm2_crawler", line 1474, in worker
rc = per_host(session, host.id, options, config)
File "/usr/bin/mm2_crawler", line 1397, in per_host
sync_hcds(session, host, host_category_dirs, options.repodata)
File "/usr/bin/mm2_crawler", line 734, in sync_hcds
stats['unreadable'] += 1
KeyError: 'unreadable'
INFO:crawler:Hosts(43/144):Threads(30/30):1763:mirror.nonstop.co.il:Ending crawl of <Host(1763 - mirror.nonstop.co.il)> with status 3
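
The traceback suggests that stats is a plain dictionary that was created without an 'unreadable' key. Assuming that is the case (the initialization is not shown here), one possible fix is to use a collections.Counter, which treats missing keys as zero:

```python
from collections import Counter

# Assumption: stats is only used for counting events per category.
stats = Counter()

# No KeyError, even though 'unreadable' was never initialized.
stats["unreadable"] += 1
```

The alternative is to add 'unreadable' to whatever dictionary literal currently initializes stats.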

Sort repomd.xml by time in metalink

It would help consumers of our metalink if the repomd.xml's are sorted by time, with the newest one at the top and the alternates being older in turn.

See discussion in https://fedorahosted.org/fedora-infrastructure/ticket/4866
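
The ordering step could look like the sketch below. The representation of a repomd.xml entry as a dictionary with 'timestamp' and 'sha256' keys is an assumption made for the example, as is the function name sort_newest_first.

```python
def sort_newest_first(repomd_entries):
    """Return repomd entries sorted by timestamp, newest first.

    Each entry is assumed to be a dict with a numeric 'timestamp' key.
    """
    return sorted(
        repomd_entries, key=lambda entry: entry["timestamp"], reverse=True
    )
```

With this ordering the current repomd.xml comes first and the alternates follow from newest to oldest.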

Drop ftp:// urls from metalinks

ftp causes issues with many firewalls and is in general a horrible protocol. We should stop offering ftp URLs in metalinks.

We might want to check/contact any mirrors that have only ftp urls and ask them to fix it or update to add a http{s} url.

rsyncFilter is missing from MM2

MM1 had an interface called rsyncFilter which MM2 no longer provides. A small number of people still seem to use it (according to the httpd logs).

See original README for details.

Hide always_up_to_date from non-admins

The field always_up_to_date should only be accessible by mirrormanager admins in the host category.

Only admins should be able to change "Always up to date"

The checkbox "Always up to date" in the "host category" should only be available for admins so that it is not accidentally selected like:

https://fedorahosted.org/fedora-infrastructure/ticket/5454

Make umdl logging better

Right now, umdl spits out:

06/22/2015 05:00:03 AM Starting umdl
06/22/2015 05:00:03 AM has changed: 1434873263 != 1434948558
06/22/2015 05:02:46 AM atomic/21 has changed: 1434836123 != 1434947584
06/22/2015 05:02:47 AM atomic/21/objects/01 has changed: 1434806893 != 1434947584
...
06/22/2015 05:08:54 AM development/rawhide/armhfp/os/Packages/k has changed: 1434720949 != 1434949693
06/22/2015 05:08:56 AM development/rawhide/armhfp/os/Packages/l has changed: 1434720949 != 1434949712
06/22/2015 05:36:26 AM %s: directory already has a repository
06/22/2015 05:36:26 AM %s: directory already has a repository
06/22/2015 10:46:16 AM Ending umdl

But it says "has changed" for every directory and it's not very useful.

It would be nice if we could add:

Anytime it commits to the db to update something say that in the log and what it updated. This would allow us to know when regenerating the pkl would update metalinks.
Perhaps some kind of checkpoint thing, so it logs every 15min what it's doing. Right now there's many hour sections of the logs where it's not clear what it's doing at all.
Weed out these useless 'has changed' for directories that always change.

check metalink alternate repomd code

We need to confirm that the code around handling alternate repomd.xml's is working as expected/desired and that we have the right timeouts on it.

The idea as I understand it is when repomd.xml changes for a repo, we also keep the old repomd.xml as an alternate. Then, after N days we drop that alternate. This allows people to get updates from mirrors with the previous repodata when master has just changed.

However, we have hit situations where we haven't had updates pushes in a while for various reasons (a week or so) and mm seems to drop the current repomd.xml and keep the old/outdated/no longer used alternative one as the only one.

We need to make sure we are dropping the old one, and we need to adjust so that if there's no updates pushes for a while we don't break anything.

crawler: handle atomic tree

With almost 400000 files in the atomic tree, it does not make much sense to crawl this part in the same way as the rest of the files. The crawler needs some intelligent way to detect whether the atomic tree is up to date.

Redirected to mirror without requested file

I encountered this on 2016-05-16:

Go to https://getfedora.org/en/workstation/prerelease/
Click download button pointing to
https://download.fedoraproject.org/pub/fedora/linux/releases/test/24_Beta/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso
Get redirected to
http://mirror.vutbr.cz/fedora/releases/test/24_Beta/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso
See 404 Not Found

I tried again a couple times and ultimately it sent me to a different mirror which worked fine.

http://mirror.karneval.cz/pub/linux/fedora/linux/releases/test/24_Beta/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-24_Beta-1.6.iso

Originally I opened the issue in Fedora Infrastructure trac and was sent here.

Text for "Master rsync server Access Control List IPs" misleading

Each host has the possibility to define "Master rsync server Access Control List IPs", which has the following description:

These host DNS names and/or IP addresses will be allowed to rsync from the master rsync/ftp servers. List here all the machines that you use for pulling.

This is not true. Access to the master mirror in Fedora is not granted automatically. Maybe we should hide this part for Fedora's MirrorManager installation. There used to be a way for mirrors to query all hosts listed in this field via /rsync_acls/ to populate the "hosts allow" field of the local rsync daemon. This does not exist in MirrorManager2 any more as it never had many users.

So we could remove this functionality at least from the Fedora templates.

Site-to-Site

There is a table with site-to-site entries in MM2 which is not used at all any more.

The only 'benefit' it has right now, is that some sites cannot be deleted from the database as they are listed in the site-to-site table. Especially old mirrors have entries in that table. Just like private inter-mirror-sync URLs this is a concept which is not used/followed in MM2 at all. The table as well as all code around it could be removed.

Mirrors with only a HTTPS URL cannot be crawled

The function method_pref() in the crawler cannot correctly handle mirrors which only provide a HTTPS URL. The simplest fix would probably be to remove the ':' in u.startswith('http:').
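
The reason the colon matters is that 'https://...' does not start with 'http:'. The sketch below shows the check without the colon. It is not the real method_pref(), whose body is not shown here; first_http_url is a hypothetical helper that only illustrates the prefix test.

```python
def first_http_url(urls):
    """Return the first http:// or https:// URL from urls, or None.

    Hypothetical helper for illustration.
    """
    for u in urls:
        # 'http' without the colon matches both 'http://' and 'https://'.
        if u.startswith("http"):
            return u
    return None
```

A stricter variant would test u.startswith(('http://', 'https://')), which avoids matching unrelated schemes that merely begin with "http".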

Description (help_text) for add host category URL is missing

Ticket https://fedorahosted.org/fedora-infrastructure/ticket/5016 made clear that the old MM1 help text is also required for the form 'New Host Category URL'. The private attribute on this form has no relation to the notion of a private mirror; the idea behind it seems to be private URLs which can be used between mirrors to sync content.

It could be removed as the concept of private inter-mirror URLs is not used at all.

mm2_get_global_netblocks has syntax errors

From the monthly crons on mm-backend01:

Subject: Cron mirrormanager@mm-backend01 cd /usr/share/mirrormanager2 && /usr/bin/mm2_get_global_netblocks /var/lib/mirrormanager/global_netblocks.txt
Date: Sun, 1 May 2016 00:48:04 +0000 (UTC)

basename: missing operand
Try 'basename --help' for more information.
/tmp/get_global_netblocks.HkpbiVzv/: Is a directory
basename: missing operand
Try 'basename --help' for more information.
bzcat: Input file /tmp/get_global_netblocks.HkpbiVzv/ is a directory.
short file (empty packet) at zebra-dump-parser/zebra-dump-parser.pl line 98.

Add host to xmlrpc logging

The logging introduced in #175 should also log which host submitted the data via report_mirror.

Offer a mirrorlist interface to query the time the data was generated

The mirrorlist servers (in Fedora) receive newly generated data every hour, and the process reading the data is restarted (kill -1). This works most of the time, but not always. Sometimes the mirrorlist server process keeps running but keeps serving the old data, and it seems further signals (kill -1) do not help to get the mirrorlist server process to read the new data.

If the time of the data generation is stored in the pickle used by the mirrorlist servers a new interface could be provided to query this timestamp which could be used for better monitoring.

Host Category URLs were not correctly removed from the database

See https://fedorahosted.org/fedora-infrastructure/ticket/4850

Changes to the Host Categories and corresponding URLs left MM in a state where the URLs were removed from the Host but still present in the database. Thus the same URL cannot be added again anywhere in MM.

metalink vs website differences

$ curl -sk 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f20&arch=x86_64' |grep -c http
7

One of these results is a header line. That means https://admin.fedoraproject.org/mirrormanager/mirrors/Fedora/20/x86_64 should show 6 entries.

Instead, it shows 151 active mirrors. Selecting one randomly, http://mirror.pnl.gov/fedora/linux/updates/20/ shows that this mirror does not contain f20 updates.

These two lists should have the same content.

Other, active mirrors, like http://mirror.cc.vt.edu/pub/fedora-archive/fedora/linux/updates/20/x86_64/ do not show up at all.

Adjust for pungi4 repo layout changes

We are now using pungi4 to create development/24 and development/rawhide composes.

The layout has changed up as these are full composes now with all the images and metadata and such in them.

UMDL seems to have picked up on 24, but it did the wrong thing.

https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-24&arch=x86_64&country=global is currently pointing people to the Workstation repo; it should instead point to the Everything repo. Rawhide is still syncing, but it will likely do the same.

We may need further adjustments, but short term s/Workstation/Everything/ should be a good start.

Change Create button to Update for existing hosts

The button to update an existing host always says 'Create', even if the host already exists. It should change to 'Update'.

Python 3 compatibility

With Fedora working hard to have Python 3 as the default, and EPEL now including python34 for EL7 (and of course, there's Software Collections for Python 3.4 for EL6 and EL7), it would be awesome if MirrorManager2 added Python 3 support.

A cursory check of the requirements files in the repository indicates that there are only four modules that don't indicate Python 3 support on PyPI:

python-fedora
python-openid
python-openid-cla
python-openid-teams

I don't know to what extent these modules are required, but perhaps a first step would be to make MirrorManager2 itself Py3 ready with the six module? If the modules above aren't mandatory to MirrorManager2's functionality, then perhaps splitting the stuff that uses that out as a subpackage to make the core available with Py3 is an option?