
mirrormanager2's Introduction

Fedora MirrorManager

MirrorManager2 is a rewrite of mirrormanager using flask and SQLAlchemy.

MirrorManager is the application that keeps track of the nearly 400 public mirrors, and over 300 private mirrors, that carry Fedora, EPEL, and RHEL content, and is used by rpmfusion.org, a third party repository. It automatically selects the "best" mirror for a given user based on a set of fallback heuristics.

The complete MirrorManager functionality requires generate-mirrorlist-cache and mirrorlist-server which can be found at https://github.com/adrianreber/mirrorlist-server.

Mailing list for announcements and discussions: https://lists.fedoraproject.org/archives/list/[email protected]/

Hacking

Using Tinystage

MirrorManager2 authenticates using OpenID Connect. For this, it requires an OIDC provider, and the tiny-stage environment provides that.

Download tiny-stage from Github with:

$ git clone https://github.com/fedora-infra/tiny-stage
$ cd tiny-stage

Now install Ansible, Vagrant, and the vagrant-libvirt and vagrant-sshfs plugins from the official Fedora repos, and start up tiny-stage:

$ sudo dnf install ansible vagrant vagrant-libvirt vagrant-sshfs
$ vagrant up ipa auth

It takes a bit of time, but tiny-stage will now be installed, with dummy users and groups.

Hacking with Vagrant

Getting started hacking on mirrormanager2 is straightforward with the Vagrant setup included in the repo.

From within the main directory of your git checkout of mirrormanager2 (the one containing the Vagrantfile), run the vagrant up command to provision your dev environment:

$ vagrant up

When this command has completed (it may take a while), you will be able to run the command to start the mirrormanager server:

$ vagrant ssh -c "sudo systemctl restart mirrormanager2"

Once that is running, simply go to https://mirrormanager2.tinystage.test/ in your browser on your host to see your running mirrormanager test instance.

Manual Setup

Here are some preliminary instructions on how to stand up your own instance of mirrormanager2. All required packages for MirrorManager2 are part of Fedora or RHEL/CentOS/EPEL. In the following example, however, we will use a virtualenv and a SQLite database, and we will install our dependencies from the Python Package Index (PyPI).

Note: this setup still needs tiny-stage running.

First, install development dependencies:

$ sudo dnf install poetry tox

Next, install MirrorManager's dependencies:

$ poetry install

Tinystage uses a self-signed certificate, so its CA certificate needs to be added to the known certificates before registering the application with the OIDC provider:

$ curl -k https://ipsilon.tinystage.test/ca.crt >> $(poetry run python -m certifi)
$ poetry run oidc-register https://ipsilon.tinystage.test/idp/openidc/ http://localhost:5000/authorize

You should then create your own sqlite database for your development instance of mirrormanager2:

$ poetry run ./createdb.py

If all goes well, you can start a development instance of the server by running:

$ poetry run ./runserver.py

Open your browser and visit http://localhost:5000 to check it out.

Once you have made your changes, please run the test suite to verify that nothing covered by tests has been broken:

$ tox

mirrormanager2's People

Contributors

abompard, adrianreber, alfredmyers, ausil, conan-kudo, dependabot[bot], devyanikota, keszybz, lenkaseg, lmacken, mdomsch, nirik, nphilipp, puiterwijk, pypingou, ralphbean, renovate[bot], ryanlerch, sjenning, taranjeet


mirrormanager2's Issues

Offer the possibility to gracefully stop the crawler

Even with all the existing timeouts in the crawler, there are situations where the crawler just hangs for some unknown reason, sometimes for hours. Right now the only option is to kill the crawler, which interrupts all open network connections to the mirrors and the database.

It would be good if the crawler could be shut down gracefully with a signal (or something similar) which would then end all open crawling threads and update the database accordingly.
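
One possible shape for such a shutdown, sketched under assumptions rather than taken from the crawler: a shared `threading.Event` is set by a signal handler, and each worker checks it between steps. The names `stop_requested`, `request_stop`, `crawl_worker` and `install_handlers` are hypothetical, and the `time.sleep` call stands in for the real network request.

```python
import signal
import threading
import time

# Hypothetical flag shared by all crawler threads.
stop_requested = threading.Event()


def request_stop(signum, frame):
    """Signal handler: ask workers to finish up instead of being killed."""
    stop_requested.set()


def crawl_worker(directories, results):
    """Process directories one by one, checking the stop flag between steps."""
    for directory in directories:
        if stop_requested.is_set():
            results.append("interrupted")  # placeholder for a database update
            return
        time.sleep(0.01)  # placeholder for the real network request
    results.append("finished")


def install_handlers():
    """Register the handler; signal.signal() must run in the main thread."""
    signal.signal(signal.SIGTERM, request_stop)
    signal.signal(signal.SIGINT, request_stop)
```

A worker that is blocked inside a network call only sees the flag once that call returns, so a sketch like this complements socket timeouts rather than replacing them.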

Handle OSError: [Errno 12] Cannot allocate memory

In situations where not much memory is available, the crawler cannot spawn rsync processes to crawl a mirror. Currently this fails like this:

WARNING - Failed to run rsync.
Traceback (most recent call last):
  File "/usr/bin/mm2_crawler", line 704, in try_per_category
    result, listing = run_rsync(url, params, logger)
  File "/usr/lib/python2.7/site-packages/mirrormanager2/lib/sync.py", line 46, in run_rsync
    bufsize=-1
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1205, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

This is not handled at all and needs proper handling, logging and reporting.
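
A minimal sketch of what such handling could look like, assuming the process is started with `subprocess.Popen` as the traceback suggests. The function name `spawn_rsync` and the injectable `popen` parameter are hypothetical and exist only to keep the example self-contained.

```python
import errno
import logging
import subprocess

logger = logging.getLogger("crawler")


def spawn_rsync(cmd, popen=subprocess.Popen):
    """Start the command; return None instead of raising when memory is short.

    `popen` is injectable only so the sketch can be exercised without rsync.
    """
    try:
        return popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except OSError as err:
        if err.errno == errno.ENOMEM:
            logger.error("Not enough memory to start %s; skipping for now", cmd[0])
            return None
        raise  # other errors (for example, a missing binary) still propagate
```

The caller would then need to decide whether a `None` result means "retry later" or "mark this crawl as failed"; that policy is not specified in the issue.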

AttributeError: 'list' object has no attribute 'startswith'

In staging:

Traceback (most recent call last):
  File "/usr/share/mirrormanager2/mirrorlist_client.wsgi", line 156, in application
    results = keep_only_http_results(results)
  File "/usr/share/mirrormanager2/mirrorlist_client.wsgi", line 129, in keep_only_http_results
    if url.startswith(u'http'):
AttributeError: 'list' object has no attribute 'startswith'

This request triggers the error: https://mirrors.stg.fedoraproject.org/mirrorlist?path=pub/fedora/linux/&redirect=1

Everything using redirect=1 is broken since 0.7.2 has been installed in staging.
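
The traceback suggests that the results sometimes contain a list where a single URL string is expected. As a purely illustrative sketch (the actual data structure in the WSGI script is not shown here, so the input shape is an assumption, and `flatten_http_urls` is a hypothetical name), a filter that tolerates both shapes could look like this:

```python
def flatten_http_urls(results):
    """Return only http(s) URLs from a list that may contain nested lists."""
    kept = []
    for item in results:
        # Treat a bare string as a one-element list so both shapes work.
        urls = item if isinstance(item, (list, tuple)) else [item]
        kept.extend(u for u in urls if u.startswith("http"))
    return kept
```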

Add preferred netblock filter for hosts

In the original mirrormanager it was not possible to add netblocks larger than /16 to a host. This filter does not seem to exist anymore. Right now it seems possible to add RFC1918 networks, which does not make much sense in most cases (except for Fedora internal infrastructure systems), and even 0.0.0.0/0.

Looking at the database, we have lots of private networks added as preferred netblocks to different hosts.

See also: https://fedorahosted.org/fedora-infrastructure/ticket/5016
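
A filter along these lines could be sketched with the standard `ipaddress` module. The function name `netblock_problem`, the `allow_private` switch, and the choice to apply the /16 limit only to IPv4 are assumptions made for illustration.

```python
import ipaddress

MIN_PREFIX_V4 = 16  # the /16 limit described for the original mirrormanager


def netblock_problem(netblock, allow_private=False):
    """Return a reason string if the netblock should be rejected, else None."""
    try:
        net = ipaddress.ip_network(netblock, strict=False)
    except ValueError:
        return "not a valid network"
    if net.version == 4 and net.prefixlen < MIN_PREFIX_V4:
        return "larger than /%d" % MIN_PREFIX_V4
    if net.is_private and not allow_private:
        return "private network"
    return None
```

The `allow_private` flag leaves room for the internal infrastructure case mentioned above, for example by enabling it only for admin users.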

Bad Request 7860

Saw several (but not a large number, around 5 per mirrorlist server) today:

Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # Bad Request 7860
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # {'repo': u'epel-6', 'IP': IP('103.7.56.6'), 'client_ip': u'103.7.56.6', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: Traceback (most recent call last):
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: r = do_mirrorlist(d)
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: s = hcurl_cache[hcurl_id]
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: KeyError: 7860
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: # Bad Request 7860
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: # {'repo': u'epel-6', 'IP': IP('66.85.22.1'), 'client_ip': u'66.85.22.1', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: Traceback (most recent call last):
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: r = do_mirrorlist(d)
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: s = hcurl_cache[hcurl_id]
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: KeyError: 7860
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # Bad Request 7860
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # {'repo': u'epel-6', 'IP': IP('66.85.22.1'), 'client_ip': u'66.85.22.1', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: Traceback (most recent call last):
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: r = do_mirrorlist(d)
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: s = hcurl_cache[hcurl_id]
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: KeyError: 7860
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # Bad Request 7860
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # {'repo': u'epel-5', 'IP': IP('192.245.195.4'), 'client_ip': u'192.245.195.4', 'metalink': False, 'arch': u'x86_64'}
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: Traceback (most recent call last):
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: r = do_mirrorlist(d)
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: s = hcurl_cache[hcurl_id]
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: KeyError: 7860
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # Bad Request 7860
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # {'repo': u'epel-5', 'IP': IP('199.30.133.99'), 'client_ip': u'199.30.133.99', 'metalink': False, 'arch': u'x86_64'}
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: Traceback (most recent call last):
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: r = do_mirrorlist(d)
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: s = hcurl_cache[hcurl_id]
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: KeyError: 7860

Document exactly what is expected from report_mirror

I have been working on a tool I call quick-fedora-mirror: https://pagure.io/quick-fedora-mirror

When it runs, it has a pretty complete picture of the contents of the local mirror already stored away and could easily send those to mirrormanager. Except that I'm not sure exactly what mirrormanager will take, besides a base64 encoded bzip2 compressed version of some data structure.

Could you tell me what it's supposed to look like? It would be really great if I could generate it without using python (because I'm trying to minimize client dependencies).

HTTP crawl skips most files

@pypingou your commit ced912a broke HTTP crawling. Most directories are now skipped. I have seen this now on multiple hosts but was only able to track it down today. This needs to be reverted and the problem which this tried to fix needs to be fixed in some other way.

An up-to-date host that only provides HTTP for scanning has only the following directories listed as up to date: only repodata directories, and no directories with actual content.

4/i386/debug/repodata
4/i386/repodata
4/ppc/debug/repodata
4/ppc/repodata
4/SRPMS/repodata
4/x86_64/debug/repodata
4/x86_64/repodata
5/i386/debug/repodata
5/i386/repodata
5/ppc/debug/repodata
5/ppc/repodata
5/SRPMS/repodata
5/x86_64/debug/repodata
5/x86_64/repodata
6/i386/debug/repodata
6/i386/repodata
6/ppc64/debug/repodata
6/ppc64/repodata
6/SRPMS/repodata
6/x86_64/debug/repodata
6/x86_64/repodata
7/ppc64/debug/repodata
7/ppc64/repodata
7/SRPMS/repodata
7/x86_64/debug/repodata
7/x86_64/repodata
testing/4/i386/debug/repodata
testing/4/i386/repodata
testing/4/ppc/debug/repodata
testing/4/ppc/repodata
testing/4/SRPMS/repodata
testing/4/x86_64/debug/repodata
testing/4/x86_64/repodata
testing/5/i386/debug/repodata
testing/5/i386/repodata
testing/5/ppc/repodata
testing/5/SRPMS/repodata
testing/5/x86_64/debug/repodata
testing/5/x86_64/repodata
testing/6/i386/debug/repodata
testing/6/i386/repodata
testing/6/ppc64/debug/repodata
testing/6/ppc64/repodata
testing/6/SRPMS/repodata
testing/6/x86_64/debug/repodata
testing/6/x86_64/repodata
testing/7/ppc64/debug/repodata
testing/7/x86_64/debug/repodata
testing/7/x86_64/repodata

Introduce timeout check during run_rsync()

Currently the crawler timeout is ignored when crawling via rsync. An rsync timeout value can be specified on the command line and is used, but the rsync timeout option has a different meaning:

 --timeout=TIMEOUT
              This option allows you to set a maximum I/O timeout in seconds.
              If no data is transferred for the specified time then rsync will exit.
              The default is 0, which means no timeout.

The crawling, however, should stop with an error if it did not finish during the specified timeout like it does with HTTP or FTP.

See #53 and #55
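
To illustrate the difference, here is a sketch of a wall-clock limit enforced from the Python side, independent of rsync's own I/O timeout. It uses `subprocess.run` with its `timeout` argument; the function name `run_with_deadline` is hypothetical, and this is not how `run_rsync()` is currently written.

```python
import subprocess


def run_with_deadline(cmd, deadline_seconds):
    """Run cmd; return (finished, output). The child is killed at the deadline."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=deadline_seconds,  # total run time, unlike rsync --timeout
        )
    except subprocess.TimeoutExpired:
        return False, b""
    return True, proc.stdout
```

A caller could pass the crawler's overall timeout here and still pass `--timeout` to rsync, so that both a stalled connection and a slow but steady transfer are covered.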

Add a way to specify you want only https urls from metalink

Some folks want all their traffic to use SSL. We should offer an option in the metalink URL that makes it return only mirrors that use https, something like ?method=https in the URL.

Along with this, we might consider mailing mirror admins and asking whether they would update to https where they have https available.

Once enough mirrors offer https, we could perhaps make it the default.
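
The filtering step itself is small. A sketch, assuming the candidate mirror URLs are available as a plain list of strings (`filter_by_protocol` is a hypothetical helper, not an existing function):

```python
from urllib.parse import urlsplit


def filter_by_protocol(urls, protocols):
    """Keep only URLs whose scheme is in `protocols`, preserving order."""
    wanted = {p.lower() for p in protocols}
    return [u for u in urls if urlsplit(u).scheme.lower() in wanted]
```

With this shape, a request carrying something like ?method=https would map to `filter_by_protocol(urls, ["https"])`, and the current behaviour would correspond to not filtering at all.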

Add a script to check all the metalink urls

We had some metalink doom over the last couple days and a couple things were painful:

  • Rebuilding the umdl took forever (6-8 hours?). The script in #93 was added to try and make that much faster when we know what's wrong.
  • Checking that all the metalinks actually work after we rebuilt and synced out the pickle. We have to manually ssh around to f22, f21, f20, epel, etc, boxes and try to 'yum update' for both updates and updates-testing. It would be excellent if we had a script that just tried to download the metalink and validate it (somehow). We could then easily loop over all the metalinks and verify them from one box. We could do this automatically from nagios.
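
As a starting point for the second item, a validation function could be sketched as follows. It only checks that the document is well-formed XML with a `metalink` root and at least one `url` and one `hash` element; what "valid" should really mean is left open above, so these checks are assumptions. Downloading the metalink is left to the caller.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace: '{ns}url' -> 'url'."""
    return tag.rsplit("}", 1)[-1]


def check_metalink(xml_text):
    """Return a list of problems found in a metalink document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well-formed XML: %s" % err]
    problems = []
    if local(root.tag) != "metalink":
        problems.append("root element is not <metalink>")
    elements = [local(el.tag) for el in root.iter()]
    if "url" not in elements:
        problems.append("no <url> elements")
    if "hash" not in elements:
        problems.append("no <hash> elements")
    return problems
```

A monitoring check could loop over all repo/arch combinations, fetch each metalink, and alert when the returned list is not empty.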

Simpler checkin endpoint(s)

I would like to be able to do a checkin from a shell script. I know the current xmlrpc checkin endpoint is there for compatibility with existing report_mirror checkins, but it would be nice to have something that's a bit simpler to use.

I propose to add two API endpoints to api.py: checkin-json and checkin-text. These will take json or a text file and parse that into a structure suitable for passing to read_host_config. Compressed endpoints could be added, too. The json should be trivial to generate from the regular report_mirror data structure that gets pickled. The text file should be easy to generate from a shell script.

Given the fact that you really shouldn't unpickle untrusted data, I'd figure that this would be something you wouldn't mind adding. I think that doing this should be pretty easy for me, but I'm not sure how I'm going to test it. Any hints as to how you'd want to see this done would be appreciated, of course.

What should report_mirror check?

Currently report_mirror only transfers the list of directories on the mirror. It does not look at the content at all. This has been a problem in the past multiple times. Mirrors which are out of sync, for whatever reason, are marked as not being up to date by the crawler. The mirror, however, runs report_mirror which reports that all directories are on the mirror and is marked as being up to date. So the state of the mirror is flipping, depending what ran last: crawler or report_mirror.
This leads to mirrors being offered to users which are out of date but only for a few hours per day and that makes it difficult to debug problem reports from users.

For private mirrors, however, report_mirror is kind of required. It is the only information we have from a mirror which cannot be crawled and a broken or out of sync mirror is not that problematic as only a limited number of users will be hitting that mirror.

Possible solutions:

  • Only allow report_mirror for private mirrors. This would be a pity, as public mirrors could then no longer influence their status, which is currently a good thing because we learn sooner when a mirror is up to date again.
  • Include information about the files report_mirror found in the directories (maybe only repomd.xml checksums). This would increase I/O on the mirror moderately to dramatically and thus reduce the acceptance of report_mirror.

I am filing this issue mainly as a means to document the current behaviour, and maybe someone has a good idea how this can be solved.

Consider support for distributing docker images in conjunction with Pulp

In particular in conjunction with Pulp crane. @maxamillion and fedora releng are the stakeholders here.

I believe the way the production is going to work is that:

  • someone will kick off a container build with fedpkg
  • that will call koji
  • which will in turn call a container build plugin.
  • which will in turn call OSBS to build the container
  • OSBS will then contact Pulp Crane to tell it about the image (and upload it)

... that's all I know at this point. People could then list the images in pulp and download what they want.

A missing piece of the puzzle is that we would like to somehow leverage our mirror infrastructure to distribute those images. It would be nice to have pulp crane redirect download requests to the right place.

Hide certain fields for non-admins

MM1 did not display all possible fields to non-admin users. At least site->admin_active and host->admin_active, as well as Peer ASNs for example, should be neither visible nor changeable. The complete list of items to hide from non-admin users has to be looked up in the MM1 source code.

Description (help_text) for Host/Site entries missing

MM1 used to have a detailed description of what each field the mirror admin has to fill out means.

This has been lost in the MM2 transition as this information has been stored in

https://git.fedorahosted.org/cgit/mirrormanager.git/tree/server/mirrormanager/controllers.py

(see help_text). These descriptions are missing at multiple places.

This has led to mirror admins entering country names instead of a "2-letter ISO country code", which makes the mirrors unusable. I have also seen that some of the text entries had validators which are missing from MM2.

Create unit tests for the mirrorlist server

To verify that changes to the mirrorlist code do not break anything, unit tests for the mirrorlist server are needed.

The following could be implemented:

  1. Generate a minimal pkl
  2. Start the mirrorlist server
  3. Query mirrorlist/metalink with all possible options

Also see #179

Provide a URL to check for basic functionality

In environments where MM2 is running behind proxies it would be good to provide a URL which makes a simple DB connection and then returns. The MM2 start page (as opposed to the MM1 start page) does a relatively complex database query which can lead to a high memory and CPU consumption just by multiple proxies checking the availability.

There was the proposal to provide a URL like /ping which does exactly this and which then just returns OK.
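
A sketch of what such an endpoint could do, written as a plain WSGI callable with the standard library so that it stays self-contained. MM2 itself is a Flask application, so the real thing would presumably be a Flask route; the `make_ping_app` name, the use of `sqlite3`, and the 503 response on failure are assumptions.

```python
import sqlite3


def make_ping_app(db_path):
    """Build a tiny WSGI app that answers /ping after a trivial DB query."""

    def app(environ, start_response):
        headers = [("Content-Type", "text/plain")]
        if environ.get("PATH_INFO") != "/ping":
            start_response("404 Not Found", headers)
            return [b"Not Found"]
        try:
            conn = sqlite3.connect(db_path)
            try:
                conn.execute("SELECT 1").fetchone()  # cheapest possible query
            finally:
                conn.close()
        except sqlite3.Error:
            start_response("503 Service Unavailable", headers)
            return [b"DB unavailable"]
        start_response("200 OK", headers)
        return [b"OK"]

    return app
```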

logs kept forever and outside logrotate

Some recent logging changes have resulted in mirrormanager keeping logs in /var/log/mirrormanager/, but it never seems to expire or remove them. They are not controlled by logrotate; they seem to rotate every day at 00:00 UTC outside logrotate.

So, how many of these logs are useful to keep? Should it be configurable?

Should mirrormanager let logrotate handle them instead of doing it internally itself?

Should logs be compressed to save space?

Note that in Fedora ansible I added a logrotate configuration for these before I realized they are controlled directly by mirrormanager. We will want to adjust that based on what we do here.
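
Both directions raised by these questions can be expressed with handlers from Python's standard `logging.handlers` module. The sketch below is not the current mirrormanager code; `build_log_handler`, `keep_days` and `external_rotation` are hypothetical names for a possible configuration switch.

```python
import logging
import logging.handlers


def build_log_handler(path, keep_days=14, external_rotation=False):
    """Return a file handler for either internal or logrotate-driven rotation."""
    if external_rotation:
        # Reopens the file after logrotate has moved it away.
        return logging.handlers.WatchedFileHandler(path)
    # Rotates at midnight UTC and deletes files beyond keep_days.
    return logging.handlers.TimedRotatingFileHandler(
        path, when="midnight", utc=True, backupCount=keep_days
    )
```

Compression is not covered by `backupCount`; with internal rotation it would need extra code, whereas logrotate can handle it with its own `compress` directive.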

crawler: KeyError: 'unreadable'

ERROR:crawler:Hosts(43/144):Threads(30/30):1763:mirror.nonstop.co.il:Failure in thread '594d454', host <Host(1763 - mirror.nonstop.co.il)>
Traceback (most recent call last):
  File "/usr/bin/mm2_crawler", line 1474, in worker
    rc = per_host(session, host.id, options, config)
  File "/usr/bin/mm2_crawler", line 1397, in per_host
    sync_hcds(session, host, host_category_dirs, options.repodata)
  File "/usr/bin/mm2_crawler", line 734, in sync_hcds
    stats['unreadable'] += 1
KeyError: 'unreadable'
INFO:crawler:Hosts(43/144):Threads(30/30):1763:mirror.nonstop.co.il:Ending crawl of <Host(1763 - mirror.nonstop.co.il)> with status 3
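
The traceback shows `+=` being applied to a key that was never initialised in `stats`. One common way to avoid this class of error, shown here as a generic sketch rather than the actual fix, is a `collections.Counter`, which treats missing keys as zero. The `count_statuses` helper is hypothetical.

```python
from collections import Counter


def count_statuses(statuses):
    """Count status strings without pre-declaring every possible key."""
    stats = Counter()
    for status in statuses:
        stats[status] += 1  # missing keys start at 0 instead of raising KeyError
    return stats
```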

Drop ftp:// urls from metalinks

ftp causes issues with many firewalls and is in general a horrible protocol. We should stop offering them in metalink urls.

We might want to check/contact any mirrors that have only ftp urls and ask them to fix it or update to add a http{s} url.
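
Finding the mirrors to contact could be sketched like this, assuming a mapping from host name to its list of URLs has already been pulled from the database. That mapping and the `ftp_only_hosts` name are assumptions for illustration.

```python
from urllib.parse import urlsplit


def ftp_only_hosts(host_urls):
    """Return sorted names of hosts whose URLs are all ftp://."""
    return sorted(
        host
        for host, urls in host_urls.items()
        if urls and all(urlsplit(u).scheme.lower() == "ftp" for u in urls)
    )
```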

rsyncFilter is missing from MM2

MM1 had an interface called rsyncFilter which MM2 no longer provides. A small number of people still seem to use it (according to the httpd logs).

See original README for details.

Make umdl logging better

Right now, umdl spits out:

06/22/2015 05:00:03 AM Starting umdl
06/22/2015 05:00:03 AM has changed: 1434873263 != 1434948558
06/22/2015 05:02:46 AM atomic/21 has changed: 1434836123 != 1434947584
06/22/2015 05:02:47 AM atomic/21/objects/01 has changed: 1434806893 != 1434947584
...
06/22/2015 05:08:54 AM development/rawhide/armhfp/os/Packages/k has changed: 1434720949 != 1434949693
06/22/2015 05:08:56 AM development/rawhide/armhfp/os/Packages/l has changed: 1434720949 != 1434949712
06/22/2015 05:36:26 AM %s: directory already has a repository
06/22/2015 05:36:26 AM %s: directory already has a repository
06/22/2015 10:46:16 AM Ending umdl

But it says "has changed" for every directory and it's not very useful.

It would be nice if we could add:

  • Anytime it commits to the db to update something say that in the log and what it updated. This would allow us to know when regenerating the pkl would update metalinks.
  • Perhaps some kind of checkpoint thing, so it logs every 15min what it's doing. Right now there's many hour sections of the logs where it's not clear what it's doing at all.
  • Weed out these useless 'has changed' for directories that always change.

check metalink alternate repomd code

We need to confirm that the code around handling alternate repomd.xml's is working as expected/desired and that we have the right timeouts on it.

The idea as I understand it is when repomd.xml changes for a repo, we also keep the old repomd.xml as an alternate. Then, after N days we drop that alternate. This allows people to get updates from mirrors with the previous repodata when master has just changed.

However, we have hit situations where we haven't had updates pushes in a while for various reasons (a week or so) and mm seems to drop the current repomd.xml and keep the old/outdated/no longer used alternative one as the only one.

We need to make sure we are dropping the old one, and we need to adjust so that if there's no updates pushes for a while we don't break anything.
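
The desired rule can be stated as: always keep the newest repomd.xml, and drop only older alternates once they exceed the age limit. A sketch of that rule follows; the `(timestamp, checksum)` tuple shape, the `prune_repomd` name, and measuring age from each entry's own timestamp are all assumptions, not the existing implementation.

```python
from datetime import datetime, timedelta


def prune_repomd(entries, now, max_age_days):
    """Keep the newest entry unconditionally, plus alternates within max_age_days.

    entries: list of (timestamp, checksum) tuples in any order.
    Returns the kept entries, newest first.
    """
    if not entries:
        return []
    ordered = sorted(entries, key=lambda e: e[0], reverse=True)
    newest, alternates = ordered[0], ordered[1:]
    cutoff = now - timedelta(days=max_age_days)
    return [newest] + [e for e in alternates if e[0] >= cutoff]
```

Because the newest entry is never compared against the cutoff, a long gap between updates pushes cannot leave the repository with only an outdated alternate.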

crawler: handle atomic tree

With almost 400000 files in the atomic tree, it does not make much sense to crawl this part in the same way as the rest of the files. The crawler needs some intelligent way to detect whether the atomic tree is up to date or not.

Redirected to mirror without requested file

Text for "Master rsync server Access Control List IPs" misleading

Each host has the possibility to define "Master rsync server Access Control List IPs", which has the following description:

These host DNS names and/or IP addresses will be allowed to rsync from the master rsync/ftp servers. List here all the machines that you use for pulling.

This is not true. Access to the master mirror in Fedora is not granted automatically. Maybe we should hide this part for Fedora's MirrorManager installation. There used to be a way for mirrors to query all hosts listed in this field via /rsync_acls/ to populate the "hosts allow" field of the local rsync daemon. This does not exist in MirrorManager2 any more as it never had many users.

So we could remove this functionality at least from the Fedora templates.

Site-to-Site

There is a table with site-to-site entries in MM2 which is not used at all any more.

The only 'benefit' it has right now, is that some sites cannot be deleted from the database as they are listed in the site-to-site table. Especially old mirrors have entries in that table. Just like private inter-mirror-sync URLs this is a concept which is not used/followed in MM2 at all. The table as well as all code around it could be removed.

mm2_get_global_netblocks has syntax errors

From the monthly crons on mm-backend01:

Subject: Cron mirrormanager@mm-backend01 cd /usr/share/mirrormanager2 && /usr/bin/mm2_get_global_netblocks /var/lib/mirrormanager/global_netblocks.txt
Date: Sun, 1 May 2016 00:48:04 +0000 (UTC)

basename: missing operand
Try 'basename --help' for more information.
/tmp/get_global_netblocks.HkpbiVzv/: Is a directory
basename: missing operand
Try 'basename --help' for more information.
bzcat: Input file /tmp/get_global_netblocks.HkpbiVzv/ is a directory.
short file (empty packet) at zebra-dump-parser/zebra-dump-parser.pl line 98.

Offer a mirrorlist interface to query the time the data was generated

The mirrorlist servers (in Fedora) receive newly generated data every hour, and the process reading the data is restarted (kill -1). This works most of the time, but not always: sometimes the mirrorlist server process keeps running and serving the old data, and further signals (kill -1) do not seem to make it read the new data.

If the time of the data generation is stored in the pickle used by the mirrorlist servers a new interface could be provided to query this timestamp which could be used for better monitoring.
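
A minimal sketch of the idea, assuming the cache is a dictionary serialised with `pickle`. The key name `generated_at` and both function names are hypothetical; the real pickle layout is not described here.

```python
import pickle
import time


def dump_cache(data, generated_at=None):
    """Serialise the cache together with its generation time (Unix seconds)."""
    payload = dict(data)
    payload["generated_at"] = time.time() if generated_at is None else generated_at
    return pickle.dumps(payload)


def cache_age_seconds(blob, now=None):
    """Return the age of a serialised cache; only load blobs from a trusted source."""
    generated_at = pickle.loads(blob)["generated_at"]
    return (time.time() if now is None else now) - generated_at
```

A monitoring check could then alert when the reported age exceeds, say, two generation intervals, which would catch a server that keeps serving stale data.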

metalink vs website differences

$ curl -sk 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f20&arch=x86_64' |grep -c http
7

One of these results is a header line. That means https://admin.fedoraproject.org/mirrormanager/mirrors/Fedora/20/x86_64 should show 6 entries.

Instead, it shows 151 active mirrors. Selecting one randomly, http://mirror.pnl.gov/fedora/linux/updates/20/ shows that this mirror does not contain f20 updates.

These two lists should have the same content.

Other, active mirrors, like http://mirror.cc.vt.edu/pub/fedora-archive/fedora/linux/updates/20/x86_64/ do not show up at all.

Adjust for pungi4 repo layout changes

We are now using pungi4 to create development/24 and development/rawhide composes.

The layout has changed, as these are now full composes with all the images, metadata and such in them.

UMDL seems to have picked up on 24, but it did the wrong thing.

https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-24&arch=x86_64&country=global is currently pointing people to the Workstation repo, it should instead point to the Everything repo. Rawhide is still syncing but it likely will do the same.

We may need further adjustments, but short term s/Workstation/Everything/ should be a good start.

Python 3 compatibility

With Fedora working hard to have Python 3 as the default, and EPEL now including python34 for EL7 (and of course, there's Software Collections for Python 3.4 for EL6 and EL7), it would be awesome if MirrorManager2 added Python 3 support.

A cursory check of the requirements files in the repository indicates that there are only four modules that don't indicate Python 3 support on PyPI:

  • python-fedora
  • python-openid
  • python-openid-cla
  • python-openid-teams

I don't know to what extent these modules are required, but perhaps a first step would be to make MirrorManager2 itself Py3 ready with the six module? If the modules above aren't mandatory to MirrorManager2's functionality, then perhaps splitting the stuff that uses that out as a subpackage to make the core available with Py3 is an option?
