
mirrormanager2's Issues

Offer a mirrorlist interface to query the time the data was generated

The mirrorlist servers (in Fedora) receive newly generated data every hour, and the process reading the data is restarted (kill -1). This works most of the time, but not always. Sometimes the mirrorlist server process keeps running but keeps serving the old data, and further signals (kill -1) do not seem to get it to read the new data.

If the time of data generation were stored in the pickle used by the mirrorlist servers, a new interface could be provided to query this timestamp, which could then be used for better monitoring.
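A minimal sketch of the idea, assuming the cache is a plain pickled dict; the key name `timestamp` and both function names are illustrative, not MirrorManager's actual names:

```python
import pickle
import time

def dump_cache_with_timestamp(data, path):
    """Write the mirrorlist cache pickle with the generation time embedded."""
    data = dict(data, timestamp=time.time())  # hypothetical key name
    with open(path, 'wb') as f:
        pickle.dump(data, f)

def cache_timestamp(path):
    """What a monitoring interface could return for this cache file."""
    with open(path, 'rb') as f:
        return pickle.load(f).get('timestamp')
```

Monitoring could then alert when the reported timestamp falls more than an hour behind the wall clock.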

Document exactly what is expected from report_mirror

I have been working on a tool I call quick-fedora-mirror: https://pagure.io/quick-fedora-mirror

When it runs, it already has a pretty complete picture of the contents of the local mirror stored away and could easily send that to mirrormanager. Except that I'm not sure exactly what mirrormanager will accept, besides a base64-encoded, bzip2-compressed version of some data structure.

Could you tell me what it's supposed to look like? It would be really great if I could generate it without using Python (because I'm trying to minimize client dependencies).
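For reference, a hedged sketch of the wire format as described above: a pickled Python data structure, bzip2-compressed, then base64-encoded. Which keys the server actually expects in `config` is exactly what this issue asks to have documented:

```python
import base64
import bz2
import pickle

def encode_checkin(config):
    """Produce the base64(bzip2(pickle)) blob report_mirror appears to send."""
    return base64.b64encode(bz2.compress(pickle.dumps(config)))

def decode_checkin(blob):
    """Server-side inverse; note that unpickling untrusted data is unsafe."""
    return pickle.loads(bz2.decompress(base64.b64decode(blob)))
```

The pickle layer is what makes a non-Python client hard; base64 and bzip2 alone would be trivial from a shell script.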

Consider support for distributing docker images in conjunction with Pulp

In particular, in conjunction with Pulp Crane. @maxamillion and Fedora releng are the stakeholders here.

I believe the way this is going to work in production is that:

  • someone will kick off a container build with fedpkg
  • that will call koji
  • which will in turn call a container build plugin.
  • which will in turn call OSBS to build the container
  • OSBS will then contact Pulp Crane to tell it about the image (and upload it)

... that's all I know at this point. People could then list the images in pulp and download what they want.

A missing piece of the puzzle is that we would like to somehow leverage our mirror infrastructure to distribute those images. It would be nice to have pulp crane redirect download requests to the right place.

Add a way to specify you want only https urls from metalink

Some folks want all their traffic to use SSL; we should offer an option on the metalink URL that makes it return only https mirrors, something like ?method=https or the like in the URL.

Along with this we might consider mailing mirror admins and asking if they would update to https where they have https available.

Once enough mirrors offer https, we could perhaps make it the default.
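The filter itself would be trivial; a minimal sketch, assuming the mirror URLs arrive as a list of strings (the function name is illustrative):

```python
def filter_https_only(urls):
    """Keep only https:// mirror URLs, e.g. when ?method=https is requested."""
    return [u for u in urls if u.startswith('https://')]
```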

AttributeError: 'list' object has no attribute 'startswith'

In staging:

Traceback (most recent call last):
  File "/usr/share/mirrormanager2/mirrorlist_client.wsgi", line 156, in application
    results = keep_only_http_results(results)
  File "/usr/share/mirrormanager2/mirrorlist_client.wsgi", line 129, in keep_only_http_results
    if url.startswith(u'http'):
AttributeError: 'list' object has no attribute 'startswith'

This request triggers the error: https://mirrors.stg.fedoraproject.org/mirrorlist?path=pub/fedora/linux/&redirect=1

Everything using redirect=1 has been broken since 0.7.2 was installed in staging.
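A hedged sketch of a defensive fix: treat each result as either a single URL string or a list of candidate URLs, since the traceback shows a list reaching code that expects a string (the actual shape of `results` after 0.7.2 would need checking):

```python
def keep_only_http_results(results):
    """Keep only HTTP(S) URLs, tolerating list-valued entries."""
    kept = []
    for entry in results:
        candidates = entry if isinstance(entry, list) else [entry]
        kept.extend(u for u in candidates if u.startswith(u'http'))
    return kept
```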

Hide certain fields for non-admins

MM1 did not display all possible fields to non-admin users. At least site->admin_active and host->admin_active should be neither visible nor changeable, nor should Peer ASNs, for example. The complete list of items to hide from non-admin users has to be looked up in the MM1 source code.

Introduce timeout check during run_rsync()

Currently the crawler timeout is ignored when crawling via rsync. The possibility to specify an rsync timeout value on the command line exists and is used, but rsync's timeout option has a different meaning:

 --timeout=TIMEOUT
              This option allows you to set a maximum I/O timeout in seconds.
              If no data is transferred for the specified time then rsync will exit.
              The default is 0, which means no timeout.

The crawl, however, should stop with an error if it does not finish within the specified timeout, as it does with HTTP or FTP.
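A sketch of a wall-clock deadline around the whole listing run, as opposed to rsync's --timeout, which only fires after I/O inactivity; the command and the return convention are illustrative:

```python
import subprocess

def run_with_deadline(cmd, timeout_seconds):
    """Run cmd (e.g. an rsync listing) with an overall wall-clock limit."""
    try:
        result = subprocess.run(cmd, capture_output=True,
                                timeout=timeout_seconds)
        return result.returncode, result.stdout
    except subprocess.TimeoutExpired:
        # Treat an overrunning crawl as a failure, like HTTP/FTP crawls do.
        return None, None
```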

See #53 and #55

Description (help_text) for Host/Site entries missing

MM1 used to have a detailed description of what each field the mirror admin has to fill out means.

This was lost in the MM2 transition, as this information was stored in

https://git.fedorahosted.org/cgit/mirrormanager.git/tree/server/mirrormanager/controllers.py

(see help_text). These descriptions are missing in multiple places.

This has led to mirror admins entering country names instead of the 2-letter ISO country code, which makes the mirrors unusable. I have also seen that some of the text entries had validators in MM1 which are missing from MM2.

Handle OSError: [Errno 12] Cannot allocate memory

In situations where not much memory is available, the crawler cannot spawn rsync processes to crawl a mirror. Currently this fails like this:

WARNING - Failed to run rsync.
Traceback (most recent call last):
  File "/usr/bin/mm2_crawler", line 704, in try_per_category
    result, listing = run_rsync(url, params, logger)
  File "/usr/lib/python2.7/site-packages/mirrormanager2/lib/sync.py", line 46, in run_rsync
    bufsize=-1
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1205, in _execute_child
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

This is not handled at all and needs proper handling, logging, and reporting.
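A sketch of the missing handling: treat ENOMEM as a logged, recoverable crawl failure rather than an unhandled traceback. The wrapper name and return convention are illustrative:

```python
import errno
import logging

logger = logging.getLogger('crawler')

def spawn_safely(spawn):
    """Call spawn() (e.g. the rsync Popen); absorb out-of-memory errors."""
    try:
        return spawn()
    except OSError as e:
        if e.errno == errno.ENOMEM:
            logger.warning('Cannot allocate memory, skipping rsync crawl')
            return None  # caller records the crawl attempt as failed
        raise  # anything else is still a bug worth a traceback
```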

metalink vs website differences

$ curl -sk 'https://mirrors.fedoraproject.org/metalink?repo=updates-released-f20&arch=x86_64' |grep -c http
7

One of these results is a header line. That means https://admin.fedoraproject.org/mirrormanager/mirrors/Fedora/20/x86_64 should show 6 entries.

Instead, it shows 151 active mirrors. Selecting one randomly, http://mirror.pnl.gov/fedora/linux/updates/20/ shows that this mirror does not contain f20 updates.

These two lists should have the same content.

Other active mirrors, like http://mirror.cc.vt.edu/pub/fedora-archive/fedora/linux/updates/20/x86_64/, do not show up at all.

Drop ftp:// urls from metalinks

FTP causes issues with many firewalls and is in general a horrible protocol. We should stop offering ftp:// URLs in metalinks.

We might want to check/contact any mirrors that have only ftp URLs and ask them to fix that or add an http(s) URL.

Adjust for pungi4 repo layout changes

We are now using pungi4 to create development/24 and development/rawhide composes.

The layout has changed, as these are now full composes with all the images, metadata, and such in them.

UMDL seems to have picked up on 24, but it did the wrong thing.

https://mirrors.fedoraproject.org/mirrorlist?repo=fedora-24&arch=x86_64&country=global is currently pointing people to the Workstation repo; it should instead point to the Everything repo. Rawhide is still syncing, but it will likely do the same.

We may need further adjustments, but in the short term s/Workstation/Everything/ should be a good start.

rsyncFilter is missing from MM2

MM1 had an interface called rsyncFilter which MM2 no longer provides. A small number of people still seem to use it (according to the httpd logs).

See original README for details.

Offer the possibility to gracefully stop the crawler

Even with all the existing timeouts in the crawler, there are situations where the crawler just hangs for hours for some unknown reason. Right now the only option is to kill the crawler, which interrupts all open network connections to the mirrors and the database.

It would be good if the crawler could be shut down gracefully with a signal (or something similar) which would then end all open crawling threads and update the database accordingly.
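One possible shape for this, sketched under assumptions: a signal handler sets an event that each crawl thread checks between hosts, so it can finish the current host, update the database, and exit. SIGUSR1 and all names here are illustrative:

```python
import signal
import threading

shutdown_requested = threading.Event()

def request_shutdown(signum, frame):
    """Signal handler: ask all crawl threads to wind down."""
    shutdown_requested.set()

signal.signal(signal.SIGUSR1, request_shutdown)

def crawl_hosts(hosts, crawl_one):
    """Crawl hosts in order, stopping cleanly once shutdown is requested."""
    for host in hosts:
        if shutdown_requested.is_set():
            break  # leave the remaining hosts for the next crawler run
        crawl_one(host)
```

A hung network operation inside `crawl_one` would still need its own timeout, but this at least lets an operator stop the run without killing mid-database-update.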

Text for "Master rsync server Access Control List IPs" misleading

Each host can define "Master rsync server Access Control List IPs", which has the following description:

These host DNS names and/or IP addresses will be allowed to rsync from the master rsync/ftp servers. List here all the machines that you use for pulling.

This is not true: access to the master mirror in Fedora is not granted automatically. Maybe we should hide this part in Fedora's MirrorManager installation. There used to be a way for mirrors to query all hosts listed in this field via /rsync_acls/ to populate the "hosts allow" field of the local rsync daemon. This no longer exists in MirrorManager2, as it never had many users.

So we could remove this functionality at least from the Fedora templates.

crawler: handle atomic tree

With almost 400,000 files in the atomic tree, it makes little sense to crawl this part the same way as the rest of the files. The crawler needs some intelligent way to detect whether the atomic tree is up to date.

Provide a URL to check for basic functionality

In environments where MM2 runs behind proxies, it would be good to provide a URL which makes a simple DB connection and then returns. The MM2 start page (as opposed to the MM1 start page) performs a relatively complex database query, which can lead to high memory and CPU consumption just from multiple proxies checking availability.

There was a proposal to provide a URL like /ping which does exactly this and then just returns OK.
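A minimal sketch of such a handler, using the cheapest query that still exercises a database connection (sqlite shown purely for illustration; MM2 would use its own session):

```python
import sqlite3

def ping(connect=lambda: sqlite3.connect(':memory:')):
    """Health check: open a DB connection, run a trivial query, return OK.

    If the query raises, the proxy health check sees a failure."""
    conn = connect()
    try:
        conn.execute('SELECT 1').fetchone()
    finally:
        conn.close()
    return 'OK'
```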

Add a script to check all the metalink urls

We had some metalink doom over the last couple of days, and a couple of things were painful:

  • Rebuilding the umdl took forever (6-8 hours?). The script in #93 was added to try to make that much faster when we know what's wrong.
  • Checking that all the metalinks actually work after we rebuilt and synced out the pickle. We have to manually ssh around to f22, f21, f20, epel, etc. boxes and try to 'yum update' for both updates and updates-testing. It would be excellent if we had a script that just tried to download each metalink and validate it (somehow). We could then easily loop over all the metalinks and verify them from one box, and even do this automatically from nagios.
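The per-metalink validation step could start as simple as a sanity check on the downloaded XML: it parses, and it contains at least one mirror URL and a checksum. The namespace below matches the metalink 3.0 format Fedora's metalinks declare; what "valid" should fully mean is the open question:

```python
import xml.etree.ElementTree as ET

def metalink_looks_valid(xml_text):
    """Rough sanity check for a downloaded metalink document."""
    ns = {'m': 'http://www.metalinker.org/'}
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    has_urls = bool(root.findall('.//m:url', ns))
    has_hashes = bool(root.findall('.//m:hash', ns))
    return has_urls and has_hashes
```

A loop over all repo/arch combinations calling this would cover the "verify them from one box" case, and a nagios check could wrap the same function.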

Make umdl logging better

Right now, umdl spits out:

06/22/2015 05:00:03 AM Starting umdl
06/22/2015 05:00:03 AM has changed: 1434873263 != 1434948558
06/22/2015 05:02:46 AM atomic/21 has changed: 1434836123 != 1434947584
06/22/2015 05:02:47 AM atomic/21/objects/01 has changed: 1434806893 != 1434947584
...
06/22/2015 05:08:54 AM development/rawhide/armhfp/os/Packages/k has changed: 1434720949 != 1434949693
06/22/2015 05:08:56 AM development/rawhide/armhfp/os/Packages/l has changed: 1434720949 != 1434949712
06/22/2015 05:36:26 AM %s: directory already has a repository
06/22/2015 05:36:26 AM %s: directory already has a repository
06/22/2015 10:46:16 AM Ending umdl

But it says "has changed" for every directory, which is not very useful.

It would be nice if we could add:

  • Any time it commits to the db to update something, say so in the log along with what it updated. This would let us know when regenerating the pkl would update metalinks.
  • Perhaps some kind of checkpoint, so it logs what it's doing every 15 minutes. Right now there are many-hour sections of the logs where it's not clear what it's doing at all.
  • Weed out the useless 'has changed' messages for directories that always change.

Add preferred netblock filter for hosts

In the original MirrorManager it was not possible to add netblocks larger than a /16 to a host. This filter does not seem to exist anymore. Right now it seems possible to add RFC 1918 networks (which makes little sense in most cases, except for Fedora-internal infrastructure systems) and even 0.0.0.0/0.

Looking at the database, we have lots of private networks added as preferred netblocks on different hosts.

See also: https://fedorahosted.org/fedora-infrastructure/ticket/5016

logs kept forever and outside logrotate

Some recent logging changes have resulted in mirrormanager keeping logs in /var/log/mirrormanager/, but it never seems to expire/remove them, and they are not controlled by logrotate; they seem to rotate every day at 00:00 UTC on their own.

So, how many of these logs are useful to keep? Should it be configurable?

Should mirrormanager let logrotate handle them instead of doing it internally itself?

Should logs be compressed to save space?

Note that in Fedora ansible I added a logrotate config for these before I realized they are rotated directly by mirrormanager. We will want to adjust that based on what we do here.

Site-to-Site

There is a table with site-to-site entries in MM2 which is no longer used at all.

The only 'benefit' it has right now is that some sites cannot be deleted from the database because they are listed in the site-to-site table. Especially old mirrors have entries in that table. Just like private inter-mirror-sync URLs, this is a concept which is not used/followed in MM2 at all. The table, as well as all the code around it, could be removed.

Python 3 compatibility

With Fedora working hard to have Python 3 as the default, and EPEL now including python34 for EL7 (and of course, there's Software Collections for Python 3.4 for EL6 and EL7), it would be awesome if MirrorManager2 added Python 3 support.

A cursory check of the requirements files in the repository indicates that there are only four modules that don't indicate Python 3 support on PyPI:

  • python-fedora
  • python-openid
  • python-openid-cla
  • python-openid-teams

I don't know to what extent these modules are required, but perhaps a first step would be to make MirrorManager2 itself Py3-ready with the six module? If the modules above aren't mandatory for MirrorManager2's functionality, then perhaps splitting the code that uses them out into a subpackage, to make the core available with Py3, is an option?

check metalink alternate repomd code

We need to confirm that the code around handling alternate repomd.xml files is working as expected/desired and that we have the right timeouts on it.

The idea, as I understand it, is that when repomd.xml changes for a repo, we also keep the old repomd.xml as an alternate. Then, after N days, we drop that alternate. This allows people to get updates from mirrors that still have the previous repodata when master has just changed.

However, we have hit situations where we haven't had updates pushes in a while (a week or so) for various reasons, and mm seems to drop the current repomd.xml and keep the old, outdated, no-longer-used alternate as the only one.

We need to make sure we are dropping the old one, and we need to adjust things so that nothing breaks if there are no updates pushes for a while.

Bad Request 7860

Saw several of these (not a large number, around 5 per mirrorlist server) today:

Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # Bad Request 7860
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # {'repo': u'epel-6', 'IP': IP('103.7.56.6'), 'client_ip': u'103.7.56.6', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: Traceback (most recent call last):
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: r = do_mirrorlist(d)
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: s = hcurl_cache[hcurl_id]
Jun 22 18:10:55 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: KeyError: 7860
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: # Bad Request 7860
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: # {'repo': u'epel-6', 'IP': IP('66.85.22.1'), 'client_ip': u'66.85.22.1', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: Traceback (most recent call last):
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: r = do_mirrorlist(d)
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: s = hcurl_cache[hcurl_id]
Jun 22 18:15:13 mirrorlist-ibiblio.vpn.fedoraproject.org python2[15569]: KeyError: 7860
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # Bad Request 7860
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # {'repo': u'epel-6', 'IP': IP('66.85.22.1'), 'client_ip': u'66.85.22.1', 'metalink': True, 'arch': u'x86_64'}
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: Traceback (most recent call last):
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: r = do_mirrorlist(d)
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: s = hcurl_cache[hcurl_id]
Jun 22 18:20:18 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: KeyError: 7860
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # Bad Request 7860
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: # {'repo': u'epel-5', 'IP': IP('192.245.195.4'), 'client_ip': u'192.245.195.4', 'metalink': False, 'arch': u'x86_64'}
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: Traceback (most recent call last):
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: r = do_mirrorlist(d)
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: s = hcurl_cache[hcurl_id]
Jun 22 18:40:07 mirrorlist-dedicatedsolutions.vpn.fedoraproject.org python2[30150]: KeyError: 7860
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # Bad Request 7860
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: # {'repo': u'epel-5', 'IP': IP('199.30.133.99'), 'client_ip': u'199.30.133.99', 'metalink': False, 'arch': u'x86_64'}
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: Traceback (most recent call last):
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 877, in handle
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: r = do_mirrorlist(d)
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 718, in do_mirrorlist
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: allhosts, cache, file, pathIsDirectory=pathIsDirectory)
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: File "/usr/share/mirrormanager2/mirrorlist_server.py", line 423, in append_path
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: s = hcurl_cache[hcurl_id]
Jun 22 18:40:57 mirrorlist-phx2.phx2.fedoraproject.org python2[20404]: KeyError: 7860

Create unit tests for the mirrorlist server

To verify that changes to the mirrorlist code do not break anything, unit tests for the mirrorlist server are needed.

The following could be implemented:

  1. Generate a minimal pkl
  2. Start the mirrorlist server
  3. Query mirrorlist/metalink with all possible options

Also see #179
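Step 1 could start from a fixture like the following; the cache key names here are illustrative guesses, and the real set is whatever mirrorlist_server.py loads from the pkl:

```python
import pickle

def write_minimal_pkl(path):
    """Write a minimal mirrorlist cache pickle for test fixtures."""
    cache = {
        'mirrorlist_cache': {},   # illustrative key names, not verified
        'host_country_cache': {},
        'time': 0,
    }
    with open(path, 'wb') as f:
        pickle.dump(cache, f)
    return cache
```

Steps 2 and 3 would then start the server against this file and assert on the mirrorlist/metalink responses for each supported query parameter.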

HTTP crawl skips most files

@pypingou your commit ced912a broke HTTP crawling: most directories are now skipped. I have seen this on multiple hosts but was only able to track it down today. This needs to be reverted, and the problem it tried to fix needs to be solved some other way.

An up-to-date host which provides only HTTP for scanning has just the following directories listed as up to date: only repodata directories, and no directories with actual content.

4/i386/debug/repodata
4/i386/repodata
4/ppc/debug/repodata
4/ppc/repodata
4/SRPMS/repodata
4/x86_64/debug/repodata
4/x86_64/repodata
5/i386/debug/repodata
5/i386/repodata
5/ppc/debug/repodata
5/ppc/repodata
5/SRPMS/repodata
5/x86_64/debug/repodata
5/x86_64/repodata
6/i386/debug/repodata
6/i386/repodata
6/ppc64/debug/repodata
6/ppc64/repodata
6/SRPMS/repodata
6/x86_64/debug/repodata
6/x86_64/repodata
7/ppc64/debug/repodata
7/ppc64/repodata
7/SRPMS/repodata
7/x86_64/debug/repodata
7/x86_64/repodata
testing/4/i386/debug/repodata
testing/4/i386/repodata
testing/4/ppc/debug/repodata
testing/4/ppc/repodata
testing/4/SRPMS/repodata
testing/4/x86_64/debug/repodata
testing/4/x86_64/repodata
testing/5/i386/debug/repodata
testing/5/i386/repodata
testing/5/ppc/repodata
testing/5/SRPMS/repodata
testing/5/x86_64/debug/repodata
testing/5/x86_64/repodata
testing/6/i386/debug/repodata
testing/6/i386/repodata
testing/6/ppc64/debug/repodata
testing/6/ppc64/repodata
testing/6/SRPMS/repodata
testing/6/x86_64/debug/repodata
testing/6/x86_64/repodata
testing/7/ppc64/debug/repodata
testing/7/x86_64/debug/repodata
testing/7/x86_64/repodata

What should report_mirror check?

Currently report_mirror only transfers the list of directories on the mirror; it does not look at the content at all. This has been a problem multiple times in the past. Mirrors which are out of sync, for whatever reason, are marked as not up to date by the crawler. The mirror, however, runs report_mirror, which reports that all directories are present, and the mirror is marked as up to date again. So the state of the mirror keeps flipping, depending on what ran last: crawler or report_mirror.
This leads to out-of-date mirrors being offered to users, but only for a few hours per day, which makes it difficult to debug problem reports from users.

For private mirrors, however, report_mirror is more or less required. It is the only information we have from a mirror which cannot be crawled, and a broken or out-of-sync private mirror is not that problematic, as only a limited number of users will be hitting it.

Possible solutions:

  • Only allow report_mirror for private mirrors. That would be a pity, as public mirrors could then no longer influence their own status, and that influence is a good thing: it lets us know faster when a mirror is up to date again.
  • Include information about the files report_mirror found in the directories (maybe only repomd.xml checksums). This would increase I/O on the mirror moderately to dramatically and thus reduce the acceptance of report_mirror.

I am filing this issue mainly as a means to document the current behaviour; maybe someone has a good idea how this can be solved.

mm2_get_global_netblocks has syntax errors

From the monthly crons on mm-backend01:

Subject: Cron mirrormanager@mm-backend01 cd /usr/share/mirrormanager2 && /usr/bin/mm2_get_global_netblocks /var/lib/mirrormanager/global_netblocks.txt
Date: Sun, 1 May 2016 00:48:04 +0000 (UTC)

basename: missing operand
Try 'basename --help' for more information.
/tmp/get_global_netblocks.HkpbiVzv/: Is a directory
basename: missing operand
Try 'basename --help' for more information.
bzcat: Input file /tmp/get_global_netblocks.HkpbiVzv/ is a directory.
short file (empty packet) at zebra-dump-parser/zebra-dump-parser.pl line 98.

Redirected to mirror without requested file

crawler: KeyError: 'unreadable'

ERROR:crawler:Hosts(43/144):Threads(30/30):1763:mirror.nonstop.co.il:Failure in thread '594d454', host <Host(1763 - mirror.nonstop.co.il)>
Traceback (most recent call last):
  File "/usr/bin/mm2_crawler", line 1474, in worker
    rc = per_host(session, host.id, options, config)
  File "/usr/bin/mm2_crawler", line 1397, in per_host
    sync_hcds(session, host, host_category_dirs, options.repodata)
  File "/usr/bin/mm2_crawler", line 734, in sync_hcds
    stats['unreadable'] += 1
KeyError: 'unreadable'
INFO:crawler:Hosts(43/144):Threads(30/30):1763:mirror.nonstop.co.il:Ending crawl of <Host(1763 - mirror.nonstop.co.il)> with status 3
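One obvious fix, sketched: build the stats mapping as a defaultdict so counters that sync_hcds never initialised (like 'unreadable') start at zero. How the crawler actually constructs `stats` would need checking:

```python
from collections import defaultdict

# A defaultdict(int) returns 0 for missing keys instead of raising KeyError.
stats = defaultdict(int)
stats['unreadable'] += 1  # previously raised KeyError on first use
```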

Simpler checkin endpoint(s)

I would like to be able to do a checkin from a shell script. I know the current xmlrpc checkin endpoint is there for compatibility with existing report_mirror checkins, but it would be nice to have something that's a bit simpler to use.

I propose adding two API endpoints to api.py: checkin-json and checkin-text. These would take JSON or a text file and parse it into a structure suitable for passing to read_host_config. Compressed endpoints could be added, too. The JSON should be trivial to generate from the regular report_mirror data structure that gets pickled. The text file should be easy to generate from a shell script.

Given that you really shouldn't unpickle untrusted data, I figure this is something you wouldn't mind adding. I think doing this should be pretty easy for me, but I'm not sure how I'm going to test it. Any hints as to how you'd want to see this done would be appreciated, of course.
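The core of the proposed checkin-json handler could be as small as this; the function name is hypothetical, and which keys read_host_config requires is part of what would need documenting:

```python
import json

def parse_checkin_json(payload):
    """Decode a checkin-json payload into a config dict for read_host_config."""
    config = json.loads(payload)
    if not isinstance(config, dict):
        raise ValueError('checkin payload must be a JSON object')
    return config
```

Unlike the pickle-based endpoint, a malformed or hostile payload here can at worst raise a parse error, never execute code.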
