Comments (10)

pypingou avatar pypingou commented on July 17, 2024

If this specific commit broke something, then we have a larger problem: before this commit we were trying to use a non-existent dict :-/

from mirrormanager2.

adrianreber avatar adrianreber commented on July 17, 2024

I have added some debug output to the function try_per_file(), which currently has trouble scanning the mirrors. try_per_file() takes a Directory object as its parameter d and loops over the entries of that directory to get all files from the mirrors. Directory has two different attributes that describe the contents of a directory: d.files and d.fileDetails.

d.files is a dict, and we used to loop over its keys. It contains all files of the current directory (or only the newest ones, I think). Your commit changed the loop to iterate over d.fileDetails, which only contains files for which we store md5sums and shasums. As a result, files without checksums are skipped, and no directories other than repodata directories are scanned.

For comparison here is a print of d.files for a non-repodata directory:

{'globus-xio-debuginfo-0-3.2-1.el4.ppc.hdr': {'stat': 1326268238, 'size': '3711'}, 'header.src.info': {'stat': 1326741540, 'size': '0'}, 'globus-xio-gsi-driver-debuginfo-0-2.1-1.el4.ppc.hdr': {'stat': 1326268238, 'size': '2096'}, 'fex-debuginfo-0-1.20100416.2814-2.el4.ppc.hdr': {'stat': 1326669624, 'size': '1514'}, 'libidn2-debuginfo-0-0.8-1.el4.ppc.hdr': {'stat': 1326669624, 'size': '4387'}, 'globus-xio-popen-driver-debuginfo-0-2.2-1.el4.ppc.hdr': {'stat': 1326268238, 'size': '2014'}, 'header.info': {'stat': 1326741540, 'size': '5326'}, 'nwipe-debuginfo-0-0.06-2.el4.ppc.hdr': {'stat': 1326741540, 'size': '2274'}, 't1lib-debuginfo-0-5.0.2-2.ppc.hdr': {'stat': 1326268238, 'size': '1886'}, 'globus-xio-pipe-driver-debuginfo-0-2.1-1.el4.ppc.hdr': {'stat': 1326268238, 'size': '1591'}}

and d.fileDetails:

[]

the same for a repodata directory:

{'20050ef902df2efc7a43e025b7ce4042a802eb31-primary.xml.gz': {'stat': 1330563141, 'size': '7222'}, 'updateinfo.xml.gz': {'stat': 1330563141, 'size': '10536'}, 'd4cfd0a2db6e1cae4d666ebe5063d0635bcaaff3-filelists.xml.gz': {'stat': 1330563141, 'size': '1883'}, '61f86b6cf57723478c8900e9555dce5fb0073b92-primary.sqlite.bz2': {'stat': 1330563141, 'size': '11118'}, 'a52a2bf6a8cffeb37c895f5171f5c6f815ebc327-other.xml.gz': {'stat': 1330563141, 'size': '9797'}, '4fd7be747395e68e5f50a032feb9b7960f31ad17-filelists.sqlite.bz2': {'stat': 1330563141, 'size': '3318'}, 'repomd.xml': {'stat': 1330563141, 'size': '2918'}, '736468f6f034c65b5e418206d02f638bcb5f6eec-other.sqlite.bz2': {'stat': 1330563141, 'size': '13287'}}

and d.fileDetails:

[<mirrormanager2.lib.model.FileDetail object at 0x2a831450>]

So the original code which looped over the keys of d.files was doing the right thing (for most of the cases).
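To make the difference concrete, here is a minimal sketch of the two loops. FakeDirectory and FakeFileDetail are hypothetical stand-ins that I made up for illustration (including the filename attribute); they are not the real mirrormanager2 model classes, and the real try_per_file() does more than collect names.

```python
from dataclasses import dataclass, field


@dataclass
class FakeFileDetail:
    """Hypothetical stand-in for a FileDetail row (a file with checksums)."""
    filename: str


@dataclass
class FakeDirectory:
    """Hypothetical stand-in for the Directory object passed in as d."""
    files: dict = field(default_factory=dict)
    fileDetails: list = field(default_factory=list)


def names_from_files(d):
    # Old behaviour: every key of the d.files dict is a file to check.
    return sorted(d.files)


def names_from_file_details(d):
    # Changed behaviour: only files that have checksum records are seen.
    return sorted(fd.filename for fd in d.fileDetails)


# A non-repodata directory: files are present, but no checksum records.
rpm_dir = FakeDirectory(
    files={
        "header.info": {"stat": 1326741540, "size": "5326"},
        "header.src.info": {"stat": 1326741540, "size": "0"},
    },
)

# A repodata directory: repomd.xml also has a checksum record.
repodata_dir = FakeDirectory(
    files={"repomd.xml": {"stat": 1330563141, "size": "2918"}},
    fileDetails=[FakeFileDetail("repomd.xml")],
)

for d in (rpm_dir, repodata_dir):
    print(names_from_files(d), names_from_file_details(d))
```

With the second loop, the non-repodata directory yields nothing to scan, which matches the empty d.fileDetails printed above.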

pypingou avatar pypingou commented on July 17, 2024

This means d.files isn't the result of a DB mapping but an attribute that is set elsewhere (not my favorite design tbh).

Sounds like you're right and we should revert the commit mentioned above, problem is: I specifically remember that commit fixing a problem we were having (just can't remember which one it was).

mdomsch avatar mdomsch commented on July 17, 2024

It was a hack because millions of directory entries times tens of hundreds
of files was way too slow to query. Hence also the hacks to keep only a few
of the most recently updated files in the directories with lots of RPMs and
.html files.

pypingou avatar pypingou commented on July 17, 2024

On Fri, Sep 18, 2015 at 06:39:43AM -0700, Matt Domsch wrote:

It was a hack because millions of directory entries times tens of hundreds
of files was way too slow to query. Hence also the hacks to keep only a few
of the most recently updated files in the directories with lots of RPMs and
.html files.

I wonder if we could move the hack to be a @property of the object.

Food for thought :)
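A rough sketch of what I mean, not a proposal for the actual model: DirectorySketch, _all_files and MAX_FILES are hypothetical names, and the cap of 10 is an arbitrary assumption. The idea is only that the "keep a few of the most recent files" logic lives in one place on the object instead of being set from outside.

```python
class DirectorySketch:
    """Hypothetical class; it only illustrates the @property idea."""

    MAX_FILES = 10  # assumed cap, not a value taken from the real code

    def __init__(self, all_files):
        # Full mapping of file name -> {'stat': ..., 'size': ...}.
        self._all_files = dict(all_files)

    @property
    def files(self):
        # Return only the most recently updated entries, newest first,
        # ordered by their 'stat' timestamp.
        newest = sorted(
            self._all_files.items(),
            key=lambda item: item[1]["stat"],
            reverse=True,
        )
        return dict(newest[: self.MAX_FILES])


# 25 fake entries whose 'stat' grows with the index.
d = DirectorySketch(
    {"file%d" % i: {"stat": i, "size": "0"} for i in range(25)}
)
print(len(d.files))
```

Callers would keep reading d.files as before; only where the trimming happens changes.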

adrianreber avatar adrianreber commented on July 17, 2024

I had a look at the code and the d.files pickle is filled in by umdl and read by the crawler.

Unfortunately I also do not remember why you changed the code to be like it is now. I remember that it crashed in some cases, but not why. So we need to revert the commit and probably add a check for an empty d.files so that we do not try to loop over it.
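The kind of check I have in mind could look like the sketch below. iter_file_names() is a hypothetical helper, not an existing function in the crawler, and I am assuming that d.files may be missing, None, or an empty dict.

```python
from types import SimpleNamespace


def iter_file_names(d):
    """Yield the file names in d.files, tolerating a missing or empty value."""
    files = getattr(d, "files", None)
    if not files:
        # Covers None, an empty dict, and a missing attribute.
        return
    for name in files:
        yield name


# SimpleNamespace objects stand in for Directory objects here.
empty_dir = SimpleNamespace(files=None)
normal_dir = SimpleNamespace(
    files={"repomd.xml": {"stat": 1330563141, "size": "2918"}}
)

print(list(iter_file_names(empty_dir)))
print(list(iter_file_names(normal_dir)))
```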

pypingou avatar pypingou commented on July 17, 2024

FTR, the original PR: #107

adrianreber avatar adrianreber commented on July 17, 2024

Thanks for finding the original PR. I have not reverted the commits from that PR but changed the code so that it now uses the dict again. I will update this issue tomorrow after having tested my changes.

pypingou avatar pypingou commented on July 17, 2024

@adrianreber if your changes are working, don't forget to make a PR so that we can make a release :)

adrianreber avatar adrianreber commented on July 17, 2024

This has been fixed in #138
