Comments (4)

andk commented on July 16, 2024

On Sun, 22 Jan 2012 05:12:43 -0800, Jeffrey Ryan Thalhammer [email protected] said:

I'd like to have an additional field in the
02packages.details.txt.gz file that reflects the mtime of the
distribution file (or perhaps the module file). My reason is this...

Given the indexes from two different repositories, I need to
determine which one contains the latest version of any given
package. If the version numbers are the same, the only way to
decide is based on the mtime of either the distribution or the
module. As I understand the code, this is how PAUSE decides what the
"latest" is.

Most code that parses the index files will probably just ignore an
extra field (after the dist path). So hopefully, this change will be
relatively safe.

What do you think?

Would you mind telling me more about the plan/purpose? I'd hope I can
make a more reasonable suggestion when I know what you're intending to
achieve. Historical analysis? Comparing any two running servers for
freshness? Staleness statistics? Why is the timestamp in the header of
the file not sufficient? Which errors are you trying to guard against?
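For comparing two servers for freshness, the header timestamp mentioned above can be read directly from the index. This is a minimal Python sketch, assuming the index header carries a `Last-Updated:` line with an RFC 2822 style date and that the header ends at the first blank line. The function name `last_updated` is made up for illustration.

```python
from datetime import datetime
from email.utils import parsedate_to_datetime
from typing import Optional


def last_updated(index_text: str) -> Optional[datetime]:
    """Return the Last-Updated header of an index as a datetime, if present."""
    for line in index_text.splitlines():
        if not line.strip():
            break  # the first blank line ends the header
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "last-updated":
            return parsedate_to_datetime(value.strip())
    return None
```

This only says when the whole index was written, not when any single distribution was uploaded, which is the gap the request above is about.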

andreas

from pause.

thaljef commented on July 16, 2024

You're absolutely right. Please forgive me for jumping to solutions.

After giving it some more thought, I have concluded that my problem isn't with PAUSE. Rather, my problem is a symptom of the way I've abused the CPAN architecture.

As I see it, the design of CPAN (and its toolchain) assumes there is only one repository. There are many physical mirrors, but it is understood that they all come from a single logical source.

But tools like pinto and cpanminus are trying to take a different view. They want to create a world where there are many CPAN-like repositories, and each may contain completely different packages.

Fitting this multi-repository model onto the existing CPAN architecture creates some problems. One challenge is figuring out which repository has the "latest" version of any given package. If a package is available on multiple repositories, the version number isn't enough information to decide which distribution to fetch. When two packages have the same version, then you also need to know which distribution is the "latest".

Internally, I believe PAUSE has a similar problem. When PAUSE decides which packages to put in the index, it also looks at the mtime of the distribution file (or maybe it is the module file). The mtime is used to resolve the "latest" package in cases where there are two instances of a package that have the same version number. At least, that's my understanding of the code.
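The tie-break described above can be sketched as follows. This is an illustration in Python, not PAUSE's actual code: it assumes versions are plain decimal strings such as `1.02` (real Perl version rules are more involved), treats anything unparsable such as `undef` as zero, and uses a made-up `Candidate` record with a repository label and an mtime in epoch seconds.

```python
from decimal import Decimal, InvalidOperation
from typing import Iterable, NamedTuple


class Candidate(NamedTuple):
    repo: str     # hypothetical repository label
    version: str  # version string as it appears in the index
    mtime: int    # modification time in epoch seconds


def version_key(version: str) -> Decimal:
    """Simplified numeric view of a version; unparsable values sort lowest."""
    try:
        return Decimal(version)
    except InvalidOperation:
        return Decimal(0)


def pick_latest(candidates: Iterable[Candidate]) -> Candidate:
    """Prefer the higher version; on equal versions, prefer the newer mtime."""
    return max(candidates, key=lambda c: (version_key(c.version), c.mtime))
```

The mtime only matters when the versions compare equal, which matches the case described in the paragraph above.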

So I thought it might make sense to include the mtime in PAUSE's index file, thereby allowing pinto and cpanminus to make a similar decision when they examine multiple repositories. If PAUSE included the mtime in its index, I think it could help pave the way toward a multi-repository world. I realize this isn't really PAUSE's concern, though.
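To make the idea concrete, here is a rough Python sketch of a parser that reads the three existing columns (package name, version, distribution path) and tolerates an optional trailing field. The fourth field and its epoch-seconds format are hypothetical; nothing like it exists in the real index, and the names `IndexEntry` and `parse_index` are invented for this example.

```python
from typing import List, NamedTuple, Optional


class IndexEntry(NamedTuple):
    package: str
    version: str          # kept as a string; the index uses "undef" when unknown
    dist_path: str
    mtime: Optional[int]  # hypothetical extra field, not part of the real index


def parse_index(text: str) -> List[IndexEntry]:
    """Parse a 02packages.details.txt-style index (already decompressed)."""
    # Assumes the header and the body are separated by the first blank line.
    _header, _, body = text.partition("\n\n")
    entries = []
    for line in body.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        has_mtime = len(fields) > 3 and fields[3].isdigit()
        mtime = int(fields[3]) if has_mtime else None
        entries.append(IndexEntry(fields[0], fields[1], fields[2], mtime))
    return entries
```

A parser that only looks at the first three whitespace-separated fields, as this one does for lines without an mtime, would keep working if such a field were ever added.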

Thanks for prodding me into thinking this through. If you have any insights on this sort of stuff, I'd love to hear them.

-Jeff

andk commented on July 16, 2024

On Mon, 23 Jan 2012 11:29:48 -0800, Jeffrey Ryan Thalhammer [email protected] said:

You're absolutely right. Please forgive me for jumping to solutions.

After giving it some more thought, I have concluded that my problem
isn't with PAUSE. Rather, my problem is a symptom of the way I've
abused the CPAN architecture.

As I see it, the design of CPAN (and its toolchain) assumes there is
only one repository. There are many physical mirrors, but it is
understood that they all come from a single logical source.

Correct, the "C" stood for something;)

But tools like pinto and cpanminus are trying to take a different
view. They want to create a world where there are many CPAN-like
repositories, and each may contain completely different packages.

Hey, the UPAN? The Un-CPAN? I always felt so sorry for the PHP people
who have to live with multiple concurrent repos and the uncertainty
coming from them.

Fitting this multi-repository model onto the existing CPAN
architecture creates some problems. One challenge is figuring out
which repository has the "latest" version of any given package. If a
package is available on multiple repositories, the version number
isn't enough information to decide which distribution to fetch. When
two packages have the same version, then you also need to know which
distribution is the "latest".

You also need to know which version is authorized by the committer
and/or the author. Which versions are just experiments. Maybe once you
get a hold on all the other aspects that need to be tracked and somehow
presented, then the release timestamp will become less of a concern?

Internally, I believe PAUSE has a similar problem. When PAUSE
decides which packages to put in the index, it also looks at the
mtime of the distribution file (or maybe it is the module file). The
mtime is used to resolve the "latest" package in cases where there
are two instances of a package that have the same version number. At
least, that's my understanding of the code.

Yes, I seem to recall that there is a timestamp check involved.

So I thought it might make sense to include the mtime in PAUSE's
index file, thereby allowing pinto and cpanminus to make a similar
decision when they examine multiple repositories. If PAUSE included
the mtime in its index, I think it could help pave the way toward a
multi-repository world. I realize this isn't really PAUSE's
concern, though.

I had considered the question whether the timestamp should be presented
somewhere before, but I always dismissed it because it's so conveniently
accessible via rsync anyway. And with rsync you get much more than the
timestamp, you get a whole copy to examine the code for other hints
about the (more or less volatile) relation to repositories, etc.
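As an illustration of reading timestamps from a mirrored copy, here is a small Python sketch. It assumes the mirror was fetched with an rsync option that preserves modification times (`-t`, or `-a` which implies it) and that distributions live under `authors/id` below the mirror root. The function name `dist_mtimes` is invented for this example.

```python
from pathlib import Path
from typing import Dict


def dist_mtimes(mirror_root: str) -> Dict[str, int]:
    """Map each file under authors/id to its mtime in epoch seconds."""
    base = Path(mirror_root) / "authors" / "id"
    result = {}
    for path in base.rglob("*"):
        if path.is_file():
            # Keys look like the dist path column of the index.
            result[path.relative_to(base).as_posix()] = int(path.stat().st_mtime)
    return result
```

The keys can then be matched against the distribution paths in the index, at the cost of holding a full local copy.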

Thanks for prodding me into thinking this through. If you have any
insights on this sort of stuff, I'd love to hear them.

To get a super efficient rsynced copy of the CPAN, you must not miss
File::Rsync::Mirror::Recent. You're never more than a few seconds behind
PAUSE. Please check it out if you have missed it so far.

Please keep me posted about the project, I do not want to miss the next
big thing in the Perl community.

Good luck!

andreas

thaljef commented on July 16, 2024

On Mon, Jan 23, 2012 at 2:16 PM, andk <[email protected]> wrote:

Hey, the UPAN? The Un-CPAN? I always felt so sorry for the PHP people
who have to live with multiple concurrent repos and the uncertainty
coming from them.

Yes, this is a bit of a double-edged sword. On one hand, I think much of
CPAN's success comes from its singularity. On the other hand, most of the
Perl code in the universe will never be in the CPAN. My goal is to bring
the *PAN model to all that wild code. But bridging between the CPAN's
centralized design and the distributed world is a challenge.

Fitting this multi-repository model onto the existing CPAN
architecture creates some problems.

You also need to know which version is authorized by the committer
and/or the author. Which versions are just experiments. Maybe once you
get a hold on all the other aspects that need to be tracked and somehow
presented, then the release timestamp will become less of a concern?

Yes, the timestamp issue is probably just the tip of the iceberg. There
are surely lots of other problems in a distributed model. Enforcing
ownership of namespaces and resolving namespace conflicts are some of the
other problems that come to mind. I don't know how to solve these yet.

I had considered the question whether the timestamp should be presented
somewhere before, but I always dismissed it because it's so conveniently
accessible via rsync anyway. And with rsync you get much more than the
timestamp, you get a whole copy to examine the code for other hints
about the (more or less volatile) relation to repositories, etc.

With pinto and cpanm, the goal is not to just replicate a repository.
Instead, the goal is to search through several repositories and find
something we want. So in that context, I think having some metadata about
the contents of each repository is really useful.

The META.(json|yml) files might be useful, but those seem so
unpredictable. I'll have to give that more thought.

Please keep me posted about the project, I do not want to miss the next
big thing in the Perl community.

Will do. And thanks for all you do to keep the Perl world running smoothly.

-Jeff
