
Comments (31)

mojavelinux avatar mojavelinux commented on May 15, 2024 1

I had fooled myself into thinking this had to do with orphaned branches because the master and develop branches were pointing to the same commit. I see now that it has to do with any branches that aren't the same.

It makes sense to me that you only fetch branches on demand. That's often what you want. What I'm trying to do is grab content out of any branches that match a pattern (e.g., v*). So I need a way to get a list of remote branches so I know which ones to fetch. Currently, the listBranches function only looks for heads. Could you add a function that lists remote branches (or all branches?).

Then I'd be able to iterate over the branches that match the filter and collect the files.

The workaround is to dive into .git/refs/remotes/origin/ myself to find matching branches, but it's a little crowded in there, so an API method would be much nicer.
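
The filtering step could be sketched roughly as follows, assuming the branch names have already been collected somehow (for example from the directory listing mentioned above). The helpers globToRegExp and filterBranches are hypothetical names for illustration, not part of isomorphic-git, and only the `*` wildcard is handled.

```javascript
// Hypothetical helpers for illustration; not part of isomorphic-git.

// Convert a simple glob (only `*` is supported) into an anchored RegExp.
// `*` matches any run of characters except `/`.
function globToRegExp (pattern) {
  const escaped = pattern
    .split('*')
    .map((part) => part.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'))
    .join('[^/]*')
  return new RegExp('^' + escaped + '$')
}

// Keep the branch names that match at least one of the patterns.
function filterBranches (branchNames, patterns) {
  const matchers = patterns.map(globToRegExp)
  return branchNames.filter((name) => matchers.some((re) => re.test(name)))
}

const names = ['master', 'develop', 'v1.x', 'v2.x', 'gh-pages']
console.log(filterBranches(names, ['v*', 'master'])) // [ 'master', 'v1.x', 'v2.x' ]
```

The matching names could then be passed to fetch one at a time.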

from isomorphic-git.

mojavelinux avatar mojavelinux commented on May 15, 2024 1

While we're on the topic of cloning/fetching/checking out, is there a way to suppress the progress messages written to stdout?

Computing CRCs
Resolving deltas
percent2	milliseconds2	callsToReadSlice	callsToGetExternal
0%	0	0	0
1%	17	3	0
2%	61	3	0
3%	26	2	0
4%	25	4	0
5%	4	5	0
6%	6	3	0
7%	31	5	0
8%	28	5	0
9%	34	4	0
10%	10	6	0
...
100%	0	2	0
hash	readSlice	offsets	crcs	sort	misc
64	608	2819	10	0	15
by depth:
0	1	2	3	4	5	6	7	8	9	10	11
0	129	68	26	6	0	0	0	0	0	0	0
0	203	320	75	8	0	0	0	0	0	0	0


mojavelinux avatar mojavelinux commented on May 15, 2024 1

I would also study got. It's very well done.


billiegoose avatar billiegoose commented on May 15, 2024 1

A remote option for resolveRef;

Ooh! That's a great idea. That way listBranches and resolveRef work exactly the same way, and it elevates 'remote' to a common abstraction. You won't have to be aware of the filesystem implementation (refs/remotes/${remote}/branch). And now I see I can fix git.checkout by changing it to checkout local branches by default instead of 'origin' by default. (A design flaw on my part that made checking out local branches impossible - oops!)


billiegoose avatar billiegoose commented on May 15, 2024 1

One way is to get a list of branch names in the repository, filter them, then tell fetch to retrieve them.

Another (perhaps alternate) option would be just to have fetch take a collection of include patterns. That's what I'm essentially doing anyway. Something like: include: ['v*', 'master'].

I'll probably end up providing both. Some way to do remote reference discovery that essentially exposes GitRemoteHTTP.discover() to get the capabilities and references of a remote, and modifying "git.fetch" to accept an array of refspec patterns. So it'll probably end up looking a little more verbose, like refspec: ['+refs/heads/master:refs/remotes/origin/master', '+refs/heads/v*:refs/remotes/origin/v/*']. But it will also read the refspecs from .git/config files, which means it'll understand if you've configured your branch to pull from a different remote than 'origin' by default, and things like that.
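
To make the refspec idea concrete, here is a minimal sketch of how a single refspec maps a remote ref name onto a local one. The function applyRefspec is a hypothetical helper written for this illustration, not an isomorphic-git API, and it handles only one `*` wildcard per side.

```javascript
// Hypothetical helper for illustration; not part of isomorphic-git.
// Applies one refspec of the form [+]<src>:<dst> to a remote ref name.
// Returns null when the ref does not match the source side.
function applyRefspec (refspec, remoteRef) {
  const force = refspec.startsWith('+') // a leading + allows non-fast-forward updates
  const [src, dst] = (force ? refspec.slice(1) : refspec).split(':')
  const star = src.indexOf('*')
  if (star === -1) {
    // No wildcard: the source must match exactly.
    return src === remoteRef ? { force, localRef: dst } : null
  }
  const prefix = src.slice(0, star)
  const suffix = src.slice(star + 1)
  if (remoteRef.length < prefix.length + suffix.length) return null
  if (!remoteRef.startsWith(prefix) || !remoteRef.endsWith(suffix)) return null
  // Whatever the wildcard matched is substituted into the destination.
  const matched = remoteRef.slice(prefix.length, remoteRef.length - suffix.length)
  return { force, localRef: dst.replace('*', () => matched) }
}

console.log(applyRefspec('+refs/heads/*:refs/remotes/origin/*', 'refs/heads/develop'))
console.log(applyRefspec('+refs/heads/master:refs/remotes/origin/master', 'refs/heads/develop'))
```

The first call maps refs/heads/develop to refs/remotes/origin/develop; the second returns null because the source side names a different branch.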


billiegoose avatar billiegoose commented on May 15, 2024

Somewhat intentionally, isomorphic-git doesn't grab all the git history when it does a clone; it just grabs the history of one branch at a time. In this case "fetch" just grabbed "master" since that's the default for this repo. This is my way of keeping people from shooting themselves in the foot by cloning megabytes more than they need to. Maybe I need to rethink that and go with the more generous "if no branch is specified fetch ALL the branches" instead of just fetching the default? Or maybe do shallow clones of all the branches by default? Or simply document the current behavior with a big bold "notice" in the documentation and mention in the README that this behavior differs from canonical git? What do you think?

Edit: since the error message is unhelpful, I've labeled this a bug. isomorphic-git needs to be smart enough to realize what has happened and say something like "Error: Tried to checkout a branch that isn't available locally - do git.fetch({ref: 'gh-pages'}) to make the branch available locally"

As for this particular case, here's how you would do it using "fetch":

const git = require('isomorphic-git')
const fs = require('fs')

;(async () => {
  const dir = 'isogit'

  await git.init({
    fs,
    dir
  })

  await git.config({
    fs,
    dir,
    path: 'remote.origin.url',
    value: 'https://github.com/isomorphic-git/isomorphic-git.git'
  })

  await git.fetch({
    fs,
    dir,
    remote: 'origin',
    ref: 'gh-pages'
  })

  await git.checkout({
    fs,
    dir,
    remote: 'origin',
    ref: 'gh-pages',
  })
})()

but that can be simplified to:

const git = require('isomorphic-git')
const fs = require('fs')
;(async () => {
  await git.clone({
    fs,
    dir: 'isogit',
    url: 'https://github.com/isomorphic-git/isomorphic-git.git',
    ref: 'gh-pages'
  })
})()

However, on my Windows machine I'm getting an error saying it cannot create "deployed at Mon Jan 8 01:55:46 UTC 2018 by Deployment Bot (from Travis CI)". Which... is understandable given that colons are not allowed in filenames on Windows. I'll have to see what canonical git does in this situation.


billiegoose avatar billiegoose commented on May 15, 2024

I'll take out the progress messages going to stdout. I'm looking for a good / great way to handle progress messages. Have you seen any APIs that do it really well? I like the simplicity of returning Promises for "git.clone" etc, but returning an EventEmitter would allow a lot more flexibility.
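
One possible middle ground, sketched below, keeps the Promise return value and accepts an optional EventEmitter for progress. Everything here is an assumption for illustration: fakeClone is a stand-in for a long-running operation, and the 'progress' event name and its payload fields are invented, not an existing isomorphic-git API.

```javascript
const { EventEmitter } = require('events')

// Hypothetical stand-in for a long-running operation such as a clone.
// It still returns a Promise, but reports progress on an optional emitter
// instead of writing to stdout.
async function fakeClone ({ steps = 4, emitter } = {}) {
  for (let loaded = 1; loaded <= steps; loaded++) {
    await new Promise((resolve) => setImmediate(resolve)) // simulate async work
    if (emitter) emitter.emit('progress', { phase: 'Resolving deltas', loaded, total: steps })
  }
  return { done: true }
}

// Callers who want progress attach a listener; everyone else just awaits.
const progress = new EventEmitter()
progress.on('progress', ({ phase, loaded, total }) => {
  console.log(phase + ': ' + loaded + '/' + total)
})
fakeClone({ emitter: progress }).then(() => console.log('finished'))
```

Callers who don't pass an emitter get no output at all, which also covers the request to keep stdout quiet.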


mojavelinux avatar mojavelinux commented on May 15, 2024

Here's the script I came up with to grab information from the package.json file from each reference in this repository and report information about it (in this case, the version).

const fs = require('fs')
const git = require('isomorphic-git')
const path = require('path')

;(async () => {
  const url = 'https://github.com/isomorphic-git/isomorphic-git.git'
  const dir = 'isogit'
  const originRefsDir = path.join(dir, '.git/refs/remotes/origin')

  await git.fetch({ fs, dir, url })

  // QUESTION is there a better way to get the HEAD/main branch?
  const mainBranchInfo = fs.readFileSync(path.join(originRefsDir, 'HEAD'), 'utf8')
  const mainBranchName = mainBranchInfo.slice(mainBranchInfo.lastIndexOf('/') + 1, mainBranchInfo.length).trim()

  const branchNames = fs.readdirSync(originRefsDir).filter((ref) => ref.charAt(0) === 'v' && !ref.endsWith('^{}'))

  const data = []
  const errors = []

  for (let i = 0, len = branchNames.length; i < len; i++) {
    const branchName = branchNames[i]
    if (branchName !== mainBranchName) await git.fetch({ fs, dir, url, ref: branchName })
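    try {
      await git.checkout({ fs, dir, remote: 'origin', ref: branchName })
      // Read package.json only after a successful checkout, so a failed
      // checkout never reports the previous branch's version.
      data.push(JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf8')).version)
    } catch (e) {
      errors.push(e)
    }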
  }

  data.forEach((it) => console.log(it))
})()

There are two things I'd like to know.

  • Is there a better way to figure out which branch HEAD points to after the initial fetch?
  • How do I tell the difference between a branch and a tag?


mojavelinux avatar mojavelinux commented on May 15, 2024

Have you seen any APIs that do it really well?

nodegit seems to do a decent job. I've been able to use those hooks in the past to create a progress bar.


billiegoose avatar billiegoose commented on May 15, 2024

Is there a better way to figure out which branch HEAD points to after the initial fetch?

Not currently. I've been meaning to expose a resolveRef function that can be used to do that. I could also put that value in the return value from clone. It would be good for clone to return some metadata.

How do I tell the difference between a branch and a tag?

Tags are stored in .git/refs/tags instead of .git/refs/heads. I haven't made the helper functions for it yet.
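
Until such helpers exist, telling the ref kinds apart comes down to checking the prefix of the fully qualified ref name. A minimal sketch, where classifyRef is a hypothetical helper and not part of the library:

```javascript
// Hypothetical helper for illustration; not part of isomorphic-git.
// Classifies a fully qualified ref name by its standard git prefix.
function classifyRef (ref) {
  const heads = 'refs/heads/'
  const tags = 'refs/tags/'
  const remotes = 'refs/remotes/'
  if (ref.startsWith(heads)) return { type: 'branch', name: ref.slice(heads.length) }
  if (ref.startsWith(tags)) return { type: 'tag', name: ref.slice(tags.length) }
  if (ref.startsWith(remotes)) {
    // The first path segment after refs/remotes/ is the remote's name.
    const [remote, ...rest] = ref.slice(remotes.length).split('/')
    return { type: 'remote-branch', remote, name: rest.join('/') }
  }
  return { type: 'other', name: ref }
}

console.log(classifyRef('refs/tags/v1.0.0'))
console.log(classifyRef('refs/remotes/origin/develop'))
```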


mojavelinux avatar mojavelinux commented on May 15, 2024

I could also put that value in the return value from clone.

+1 I was thinking that too.

Tags are stored in .git/refs/tags instead of .git/refs/heads. I haven't made the helper functions for it yet

But only after you fetch it, right? As you can see from my script, I can't tell before I fetch whether I'm fetching a tag or a branch. I'd like to exclude tags.


billiegoose avatar billiegoose commented on May 15, 2024

OK, it's weird, but tags aren't associated with a remote AFAICT. So right now, you should be fine because you shouldn't find any tags in .git/refs/remotes/origin...

Oh. Huh. Well, that's a bug. Canonical git doesn't do that. Mine is accidentally dumping tags in there. It should be dumping them in .git/refs/tags.


mojavelinux avatar mojavelinux commented on May 15, 2024

👍


billiegoose avatar billiegoose commented on May 15, 2024

Are there any loose ends to tie up here?


mojavelinux avatar mojavelinux commented on May 15, 2024

I'll refactor the code based on the latest master and see what's still sticking out. Stay tuned.


mojavelinux avatar mojavelinux commented on May 15, 2024

We're looking pretty good! Here's how the updated code looks (which reports the package version for each branch):

const fs = require('fs-extra')
const git = require('isomorphic-git')
const path = require('path')

;(async () => {
  const url = 'https://github.com/isomorphic-git/isomorphic-git.git'
  const dir = 'isogit'
  const depth = 1
  const repo = { fs, dir }
  const originBaseRef = 'refs/remotes/origin/'

  if (process.argv[2] === '--clean') await fs.remove(dir)

  await fs.pathExists(dir)
    .then((exists) => exists ? undefined : git.fetch({ ...repo, url, depth }))

  const defaultBranchName =
    (await git.resolveRef({ ...repo, ref: originBaseRef + 'HEAD', depth: 2 }))
    .replace(originBaseRef, '')

  const branchNames = (await git.listBranches({ ...repo, remote: 'origin' }))
    .filter((name) => name !== 'HEAD')

  async function isBranchFetched(branchName) {
    return git.readObject({ ...repo, oid: (await git.resolveRef({ ...repo, ref: originBaseRef + branchName })) })
      .then(() => true)
      .catch(() => false)
  }

  async function extractVersion(branchName) {
    const sha = await git.resolveRef({ ...repo, ref: originBaseRef + branchName })
    const { object: { tree } } = await git.readObject({ ...repo, oid: sha })
    const { object: { entries } } = await git.readObject({ ...repo, oid: tree })
    const packageEntry = entries.find((entry) => entry.path === 'package.json')
    if (packageEntry) {
      const { object: pkg } = await git.readObject({ ...repo, oid: packageEntry.oid })
      return JSON.parse(pkg.toString('utf8')).version
    }
  }

  const data = []

  for (let i = 0, len = branchNames.length; i < len; i++) {
    const branchName = branchNames[i]
    if (branchName !== defaultBranchName && !(await isBranchFetched(branchName))) {
      await git.fetch({ ...repo, url, ref: branchName, depth })
    }
    const version = await extractVersion(branchName)
    if (version) {
      data.push(branchName + ': ' + version)
    } else {
      console.log('package.json not found in branch: ' + branchName)
    }
  }

  data.forEach((it) => console.log(it))
})()

First, let me just say that the fetch depth is a killer feature. nodegit is lacking that, and it's sorely needed.

Here's my wishlist:

  • A way to check if a branch has been fetched. You can see my isBranchFetched method is filling in this logic.
  • An exclude filter for listBranches (in this case, I want to exclude HEAD, but this could be generally useful)
  • A remote option for resolveRef; you can see I have to prepend refs/remotes/origin/ to the branch name when working with remote branches (branches that haven't been checked out)
  • I'm not convinced that format: 'content' is working correctly for readObject when reading a blog. It doesn't seem to be any different than format: 'parsed'. What should I expect await git.readObject({ ...repo, oid: packageEntry.oid, format: 'content' }) to return?
  • When reading the entries, it might be nice to support a start path (as in read all files that descend from a given folder) or an explicit path (e.g., tree.entry('package.json'))

Even without these improvements, I can already see this library standing shoulder-to-shoulder with nodegit, which is damn exciting.


mojavelinux avatar mojavelinux commented on May 15, 2024

Here's an equivalent of this script written using nodegit.

const fs = require('fs-extra')
const git = require('nodegit')
const path = require('path')

;(async () => {
  const url = 'https://github.com/isomorphic-git/isomorphic-git.git'
  const dir = 'isogit/.git'
  const fetchOpts = { callbacks: { certificateCheck: () => 1 } }

  if (process.argv[2] === '--clean') await fs.remove(path.dirname(dir))

  const repo = await fs.pathExists(dir)
    .then((exists) => exists ? git.Repository.open(dir) : git.Clone.clone(url, dir, { bare: 1, fetchOpts }))

  const refs = await repo.getReferences(git.Reference.TYPE.OID)
  const branchNames = refs.reduce((accum, ref) => {
    const segments = ref.name().split('/')
    if (segments[1] === 'remotes' && segments[2] === 'origin') accum.push(segments.slice(3).join('/'))
    return accum
  }, [])

  async function extractVersion(branchName) {
    const tree = await repo.getBranchCommit('origin/' + branchName).then((commit) => commit.getTree())
    try {
      const packageBlob = await tree.getEntry('package.json').then((entry) => entry.getBlob())
      return JSON.parse(packageBlob.content().toString('utf8')).version
    } catch (e) {}
  }

  const data = []

  for (let i = 0, len = branchNames.length; i < len; i++) {
    const branchName = branchNames[i]
    const version = await extractVersion(branchName)
    if (version) {
      data.push(branchName + ': ' + version)
    } else {
      console.log('package.json not found in branch: ' + branchName)
    }
  }

  data.forEach((it) => console.log(it))
})()


billiegoose avatar billiegoose commented on May 15, 2024

Amazing, amazing feedback. I'm going to respond piecemeal because I'm on the road today and replying from my phone, so I apologise in advance.

A way to check if a branch has been fetched. You can see my isBranchFetched method is filling in this logic.

This is trickier than it sounds. The way you're doing it is the most robust: there are a number of edge cases where the object you want to read might not be available, and catching that error is a sound approach. Maybe the branch is only fetched to a certain depth, or maybe it was fetched but there was a force push to the remote meanwhile.

I'm thinking what would actually speed up this code is to fetch all the branches at once. Then you'd only have to make one or two HTTP requests, and only one or two packfiles instead of one per branch (which I assume is what happens when your code runs? check .git/objects/pack and let me know, I never actually tested with multiple packfiles in a single repo IIRC). So either I should add a way to fetch all the branches up front, or make that the default behavior, or allow specifying a list of branches to fetch instead of just one, or some combination of all those ideas. But fetching multiple branches at once should cut the number of HTTP requests from O(n) in the number of branches to O(1).


billiegoose avatar billiegoose commented on May 15, 2024

when reading a blog [sic]. It doesn't seem to be any different than format: 'parsed'.

I was planning to return a BlobDescription of some sort, but that's the thing about blobs: they've got no metadata. All the metadata (filename, executable bit, etc) is in the tree.


billiegoose avatar billiegoose commented on May 15, 2024

When reading the entries, it might be nice to support a start path (as in read all files that descend from a given folder) or an explicit path (e.g., tree.entry('package.json'))

I agree it's a little inconvenient to use Array.find or a for loop through the list to get the entries you care about, but I don't want to trade away simplicity for convenience. What's nice about TreeDescription and CommitDescription is they are just data structures. They're not strictly JSON because Buffer isn't a valid JSON object, but they are "structured clone"-able so you should be able to copy them and send them with postMessage style APIs. That makes them simple to work with and understand, which makes it simple for users to make their own convenience functions.

I might be persuaded differently later, but right now it seems much easier to say "tree.entries is an array of objects that have a type, a path, a mode, and an oid property" than to say "tree is an object with methods" and then have to document all the methods, and then deal with how you serialize the objects, and users who want to subclass them, etc.
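
As an illustration of that trade-off, convenience functions can be layered on top of the plain data in a few lines. The sketch below uses a hand-written object in the shape described above; the oids and modes are made-up placeholders, and findEntry and isExecutable are hypothetical helpers, not library functions.

```javascript
// Plain data in the shape described above. The oids are made-up placeholders
// and the mode strings are assumed values for illustration.
const tree = {
  entries: [
    { mode: '100644', path: 'package.json', oid: '1111111111111111111111111111111111111111', type: 'blob' },
    { mode: '040000', path: 'src', oid: '2222222222222222222222222222222222222222', type: 'tree' },
    { mode: '100755', path: 'build.sh', oid: '3333333333333333333333333333333333333333', type: 'blob' }
  ]
}

// Hypothetical convenience functions written by the user, not the library.
const findEntry = (treeDescription, path) =>
  treeDescription.entries.find((entry) => entry.path === path)
const isExecutable = (entry) => entry.type === 'blob' && entry.mode === '100755'

// This tree holds no Buffer, so it even survives a JSON round trip unchanged.
const copy = JSON.parse(JSON.stringify(tree))
console.log(findEntry(copy, 'package.json').oid)
console.log(copy.entries.filter(isExecutable).map((entry) => entry.path))
```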


mojavelinux avatar mojavelinux commented on May 15, 2024

I was planning to return a BlobDescription of some sort, but that's the thing about blobs: they've got no metadata. All the metadata (filename, executable bit, etc) is in the tree.

I get that. My point, though, is that when I pass format: 'content', I expect to get back the Buffer only. But I'm getting the description object instead.


mojavelinux avatar mojavelinux commented on May 15, 2024

I agree it's a little inconvenient to use Array.find or a for loop through the list to get the entries you care about, but I don't want to trade away simplicity for convenience.

My concern isn't the convenience factor. I just want to make sure that I'm not having it perform operations it doesn't need to perform. I'm only interested if entry(path) would eliminate overhead. If it's just a convenience function, then I agree it takes away from simplicity in the API.


mojavelinux avatar mojavelinux commented on May 15, 2024

I'm thinking what would actually speed up this code is to fetch all the branches at once. Then you'd only have to make one or two HTTP requests, and only one or two packfiles instead of one per branch (which I assume is what happens when your code runs?

That would be a welcome addition! 👏

What I would like to be able to do is fetch branches that match a pattern in the most efficient way possible.

One way is to get a list of branch names in the repository, filter them, then tell fetch to retrieve them.

Another (perhaps alternate) option would be just to have fetch take a collection of include patterns. That's what I'm essentially doing anyway. Something like: include: ['v*', 'master'].


mojavelinux avatar mojavelinux commented on May 15, 2024

I just realized that TreeDescription#entries only returns a single level. Would it be possible to have it work recursively like git ls-tree -r? In fact, git ls-tree -r also has the ability to hide trees by default (which can be enabled using -t).

Example:

$ git ls-tree -r remotes/origin/develop


billiegoose avatar billiegoose commented on May 15, 2024

Git stores each tree as a separate object, and retrieving objects with readObject should be just as efficient as anything I could implement "internally" in the library. So I don't think there's a performance gain to be had by adding a recursive option. If anything, the optimal perf will be when you can take advantage of use-case-specific knowledge and choose which directories to recurse into (src, doc) and which to skip (tests, dist). And because the worst case performance of a recursive read would be really bad, people would want more features like "ignore rules" and "recursion depth limit". Does the "-r" option still return a flat list or would it return nested lists? If it returns nested lists but only with the "-r" option, how do I describe that in the TypeScript definition file that provides IDE autocompletion for the library? It might snowball endlessly...

But that sucks, right? Because a simple "I want a list of all the files in a git commit" shouldn't be this arduous task that requires reinventing the wheel every time. And heck, I'd use a library that let me list recursively using globbing syntax to match (and exclude!) files and options for recursion depth, and keyword search, and filtering results by file size, and more!

(deep breath)

In the meantime, I think I'll add a section to the README and to the documentation for "cut-and-paste" useful code snippets for tasks that probably shouldn't be in the core of the library, yet are common enough that you shouldn't have to think about how to do them. They'd also double as useful Examples of how to use the core library, and possibly serve a third duty as answers to FAQs.


billiegoose avatar billiegoose commented on May 15, 2024

On a pragmatic note, if you haven't figured out how to recursively list the entries of a tree, let me know. I assume it's trivial but unless I actually work it out I can't be certain.
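
For reference, a recursive listing could be sketched along these lines. It is an untested-against-the-library sketch under stated assumptions: readTree(oid) is a hypothetical async function resolving to { entries: [{ type, path, mode, oid }] } for a tree object (for instance a thin wrapper around readObject), and listFiles and shouldDescend are invented names. An in-memory map stands in for a real repository.

```javascript
// Hypothetical helper for illustration; not part of isomorphic-git.
// `readTree(oid)` is assumed to resolve to { entries: [{ type, path, mode, oid }] }.
// `shouldDescend(path)` lets the caller skip whole directories.
async function listFiles (readTree, oid, { prefix = '', shouldDescend = () => true } = {}) {
  const { entries } = await readTree(oid)
  const files = []
  for (const entry of entries) {
    const fullPath = prefix + entry.path
    if (entry.type === 'tree') {
      if (shouldDescend(fullPath)) {
        // Each subtree is a separate object, so it needs its own read.
        files.push(...await listFiles(readTree, entry.oid, { prefix: fullPath + '/', shouldDescend }))
      }
    } else {
      files.push(fullPath)
    }
  }
  return files
}

// In-memory stand-in for a repository, keyed by made-up oids.
const fakeTrees = {
  root: {
    entries: [
      { type: 'blob', path: 'package.json', oid: 'b1' },
      { type: 'tree', path: 'src', oid: 'src' },
      { type: 'tree', path: 'dist', oid: 'dist' }
    ]
  },
  src: { entries: [{ type: 'blob', path: 'index.js', oid: 'b2' }] },
  dist: { entries: [{ type: 'blob', path: 'bundle.js', oid: 'b3' }] }
}
const readFakeTree = async (oid) => fakeTrees[oid]

listFiles(readFakeTree, 'root', { shouldDescend: (path) => path !== 'dist' })
  .then((files) => console.log(files)) // [ 'package.json', 'src/index.js' ]
```

The result is a flat list of slash-separated paths, similar in spirit to git ls-tree -r, with the caller deciding which directories are worth the extra reads.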


mojavelinux avatar mojavelinux commented on May 15, 2024

I've thought about this more and I actually agree with you. I have a lot more control being able to decide when to descend and when not to. I also don't have to worry about paths being created incorrectly on Windows since I receive them one level at a time.

What might be nice, however, is a tree walker like nodegit's. That would just help manage the recursion, but still give me a callback to decide whether to keep going, stop, or whatever. But, of course, I can implement such a thing in my own application if necessary.


mojavelinux avatar mojavelinux commented on May 15, 2024

The downside of the tree walker in nodegit is that it doesn't allow me to control the level of descent. So it's actually just a more cumbersome way of doing git ls-tree.


billiegoose avatar billiegoose commented on May 15, 2024

Took a while, but the 0.8.0 release fetches all branches by default now when you do a clone (opt out of that behavior by using the singleBranch: true option). I almost have support for providing an array of refs instead of a single ref argument as well, which combined with getRemoteInfo would let you pick out exactly which branches to clone. I think that fetch follows the refspec configuration in .git/config as well, although I need to write some better unit tests to verify that behaves properly.


billiegoose avatar billiegoose commented on May 15, 2024

I decided to rename this issue since it's evolved quite a lot.


mojavelinux avatar mojavelinux commented on May 15, 2024

all branches by default now when you do a clone

I can confirm this is working great.

I almost have support for providing an array of refs instead of a single ref argument as well, which combined with getRemoteInfo would let you pick out exactly which branches to clone.

This would be very nice for large repos.
