
vcs's Introduction

VCS


An abstraction layer for managing various version control systems from Python.

Introduction

vcs is an abstraction layer over various version control systems. It is designed as a feature-rich Python library with a clean API.

vcs uses Semantic Versioning

Features

  • Common API for SCM backends
  • Lazy fetching of repository data
  • Simple caching mechanism so we don't hit the repo too often
  • Simple commit API
  • Smart and powerful in-memory changesets
  • Working directory support

Documentation

Online documentation for the development version is available at http://packages.python.org/vcs/.

You may also build the documentation yourself: go into docs/ and run:

make html

vcs's People

Contributors

heckj, jonashaag, lukaszb, marcinkuzminski, niedbalski, tefnet, the-tiger


vcs's Issues

Version/tag names

The version number is tracked in {{{ vcs/__init__.py }}}, but tags don't need the full name: the first 3 parts are enough ({{{ 0.1.1 }}} rather than {{{ 0.1.1-beta }}}).

The same goes for the version in {{{ setup.py }}}: the status (alpha/beta/stable) is conveyed by distutils' "classifiers" ({{{Development Status :: 4 - Beta}}} etc.).
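A minimal sketch of how the tag name could be derived from the tracked version, assuming the version is stored as a tuple. The names `VERSION`, `get_version` and `get_tag_name` are hypothetical and are not taken from the vcs source.

```python
# Hypothetical version tuple; the last element is the release status.
VERSION = (0, 1, 1, 'beta')


def get_version(version=VERSION):
    """Return the full version string, e.g. '0.1.1-beta'."""
    base = '.'.join(str(part) for part in version[:3])
    status = version[3:]
    return base + ('-' + status[0] if status else '')


def get_tag_name(version=VERSION):
    """Return the tag name: only the first three parts of the version."""
    return '.'.join(str(part) for part in version[:3])
```

With this layout a release would be tagged with `get_tag_name()`, while the status stays in the classifiers.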


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/2/

Misleading get_changesets

On the base class (no implementation) it is:
{{{
def get_changesets(self, since=None, limit=None):
    """
    Returns all commits since given since parameter. If since is
    None it returns all commits limited by limit, or all commits if
    limit is None.

    @param since: datetime
    @param limit: integer value for limit
    """
    raise NotImplementedError

}}}

while on the hg backend:
{{{
def get_changesets(self, limit=10, offset=None):
    """
    Returns the last n MercurialChangeset objects, as specified by the
    limit attribute; if None is given, the whole list of revisions is
    returned.
    @param limit: int limit or None
    """
    count = self.count()
    offset = offset or 0
    limit = limit or None
    i = 0
    while True:
        if limit and i == limit:
            break
        i += 1
        rev = count - offset - i
        if rev < 0:
            break
        yield self.get_changeset(rev)
}}}

What we need is consistency. I will wait for Marcin to comment on this one: is since needed, or should offset go? I really don't know at the moment.
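One possible way to reconcile the two signatures is sketched below. It is only an illustration, not a decision: it assumes changesets are available as an iterable of `(date, raw_id)` pairs ordered newest first, and it accepts `since`, `limit` and `offset` together. The standalone function and the sample `history` are hypothetical.

```python
from datetime import datetime
from itertools import islice


def get_changesets(changesets, since=None, limit=None, offset=0):
    """Lazily yield changesets, newest first.

    changesets: iterable of (date, raw_id) pairs, newest first (assumption).
    since: only changesets dated at or after this datetime are returned.
    limit/offset: window applied after the `since` filter.
    """
    selected = (cs for cs in changesets if since is None or cs[0] >= since)
    stop = None if limit is None else offset + limit
    return islice(selected, offset, stop)


# Hypothetical sample data, newest first.
history = [
    (datetime(2011, 3, 3), 'c3'),
    (datetime(2011, 2, 2), 'c2'),
    (datetime(2011, 1, 1), 'c1'),
]
```

For example, `list(get_changesets(history, limit=1, offset=1))` returns only the second newest entry.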


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/12/

get_node returns FileNode with wrong path given

For example, vcs has the file 'setup.py' at the repository root. If requested, the git backend would also return this file for the path 'foo/bar/setup.py'.

{{{
import vcs
repo = vcs.get_repo('git', 'vcs')
f = repo.request('setup.py')

last = repo.get_changeset()
f2 = last.get_node('setup.py')
f is f2 # True

# Mercurial backend raises ChangesetError here, as it should

f3 = last.get_node('foo/bar/setup.py')
f is f3 # False
f.content == f3.content # True
}}}

It seems the problem lies in the implementation of GitChangeset's _get_hex_for_path method.
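The expected behaviour can be illustrated with a strict lookup that checks every path component. This is not the actual `_get_hex_for_path` code; it is a sketch that assumes a tree is modelled as nested dicts mapping names to either sub-trees or object ids, and `resolve_path` is a hypothetical helper.

```python
def resolve_path(tree, path):
    """Walk `tree` component by component and return the object id.

    Raises KeyError as soon as any component of `path` is missing, so a
    file at the root is never returned for a longer, non-existent path.
    """
    node = tree
    for part in path.split('/'):
        if not isinstance(node, dict) or part not in node:
            raise KeyError('No such path: %r' % path)
        node = node[part]
    return node


# Hypothetical tree: one file at the root and one package directory.
tree = {'setup.py': 'id-setup', 'vcs': {'__init__.py': 'id-init'}}
```

Here `resolve_path(tree, 'setup.py')` succeeds, while `resolve_path(tree, 'foo/bar/setup.py')` raises `KeyError`.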


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/23/

Move master repo to github.com

We have decided to move the master repo to github.com; it should be available at https://github.com/codeinn/vcs.

This means we need to perform a few small tasks:

  • bookmark default hg branch to git "master" branch
  • push default branch into codeinn/vcs master branch
  • all developers would then fork codeinn/vcs (preferably removing their own copy of vcs at github, if one exists); the exceptions are developers who would still prefer to work with mercurial - bitbucket.org/marcinkuzminski/vcs is the official mercurial master repository for them :)

After this is done I'd like to propose the following flow: the "git-flow" :) For reference see: http://nvie.com/posts/a-successful-git-branching-model/. It is straightforward, really.

For anyone: please try to move issues from bitbucket into github.com/codeinn/vcs/issues. Once done, mark the issue at bitbucket as moved to github (with the URL).

From now on, refer to github issue numbers within commits:

  • "Started working on #XXX - summary of what's going to be done"
  • "Fixed #XXX - summary of what's done"

This task was originally created at bitbucket: https://bitbucket.org/marcinkuzminski/vcs/issue/47/move-master-repo-to-githubcom (the link may stop working in the near future). Here we follow the issue tracker move.

Revision numbers/raw_id/id

We need to clean up changeset id representation. All backends should support integers as revision numbers.

As the docstrings state, revisions should always be integers. On the other hand, "raw_id" should be just the full hash (or simply an integer for subversion, for instance). This would probably need some code refactoring (but "grin raw_id" doesn't output much...).

This would allow one to easily get numbers for git commits, just like calling:
{{{
git log --pretty=oneline | awk '{ print NR-1 ":" $1 }' | sort -rn | awk -F":" '{ print NR-1 ":" $2 }' | sort -rn
}}}
in any git repo.
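The same numbering can be sketched in Python. The function below assumes the commit hashes are already available newest first (the order `git log` prints them in) and assigns 0 to the oldest commit, like the pipeline above. `number_commits` is a hypothetical helper name.

```python
def number_commits(hashes_newest_first):
    """Pair each commit hash with an integer revision number.

    The oldest commit gets 0 and the newest gets len - 1; the result
    keeps the newest-first order of the input.
    """
    total = len(hashes_newest_first)
    return [(total - 1 - position, commit_hash)
            for position, commit_hash in enumerate(hashes_newest_first)]
```

Note that such numbers are only stable as long as history is not rewritten or merged in a different order.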


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/25/

Mercurial Backend: HGPLAIN env var

{{{
def plain(self):
    '''is plain mode active?

    Plain mode means that all configuration variables which affect the
    behavior and output of Mercurial should be ignored. Additionally, the
    output should be stable, reproducible and suitable for use in scripts or
    applications.

    The only way to trigger plain mode is by setting the `HGPLAIN'
    environment variable.
    '''
}}}

This is the mercurial.ui.ui method and its docstring. We should probably set the 'HGPLAIN' environment variable at the mercurial backend's module level, as it would stabilize the output; this could be helpful with workdirs.
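A minimal sketch of what the module-level setting could look like. It assumes the lines sit at the top of the mercurial backend module, before any Mercurial command is run; the exact placement is an assumption.

```python
import os

# As far as the docstring above goes, plain mode is triggered by the
# variable being set at all, so the value itself is arbitrary.
# setdefault() keeps a value the user may already have exported.
os.environ.setdefault('HGPLAIN', '1')
```

Since this changes the process environment, it would also affect any other Mercurial usage within the same process.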


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/28/

Add basic commit API

Status: in-memory-commits are now implemented for both mercurial and git (for about a week now). First docs can be found at http://packages.python.org/vcs/api/backends/index.html#vcs.backends.base.BaseInMemoryChangeset.

Workdirs still require some code to be written, but it should be easier as those are only wrappers around scm commands.

OLD

There should be an interface for simple commits - I'm thinking here about adding/removing FileNodes. What for? Mainly for applications which would like to use an SCM as a data management backend. Example? A web-based wiki where pages are stored within mercurial (like bb's wiki).

So we don't need complex operations - only some basics.

Any suggestions are welcome - this is supposed to land in the 0.2 milestone, or 0.3 if we can't make it before the next bigger release.


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/4/

Deprecate Repository.revision attribute - we should use ids specific to each backend

Currently Repository.get_changeset accepts a revision number (given as an integer), which is generally wrong for both hg and git.

Mercurial can work with that rather easily. However, one may, for instance, push a repository to the server on which vcs handles repository management, then try to use e.g. 10 as a changeset id and get a different result than from the local repository. Integers come with integrity problems; only full hex hashes identify changesets properly.

Git doesn't support integers as commit ids in any way (it is possible to write one's own script, but it would only work properly for new repositories, which is mostly not the case).

That said, we should deprecate integer identification. It is a great feature, but it is just not the proper one for repository management/browsing.

Applications should implement such a feature on their own.


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/43/

Test coverage integration

Thanks to <> (author of django-treebeard) I've realized there is a nice tool (called coverage) which can report the code covered during a python session... This is a proposal and I won't spend more time on it now, but I'd strongly suggest integrating it - tests would definitely benefit from that.

Here is a lightning-fast session output:

{{{

(lukaszb) ~/develop/workspace/vcs > coverage run setup.py nosetests
running nosetests
running egg_info
writing requirements to vcs.egg-info/requires.txt
writing vcs.egg-info/PKG-INFO
writing top-level names to vcs.egg-info/top_level.txt
writing dependency_links to vcs.egg-info/dependency_links.txt
reading manifest file 'vcs.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
writing manifest file 'vcs.egg-info/SOURCES.txt'
running build_ext
vcs/backends/git.py:22: UserWarning: Git backend is only a proof by now
warnings.warn("Git backend is only a proof by now")
..vcs/backends/git.py:141: DeprecationWarning: Repo.tree(sha) is deprecated. Use Repo[sha] instead.
tree = self.repository._repo.tree(hex)

....E..E................................EEE

ERROR: test_branches (tests.test_git.GitRepositoryTest)

Traceback (most recent call last):
File "/Users/lukaszb/develop/workspace/vcs/tests/test_git.py", line 139, in test_branches
self.assertEqual(chset.branch, 'master')
AttributeError: 'GitChangeset' object has no attribute 'branch'

ERROR: test_tags (tests.test_git.GitRepositoryTest)

Traceback (most recent call last):
File "/Users/lukaszb/develop/workspace/vcs/tests/test_git.py", line 147, in test_tags
self.assertTrue(tip in self.repo.tags)
AttributeError: 'GitRepository' object has no attribute 'tags'

ERROR: test_get_backend (tests.test_vcs.VCSTest)

Traceback (most recent call last):
File "/Users/lukaszb/develop/workspace/vcs/tests/test_vcs.py", line 12, in test_get_backend
hg = get_backend('hg')
File "vcs/backends/__init__.py", line 26, in get_backend
if alias not in BACKENDS:
TypeError: argument of type 'NoneType' is not iterable

ERROR: test_get_repo (tests.test_vcs.VCSTest)

Traceback (most recent call last):
File "/Users/lukaszb/develop/workspace/vcs/tests/test_vcs.py", line 22, in test_get_repo
backend = get_backend(alias)
File "vcs/backends/__init__.py", line 26, in get_backend
if alias not in BACKENDS:
TypeError: argument of type 'NoneType' is not iterable

ERROR: test_wrong_alias (tests.test_vcs.VCSTest)

Traceback (most recent call last):
File "/Users/lukaszb/develop/workspace/vcs/tests/test_vcs.py", line 17, in test_wrong_alias
self.assertRaises(VCSError, get_backend, alias)
File "/usr/local/Cellar/python/2.6.5/lib/python2.6/unittest.py", line 336, in failUnlessRaises
callableObj(*args, **kwargs)
File "vcs/backends/__init__.py", line 26, in get_backend
if alias not in BACKENDS:
TypeError: argument of type 'NoneType' is not iterable


Ran 45 tests in 3.501s

FAILED (errors=5)
(lukaszb) ~/develop/workspace/vcs > coverage report -m

Name Stmts Exec Cover Missing

setup 12 9 75% 10-13
tests/__init__ 10 8 80% 47-48
tests/conf 7 7 100%
tests/simplevcs/__init__ 1 1 100%
tests/simplevcs/test_settings 1 1 100%
tests/test_git 81 77 95% 141-143, 241
tests/test_hg 201 199 99% 217, 398
tests/test_nodes 64 63 98% 129
tests/test_utils 14 13 92% 28
tests/test_vcs 19 14 73% 13, 23-26, 29
tests/utils 38 25 65% 22, 27, 33-36, 45-46, 63, 65, 72-77
vcs/__init__ 7 7 100%
vcs/backends/__init__ 16 8 50% 16-19, 27-31
vcs/backends/base 88 55 62% 35, 41, 44, 49, 53, 57, 63, 66, 77, 85-86, 97, 120, 123, 126, 133, 136, 139, 142, 173, 181, 188, 196, 202, 208, 215, 221, 228, 236, 243, 257, 264, 271
vcs/backends/git 164 146 89% 40, 46, 49, 52, 56, 85, 117, 153-154, 163, 171-173, 188, 193, 211, 229, 233
vcs/backends/hg 262 210 80% 36-57, 65-71, 103, 130, 156, 159, 163, 171, 175, 180-184, 215, 228-229, 246, 259, 268, 275, 294, 299, 324-325, 332-333, 339-341, 375
vcs/exceptions 7 7 100%
vcs/nodes 215 166 77% 50, 54-59, 79, 99, 108, 175, 205, 210-212, 217-219, 230-240, 244, 264, 272, 280-282, 307, 327, 332, 352, 374, 405, 414-423, 439
vcs/utils/__init__ 1 1 100%
vcs/utils/diffs 195 21 10% 18-44, 59-71, 74, 82-83, 89-103, 110-133, 139-163, 169-271, 278, 284, 293-363
vcs/utils/imports 12 2 16% 15-26
vcs/utils/lazy 9 9 100%
vcs/utils/paths 10 10 100%
vcs/web/__init__ 1 1 100%
vcs/web/exceptions 3 3 100%
vcs/web/simplevcs/__init__ 0 0 100%
vcs/web/simplevcs/exceptions 4 4 100%
vcs/web/simplevcs/models 38 19 50% 9-10, 18-19, 23, 26, 29, 32-33, 36-37, 40, 43, 46, 49-53
vcs/web/simplevcs/settings 10 9 90% 9
vcs/web/simplevcs/utils 104 31 29% 35-49, 55-60, 66-73, 80-84, 91-92, 95, 99-101, 104-106, 115-128, 135-138, 145-152, 159-162, 168-173, 184-193
vcs/web/simplevcs/views/__init__ 3 3 100%
vcs/web/simplevcs/views/diffs 19 7 36% 51-72
vcs/web/simplevcs/views/hg 36 8 22% 40-91

vcs/web/simplevcs/views/repository 15 6 40% 48-63

TOTAL 1667 1150 68%

}}}


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/10/

add clone point, and turn off update after clone.

A proposal: extend MercurialRepository/GitRepository with a clone point to clone at, and disable the default update to the working copy after cloning. I think the update is a waste of disk space and makes the whole operation longer. I know that you can pass clone_point and the update option to Mercurial's API. I have also seen that a clone point is possible for git, and that the --no-checkout option is the equivalent of no update in Mercurial.

One option is to make
{{{

#!python

def __init__(self, repo_path, create=False, baseui=None, clone_url=None,
             clone_point=None):

}}}

and another

{{{

#!python

def __init__(self, repo_path, create=False, baseui=None,
             {'clone_url':None,'clone_point':None,'update':False}):
}}}

What do you think, Lukasz?


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/29/

create repository recognition method,

Either utils or the base repository should have a smart repository recognition method.
It is needed for get_repo() auto alias filling,
and it would be handy in repo_scanners.

I think it should behave very strictly, i.e. it has to be sure that the given path is one and only one type of repository; if it cannot determine the proper scm, it raises an exception. E.g. a repository under git and hg at the same time should raise an exception.

Based on what hg does, I think we should assume the following:

if dir contains only .hg dir => mercurial
if dir contains only .git dir => git
if dir contains only .bzr dir => bazaar
if dir contains only .svn dir => svn

It should follow symlinks, and (this one is to be decided) should not recurse into directories.
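The rules above could be sketched as follows. This is an illustration only: `detect_scm`, `SCM_DIRS` and `RepositoryRecognitionError` are hypothetical names, and the aliases returned ('hg', 'git', 'bzr', 'svn') are assumptions.

```python
import os

# Marker directory -> backend alias (aliases are assumptions).
SCM_DIRS = {'.hg': 'hg', '.git': 'git', '.bzr': 'bzr', '.svn': 'svn'}


class RepositoryRecognitionError(Exception):
    """Raised when a path is not exactly one kind of repository."""


def detect_scm(path):
    """Return the alias of the single SCM found directly at `path`.

    Symlinks are followed; sub-directories are not searched.
    """
    real_path = os.path.realpath(path)
    found = sorted(alias for marker, alias in SCM_DIRS.items()
                   if os.path.isdir(os.path.join(real_path, marker)))
    if not found:
        raise RepositoryRecognitionError('No repository at %r' % path)
    if len(found) > 1:
        raise RepositoryRecognitionError(
            'Ambiguous repository at %r: %s' % (path, ', '.join(found)))
    return found[0]
```

A caller such as get_repo() could then use the returned alias when none is given explicitly.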


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/33/

Change get_repo behavior,

Currently get_repo raises an exception when more than one backend is found at the given path. I have a proposal to change that behavior, since someone might like to keep one repository in two or more SCMs.

The first proposal is not to raise an exception but to issue a warning and return a list of the repositories found, for example
[MercurialRepository, GitRepository]


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/39/

Mercurial Backend API - remove possibility to pass revision number to get changeset

OK then, it seems we should remove the ability to pass a revision number (integer) to the mercurial backend's repository object. Here is a quote:

{{{
JordiGH> martinknyc: Revision numbers are local to a repository.
martinknyc: It all depends on which order the pulls/pushes happen on each repository.
martinknyc: The hashes are the effectively unique global identifiers.
}}}

I'll wait for Marcin to acknowledge this and, with his approval, will remove this //feature// (or abomination, as it seems ;-))

If we push this, we need to make sure to update the quickstart part of the documentation. Most tests would need an update, too.


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/8/

Vcs raises a wrong RevisionError

I happen to have a revision with the hash 323297680174,
which is an all-digit revision hash. I noticed that MercurialRepository.get_changeset() raises an exception on that revision, but it shouldn't. I blame this part of the code:

{{{

#!python

    elif isinstance(revision, (str, unicode)) and revision.isdigit():
        revision = int(revision)

}}}
get_changeset cannot look this up as a key because it has been converted to an int, so it raises an exception. I think an additional check that the revision's length is 12 would be enough to fix this bug, since I don't believe anyone will generate a 12-digit number of commits.
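The proposed length check could look roughly like this. It is a sketch written for Python 3 (so `str` stands in for `str`/`unicode`), and `normalize_revision` is a hypothetical helper, not the actual get_changeset code.

```python
SHORT_HASH_LENGTH = 12  # length of a short hex id, as in the report above


def normalize_revision(revision):
    """Convert short all-digit strings to int; keep hash-like strings.

    A string of 12 or more digits is assumed to be a hash, because a
    repository with that many commits is not realistic.
    """
    if (isinstance(revision, str) and revision.isdigit()
            and len(revision) < SHORT_HASH_LENGTH):
        return int(revision)
    return revision
```

With this check `'323297680174'` stays a string and can be looked up as a hash, while `'10'` still becomes the integer 10.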


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/9/

Replace request method with get_node method at Repository class

We should deprecate 'request' method in favor of 'get_node' method.

Moreover, we should review the code and implement each method from the Changeset class (if applicable and logical, of course) at the Repository class with an additional 'revision' argument. This way the API would be more straightforward, in my view.

Marcin, please list here what methods we should implement at Repository class. This should be made at BaseRepository so no backend specific modules should be touched. Whole process should be rather painless.

We should deprecate replaced methods in 0.2 and remove old ones at 1.0.
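A deprecation of this kind is usually done with the standard `warnings` module. The sketch below uses a stand-in `Repository` class with a placeholder `get_node`; only the method names `request` and `get_node` come from this issue, everything else is an assumption.

```python
import warnings


class Repository(object):
    """Stand-in class; the real BaseRepository is not reproduced here."""

    def get_node(self, path, revision=None):
        # Placeholder body: a real backend would return a node object.
        return (path, revision)

    def request(self, path, revision=None):
        """Deprecated alias of get_node."""
        warnings.warn('request() is deprecated, use get_node() instead',
                      DeprecationWarning, stacklevel=2)
        return self.get_node(path, revision)
```

`stacklevel=2` makes the warning point at the caller of `request()` rather than at the wrapper itself.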


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/30/

vcs differ generates non-unique html id anchor links

Here's the case.
Imagine that a changeset has changed two files.

objects/files.py
special/files.py

Using vcs.utils.diffs.DiffProcessor.as_html()

will generate the same anchor ids for those files,
so both diff views will have the same anchors,
i.e.

files.py_NEW<line_number>

files.py_OLD<line_number>

for both files, even though they are different files.

The proposed solution is to extend the html templates to use the full path instead of just the file name,
plus an additional prefix if we want to display several diffs of the same file from several changesets.

I reviewed some of the code, and it doesn't look so simple to get the filename from the diff as it's currently implemented.

I will investigate further how to solve this.
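One way the proposed anchors could be built is sketched below. It does not touch DiffProcessor; `make_anchor_id` is a hypothetical helper, and using a short SHA-1 digest of the full path (instead of the raw path) is an assumption made to keep the id free of slashes.

```python
import hashlib
import posixpath


def make_anchor_id(path, side, line_number, prefix=''):
    """Build an anchor id that differs for files with the same name.

    path: full path of the file inside the repository.
    side: 'NEW' or 'OLD', as in the current anchors.
    prefix: optional extra prefix, e.g. one per changeset.
    """
    digest = hashlib.sha1(path.encode('utf-8')).hexdigest()[:12]
    filename = posixpath.basename(path)
    return '%s%s_%s_%s%d' % (prefix, digest, filename, side, line_number)
```

The file name and the `NEW`/`OLD` suffix are kept at the end so the anchors stay readable.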

vcs FileNode is not compatible with unicode

OK, so as a part of our discussion about unicode,
here's an example of an error scenario that in my opinion should be valid.

#!python
# -*- coding: utf-8 -*-

import vcs

ok = 'ąśðąęłąć.txt'
er = ('ąśðąęłąć.txt').decode('utf-8')

assert type(er) is unicode

r = vcs.get_repo('/home/hg/x')

tip = r.get_changeset()

tip.get_node(ok)
tip.get_node(er)

tip.get_node(er) raises a vcs.exceptions.NodeDoesNotExistError: There is no file nor directory at the given path: u'\u0105\u015b\xf0\u0105\u0119\u0142\u0105\u0107.txt' at revision u'19:tip'

and the line to blame is
611:if path in self._file_paths:
which looks the path up in self._file_paths, whose entries are actually stored as utf-8 strings.

So we should store everything (paths in filenodes) internally either as unicode or as utf-8.

Lukasz, please comment on this one (btw, we have to add a test like the one I did above).
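The "store everything in one form" option can be sketched as normalizing paths at the boundary. The example is written for Python 3, where the utf-8 byte string and the unicode string from the scenario above correspond to `bytes` and `str`. `to_text` and `PathIndex` are hypothetical names and do not mirror the real changeset classes.

```python
def to_text(path, encoding='utf-8'):
    """Return `path` as text, decoding byte strings if needed."""
    if isinstance(path, bytes):
        return path.decode(encoding)
    return path


class PathIndex(object):
    """Stand-in for the changeset's internal set of file paths."""

    def __init__(self, file_paths):
        # Normalize once when storing...
        self._file_paths = set(to_text(p) for p in file_paths)

    def has_file(self, path):
        # ...and once more when looking up, so both forms match.
        return to_text(path) in self._file_paths
```

Whichever internal form is chosen, the point is that storing and looking up go through the same conversion.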


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/46/

mimetype is super slow for big binary filenodes

OK, as in the title: I found that the node.is_binary function really messes things up.

{{{

#!python

@LazyProperty
def is_binary(self):
    """
    Returns True if file has binary content.
    """          
    return bool(self.content and '\0' in self.content)

}}}

It looks like this is a CPU killer for files with, let's say, 70MB of binary content.

I think it needs to be rewritten to read the content lazily, chunk by chunk, with a generator. That should speed things up a lot.
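A chunked check could look like the sketch below. It assumes the node's content can be exposed as a binary file-like object, which is an assumption about FileNode and not something it provides today; the standalone `is_binary` function and its `chunk_size` parameter are hypothetical.

```python
import io


def is_binary(stream, chunk_size=8192):
    """Return True if a NUL byte is found in `stream`.

    The stream is read chunk by chunk and reading stops at the first
    NUL byte, so the whole content never has to be held in memory.
    """
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return False
        if b'\0' in chunk:
            return True
```

For example, `is_binary(io.BytesIO(b'plain text'))` is false, while content with a NUL byte anywhere in it is reported as binary.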


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/37/

Simple method to obtain archived revision

Let's not copy this functionality all the time ;-)

I propose two new methods for BaseChangeset. This way all backends could create archives for given changeset, plus methods can be easily overridden if backend offers own implementation (i.e. mercurial's archival module).

At first I wanted to make the default stream a new StringIO instance; however, on second thought I am not sure about memory consumption for big changesets. Am I correct here?

{{{
def get_archive(self, stream=None, kind=None, prefix=None):
    """
    Returns archived changeset contents, as stream. Default stream is tempfile as for huge
    changesets we could eat memory.

    :param stream: file like object. Default: new ``tempfile.TemporaryFile`` instance.
    :param kind: one of following: ``zip``, ``tar``, ``tgz`` or ``tbz2``. Default: ``tgz``.
    :param prefix: name of root directory in archive. Default is repository name and
      changeset's raw_id joined with dash.
    """
    # TODO !

def get_chunked_archive(self, **kwargs):
    """
    Returns iterable archive. Tiny wrapper around get_archive method.

    :param chunk_size: extra parameter which controls size of returned chunks. Default:
      8k.
    """
    chunk_size = kwargs.pop('chunk_size', 8192)
    archive = self.get_archive(**kwargs)
    while True:
        data = archive.read(chunk_size)
        if not data:
            break
        yield data

}}}

Marcin, if you accept this, please change the issue type to "enhancement" :)
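For the `tgz` case, the TODO could be filled in along these lines using the standard `tarfile` module. This is only a sketch: it takes a plain dict of path to bytes instead of walking real nodes, and `write_archive` is a hypothetical helper rather than the proposed method itself.

```python
import io
import tarfile
import tempfile


def write_archive(files, stream=None, prefix='repo'):
    """Write `files` into a gzipped tar archive and return the stream.

    files: dict mapping file paths to their content as bytes (a
        stand-in for the changeset's file nodes).
    stream: file-like object; a new TemporaryFile by default, so big
        changesets do not have to fit in memory.
    prefix: name of the root directory inside the archive.
    """
    if stream is None:
        stream = tempfile.TemporaryFile()
    with tarfile.open(fileobj=stream, mode='w:gz') as archive:
        for path, content in sorted(files.items()):
            info = tarfile.TarInfo(name='%s/%s' % (prefix, path))
            info.size = len(content)
            archive.addfile(info, io.BytesIO(content))
    stream.seek(0)  # rewind so the caller can read from the start
    return stream
```

The returned stream can be read in chunks exactly as get_chunked_archive does above.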


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/45/

Extend repository slicing to non numerical revisions

It would be handy if the repository object could also be sliced with string revisions,
e.g. repo['b004e9716ce9':'955bad849eba233a22169ac6d2860526351e8298'] would return
the same generator of revisions as repo[10:20].

Checks need to be made that start < end (I guess using numerical revisions in hg; not sure about git).
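The idea can be sketched by translating string revisions to positions before slicing. The function assumes the repository keeps an ordered list of full hashes (oldest first) and that a string bound may be any unambiguous prefix; `slice_revisions` is a hypothetical helper and returns a list rather than a generator for brevity.

```python
def slice_revisions(revisions, start=None, end=None):
    """Slice `revisions` using integers or hash prefixes as bounds.

    revisions: ordered list of full hashes, oldest first (assumption).
    The end bound is exclusive, like in repo[10:20].
    """
    def to_index(revision):
        if revision is None or isinstance(revision, int):
            return revision
        matches = [position for position, raw_id in enumerate(revisions)
                   if raw_id.startswith(revision)]
        if len(matches) != 1:
            raise LookupError('Unknown or ambiguous revision: %r' % revision)
        return matches[0]

    start_index, end_index = to_index(start), to_index(end)
    # Only compare plain (non-negative) positions; negative integer
    # indexes keep their usual slicing meaning.
    if (start_index is not None and end_index is not None
            and 0 <= end_index < start_index):
        raise ValueError('start revision must not come after end revision')
    return revisions[start_index:end_index]
```

The start < end check happens on positions, which sidesteps the question of whether the backend has numeric revisions at all.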

Post 0.2 cleanup - remove django app from vcs

simplevcs is slightly outdated already and maintaining it within vcs is time consuming.

We've decided to remove it from vcs entirely. However, there is my secret project approaching so stay tuned ;-)

Repository's .revisions, __getslice__, __getitem__ and .get_changesets methods changes

As a general idea: in git, when you iterate the log, it shows only the log for the current (e.g. master) branch. This is quite different from what mercurial shows.

My proposal is to add a 'getiterator(branch=None)' function that returns an iterator for a mercurial repository and iterates only over the revisions of the given branch. That would be a nice addition to the next/prev functions of changeset.
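A minimal sketch of such an iterator, written as a standalone function over any iterable of changesets. The `Changeset` namedtuple with a `branch` attribute is a stand-in for the real changeset objects and is an assumption.

```python
from collections import namedtuple

# Stand-in for real changeset objects (assumption).
Changeset = namedtuple('Changeset', ['raw_id', 'branch'])


def getiterator(changesets, branch=None):
    """Yield changesets lazily, optionally only those of `branch`.

    With branch=None every changeset is yielded, which matches the
    current mercurial behaviour described above.
    """
    for changeset in changesets:
        if branch is None or changeset.branch == branch:
            yield changeset
```

Because it is a generator, filtering by branch does not require loading the whole history first.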


Reference: https://bitbucket.org/marcinkuzminski/vcs/issues/44/
