scalar's Introduction

Scalar

What is Scalar?

Scalar is a tool that helps Git scale to some of the largest Git repositories. It achieves this by enabling some advanced Git features, such as:

  • Partial clone: reduces time to get a working repository by not downloading all Git objects right away.

  • Background prefetch: downloads Git object data from all remotes every hour, reducing the amount of time for foreground git fetch calls.

  • Sparse-checkout: limits the size of your working directory.

  • File system monitor: tracks the recently modified files and eliminates the need for Git to scan the entire worktree.

  • Commit-graph: accelerates commit walks and reachability calculations, speeding up commands like git log.

  • Multi-pack-index: enables fast object lookups across many pack-files.

  • Incremental repack: repacks the packed Git data into fewer pack-files without disrupting concurrent commands by using the multi-pack-index.

As new versions of Git are released, we update the list of features that Scalar automatically configures. This reduces your effort to keep your repositories as efficient as possible.

Scalar has moved!

Through significant effort from our team, we have successfully transitioned Scalar from a modified version of VFS for Git into a thin shell around core Git features. The Scalar executable has now been ported into the microsoft/git fork. Please visit the microsoft/git fork for all of your Scalar needs.

Why did Scalar move?

Scalar started as a modification of VFS for Git to create a working solution with a robust test suite in a short amount of time. The goal was to depend more on features that exist within Git itself instead of creating new functionality within this project. Since the start, we have focused on this goal with efforts such as improving sparse-checkout performance in Git, implementing background maintenance in Git, and integrating the GVFS protocol into microsoft/git, which allowed us to drop the Scalar.Mount process. All of these changes reduced the size of the code in Scalar itself until it could be replaced with a small command-line interface.

Additional benefits to this change include making our release and installation mechanism much simpler. Users now only need to install one tool, not multiple, to take advantage of all of the benefits.

What remains in this repository?

We are keeping the microsoft/scalar repository available since we have linked to it and want to make sure those links continue to work. We added pointers in several places to navigate readers to the microsoft/git repository for the latest versions.

We also have a large set of functional tests that verify that Scalar enlistments continue to work in a variety of advanced Git scenarios. These tests are incredibly helpful as we advance features in microsoft/git, so those tests remain in this repository. We run them as part of pull request validation in microsoft/git, so no changes are made there without passing this suite of tests.

What if I already installed Scalar and want the new version?

We are working to ensure that users on the .NET version of Scalar have a painless experience while changing to the new version.

  • On Windows, users can install microsoft/git and the installer will remove the .NET version and update any registered enlistments to work with the new version.

  • On macOS, users should run brew uninstall --cask scalar or brew uninstall --cask scalar-azrepos depending on their version and then run brew install --cask microsoft-git to get the new version. At the moment, users on macOS will need to re-run scalar register on their enlistments to ensure they are registered for future upgrades.

  • On Linux, there is no established uninstall mechanism, but the .NET version can be removed via sudo rm -rf /usr/local/lib/scalar/. Installing the new version will overwrite the scalar binary in /usr/local/bin. At the moment, users on Linux will need to re-run scalar register on their enlistments to ensure they are registered for future upgrades.

You can check whether the new Scalar version is installed correctly by running scalar version, which should have the same output as git version.

License

The Scalar source code in this repo is available under the MIT license. See License.md.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

scalar's Issues

Update Functional Test data after rename

The functional tests use an old copy of the GVFS repo, so all of the paths use GVFS in the names. Those paths were modified automatically as part of the rename operation (#38).

As functional tests are re-added, we will need to revert the changes to those paths, but it will be a manual process.

Use a derivative of calver for scalar

It would be ideal if the versioning scheme gave a better indication of the time of release and the milestone associated with the bits.

I propose the following, where {build} = counter(SourceRef, 0)

Source ref                    | Version template              | Version example | Build Number
refs/heads/releases/19.08.157 | {yy}.{MM}.{Milestone}.{build} | 19.08.157.1     | Release-19.08.157.1
refs/pull/25/merge            | 10.20.{PRNum}.{build}         | 10.20.25.1      | PR-25.1
refs/heads/master             | {yy}.{MM}.{dd}.{build}        | 19.08.10.33     | CI-master.33
refs/tags/tagname             | {yy}.{MM}.{dd}.{build}        | 19.08.10.34     | CI-tagname.34
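
The proposed mapping can be sketched in a few lines of Python. This is only an illustration of the table, not the build definition itself. The function name version_for is hypothetical, the build argument stands in for counter(SourceRef, 0), and the sketch assumes that {yy}, {MM}, and {dd} come from the build date while {Milestone} is the last component of the release branch name. Treating every other branch under refs/heads/ like master is also an assumption.

```python
import re
from datetime import date


def version_for(source_ref: str, build: int, today: date) -> tuple[str, str]:
    """Return (version, build_number) for a source ref, following the table.

    `build` is a stand-in for counter(SourceRef, 0); a CI system would supply it.
    """
    yy, mm, dd = f"{today:%y}", f"{today:%m}", f"{today:%d}"

    # Release branches: {yy}.{MM}.{Milestone}.{build}
    # Assumption: the milestone is the last component of the branch name.
    m = re.fullmatch(r"refs/heads/releases/\d+\.\d+\.(\d+)", source_ref)
    if m:
        version = f"{yy}.{mm}.{m.group(1)}.{build}"
        return version, f"Release-{version}"

    # Pull requests: 10.20.{PRNum}.{build}
    m = re.fullmatch(r"refs/pull/(\d+)/merge", source_ref)
    if m:
        return f"10.20.{m.group(1)}.{build}", f"PR-{m.group(1)}.{build}"

    # master and tags: {yy}.{MM}.{dd}.{build}
    # Assumption: other branches under refs/heads/ are handled like master.
    m = re.fullmatch(r"refs/(?:heads|tags)/(.+)", source_ref)
    if m:
        return f"{yy}.{mm}.{dd}.{build}", f"CI-{m.group(1)}.{build}"

    raise ValueError(f"unrecognized source ref: {source_ref}")
```

Calling version_for("refs/heads/master", 33, date(2019, 8, 10)) reproduces the master row of the table.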

Create a perf test suite

To give us high confidence that customer satisfaction will be greater than it would be on VFS for Git, I think we want to measure identical scenarios on both VFS for Git sparse mode and scalar. Ideally we can also configure what part of the cone is available, so we can compare and contrast different sizes of enlistments. We should use representative sparse enlistments for different segments of our customer base.

Thoughts on the approach or execution?

[Mount Removal] Move repo registration from mount verb to clone verb

Repo registration (with Scalar.Service) should happen during scalar clone rather than mount.

Additionally, the clone verb itself should update the registration file (or each clone should create its own file) and Scalar.Service should read the file(s) to discover which repos have been registered.

Additionally:

  • Scalar.Service should also remove repos when it finds they're no longer on disk.
  • There should be a verb for manually registering/unregistering repos
  • Functional tests should not register when cloning or they should register in a test location that does not impact the installed Scalar.Service
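
As a rough illustration of the registration flow described above, the following Python sketch keeps registered repos in a single JSON file. The file format, the function names, and the choice of one shared file are all assumptions made for the example; the issue leaves those details open. Passing the registry location as a parameter is one way to let functional tests register in a test location.

```python
import json
from pathlib import Path


def load_registry(registry_file: Path) -> list[str]:
    """Read the list of registered repo roots; a missing file means none."""
    if not registry_file.exists():
        return []
    return json.loads(registry_file.read_text())


def save_registry(registry_file: Path, repos: list[str]) -> None:
    """Write the registry, de-duplicated and sorted for stable output."""
    registry_file.write_text(json.dumps(sorted(set(repos)), indent=2))


def register(registry_file: Path, repo_root: str) -> None:
    """What the clone verb (or a manual register verb) would do."""
    save_registry(registry_file, load_registry(registry_file) + [repo_root])


def unregister(registry_file: Path, repo_root: str) -> None:
    """What a manual unregister verb would do."""
    repos = [r for r in load_registry(registry_file) if r != repo_root]
    save_registry(registry_file, repos)


def prune_missing(registry_file: Path) -> list[str]:
    """What the service would do: drop repos that are no longer on disk.

    Returns the repo roots that were removed from the registry.
    """
    repos = load_registry(registry_file)
    kept = [r for r in repos if Path(r).is_dir()]
    save_registry(registry_file, kept)
    return sorted(set(repos) - set(kept))
```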

Prefetch --stdin-folders-list doesn't match cone patterns

If we supply the same set of paths to git sparse-checkout add and scalar prefetch --stdin-folders-list, the prefetch command gets a smaller set of files than the sparse-checkout requires when writing files to disk. This leads to a very slow first checkout, even after prefetching.

The real solution is described in #36.

However, it may be worth a temporary fix to the BlobPrefetcher to match a few more paths and speed this up in the short term.
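
To make the mismatch concrete, here is a small Python sketch of which files a cone-mode sparse-checkout includes for a given folder list: everything under each listed folder, plus the files that sit directly in each ancestor folder and in the repository root. The function name in_cone is hypothetical, and this models Git's cone-mode behavior as I understand it rather than the BlobPrefetcher code. A prefetch that only matches files under the listed folders would miss the ancestor-level files.

```python
from pathlib import PurePosixPath


def in_cone(file_path: str, folders: list[str]) -> bool:
    """Return True if a cone-mode sparse-checkout of `folders` includes file_path."""
    parent = PurePosixPath(file_path).parent

    # Files at the repository root are always included in cone mode.
    if parent == PurePosixPath("."):
        return True

    for folder in folders:
        cone = PurePosixPath(folder)
        # Recursive match: the file lives anywhere under the listed folder.
        if parent == cone or cone in parent.parents:
            return True
        # Parent match: the file sits directly in an ancestor of the folder.
        if parent in cone.parents:
            return True
    return False
```

For example, with folders ["A/B/C"], the file A/B/y.txt is part of the checkout even though it is not under A/B/C, while A/B/D/w.txt is not.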

Set scalar.telemetry-pipe in installer

Set scalar.telemetry-pipe in the installer to get telemetry from scalar daemons.
This is a peer of gvfs.telemetry-pipe.

scalar.telemetry-pipe=scalar-c780ac06-135a-4e9e-ab6c-d41e2d265baa

We do NOT need the corresponding scalar.telemetry-id (like we have in gvfs).

[Mount Removal] Remove disk layout upgrade code

UPDATE

The upgrade steps are run as part of scalar mount, which will be going away as part of the mount removal process.

We should remove the upgrade code as part of removing the mount process. If we need to perform disk layout upgrades in the future, they will need to be driven by the service and/or the installer.


We no longer need back-compat logic for previous GVFS disk layouts. We will dramatically change the way we store the repo config, and hopefully do so before we ship to EA.

At some point, we will not allow breaking changes and then will need upgrade logic. Should we delete the disk layout code now and then redesign/reimplement the upgrade logic when we need it?

cc: @mjcheetham

Remove GitHooksLoader

The GitHooksLoader should not be needed, as we only have the read-object hook, which is already native.

Remove LibGit2

Can we get rid of LibGit2 entirely? Here are some tradeoffs:

  • We currently track which downloaded objects are blobs or not. Dropping this stat would save time, and the batched read-object hook (#7) would make that less important.
  • CommitAndRootTreeExists() exists for a VFS for Git reason: to see if we need to prefetch the folders at a commit on clone time so we can generate an index before projecting. This isn't needed any more.
  • The LooseObjectsStep checks for corrupt loose objects. We'll have fewer objects with the batched read-object hook, and we could perhaps teach git pack-objects to clear corrupt objects.

Outside of that last one, many of these changes are super small and don't have a huge impact on the full story.

Integrate fsmonitor with Watchman on macOS

For optimal performance, we need git status to run in O(modified) time. The fsmonitor feature exists in Git, and we should take advantage of it.

@dscho is working on this, but I can't assign it to him for some reason.

Rename EVERYTHING

New name TBD.

  • Replace all instances of "GVFS" and "VFSForGit" (and variants) in the filenames using an exact-rename commit.
  • Drop the base "GVFS" folder in the same commit.
  • Perform text-based replacement of "GVFS", "VFSForGit", and variants in most places (some things need to stay, such as the Git package name).

Progress indicators in BlobPrefetcher

scalar sparse --add needs progress indicators. These progress indicators need to be in two places (at least):

  1. BlobPrefetcher needs to provide feedback as it discovers and downloads blobs.

  2. git read-tree -mu HEAD needs to provide feedback as it populates the working directory.

These are very different solutions, so this issue will track BlobPrefetcher.

Investigate having the 'sparse' verb call 'git status' before finishing to prime the untracked cache

After adding a large set of cones to the repo using scalar sparse --add-stdin the first git status took a long time:

~/ScalarTests/repo/src>git status
On branch master
Your branch is up to date with 'origin/master'.

It took 17.52 seconds to enumerate untracked files. 'status -uno'
may speed it up, but you have to be careful not to forget to add
new files yourself (see 'git help status').
nothing to commit, working tree clean

If we had the sparse verb call git status before it finishes users would have a better experience running git status for the first time.

Ongoing: Port work from VFS for Git

Work in microsoft/vfsforgit sometimes needs a corresponding change here in microsoft/scalar.

Add a comment linking to the PR(s) that need porting to Scalar.

(Use 👍 to indicate you are working on it, 🚀 to indicate the item is done, and 👎 for "don't need".)

Planning: feature branch in microsoft/git

This issue is to facilitate discussion.

In microsoft/git#171, we introduce the git sparse-checkout builtin. This has the features we need to get moving on the sparse clones in Scalar, but it is not ready for merging into vfs-2.22.0. In particular, we need to get feedback from the mailing list before we take a hard dependency on it, especially in the shipped version with microsoft/vfsforgit.

Here is my proposal:

  • Create a new feature branch features/sparse-checkout in microsoft/git.

  • The feature branch will include all updates to sparse-checkout (#8) and batch object downloading (#7, #36).

  • As vfs-2.22.0 advances, we can dual-checkin if it is a critical change. This should happen rarely as we are mostly doing upstream-first development in Git for VFS.

  • As git/git and git-for-windows/git ship new versions, microsoft/git gets a new vfs-2.XX.0 branch. The features/sparse-checkout will then be rebased on top of that using a force-push.

  • As features/sparse-checkout updates, we generate installers with suffix -sc to indicate this is something to consume in Scalar but not VFS for Git.

This setup should allow us to merge PRs like #54 and start working on functional tests, follow-up features, and perf tests.

/cc @jrbriggs, @wilbaker, @jeffhostetler, @kewillford, @jeschu1, @mjcheetham, @garimasi514, @nickgra.

[Mount Removal] Perform maintenance jobs in the service rather than the mount process

This is part of the work required to eliminate the mount process.

Special care will need to be taken regarding ACLs. The service runs with elevation, and we need to make sure that non-elevated git processes are still able to use the files produced by the maintenance tasks.

As an alternative, we should investigate running Scalar.Service as the user rather than as admin.

Scalar installs into C:\Program Files\GVFS

I'm not sure what instance of "GVFS" in the codebase causes the installer to write into C:\Program Files\GVFS, but it requires the GVFS.Service and other GVFS.Mount processes to be terminated for the Scalar installer to work.

Performance: git add

We need to investigate what we can do about git add in our target enlistment.

  1. git add -p from src took 31s.
  2. git add . from src took 43s.

These are no-op adds with fsmonitor and untracked cache.

Install: scripted installation

We need to produce a simple scripted installation for macOS that pulls together scalar, git, gcm core, watchman, and internal tooling and correctly configures everything. This is to support demo scenarios and automation like perf and large build runs.

Sparse: Check for superseding parents in input

BUG: if we run git sparse-checkout set A A/B, then A is registered as a recursive closure AND A/B is marked as a recursive closure. This also means that A is marked as a parent path.

This results in Git complaining that the patterns are not cone-style and reverting to the slow pattern-matching algorithm.

To fix, consider removing paths from the "parent" list if they are in the "recursive" list. Further: remove children from the recursive list.
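
A minimal Python sketch of that cleanup, assuming the input is a plain list of folder paths. The function name normalize_cone and the returned (recursive, parents) pair are illustrative only and do not mirror Git's internal data structures.

```python
def normalize_cone(folders: list[str]) -> tuple[list[str], list[str]]:
    """Return (recursive, parents) with superseded entries removed."""
    cleaned = {f.strip("/") for f in folders if f.strip("/")}

    def ancestors(path: str) -> list[str]:
        # "A/B/C" -> ["A", "A/B"]
        parts = path.split("/")
        return ["/".join(parts[:i]) for i in range(1, len(parts))]

    # Remove children from the recursive list: if an ancestor is also
    # requested, its recursive pattern already covers the child.
    recursive = {f for f in cleaned if not any(a in cleaned for a in ancestors(f))}

    # Parents are the ancestors of the surviving folders. Subtracting the
    # recursive set mirrors the first step of the proposed fix; after the
    # child removal above it is a no-op, but it keeps the intent explicit.
    parents = {a for f in recursive for a in ancestors(f)} - recursive

    return sorted(recursive), sorted(parents)
```

With the input from the bug report, normalize_cone(["A", "A/B"]) keeps only A as a recursive folder and produces no parent entries.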

Progress indicator for vfs helper

While acquiring a set of objects prior to a workdir changing operation like checkout or reset, we should show progress similar to fetch's.

Progress indicators in `git read-tree -mu HEAD`

scalar sparse --add needs progress indicators. These progress indicators need to be in two places (at least):

  1. BlobPrefetcher needs to provide feedback as it discovers and downloads blobs.

  2. git read-tree -mu HEAD needs to provide feedback as it populates the working directory.

These are very different solutions, so this issue will track git read-tree -mu HEAD.

Git: core.gvfs config setting

The core.gvfs config setting does a lot of things, including blocking unwanted commands.

This setting was dropped as part of the rename effort (#38) and should be put back for now.

However, that config option does a lot of things that we may not want in the Scalar world. Update Git to split those actions apart based on other config options, or add a core.scalar setting for our situation.

For example, we still want to block git gc, but that could be part of core.virtualizeobjects instead.

Resumability: batch requests in vfs helper

In order to have partial resumability in the event of a network failure, we should limit the number of objects we request in one go and ask for multiple batches rather than one large batch.

This also opens the door for parallelization later.
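
The idea can be sketched in Python as follows. This is an assumption-laden illustration, not the helper's actual design: fetch_batch is a hypothetical callback standing in for one network request, and done is a caller-owned set that records which objects have already arrived so a retry can skip them.

```python
from typing import Callable, Iterable, Iterator


def batches(object_ids: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` object ids."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    batch: list[str] = []
    for oid in object_ids:
        batch.append(oid)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def download_all(
    object_ids: Iterable[str],
    size: int,
    fetch_batch: Callable[[list[str]], None],
    done: set[str],
) -> None:
    """Fetch missing objects batch by batch, recording completed batches in `done`.

    If fetch_batch raises (for example on a network failure), every batch
    that finished earlier is already in `done`, so calling download_all
    again only requests what is still missing.
    """
    pending = [oid for oid in object_ids if oid not in done]
    for batch in batches(pending, size):
        fetch_batch(batch)   # may raise on failure
        done.update(batch)   # reached only when the batch succeeded
```

Because each batch is independent, the same split could later be handed to several workers, which is the parallelization the issue mentions.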

Prototype git sparse verb

Goal: provide a user-friendly experience around configuring the sparse checkout.

Scope: Verb to add and remove entire directories from sparse enlistment.

Non-goals: High-performance application of sparse-checkout file.

Rewrite README

The README is a leftover from VFS for Git. It needs updating. Perhaps it should just point to the roadmap for now?

Sparse: Add functional test workflow

Create a functional test set that follows a typical workflow around a sparse enlistment:

  1. scalar clone --sparse=true
  2. Verify root files only.
  3. scalar sparse add
  4. Verify folders are added.

May be combined with #76.
