git-split-file's Introduction

Split a file in a git repository without losing the git history.

Introduction

Usually when a file in a git repository is split into several different files, the history gets lost.

Often this is not desirable.

git-split-file allows for split files to retain their history.

Splitting a file still needs to be done manually but this script will take care of branching/moving/renaming/committing/merging/etc. the split files.

See the usage section for details on how to do this.

Installation

Clone this repository, or download just the single file git-split-file.sh.

Usage

In short, all that needs to be done is to manually split a file and call git-split-file.

There are, however, some details to take into account.

Manually splitting the file

You have a file in a git repository that you would like to split into several files.

First create a directory the split files can be placed in. This directory does not have to reside in the git repository the file to split is in.

Next, split that file into several different files. The more of the original line order and whitespace you leave intact, the more of the history is preserved.

Under some circumstances it is desirable to keep a truncated version of the file that has been split; under other circumstances that file can be removed completely. Both scenarios are supported.

In the scenario where the file is to be deleted, just leave the contents of that file as-is. git-split-file will simply remove the file for you.

In the scenario where a changed version of the file is to remain in the repository, add a version of the file, as it should eventually look, to the directory that also holds the other split files. git-split-file will commit the changes for you. It is possible to tell the script to either leave the file in the same location or move it to the same location as the other split files.
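As an illustration only, a manual split could look like the sketch below. The file names (source.txt, part1.txt, part2.txt), the line ranges, and the use of sed are assumptions made for this example; any editor or tool will do, as long as line order and whitespace are preserved.

```shell
# Hypothetical example: split a four-line file into two parts.
set -eu

workdir="$(mktemp -d)"
cd "$workdir"
printf 'alpha\nbeta\ngamma\ndelta\n' > source.txt

# The directory for the split files may live outside the repository.
mkdir split

# Copy contiguous line ranges verbatim, so order and whitespace stay intact.
sed -n '1,2p' source.txt > split/part1.txt
sed -n '3,4p' source.txt > split/part2.txt

# Sanity check: the parts, concatenated in order, reproduce the original.
cat split/part1.txt split/part2.txt | cmp -s - source.txt && echo "split is lossless"
```

The final check is optional, but it is a cheap way to confirm that nothing was dropped or reordered during the split.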

Calling the script

As the final result can live outside of the repository, the script can easily be run on a clone to verify everything works out as desired.

In order to function, the script needs to know a few things:

  • The source file that is to be split (the source file)
  • The directory where the split files are located (the source directory)
  • The location where the split files should be placed in the repository (the target path)
  • Whether to delete or move the source file (the split strategy)

Currently, moving the source file to a location other than the target path is not supported.
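Judging from the log output quoted in the issues further down, the four values are passed as positional arguments in the order listed above. The sketch below only assembles and prints such a call. The paths are placeholders, and DELETE is the only strategy value that appears in this document.

```shell
# Placeholder values; replace them with your own paths.
source_file="source.html"   # the file to split
source_dir="split"          # directory holding the manually split files
target_path="src"           # where the split files should end up in the repository
strategy="DELETE"           # remove the source file once the split is done

# Quote every argument so that paths containing spaces survive intact.
set -- bash git-split-file.sh "$source_file" "$source_dir" "$target_path" "$strategy"

# Print the command for review; run it with "$@" once it looks right.
echo "$*"
```

Trying this on a throwaway clone first, as suggested above, keeps the real repository untouched until the result has been checked.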

How it works

As a picture might communicate matters more clearly than mere words, consider the following:

        A---bN--cN      split branch N
       /         \
      A---B---C   \     split branch one
     /         \   \
    A-------D---E---eN     source branch
   /                  \
--A--------------------F--  root branch

  • A is the last known commit on the root branch.
  • A separate branch is made to function as the "source" branch, from which the other branches are split off and into which they are merged back.
  • For each file in the source directory, a copy of the source file is created on its own split branch (using git mv $SOURCE $TARGET), leading to commit B (and bN).
  • The content of the copied file is updated (using cat $CONTENT > $TARGET and git add/commit $TARGET) leading to commit C (and cN).
  • The content of the source file is updated leading to commit D.
  • Each split branch is then merged into the source branch, resolving any conflicts that occur, leading to commit E (and eN).
  • When all branches have been merged, everything is merged back to the root branch. Commit F now has all of the changes.
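
The rename-then-edit steps can be tried by hand. The following is a minimal sketch, not the script's actual code: it reproduces only commits A, B and C for a single split file in a throwaway repository, and leaves out the source branch and the merges. All file names, the branch name and the commit messages are hypothetical.

```shell
# Sketch only: commits A, B and C from the diagram, for ONE split file.
set -eu

repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example"
git config user.email "example@example.invalid"

# Commit A: the large source file on the root branch.
printf 'alpha\nbeta\ngamma\ndelta\n' > source.txt
git add source.txt
git commit --quiet -m "A: add source file"

# Split branch: a pure rename first (commit B) ...
git checkout --quiet -b split-part1
git mv source.txt part1.txt
git commit --quiet -m "B: rename source.txt to part1.txt"

# ... then the trimmed content in a separate commit (commit C).
printf 'alpha\nbeta\n' > part1.txt
git add part1.txt
git commit --quiet -m "C: keep only the part1 content"

# Following renames, the log of the split file reaches back to commit A.
history="$(git log --follow --format=%s -- part1.txt)"
echo "$history"
```

Keeping the rename and the content change in separate commits means commit B is an exact rename, which git can always recognise, regardless of how much of the content is removed in commit C.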

Origin / Motivation

When working in low-quality code-bases that have grown organically over time, it is not uncommon to encounter files that span several (tens of) thousands of lines.

For various reasons it is desirable to split such files into smaller files.

Instead of doing this by hand, it made more sense to automate parts of the process.

Development

There is currently only one test, in a separate repo. Yes, more tests do need to be added. Feel free to contribute some! 😉

License

This project is licensed under the GPL-3.0+ license.

Contributors

potherca

git-split-file's Issues

Not working, help...

On Windows 7 I open Git Bash.
The script is installed and starts up.

Can't get it to work:

The given split file '/s/Delphi/Applications/PascalCoin/Try restructure skybuck style/Try split file script/Molina.txt' is not part of a git repository

This file IS added to a repository, so why does it show this error message?

Why is no re-basing done ?

Hello,

I tried this tutorial, manually by following it exactly, by creating branches, and labels via tag command and then committing, so far seems to work.

I think your script uses a similar approach. However, your script does not seem to use the re-base command. Is this correct? If so, why does your script not re-base? Is it not necessary?

I noticed it used merge instead ? Hmmmm... maybe to keep history ? Yes that was what this was about isn't it this script.

I do believe in keeping history though.... though will it still work well without re-basing hmmm... well interesting your technique offers perhaps a different way of splitting files.

So apparently two techniques exist:
branch
rename
re-base

or
branch
rename
merge

Interesting anyway link is here:

https://stackoverflow.com/questions/3491270/git-merge-apply-changes-to-code-that-moved-to-a-different-file

Maybe your script can also benefit from re-base and offer some interesting future but I guess that goes against the purpose of this script to preserve the history ! May have to try the merge strategy manually see if that can work too... perhaps the real magic is in the "branch and rename" part ! ;) Think so ;)

Running with `. git-split-file.sh` froze my terminal

The first time I tried to run the script I did it with . git-split-file.sh (I'm not enough of a Linux user to remember the difference between bash git-split-file.sh and . git-split-file.sh - I usually try one or the other to see what works).

That gave me this output:

Errors occurred:

 This script expects four command-line arguments

sed: can't read bash: No such file or directory

And then I can't escape back to the prompt, despite a flurry of control commands:

^C
^C^C^C^C^C
^Z
^C^C^C^C^C

sdfkljsd
ls
^C^C^C

^C
^C^C^C^C^C^C^C
^Z^Z^Z^\^\^\

Anyway thanks for the excellent script!

Why is branching necessary ?

I am trying to find a solution for my problem as described here:

https://stackoverflow.com/questions/51114596/re-structure-code-split-into-multiple-files-keep-splitted-files-updated-from-o

Your script seems promising to some degree I have some questions about it though, (haven't tried the script yet)

  1. Why is the branching necessary ? Is this a functionality to allow safe merging in case of conflicts ? What kind of conflicts could occur ? Maybe split files already exist ?

  2. Is branching perhaps done because git mv can only be done once per branch ? If so great solution, but the "how it works" doesn't really explain it like that ?

Also once the files are split, and suppose somebody updates the original file, will the changes to the original files be re-applied to the split files by GIT ??? I need a functionality like that.

Merge conflict due to unmerged files

I'm trying to split a file into 2 separate files and delete the original file (DELETE strategy). I've stashed all changes (git status is clean except for some untracked files). My working copy is ahead of origin by 2 commits (not sure if that's a problem).

Here's the log I get:

bash git-split-file.sh source.html split src DELETE
#                running git-split-file.sh
#        for source file source.html
#  with source directory split
#    to target directory src
#   using split strategy DELETE
Does this look correct? (y/n) y

=====> Creating separate branch to merge split files back into

=====> Creating sub-branches
-----> Creating separate branch to split file 'split/source_split_1.html'
-----> Creating separate branch to split file 'split/source_split_2.html'

=====> Running split processing for file 'split/source_split_1.html'
-----> Switching to split branch
Switched to branch 'split-file_source.html_source_split_1.html'
-----> Creating separate file for 'source_split_1.html'
-----> Creating commit
[split-file_source.html_source_split_1.html 265c143] Adds separate file for 'source_split_1.html'.
 Author: Potherca-Bot <[email protected]>
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {source.html => source_split_1.html} (100%)
-----> Writing content to target file 'source_split_1.html'
-----> Creating commit
[split-file_source.html_source_split_1.html 6c8c0f5] Changes content in separated file 'source_split_1.html'.
 Author: Potherca-Bot <[email protected]>
 1 file changed, 1 insertion(+), 344 deletions(-)

=====> Running split processing for file 'split/source_split_2.html'
-----> Switching to split branch
Switched to branch 'split-file_source.html_source_split_2.html'
-----> Creating separate file for 'source_split_2.html'
-----> Creating commit
[split-file_source.html_source_split_2.html 4ef8e48] Adds separate file for 'source_split_2.html'.
 Author: Potherca-Bot <[email protected]>
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename {source.html => source_split_2.html} (100%)
-----> Writing content to target file 'source_split_2.html'
-----> Creating commit
[split-file_source.html_source_split_2.html 8f7588c] Changes content in separated file 'source_split_2.html'.
 Author: Potherca-Bot <[email protected]>
 1 file changed, 1 insertion(+), 262 deletions(-)

=====> Merging all the split branches into the source branch
-----> Switching to source branch
Switched to branch 'split-file_source.html'

=====> Running merge processing for file 'split/source_split_1.html'

=====> Merging branch 'split-file_source.html_source_split_1.html' back into 'split-file_source.html'
-----> Branch 'split-file_source.html_source_split_1.html' exists
Auto-merging source_split_1.html
Merge made by the 'recursive' strategy.
 {source.html => source_split_1.html} | 345 +-----------------------------------------------------------------------------------------------------------------------------------------------------------
 1 file changed, 1 insertion(+), 344 deletions(-)
 rename {source.html => source_split_1.html} (53%)
-----> No merge conflict

=====> Running merge processing for file 'split/source_split_2.html'

=====> Merging branch 'split-file_source.html_source_split_2.html' back into 'split-file_source.html'
-----> Branch 'split-file_source.html_source_split_2.html' exists
CONFLICT (rename/rename): Rename "source.html"->"source_split_1.html" in branch "HEAD" rename "source.html"->"source_split_2.html" in "split-file_source.html_source_split_2.html"
Automatic merge failed; fix conflicts and then commit the result.
-----> Merge conflict occurred. Attempting to resolve.
-----> Creating commit
U       source_split_1.html
U       source_split_2.html
error: commit is not possible because you have unmerged files.
hint: Fix them up in the work tree, and then use 'git add/rm <file>'
hint: as appropriate to mark resolution and make a commit.
fatal: Exiting because of an unresolved conflict.


-----> Merge conflict remains. Attempting to resolve more aggressively.
cat: git-status.log: No such file or directory
Nothing specified, nothing added.
Maybe you wanted to say 'git add .'?
-----> Creating commit
U       source_split_1.html
U       source_split_2.html
error: commit is not possible because you have unmerged files.
hint: Fix them up in the work tree, and then use 'git add/rm <file>'
hint: as appropriate to mark resolution and make a commit.
fatal: Exiting because of an unresolved conflict.



Cleanup all the things? (y/n) y
-----> Removing all the split branches that were created
-----> Switching to root branch
Switched to branch 'develop'
Your branch is ahead of 'origin/develop' by 2 commits.
  (use "git push" to publish your local commits)
Deleted branch split-file_source.html (was a0515cb).
Deleted branch split-file_source.html_source_split_1.html (was 6c8c0f5).
Deleted branch split-file_source.html_source_split_2.html (was 8f7588c).
# ================================================================================
# Done.

What I want to do is this

(Yeah stack overflow is pretty anal lol, so I will explain here what I want to do in abstract terms).

There is this file, let's call it Original.txt. It contains three texts: ObjectA, ObjectB, ObjectC, like so:

ObjectA
<some text/story>

ObjectB
<some text/story>

ObjectC
<some text/story>

Tree would look like:

 Original.txt

Now I think this file is too big, so I want to split it up into three files: ObjectA, ObjectB, ObjectC.

Tree now becomes

Original.txt ----> ObjectA.txt  ObjectB.txt  ObjectC.txt

Now Original.txt is updated. The tree could look like:

Original.txt ----> ObjectA.txt  ObjectB.txt  ObjectC.txt       (UserA)
                 \----> Original.txt (updated)                             (UserB)

So UserA is using the "split files" branch.
UserB is using the "original" branch and making updates to it (this is the master branch).

Now the changes from Original.txt should be re-applied to the split files so they can benefit from the changes to Original.txt.

I tried a couple of methods, but no success so far.

Even if only this worked somehow, that would already be incredibly useful.

There is a further complexity. To make matters worse the files should also be moved to a subfolder like so:

Original.txt ----> Original\ObjectA.txt  Original\ObjectB.txt  Original\ObjectC.txt       (UserA)
                 \----> Original.txt (updated)                             (UserB)

So that the usage of the subfolder name Original indicates where ObjectA, B and C came from: they came out of this file called Original.txt, which is now basically turned into a subfolder, with its contents split into three files.

And again the mission would be to update ObjectA, ObjectB and ObjectC with changes from Original.txt as it is updated on the master branch.

Basically there could even be two repositories to work isolated from each other.

RepositoryA would work with original.txt only.
RepositoryB would work with the split files only and would pull updates "downstream" from the original repository.

Now to make it even more complex/desirable: it would also be very cool if RepositoryB could make additional updates to its own files and still benefit from changes made in RepositoryA.

Such a script would be very handy for "non-cooperating" repositories.

Let's say UserA does not want to split Repository A and wants to keep working on a single big file.

Meanwhile UserB wants to work on RepositoryB with split files, but still benefit from UserA's updates. Then this script would be worth GOLD to me ! ;) =D and maybe to other people too.

If you could create some kind of script which can do this I would be very happy.

Now it can work with repositories/streaming/pushing/pulling... but that has the disadvantage of having to be online and using github protocols and such.

So an alternative is to simply work with two branches.

BranchA to mimic RepositoryA
BranchB to mimic RepositoryB

So then UserB simply "fetches remote repositoryA into BranchA" and happily continues working on his own BranchB incorporating changes from BranchA as he/she works through the updates/commits from BranchA or so.
