
jmnavarrol/simplest-git-subrepos


simplest-git-subrepos

The simplest way to manage git repos within git repos.


⚠ WARNING: Do not use this tool in production/live environments.

This script is just meant to showcase this idea's happy path. As such, no attempt is made at error handling, input sanitization, etc.

e.g.: the subrepos file is simply sourced from the main script, so imagine what happens when someone tweaks its contents so that, instead of the expected hash, it contains something like this: rm -rf ~/

YOU'VE BEEN WARNED!!!

If you are looking for a production-grade version, please visit my other project, python-multigit.
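To make the risk concrete: because the subrepos file is run with Bash's `source`, every command in it gets executed. The sketch below is *not* how multigit works. It is a hypothetical, stricter alternative (`parse_subrepos` is an illustrative name), assuming each entry sits on its own line in the `REPOS["name"]="url"` form used by the example subrepos file later in this README. It extracts name/URL pairs with a regular expression and never executes the file, rejecting any value that contains `$` or backticks.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of multigit): read a 'subrepos' file WITHOUT
# executing it. Only lines of the form  REPOS["name"]="url"  are accepted;
# everything else (comments, 'declare', compound REPOS=( ... ) assignments,
# arbitrary commands) is silently ignored.
parse_subrepos() {
  local file="$1" line
  # name: letters, digits, dot, underscore, dash
  # url:  anything except double quotes, '$' and backticks (so no expansions)
  local re='^[[:space:]]*REPOS\["([A-Za-z0-9._-]+)"]="([^"$`]+)"'
  # '|| [ -n "$line" ]' also handles a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    if [[ $line =~ $re ]]; then
      # Emit one "name<TAB>url" pair per accepted entry
      printf '%s\t%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}"
    fi
  done < "$file"
}
```

Run against a subrepos file like the one shown for SUB1 below, `parse_subrepos subrepos` prints `subsub` and its URL separated by a tab, while a line such as `REPOS["x"]="$(rm -rf ~/)"` is simply skipped. The trade-off is a much narrower accepted syntax: the one-line `REPOS=( ... )` form is not understood.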


the long explanation

It is quite a common desire to somehow have git repositories within git repositories[1]. The typical answers involve convoluted incantations of git submodules (with the usual word of caution about them), git-subtree (which is oh, so much easier!), or even specifically crafted tools to deal with the complexity of those[2].

In the end, something that sounds like it should be easy ends up being complex and, worst of all, error-prone.

But, does it need to be so complex?

The naive approach (my naive approach, at least) would be... well, why not just drop a git repo within another git repo? Much to my surprise, after a long and fruitless search on Google, it turns out that this is not only perfectly possible but trivially easy, with basically no bad side effects.

After such a "discovery", I decided to share it publicly, by example, in case others find it useful.

the use case

It seems that the motto "make easy things easy and hard things possible" can be traced back to Larry Wall and "Learning Perl", aka the Llama book. I'm not saying that submodules, subtrees, and all those tools are without merit, but I do say they fail Wall's test: they certainly fail at making easy things easy. So, what do I want to achieve, then?

Let's have a project, which I'll call SUPER, that has some glue files in order to tie together two others, which I'll call SUB1 and SUB2 (think of, say, a web front end using two modules, or whatever). Just to make things more fun, let's imagine that SUB1 also includes a deeper module, which I'll call SUBSUB. Overall, the layout looks like this:

[SUPER]|README.md
       |
       |[SUB1]|file1
       |      |
       |      |[SUBSUB]|another_file
       |
       |[SUB2]|file2

Now, if I merely consume the contents of SUB1 and SUB2 (and, of course, SUBSUB), using vendor branches or just bringing them in at build time from an artifact repository would be good enough. But what if I want or need to also contribute to all of them? That's typically the case in corporate environments, where all those repositories belong to the same owner (the company) and the "proper" way to test and evolve the submodules is by calling them from the parent one (or from another similar one for testing purposes). So, to recap the situation:

  1. The functionality of the submodules can only be ascertained (at least comfortably) by means of their integration with the SUPER one.
  2. I have write access to at least some branches on the submodules.

the solution

As I already said above, why not try to just create a git repo within another? So let's do it:

jmnavarrol@:~/super$ git init
Initialized empty Git repository in ~/super/.git/
jmnavarrol@:~/super$ echo 'Hello, World!' > README.txt
jmnavarrol@:~/super$ git add README.txt
jmnavarrol@:~/super$ git commit -m "first commit"
[master (root-commit) 5677966] first commit
 1 file changed, 1 insertion(+)
 create mode 100644 README.txt
jmnavarrol@:~/super$

Now, let's go for the second one:

jmnavarrol@:~/super$ mkdir sub1 && cd sub1
jmnavarrol@:~/super/sub1$ git init
Initialized empty Git repository in ~/super/sub1/.git/
jmnavarrol@:~/super/sub1$ echo 'The sub1 repo' > file1
jmnavarrol@:~/super/sub1$ git add file1
jmnavarrol@:~/super/sub1$ git commit -m "first commit into the sub1 repo"
[master (root-commit) 1f8273f] first commit into the sub1 repo
 1 file changed, 1 insertion(+)
 create mode 100644 file1
jmnavarrol@:~/super/sub1$

So, how is the world seen from the SUB1 repo?

jmnavarrol@:~/super/sub1$ git checkout -b development
Switched to a new branch 'development'
jmnavarrol@:~/super/sub1$ git status
# On branch development
nothing to commit, working directory clean
jmnavarrol@:~/super/sub1$ git log
commit 1f8273fc5b3c8402aab9a57008c70934692ccaa8
Author: jmnav <####>
Date:   Sat May 14 20:24:56 2016 +0200

    first commit into the sub1 repo
jmnavarrol@:~/super/sub1$

And what about SUPER?

jmnavarrol@:~/super/sub1$ cd ..
jmnavarrol@:~/super$ git log
commit 5677966145def72214491868a97324a0952e8041
Author: jmnav <####>
Date:   Sat May 14 20:22:20 2016 +0200

    first commit
jmnavarrol@:~/super$ git status
# On branch master
# Untracked files:
#   (use "git add <file>..." to include in what will be committed)
#
#       sub1/
#
nothing added to commit but untracked files present (use "git add" to track)
jmnavarrol@:~/super$

Hmm... this looks like a problem: SUB1 knows nothing about the parent repo (as it should), and SUPER sees SUB1 as an untracked dir. What should we do? If we add the sub1 directory to the SUPER repository, all kinds of nasty things will happen, as the history of the data within SUB1 will differ depending on whether we ask SUPER or SUB1 (note, for instance, that SUPER "thinks" it is on the master branch, while SUB1 sees itself on a different one). On the other hand, if I just leave the sub1 directory untracked, not only will it become cumbersome, but I also risk adding it to SUPER by mistake.

Luckily, the .gitignore file comes to the rescue:

jmnavarrol@:~/super$ echo sub1/ >> .gitignore
jmnavarrol@:~/super$ git add .gitignore 
jmnavarrol@:~/super$ git commit -m "adding the subrepo to .gitignore so it goes away."
[master bca44d2] adding the subrepo to .gitignore so it goes away.
 1 file changed, 1 insertion(+)
 create mode 100644 .gitignore
jmnavarrol@:~/super$ git status
# On branch master
nothing to commit, working directory clean
jmnavarrol@:~/super$

See? SUB1 has "disappeared" from sight and it's guaranteed to stay that way (as long as the relevant entry within .gitignore stays in place).

From now on, you can manage SUPER and SUB1 as two completely separate repositories: no need to learn the arcana of a new tool, no more "Oh! I pushed to the wrong repo!" or "I pulled from the parent repo... where the heck have my changes on the submodule gone!?", just plain old git commands.
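The whole walkthrough above can be replayed in one go. This is a minimal sketch, assuming only Bash and a reasonably recent git; the temporary directory and the `demo` commit identity are placeholders for illustration, not part of the original example.

```shell
#!/usr/bin/env bash
# Replays the SUPER / SUB1 walkthrough above in a throwaway directory.
set -euo pipefail

work=$(mktemp -d)

# Commit helper with a throwaway identity, so the demo works on a fresh machine
commit() { git -C "$1" -c user.name=demo -c user.email=demo@example.com commit -q -m "$2"; }

# SUPER: an ordinary repository with one file
git init -q "$work/super"
echo 'Hello, World!' > "$work/super/README.txt"
git -C "$work/super" add README.txt
commit "$work/super" "first commit"

# SUB1: an independent repository created inside SUPER's working tree
git init -q "$work/super/sub1"
echo 'The sub1 repo' > "$work/super/sub1/file1"
git -C "$work/super/sub1" add file1
commit "$work/super/sub1" "first commit into the sub1 repo"

# Before the trick, SUPER reports sub1/ as untracked ("?? sub1/")
git -C "$work/super" status --porcelain

# The whole trick: make SUPER ignore the nested repository's directory
echo 'sub1/' >> "$work/super/.gitignore"
git -C "$work/super" add .gitignore
commit "$work/super" "adding the subrepo to .gitignore so it goes away."

# After: SUPER is clean, and each repository keeps its own, separate history
git -C "$work/super" status --porcelain
git -C "$work/super" log --oneline
git -C "$work/super/sub1" log --oneline
```

Apart from the `.gitignore` entry, every step is an ordinary `git init`, `add`, or `commit`; `git -C` just saves the `cd` calls of the interactive session.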

going forward

There's not much further to go: the .gitignore trick is basically all there is to it. After all, I promised "The simplest way", right?

There is one problem, though, and it stems from the very fact that SUPER and SUB1 are so completely decoupled (which was my selling point to start with): in the example above I worked on local repositories, but what if, as is usually the case, a whole team works from remote repos? When somebody clones SUPER, they get no hint about how to reach SUB1, or even that it exists at all. Of course, one could resort to external documentation to tell them what to do, but that wouldn't be "making easy things easy", right?

For that, I created a simple script, multigit, that reads the subrepos to manage from a Bash hash (associative array) in the subrepos file and recursively git-clones them. By "recursively" I mean that it looks for other multigit scripts within the directory hierarchy and runs them in turn, so, starting from the top repo, it clones all the defined subrepos recursively. Starting at any midpoint (running it from SUB1 in the scheme above, for instance) does the expected thing: it clones whatever repos are defined down the line. Once all the repos are in place, it's just a matter of using git as if they were completely isolated from each other.
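The recursive clone can be sketched in a few lines of Bash. This is a hypothetical, simplified illustration, not the actual multigit source: instead of looking for other multigit scripts, it reads each directory's subrepos file directly (assuming the `declare -A REPOS` format shown further down), clones whatever is missing, and then recurses into every subrepo. The function names `list_subrepos` and `clone_tree` are made up for this sketch. Like multigit, it sources the file, so the warning at the top applies here too.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a recursive "clone all subrepos" pass (not the real
# multigit script). Each repository may contain a 'subrepos' file that fills
# a Bash associative array REPOS: ["directory name"]="URL to clone from".

# Print "name<TAB>url" for every entry in a subrepos file.
# WARNING: the file is executed with 'source', just as in multigit.
list_subrepos() {
  (
    # Subshell: whatever the sourced file defines does not leak out
    declare -A REPOS=()
    source "$1"
    local name
    for name in "${!REPOS[@]}"; do
      printf '%s\t%s\n' "$name" "${REPOS[$name]}"
    done
  )
}

# Clone every subrepo declared under directory $1, then recurse into each one,
# so starting from any level clones everything "down the line".
clone_tree() {
  local dir="$1" name url
  [ -f "$dir/subrepos" ] || return 0
  while IFS=$'\t' read -r name url; do
    if [ -d "$dir/$name" ]; then
      echo "$name already exists: skipping clone"
    else
      echo "$name doesn't exist: about to clone it..."
      # stdin redirected so git cannot consume the loop's input
      git clone --quiet "$url" "$dir/$name" < /dev/null || return 1
    fi
    # Recurse: the freshly cloned (or existing) subrepo may declare its own
    clone_tree "$dir/$name" || return 1
  done < <(list_subrepos "$dir/subrepos")
}
```

Calling `clone_tree .` from the top directory would clone SUB1 and SUB2, plus SUBSUB if SUB1's checked-out branch declares it; calling it from `sub1` only handles what lies below that point. Existing directories are left alone, so running it twice is harmless.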

I created projects on GitHub to publish the script and to demonstrate how it works by means of the SUPER / SUB1 / SUB2 / SUBSUB example.

multigit usage

jmnavarrol@:~/simplest-git-subrepos$ ./multigit
USAGE: multigit [-c|--clone]    - recursively Clones all defined git subrepos
       multigit [-h|--help]     - shows this Help
       multigit [-k|--checkout] - cheKs out the current branch in all defined git subrepos
       multigit [-l|--list]     - Lists git subrepos found
jmnavarrol@:~/simplest-git-subrepos$

Running multigit --clone produces this layout:

jmnavarrol@:~/$ tree -d simplest-git-subrepos/
simplest-git-subrepos/
├── sub1
└── sub2

You can see there's no sign of the subsub directory (and repo). That's because multigit checks out whatever happens to be the repo's default branch (master in this case), and I haven't defined any subrepo on that branch within SUB1 (on purpose).
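Since subrepos is just another versioned file, which subrepos a repository declares depends on the branch that is checked out. A small, hypothetical helper (`branches_with_subrepos` is not part of multigit) can show which branches carry such a file without switching branches, using only `git for-each-ref` and `git cat-file -e`:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every local and remote-tracking branch of repo $1
# whose tip commit contains a top-level 'subrepos' file.
branches_with_subrepos() {
  local repo="$1" ref
  git -C "$repo" for-each-ref --format='%(refname:short)' refs/heads refs/remotes |
  while IFS= read -r ref; do
    # 'cat-file -e <rev>:<path>' exits 0 only if that path exists at <rev>
    if git -C "$repo" cat-file -e "$ref:subrepos" 2>/dev/null; then
      echo "$ref"
    fi
  done
}
```

Running `branches_with_subrepos sub1` lists the branches whose tip has a subrepos file. Keep in mind that a file may exist yet declare no entries, so this only tells you where to look, not what will be cloned.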

Now go to SUB1, check out its development branch (where I did define SUBSUB as a subrepo), and run multigit again, either right from sub1 or from the top directory:

jmnavarrol@:~/simplest-git-subrepos$ cd sub1/
jmnavarrol@:~/simplest-git-subrepos/sub1$ git checkout development
Branch development set up to track remote branch development from origin.
Switched to a new branch 'development'
jmnavarrol@:~/simplest-git-subrepos/sub1$ ll
total 36K
-rw-r--r-- 1 jmnav jmnav   82 may 15 11:51 file1
-rw-r--r-- 1 jmnav jmnav  18K may 15 11:43 LICENSE
-rwxr-xr-x 1 jmnav jmnav 2,1K may 15 11:51 multigit
-rw-r--r-- 1 jmnav jmnav  221 may 15 11:43 README.md
-rw-r--r-- 1 jmnav jmnav  377 may 15 11:51 subrepos
jmnavarrol@:~/simplest-git-subrepos/sub1$ cat subrepos 
# Hash listing the subrepos this one depends on
# Structure of each element: ["directory name"]="path to repo"
# i.e.: REPOS=( ["sub1"]="git@gitolite:sub1" ["sub2"]="git@gitolite:sub2" )

declare -A REPOS
# REPOS=( ["sub1"]="git@gitolite:sub1" ["sub2"]="git@gitolite:sub2" )
REPOS["subsub"]="https://github.com/jmnavarrol/simplest-git-subrepos-subsub.git"  # The 'subsub' repo
jmnavarrol@:~/simplest-git-subrepos/sub1$ cd ..
jmnavarrol@:~/simplest-git-subrepos$ ./multigit --clone
subsub doesn't exist: about to clone it...
Cloning into 'subsub'...
remote: Counting objects: 19, done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 19 (delta 4), reused 12 (delta 1), pack-reused 0
Unpacking objects: 100% (19/19), done.
Checking connectivity... done.
jmnavarrol@:~/simplest-git-subrepos$

...see the result now:

jmnavarrol@:~/$ tree -d simplest-git-subrepos/
simplest-git-subrepos/
├── sub1
│   └── subsub
└── sub2

...just as expected.
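Because each subrepo is just a directory with its own .git, finding them needs no extra bookkeeping. The following is a hypothetical sketch of such a listing (`list_git_subrepos` is an illustrative name; `multigit --list` may well work differently) that prints each nested repository together with its checked-out branch:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: list every git repository nested below directory $1
# (default: current directory), together with its checked-out branch.
list_git_subrepos() {
  local root="${1:-.}" gitdir repo branch
  # -prune keeps find out of .git internals; nested working trees are still
  # searched, so deeper repos (like subsub) are found as well.
  find "$root" -name .git -type d -prune -print | sort |
  while IFS= read -r gitdir; do
    repo=$(dirname "$gitdir")
    # Skip the top-level repository itself ('-ef': same file on disk)
    [ "$repo" -ef "$root" ] && continue
    # symbolic-ref also works on a branch with no commits yet
    branch=$(git -C "$repo" symbolic-ref --short -q HEAD || echo '(detached HEAD)')
    printf '%s\t%s\n' "$repo" "$branch"
  done
}
```

Run from the top of the example after the clone above, it would print `./sub1`, `./sub1/subsub`, and `./sub2`, each followed by its current branch.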

Feel free to experiment with the script. Comments, issues, pull requests... are welcome at the project's page on GitHub.


1. See, for instance, this Stack Overflow question.

2. Quite a lot of them, in fact. Just to name a few:
