Git internal representations
To understand git you must go past the porcelain and check out the plumbing.
SHAs
Every git commit comes with a sha (a 40-character hash made of the characters 0-9a-f).
You rarely need to type the whole thing: git accepts a prefix as long as it is unambiguous within the repo, and the first 7 characters are usually enough.
To get the sha of a commit, we can use the log command and copy the long hex string.
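Resolving a shortened sha boils down to prefix matching. Here is a minimal sketch of that idea, not git's actual implementation. The function name resolve_prefix and the list of shas are made up for illustration.

```python
def resolve_prefix(prefix: str, known_shas: list) -> str:
    """Return the single full sha that starts with prefix, or raise."""
    matches = [sha for sha in known_shas if sha.startswith(prefix.lower())]
    if not matches:
        raise LookupError(f"no object matches {prefix!r}")
    if len(matches) > 1:
        raise LookupError(f"{prefix!r} is ambiguous: {len(matches)} candidates")
    return matches[0]


# Two shas from this walkthrough, used here purely as sample data.
shas = [
    "5ba786fcc93e8092831c01e71444b9baa2228a4f",
    "4e507fdc6d9044ccd8a4a3061324c9f711c4667d",
]
print(resolve_prefix("5ba786f", shas))
```

A prefix that matches more than one sha is rejected, which is why longer prefixes become necessary as a repo grows.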
Problem
Find the commit sha of your first commit (copy it to your system clipboard)
Solution
git log
You should see something like this
commit 5ba786fcc93e8092831c01e71444b9baa2228a4f (HEAD -> master)
Author: ThePrimeagen <the.primeagen@gmail.com>
Date: Sun Jan 21 19:40:56 2024 -0700
batman
commit 5ba786fcc93e8092831c01e71444b9baa2228a4f
---------^ this is the sha
Question
Why is your sha different than mine?
Answer
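First, the name: SHA stands for Secure Hash Algorithm. Git uses SHA-1 by default. It is not a modified version of MD5, but a separate algorithm whose design was influenced by MD4 and MD5.
Second, your sha, if you remember me saying, is a hash of the changes you have made, the author, the time, etc. A different author or a different timestamp means a different sha.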
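To make that concrete, here is a small sketch of how a commit sha can be computed by hand: SHA-1 over a "commit <size>" header, a NUL byte, and the commit text. The helper names git_object_id and commit_body are hypothetical, and the second author and timestamp are invented. Whether the first result reproduces the exact sha from the log above depends on the text being byte-for-byte identical to the real commit, so treat that as unverified.

```python
import hashlib


def git_object_id(obj_type: str, body: bytes) -> str:
    """SHA-1 over '<type> <size>', a NUL byte, then the object body."""
    header = f"{obj_type} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree: str, author: str, timestamp: int, message: str) -> bytes:
    """Build the text of a parentless commit (hypothetical helper)."""
    who = f"{author} {timestamp} -0700"
    lines = [f"tree {tree}", f"author {who}", f"committer {who}", "", message]
    return ("\n".join(lines) + "\n").encode()


tree = "4e507fdc6d9044ccd8a4a3061324c9f711c4667d"
mine = git_object_id(
    "commit",
    commit_body(tree, "ThePrimeagen <the.primeagen@gmail.com>", 1705891256, "batman"),
)
# Same tree and message, but an invented author and timestamp.
yours = git_object_id(
    "commit",
    commit_body(tree, "You <you@example.com>", 1705891999, "batman"),
)
print(mine)
print(yours)
print(mine != yours)
```

Same files, same message, different sha: the author and timestamp are part of what gets hashed.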
Problem
This next part will be a bit hard, but it's doable.
Can you find your commit's file(s) of data within the .git folder? See if you can cat out any data.
hint: Your commit's sha is a key piece of information
hint: The first 2 characters of the commit sha...
Solution
Upon initial look via ls, you will see nothing that looks familiar.
➜ my-first-git git:(master) ls -la .git
total 52
drwxrwxr-x 8 ThePrimeagen ThePrimeagen 4096 Jan 21 12:10 .
drwxrwxr-x 3 ThePrimeagen ThePrimeagen 4096 Jan 21 12:10 ..
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 10:29 branches
-rw-rw-r-- 1 ThePrimeagen ThePrimeagen 7 Jan 21 12:10 COMMIT_EDITMSG
-rw-rw-r-- 1 ThePrimeagen ThePrimeagen 92 Jan 21 10:29 config
-rw-rw-r-- 1 ThePrimeagen ThePrimeagen 73 Jan 21 10:29 description
-rw-rw-r-- 1 ThePrimeagen ThePrimeagen 23 Jan 21 10:29 HEAD
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 10:29 hooks
-rw-rw-r-- 1 ThePrimeagen ThePrimeagen 209 Jan 21 12:10 index
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 10:29 info
drwxrwxr-x 3 ThePrimeagen ThePrimeagen 4096 Jan 21 10:52 logs
drwxrwxr-x 10 ThePrimeagen ThePrimeagen 4096 Jan 21 12:10 objects
drwxrwxr-x 4 ThePrimeagen ThePrimeagen 4096 Jan 21 10:29 refs
But within the objects directory you should see at least one interesting entry
➜ my-first-git git:(master) ls -la .git/objects
total 28
drwxrwxr-x 7 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 .
drwxrwxr-x 8 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 ..
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 4e
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 5b
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 9a
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 info
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 pack
Do you see anything that is familiar in here?
I do. My commit, 5ba786fcc93e8092831c01e71444b9baa2228a4f, starts with 5b and so does a directory here. Let's ls that directory
➜ my-first-git git:(master) ls -la .git/objects/5b
total 12
drwxrwxr-x 2 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 .
drwxrwxr-x 7 ThePrimeagen ThePrimeagen 4096 Jan 21 19:40 ..
-r--r--r-- 1 ThePrimeagen ThePrimeagen 125 Jan 21 19:40 a786fcc93e8092831c01e71444b9baa2228a4f
You may now notice again that a786... is the remaining part of my commit sha, and yours exists in the same format!
Observation
Commits exist in the .git/objects directory with the first 2 characters of the sha as a directory, and the remaining 38 as a file.
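The 2 + 38 split can be written down as a tiny path helper. This is a sketch only: loose_object_path is a made-up name, and it only covers objects stored as individual files (git can also bundle objects under .git/objects/pack, which this ignores).

```python
from pathlib import Path


def loose_object_path(sha: str, git_dir: str = ".git") -> Path:
    """Return where an object with this full 40-character sha would live."""
    if len(sha) != 40:
        raise ValueError("expected a full 40-character sha")
    # First 2 characters name the directory, the remaining 38 name the file.
    return Path(git_dir) / "objects" / sha[:2] / sha[2:]


print(loose_object_path("5ba786fcc93e8092831c01e71444b9baa2228a4f").as_posix())
# prints .git/objects/5b/a786fcc93e8092831c01e71444b9baa2228a4f
```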
What's in its Pocketses, Precious?
If we try to cat out the commit file we see nothing useful
➜ cat .git/objects/5b/a786fcc93e8092831c01e71444b9baa2228a4f
x[ )@͢<41M]V%qP9C'*"iܣUfmA"DqFx3-C(U˅-YIw]0y6y1 @uڟ`V?9r%
Remember
ALL of git's state is stored in files. Everything.
The Tools of the Plumber
There are ways to inspect files within git's data store.
git cat-file -p <some-sha>
This will echo out the contents of the object behind the sha. That object can be a commit, a tree, or a blob (more on those in a bit)
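Why did plain cat print garbage earlier? Object files are zlib-compressed, and the leading x in that output is the first byte of a zlib stream. Below is a rough sketch of the round trip. The function names are invented, and the compressed bytes produced here will not necessarily match what git writes, since the compression level can differ.

```python
import zlib
from typing import Tuple


def encode_loose_object(obj_type: str, body: bytes) -> bytes:
    """Compress '<type> <size>' + NUL + body, the layout of a loose object."""
    header = f"{obj_type} {len(body)}\0".encode()
    return zlib.compress(header + body)


def decode_loose_object(raw: bytes) -> Tuple[str, bytes]:
    """Inflate a loose object and split it into (type, body)."""
    data = zlib.decompress(raw)
    header, _, body = data.partition(b"\0")
    obj_type, size = header.decode().split(" ")
    if int(size) != len(body):
        raise ValueError("size in header does not match body length")
    return obj_type, body


on_disk = encode_loose_object("blob", b"hello world\n")
print(on_disk[:1])  # first byte of the zlib stream, shows up as 'x'
obj_type, body = decode_loose_object(on_disk)
print(obj_type, body)
```

To try it on a real object, read the file's bytes from .git/objects/<first 2>/<remaining 38> and pass them to decode_loose_object. For commits and blobs the body is readable text; trees are binary.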
Problem
Can you get git cat-file -p <sha> to echo out the contents of first.md by inspecting the commit sha? It may take a few rounds of catting.
Solution
First start with git cat-file -p <your commit sha>. You should see something similar.
➜ git cat-file -p 5ba786fcc93e8092831c01e71444b9baa2228a4f
tree 4e507fdc6d9044ccd8a4a3061324c9f711c4667d
author ThePrimeagen <the.primeagen@gmail.com> 1705891256 -0700
committer ThePrimeagen <the.primeagen@gmail.com> 1705891256 -0700
batman
You will notice there is no first.md or its contents, so there must be something else to follow... Oh look at that, tree has a sha, let's try that
➜ git cat-file -p 4e507fdc6d9044ccd8a4a3061324c9f711c4667d
100644 blob 9a71f81a4b4754b686fd37cbb3c72d0250d344aa first.md
Wait... is that first.md...
➜ git cat-file -p 9a71f81a4b4754b686fd37cbb3c72d0250d344aa
hello world
Well, well, well, look at what the VCS has dragged in... if it isn't our file!
Blob 9a71f81a4b4754b686fd37cbb3c72d0250d344aa is literally first.md at the point of our first commit.
Key Concepts
- tree: tree is analogous to directory
- blob: blob is analogous to file
You probably noticed that the tree (directory), when cat'd, contains a single entry, a blob, which was named first.md.
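Under the pretty-printed listing, a tree is a binary list of entries: mode, a space, the file name, a NUL byte, then the 20 raw bytes of the entry's sha. The sketch below builds and parses that layout. build_tree and parse_tree are hypothetical names, entry ordering is not modeled, and treating every mode other than 40000 as a blob is a simplification.

```python
def build_tree(entries):
    """Serialize (mode, name, hex_sha) triples into a raw tree body."""
    out = b""
    for mode, name, hex_sha in entries:
        out += f"{mode} {name}\0".encode() + bytes.fromhex(hex_sha)
    return out


def parse_tree(body):
    """Yield (mode, kind, hex_sha, name) for each entry in a raw tree body."""
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)
        nul = body.index(b"\0", space)
        mode = body[pos:space].decode()
        name = body[space + 1:nul].decode()
        sha = body[nul + 1:nul + 21].hex()  # 20 raw bytes become 40 hex chars
        # Simplification: 40000 is a subdirectory, anything else is a blob.
        kind = "tree" if mode == "40000" else "blob"
        yield mode, kind, sha, name
        pos = nul + 21


raw = build_tree([("100644", "first.md", "9a71f81a4b4754b686fd37cbb3c72d0250d344aa")])
for mode, kind, sha, name in parse_tree(raw):
    print(mode, kind, sha, name)
```

A subdirectory would show up as one more entry whose sha points at another tree, which is how nested directories are represented.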
BIG TAKEAWAY
Git does not store diffs. Git stores a complete snapshot of the entire source at the point of each commit. In other words, each commit contains all the information needed to completely reconstruct the source code that was tracked.
A Second Change
Problem
With your amazing git skillz, create a second file, second.md, insert some text, stage, and commit the file.
Solution
vim second.md # Great editor to add such a wonderful change
git add second.md
git commit -m "second"
[master 48d06ff] second
1 file changed, 2 insertions(+)
create mode 100644 second.md
Problem
Explore your new git commit. What can you say about first.md? What about second.md?
Solution
Use git log to get the sha of your new commit, or use the short sha from the line git printed when you committed: [master 48d06ff] second.
Now let's list out the contents of that commit
➜ my-first-git git:(master) git cat-file -p 48d06ff
tree 6282551fedc655bc5ee9180ad67021c22245fdae
parent 5ba786fcc93e8092831c01e71444b9baa2228a4f
author ThePrimeagen <the.primeagen@gmail.com> 1706387467 -0700
committer ThePrimeagen <the.primeagen@gmail.com> 1706387467 -0700
second
Parents
Every commit except the first has at least one parent (a merge commit has more than one). Notice that our new commit has a parent!
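Since a commit is just text, pulling the parent out is a few lines of parsing. The sketch below is illustrative only, with parse_commit as a made-up helper. It relies on the blank line git places between the header fields and the commit message.

```python
def parse_commit(text: str) -> dict:
    """Split cat-file -p style commit text into header fields and message."""
    head, _, message = text.partition("\n\n")
    fields = {"parents": []}
    for line in head.splitlines():
        key, _, value = line.partition(" ")
        if key == "parent":
            fields["parents"].append(value)  # a list, since merges have several
        else:
            fields[key] = value
    fields["message"] = message.strip()
    return fields


text = (
    "tree 6282551fedc655bc5ee9180ad67021c22245fdae\n"
    "parent 5ba786fcc93e8092831c01e71444b9baa2228a4f\n"
    "author ThePrimeagen <the.primeagen@gmail.com> 1706387467 -0700\n"
    "committer ThePrimeagen <the.primeagen@gmail.com> 1706387467 -0700\n"
    "\n"
    "second\n"
)
commit = parse_commit(text)
print(commit["parents"])
print(commit["tree"])
```

Following parents from commit to commit until the list comes back empty walks the history back to the first commit.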
Inspect the tree with git cat-file -p 6282551fedc655bc5ee9180ad67021c22245fdae
➜ my-first-git git:(master) git cat-file -p 6282551fedc655bc5ee9180ad67021c22245fdae
100644 blob 9a71f81a4b4754b686fd37cbb3c72d0250d344aa first.md
100644 blob 7f112b196b963ff72675febdbb97da5204f9497e second.md
Now compare the original tree from our first commit
➜ my-first-git git:(master) git cat-file -p 4e507fdc6d9044ccd8a4a3061324c9f711c4667d
100644 blob 9a71f81a4b4754b686fd37cbb3c72d0250d344aa first.md
Notice that the tree of our commit, 48d06ff, has the same first.md blob, but with a newly added second.md! So to manually reconstruct the entire file tree thus far, we just need to cat-file both first.md and second.md from the current commit!
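That also means that each commit stores pointers to the ENTIRE worktree, not a fresh copy of the entire worktree. An unchanged file like first.md is just another pointer to a blob that already exists, which makes it significantly more efficient space-wise.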
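Here is a toy model of that pointer reuse. It is a sketch, not how git is implemented: a plain dict stands in for .git/objects, the names blob_id, store, and snapshot are invented, and the file contents are made up.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Content-derived id: SHA-1 over 'blob <size>', a NUL byte, the content."""
    return hashlib.sha1(f"blob {len(content)}\0".encode() + content).hexdigest()


store = {}  # object id -> content, standing in for .git/objects


def snapshot(files: dict) -> dict:
    """Store each file's content once and return a name -> blob id listing."""
    listing = {}
    for name, content in files.items():
        oid = blob_id(content)
        store.setdefault(oid, content)  # unchanged content is not stored again
        listing[name] = oid
    return listing


first = snapshot({"first.md": b"hello world\n"})
second = snapshot({"first.md": b"hello world\n", "second.md": b"more text\n"})

print(first["first.md"] == second["first.md"])  # True: same blob, reused
print(len(store))  # 2: three file entries across two snapshots, two blobs
```

Both snapshots are complete listings, yet only the new content costs any extra storage.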
Hopefully this makes git feel less magical. Just remember, at some point every program is just a bunch of if statements, for loops, and variables. This is true even of git.
For Them
Create an inner directory and do it again and show them the inner trees!