Git - Internals

Expert

Understanding Git's internal architecture transforms you from a Git user to a Git expert. This knowledge helps debug complex issues, optimize workflows, and appreciate Git's elegant design.

Git Object Model

Git stores everything as objects in a content-addressable database. There are four fundamental object types:

1. Blob Objects (File Contents)

Blobs store raw file content only. The file name and mode are not part of the blob; they live in the tree entry that points to it:

# View a blob object
git cat-file -p <blob-sha>

# Create a blob from a file
git hash-object -w filename.txt

# Check object type
git cat-file -t <object-sha>

# Example blob content (as printed by git cat-file -p)
Hello, Git internals!
This is the actual file content.
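
Because a blob holds only bytes, two files with identical content map to the same object. The following is a minimal sketch of that behavior, assuming a POSIX shell with git and mktemp available; the temporary repository and the file names are hypothetical.

```shell
# Work in a throwaway repository so nothing touches real work.
repo=$(mktemp -d) && cd "$repo" && git init -q

printf 'Hello, Git internals!\n' > a.txt
cp a.txt copy-of-a.txt

# -w writes the blob into .git/objects and prints its ID.
id_a=$(git hash-object -w a.txt)
id_copy=$(git hash-object -w copy-of-a.txt)

echo "$id_a"
echo "$id_copy"            # same ID: the file name is not part of the hash
git cat-file -t "$id_a"    # prints the object type: blob
git cat-file -s "$id_a"    # prints the content size in bytes: 22
```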

2. Tree Objects (Directory Structure)

Trees represent directories and link to blobs and other trees:

# View a tree object
git cat-file -p <tree-sha>

# List tree contents with details
git ls-tree <tree-sha>

# List tree recursively
git ls-tree -r <tree-sha>

# Example tree content (Git stores entries sorted by name)
100644 blob 5716ca5987cbf97d6bb54920bea6adde242d87e6    README.md
100755 blob 8a1218a1024317398ce132d7819b42b3e3e2b5a5    script.sh
040000 tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0    src
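
To see trees nesting in practice, the sketch below builds a tiny repository and walks from the root tree into a subdirectory. It assumes a POSIX shell with git and mktemp; the file names, the identity settings, and the commit message are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

mkdir src
echo "# Demo" > README.md
echo "console.log('hi')" > src/main.js
git add . && git commit -q -m "Initial commit"

# The commit's root tree: one blob entry (README.md) and one tree entry (src).
git ls-tree HEAD

# The src entry is itself a tree object; resolve its ID and list it.
src_tree=$(git rev-parse HEAD:src)
git cat-file -t "$src_tree"    # tree
git ls-tree "$src_tree"
```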

3. Commit Objects (Snapshots)

Commits point to trees and contain metadata:

# View a commit object
git cat-file -p <commit-sha>

# Show commit with raw format
git show --format=raw <commit-sha>

# Example commit content (the e-mail address is a placeholder)
tree 99f1a6d12cb4b6f19c8655fca46c3ecf317074e0
parent 5716ca5987cbf97d6bb54920bea6adde242d87e6
author John Doe <john.doe@example.com> 1635724800 +0000
committer John Doe <john.doe@example.com> 1635724800 +0000

Add new feature implementation

This commit adds the core functionality for the new feature,
including unit tests and documentation updates.
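
The fields of a commit object can be checked against what Git resolves for the same commit. This is a small sketch assuming a POSIX shell with git and mktemp; the file name, identity, and messages are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo one > file.txt && git add file.txt && git commit -q -m "First"
echo two > file.txt && git commit -q -am "Second"

# Raw commit object: tree, parent, author, committer, blank line, message.
git cat-file -p HEAD

tree_id=$(git rev-parse 'HEAD^{tree}')            # the snapshot this commit points to
parent_id=$(git rev-parse 'HEAD^')                # the first parent
first_id=$(git rev-list --max-parents=0 HEAD)     # the root commit has no parent line
```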

4. Tag Objects (Annotated Tags)

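Annotated tags are full objects that record a tagger, a date, and a message, and point to another object (usually a commit). Lightweight tags are plain refs and create no tag object: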

# Create an annotated tag
git tag -a v1.0 -m "Version 1.0 release"

# View tag object
git cat-file -p v1.0
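
The difference between annotated and lightweight tags shows up in the object type each name resolves to. A minimal sketch, assuming a POSIX shell with git and mktemp; the tag name v1.0-light and the repository contents are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo hi > f.txt && git add f.txt && git commit -q -m "Initial commit"

git tag -a v1.0 -m "Version 1.0 release"   # annotated: creates a tag object
git tag v1.0-light                         # lightweight: only a ref to the commit

git cat-file -t v1.0          # tag
git cat-file -t v1.0-light    # commit

# Peel the annotated tag to the commit it ultimately points to.
git rev-parse 'v1.0^{commit}'
```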

Object Storage

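Git identifies each object by a hash of its type, size, and content. The default hash is SHA-1; a repository can optionally be created with SHA-256 instead: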

# Calculate object hash
echo "Hello World" | git hash-object --stdin

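# Find where a loose object lives on disk
git rev-parse HEAD
# A loose object is stored at .git/objects/<first 2 hex digits>/<remaining digits>,
# e.g. .git/objects/ab/cdef123...
# Objects that have been packed live in .git/objects/pack/ instead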

# Verify object integrity
git fsck
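
The object ID can be reproduced by hand, which makes the "content-addressable" idea concrete. The sketch below assumes a SHA-1 repository (the default), a POSIX shell, and the sha1sum utility from GNU coreutils (on macOS, shasum would be the likely substitute).

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

# Git hashes "<type> <size>\0<content>", not the bare content.
# "Hello World" plus the trailing newline is 12 bytes.
manual=$(printf 'blob 12\0Hello World\n' | sha1sum | cut -d' ' -f1)
via_git=$(echo "Hello World" | git hash-object -w --stdin)

echo "$manual"
echo "$via_git"    # matches the manual hash

# Loose object path: first two hex digits name the directory, the rest the file.
dir=$(printf '%s' "$via_git" | cut -c1-2)
file=$(printf '%s' "$via_git" | cut -c3-)
ls -l ".git/objects/$dir/$file"
```

The file on disk is zlib-compressed, so it cannot be read with cat; git cat-file -p is the supported way to view it.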

References (Refs)

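Refs are human-readable names that point to objects, usually commits. Each ref is stored either as a small file under .git/refs/ or as a line in .git/packed-refs: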

Branch References

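# View branch refs (these files exist only while the ref is loose;
# packed refs are listed in .git/packed-refs instead)
cat .git/refs/heads/main
cat .git/refs/heads/develop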

# List all refs
git for-each-ref

# List branch refs with the commit IDs they point to
git show-ref --heads
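
A branch is nothing more than a name bound to one object ID. The sketch below shows the loose file and then the same ref after packing. It assumes a POSIX shell with git and mktemp, and the traditional "files" ref storage; the file name and identity are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main      # make sure the first branch is called main
git config user.name "Example" && git config user.email "example@example.com"
echo hi > f.txt && git add f.txt && git commit -q -m "Initial commit"

# A freshly written branch is a loose ref: a file holding one object ID.
loose=$(cat .git/refs/heads/main)
resolved=$(git rev-parse main)
echo "$loose"
echo "$resolved"

# Packing moves the entry into .git/packed-refs and prunes the loose file.
git pack-refs --all
grep refs/heads/main .git/packed-refs
git rev-parse main                         # still resolves to the same commit
```

This is why git rev-parse or git show-ref is more reliable than reading files under .git/refs/ directly.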

HEAD Reference

HEAD points to the current branch or commit:

# View HEAD
cat .git/HEAD

# HEAD pointing to branch
ref: refs/heads/main

# HEAD in detached state (direct commit)
5716ca5987cbf97d6bb54920bea6adde242d87e6
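
Both HEAD states can be produced in a scratch repository. A minimal sketch, assuming a POSIX shell with git and mktemp; the file name and identity are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"
echo hi > f.txt && git add f.txt && git commit -q -m "Initial commit"

attached=$(cat .git/HEAD)
echo "$attached"                  # ref: refs/heads/main
git symbolic-ref HEAD             # refs/heads/main

git checkout -q --detach          # detach HEAD at the current commit
cat .git/HEAD                     # now a raw commit ID
git symbolic-ref -q HEAD || echo "HEAD is detached"
```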

Remote References

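# View a local remote-tracking ref (last known state of the remote branch)
cat .git/refs/remotes/origin/main

# List refs as they currently exist on the remote (contacts the remote)
git ls-remote origin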

Index (Staging Area)

The index is a binary file that records the content of the next commit: one entry per tracked path, with its mode, blob ID, and stage number:

# View index contents
git ls-files --stage

# Summarize how the index differs from HEAD and from the working tree
git status --porcelain

# Index file location
ls -la .git/index

# Index contents example (mode, blob ID, stage number, path)
100644 5716ca5987cbf97d6bb54920bea6adde242d87e6 0    README.md
100644 8a1218a1024317398ce132d7819b42b3e3e2b5a5 0    src/main.js
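
Staging writes the blob immediately and records its ID in the index, before any commit exists. The following sketch illustrates that, assuming a POSIX shell with git and mktemp; notes.txt is a hypothetical file name.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

echo "v1" > notes.txt
git add notes.txt
staged_v1=$(git ls-files --stage notes.txt | cut -d' ' -f2)
git cat-file -p "$staged_v1"      # v1: the blob exists although nothing is committed

# Editing the file does not touch the index...
echo "v2" > notes.txt
still=$(git ls-files --stage notes.txt | cut -d' ' -f2)

# ...until the change is staged again.
git add notes.txt
staged_v2=$(git ls-files --stage notes.txt | cut -d' ' -f2)
git ls-files --stage
```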

Packfiles and Compression

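Git already zlib-compresses each loose object. Packfiles go further: they store many objects in a single file and encode similar objects as deltas against each other: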

# Trigger garbage collection and packing
git gc

# View pack files
ls -la .git/objects/pack/

# Verify pack files
git verify-pack -v .git/objects/pack/pack-*.idx

# Count objects
git count-objects -v
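
The effect of packing is visible in the object counts before and after git gc. A small sketch, assuming a POSIX shell with git, mktemp, and awk; the file name and commit messages are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

for i in 1 2 3; do
  echo "line $i" >> log.txt
  git add log.txt && git commit -q -m "Commit $i"
done

# "count" is the number of loose objects, "in-pack" the number of packed ones.
git count-objects -v
loose_before=$(git count-objects -v | awk '/^count:/ {print $2}')

git gc -q                         # packs the reachable loose objects

loose_after=$(git count-objects -v | awk '/^count:/ {print $2}')
in_pack=$(git count-objects -v | awk '/^in-pack:/ {print $2}')
echo "loose: $loose_before -> $loose_after, packed: $in_pack"
ls .git/objects/pack/
```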

Reflog

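The reflog records every change to HEAD and to branch tips. It is local to each repository, is never pushed or fetched, and its entries expire (after 90 days by default):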

# View reflog
git reflog

# Reflog for specific branch
git reflog show main

# Reflog files location
ls -la .git/logs/refs/heads/
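
A common use of the reflog is undoing a hard reset. The sketch below loses a commit and gets it back, assuming a POSIX shell with git and mktemp; the file name and messages are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

echo one > f.txt && git add f.txt && git commit -q -m "First"
echo two > f.txt && git commit -q -am "Second"
lost=$(git rev-parse HEAD)

git reset -q --hard HEAD~1        # "Second" is no longer on the branch
git reflog -n 3                   # but the reflog still records where HEAD was

# HEAD@{1} is the position of HEAD before its most recent update.
recovered=$(git rev-parse 'HEAD@{1}')
git reset -q --hard "$recovered"
git log --oneline
```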

Configuration Hierarchy

Git configuration has multiple levels. More specific levels override more general ones: local beats global, and global beats system:

# System level (all users)
git config --system --list
# File: /etc/gitconfig (the exact path depends on how Git was installed)

# Global level (current user)
git config --global --list
# File: ~/.gitconfig (or ~/.config/git/config)

# Local level (current repository)
git config --local --list
# File: .git/config
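
Precedence between levels can be checked without touching a real ~/.gitconfig. The sketch below points HOME at a scratch directory first, so it is best run in a disposable shell; it assumes a POSIX shell with git and mktemp, and the names "Global Name" and "Repo Name" are hypothetical values.

```shell
# Redirect HOME so --global writes to a throwaway ~/.gitconfig.
scratch=$(mktemp -d)
export HOME="$scratch/home"
unset XDG_CONFIG_HOME
mkdir -p "$HOME"
git init -q "$scratch/repo" && cd "$scratch/repo"

git config --global user.name "Global Name"
git config --local  user.name "Repo Name"

effective=$(git config --get user.name)               # the local value wins
origin=$(git config --show-origin --get user.name)    # also shows the source file
echo "$effective"
echo "$origin"

git config --local --unset user.name
fallback=$(git config --get user.name)                # falls back to the global value
echo "$fallback"
```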

Plumbing vs Porcelain Commands

Plumbing Commands (Low-level)

# Object manipulation
git cat-file -p <object>
git hash-object -w <file>
git mktree

# Reference manipulation  
git update-ref refs/heads/branch <commit>
git symbolic-ref HEAD refs/heads/branch

# Index and tree manipulation
git update-index --add <file>
git write-tree

# Commit creation
git commit-tree <tree> -m "message"

Porcelain Commands (High-level)

# User-friendly commands
git add, git commit, git merge
git branch, git checkout, git push
git status, git log, git diff

Creating a Commit Manually

The steps below reproduce what git add and git commit do, using only plumbing commands:

# 1. Create blob objects
echo "Hello" | git hash-object -w --stdin
# Returns: e965047ad7c57865823c7d992b1d046ea66edf78

# 2. Create tree object
git update-index --add --cacheinfo 100644 e965047ad7c57865823c7d992b1d046ea66edf78 hello.txt
git write-tree
# Returns: tree-sha

# 3. Create commit object (add -p <parent-sha> for every commit after the first)
git commit-tree <tree-sha> -m "Manual commit"
# Returns: commit-sha

# 4. Update branch reference
git update-ref refs/heads/main <commit-sha>
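
The four steps can be chained into one script by capturing each ID in a variable. This is a sketch under the assumption of a POSIX shell with git and mktemp, run in a fresh temporary repository; the variable names and identity are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Example" && git config user.email "example@example.com"

# 1. Blob: store the content.
blob=$(echo "Hello" | git hash-object -w --stdin)

# 2. Tree: register the blob in the index under a name, then write the tree.
git update-index --add --cacheinfo 100644,"$blob",hello.txt
tree=$(git write-tree)

# 3. Commit: wrap the tree with author, committer, and message (no parent: root commit).
commit=$(git commit-tree "$tree" -m "Manual commit")

# 4. Ref: point the branch at the new commit.
git update-ref refs/heads/main "$commit"

git log --oneline main
git show main:hello.txt           # Hello
```

None of these steps writes to the working directory, so hello.txt exists in the commit and the index but not on disk; git checkout-index -a would materialize it.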

Git Directory Structure

# Explore .git directory
tree .git

.git/
├── HEAD                    # Current branch pointer
├── config                  # Repository configuration
├── description            # Repository description
├── hooks/                 # Hook scripts
├── index                  # Staging area
├── info/                  # Repository info
│   └── exclude           # Local ignore patterns
├── logs/                  # Reference logs (reflog)
│   ├── HEAD
│   └── refs/
├── objects/               # Object database
│   ├── pack/             # Packed objects
│   └── [0-9a-f][0-9a-f]/ # Loose objects
└── refs/                  # References
    ├── heads/            # Branch references
    ├── remotes/          # Remote references
    └── tags/             # Tag references

Object Relationships

Visualizing how objects connect:

Commit B (newer) → Tree Z ─┬→ Blob 1 (file1.txt, unchanged, same hash)
    │                      ├→ Blob 4 (file2.txt, modified)
    │                      └→ Tree Y (subdir/, unchanged, same hash)
  parent
    ↓
Commit A (older) → Tree X ─┬→ Blob 1 (file1.txt)
                           ├→ Blob 2 (file2.txt)
                           └→ Tree Y → Blob 3 (subdir/file3.txt)
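
The sharing between consecutive commits can be confirmed by comparing object IDs. A minimal sketch that mirrors the diagram's file layout, assuming a POSIX shell with git and mktemp; the file contents and identity are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"

mkdir subdir
echo a > file1.txt; echo b > file2.txt; echo c > subdir/file3.txt
git add . && git commit -q -m "Commit A"
echo b2 > file2.txt && git commit -q -am "Commit B"

# Unchanged file and unchanged subdirectory: same IDs in both commits.
git rev-parse HEAD~1:file1.txt HEAD:file1.txt
git rev-parse HEAD~1:subdir    HEAD:subdir

# Modified file: a new blob, and therefore a new root tree as well.
git rev-parse HEAD~1:file2.txt HEAD:file2.txt
git rev-parse 'HEAD~1^{tree}'  'HEAD^{tree}'
```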

Performance Considerations

Optimization Tips:
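  • Pack files reduce storage through delta compression
  • Shallow clones (git clone --depth <n>) truncate history for faster clones and fetches
  • Sparse checkout limits the working directory to selected paths
  • Git LFS stores large binary files outside the object database and commits small pointer files instead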

Debugging with Internals

# Find dangling objects
git fsck --full

# Find large objects
git verify-pack -v .git/objects/pack/pack-*.idx | sort -k 3 -nr | head

# Recover lost commits
git reflog --all
git fsck --lost-found

# Analyze repository size
git count-objects -v -H
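
To see what a dangling object looks like, one can be created on purpose. The sketch below writes a blob that nothing references and lets git fsck find it; it assumes a POSIX shell with git and mktemp, and the content string is hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q
git config user.name "Example" && git config user.email "example@example.com"
echo hi > f.txt && git add f.txt && git commit -q -m "Initial commit"

# A blob written directly to the object database, referenced by no tree or index entry.
orphan=$(echo "never committed" | git hash-object -w --stdin)

# fsck reports unreachable leaf objects as "dangling <type> <id>".
report=$(git fsck 2>/dev/null)
echo "$report"

# The content is still recoverable by ID until garbage collection prunes it.
git cat-file -p "$orphan"
```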

Advanced Scenarios

Recovering from Corruption

# Check repository integrity
git fsck --full --strict

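# Rebuild the index from HEAD, then overwrite working-tree files from it
# (checkout-index -f discards uncommitted changes to tracked files)
git read-tree HEAD
git checkout-index -f -a

# List current branch refs and their targets, to compare against the reflog
# (this only reports refs; restore a bad one with git update-ref)
git for-each-ref --format="%(refname) %(objectname)" refs/heads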

Custom Object Creation

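# Create custom tree programmatically
# mktree requires a TAB between the object ID and the name, so printf is used
# instead of pasted text; both objects must already exist in the repository
printf '100644 blob %s\tREADME.md\n040000 tree %s\tsrc\n' \
  5716ca5987cbf97d6bb54920bea6adde242d87e6 \
  99f1a6d12cb4b6f19c8655fca46c3ecf317074e0 | git mktree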

Internal Manipulation: Direct manipulation of Git internals can corrupt your repository. Always back up before experimenting with plumbing commands.
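
For a version of the mktree example that runs end to end, the objects have to be created first. The sketch below does that in a throwaway repository, assuming a POSIX shell with git and mktemp; the file names run.sh and README.md and their contents are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q

readme=$(echo "# Demo" | git hash-object -w --stdin)
script=$(echo "echo hi" | git hash-object -w --stdin)

# mktree reads "<mode> <type> <id><TAB><name>" lines; \t in printf supplies the tab.
inner=$(printf '100755 blob %s\trun.sh\n' "$script" | git mktree)
root=$(printf '100644 blob %s\tREADME.md\n040000 tree %s\tsrc\n' "$readme" "$inner" | git mktree)

git cat-file -t "$root"    # tree
git ls-tree -r "$root"     # lists README.md and src/run.sh
```

Building the inner tree first is required because a tree entry can only reference an object ID that is already known.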

Key Takeaways

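  • Content-Addressable: Each object is identified by the hash (SHA-1 by default) of its type, size, and content
  • Immutable Objects: Objects never change, only references move
  • Directed Acyclic Graph: Commits link to their parents, so history forms a DAG
  • Snapshots: Each commit references a full snapshot; packfile deltas are only a storage optimization
  • Distributed: Every full clone has the complete history and objects (shallow and partial clones are the exception)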

Understanding Git internals demystifies complex operations and enables advanced troubleshooting.