Understanding Git's Object Model
This document explains how Git stores data internally. If you've ever wondered what happens when you run git add or git commit, this guide will demystify Git's elegant storage system.
The Big Picture: Git is a Content-Addressable Filesystem
Before diving into details, let's understand Git's fundamental insight: Git is not really a version control system at its core—it's a content-addressable filesystem with a VCS built on top.
What does "content-addressable" mean? In a traditional filesystem, you access files by their path: /home/user/project/README.md. In Git's object store, you access data by its content. Specifically, Git computes a SHA-1 hash of the content, and that hash becomes the "address" of the data.
This has profound implications:
- Automatic deduplication: If two files have identical content, they're stored once
- Integrity verification: If any bit changes, the hash changes, so corruption is detectable
- Immutability: You can't modify an object without changing its address
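To make these three properties concrete, here is a minimal sketch of a content-addressable store. The `ContentStore` class is hypothetical and exists only for illustration; it hashes the raw bytes directly, whereas Git adds a small header before hashing (covered later in this guide).

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: the key is the SHA-1 of the value."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content, never chosen by the caller.
        key = hashlib.sha1(data).hexdigest()
        self._objects.setdefault(key, data)  # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        data = self._objects[key]
        # Integrity check: re-hash the stored bytes and compare with the address.
        if hashlib.sha1(data).hexdigest() != key:
            raise ValueError(f"object {key} is corrupt")
        return data

    def __len__(self) -> int:
        return len(self._objects)


store = ContentStore()
a = store.put(b"def helper(): pass\n")
b = store.put(b"def helper(): pass\n")      # same bytes, same address
c = store.put(b"def helper(): return 1\n")  # different bytes, new address
```

Storing the same bytes twice yields one entry, and any change to the bytes produces a different key, which is the deduplication and immutability described above.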
Seeing Git Objects in the Real World
Before we dive into theory, let's see Git's object model in action. Every Git repository has a hidden .git directory—this is where all the magic happens.
Exploring the Object Database
# Look inside any Git repository
$ ls .git/objects/
4b/ 8a/ ce/ e6/ info/ pack/
# Objects are stored by their first 2 hash characters
$ ls .git/objects/ce/
013625030ba8dba906f756967f9e9ca394464a
The file .git/objects/ce/013625030ba8dba906f756967f9e9ca394464a is a Git object. Its full hash is ce013625030ba8dba906f756967f9e9ca394464a—Git splits it into a 2-character directory and 38-character filename.
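The 2 + 38 split can be sketched in a few lines of Python. The helper name `loose_object_path` is an assumption made for this example, not a Git or gitpy API.

```python
from pathlib import PurePosixPath


def loose_object_path(oid: str, git_dir: str = ".git") -> PurePosixPath:
    """Return where a loose object with this 40-char hex OID would live."""
    if len(oid) != 40 or any(ch not in "0123456789abcdef" for ch in oid):
        raise ValueError(f"not a 40-character lowercase hex OID: {oid!r}")
    # First 2 characters name the directory, the remaining 38 name the file.
    return PurePosixPath(git_dir) / "objects" / oid[:2] / oid[2:]


path = loose_object_path("ce013625030ba8dba906f756967f9e9ca394464a")
```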
Examining Objects with git cat-file
Git provides cat-file to inspect objects:
# What type is this object?
$ git cat-file -t ce013625030ba8dba906f756967f9e9ca394464a
blob
# What's the size?
$ git cat-file -s ce013625030ba8dba906f756967f9e9ca394464a
6
# Show the content
$ git cat-file -p ce013625030ba8dba906f756967f9e9ca394464a
hello
This object is a blob containing hello\n (6 bytes including the newline).
Creating Objects with git hash-object
You can create objects directly:
# Hash content without storing
$ echo "hello" | git hash-object --stdin
ce013625030ba8dba906f756967f9e9ca394464a
# Hash and store in the database
$ echo "hello" | git hash-object --stdin -w
ce013625030ba8dba906f756967f9e9ca394464a
This is the low-level "plumbing" counterpart of the hashing step that git add performs internally.
The Four Object Types
Git uses just four object types to represent an entire repository's history:
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│   Blob   │ │   Tree   │ │  Commit  │ │   Tag    │
│  (file)  │ │  (dir)   │ │(snapshot)│ │ (label)  │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
That's it. Four types to store unlimited history of any project. Let's explore each one.
How Objects Are Identified: SHA-1 Hashing
Every Git object has an "Object ID" (OID)—a 40-character hexadecimal string like ce013625030ba8dba906f756967f9e9ca394464a. This is computed using SHA-1, but not just on the raw content. Git prepends a header:
<type> <size>\0<content>
For example, if you have a file containing hello\n (6 bytes), Git computes:
SHA-1("blob 6\0hello\n") = ce013625030ba8dba906f756967f9e9ca394464a
The \0 is a null byte that separates the header from the content. This header is crucial—it means a blob and a commit with the same content will have different hashes (because the type differs).
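The header-plus-content rule is easy to reproduce with Python's standard library. The function name `git_object_id` is chosen for this sketch only; the hashes it is checked against are the ones quoted in this guide.

```python
import hashlib


def git_object_id(type_name: str, content: bytes) -> str:
    """Compute SHA-1 over '<type> <size>' + NUL byte + content."""
    header = f"{type_name} {len(content)}".encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


blob_oid = git_object_id("blob", b"hello\n")
# Same bytes, different type: the header differs, so the hash differs too.
commit_oid = git_object_id("commit", b"hello\n")
```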
Well-Known Hashes
These hashes are the same in every Git repository in the world:
| Content | SHA-1 |
|---|---|
| Empty blob | e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 |
| Empty tree | 4b825dc642cb6eb9a060e54bf8d69288fbee4904 |
| hello\n | ce013625030ba8dba906f756967f9e9ca394464a |
You can verify these yourself:
$ echo -n "" | git hash-object --stdin
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
$ git hash-object -t tree /dev/null
4b825dc642cb6eb9a060e54bf8d69288fbee4904
Blob: Storing File Contents
A blob represents file contents. That's it—just raw bytes. No filename, no permissions, no timestamps. Just content.
Why Separate Content from Metadata?
This is a key insight in Git's design. Consider two files:
/src/utils.py → contains "def helper(): pass\n"
/lib/utils.py → contains "def helper(): pass\n"
In a traditional system, these are two separate files. In Git, they're one blob referenced from two different trees. This is automatic deduplication in action.
Real-World Example: Large Binary Files
This is why Git struggles with large binary files. If you have a 100MB video and change one byte:
- Traditional systems: Store the diff (a few bytes)
- Git: Creates a new 100MB blob (entire new content)
This is also why Git LFS (Large File Storage) exists—it stores large files outside the normal object database.
Examining a Blob
# Stage a file
$ echo "Hello, World!" > greeting.txt
$ git add greeting.txt
# Find the blob
$ git ls-files --stage
100644 8ab686eafeb1f44702738c8b0f24f2567c36da6d 0 greeting.txt
# Examine it
$ git cat-file -p 8ab686ea
Hello, World!
Notice: the blob contains just Hello, World! and no filename. The filename greeting.txt is recorded in the index and, once you commit, in a tree. It is never stored in the blob.
Tree: Representing Directories
If blobs are files, how do we represent directories? That's what trees are for. A tree is a list of entries, where each entry maps a name to either a blob (file) or another tree (subdirectory).
What's in a Tree Entry?
Each entry contains:
- mode: File type and permissions (100644 for regular, 100755 for executable, 40000 for directory)
- name: The filename (just the name, not the full path)
- sha: The OID of the referenced object
Examining a Real Tree
# Look at the tree for HEAD
$ git cat-file -p HEAD^{tree}
100644 blob 8ab686eafeb1f44702738c8b0f24f2567c36da6d README.md
100755 blob 5c1f5e3b8c9e2a1d7f6e4b3c2a1d8e7f6c5b4a3d run.sh
040000 tree 7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b src
This tree has three entries:
- README.md (regular file) → points to a blob
- run.sh (executable) → points to a blob
- src (directory) → points to another tree
File Modes Explained
These cryptic numbers like 100644 come from Unix—but Git simplifies them dramatically.
Understanding the Mode Format
In Unix, file modes are 6-digit octal numbers:
100644
│││└┴┴─ Permission bits: 644 (rw-r--r--)
││└──── Special bits: 0 (no setuid/setgid/sticky)
└┴───── File type: 10 (regular file)
The file type prefix tells you what kind of entry it is:
| Prefix | Type | Full Mode |
|---|---|---|
| 10 | Regular file | 100644, 100755 |
| 12 | Symbolic link | 120000 |
| 04 | Directory | 040000 (stored as 40000) |
| 16 | Gitlink (submodule) | 160000 |
The permission bits are standard Unix:
644 = rw-r--r-- (owner read+write, others read-only)
755 = rwxr-xr-x (owner full access, others read+execute)
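The same breakdown can be done with Python's `stat` module, which knows the Unix file-type and permission bits. `describe_mode` is a hypothetical helper for this sketch. The gitlink value 160000 is not a Unix file type, so it is compared as a literal.

```python
import stat


def describe_mode(mode_str: str) -> tuple[str, str]:
    """Split a Git mode string into (entry kind, rwx permission string)."""
    mode = int(mode_str, 8)        # mode strings are octal
    file_type = stat.S_IFMT(mode)  # keep only the file-type bits
    if file_type == stat.S_IFREG:
        kind = "regular file"
    elif file_type == stat.S_IFLNK:
        kind = "symbolic link"
    elif file_type == stat.S_IFDIR:
        kind = "directory"
    elif file_type == 0o160000:
        kind = "gitlink"
    else:
        kind = "unknown"
    # stat.filemode() renders e.g. '-rw-r--r--'; drop the leading type char.
    return kind, stat.filemode(mode)[1:]
```

For example, `describe_mode("100755")` reports a regular file with `rwxr-xr-x` permissions.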
Git's Simplification
Here's the key insight: Git only tracks a handful of modes, not the full Unix permission space.
| Mode | Meaning | When Git Assigns It |
|---|---|---|
| 100644 | Regular file | Default for non-executable files |
| 100755 | Executable | File has the executable bit (+x) |
| 40000 | Directory | Tree entries pointing to other trees |
| 120000 | Symlink | File is a symbolic link |
| 160000 | Gitlink | Submodule reference |
Why so limited? Three reasons:
- Portability: Windows doesn't have Unix permissions
- Simplicity: Git tracks content, not fine-grained metadata
- Consistency: Avoids "permission-only" changes cluttering history
How Git Decides the Mode
When you git add a file, Git uses this logic:
Is it a symbolic link?
  → Yes: 120000
  → No: Is the executable bit set?
      → Yes: 100755
      → No: 100644
Git ignores group permissions, setuid bits, and read/write distinctions. It only cares: is it executable or not?
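The decision logic above could be sketched as follows. This is an illustration rather than Git's or gitpy's actual code: `git_file_mode` is a made-up name, the sketch assumes a POSIX filesystem, and it treats "executable" as the owner's execute bit.

```python
import os
import stat
import tempfile


def git_file_mode(path: str) -> str:
    """Pick the mode string Git would record for a file on disk."""
    st = os.lstat(path)  # lstat: inspect the link itself, do not follow it
    if stat.S_ISLNK(st.st_mode):
        return "120000"
    if stat.S_ISDIR(st.st_mode):
        raise ValueError("directories become tree entries, not file entries")
    if st.st_mode & stat.S_IXUSR:  # owner's execute bit
        return "100755"
    return "100644"


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "deploy.sh")
    with open(script, "w") as f:
        f.write("#!/bin/sh\n")
    os.chmod(script, 0o640)  # group-readable, not executable
    plain_mode = git_file_mode(script)
    os.chmod(script, 0o750)  # now executable by the owner
    exec_mode = git_file_mode(script)
```

Note how 640 and 750 collapse to 100644 and 100755: the group and other bits play no part in the result.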
Changing File Modes
# Make a script executable
$ chmod +x deploy.sh
$ git add deploy.sh
# Git notices the mode change
$ git diff --cached
diff --git a/deploy.sh b/deploy.sh
old mode 100644
new mode 100755
The Windows Problem
Windows has no executable bit, so Git uses a config setting:
# Check if Git tracks executable bit
$ git config core.fileMode
true # Unix (default)
false # Windows (default)
# Manually mark a file executable on Windows
$ git update-index --chmod=+x deploy.sh
Why 644 and 755?
These are sensible Unix defaults:
- 644 (rw-r--r--): Owner can edit, everyone can read. Safe for source files.
- 755 (rwxr-xr-x): Everyone can run it, but only owner can edit. Standard for scripts.
Note: Git stores 40000 not 040000—leading zeros are omitted in mode strings.
Tree Sorting: A Subtle Detail
Trees must be sorted, but with a twist: directories sort as if they had a trailing /.
Consider: foo (directory), foo.txt (file), foobar (file)
Sort keys: "foo/" "foo.txt" "foobar"
Sorted order: foo.txt foo foobar
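Why does this matter? The sort order makes a tree's serialization canonical: the same set of entries always produces the same bytes, and therefore the same hash. It also lets Git compare two trees by walking them in parallel. A tree written in the wrong order is reported as an error by git fsck.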
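The trailing-slash rule can be checked with a short sketch. `tree_sort_key` is a name invented for this example; it compares raw bytes, which is an assumption about how names are ordered.

```python
def tree_sort_key(name: str, is_dir: bool) -> bytes:
    """Sort key for a tree entry: directories compare as 'name/'."""
    raw = name.encode("utf-8")
    return raw + b"/" if is_dir else raw


# (name, is_directory) pairs from the example above
entries = [("foobar", False), ("foo", True), ("foo.txt", False)]

git_order = [name for name, _ in sorted(entries, key=lambda e: tree_sort_key(*e))]
plain_order = sorted(name for name, _ in entries)
```

Here `git_order` places foo.txt before the foo directory because `.` (0x2E) sorts before `/` (0x2F), while a plain name sort would put foo first.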
Nested Trees: How Paths Work
Git doesn't store paths like /src/lib/utils.py. Instead:
Root Tree
├── src (tree)
│   └── lib (tree)
│       └── utils.py (blob)
To find /src/lib/utils.py, Git:
1. Looks up src in the root tree → gets another tree
2. Looks up lib in that tree → gets another tree
3. Looks up utils.py in that tree → gets the blob
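This structure helps explain two performance traits. Switching branches is fast because any subtree that is identical in both commits has the same OID and can be skipped without being read. By contrast, git log -- path/to/file is slower, because Git must walk the trees along that path for each commit in the history to see whether the entry changed.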
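The lookup steps can be sketched with a toy in-memory object store. The `objects` dictionary, its short fake OIDs, and `resolve_path` are all hypothetical; real trees are binary objects addressed by SHA-1.

```python
# Toy object store: OID -> object.
# A tree is a dict mapping an entry name to (kind, oid); a blob is bytes.
objects = {
    "t-root": {"src": ("tree", "t-src")},
    "t-src": {"lib": ("tree", "t-lib")},
    "t-lib": {"utils.py": ("blob", "b-utils")},
    "b-utils": b"def helper(): pass\n",
}


def resolve_path(root_oid: str, path: str) -> str:
    """Walk one tree per path component and return the final OID."""
    oid = root_oid
    for part in path.strip("/").split("/"):
        tree = objects[oid]
        if not isinstance(tree, dict):
            raise NotADirectoryError(f"{part!r} is looked up inside a blob")
        if part not in tree:
            raise FileNotFoundError(path)
        _kind, oid = tree[part]
    return oid


blob_oid = resolve_path("t-root", "/src/lib/utils.py")
```

Each path component costs one tree lookup, so a path three levels deep reads three trees before reaching the blob.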
Commit: Capturing Snapshots
A commit ties everything together. It represents a complete snapshot of your project at a point in time, plus metadata.
Anatomy of a Commit
$ git cat-file -p HEAD
tree 7a8b9c0d1e2f3a4b5c6d7e8f9a0b1c2d3e4f5a6b
parent 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
author Alice <alice@example.com> 1699900000 -0700
committer Bob <bob@example.com> 1699900100 +0000

Fix critical bug in authentication
This commit resolves the login issue reported in #123.
Users can now authenticate properly.
Breaking Down the Fields
tree: Points to the root tree—the complete state of all files at this commit.
parent: Points to the previous commit(s). This creates the history chain.
- Root commit: No parent line
- Regular commit: One parent
- Merge commit: Multiple parents
author vs committer: These can differ!
- Author: Who wrote the original change
- Committer: Who added it to the repository
Example: Alice writes a patch, emails it to Bob, Bob applies it. Alice is author, Bob is committer.
Identity format: Name <email> timestamp timezone
- Timestamp: Unix epoch (seconds since 1970-01-01)
- Timezone: Offset from UTC (e.g., -0700 is 7 hours behind)
message: Everything after the blank line.
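To see what the timestamp and timezone fields encode, here is a small sketch that turns them into a timezone-aware `datetime`. `parse_git_time` is a hypothetical helper; the values are taken from the example commit above.

```python
from datetime import datetime, timedelta, timezone


def parse_git_time(timestamp: str, tz: str) -> datetime:
    """Convert '<epoch seconds>' and '+HHMM'/'-HHMM' into an aware datetime."""
    sign = -1 if tz[0] == "-" else 1
    offset = sign * timedelta(hours=int(tz[1:3]), minutes=int(tz[3:5]))
    return datetime.fromtimestamp(int(timestamp), tz=timezone(offset))


authored = parse_git_time("1699900000", "-0700")   # Alice's author line
committed = parse_git_time("1699900100", "+0000")  # Bob's committer line
```

The epoch value identifies the instant, and the offset only records the local clock of whoever made the change, so the two identities can be compared directly even though their timezones differ.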
The Commit Graph
Commits form a directed acyclic graph (DAG):
         ┌─────────┐
         │ Commit  │  ← Root (no parents)
         │ "Init"  │
         └────┬────┘
              │
         ┌────┴────┐
         │ Commit  │  ← Linear history
         │ "Add X" │
         └────┬────┘
              │
    ┌─────────┴─────────┐
    │                   │
┌───┴───┐          ┌────┴────┐
│ "Fix" │          │"Feature"│  ← Branches diverge
└───┬───┘          └────┬────┘
    │                   │
    └─────────┬─────────┘
              │
         ┌────┴────┐
         │ "Merge" │  ← Merge (two parents)
         └─────────┘
Why Commits Are Immutable
If you change anything in a commit—the message, the author, a single file—the hash changes. That new hash means a new commit.
This is why git commit --amend creates a new commit (new hash) rather than modifying the old one. The old commit still exists until garbage collected.
Real-World: How git log Works
When you run git log:
- Git reads HEAD → gets a commit hash
- Reads that commit → displays it
- Follows the parent pointer → gets another commit
- Repeat until no more parents
For git log --all, it starts from all refs (branches, tags) and traverses the entire graph.
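The traversal can be sketched over a toy commit graph. The `commits` dictionary and its short IDs are made up for this example, and the walk is a plain depth-first traversal; real git log also orders commits by date, which this sketch does not attempt.

```python
# Toy commit graph: OID -> (message, parent OIDs)
commits = {
    "c4": ("Merge", ["c2", "c3"]),
    "c3": ("Feature", ["c1"]),
    "c2": ("Fix", ["c1"]),
    "c1": ("Add X", ["c0"]),
    "c0": ("Init", []),
}


def walk_history(start: str):
    """Yield (oid, message) for every commit reachable from start, once each."""
    seen: set[str] = set()
    stack = [start]
    while stack:
        oid = stack.pop()
        if oid in seen:  # merges make some commits reachable twice
            continue
        seen.add(oid)
        message, parents = commits[oid]
        yield oid, message
        stack.extend(reversed(parents))  # follow the first parent first


visited = [oid for oid, _ in walk_history("c4")]
```

The `seen` set matters as soon as history contains a merge: without it, the shared ancestors c1 and c0 would be reported once per branch.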
Tag: Named References with Metadata
There are two kinds of tags:
- Lightweight tags: Just a name → commit mapping (like a branch that doesn't move)
- Annotated tags: Actual objects with metadata
Annotated Tag Structure
$ git cat-file -p v1.0.0
object 1a2b3c4d5e6f7a8b9c0d1e2f3a4b5c6d7e8f9a0b
type commit
tag v1.0.0
tagger Alice <alice@example.com> 1699900000 -0700

Version 1.0.0 - First stable release
This release includes:
- Feature A
- Feature B
- Bug fixes
When to Use Each?
Lightweight tags are quick bookmarks:
$ git tag temp-marker # Just creates refs/tags/temp-marker → SHA
Annotated tags are for releases:
$ git tag -a v1.0.0 -m "Version 1.0.0" # Creates a tag object
Annotated tags record who tagged when and why—important for releases.
Tags Can Point to Anything
Though tags usually point to commits, they can point to any object:
# Tag a specific file version (blob)
$ git tag important-config abc123
# Tag a directory state (tree)
$ git tag baseline-structure def456
How Git Commands Use Objects
Now let's see how common Git operations translate to object operations.
git add
- Reads file content
- Computes blob hash
- If blob doesn't exist, creates it in .git/objects/
- Updates the index (staging area) with the file → blob mapping
$ echo "new content" > file.txt
$ git add file.txt
# Creates: .git/objects/ab/cd1234... (the blob)
# Updates: .git/index (file.txt → abcd1234)
git commit
- Writes a tree object from the current index
- Creates a commit object pointing to that tree
- Updates HEAD (and current branch) to the new commit
$ git commit -m "Add file"
# Creates: .git/objects/12/34abcd... (tree from index)
# Creates: .git/objects/56/78ef90... (commit)
# Updates: .git/refs/heads/main → 5678ef90
git checkout
- Reads the commit's tree
- Recursively reads all trees and blobs
- Writes blob contents to working directory
- Updates index to match
$ git checkout feature-branch
# Reads: commit → tree → blobs
# Writes: all files to working directory
git diff
- For staged changes (git diff --cached): compares the index to the tree of HEAD
- For unstaged: compares working directory to index
- For commits: compares their trees recursively
Git doesn't store diffs—it computes them on-demand by comparing blob contents.
git merge
- Finds the common ancestor (merge base)
- Computes three-way diff (base, ours, theirs)
- Creates new blobs for merged files
- Creates a new tree
- Creates a merge commit with two parents
$ git merge feature
# If successful:
# Creates: new blobs (merged content)
# Creates: new tree
# Creates: commit with parents [HEAD, feature]
git clone
- Downloads all objects (usually as a packfile)
- Stores the received packfile under .git/objects/pack/ and builds an index for it
- Creates refs (branches, tags) pointing to objects
- Checks out HEAD
Object Storage on Disk
Loose Objects
Individual objects are stored "loose" as zlib-compressed files:
.git/objects/ce/013625030ba8dba906f756967f9e9ca394464a
^^
First 2 chars of hash (directory name)
The file contains: zlib_compress(header + content)
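A round trip through that format might look like the sketch below. `encode_loose` and `decode_loose` are illustrative names, not gitpy functions, and the sketch works on bytes in memory rather than touching .git/objects/.

```python
import hashlib
import zlib


def encode_loose(type_name: str, content: bytes) -> tuple[str, bytes]:
    """Return (oid, compressed bytes) as they would be written to disk."""
    raw = f"{type_name} {len(content)}".encode("ascii") + b"\0" + content
    # The OID is the hash of the uncompressed header + content.
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def decode_loose(compressed: bytes) -> tuple[str, bytes]:
    """Inverse of encode_loose: return (type name, content)."""
    raw = zlib.decompress(compressed)
    header, _, content = raw.partition(b"\0")
    type_name, size = header.decode("ascii").split(" ")
    if int(size) != len(content):
        raise ValueError("size in header does not match content length")
    return type_name, content


oid, payload = encode_loose("blob", b"hello\n")
restored = decode_loose(payload)
```

Because the hash covers the uncompressed bytes, the OID does not depend on the zlib compression level used when writing the file.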
Packfiles
For efficiency, Git periodically packs loose objects:
.git/objects/pack/
├── pack-abc123.idx # Index (hash → offset mapping)
└── pack-abc123.pack # Packed objects (delta-compressed)
Packfiles use delta compression—storing differences between similar objects. This is how Git achieves small repository sizes despite storing full snapshots.
When Packing Happens
- git gc (garbage collection)
- git push (sends packfile)
- git clone (receives packfile)
- Automatically when loose objects exceed a threshold
The Object Graph: Putting It Together
Here's how all objects connect in a real repository:
                    ┌───────────┐
                    │    Tag    │
                    │  "v1.0"   │
                    └─────┬─────┘
                          │ points to
                          ▼
                    ┌───────────┐
                    │  Commit   │
      ┌─────────────│ "Release" │
      │ parent      └─────┬─────┘
      │                   │ tree
      ▼                   ▼
┌───────────┐       ┌───────────┐
│  Commit   │       │   Tree    │
│ "Add docs"│       │  (root)   │
└───────────┘       └─────┬─────┘
                          │
           ┌──────────────┼──────────────┐
           │              │              │
           ▼              ▼              ▼
      ┌──────────┐   ┌──────────┐   ┌──────────┐
      │   Blob   │   │   Blob   │   │   Tree   │
      │  README  │   │ main.py  │   │   src/   │
      └──────────┘   └──────────┘   └────┬─────┘
                                         │
                                         ▼
                                   ┌──────────┐
                                   │   Blob   │
                                   │ utils.py │
                                   └──────────┘
Key observations:
- Tags → Commits: Named release points
- Commits → Commits: History chain via parents
- Commits → Trees: Snapshots of directory state
- Trees → Trees/Blobs: Directory structure
- Blobs: Leaf nodes containing file data
Key Insights
Snapshots, Not Diffs
Unlike older VCS (CVS, SVN), Git stores complete snapshots. Each commit has a tree representing all files, not just changes.
Why? It makes checkouts fast (don't need to replay diffs) and enables efficient branching (branches are just pointers).
Everything is Reachable from Commits
If you have a commit hash, you can reach every file in that version of the project. This is why commit hashes are so important—they're the entry point to complete snapshots.
Unreachable Objects Get Garbage Collected
Objects that are no longer reachable from any ref (branch, tag) or reflog entry are eventually deleted by git gc, after a grace period. This is how Git reclaims space from amended commits, rebases, etc.
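Reachability is a plain graph traversal, which a mark-and-sweep sketch makes visible. The `edges` and `refs` dictionaries use invented IDs and model an amended commit; reflogs and grace periods are deliberately left out.

```python
# Toy object graph: OID -> OIDs it references
edges = {
    "commit-new": ["tree-new", "commit-base"],
    "commit-old": ["tree-old", "commit-base"],  # replaced by --amend
    "commit-base": ["tree-base"],
    "tree-new": ["blob-a", "blob-b2"],
    "tree-old": ["blob-a", "blob-b1"],
    "tree-base": ["blob-a"],
    "blob-a": [],
    "blob-b1": [],
    "blob-b2": [],
}
refs = {"refs/heads/main": "commit-new"}


def reachable(roots) -> set[str]:
    """Mark phase: collect everything reachable from the given roots."""
    seen: set[str] = set()
    stack = list(roots)
    while stack:
        oid = stack.pop()
        if oid not in seen:
            seen.add(oid)
            stack.extend(edges[oid])
    return seen


live = reachable(refs.values())
garbage = set(edges) - live  # sweep phase: whatever was never marked
```

The amended commit, its tree, and the one blob only it referenced end up in `garbage`, while blob-a survives because the new tree still points to it.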
Immutability Enables Sharing
Because objects never change, Git can share them freely:
- Between branches (same blob in multiple trees)
- Between commits (unchanged files keep same blob)
- Between repositories (same content = same hash)
Implementation in gitpy
Now let's see how these concepts map to actual code in the gitpy codebase.
Code Organization
The object model lives in gitpy/objects/:
gitpy/objects/
├── __init__.py # Factory functions and exports
├── base.py # GitObject base class
├── blob.py # Blob implementation
├── tree.py # Tree and TreeEntry
├── commit.py # Commit and Identity
└── tag.py # Tag implementation
Each Git concept gets its own module, making the code easy to navigate.
The Base Class: GitObject
All four object types share common behavior. The base class in base.py:
import hashlib
from abc import ABC
from typing import Self

class GitObject(ABC):
    type_name: str  # "blob", "tree", "commit", or "tag"

    def serialize(self) -> bytes:
        """Convert to bytes (content only, no header)."""
        ...

    @classmethod
    def deserialize(cls, data: bytes) -> Self:
        """Create from bytes (content only, no header)."""
        ...

    def compute_hash(self) -> str:
        """SHA-1 of header + content."""
        content = self.serialize()
        header = f"{self.type_name} {len(content)}\0".encode()
        return hashlib.sha1(header + content, usedforsecurity=False).hexdigest()

    @property
    def oid(self) -> str:
        """The 40-character object ID."""
        return self.compute_hash()
The key insight: serialize() returns just the content, while compute_hash() adds the header (type size\0) before hashing.
Blob Implementation
A blob is the simplest object—just a wrapper around bytes:
class Blob(GitObject):
    type_name: str = "blob"
    data: bytes = b""

    def serialize(self) -> bytes:
        return self.data

    @classmethod
    def deserialize(cls, data: bytes) -> Self:
        return cls(data=data)

    @classmethod
    def from_file(cls, path: str | Path) -> Self:
        # Read in binary mode: blobs are raw bytes, not text
        with open(path, "rb") as f:
            return cls(data=f.read())
Tree Implementation
Trees require more logic for binary serialization and sorting:
class TreeEntry:
    mode: str  # "100644", "100755", "40000"
    name: str  # Filename
    sha: str   # 40-char hex hash

    @property
    def is_tree(self) -> bool:
        return self.mode == "40000"

    def sort_key(self) -> str:
        # Directories sort as if they had trailing "/"
        return self.name + "/" if self.is_tree else self.name

class Tree(GitObject):
    type_name: str = "tree"
    entries: list[TreeEntry]

    def serialize(self) -> bytes:
        # Sort entries by Git's rules
        sorted_entries = sorted(self.entries, key=lambda e: e.sort_key())
        result = b""
        for entry in sorted_entries:
            # Mode and name as ASCII, null separator
            mode_name = f"{entry.mode} {entry.name}\0".encode()
            # SHA as 20 binary bytes (not 40 hex chars!)
            sha_binary = bytes.fromhex(entry.sha)
            result += mode_name + sha_binary
        return result

    @classmethod
    def deserialize(cls, data: bytes) -> Self:
        entries = []
        pos = 0
        while pos < len(data):
            # Find space after mode
            space_idx = data.index(b" ", pos)
            mode = data[pos:space_idx].decode("ascii")
            # Find null after name
            null_idx = data.index(b"\0", space_idx)
            name = data[space_idx + 1:null_idx].decode("utf-8")
            # Next 20 bytes are binary SHA
            sha_binary = data[null_idx + 1:null_idx + 21]
            sha = sha_binary.hex()
            entries.append(TreeEntry(mode=mode, name=name, sha=sha))
            pos = null_idx + 21
        return cls(entries=entries)
Commit Implementation
Commits parse and generate the header-based text format:
class Identity:
    name: str
    email: str
    timestamp: int
    tz_offset: str

    def __str__(self) -> str:
        return f"{self.name} <{self.email}> {self.timestamp} {self.tz_offset}"

    @classmethod
    def parse(cls, line: str) -> Self:
        # Parse "Name <email> timestamp tz"
        lt = line.index("<")
        gt = line.index(">")
        name = line[:lt].strip()
        email = line[lt + 1:gt]
        rest = line[gt + 1:].strip().split()
        return cls(name=name, email=email,
                   timestamp=int(rest[0]), tz_offset=rest[1])

class Commit(GitObject):
    type_name: str = "commit"
    tree_sha: str
    parent_shas: list[str]
    author: Identity
    committer: Identity
    message: str

    def serialize(self) -> bytes:
        lines = [f"tree {self.tree_sha}"]
        for parent in self.parent_shas:
            lines.append(f"parent {parent}")
        lines.append(f"author {self.author}")
        lines.append(f"committer {self.committer}")
        lines.append("")  # Blank line before message
        lines.append(self.message)
        return "\n".join(lines).encode("utf-8")

    @classmethod
    def deserialize(cls, data: bytes) -> Self:
        text = data.decode("utf-8")
        lines = text.split("\n")
        tree_sha, parent_shas, author, committer = "", [], None, None
        message = ""  # Stays empty if there is no blank separator line
        for i, line in enumerate(lines):
            if line == "":
                # First blank line ends the headers; the rest is the message
                message = "\n".join(lines[i + 1:])
                break
            if line.startswith("tree "):
                tree_sha = line[5:]
            elif line.startswith("parent "):
                parent_shas.append(line[7:])
            elif line.startswith("author "):
                author = Identity.parse(line[7:])
            elif line.startswith("committer "):
                committer = Identity.parse(line[10:])
        return cls(tree_sha=tree_sha, parent_shas=parent_shas,
                   author=author, committer=committer, message=message)

    @property
    def is_root(self) -> bool:
        return len(self.parent_shas) == 0

    @property
    def is_merge(self) -> bool:
        return len(self.parent_shas) > 1
Tag Implementation
Tags follow the same pattern as commits:
class Tag(GitObject):
    type_name: str = "tag"
    object_sha: str
    object_type: str  # "commit", "tree", "blob", or "tag"
    tag_name: str
    tagger: Identity
    message: str

    def serialize(self) -> bytes:
        lines = [
            f"object {self.object_sha}",
            f"type {self.object_type}",
            f"tag {self.tag_name}",
            f"tagger {self.tagger}",
            "",
            self.message,
        ]
        return "\n".join(lines).encode("utf-8")
Factory Functions
The __init__.py provides functions to handle the header:
import hashlib

OBJECT_TYPES = {
    "blob": Blob,
    "tree": Tree,
    "commit": Commit,
    "tag": Tag,
}

def parse_object(data: bytes) -> tuple[str, GitObject]:
    """Parse complete object data (with header)."""
    # Find null byte separating header from content
    null_idx = data.index(b"\0")
    header = data[:null_idx].decode("ascii")
    content = data[null_idx + 1:]
    # Parse "type size" header
    type_name, size_str = header.split(" ")
    if len(content) != int(size_str):
        raise ValueError("Size mismatch")
    # Create the appropriate object
    obj_class = OBJECT_TYPES[type_name]
    obj = obj_class.deserialize(content)
    # Compute hash of full data
    sha = hashlib.sha1(data, usedforsecurity=False).hexdigest()
    return sha, obj

def create_object_data(obj: GitObject) -> bytes:
    """Create complete object data (with header)."""
    content = obj.serialize()
    header = f"{obj.type_name} {len(content)}\0".encode()
    return header + content
Design Decisions
Immutability: Objects don't have setter methods. To "change" a blob, you create a new one. This mirrors Git's actual behavior.
Separation of concerns: Objects know how to serialize themselves but not how to be stored. Storage (compression, disk I/O) is a separate layer.
Validation at boundaries: Tree.add_entry() rejects names with /. Invalid data is caught early, not during serialization.
Hash on demand: The oid property computes the hash each time. For performance-critical code, you'd cache this, but for clarity we keep it simple.
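If hashing ever became a bottleneck, one option would be `functools.cached_property`. The sketch below is not gitpy code: `CachedBlob` and its `hash_calls` counter are hypothetical and exist only to show that the hash is computed once. The approach is safe only because the data is never mutated after construction.

```python
import hashlib
from functools import cached_property


class CachedBlob:
    """Hypothetical immutable blob that hashes itself at most once."""

    def __init__(self, data: bytes) -> None:
        self._data = bytes(data)  # private copy, never modified afterwards
        self.hash_calls = 0       # instrumentation for this demo only

    @property
    def data(self) -> bytes:
        return self._data

    @cached_property
    def oid(self) -> str:
        # Runs on first access; the result is then stored on the instance.
        self.hash_calls += 1
        header = f"blob {len(self._data)}".encode("ascii") + b"\0"
        return hashlib.sha1(header + self._data).hexdigest()


blob = CachedBlob(b"hello\n")
first, second = blob.oid, blob.oid
```

The trade-off is one extra attribute per object in exchange for skipping repeated SHA-1 computations on large blobs.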
Testing Strategy
Tests live in tests/objects/ and verify:
def test_blob_hash_empty():
    """Empty blob must match Git's hash."""
    blob = Blob(data=b"")
    assert blob.oid == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

def test_blob_hash_hello():
    """'hello\\n' must match Git's hash."""
    blob = Blob(data=b"hello\n")
    assert blob.oid == "ce013625030ba8dba906f756967f9e9ca394464a"

def test_tree_empty():
    """Empty tree must match Git's hash."""
    tree = Tree(entries=[])
    assert tree.oid == "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def test_roundtrip():
    """Deserialize(serialize(obj)) must preserve all data."""
    original = Blob(data=b"test content")
    restored = Blob.deserialize(original.serialize())
    assert original.data == restored.data
The known hash tests are critical—they prove we're Git-compatible.
What's Next?
The object model is the foundation. The next layers build on top:
- Object Storage ✅ Implemented: How objects are compressed and stored on disk
  - See gitpy/storage/ for LooseObjectStore, ObjectDatabase
  - See gitpy/repository.py for Repository class
- Pack Files: Efficient storage combining many objects
- Delta Compression: How Git stores only differences
- Pack Index: Fast object lookup in packs
- References: How branches, tags, and HEAD point to objects
- Index: The staging area between working directory and repository
- Commands: The porcelain commands that orchestrate everything
Every Git operation ultimately reduces to creating, reading, or referencing these four simple object types. Understanding this model unlocks a deep understanding of how Git really works.
Implementation Status
| Component | Status | Module |
|---|---|---|
| Object Model | ✅ Complete | gitpy/objects/ |
| Object Storage | ✅ Complete | gitpy/storage/, gitpy/repository.py |
| Pack Files | 🔲 Phase 2b | gitpy/storage/pack*.py |
| References | 🔲 Planned | gitpy/refs/ |
| Index | 🔲 Planned | gitpy/index/ |
| Commands | 🔲 Planned | gitpy/commands/ |