Version Control System

Git and co.

German version (original)

The Origin of Git

Until 2005, the Linux kernel project, one of the largest open-source projects in the world, used a proprietary, distributed version control system (VCS) called BitKeeper. However, the kernel team's license to use it free of charge was revoked. This created an acute problem: the project needed a new VCS that could meet its extreme requirements.

Since no existing solution met these criteria, Linus Torvalds, the initiator of Linux, took matters into his own hands.

Within a few weeks, Linus Torvalds developed the core of Git. His goal was not to create a user-friendly system, but an extremely fast and robust foundation. The first version was minimalistic, consisting of simple command-line tools that already implemented the core principles of Git.

Linus Torvalds' main interest remained the Linux kernel. After laying the foundation for Git, he handed over the project in July 2005 to Junio C Hamano, one of the earliest and most important contributors.

Under Hamano's leadership, Git became what we know today.

Git's real breakthrough with the general public came with the rise of code-hosting platforms, also known as "forges."

These platforms extend pure version control with crucial collaboration features.

More than just Git

Both before and after Git, there have been many other VCSs.

Changes

Many believe that Git only stores the changes from one commit to the next. Almost no VCS does this because it is inefficient.

Most VCSs store snapshots. A snapshot is a list of all files contained in a commit. In Git, a file has a name (path), an executable flag, and content. Every commit therefore records all files, not just those that have changed. It also does not matter how much a file has changed: its content is saved completely anew.

However, there is an important optimization, because the same file content very often appears more than once. If a file does not change in a commit, its content does not need to be saved a second time. Likewise, if two files have the same content, that content only needs to be stored once.

This can be compared to pnpm, which keeps npm packages in a central store and references them through links in the node_modules directory instead of copying them, in part to save disk space.
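The snapshot model and its deduplication can be sketched in a few lines of Python. This is a minimal illustration, not Git's actual object format: the `Store` class, its method names, and the choice of SHA-256 as the content key are all assumptions made for the example.

```python
import hashlib


class Store:
    """Toy content-addressed store: each distinct content is kept once."""

    def __init__(self):
        self.blobs = {}    # content hash -> content bytes
        self.commits = []  # one snapshot per commit: {path: (executable, hash)}

    def commit(self, files):
        """Record a full snapshot; files maps path -> (executable, content)."""
        snapshot = {}
        for path, (executable, content) in files.items():
            key = hashlib.sha256(content).hexdigest()  # assumed hash choice
            self.blobs.setdefault(key, content)        # stored only if new
            snapshot[path] = (executable, key)
        self.commits.append(snapshot)
        return len(self.commits) - 1

    def read(self, commit_id, path):
        """Return (executable, content) of a file as of the given commit."""
        executable, key = self.commits[commit_id][path]
        return executable, self.blobs[key]


store = Store()
first = store.commit({
    "README": (False, b"hello\n"),
    "COPY": (False, b"hello\n"),      # same content as README
    "run.sh": (True, b"echo hi\n"),
})
second = store.commit({
    "README": (False, b"hello\n"),    # unchanged
    "COPY": (False, b"hello\n"),      # unchanged
    "run.sh": (True, b"echo hello\n"),
})
```

Both snapshots list all three files, yet the store holds only three distinct contents: the shared `hello` text, and the two versions of `run.sh`.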

[Diagram: a COMMIT lists FILE entries, and each FILE points to its CONTENT]

Merge

When two commits are merged, the VCS must perform a three-way merge. In this process, the commit history is treated as a DAG (directed acyclic graph). In a DAG, it is easy to find the LCA (lowest common ancestor). This is the commit that is an ancestor of both commits to be merged and lies deepest in the DAG, meaning it is furthest from the initial commit.

This sounds complicated. However, represented graphically, it looks quite simple:

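[Diagram: commit graph with the branches main and feature-1, showing the initial commit, the LCA, the commits A, B, and C, and the merge commit B+C]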

When commits B and C are to be merged, a common base is needed against which the changes from both commits can be compared. This common base is the most recent commit that is an ancestor of both B and C (the LCA).
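As a rough sketch of the idea, the following Python finds the common base in a small history and then uses it for a three-way merge. The graph shape, the commit names, and the file contents are hypothetical. The merge here decides per whole file, which is a simplification: real VCSs typically also merge changes within a file.

```python
# Hypothetical history: each commit maps to the list of its parents.
parents = {
    "initial": [],
    "base": ["initial"],
    "B": ["base"],  # on main
    "C": ["base"],  # on feature-1
}


def ancestors(commit):
    """Return the commit itself plus every commit reachable via parents."""
    seen, stack = set(), [commit]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def merge_bases(left, right):
    """Common ancestors that are not an ancestor of another common ancestor."""
    common = ancestors(left) & ancestors(right)
    return {c for c in common
            if not any(c != other and c in ancestors(other)
                       for other in common)}


def merge_file(base, ours, theirs):
    """Three-way decision for one file's content (None means absent)."""
    if ours == theirs:
        return ours    # both sides agree
    if ours == base:
        return theirs  # only their side changed
    if theirs == base:
        return ours    # only our side changed
    raise ValueError("conflict: both sides changed the file differently")


def merge_snapshots(base, ours, theirs):
    """Merge two snapshots ({path: content}) against their common base."""
    merged = {}
    for path in sorted(set(base) | set(ours) | set(theirs)):
        content = merge_file(base.get(path), ours.get(path), theirs.get(path))
        if content is not None:
            merged[path] = content
    return merged


# Hypothetical snapshots for the three commits involved in the merge.
snapshots = {
    "base": {"a.txt": "1", "b.txt": "1"},
    "B": {"a.txt": "2", "b.txt": "1"},
    "C": {"a.txt": "1", "b.txt": "3", "c.txt": "new"},
}

(lca,) = merge_bases("B", "C")
merged = merge_snapshots(snapshots[lca], snapshots["B"], snapshots["C"])
```

Without the base, the merge could not tell which side changed `a.txt` and which side left it alone; with the base, each change is attributed to exactly one side and both are kept.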