- Git File format
- ===============
- For a better understanding of Dulwich, we'll start by explaining most of
- Git's secrets.
- Open the ".git" folder of any Git-managed repository. You'll find folders
- like "branches", "hooks"... We're only interested in "objects" here. Open it.
- You'll mostly see folders named with two hex digits. Git identifies content
- by its SHA-1 digest. The 2 hex digits of the folder name plus the 38 hex
- digits of the file names inside these folders form the 40-character (or
- 20-byte) id of the Git objects you'll manage in Dulwich.
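The split between folder name and file name can be sketched in a few lines of Python. This is only an illustration of the layout described above, not part of Dulwich; the helper name ``loose_object_path`` and the example id are assumptions made for the sketch.

```python
import os


def loose_object_path(git_dir: str, hex_sha: str) -> str:
    """Return the path where a loose object with this id would be stored.

    Hypothetical helper: the first 2 hex digits name the folder, the
    remaining 38 name the file inside it.
    """
    if len(hex_sha) != 40:
        raise ValueError("expected a 40-character hex SHA-1")
    # bytes.fromhex raises ValueError on non-hex input; 40 hex digits
    # decode to the 20-byte binary form of the id.
    if len(bytes.fromhex(hex_sha)) != 20:
        raise ValueError("expected exactly 40 hex digits")
    return os.path.join(git_dir, "objects", hex_sha[:2], hex_sha[2:])


# Example id chosen for illustration only.
example_sha = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"
example_path = loose_object_path(".git", example_sha)
print(example_path)
```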
- We'll first study the three main objects:
- - The Commit;
- - The Tree;
- - The Blob.
- The Commit
- ----------
- You're used to generating commits using Git. You have set up your name and
- e-mail, and you know how to see the history using ``git log``.
- A commit file looks like this::
- commit <content length><NUL>tree <tree sha>
- parent <parent sha>
- [parent <parent sha> if several parents from merges]
- author <author name> <author e-mail> <timestamp> <timezone>
- committer <committer name> <committer e-mail> <timestamp> <timezone>
-
- <commit message>
- But where are the changes you committed? The commit contains a reference to a
- tree.
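As a rough sketch of the commit layout shown above, the following Python assembles the text of a commit and hashes it together with its ``commit <content length><NUL>`` header. The function names, the identity string and the tree id are assumptions chosen for illustration; real commits may carry extra headers that this sketch ignores.

```python
import hashlib

# Example tree id used only as a placeholder value.
EXAMPLE_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def build_commit_body(tree_sha, parent_shas, author, committer, message):
    """Assemble the commit content (everything after the NUL byte)."""
    lines = ["tree " + tree_sha]
    # One "parent" line per parent; a merge has several, a root commit none.
    lines += ["parent " + parent for parent in parent_shas]
    lines.append("author " + author)
    lines.append("committer " + committer)
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


def commit_id(body: bytes) -> str:
    """Hash the header plus content, giving the 40-character object id."""
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# Hypothetical identity: name, e-mail, Unix timestamp, timezone offset.
identity = "Jane Doe <jane@example.com> 1700000000 +0000"
body = build_commit_body(EXAMPLE_TREE, [], identity, identity, "Initial commit\n")
print(body.decode("utf-8"))
print(commit_id(body))
```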
- The Tree
- --------
- A tree is a collection of file information, the state of a single directory at
- a given point in time.
- A tree file looks like this::
- tree <content length><NUL><file mode> <filename><NUL><item sha>...
- The ``<file mode> <filename><NUL><item sha>`` entry repeats for every file in
- the tree.
- Note that the SHA-1 digest is in binary form here.
- The file mode is like the octal argument you could give to the ``chmod``
- command, except that it is in extended form to tell regular files from
- directories and other types.
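To make the entry layout concrete, here is a minimal parser sketch for the content of a tree, that is, the bytes after the ``tree <content length><NUL>`` header. The function name ``parse_tree_body`` and the sample entries are assumptions for illustration; this is not how Dulwich itself is implemented.

```python
import stat


def parse_tree_body(body: bytes):
    """Split tree content into (mode, name, hex sha) tuples."""
    entries = []
    pos = 0
    while pos < len(body):
        space = body.index(b" ", pos)       # end of the octal file mode
        nul = body.index(b"\x00", space)    # end of the file name
        mode = int(body[pos:space], 8)
        name = body[space + 1:nul]
        raw_sha = body[nul + 1:nul + 21]    # 20 bytes of binary SHA-1
        if len(raw_sha) != 20:
            raise ValueError("truncated tree entry")
        entries.append((mode, name, raw_sha.hex()))
        pos = nul + 21
    return entries


# Two made-up entries: a regular file and a sub-directory.
sample_body = (
    b"100644 hello.txt\x00"
    + bytes.fromhex("3b18e512dba79e4c8300dd08aeb37f8e728b8dad")
    + b"40000 docs\x00"
    + bytes.fromhex("4b825dc642cb6eb9a060e54bf8d69288fbee4904")
)

for mode, name, sha in parse_tree_body(sample_body):
    kind = "directory" if stat.S_ISDIR(mode) else "file"
    print(oct(mode), kind, name.decode("utf-8"), sha)
```

The extended mode is what lets ``stat.S_ISDIR`` distinguish the sub-directory entry from the regular file.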
- We now know how our files are referenced but we haven't found their actual
- content yet. That's where the reference to a blob comes in.
- The Blob
- --------
- A blob is simply the content of the files you are versioning.
- A blob file looks like this::
- blob <content length><NUL><content>
- If you change a single line, Git generates another blob when the change is
- staged and committed. This is how Git can quickly check out any version in
- time.
- Conversely, several identical files with different filenames generate only
- one blob. That's mostly why renames are so cheap and efficient in Git.
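The content-addressing behaviour described here can be checked with a short sketch that hashes the ``blob <content length><NUL><content>`` layout using Python's ``hashlib``. The helper name ``blob_id`` is an assumption made for this example.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Return the SHA-1 id Git would assign to a blob with this content."""
    store = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
    return hashlib.sha1(store).hexdigest()


original = b"hello world\n"
edited = b"hello world!\n"

# Identical content gives the same id, whatever the file is called.
print(blob_id(original) == blob_id(b"hello world\n"))
# Changing the content, even slightly, gives a different blob.
print(blob_id(original) == blob_id(edited))
```

Because the file name is stored in the tree and not in the blob, two files with the same content share one blob.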
- Dulwich Objects
- ---------------
- Dulwich implements these three objects with an API to easily access the
- information you need, while abstracting some more secrets Git is using to
- accelerate operations and reduce space.
- More About Git formats
- ----------------------
- These three objects make up most of the contents of a Git repository and are
- used for the history. They can either appear as simple files on disk (one file
- per object) or in a ``pack`` file, which is a container for a number of these
- objects.
- There is also an index of the current state of the working copy in the
- repository as well as files to track the existing branches and tags.
- For a more detailed explanation of object formats and SHA-1 digests, see:
- http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html
- Just note that recent versions of Git compress object files using zlib, so
- the formats shown above describe the content after decompression.
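The zlib step can be sketched with Python's standard ``zlib`` module. The example works entirely in memory and assumes the ``<type> <content length><NUL><content>`` layout described earlier; the function names are hypothetical and the sketch does not cover ``pack`` files.

```python
import zlib


def encode_loose_object(type_name: bytes, body: bytes) -> bytes:
    """Build the header, prepend it to the content and compress the result."""
    header = type_name + b" " + str(len(body)).encode("ascii") + b"\x00"
    return zlib.compress(header + body)


def decode_loose_object(data: bytes):
    """Decompress object bytes and split them into (type, content)."""
    raw = zlib.decompress(data)
    header, _, body = raw.partition(b"\x00")
    type_name, _, size = header.partition(b" ")
    if int(size) != len(body):
        raise ValueError("content length does not match header")
    return type_name, body


compressed = encode_loose_object(b"blob", b"hello world\n")
print(decode_loose_object(compressed))
```

To read a real object, the bytes of a file under ``.git/objects`` could be passed to ``decode_loose_object`` in place of ``compressed``.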