Git File format
===============

For a better understanding of Dulwich, we'll start by explaining most of the
Git secrets.

Open the ".git" folder of any Git-managed repository. You'll find folders
like "branches", "hooks"... We're only interested in "objects" here. Open it.

You'll mostly see folders named with two hex digits. Git identifies content by
its SHA-1 digest. The two hex digits of a folder name plus the 38 hex digits
of the name of a file inside that folder form the 40-character (20-byte) id of
the Git objects you'll manage in Dulwich.
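
As a rough illustration of this layout, the sketch below splits a
40-character hex id into the folder and file name under "objects". The helper
name ``loose_object_path`` is made up for this example and is not part of
Dulwich::

    import posixpath
    import string


    def loose_object_path(hex_sha):
        """Return the path of a loose object, relative to the ".git" folder."""
        if len(hex_sha) != 40 or not all(c in string.hexdigits for c in hex_sha):
            raise ValueError("expected 40 hex digits: %r" % (hex_sha,))
        # First two hex digits name the folder, the other 38 name the file.
        return posixpath.join("objects", hex_sha[:2], hex_sha[2:])


    print(loose_object_path("e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"))
    # objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391

Objects stored in a pack file do not follow this layout, so a missing path
does not necessarily mean the object is absent from the repository.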

We'll first study the three main objects:

- The Commit;
- The Tree;
- The Blob.

The Commit
----------

You're used to generating commits using Git. You have set up your name and
e-mail, and you know how to see the history using ``git log``.

A commit file looks like this::

    commit <content length><NUL>tree <tree sha>
    parent <parent sha>
    [parent <parent sha> if several parents from merges]
    author <author name> <author e-mail> <timestamp> <timezone>
    committer <committer name> <committer e-mail> <timestamp> <timezone>

    <commit message>

But where are the changes you committed? The commit contains a reference to a
tree.
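
To make the commit layout concrete, here is a minimal sketch that assembles a
commit object using only the Python standard library. The function name
``build_commit`` and the sample identity are invented for the example, and
the identity strings are assumed to be already formatted as
``name <e-mail> timestamp timezone``. The blank line before the message
follows Git's on-disk format::

    import hashlib


    def build_commit(tree_sha, parent_shas, author, committer, message):
        """Serialize a commit and return (object id, raw object bytes)."""
        lines = ["tree " + tree_sha]
        lines.extend("parent " + sha for sha in parent_shas)  # one per parent
        lines.append("author " + author)
        lines.append("committer " + committer)
        lines.append("")  # blank line between the headers and the message
        lines.append(message)
        content = "\n".join(lines).encode("utf-8")
        # Header: object type, content length in bytes (decimal), then NUL.
        raw = b"commit " + str(len(content)).encode("ascii") + b"\x00" + content
        return hashlib.sha1(raw).hexdigest(), raw


    who = "Jane Doe <jane@example.com> 1700000000 +0000"  # made-up identity
    empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # id of the empty tree
    sha, raw = build_commit(empty_tree, [], who, who, "Initial commit\n")

A root commit has no ``parent`` line at all, while a merge commit has one
line per parent, which is why the parents are passed as a list.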

The Tree
--------

A tree is a collection of file information, the state of a single directory at
a given point in time.

A tree file looks like this::

    tree <content length><NUL><file mode> <filename><NUL><item sha>...

The ``<file mode> <filename><NUL><item sha>`` part repeats for every file in
the tree.

Note that the SHA-1 digest is in binary form here.

The file mode is like the octal argument you could give to the ``chmod``
command, except that it is in extended form to tell regular files from
directories and other types.
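
The entry layout can be walked with a few lines of Python. This is only a
sketch: ``parse_tree_entries`` is a hypothetical helper, it expects the
content that follows the ``tree <content length><NUL>`` header, and the
sample entries use a made-up 20-byte digest::

    def parse_tree_entries(content):
        """Yield (mode, name, hex sha) for each entry of a tree's content."""
        pos = 0
        while pos < len(content):
            space = content.index(b" ", pos)    # end of the file mode
            nul = content.index(b"\x00", space)  # end of the filename
            mode = int(content[pos:space], 8)   # the mode is octal text
            name = content[space + 1:nul]
            sha = content[nul + 1:nul + 21]     # 20 raw bytes, not hex
            if len(sha) != 20:
                raise ValueError("truncated tree entry")
            yield mode, name, sha.hex()
            pos = nul + 21


    fake_sha = bytes(range(20))  # placeholder digest for the example
    content = b"100644 README\x00" + fake_sha + b"40000 docs\x00" + fake_sha
    entries = list(parse_tree_entries(content))

Here ``100644`` marks a regular file and ``40000`` a directory (that is,
another tree), which is the extended form of the mode mentioned above.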

We now know how our files are referenced but we haven't found their actual
content yet. That's where the reference to a blob comes in.

The Blob
--------

A blob is simply the content of files you are versioning.

A blob file looks like this::

    blob <content length><NUL><content>
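
Since the id of an object is the SHA-1 digest of exactly these bytes, a blob
id can be recomputed with the standard library alone. A minimal sketch, with
``blob_id`` being a name chosen for this example::

    import hashlib


    def blob_id(content):
        """Return the 40-character hex id Git gives to a blob."""
        # Header: object type, content length in bytes (decimal), then NUL.
        raw = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
        return hashlib.sha1(raw).hexdigest()


    print(blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

The printed value is the well-known id of the empty blob, which should match
what ``git hash-object`` reports for an empty file.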

If you change a single line, Git generates another blob each time you
successfully run ``git add``. This is how Git can quickly check out any
version in time.

Conversely, several identical files with different filenames generate
only one blob. That's mostly how renames are so cheap and efficient in Git.
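
The effect can be imitated with a toy content-addressed store. Everything
here is illustrative: ``store``, ``add_file`` and the file names are made up,
and a plain dictionary stands in for the "objects" folder::

    import hashlib


    def blob_id(content):
        """Return the hex id of a blob holding `content`."""
        raw = b"blob " + str(len(content)).encode("ascii") + b"\x00" + content
        return hashlib.sha1(raw).hexdigest()


    store = {}  # blob id -> content


    def add_file(content):
        """Store the content under its id and return that id."""
        sha = blob_id(content)
        store[sha] = content  # identical content lands on the same key
        return sha


    files = {
        "README": add_file(b"hello\n"),
        "README.txt": add_file(b"hello\n"),  # same content, other name
        "CHANGES": add_file(b"hello!\n"),    # one byte differs
    }
    print(len(store))  # 2

Three file names end up pointing at only two blobs, because the filename
lives in the tree and never takes part in the blob's digest.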

Dulwich Objects
---------------

Dulwich implements these three objects with an API to easily access the
information you need, while abstracting some more secrets Git is using to
accelerate operations and reduce space.

More About Git formats
----------------------

These three objects make up most of the contents of a Git repository and are
used for the history. They can either appear as simple files on disk (one file
per object) or in a ``pack`` file, which is a container for a number of these
objects.

There is also an index of the current state of the working copy in the
repository as well as files to track the existing branches and tags.

For a more detailed explanation of object formats and SHA-1 digests, see:
http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html

Note that Git compresses object files on disk using zlib, so the layouts
described in this document apply to the content after decompression.
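
Reading a loose object file therefore starts with a decompression step. The
following sketch uses a hypothetical ``parse_loose_object`` helper and builds
its sample bytes in memory instead of reading a real repository::

    import zlib


    def parse_loose_object(compressed):
        """Return (type name, content) for the bytes of a loose object file."""
        raw = zlib.decompress(compressed)
        header, _, content = raw.partition(b"\x00")  # split at the first NUL
        type_name, _, length = header.partition(b" ")
        if int(length) != len(content):
            raise ValueError("content length does not match the header")
        return type_name.decode("ascii"), content


    on_disk = zlib.compress(b"blob 6\x00hello\n")  # stand-in for a file's bytes
    print(parse_loose_object(on_disk))  # ('blob', b'hello\n')

With a real repository, the input would be the bytes of a file under
"objects", read in binary mode.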