Merge tutorial from Hervé.

Jelmer Vernooij 15 years ago
Parent
Current commit
ee59ace111

+ 2 - 0
docs/tutorial/.gitignore

@@ -0,0 +1,2 @@
+*.html
+myrepo

+ 101 - 0
docs/tutorial/0-introduction.txt

@@ -0,0 +1,101 @@
+Introduction
+============
+
+Git repository format
+---------------------
+
+For a better understanding of Dulwich, we'll start by explaining most of the
+Git secrets.
+
+Open the ".git" folder of any Git-managed repository. You'll find folders
+like "branches", "hooks"... We're only interested in "objects" here. Open it.
+
+You'll mostly see folders named with 2 hex digits. Git identifies content by
+its SHA-1 digest. The 2 hex digits plus the 38 hex digits of the file names
+inside these folders form the 40-character (or 20-byte) id of the Git objects
+you'll manage in Dulwich.
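
As a rough illustration of that layout, here is a minimal sketch in plain Python 3 (standard library only, not Dulwich). The helper name ``loose_object_path`` is made up for this example; it only splits an id into the folder part and the file part.

```python
import hashlib


def loose_object_path(hex_id):
    """Split a 40-character hex id into 'folder/file' as seen under .git/objects.

    Hypothetical helper for illustration; it is not part of Dulwich or Git.
    """
    if len(hex_id) != 40:
        raise ValueError("expected a 40-character hex id")
    return hex_id[:2] + "/" + hex_id[2:]


# Any SHA-1 digest is 20 bytes long, which is 40 hex characters.
digest = hashlib.sha1(b"example").hexdigest()
path = loose_object_path(digest)
```

The first two characters become the folder name and the remaining 38 become the file name.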
+
+We'll first study the three main objects:
+
+- The Commit;
+
+- The Tree;
+
+- The Blob.
+
+The Commit
+----------
+
+You're used to generating commits using Git. You have set up your name and
+e-mail, and you know how to see the history using ``git log``.
+
+A commit file looks like this::
+
+  commit <content length><NUL>tree <tree sha>
+  parent <parent sha>
+  [parent <parent sha> if several parents from merges]
+  author <author name> <author e-mail> <timestamp> <timezone>
+  committer <committer name> <committer e-mail> <timestamp> <timezone>
+ 
+  <commit message>
+
+But where are the changes you committed? The commit contains a reference to a
+tree.
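
To make the layout above more concrete, here is a small sketch in plain Python 3 that assembles such a commit by hand. The function name ``commit_payload``, the all-zero tree id and the timestamp are placeholders chosen for this example, not values from a real repository.

```python
def commit_payload(tree_hex, parents, author, committer, message):
    """Assemble the text that follows the '<NUL>' in a commit object.

    Hypothetical helper for illustration; Dulwich builds this for you.
    """
    lines = ["tree " + tree_hex]
    # One "parent" line per parent; the initial commit has none.
    lines += ["parent " + parent for parent in parents]
    lines.append("author " + author)
    lines.append("committer " + committer)
    # A blank line separates the headers from the commit message.
    return ("\n".join(lines) + "\n\n" + message).encode("utf-8")


# Placeholder identity: name, e-mail, timestamp and timezone.
identity = "Your Name <your.email@example.com> 1250000000 -0200"
payload = commit_payload("0" * 40, [], identity, identity, "Initial commit\n")
# The stored object is the header "commit <content length><NUL>" plus the payload.
stored = b"commit " + str(len(payload)).encode("ascii") + b"\0" + payload
```

A merge commit would simply pass two ids in ``parents`` and get two ``parent`` lines.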
+
+The Tree
+--------
+
+A tree is a collection of file information, the state of your working copy at
+a given point in time.
+
+A tree file looks like this::
+
+  tree <content length><NUL><file mode> <filename><NUL><blob sha>...
+
+And repeats for every file in the tree.
+
+Note that for an unknown reason, the SHA-1 digest is in binary form here.
+
+The file mode is like the octal argument you could give to the ``chmod``
+command, except that it is in extended form to tell regular files from
+directories and other types.
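
The following sketch, in plain Python 3, builds a single tree entry by hand to show where the binary digest goes. The helper ``tree_entry`` is an illustrative name, and the blob id is simply the one used for "spam" later in this tutorial.

```python
def tree_entry(mode, name, blob_hex):
    """Build one '<file mode> <filename><NUL><blob sha>' entry.

    Hypothetical helper for illustration; Dulwich's Tree does this internally.
    """
    # The mode is written as octal text, for example b"100644".
    mode_text = ("%o" % mode).encode("ascii")
    # The digest is stored as 20 raw bytes, not as 40 hex characters.
    return mode_text + b" " + name.encode("utf-8") + b"\0" + bytes.fromhex(blob_hex)


entry = tree_entry(0o100644, "spam", "c55063a4d5d37aa1af2b2dad3a70aa34dae54dc6")
```

With a 4-character name, the entry is 32 bytes long: 6 for the mode, 1 space, 4 for the name, 1 NUL and 20 for the digest.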
+
+We now know how our files are referenced but we haven't found their actual
+content yet. That's where the reference to a blob comes in.
+
+The Blob
+--------
+
+A blob is simply the content of files you are versioning.
+
+A blob file looks like this::
+
+  blob <content length><NUL><content>
+
+If you change a single line, another blob will be generated by Git at commit
+time. This is how Git can quickly check out any version in time.
+
+Conversely, several identical files with different filenames generate
+only one blob. That's mostly how renames are so cheap and efficient in Git.
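
Both properties follow from the id being a digest of the header plus the content. Here is a minimal sketch in plain Python 3; ``blob_id`` is a name invented for this example, and Dulwich exposes the same value as the ``id`` of a blob.

```python
import hashlib


def blob_id(content):
    """Hash 'blob <content length><NUL><content>' with SHA-1.

    Hypothetical helper for illustration only.
    """
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


first = blob_id(b"My file content\n")
copy = blob_id(b"My file content\n")        # same bytes under another file name
edited = blob_id(b"My new file content\n")  # a changed file gets a new blob
```

Here ``first`` and ``copy`` are equal because the file name never enters the hash, while ``edited`` differs.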
+
+Dulwich Objects
+---------------
+
+Dulwich implements these three objects with an API to easily access the
+information you need, while abstracting some more secrets Git is using to
+accelerate operations and reduce space.
+
+More About Git formats
+----------------------
+
+These three objects make up about 90% of a Git repository. The rest is branch
+information and optimizations.
+
+For instance there is an index of the current state of the working copy.
+There are also pack files to group several small objects in a single indexed
+file.
+
+For a more detailed explanation of object formats and SHA-1 digests, see:
+http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html
+
+Just note that recent versions of Git compress object files using zlib.
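
To see what that compression means in practice, here is a sketch in plain Python 3 that encodes and decodes an object the way a loose object file is laid out. The names ``encode_loose`` and ``decode_loose`` are assumptions made for this example; Dulwich handles this for you.

```python
import zlib


def encode_loose(kind, content):
    """Return the zlib-compressed bytes '<kind> <content length><NUL><content>'."""
    raw = kind + b" " + str(len(content)).encode("ascii") + b"\0" + content
    return zlib.compress(raw)


def decode_loose(data):
    """Decompress an object and split it into its kind and content."""
    raw = zlib.decompress(data)
    # The header ends at the first NUL; everything after it is content.
    header, _, content = raw.partition(b"\0")
    kind, _, size = header.partition(b" ")
    if int(size) != len(content):
        raise ValueError("length in header does not match content")
    return kind, content


stored = encode_loose(b"blob", b"My file content\n")
kind, content = decode_loose(stored)
```

The same ``decode_loose`` could be pointed at the bytes of a file under ".git/objects" to peek at an object without any Git tooling.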

+ 119 - 0
docs/tutorial/1-initial-commit.txt

@@ -0,0 +1,119 @@
+The Repository
+==============
+
+After this introduction, let's start directly with code::
+
+  >>> from dulwich.repo import Repo
+
+Every object is accessed through the Repo object. You can open an
+existing repository or you can create a new one. There are two types of Git
+repositories:
+
+  Regular Repositories -- They are the ones you create using ``git init`` and
+  use daily. They contain a ``.git`` folder.
+
+  Bare Repositories -- There is no ".git" folder. The top-level folder
+  itself contains the "branches", "hooks"... folders. These are used for
+  published repositories (mirrors).
+
+Let's create a folder and turn it into a repository, like ``git init`` would::
+
+  >>> from os import mkdir
+  >>> mkdir("myrepo")
+  >>> repo = Repo.init("myrepo")
+  >>> repo
+  <Repo at '/tmp/myrepo/'>
+
+You can already look at the structure of the "myrepo/.git" folder, though it
+is mostly empty for now.
+
+Initial commit
+==============
+
+When you use Git, you generally add or modify content. As our repository is
+empty for now, we'll start by adding a new file::
+
+  >>> from dulwich.objects import Blob
+  >>> blob = Blob.from_string("My file content\n")
+  >>> blob.id
+  'c55063a4d5d37aa1af2b2dad3a70aa34dae54dc6'
+
+Of course you could create a blob from an existing file using ``from_file``
+instead.
+
+As said in the introduction, file content is separated from file name. Let's
+give this content a name::
+
+  >>> from dulwich.objects import Tree
+  >>> tree = Tree()
+  >>> tree.add(0100644, "spam", blob.id)
+
+Note that "0100644" is the octal form for a regular file with common
+permissions. You can hardcode them or you can use the ``stat`` module.
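
For the ``stat`` route, a short sketch in Python 3 shows how the same number is composed. Note that Python 3 spells octal literals with a ``0o`` prefix, unlike the ``0100644`` form used in this tutorial; the variable names are only illustrative.

```python
import stat

# A regular file with rw-r--r-- permissions.
regular_file = stat.S_IFREG | 0o644
# The same kind of file, but executable (rwxr-xr-x).
executable = stat.S_IFREG | 0o755
# The file-type bits that mark a directory.
directory = stat.S_IFDIR
```

Here ``regular_file`` equals ``0o100644``, the mode passed to ``tree.add`` above.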
+
+The tree state of our repository still needs to be placed in time. That's the
+job of the commit::
+
+  >>> from dulwich.objects import Commit, parse_timezone
+  >>> from time import time
+  >>> commit = Commit()
+  >>> commit.tree = tree.id
+  >>> author = "Your Name <your.email@example.com>"
+  >>> commit.author = commit.committer = author
+  >>> commit.commit_time = commit.author_time = int(time())
+  >>> tz = parse_timezone('-0200')
+  >>> commit.commit_timezone = commit.author_timezone = tz
+  >>> commit.encoding = "UTF-8"
+  >>> commit.message = "Initial commit"
+
+Note that the initial commit has no parents.
+
+At this point, the repository is still empty because all operations happen in
+memory. Let's "commit" it::
+
+  >>> object_store = repo.object_store
+  >>> object_store.add_object(blob)
+
+Now the ".git/objects" folder contains a first SHA-1 file. Let's continue
+saving the changes::
+
+  >>> object_store.add_object(tree)
+  >>> object_store.add_object(commit)
+
+Now the physical repository contains three objects but still has no branch.
+Let's create the master branch like Git would::
+
+  >>> repo.refs['refs/heads/master'] = commit.id
+
+The master branch now has a commit to start from, but Git itself would not
+know which branch is the current one. That's another reference::
+
+  >>> repo.refs['HEAD'] = 'ref: refs/heads/master'
+
+Now our repository is officially tracking a branch named "master" referring to a
+single commit.
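
The indirection from "HEAD" to a branch to a commit can be modelled with a plain dictionary. This is only a sketch in Python 3 of the idea, not how Dulwich stores references; the function ``resolve`` and the commit id below are placeholders.

```python
def resolve(refs, name, max_depth=5):
    """Follow 'ref: ' indirections in a plain dict until a commit id is found.

    Hypothetical helper for illustration only.
    """
    for _ in range(max_depth):
        value = refs[name]
        if not value.startswith("ref: "):
            return value
        # A symbolic reference: continue with the name it points to.
        name = value[len("ref: "):]
    raise ValueError("too many levels of symbolic references")


refs = {
    # Placeholder commit id, not one produced by this tutorial.
    "refs/heads/master": "1234567890abcdef1234567890abcdef12345678",
    "HEAD": "ref: refs/heads/master",
}
current = resolve(refs, "HEAD")
```

Moving the branch only requires changing the "refs/heads/master" entry; "HEAD" keeps pointing at the branch name.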
+
+Playing again with Git
+======================
+
+At this point you can come back to the shell, go into the "myrepo" folder and
+type ``git status`` to let Git confirm that this is a regular repository on
+branch "master".
+
+Git will tell you that the file "spam" is deleted, which is normal because
+Git is comparing the repository state with the current working copy. And we
+have absolutely no working copy using Dulwich because we don't need it at
+all!
+
+You can check out the last state using ``git checkout -f``. The force flag
+will prevent Git from complaining that there are uncommitted changes in the
+working copy.
+
+The file ``spam`` appears and with no surprise contains the same bytes as the
+blob::
+
+  $ cat spam
+  My file content
+
+.. attention:: Remember to recreate the repo object when you modify the
+               repository outside of Dulwich!

+ 61 - 0
docs/tutorial/2-change-file.txt

@@ -0,0 +1,61 @@
+Changing a File and Committing It
+=================================
+
+Now that we have a first commit, the next one will show a difference.
+
+As seen in the introduction, it's about making a path in a tree point to a
+new blob. The old blob will remain to compute the diff. The tree is altered
+and the new commit's task is to point to this new version.
+
+In the following examples, we assume we still have the ``repo`` and ``tree``
+objects from the previous chapter.
+
+Let's first build the blob::
+
+  >>> spam = Blob.from_string("My new file content\n")
+  >>> spam.id
+  '16ee2682887a962f854ebd25a61db16ef4efe49f'
+
+An alternative is to alter the previously constructed blob object::
+
+  >>> blob.data = "My new file content\n"
+  >>> blob.id
+  '16ee2682887a962f854ebd25a61db16ef4efe49f'
+
+In any case, update the blob id known as "spam". You also have the
+opportunity of changing its mode::
+
+  >>> tree["spam"] = (0100644, spam.id)
+
+Now let's record the change::
+
+  >>> c2 = Commit()
+  >>> c2.tree = tree.id
+  >>> c2.parents = [commit.id]
+  >>> c2.author = c2.committer = author
+  >>> c2.commit_time = c2.author_time = int(time())
+  >>> c2.commit_timezone = c2.author_timezone = tz
+  >>> c2.encoding = "UTF-8"
+  >>> c2.message = 'Changing "spam"'
+
+In this new commit we record the changed tree id and, most importantly, the
+previous commit as the parent. Parents are actually a list because a commit
+may happen to have several parents after merging branches.
+
+It remains to record this whole new family::
+
+  >>> object_store.add_object(spam)
+  >>> object_store.add_object(tree)
+  >>> object_store.add_object(c2)
+
+You can already ask git to introspect this commit using ``git show`` and the
+value of ``c2.id`` as an argument. You'll see the difference with the
+previous blob recorded as "spam".
+
+You won't see it using git log because the head is still the previous
+commit. It's easy to remedy::
+
+  >>> repo.refs['refs/heads/master'] = c2.id
+
+Now all git tools will work as expected. Though don't forget that Dulwich is
+still open!

+ 41 - 0
docs/tutorial/3-add-file.txt

@@ -0,0 +1,41 @@
+Adding a file
+=============
+
+If you followed along, the next lesson will be straightforward.
+
+We need a new blob::
+
+    >>> ham = Blob.from_string("Another\nmultiline\nfile\n")
+    >>> ham.id
+    'a3b5eda0b83eb8fb6e5dce91ecafda9e97269c70'
+
+But the same tree::
+
+    >>> tree["ham"] = (0100644, ham.id)
+
+And a new commit::
+
+  >>> c3 = Commit()
+  >>> c3.tree = tree.id
+  >>> c3.parents = [c2.id]
+  >>> c3.author = c3.committer = author
+  >>> c3.commit_time = c3.author_time = int(time())
+  >>> c3.commit_timezone = c3.author_timezone = tz
+  >>> c3.encoding = "UTF-8"
+  >>> c3.message = 'Adding "ham"'
+
+Save it all::
+
+    >>> object_store.add_object(ham)
+    >>> object_store.add_object(tree)
+    >>> object_store.add_object(c3)
+
+Update the head::
+
+    >>> repo.refs['refs/heads/master'] = c3.id
+
+A call to ``git show`` will confirm the addition of "ham".
+
+Remember you can also call ``git checkout -f`` to make it appear.
+
+Well... Adding "ham" was not such a good idea... We'll remove it.

+ 30 - 0
docs/tutorial/4-remove-file.txt

@@ -0,0 +1,30 @@
+Removing a file
+===============
+
+Removing a file just means removing its entry in the tree. The blob won't be
+deleted because Git tries to preserve the history of your repository.
+
+It's all pythonic::
+
+  >>> del tree["ham"]
+
+  >>> c4 = Commit()
+  >>> c4.tree = tree.id
+  >>> c4.parents = [c3.id]
+  >>> c4.author = c4.committer = author
+  >>> c4.commit_time = c4.author_time = int(time())
+  >>> c4.commit_timezone = c4.author_timezone = tz
+  >>> c4.encoding = "UTF-8"
+  >>> c4.message = 'Removing "ham"'
+
+Here we only have the new tree and the commit to save::
+
+    >>> # No blob has changed, so there is nothing else to store
+    >>> object_store.add_object(tree)
+    >>> object_store.add_object(c4)
+
+And of course update the head::
+
+    >>> repo.refs['refs/heads/master'] = c4.id
+
+If you don't trust me, ask ``git show``. ;-)

+ 33 - 0
docs/tutorial/5-rename-file.txt

@@ -0,0 +1,33 @@
+Renaming a file
+===============
+
+Remember you learned that the file name and content are distinct. So renaming
+a file is just about associating a blob id to a new name. We won't store more
+content, and the operation will be painless.
+
+Let's transfer the blob id from the old name to the new one::
+
+    >>> tree["eggs"] = tree["spam"]
+    >>> del tree["spam"]
+
+As usual, we need a commit to store the new tree id::
+
+  >>> c5 = Commit()
+  >>> c5.tree = tree.id
+  >>> c5.parents = [c4.id]
+  >>> c5.author = c5.committer = author
+  >>> c5.commit_time = c5.author_time = int(time())
+  >>> c5.commit_timezone = c5.author_timezone = tz
+  >>> c5.encoding = "UTF-8"
+  >>> c5.message = 'Rename "spam" to "eggs"'
+
+As with a deletion, we only have a tree and a commit to save::
+
+    >>> object_store.add_object(tree)
+    >>> object_store.add_object(c5)
+
+It remains to make the head bleeding-edge::
+
+    >>> repo.refs['refs/heads/master'] = c5.id
+
+As a last exercise, see how ``git show`` illustrates it.

+ 14 - 0
docs/tutorial/6-conclusion.txt

@@ -0,0 +1,14 @@
+Conclusion
+==========
+
+The accompanying ``test.py`` program contains some helpers I use to make
+generating objects easier.
+
+You can also make Tag objects, but this is left as an exercise to the reader.
+
+Dulwich abstracts much of the Git plumbing, so there is much more to
+explore.
+
+Dulwich is also able to clone and push repositories.
+
+That's all folks!

+ 11 - 0
docs/tutorial/Makefile

@@ -0,0 +1,11 @@
+TXT=$(shell ls *.txt)
+
+ALL: index.html
+
+index.html: $(TXT)
+	rst2html.py index.txt index.html
+
+clean:
+	rm -f index.html
+
+.PHONY: clean

+ 13 - 0
docs/tutorial/index.txt

@@ -0,0 +1,13 @@
+================
+Dulwich Tutorial
+================
+
+.. contents::
+
+.. include:: 0-introduction.txt
+.. include:: 1-initial-commit.txt
+.. include:: 2-change-file.txt
+.. include:: 3-add-file.txt
+.. include:: 4-remove-file.txt
+.. include:: 5-rename-file.txt
+.. include:: 6-conclusion.txt

+ 178 - 0
docs/tutorial/test.py

@@ -0,0 +1,178 @@
+#!/usr/bin/env python
+# -*- encoding: UTF-8 -*-
+
+# Import from the Standard Library
+from os import F_OK, access, mkdir
+from pprint import pprint
+from shutil import rmtree
+from subprocess import call
+from time import time
+
+# Import from dulwich
+from dulwich.repo import Repo
+from dulwich.objects import Blob, Tree, Commit, parse_timezone
+
+
+DIRNAME = "myrepo"
+AUTHOR = "Your Name <your.email@example.com>"
+TZ = parse_timezone('-0200')
+ENCODING = "UTF-8"
+
+
+def make_commit(repo, tree_id, message):
+    """Build a commit object on the same pattern. Only changing values are
+    required as parameters.
+    """
+    commit = Commit()
+    try:
+        commit.parents = [repo.head()]
+    except KeyError:
+        # The initial commit has no parent
+        pass
+    commit.tree = tree_id
+    commit.message = message
+    commit.author = commit.committer = AUTHOR
+    commit.commit_time = commit.author_time = int(time())
+    commit.commit_timezone = commit.author_timezone = TZ
+    commit.encoding = ENCODING
+    return commit
+
+
+
+def make_tree(repo):
+    """Return the last known tree.
+    """
+    commit_id = repo.head()
+    commit = repo.commit(commit_id)
+    tree_id = commit.tree
+    return repo.tree(tree_id)
+
+
+
+def update_master(repo, commit_id):
+    repo.refs['refs/heads/master'] = commit_id
+
+
+
+def initial_commit(repo):
+    # Add file content
+    blob = Blob.from_string("My file content\n")
+    # Add file
+    tree = Tree()
+    tree.add(0100644, "spam", blob.id)
+    # Set commit
+    commit = make_commit(repo, tree.id, "Initial commit")
+    # Initial commit
+    object_store = repo.object_store
+    object_store.add_object(blob)
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+    # Set the master branch as the default
+    repo.refs['HEAD'] = 'ref: refs/heads/master'
+
+
+
+def test_change(repo):
+    tree = make_tree(repo)
+    # Change a file
+    spam = Blob.from_string("My new file content\n")
+    tree.add(0100644, "spam", spam.id)
+    # Set commit
+    commit = make_commit(repo, tree.id, "Change spam")
+    # Second commit
+    object_store = repo.object_store
+    object_store.add_object(spam)
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_add(repo):
+    tree = make_tree(repo)
+    # Add another file
+    ham = Blob.from_string("Another\nmultiline\nfile\n")
+    tree.add(0100644, "ham", ham.id)
+    # Set commit
+    commit = make_commit(repo, tree.id, "Add ham")
+    # Third commit
+    object_store = repo.object_store
+    object_store.add_object(ham)
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_remove(repo):
+    tree = make_tree(repo)
+    # Remove a file
+    del tree["ham"]
+    # Set commit
+    commit = make_commit(repo, tree.id, 'Remove "ham"')
+    # Fourth commit
+    # No blob change, just tree operation
+    object_store = repo.object_store
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_rename(repo):
+    tree = make_tree(repo)
+    # Rename a file
+    tree["eggs"] = tree["spam"]
+    del tree["spam"]
+    # Set commit
+    commit = make_commit(repo, tree.id, 'Rename "spam" to "eggs"')
+    # Fifth commit
+    # No blob change, just tree operation
+    object_store = repo.object_store
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_history(repo):
+    pprint(repo.revision_history(repo.head()))
+
+
+
+def test_file(repo):
+    tree = make_tree(repo)
+    print "entries", tree.entries()
+    mode, blob_id = tree["eggs"]
+    blob = repo.get_blob(blob_id)
+    print "eggs", repr(blob.data)
+
+
+
+if __name__ == '__main__':
+    # Creating the repository
+    if access(DIRNAME, F_OK):
+        rmtree(DIRNAME)
+    mkdir(DIRNAME)
+    repo = Repo.init(DIRNAME)
+    initial_commit(repo)
+    test_change(repo)
+    test_add(repo)
+    test_remove(repo)
+    test_rename(repo)
+    last_commit_id = repo.head()
+    call(['git', 'gc'], cwd=DIRNAME)
+    # Re-load the repo
+    del repo
+    repo = Repo(DIRNAME)
+    # XXX the ref was removed and dulwich doesn't know where to read it
+    update_master(repo, last_commit_id)
+    assert last_commit_id == repo.head()
+    test_history(repo)
+    test_file(repo)