
* New upstream release.
* Bump standards version to 3.8.3.

Jelmer Vernooij 15 years ago
Parent
Current commit
7f39158f9a

+ 2 - 2
HACKING

@@ -1,5 +1,5 @@
 Please follow PEP8 with regard to coding style.
 
-All functionality should be available in pure Python. C replacements may 
-be written for performance reasons, but should never replace the Python 
+All functionality should be available in pure Python. Optional C implementations
+may be written for performance reasons, but should never replace the Python 
 implementation. The C implementations should follow the kernel/git coding style.

+ 21 - 0
NEWS

@@ -1,3 +1,24 @@
+0.4.0	2009-10-07
+
+ DOCUMENTATION
+
+  * Added tutorial.
+
+ API CHANGES
+
+  * dulwich.object_store.tree_lookup_path will now return the mode and 
+    sha of the object found rather than the object itself.
+
+ BUG FIXES
+
+  * Use binascii.hexlify / binascii.unhexlify for better performance.
+
+  * Cope with extra unknown data in index files by ignoring it (for now).
+
+  * Add proper error message when server unexpectedly hangs up. (#415843)
+
+  * Correctly write opcode for equal in create_delta.
+
 0.3.3	2009-07-23
 
  FEATURES

+ 1 - 2
README

@@ -15,8 +15,7 @@ this for you with file lookup soon.
 
 There is also support for creating blobs. Blob.from_string(string) will create
 a blob object from the string. You can then call blob.sha() to get the sha
-object for this blob, and hexdigest() on that will get its ID. There is
-currently no method that allows you to write it out though.
+object for this blob, and hexdigest() on that will get its ID. 
 
 The project is named after the part of London that Mr. and Mrs. Git live in 
 in the particular Monty Python sketch. It is based on the Python-Git module 

+ 7 - 0
debian/changelog

@@ -1,3 +1,10 @@
+dulwich (0.4.0-1) unstable; urgency=low
+
+  * New upstream release.
+  * Bump standards version to 3.8.3.
+
+ -- Jelmer Vernooij <jelmer@debian.org>  Wed, 07 Oct 2009 12:08:10 +0200
+
 dulwich (0.3.3-1) unstable; urgency=low
 
   * Run the testsuite during build.

+ 1 - 1
debian/control

@@ -4,7 +4,7 @@ Priority: optional
 Maintainer: Jelmer Vernooij <jelmer@debian.org>
 Homepage: http://samba.org/~jelmer/dulwich
 Build-Depends: python-central (>= 0.5), cdbs (>= 0.4.43), python-all-dev, debhelper (>= 5.0.37.2), python-nose
-Standards-Version: 3.8.2
+Standards-Version: 3.8.3
 XS-Python-Version: >= 2.4
 Vcs-Bzr: http://bzr.debian.org/users/jelmer/dulwich/unstable
 

+ 2 - 0
docs/tutorial/.gitignore

@@ -0,0 +1,2 @@
+*.html
+myrepo

+ 101 - 0
docs/tutorial/0-introduction.txt

@@ -0,0 +1,101 @@
+Introduction
+============
+
+Git repository format
+---------------------
+
+For a better understanding of Dulwich, we'll start by explaining most of
+Git's secrets.
+
+Open the ".git" folder of any Git-managed repository. You'll find folders
+like "branches", "hooks"... We're only interested in "objects" here. Open it.
+
+You'll mostly see folders named with 2 hex digits. Git identifies content by
+its SHA-1 digest. The 2 hex digits of the folder name plus the 38 hex digits of
+the file names inside it form the 40-character (or 20-byte) id of the Git
+objects you'll manage in Dulwich.
+
+We'll first study the three main objects:
+
+- The Commit;
+
+- The Tree;
+
+- The Blob.
+
+The Commit
+----------
+
+You're used to generating commits using Git. You have set up your name and
+e-mail, and you know how to see the history using ``git log``.
+
+A commit file looks like this::
+
+  commit <content length><NUL>tree <tree sha>
+  parent <parent sha>
+  [parent <parent sha> if several parents from merges]
+  author <author name> <author e-mail> <timestamp> <timezone>
+  committer <committer name> <committer e-mail> <timestamp> <timezone>
+ 
+  <commit message>
+
+But where are the changes you committed? The commit contains a reference to a
+tree.
+
+The Tree
+--------
+
+A tree is a collection of file information, the state of your working copy at
+a given point in time.
+
+A tree file looks like this::
+
+  tree <content length><NUL><file mode> <filename><NUL><blob sha>...
+
+And repeats for every file in the tree.
+
+Note that, for an unknown reason, the SHA-1 digest is in binary form here.
+
+The file mode is like the octal argument you could give to the ``chmod``
+command, except that it is in extended form to tell regular files from
+directories and other types.
+
+We now know how our files are referenced but we haven't found their actual
+content yet. That's where the reference to a blob comes in.
+
+The Blob
+--------
+
+A blob is simply the content of files you are versioning.
+
+A blob file looks like this::
+
+  blob <content length><NUL><content>
+
+If you change a single line, another blob will be generated by Git at commit
+time. This is how Git can quickly check out any version in time.
+
+Conversely, several identical files with different filenames generate
+only one blob. That's mostly how renames are so cheap and efficient in Git.
+
+Dulwich Objects
+---------------
+
+Dulwich implements these three objects with an API to easily access the
+information you need, while abstracting some more secrets Git is using to
+accelerate operations and reduce space.
+
+More About Git formats
+----------------------
+
+These three objects make up about 90% of a Git repository. The rest is branch
+information and optimizations.
+
+For instance there is an index of the current state of the working copy.
+There are also pack files to group several small objects in a single indexed
+file.
+
+For a more detailed explanation of object formats and SHA-1 digests, see:
+http://www-cs-students.stanford.edu/~blynn/gitmagic/ch08.html
+
+Just note that recent versions of Git compress object files using zlib.
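
The object layout described in this introduction can be checked without Dulwich. The sketch below is not part of this commit: it uses only `hashlib` and `binascii`, and the helper names `git_object_id` and `tree_entry` are made up for illustration. It assumes, as the tutorial text states, that an object id is the SHA-1 of `<type> <content length><NUL>` followed by the content, and that a tree entry stores the sha in binary form.

```python
import binascii
import hashlib


def git_object_id(obj_type, body):
    """Return the 40-character hex id of an object (hypothetical helper)."""
    header = obj_type + b" " + str(len(body)).encode("ascii") + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def tree_entry(mode, name, hex_sha):
    """Serialize one tree entry: octal mode, space, name, NUL, 20-byte sha."""
    return (("%o" % mode).encode("ascii") + b" " + name + b"\0"
            + binascii.unhexlify(hex_sha))


blob_id = git_object_id(b"blob", b"My file content\n")
entry = tree_entry(0o100644, b"spam", blob_id)   # 12 header bytes + 20 sha bytes
tree_id = git_object_id(b"tree", entry)
print(blob_id)
print(tree_id)
```

If the format description holds, `blob_id` should agree with the `blob.id` value shown in the tutorial's initial-commit chapter, and `git hash-object` should report the same id for a file with that content.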

+ 119 - 0
docs/tutorial/1-initial-commit.txt

@@ -0,0 +1,119 @@
+The Repository
+==============
+
+After this introduction, let's start directly with code::
+
+  >>> from dulwich.repo import Repo
+
+Access to every object goes through the Repo object. You can open an
+existing repository or you can create a new one. There are two types of Git
+repositories:
+
+  Regular Repositories -- They are the ones you create using ``git init`` and
+  use daily. They contain a ``.git`` folder.
+
+  Bare Repositories -- There is no ".git" folder. The top-level folder
+  itself contains the "branches", "hooks"... folders. These are used for
+  published repositories (mirrors).
+
+Let's create a folder and turn it into a repository, like ``git init`` would::
+
+  >>> from os import mkdir
+  >>> mkdir("myrepo")
+  >>> repo = Repo.init("myrepo")
+  >>> repo
+  <Repo at '/tmp/myrepo/'>
+
+You can already look at the structure of the "myrepo/.git" folder, though it
+is mostly empty for now.
+
+Initial commit
+==============
+
+When you use Git, you generally add or modify content. As our repository is
+empty for now, we'll start by adding a new file::
+
+  >>> from dulwich.objects import Blob
+  >>> blob = Blob.from_string("My file content\n")
+  >>> blob.id
+  'c55063a4d5d37aa1af2b2dad3a70aa34dae54dc6'
+
+Of course you could create a blob from an existing file using ``from_file``
+instead.
+
+As said in the introduction, file content is separate from file name. Let's
+give this content a name::
+
+  >>> from dulwich.objects import Tree
+  >>> tree = Tree()
+  >>> tree.add(0100644, "spam", blob.id)
+
+Note that "0100644" is the octal form for a regular file with common
+permissions. You can hardcode them or you can use the ``stat`` module.
+
+The tree state of our repository still needs to be placed in time. That's the
+job of the commit::
+
+  >>> from dulwich.objects import Commit, parse_timezone
+  >>> from time import time
+  >>> commit = Commit()
+  >>> commit.tree = tree.id
+  >>> author = "Your Name <your.email@example.com>"
+  >>> commit.author = commit.committer = author
+  >>> commit.commit_time = commit.author_time = int(time())
+  >>> tz = parse_timezone('-0200')
+  >>> commit.commit_timezone = commit.author_timezone = tz
+  >>> commit.encoding = "UTF-8"
+  >>> commit.message = "Initial commit"
+
+Note that the initial commit has no parents.
+
+At this point, the repository is still empty because all operations happen in
+memory. Let's "commit" it::
+
+  >>> object_store = repo.object_store
+  >>> object_store.add_object(blob)
+
+Now the ".git/objects" folder contains a first SHA-1 file. Let's continue
+saving the changes::
+
+  >>> object_store.add_object(tree)
+  >>> object_store.add_object(commit)
+
+Now the physical repository contains three objects but still has no branch.
+Let's create the master branch like Git would::
+
+  >>> repo.refs['refs/heads/master'] = commit.id
+
+The master branch now has a commit to start from, but Git itself would not
+know which branch is the current one. That's another reference::
+
+  >>> repo.refs['HEAD'] = 'ref: refs/heads/master'
+
+Now our repository is officially tracking a branch named "master" referring
+to a single commit.
+
+Playing again with Git
+======================
+
+At this point you can come back to the shell, go into the "myrepo" folder and
+type ``git status`` to let Git confirm that this is a regular repository on
+branch "master".
+
+Git will tell you that the file "spam" is deleted, which is normal because
+Git is comparing the repository state with the current working copy. And we
+have absolutely no working copy using Dulwich because we don't need it at
+all!
+
+You can check out the last state using ``git checkout -f``. The force flag
+will prevent Git from complaining that there are uncommitted changes in the
+working copy.
+
+The file ``spam`` appears and with no surprise contains the same bytes as the
+blob::
+
+  $ cat spam
+  My file content
+
+.. attention:: Remember to recreate the repo object when you modify the
+               repository outside of Dulwich!
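
The tutorial mentions that tree modes can be built with the ``stat`` module instead of being hardcoded. A minimal sketch of that idea follows, written for Python 3 (hence `0o644` rather than the Python 2 literal `0100644` used in the tutorial); the constant and function names are invented for this example.

```python
import stat

# Mode values as they would be passed to Tree.add() in the tutorial.
REGULAR_FILE = stat.S_IFREG | 0o644      # same value as 0100644 in Python 2
EXECUTABLE_FILE = stat.S_IFREG | 0o755
SYMLINK = stat.S_IFLNK
DIRECTORY = stat.S_IFDIR


def describe_mode(mode):
    """Name the kind of tree entry a mode stands for (illustrative helper)."""
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symlink"
    if stat.S_ISREG(mode):
        return "executable file" if mode & 0o111 else "regular file"
    return "other"


for mode in (REGULAR_FILE, EXECUTABLE_FILE, SYMLINK, DIRECTORY):
    print("%o" % mode, describe_mode(mode))
```

The upper bits carry the file type and the lower bits the permissions, which is the "extended form" the introduction refers to.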

+ 61 - 0
docs/tutorial/2-change-file.txt

@@ -0,0 +1,61 @@
+Changing a File and Committing It
+=================================
+
+Now that we have a first commit, the next one will show a difference.
+
+As seen in the introduction, it's about making a path in a tree point to a
+new blob. The old blob will remain to compute the diff. The tree is altered
+and the new commit's task is to point to this new version.
+
+In the following examples, we assume we still have the ``repo`` and ``tree``
+objects from the previous chapter.
+
+Let's first build the blob::
+
+  >>> spam = Blob.from_string("My new file content\n")
+  >>> spam.id
+  '16ee2682887a962f854ebd25a61db16ef4efe49f'
+
+An alternative is to alter the previously constructed blob object::
+
+  >>> blob.data = "My new file content\n"
+  >>> blob.id
+  '16ee2682887a962f854ebd25a61db16ef4efe49f'
+
+In any case, update the blob id known as "spam". You also have the
+opportunity of changing its mode::
+
+  >>> tree["spam"] = (0100644, spam.id)
+
+Now let's record the change::
+
+  >>> c2 = Commit()
+  >>> c2.tree = tree.id
+  >>> c2.parents = [commit.id]
+  >>> c2.author = c2.committer = author
+  >>> c2.commit_time = c2.author_time = int(time())
+  >>> c2.commit_timezone = c2.author_timezone = tz
+  >>> c2.encoding = "UTF-8"
+  >>> c2.message = 'Changing "spam"'
+
+In this new commit we record the changed tree id and, most importantly, the
+previous commit as the parent. Parents are actually a list because a commit
+may happen to have several parents after merging branches.
+
+It remains to record this whole new family::
+
+  >>> object_store.add_object(spam)
+  >>> object_store.add_object(tree)
+  >>> object_store.add_object(c2)
+
+You can already ask git to introspect this commit using ``git show`` and the
+value of ``c2.id`` as an argument. You'll see the difference with the
+previous blob recorded as "spam".
+
+You won't see it using ``git log`` because the head is still the previous
+commit. It's easy to remedy::
+
+  >>> repo.refs['refs/heads/master'] = c2.id
+
+Now all git tools will work as expected. Though don't forget that Dulwich is
+still open!

+ 41 - 0
docs/tutorial/3-add-file.txt

@@ -0,0 +1,41 @@
+Adding a file
+=============
+
+If you have followed along, the next lesson will be straightforward.
+
+We need a new blob::
+
+    >>> ham = Blob.from_string("Another\nmultiline\nfile\n")
+    >>> ham.id
+    'a3b5eda0b83eb8fb6e5dce91ecafda9e97269c70'
+
+But the same tree::
+
+    >>> tree["ham"] = (0100644, ham.id)
+
+And a new commit::
+
+  >>> c3 = Commit()
+  >>> c3.tree = tree.id
+  >>> c3.parents = [c2.id]
+  >>> c3.author = c3.committer = author
+  >>> c3.commit_time = c3.author_time = int(time())
+  >>> c3.commit_timezone = c3.author_timezone = tz
+  >>> c3.encoding = "UTF-8"
+  >>> c3.message = 'Adding "ham"'
+
+Save it all::
+
+    >>> object_store.add_object(ham)
+    >>> object_store.add_object(tree)
+    >>> object_store.add_object(c3)
+
+Update the head::
+
+    >>> repo.refs['refs/heads/master'] = c3.id
+
+A call to ``git show`` will confirm the addition of "ham".
+
+Remember you can also call ``git checkout -f`` to make it appear.
+
+Well... Adding "ham" was not such a good idea... We'll remove it.

+ 29 - 0
docs/tutorial/4-remove-file.txt

@@ -0,0 +1,29 @@
+Removing a file
+===============
+
+Removing a file just means removing its entry in the tree. The blob won't be
+deleted because Git tries to preserve the history of your repository.
+
+It's all pythonic::
+
+    >>> del tree["ham"]
+
+    >>> c4 = Commit()
+    >>> c4.tree = tree.id
+    >>> c4.parents = [c3.id]
+    >>> c4.author = c4.committer = author
+    >>> c4.commit_time = c4.author_time = int(time())
+    >>> c4.commit_timezone = c4.author_timezone = tz
+    >>> c4.encoding = "UTF-8"
+    >>> c4.message = 'Removing "ham"'
+
+Here we only have the new tree and the commit to save::
+
+    >>> object_store.add_object(tree)
+    >>> object_store.add_object(c4)
+
+And of course update the head::
+
+    >>> repo.refs['refs/heads/master'] = c4.id
+
+If you don't trust me, ask ``git show``. ;-)

+ 33 - 0
docs/tutorial/5-rename-file.txt

@@ -0,0 +1,33 @@
+Renaming a file
+===============
+
+Remember you learned that the file name and content are distinct. So renaming
+a file is just about associating a blob id to a new name. We won't store more
+content, and the operation will be painless.
+
+Let's transfer the blob id from the old name to the new one::
+
+    >>> tree["eggs"] = tree["spam"]
+    >>> del tree["spam"]
+
+As usual, we need a commit to store the new tree id::
+
+  >>> c5 = Commit()
+  >>> c5.tree = tree.id
+  >>> c5.parents = [c4.id]
+  >>> c5.author = c5.committer = author
+  >>> c5.commit_time = c5.author_time = int(time())
+  >>> c5.commit_timezone = c5.author_timezone = tz
+  >>> c5.encoding = "UTF-8"
+  >>> c5.message = 'Rename "spam" to "eggs"'
+
+As with a deletion, we only have a tree and a commit to save::
+
+    >>> object_store.add_object(tree)
+    >>> object_store.add_object(c5)
+
+It remains to move the head to the new commit::
+
+    >>> repo.refs['refs/heads/master'] = c5.id
+
+As a last exercise, see how ``git show`` illustrates it.

+ 14 - 0
docs/tutorial/6-conclusion.txt

@@ -0,0 +1,14 @@
+Conclusion
+==========
+
+See the ``test.py`` program for some helpers I use to make generating
+objects easier.
+
+You can also make Tag objects, but this is left as an exercise to the reader.
+
+Dulwich abstracts much of the Git plumbing, so there is more to
+see.
+
+Dulwich is also able to clone and push repositories.
+
+That's all folks!

+ 12 - 0
docs/tutorial/Makefile

@@ -0,0 +1,12 @@
+RST2HTML = rst2html
+TXT=$(shell ls *.txt)
+
+ALL: index.html
+
+index.html: $(TXT)
+	$(RST2HTML) index.txt index.html
+
+clean:
+	rm -f index.html
+
+.PHONY: clean

+ 13 - 0
docs/tutorial/index.txt

@@ -0,0 +1,13 @@
+================
+Dulwich Tutorial
+================
+
+.. contents::
+
+.. include:: 0-introduction.txt
+.. include:: 1-initial-commit.txt
+.. include:: 2-change-file.txt
+.. include:: 3-add-file.txt
+.. include:: 4-remove-file.txt
+.. include:: 5-rename-file.txt
+.. include:: 6-conclusion.txt

+ 178 - 0
docs/tutorial/test.py

@@ -0,0 +1,178 @@
+#!/usr/bin/env python
+# -*- encoding: UTF-8 -*-
+
+# Import from the Standard Library
+from os import F_OK, access, mkdir
+from pprint import pprint
+from shutil import rmtree
+from subprocess import call
+from time import time
+
+# Import from dulwich
+from dulwich.repo import Repo
+from dulwich.objects import Blob, Tree, Commit, parse_timezone
+
+
+DIRNAME = "myrepo"
+AUTHOR = "Your Name <your.email@example.com>"
+TZ = parse_timezone('-0200')
+ENCODING = "UTF-8"
+
+
+def make_commit(repo, tree_id, message):
+    """Build a commit object on the same pattern. Only changing values are
+    required as parameters.
+    """
+    commit = Commit()
+    try:
+        commit.parents = [repo.head()]
+    except KeyError:
+        # The initial commit has no parent
+        pass
+    commit.tree = tree_id
+    commit.message = message
+    commit.author = commit.committer = AUTHOR
+    commit.commit_time = commit.author_time = int(time())
+    commit.commit_timezone = commit.author_timezone = TZ
+    commit.encoding = ENCODING
+    return commit
+
+
+
+def make_tree(repo):
+    """Return the last known tree.
+    """
+    commit_id = repo.head()
+    commit = repo.commit(commit_id)
+    tree_id = commit.tree
+    return repo.tree(tree_id)
+
+
+
+def update_master(repo, commit_id):
+    repo.refs['refs/heads/master'] = commit_id
+
+
+
+def initial_commit(repo):
+    # Add file content
+    blob = Blob.from_string("My file content\n")
+    # Add file
+    tree = Tree()
+    tree.add(0100644, "spam", blob.id)
+    # Set commit
+    commit = make_commit(repo, tree.id, "Initial commit")
+    # Initial commit
+    object_store = repo.object_store
+    object_store.add_object(blob)
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+    # Set the master branch as the default
+    repo.refs['HEAD'] = 'ref: refs/heads/master'
+
+
+
+def test_change(repo):
+    tree = make_tree(repo)
+    # Change a file
+    spam = Blob.from_string("My new file content\n")
+    tree.add(0100644, "spam", spam.id)
+    # Set commit
+    commit = make_commit(repo, tree.id, "Change spam")
+    # Second commit
+    object_store = repo.object_store
+    object_store.add_object(spam)
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_add(repo):
+    tree = make_tree(repo)
+    # Add another file
+    ham = Blob.from_string("Another\nmultiline\nfile\n")
+    tree.add(0100644, "ham", ham.id)
+    # Set commit
+    commit = make_commit(repo, tree.id, "Add ham")
+    # Third commit
+    object_store = repo.object_store
+    object_store.add_object(ham)
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_remove(repo):
+    tree = make_tree(repo)
+    # Remove a file
+    del tree["ham"]
+    # Set commit
+    commit = make_commit(repo, tree.id, 'Remove "ham"')
+    # Fourth commit
+    # No blob change, just tree operation
+    object_store = repo.object_store
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_rename(repo):
+    tree = make_tree(repo)
+    # Rename a file
+    tree["eggs"] = tree["spam"]
+    del tree["spam"]
+    # Set commit
+    commit = make_commit(repo, tree.id, 'Rename "spam" to "eggs"')
+    # Fifth commit
+    # No blob change, just tree operation
+    object_store = repo.object_store
+    object_store.add_object(tree)
+    object_store.add_object(commit)
+    # Update master
+    update_master(repo, commit.id)
+
+
+
+def test_history(repo):
+    pprint(repo.revision_history(repo.head()))
+
+
+
+def test_file(repo):
+    tree = make_tree(repo)
+    print "entries", tree.entries()
+    mode, blob_id = tree["eggs"]
+    blob = repo.get_blob(blob_id)
+    print "eggs", repr(blob.data)
+
+
+
+if __name__ == '__main__':
+    # Creating the repository
+    if access(DIRNAME, F_OK):
+        rmtree(DIRNAME)
+    mkdir(DIRNAME)
+    repo = Repo.init(DIRNAME)
+    initial_commit(repo)
+    test_change(repo)
+    test_add(repo)
+    test_remove(repo)
+    test_rename(repo)
+    last_commit_id = repo.head()
+    call(['git', 'gc'], cwd=DIRNAME)
+    # Re-load the repo
+    del repo
+    repo = Repo(DIRNAME)
+    # XXX the ref was removed and dulwich doesn't know where to read it
+    update_master(repo, last_commit_id)
+    assert last_commit_id == repo.head()
+    test_history(repo)
+    test_file(repo)

+ 1 - 1
dulwich/__init__.py

@@ -27,4 +27,4 @@ import protocol
 import repo
 import server
 
-__version__ = (0, 3, 3)
+__version__ = (0, 4, 0)

+ 0 - 43
dulwich/_objects.c

@@ -19,34 +19,8 @@
 
 #include <Python.h>
 
-#define hexbyte(x) (isdigit(x)?(x)-'0':(x)-'a'+0xa)
 #define bytehex(x) (((x)<0xa)?('0'+(x)):('a'-0xa+(x)))
 
-static PyObject *py_hex_to_sha(PyObject *self, PyObject *py_hexsha)
-{
-	char *hexsha;
-	char sha[20];
-	int i;
-
-	if (!PyString_CheckExact(py_hexsha)) {
-		PyErr_SetString(PyExc_TypeError, "hex sha is not a string");
-		return NULL;
-	}
-
-	if (PyString_Size(py_hexsha) != 40) {
-		PyErr_SetString(PyExc_ValueError, "hex sha is not 40 bytes long");
-		return NULL;
-	}
-
-	hexsha = PyString_AsString(py_hexsha);
-
-	for (i = 0; i < 20; i++) {
-		sha[i] = (hexbyte(hexsha[i*2]) << 4) + hexbyte(hexsha[i*2+1]);
-	}
-
-	return PyString_FromStringAndSize(sha, 20);
-}
-
 static PyObject *sha_to_pyhex(const unsigned char *sha)
 {
 	char hexsha[41];
@@ -59,21 +33,6 @@ static PyObject *sha_to_pyhex(const unsigned char *sha)
 	return PyString_FromStringAndSize(hexsha, 40);
 }
 
-static PyObject *py_sha_to_hex(PyObject *self, PyObject *py_sha)
-{
-	if (!PyString_CheckExact(py_sha)) {
-		PyErr_SetString(PyExc_TypeError, "sha is not a string");
-		return NULL;
-	}
-
-	if (PyString_Size(py_sha) != 20) {
-		PyErr_SetString(PyExc_ValueError, "sha is not 20 bytes long");
-		return NULL;
-	}
-
-	return sha_to_pyhex((unsigned char *)PyString_AsString(py_sha));
-}
-
 static PyObject *py_parse_tree(PyObject *self, PyObject *args)
 {
 	char *text, *end;
@@ -131,8 +90,6 @@ static PyObject *py_parse_tree(PyObject *self, PyObject *args)
 }
 
 static PyMethodDef py_objects_methods[] = {
-	{ "hex_to_sha", (PyCFunction)py_hex_to_sha, METH_O, NULL },
-	{ "sha_to_hex", (PyCFunction)py_sha_to_hex, METH_O, NULL },
 	{ "parse_tree", (PyCFunction)py_parse_tree, METH_VARARGS, NULL },
 	{ NULL, NULL, 0, NULL }
 };

+ 4 - 0
dulwich/errors.py

@@ -104,3 +104,7 @@ class GitProtocolError(Exception):
 
 class HangupException(GitProtocolError):
     """Hangup exception."""
+
+    def __init__(self):
+        Exception.__init__(self,
+            "The remote server unexpectedly closed the connection.")

+ 19 - 3
dulwich/index.py

@@ -36,12 +36,20 @@ from dulwich.pack import (
 
 
 def read_cache_time(f):
-    """Read a cache time."""
+    """Read a cache time.
+    
+    :param f: File-like object to read from
+    :return: Tuple with seconds and nanoseconds
+    """
     return struct.unpack(">LL", f.read(8))
 
 
 def write_cache_time(f, t):
-    """Write a cache time."""
+    """Write a cache time.
+    
+    :param f: File-like object to write to
+    :param t: Time to write (as int, float or tuple with secs and nsecs)
+    """
     if isinstance(t, int):
         t = (t, 0)
     elif isinstance(t, float):
@@ -135,7 +143,10 @@ def write_index_dict(f, entries):
 
 def cleanup_mode(mode):
     """Cleanup a mode value.
+
+    This will return a mode that can be stored in a tree object.
     
+    :param mode: Mode to clean up.
     """
     if stat.S_ISLNK(mode):
         return stat.S_IFLNK
@@ -176,6 +187,8 @@ class Index(object):
             f = SHA1Reader(f)
             for x in read_index(f):
                 self[x[0]] = tuple(x[1:])
+            # FIXME: Additional data?
+            f.read(os.path.getsize(self._filename)-f.tell()-20)
             f.check_sha()
         finally:
             f.close()
@@ -185,7 +198,10 @@ class Index(object):
         return len(self._byname)
 
     def __getitem__(self, name):
-        """Retrieve entry by relative path."""
+        """Retrieve entry by relative path.
+        
+        :return: tuple with (ctime, mtime, dev, ino, mode, uid, gid, size, sha, flags)
+        """
         return self._byname[name]
 
     def __iter__(self):
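
The new docstrings describe a cache time as a (seconds, nanoseconds) tuple written with `struct` format `">LL"`. The following is a standalone Python 3 sketch of that round trip; it mirrors the conversion branches visible in the hunk above, but the float handling is filled in as an assumption and should not be read as the exact Dulwich code.

```python
import io
import struct


def write_cache_time(f, t):
    """Write a time as two big-endian unsigned 32-bit ints (secs, nsecs)."""
    if isinstance(t, int):
        t = (t, 0)
    elif isinstance(t, float):
        # Assumed conversion: split into whole seconds and nanoseconds.
        secs, frac = divmod(t, 1.0)
        t = (int(secs), int(frac * 1000000000))
    elif not isinstance(t, tuple):
        raise TypeError(t)
    f.write(struct.pack(">LL", *t))


def read_cache_time(f):
    """Read back the (seconds, nanoseconds) tuple."""
    return struct.unpack(">LL", f.read(8))


buf = io.BytesIO()
write_cache_time(buf, 1254910090)   # int: whole seconds
write_cache_time(buf, 1.5)          # float: 1 s plus 500000000 ns
write_cache_time(buf, (7, 42))      # tuple: written as given
buf.seek(0)
times = [read_cache_time(buf) for _ in range(3)]
print(times)
```

Each time occupies exactly 8 bytes, which is why `read_cache_time` reads a fixed-size chunk.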

+ 2 - 1
dulwich/object_store.py

@@ -288,6 +288,7 @@ class DiskObjectStore(BaseObjectStore):
         basename = os.path.join(self.pack_dir, 
             "pack-%s" % iter_sha1(entry[0] for entry in entries))
         write_pack_index_v2(basename+".idx", entries, p.get_stored_checksum())
+        p.close()
         os.rename(path, basename + ".pack")
         self._add_known_pack(basename)
 
@@ -480,7 +481,7 @@ def tree_lookup_path(lookup_obj, root_sha, path):
         if p == '':
             continue
         mode, sha = obj[p]
-    return lookup_obj(sha)
+    return mode, sha
 
 
 class MissingObjectFinder(object):
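
The API change to `tree_lookup_path` (returning `(mode, sha)` instead of the object) is easiest to see with a toy object store. The sketch below is a stand-in under stated assumptions, not Dulwich's implementation: trees are plain dicts mapping names to `(mode, sha)`, blobs are byte strings, and keys such as `"root"` are placeholders for real 40-character ids.

```python
def tree_lookup_path(lookup_obj, root_sha, path):
    """Walk `path` from the root tree; return (mode, sha) of the entry found."""
    mode, sha = None, root_sha
    for part in path.split("/"):
        if part == "":
            continue                      # tolerate leading or doubled slashes
        tree = lookup_obj(sha)
        if not isinstance(tree, dict):
            raise ValueError("%r is not a tree" % sha)
        mode, sha = tree[part]
    return mode, sha


# Hypothetical in-memory store; the keys stand in for SHA-1 ids.
OBJECTS = {
    "root": {"docs": (0o040000, "docs-tree"), "README": (0o100644, "readme-blob")},
    "docs-tree": {"index.txt": (0o100644, "index-blob")},
    "readme-blob": b"readme\n",
    "index-blob": b"Dulwich Tutorial\n",
}

mode, sha = tree_lookup_path(OBJECTS.__getitem__, "root", "docs/index.txt")
content = OBJECTS[sha]                    # callers now fetch the object themselves
print("%o" % mode, sha, content)
```

Returning the mode alongside the sha lets a caller tell a file from a directory or a symlink without loading the object, at the cost of one extra lookup when the object itself is needed.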

+ 5 - 3
dulwich/objects.py

@@ -21,6 +21,7 @@
 """Access to base git objects."""
 
 
+import binascii
 from cStringIO import (
     StringIO,
     )
@@ -64,7 +65,7 @@ def _decompress(string):
 
 def sha_to_hex(sha):
     """Takes a string and returns the hex of the sha within"""
-    hexsha = "".join(["%02x" % ord(c) for c in sha])
+    hexsha = binascii.hexlify(sha)
     assert len(hexsha) == 40, "Incorrect length of sha1 string: %d" % hexsha
     return hexsha
 
@@ -72,7 +73,7 @@ def sha_to_hex(sha):
 def hex_to_sha(hex):
     """Takes a hex sha and returns a binary sha"""
     assert len(hex) == 40, "Incorrent length of hexsha: %s" % hex
-    return ''.join([chr(int(hex[i:i+2], 16)) for i in xrange(0, len(hex), 2)])
+    return binascii.unhexlify(hex)
 
 
 def serializable_property(name, docstring=None):
@@ -429,6 +430,7 @@ class Tree(ShaFile):
         self._needs_serialization = True
 
     def __len__(self):
+        self._ensure_parsed()
         return len(self._entries)
 
     def add(self, mode, name, hexsha):
@@ -617,7 +619,7 @@ num_type_map = {
 
 try:
     # Try to import C versions
-    from dulwich._objects import hex_to_sha, sha_to_hex, parse_tree
+    from dulwich._objects import parse_tree
 except ImportError:
     pass
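
To see that the `binascii` calls are drop-in replacements for the removed expressions, the two forms can be compared directly. This is a Python 3 adaptation written for illustration (the original code is Python 2, where strings are bytes); the `_old` function names are invented here.

```python
import binascii


def sha_to_hex_old(sha):
    """The removed per-character formatting, adapted to Python 3 bytes."""
    return "".join("%02x" % c for c in bytearray(sha)).encode("ascii")


def hex_to_sha_old(hexsha):
    """The removed per-pair parsing, adapted to Python 3 bytes."""
    text = hexsha.decode("ascii")
    return bytes(bytearray(int(text[i:i + 2], 16)
                           for i in range(0, len(text), 2)))


sha = bytes(bytearray(range(20)))          # any 20-byte value will do
hexsha = binascii.hexlify(sha)

same_hex = sha_to_hex_old(sha) == hexsha
same_sha = hex_to_sha_old(hexsha) == binascii.unhexlify(hexsha) == sha
print(hexsha, same_hex, same_sha)
```

Both directions agree, so the change only affects speed, which is consistent with the NEWS entry and with dropping the C versions of these two helpers.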
 

+ 5 - 5
dulwich/pack.py

@@ -774,9 +774,9 @@ def write_pack_object(f, type, object):
     """
     offset = f.tell()
     packed_data_hdr = ""
-    if type == 6: # ref delta
+    if type == 6: # offset delta
         (delta_base_offset, object) = object
-    elif type == 7: # offset delta
+    elif type == 7: # ref delta
         (basename, object) = object
     size = len(object)
     c = (type << 4) | (size & 15)
@@ -891,7 +891,7 @@ def write_pack_index_v1(filename, entries, pack_checksum):
 
 def create_delta(base_buf, target_buf):
     """Use python difflib to work out how to transform base_buf to target_buf.
-    
+
     :param base_buf: Base buffer
     :param target_buf: Target buffer
     """
@@ -925,12 +925,12 @@ def create_delta(base_buf, target_buf):
             o = i1
             for i in range(4):
                 if o & 0xff << i*8:
-                    scratch += chr(o >> i)
+                    scratch += chr((o >> i*8) & 0xff)
                     op |= 1 << i
             s = i2 - i1
             for i in range(2):
                 if s & 0xff << i*8:
-                    scratch += chr(s >> i)
+                    scratch += chr((s >> i*8) & 0xff)
                     op |= 1 << (4+i)
             out_buf += chr(op)
             out_buf += scratch
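
The `create_delta` fix concerns how the offset and length of a copy instruction are split into bytes: each non-zero byte is emitted and flagged in the opcode, so the shift has to be `i*8` bits, not `i`. Below is a self-contained Python 3 sketch of that encoding with a matching decoder; `encode_copy` and `decode_copy` are names made up for this example, and the starting opcode `0x80` is taken from Git's copy-instruction convention rather than from the lines shown in this hunk.

```python
def encode_copy(offset, size):
    """Encode a copy-from-base instruction: opcode byte, then non-zero bytes."""
    op = 0x80                                  # high bit marks a copy instruction
    scratch = bytearray()
    for i in range(4):                         # up to 4 offset bytes
        if offset & (0xff << (i * 8)):
            scratch.append((offset >> (i * 8)) & 0xff)
            op |= 1 << i
    for i in range(2):                         # 2 size bytes, as in the patched loop
        if size & (0xff << (i * 8)):
            scratch.append((size >> (i * 8)) & 0xff)
            op |= 1 << (4 + i)
    return bytes(bytearray([op]) + scratch)


def decode_copy(data):
    """Inverse of encode_copy, used here only to check the round trip."""
    data = bytearray(data)
    op, pos = data[0], 1
    offset = size = 0
    for i in range(4):
        if op & (1 << i):
            offset |= data[pos] << (i * 8)
            pos += 1
    for i in range(2):
        if op & (1 << (4 + i)):
            size |= data[pos] << (i * 8)
            pos += 1
    return offset, size


encoded = encode_copy(0x1234, 0x56)
print(encoded, decode_copy(encoded))
```

With the pre-fix expression `o >> i`, the second offset byte of `0x1234` would have been computed as `0x1234 >> 1`, which does not fit in one byte.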

+ 23 - 0
dulwich/server.py

@@ -132,6 +132,8 @@ class UploadPackHandler(Handler):
             while want and want[:4] == 'want':
                 want_revs.append(want[5:45])
                 want = self.proto.read_pkt_line()
+                if want is None:
+                    self.proto.write_pkt_line("ACK %s\n" % want_revs[-1])
             return want_revs
 
         progress = lambda x: self.proto.write_sideband(2, x)
@@ -142,15 +144,36 @@ class UploadPackHandler(Handler):
             def __init__(self, proto):
                 self.proto = proto
                 self._last_sha = None
+                self._cached = False
+                self._cache = []
+                self._cache_index = 0
 
             def ack(self, have_ref):
                 self.proto.write_pkt_line("ACK %s continue\n" % have_ref)
 
+            def reset(self):
+                self._cached = True
+                self._cache_index = 0
+
             def next(self):
+                if not self._cached:
+                    return self.next_from_proto()
+                if self._cache_index >= len(self._cache):
+                    return None
+                self._cache_index = self._cache_index + 1
+                return self._cache[self._cache_index - 1]
+
+            def next_from_proto(self):
                 have = self.proto.read_pkt_line()
+                if have is None:
+                    self.proto.write_pkt_line("ACK %s\n" % self._last_sha)
+                    return None
+
                 if have[:4] == 'have':
+                    self._cache.append(have[5:45])
                     return have[5:45]
 
+
                 #if have[:4] == 'done':
                 #    return None
 

+ 1 - 1
setup.py

@@ -5,7 +5,7 @@
 from distutils.core import setup
 from distutils.extension import Extension
 
-dulwich_version_string = '0.3.3'
+dulwich_version_string = '0.4.0'
 
 include_dirs = []
 # Windows MSVC support