123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189 |
- .. _tutorial-object-store:
- The object store
- ================
- The objects are stored in the ``object store`` of the repository.
- >>> from dulwich.repo import Repo
- >>> repo = Repo.init("myrepo", mkdir=True)
- Initial commit
- --------------
- When you use Git, you generally add or modify content. As our repository is
- empty for now, we'll start by adding a new file::
- >>> from dulwich.objects import Blob
- >>> blob = Blob.from_string(b"My file content\n")
- >>> print(blob.id.decode('ascii'))
- c55063a4d5d37aa1af2b2dad3a70aa34dae54dc6
- Of course you could create a blob from an existing file using ``from_file``
- instead.
- As said in the introduction, file content is separated from file name. Let's
- give this content a name::
- >>> from dulwich.objects import Tree
- >>> tree = Tree()
- >>> tree.add(b"spam", 0o100644, blob.id)
- Note that "0o100644" is the octal form for a regular file with common
- permissions. You can hardcode them or you can use the ``stat`` module.
- The tree state of our repository still needs to be placed in time. That's the
- job of the commit::
- >>> from dulwich.objects import Commit, parse_timezone
- >>> from time import time
- >>> commit = Commit()
- >>> commit.tree = tree.id
- >>> author = b"Your Name <your.email@example.com>"
- >>> commit.author = commit.committer = author
- >>> commit.commit_time = commit.author_time = int(time())
- >>> tz = parse_timezone(b'-0200')[0]
- >>> commit.commit_timezone = commit.author_timezone = tz
- >>> commit.encoding = b"UTF-8"
- >>> commit.message = b"Initial commit"
- Note that the initial commit has no parents.
- At this point, the repository is still empty because all operations happen in
- memory. Let's "commit" it.
- >>> object_store = repo.object_store
- >>> object_store.add_object(blob)
- Now the ".git/objects" folder contains a first SHA-1 file. Let's continue
- saving the changes::
- >>> object_store.add_object(tree)
- >>> object_store.add_object(commit)
- Now the physical repository contains three objects but still has no branch.
- Let's create the master branch like Git would::
- >>> repo.refs[b'refs/heads/master'] = commit.id
- The master branch now has a commit where to start. When we commit to master, we
- are also moving HEAD, which is Git's currently checked out branch:
- >>> head = repo.refs[b'HEAD']
- >>> head == commit.id
- True
- >>> head == repo.refs[b'refs/heads/master']
- True
- How did that work? As it turns out, HEAD is a special kind of ref called a
- symbolic ref, and it points at master. Most functions on the refs container
- work transparently with symbolic refs, but we can also take a peek inside HEAD:
- >>> import sys
- >>> print(repo.refs.read_ref(b'HEAD').decode(sys.getfilesystemencoding()))
- ref: refs/heads/master
- Normally, you won't need to use read_ref. If you want to change what ref HEAD
- points to, in order to check out another branch, just use set_symbolic_ref.
- Now our repository is officially tracking a branch named "master" referring to a
- single commit.
- Playing again with Git
- ----------------------
- At this point you can come back to the shell, go into the "myrepo" folder and
- type ``git status`` to let Git confirm that this is a regular repository on
- branch "master".
- Git will tell you that the file "spam" is deleted, which is normal because
- Git is comparing the repository state with the current working copy. And we
- have absolutely no working copy using Dulwich because we don't need it at
- all!
- You can checkout the last state using ``git checkout -f``. The force flag
- will prevent Git from complaining that there are uncommitted changes in the
- working copy.
- The file ``spam`` appears and with no surprise contains the same bytes as the
- blob::
- $ cat spam
- My file content
- Changing a File and Committing it
- ---------------------------------
- Now we have a first commit, the next one will show a difference.
- As seen in the introduction, it's about making a path in a tree point to a
- new blob. The old blob will remain to compute the diff. The tree is altered
- and the new commit'task is to point to this new version.
- Let's first build the blob::
- >>> from dulwich.objects import Blob
- >>> spam = Blob.from_string(b"My new file content\n")
- >>> print(spam.id.decode('ascii'))
- 16ee2682887a962f854ebd25a61db16ef4efe49f
- An alternative is to alter the previously constructed blob object::
- >>> blob.data = b"My new file content\n"
- >>> print(blob.id.decode('ascii'))
- 16ee2682887a962f854ebd25a61db16ef4efe49f
- In any case, update the blob id known as "spam". You also have the
- opportunity of changing its mode::
- >>> tree[b"spam"] = (0o100644, spam.id)
- Now let's record the change::
- >>> from dulwich.objects import Commit
- >>> from time import time
- >>> c2 = Commit()
- >>> c2.tree = tree.id
- >>> c2.parents = [commit.id]
- >>> c2.author = c2.committer = b"John Doe <john@example.com>"
- >>> c2.commit_time = c2.author_time = int(time())
- >>> c2.commit_timezone = c2.author_timezone = 0
- >>> c2.encoding = b"UTF-8"
- >>> c2.message = b'Changing "spam"'
- In this new commit we record the changed tree id, and most important, the
- previous commit as the parent. Parents are actually a list because a commit
- may happen to have several parents after merging branches.
- Let's put the objects in the object store::
- >>> repo.object_store.add_object(spam)
- >>> repo.object_store.add_object(tree)
- >>> repo.object_store.add_object(c2)
- You can already ask git to introspect this commit using ``git show`` and the
- value of ``c2.id`` as an argument. You'll see the difference will the
- previous blob recorded as "spam".
- The diff between the previous head and the new one can be printed using
- write_tree_diff::
- >>> from dulwich.patch import write_tree_diff
- >>> from io import BytesIO
- >>> out = BytesIO()
- >>> write_tree_diff(out, repo.object_store, commit.tree, tree.id)
- >>> import sys; _ = sys.stdout.write(out.getvalue().decode('ascii'))
- diff --git a/spam b/spam
- index c55063a..16ee268 100644
- --- a/spam
- +++ b/spam
- @@ -1 +1 @@
- -My file content
- +My new file content
- You won't see it using git log because the head is still the previous
- commit. It's easy to remedy::
- >>> repo.refs[b'refs/heads/master'] = c2.id
- Now all git tools will work as expected.
|