8 years ago · a0e591771c
--- a/docs/tutorial/encoding.txt
+++ b/docs/tutorial/encoding.txt
@@ -0,0 +1,26 @@
+Encoding
+========
+
+You will notice that all lower-level functions in Dulwich take byte strings
+rather than unicode strings. This is intentional.
+
+Although `C git`_ recommends the use of UTF-8 for encoding, this is not
+strictly enforced, and C git treats filenames as sequences of non-NUL bytes.
+There are repositories in the wild that use non-UTF-8 encodings for filenames
+and commit messages.
+
+.. _C git: https://github.com/git/git/blob/master/Documentation/i18n.txt
+
+The library should be able to read *all* existing git repositories,
+regardless of what encoding they use. This is the main reason why Dulwich
+does not convert paths to unicode strings.
+
+A further consideration is that converting back and forth between bytes and
+unicode carries a performance penalty. For example, if you are just iterating
+over file contents, there is no need to decode them at all. Users of the
+library may also be able to make specific assumptions about the encoding:
+they could simply decide that all their data is latin-1, or in the default
+Python encoding.
+
+Higher-level functions, such as the porcelain in ``dulwich.porcelain``, will
+automatically convert unicode strings to UTF-8 byte strings.
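The reasoning in the new tutorial page can be illustrated with a short sketch that uses only the Python standard library (no Dulwich APIs). It assumes a hypothetical filename stored as latin-1 bytes and shows three things: strict UTF-8 decoding rejects it; decoding as latin-1 (one of the per-user assumptions the page mentions) always succeeds and round-trips; and re-encoding the same name as UTF-8 yields different bytes. The helper to_bytes is hypothetical. It only mimics the str-to-UTF-8 conversion the page attributes to dulwich.porcelain and is not Dulwich's own code.

```python
from typing import Optional, Union

# Illustrative value (an assumption, not from a real repository): a filename
# that some client stored using latin-1 rather than UTF-8.
LATIN1_PATH = "café.txt".encode("latin-1")  # b'caf\xe9.txt'


def try_utf8(raw: bytes) -> Optional[str]:
    """Strictly decode raw as UTF-8; return None if it is not valid UTF-8."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return None


def decode_latin1(raw: bytes) -> str:
    """Decode raw as latin-1, which maps every byte to one code point.

    This never fails, and encoding the result as latin-1 restores the
    original bytes, so it is one assumption a caller could choose to make.
    """
    return raw.decode("latin-1")


def to_bytes(path: Union[str, bytes]) -> bytes:
    """Hypothetical porcelain-style coercion (not Dulwich's implementation):
    pass bytes through unchanged and encode str as UTF-8."""
    if isinstance(path, bytes):
        return path
    return path.encode("utf-8")


if __name__ == "__main__":
    # 0xE9 starts a three-byte UTF-8 sequence, but '.' is not a continuation byte.
    print(try_utf8(LATIN1_PATH))  # None
    print(decode_latin1(LATIN1_PATH))  # café.txt
    print(decode_latin1(LATIN1_PATH).encode("latin-1") == LATIN1_PATH)  # True
    # Encoding the "same" name as UTF-8 produces different bytes, so a lookup
    # by the re-encoded name would not match the stored entry.
    print(to_bytes("café.txt"))  # b'caf\xc3\xa9.txt'
    print(to_bytes("café.txt") == LATIN1_PATH)  # False
```

Keeping paths as bytes leaves the choice of decoding to the caller, who may know more about the repository's data than the library can.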
--- a/docs/tutorial/index.txt
+++ b/docs/tutorial/index.txt
@@ -8,6 +8,7 @@ Tutorial
    :maxdepth: 2
 
    introduction
+   encoding
    file-format
    repo
    object-store