Improve handling of comment characters in config file

Comment characters that are within balanced quotes should not discard
the rest of the line:

    [branch "my#branch"]

is a valid section for a branch named `my#branch` (`#` is a valid
character in a branch name [0]). Previously this normalized to:

    [branch "my

which raised an exception because the section header was invalid.
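
As a rough, self-contained illustration of the difference, the two
behaviours can be compared side by side. The names `strip_comments_old`
and `strip_comments_new` are hypothetical and exist only for this
sketch; their bodies mirror the removed and added lines of
`_strip_comments` in this commit.

```python
def strip_comments_old(line):
    # Behaviour before this change: cut at the first '#' or ';',
    # regardless of whether it sits inside quotes.
    line = line.split(b"#")[0]
    line = line.split(b";")[0]
    return line


def strip_comments_new(line):
    # Behaviour after this change: a comment character only starts a
    # comment when it appears outside balanced double quotes.
    comment_bytes = {ord(b"#"), ord(b";")}
    quote = ord(b'"')
    string_open = False
    for i, character in enumerate(bytearray(line)):
        if character == quote:
            string_open = not string_open
        elif not string_open and character in comment_bytes:
            return line[:i]
    return line


header = b'[branch "my#branch"] # tracking branch'
print(strip_comments_old(header))  # truncated inside the quoted name
print(strip_comments_new(header))  # header kept, trailing comment dropped
```

Only the trailing comment is removed in the second case; the whitespace
before it is left for later parsing steps to handle.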

---

Note that the current comment parser leaves an ambiguity for
configuration files stored in a multi-byte encoding: a byte inside a
multi-byte character may clash with the ASCII value of `"`, `#` or `;`,
so a valid file could be interpreted as malformed depending on e.g.
branch names. In real-life systems, the only multi-byte encoding that is
likely to be encountered for Git configuration files is UTF-8, which is
safe from such accidental clashes because every byte of a multi-byte
UTF-8 sequence has its high bit set and so never equals an ASCII byte.
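
The clash can be made concrete with a small sketch. The helper
`clashing_bytes` is a hypothetical name used only here, and UTF-16 is
picked purely as an example of an encoding where such a clash can occur;
nothing in this commit suggests UTF-16 configuration files are expected.

```python
# Byte values of the characters the comment parser treats specially.
SPECIAL = set(bytearray(b'"#;'))


def clashing_bytes(text, encoding):
    """Return the encoded bytes that equal '"', '#' or ';' in ASCII."""
    return [b for b in bytearray(text.encode(encoding)) if b in SPECIAL]


# U+0123 (LATIN SMALL LETTER G WITH CEDILLA) contains no ASCII character.
name = u"\u0123"

# UTF-8 encodes it as C4 A3: both bytes have the high bit set.
print(clashing_bytes(name, "utf-8"))

# UTF-16-LE encodes it as 23 01: the first byte equals ASCII '#'.
print(clashing_bytes(name, "utf-16-le"))
```

With UTF-8 the list is empty, while the UTF-16-LE encoding yields one
byte that a byte-wise parser would mistake for a comment start.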

The same issue with non-UTF-8 multi-byte encodings seems to affect Git
itself, given how its configuration parser is currently written [1], so
it makes sense to follow the reference implementation in this regard.

[0]: man git-check-ref-format
[1]: https://github.com/git/git/blob/c2ece9dc/config.c#L504
Daniel Andersson 7 years ago
parent
commit
f0bb68f00f
2 files changed with 20 additions and 2 deletions
  1. dulwich/config.py (10 additions, 2 deletions)
  2. dulwich/tests/test_config.py (10 additions, 0 deletions)

+ 10 - 2
dulwich/config.py

@@ -262,8 +262,16 @@ def _check_section_name(name):
 
 
 def _strip_comments(line):
-    line = line.split(b"#")[0]
-    line = line.split(b";")[0]
+    comment_bytes = {ord(b"#"), ord(b";")}
+    quote = ord(b'"')
+    string_open = False
+    # Normalize line to bytearray for simple 2/3 compatibility
+    for i, character in enumerate(bytearray(line)):
+        # Comment characters outside balanced quotes denote comment start
+        if character == quote:
+            string_open = not string_open
+        elif not string_open and character in comment_bytes:
+            return line[:i]
     return line
 
 

+ 10 - 0
dulwich/tests/test_config.py

@@ -81,6 +81,16 @@ class ConfigFileTests(TestCase):
         cf = self.from_file(b"[section]\nbar= foo # a comment\n")
         self.assertEqual(ConfigFile({(b"section", ): {b"bar": b"foo"}}), cf)
 
+    def test_comment_character_within_value_string(self):
+        cf = self.from_file(b"[section]\nbar= \"foo#bar\"\n")
+        self.assertEqual(
+            ConfigFile({(b"section", ): {b"bar": b"foo#bar"}}), cf)
+
+    def test_comment_character_within_section_string(self):
+        cf = self.from_file(b"[branch \"foo#bar\"] # a comment\nbar= foo\n")
+        self.assertEqual(
+            ConfigFile({(b"branch", b"foo#bar"): {b"bar": b"foo"}}), cf)
+
     def test_from_file_section(self):
         cf = self.from_file(b"[core]\nfoo = bar\n")
         self.assertEqual(b"bar", cf.get((b"core", ), b"foo"))