Repo.blob: respect reported blob size

Batch output of git-cat-file has the form:

    <sha1> SP <type> SP <size> LF <contents> LF

It unconditionally includes a trailing line-feed, which Repo.blob()
incorrectly returns as part of the blob content. For textual blobs, this
extra character is often benign; for binary blobs, however, it can
easily change the meaning of the data in unexpected or disastrous ways.
Fix this by respecting the blob size reported by git-cat-file.
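
As a rough illustration of the idea (not the actual git-arr code), the
sketch below parses one record in the default batch format shown above
and reads exactly <size> bytes. It is written for Python 3 and runs on
simulated output; read_batch_record, the placeholder object id, and the
sample payload are all hypothetical.

```python
import io

def read_batch_record(stream):
    """Read one `git cat-file --batch` record from a binary stream.

    Returns (sha, type, contents), or None if the object is reported
    as missing or the stream is exhausted.
    """
    header = stream.readline()
    if not header or header.rstrip(b"\n").endswith(b"missing"):
        return None
    sha, obj_type, size = header.split()
    contents = stream.read(int(size))  # exactly <size> bytes, no more
    stream.read(1)  # consume the LF that git appends after the contents
    return sha.decode("ascii"), obj_type.decode("ascii"), contents

# Simulated batch output for a binary blob that itself ends in LF.
payload = b"\x00\x01\x02\n"
fake_sha = b"0" * 40  # placeholder, not a real object id
raw = fake_sha + b" blob " + str(len(payload)).encode() + b"\n" + payload + b"\n"

record = read_batch_record(io.BytesIO(raw))

# Taking everything after the header, without consulting the size,
# leaves one LF too many at the end.
naive = raw.split(b"\n", 1)[1]
```

Here record[2] equals the original payload, while naive carries one
extra byte at the end.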

(The alternative approach of unconditionally dropping the final LF also
works; however, respecting the reported size is perhaps a bit more
robust and "correct".)
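
The two approaches can be compared side by side on simulated output.
This Python 3 sketch assumes the header line carries only the size, as
a batch format of %(objectsize) would produce; blob_by_size and
blob_by_strip are hypothetical names, and the sample bytes are made up.

```python
import io

def blob_by_size(out):
    """Keep only as many bytes as the header says the blob has."""
    head = out.readline()
    if not head or head.strip().endswith(b"missing"):
        return None
    # int() tolerates the header's trailing LF.
    return out.read()[:int(head)]

def blob_by_strip(out):
    """Alternative: unconditionally drop the final byte (the LF)."""
    head = out.readline()
    if not head or head.strip().endswith(b"missing"):
        return None
    return out.read()[:-1]

# Header "3", then a 3-byte blob ending in LF, then git's trailing LF.
sample = b"3\n\xff\xfe\n\n"

by_size = blob_by_size(io.BytesIO(sample))
by_strip = blob_by_strip(io.BytesIO(sample))
missing = blob_by_size(io.BytesIO(b"HEAD:no/such/path missing\n"))
```

For a single well-formed record the two functions return the same
bytes. Slicing by size simply does not depend on what follows the
contents in the stream.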

Signed-off-by: Eric Sunshine <sunshine@sunshineco.com>
Signed-off-by: Alberto Bertogli <albertito@blitiri.com.ar>
Eric Sunshine 2015-01-13 04:57:12 -05:00 committed by Alberto Bertogli
parent 50c004f8a5
commit 58037e57c5

git.py | 4 ++--

--- a/git.py
+++ b/git.py
@@ -345,7 +345,7 @@ class Repo:
             ref = self.branch
         cmd = self.cmd('cat-file')
         cmd.raw(True)
-        cmd.batch = None
+        cmd.batch = '%(objectsize)'
 
         if isinstance(ref, unicode):
             ref = ref.encode('utf8')
@@ -356,7 +356,7 @@ class Repo:
         if not head or head.strip().endswith('missing'):
             return None
 
-        return Blob(out.read())
+        return Blob(out.read()[:int(head)])
 
     def last_commit_timestamp(self):
         """Return the timestamp of the last commit."""