is_hfs_dotgit: loosen over-eager match of \u{..47}

Our is_hfs_dotgit function relies on the hackily-implemented
next_hfs_char to give us the next character that an HFS+
filename comparison would look at. It's hacky because it
doesn't implement the full case-folding table of HFS+; it
gives us just enough to see if the path matches ".git".

At the end of next_hfs_char, we use tolower() to convert our
32-bit code point to lowercase. Our tolower() implementation
only takes an 8-bit char, though; it throws away the upper
24 bits. This cannot cause false negatives for is_hfs_dotgit:
we only care about matching the 7-bit ASCII characters in
".git", which truncation leaves intact, so we will correctly
process 'G' or 'g'.

However, we _can_ have false positives. Because we throw
away the upper bits, code point \u{0147} (for example) will
look like 'G' and get downcased to 'g'. It's not known
whether a sequence of code points whose truncation ends up
as ".git" is meaningful in any language, but it does not
hurt to be more accurate here. We can just pass out the full
32-bit code point, and compare it manually to the upper and
lowercase characters we care about.
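
The truncation can be reproduced in isolation. The sketch below is not Git's code: `narrow_tolower` is a hypothetical stand-in for an ASCII-only tolower() that casts its argument to an 8-bit char, `matches_letter` is a hypothetical helper for the exact comparison described above, and the `ucs_char_t` typedef is assumed to be a 32-bit unsigned integer.

```c
#include <stdint.h>

typedef uint32_t ucs_char_t;	/* assumption: a 32-bit code point type */

/*
 * Hypothetical stand-in for an ASCII-only tolower() that takes an
 * 8-bit char: the cast discards the upper 24 bits of the code point.
 */
static unsigned char narrow_tolower(ucs_char_t c)
{
	unsigned char x = (unsigned char)c;

	if (x >= 'A' && x <= 'Z')
		x |= 0x20;
	return x;
}

/* Exact comparison on the full code point; no truncation involved. */
static int matches_letter(ucs_char_t c, char lower, char upper)
{
	return c == (ucs_char_t)lower || c == (ucs_char_t)upper;
}
```

With these definitions, `narrow_tolower(0x0147)` yields `'g'` (the false positive), while `matches_letter(0x0147, 'g', 'G')` is 0 and `matches_letter('G', 'g', 'G')` is 1.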

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>
peff authored and gitster committed Dec 29, 2014
1 parent d08c13b commit 6aaf956
Showing 2 changed files with 35 additions and 12 deletions.
15 changes: 15 additions & 0 deletions t/t1450-fsck.sh
@@ -273,4 +273,19 @@ dot-backslash-case .\\\\.GIT\\\\foobar
 dotgit-case-backslash .git\\\\foobar
 EOF
 
+test_expect_success 'fsck allows .Ňit' '
+	(
+		git init not-dotgit &&
+		cd not-dotgit &&
+		echo content >file &&
+		git add file &&
+		git commit -m base &&
+		blob=$(git rev-parse :file) &&
+		printf "100644 blob $blob\t.\\305\\207it" >tree &&
+		tree=$(git mktree <tree) &&
+		git fsck 2>err &&
+		test_line_count = 0 err
+	)
+'
+
 test_done
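
The octal escapes `\305\207` in the test's printf are the UTF-8 encoding of U+0147 (Ň), the code point whose low 8 bits equal 'G'. A minimal sketch of the two-byte encoding step follows; `encode_utf8_2byte` is a hypothetical helper name used only for illustration and is not part of Git.

```c
#include <stdint.h>

/*
 * Encode a code point in the range U+0080..U+07FF as two UTF-8 bytes.
 * Returns 0 on success, -1 if the code point is outside that range.
 */
static int encode_utf8_2byte(uint32_t cp, unsigned char out[2])
{
	if (cp < 0x80 || cp > 0x7FF)
		return -1;
	out[0] = (unsigned char)(0xC0 | (cp >> 6));	/* 110xxxxx */
	out[1] = (unsigned char)(0x80 | (cp & 0x3F));	/* 10xxxxxx */
	return 0;
}
```

For `cp = 0x0147` this produces the bytes 0xC5 and 0x87, which are 305 and 207 in octal.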
32 changes: 20 additions & 12 deletions utf8.c
@@ -630,8 +630,8 @@ int mbs_chrlen(const char **text, size_t *remainder_p, const char *encoding)
 }
 
 /*
- * Pick the next char from the stream, folding as an HFS+ filename comparison
- * would. Note that this is _not_ complete by any means. It's just enough
+ * Pick the next char from the stream, ignoring codepoints an HFS+ would.
+ * Note that this is _not_ complete by any means. It's just enough
  * to make is_hfs_dotgit() work, and should not be used otherwise.
  */
 static ucs_char_t next_hfs_char(const char **in)
@@ -668,23 +668,31 @@ static ucs_char_t next_hfs_char(const char **in)
 			continue;
 		}
 
-		/*
-		 * there's a great deal of other case-folding that occurs,
-		 * but this is enough to catch anything that will convert
-		 * to ".git"
-		 */
-		return tolower(out);
+		return out;
 	}
 }
 
 int is_hfs_dotgit(const char *path)
 {
 	ucs_char_t c;
 
-	if (next_hfs_char(&path) != '.' ||
-	    next_hfs_char(&path) != 'g' ||
-	    next_hfs_char(&path) != 'i' ||
-	    next_hfs_char(&path) != 't')
+	c = next_hfs_char(&path);
+	if (c != '.')
+		return 0;
+	c = next_hfs_char(&path);
+
+	/*
+	 * there's a great deal of other case-folding that occurs
+	 * in HFS+, but this is enough to catch anything that will
+	 * convert to ".git"
+	 */
+	if (c != 'g' && c != 'G')
+		return 0;
+	c = next_hfs_char(&path);
+	if (c != 'i' && c != 'I')
+		return 0;
+	c = next_hfs_char(&path);
+	if (c != 't' && c != 'T')
 		return 0;
 	c = next_hfs_char(&path);
 	if (c && !is_dir_sep(c))
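
The post-change comparison in the utf8.c hunk can be exercised end to end with a self-contained sketch. Everything prefixed `sketch_` is hypothetical and simplified: the decoder does no real validation, the set of ignored code points is illustrative rather than a copy of Git's, only '/' is treated as a directory separator, and the final return is an assumption about how the check ends, since the hunk above is cut off before that point.

```c
#include <stdint.h>

typedef uint32_t ucs_char_t;	/* assumption: a 32-bit code point type */

/*
 * Hypothetical, simplified UTF-8 decoder: reads one code point from *in
 * and advances the pointer. Malformed sequences are returned one raw
 * byte at a time; overlong forms are not rejected.
 */
static ucs_char_t sketch_decode(const char **in)
{
	const unsigned char *s = (const unsigned char *)*in;
	ucs_char_t c = s[0];
	int extra = 0, i;

	if (!c)
		return 0;	/* do not step past the terminating NUL */
	if ((c & 0xE0) == 0xC0) {
		c &= 0x1F;
		extra = 1;
	} else if ((c & 0xF0) == 0xE0) {
		c &= 0x0F;
		extra = 2;
	} else if ((c & 0xF8) == 0xF0) {
		c &= 0x07;
		extra = 3;
	}
	for (i = 1; i <= extra; i++) {
		if ((s[i] & 0xC0) != 0x80) {
			*in += 1;	/* malformed: consume the lead byte only */
			return s[0];
		}
		c = (c << 6) | (s[i] & 0x3F);
	}
	*in += 1 + extra;
	return c;
}

/* Illustrative set of code points skipped during comparison. */
static int sketch_is_ignored(ucs_char_t c)
{
	return (c >= 0x200C && c <= 0x200F) ||
	       (c >= 0x202A && c <= 0x202E) ||
	       (c >= 0x206A && c <= 0x206F) ||
	       c == 0xFEFF;
}

/* Return the next non-ignored code point in full, with no case folding. */
static ucs_char_t sketch_next_char(const char **in)
{
	ucs_char_t c;

	do {
		c = sketch_decode(in);
	} while (c && sketch_is_ignored(c));
	return c;
}

static int sketch_is_dotgit(const char *path)
{
	ucs_char_t c;

	if (sketch_next_char(&path) != '.')
		return 0;
	c = sketch_next_char(&path);
	if (c != 'g' && c != 'G')
		return 0;
	c = sketch_next_char(&path);
	if (c != 'i' && c != 'I')
		return 0;
	c = sketch_next_char(&path);
	if (c != 't' && c != 'T')
		return 0;
	c = sketch_next_char(&path);
	/* assumption: the name must end here or at a '/' */
	return !c || c == '/';
}
```

Under these assumptions, `sketch_is_dotgit(".GIT/config")` and `sketch_is_dotgit(".g\342\200\214it")` (with an embedded U+200C) return 1, while `sketch_is_dotgit(".\305\207it")` returns 0 because U+0147 is compared as a full code point.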
