At present, git-pbchk from onbld depends on the ignore module from Mercurial. As we move away from Mercurial, it would be better if we could process ignore files and exception lists without depending on modules outside of illumos.
Former user commented on 2013-01-23T23:39:53.000-0500:
illumos-joyent commit bfb56a4 (branch master, by Joshua M. Clulow)
OS-1823 git-pbchk should not depend on mercurial
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Former user commented on 2013-01-24T02:23:39.000-0500:
Nobody expects the Spanish Inquisition^H CR!
Doesn't matter, but at https://mo.joyent.com/illumos-joyent/commit/bfb56a4#L78 you only need ".split('#', 1)"
>>> s = "foo # bar # blah"
>>> s.split('#', 2)
['foo ', ' bar ', ' blah']
>>> s.split('#', 1)
['foo ', ' bar # blah']
Technically you aren't handling escaped '#' chars. Compare to https://github.com/ajaxorg/mercurial/blob/master/mercurial/ignore.py#L24-L31 But, I'm presuming you don't care about crazy crap like that.
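For reference, handling escaped '#' characters could look roughly like the sketch below. It is modeled on the approach in the linked Mercurial code, not on the commit under review; the helper name strip_comment is hypothetical.

```python
import re

# A '#' starts a comment only when it is preceded by an even number of
# backslashes (including zero). Group 1 keeps everything before that '#'.
_COMMENT = re.compile(r'((?:^|[^\\])(?:\\\\)*)#.*')


def strip_comment(line):
    """Hypothetical helper: drop a trailing comment, honoring '\\#'."""
    if '#' in line:
        line = _COMMENT.sub(r'\1', line)
        # Whatever '\#' sequences remain are literal '#' characters.
        line = line.replace('\\#', '#')
    return line.rstrip()
```

With this, "foo # bar # blah" becomes "foo", while a pattern written as "foo\#bar" keeps its literal '#'.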
On https://mo.joyent.com/illumos-joyent/commit/bfb56a4#L89 you are accepting "regex" as a syntax. However, from http://www.selenic.com/mercurial/hgignore.5.html it looks like that should be "regexp" with a "p". The mercurial implementation also seems to accept "syntax: re", FWIW. Undocumented... so screw it.
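A minimal sketch of parsing the "syntax:" line with the documented names follows. The names SYNTAXES and parse_syntax are hypothetical, and silently keeping the current syntax on an unknown name is an assumption, not necessarily what Mercurial or the commit does.

```python
# Accepted spellings mapped to a canonical name. 're' is the undocumented
# alias mentioned above; 'regexp' and 'glob' are from the hgignore man page.
SYNTAXES = {'re': 'regexp', 'regexp': 'regexp', 'glob': 'glob'}


def parse_syntax(line, current):
    """Hypothetical helper: return the syntax in effect after this line."""
    if line.startswith('syntax:'):
        name = line[len('syntax:'):].strip()
        # Assumption: an unknown syntax name leaves the current one in place.
        return SYNTAXES.get(name, current)
    return current
```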
https://mo.joyent.com/illumos-joyent/commit/bfb56a4#L109 Idiomatic Python here would be:
    if not ignore_list:
        ...
https://mo.joyent.com/illumos-joyent/commit/bfb56a4#L114 Using REGEX.match(STRING) instead of REGEX.search(STRING) is going to artificially anchor at the start of the search string:
>>> import re
>>> foo = re.compile('foo')
>>> foo.match('blahfooblah') # does not match
>>> foo.search('blahfooblah') # does match
<_sre.SRE_Match object at 0x26a678>
IOW, you want "regex.search(path)" here.
https://mo.joyent.com/illumos-joyent/commit/bfb56a4#L88 Given the s/match/search/ above, I think that (a) you can remove the ".*" prefix to the glob->regex conversion. Also (b) the hgignore man page hints at it, "A glob-syntax pattern of the form *.c will match a file ending in .c", and my quick read of the implementation suggests that you want this:
    if syntax == 'glob':
        ignore_list.append(re.compile(fnmatch.translate(l) + '$'))
IOW, you want to anchor globs at the end of the path. Actually the mercurial impl does something closer to the equiv of this:
    if syntax == 'glob':
        ignore_list.append(re.compile(fnmatch.translate(l) + '(?:/|$)'))
i.e. it anchors at either the end of the path, or at a dir separator, '/'.
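To make the anchoring concrete, here is a self-contained sketch that combines search() with the '(?:/|$)' suffix. It deliberately does not use fnmatch.translate, whose output already carries its own end anchor in a form that varies between Python versions. Instead it uses a simplified, hypothetical translator (glob_to_regex) that understands only '*' and '?', and it assumes neither wildcard should cross a '/'. The helper is_ignored is also hypothetical.

```python
import re


def glob_to_regex(pattern):
    """Hypothetical, simplified glob translation ('*' and '?' only).

    The result is unanchored at the start (for use with search()) and
    anchored at either the end of the path or a directory separator.
    """
    parts = []
    for ch in pattern:
        if ch == '*':
            parts.append('[^/]*')   # assumption: '*' stays within one component
        elif ch == '?':
            parts.append('[^/]')
        else:
            parts.append(re.escape(ch))
    return re.compile(''.join(parts) + '(?:/|$)')


def is_ignored(path, ignore_list):
    """Return True if any compiled pattern matches anywhere in the path."""
    return any(rx.search(path) for rx in ignore_list)
```

For example, a list built from the glob "*.c" would ignore "usr/src/foo.c" and anything under a directory named "foo.c", but not "usr/src/foo.cpp".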