Issue
I'm developing a git post-receive hook in Python. Data is supplied on stdin with lines similar to
ef4d4037f8568e386629457d4d960915a85da2ae 61a4033ccf9159ae69f951f709d9c987d3c9f580 refs/heads/master
The first hash is the old-ref, the second the new-ref and the third column is the reference being updated.
I want to split this into 3 variables, whilst also validating input. How do I validate the branch name?
I am currently using the following regular expression
^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$
This doesn't accept all possible branch names, as set out by man git-check-ref-format. For example, it excludes a branch by the name of build-master, which is valid.
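To make the problem concrete, here is a minimal sketch of the current approach in Python. The helper name parse_line is an assumption made for illustration; only the pattern itself comes from the question.

```python
import re

# Pattern from the question: two 40-hex-digit hashes, then a branch under refs/heads/.
LINE_RE = re.compile(r"^([0-9a-f]{40}) ([0-9a-f]{40}) refs/heads/([0-9a-zA-Z]+)$")

def parse_line(line):
    """Return (old, new, branch), or None if the line does not match.

    parse_line is a hypothetical helper name, not part of any git tooling.
    """
    match = LINE_RE.match(line.rstrip("\n"))
    return match.groups() if match else None

OLD = "ef4d4037f8568e386629457d4d960915a85da2ae"
NEW = "61a4033ccf9159ae69f951f709d9c987d3c9f580"

ok = parse_line(f"{OLD} {NEW} refs/heads/master\n")
rejected = parse_line(f"{OLD} {NEW} refs/heads/build-master\n")
```

Here ok holds the three fields, while rejected is None because the hyphen in build-master falls outside [0-9a-zA-Z].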
Bonus marks
I actually want to exclude any branch that starts with "build-". Can this be done in the same regex?
Tests
Given the great answers below, I wrote some tests, which can be found at https://github.com/alexchamberlain/githooks/blob/master/miscellaneous/git-branch-re-test.py.
Status: All the regexes below are failing to compile. This could indicate there's a problem with my script or incompatible syntaxes.
Solution
Let's dissect the various rules and build regex parts from them:
They can include slash / for hierarchical (directory) grouping, but no slash-separated component can begin with a dot . or end with the sequence .lock.
# must not contain /.
(?!.*/\.)
# must not end with .lock
(?<!\.lock)$
They must contain at least one /. This enforces the presence of a category like heads/, tags/ etc. but the actual names are not restricted. If the --allow-onelevel option is used, this rule is waived.
.+/.+ # may get more precise later
They cannot have two consecutive dots .. anywhere.
(?!.*\.\.)
They cannot have ASCII control characters (i.e. bytes whose values are lower than \040, or \177 DEL), space, tilde ~, caret ^, or colon : anywhere.
[^\000-\037\177 ~^:]+ # pattern for allowed characters
They cannot have question-mark ?, asterisk *, or open bracket [ anywhere. See the --refspec-pattern option below for an exception to this rule.
[^\000-\037\177 ~^:?*[]+ # new pattern for allowed characters
They cannot begin or end with a slash / or contain multiple consecutive slashes (see the --normalize option below for an exception to this rule).
^(?!/)
(?<!/)$
(?!.*//)
They cannot end with a dot.
(?<!\.)$
They cannot contain a sequence @{.
(?!.*@\{)
They cannot contain a backslash (\).
(?!.*\\)
Piecing it all together we arrive at the following monstrosity:
^(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
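Since the hook is written in Python, the same pieces can be assembled from a commented list so each rule stays readable. This is a sketch rather than a definitive implementation: the names ALLOWED, PARTS and REF_RE are illustrative, and the octal escapes are written in hex with the space folded into the \x00-\x20 range, which should be equivalent.

```python
import re

# Allowed characters: anything except control characters and space
# (\x00-\x20), DEL (\x7f), and the characters ~ ^ : ? * [
ALLOWED = r"[^\x00-\x20\x7f~^:?*\[]"

PARTS = [
    r"^",
    r"(?!.*/\.)",     # no component after a slash may begin with a dot
    r"(?!.*\.\.)",    # no two consecutive dots
    r"(?!/)",         # must not begin with a slash
    r"(?!.*//)",      # no consecutive slashes
    r"(?!.*@\{)",     # no "@{" sequence
    r"(?!.*\\)",      # no backslash
    ALLOWED + r"+/" + ALLOWED + r"+",  # allowed characters, at least one slash
    r"(?<!\.lock)",   # must not end with ".lock"
    r"(?<!/)",        # must not end with a slash
    r"(?<!\.)",       # must not end with a dot
    r"$",
]

REF_RE = re.compile("".join(PARTS))

def is_valid_ref(ref):
    """Return True if ref passes the rules encoded above."""
    return REF_RE.match(ref) is not None
```

With this, is_valid_ref("refs/heads/build-master") is true, while names such as "refs/heads/a..b" or "refs/heads/foo.lock" are rejected.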
And if you want to exclude those that start with build- then just add another lookahead:
^(?!build-)(?!.*/\.)(?!.*\.\.)(?!/)(?!.*//)(?!.*@\{)(?!.*\\)[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock)(?<!/)(?<!\.)$
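One caveat for the hook use case: a lookahead anchored at ^ inspects the start of the whole string, so it only excludes build- when the pattern is applied to the bare name. When the input is a full ref such as refs/heads/build-master, the lookahead likely needs to sit directly after the refs/heads/ prefix. The sketch below assumes that layout; BRANCH_REF_RE is a hypothetical name.

```python
import re

# Allowed characters: no control characters, space, DEL, or ~ ^ : ? * [
ALLOWED = r"[^\x00-\x20\x7f~^:?*\[]"

# A lookahead at the very start does not see the branch name of a full ref:
start_anchored = re.match(r"^(?!build-)", "refs/heads/build-master")

# Hypothetical variant for full refs under refs/heads/: the build- check
# is placed immediately after the prefix instead of at the start.
BRANCH_REF_RE = re.compile(
    r"^(?!.*/\.)(?!.*\.\.)(?!.*//)(?!.*@\{)(?!.*\\)"
    r"refs/heads/(?!build-)"
    + ALLOWED + r"+"
    r"(?<!\.lock)(?<!/)(?<!\.)$"
)

def is_allowed_branch_ref(ref):
    """Return True for a valid refs/heads/ ref whose branch does not start with build-."""
    return BRANCH_REF_RE.match(ref) is not None
```

Only a build- prefix on the branch name itself is rejected; names like feature/build-tools or rebuild-master still pass.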
This can be optimized a bit as well by conflating a few things that look for common patterns:
^(?!@$|build-|/|.*([/.]\.|//|@\{|\\))[^\000-\037\177 ~^:?*[]+/[^\000-\037\177 ~^:?*[]+(?<!\.lock|[/.])$
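A note for Python users: the re module requires lookbehind alternatives to have the same width, so the final (?<!\.lock|[/.]) is rejected at compile time. This may be one reason the test script reports compile failures. A possible workaround, sketched below with illustrative names, is to split it into two separate lookbehinds.

```python
import re

# Allowed characters: no control characters, space, DEL, or ~ ^ : ? * [
ALLOWED = r"[^\x00-\x20\x7f~^:?*\[]"

# The combined lookbehind mixes a 5-character and a 1-character alternative,
# which Python's re module does not accept.
try:
    re.compile(r"(?<!\.lock|[/.])$")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

# Same optimized pattern with the lookbehind split into fixed-width parts.
OPTIMIZED_RE = re.compile(
    r"^(?!@$|build-|/|.*(?:[/.]\.|//|@\{|\\))"
    + ALLOWED + r"+/" + ALLOWED + r"+"
    r"(?<!\.lock)(?<![/.])$"
)

def is_valid_ref(ref):
    """Return True if ref passes the optimized pattern."""
    return OPTIMIZED_RE.match(ref) is not None
```

The two lookbehinds are checked at the same position, so together they should reject the same endings as the combined form.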
Answered By - Joey