Regular expressions as siteswap generator filters

The Juggling Lab siteswap generator allows regular expressions to be used as output filters. This page is not a comprehensive discussion of regular expressions; for that, see one of the many good tutorials available on the web.

Juggling Lab uses standard regular expression syntax, with some important differences.

Difference 1: Swapped metacharacters and literals

In standard regular expressions, the characters []()| act as metacharacters with special non-literal meaning. A literal match of one of these characters requires a preceding backslash '\'; for example, the regex \[ matches the string [. In siteswap notation these same characters are part of the patterns themselves, so relative to standard regular expressions Juggling Lab swaps the escaped and unescaped roles of each one. Within Juggling Lab the regex [ is a literal match for [, and \[ and \] are used to define character classes (see below). Likewise \( and \) are used for grouping and \| for alternation.
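The swap can be pictured as a small translation step from Juggling Lab syntax to standard regex syntax. The following Python sketch is only an illustration of the rule described above, not Juggling Lab's actual code; the function name to_standard_regex is hypothetical.

```python
import re

# Characters whose escaped and unescaped roles are swapped (per the text above).
SWAPPED = set("[]()|")

def to_standard_regex(filter_term: str) -> str:
    """Hypothetical helper: rewrite a Juggling Lab filter in standard regex syntax."""
    out = []
    i = 0
    while i < len(filter_term):
        ch = filter_term[i]
        if ch == "\\" and i + 1 < len(filter_term):
            nxt = filter_term[i + 1]
            if nxt in SWAPPED:
                out.append(nxt)        # \[ becomes the metacharacter [
            else:
                out.append(ch + nxt)   # \d, \1 and so on pass through unchanged
            i += 2
        elif ch in SWAPPED:
            out.append("\\" + ch)      # a bare [ becomes the escaped literal \[
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)

print(to_standard_regex("[33]"))       # \[33\]
print(to_standard_regex(r"\[35\]1"))   # [35]1
```

So the Juggling Lab filter [33] behaves like the standard regex \[33\], which matches the multiplex throw [33] literally, while \[35\]1 behaves like the standard character class [35] followed by 1.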

Difference 2: When regular expression filters are applied

The "include" filters are applied once, after a pattern is generated but before it is printed. Therefore the boundary matchers ^ and $ do what one expects, matching the beginning and end of the pattern respectively. For example the include filter 4$ results in patterns ending with a 4 throw, since $ matches the end of the pattern.

By contrast, for efficiency reasons the "exclude" filters are applied as the pattern is being built up, throw by throw. So the beginning matcher ^ always matches the beginning of the pattern, but the end matcher $ can match the end of any throw. (For this purpose, a "throw" is any set of events occurring simultaneously, e.g., (4,[2x2]) counts as a single throw.) So an exclude filter of 4$ excludes patterns containing 4 throws anywhere, not just at the end of the pattern.
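The throw-by-throw behavior can be simulated by testing the filter against each partial pattern in turn. The Python sketch below is an illustration only: the function name first_excluded_prefix is hypothetical, the filter is written in standard regex syntax, and it is an assumption that the filter must match the entire partial pattern (with the leading .* from Difference 3 already added).

```python
import re

def first_excluded_prefix(throws, exclude_regex):
    """Return the partial pattern at which the exclude filter fires, or None.

    `throws` is a list of throw strings. The filter is tested after each
    throw is added, against everything built so far (assumed full match).
    """
    partial = ""
    for throw in throws:
        partial += throw
        if re.fullmatch(exclude_regex, partial):
            return partial
    return None

# Exclude filter 4$ with its implied leading wildcard: .*4$
print(first_excluded_prefix(["4", "4", "1"], r".*4$"))   # 4
print(first_excluded_prefix(["5", "3", "1"], r".*4$"))   # None

# Checked only against the finished pattern, 441 does not end in 4:
print(re.fullmatch(r".*4$", "441"))                      # None
```

The pattern 441 is rejected by the exclude filter as soon as the first 4 is placed, even though the finished pattern does not end in 4. This is why $ in an exclude filter effectively means "end of any throw".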

Difference 3: Implied wildcard matching

If the beginning matcher ^ is not supplied in a given filter term, then a .* wildcard match is prepended to it. For include filters only, a .* is likewise appended to the filter term if no ending matcher $ is supplied. This is done for convenience, so that for example an include filter of 4 will match a 4 throw anywhere in the pattern (it is converted to .*4.* before the regex matching is done).

Note that .* is not automatically added to the end of exclude filters. Thus for example an exclude filter 33 will match two successive 3 throws anywhere in the pattern, but it will not match the siteswap throw [33]. One could exclude the latter with a filter pattern of [33] or 33].
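The wildcard rules above can be written out as a short Python sketch. The helper name add_implied_wildcards is hypothetical, and the check for ^ and $ is deliberately naive (it ignores escaped characters). The matching lines use standard regex syntax, so the Juggling Lab filter 33] appears as 33\].

```python
import re

def add_implied_wildcards(term: str, kind: str) -> str:
    """Hypothetical helper. `kind` is "include" or "exclude"."""
    if not term.startswith("^"):
        term = ".*" + term           # both filter kinds get a leading wildcard
    if kind == "include" and not term.endswith("$"):
        term = term + ".*"           # only include filters get a trailing one
    return term

print(add_implied_wildcards("4", "include"))    # .*4.*
print(add_implied_wildcards("33", "exclude"))   # .*33
print(add_implied_wildcards("^5", "include"))   # ^5.*

# Exclude filter 33 fires on a partial pattern ending in two 3 throws...
print(bool(re.fullmatch(r".*33", "4233")))      # True
# ...but not on one ending in the multiplex throw [33], which ends with ].
print(bool(re.fullmatch(r".*33", "[33]")))      # False
# The filter 33] (standard syntax: 33\]) does catch it.
print(bool(re.fullmatch(r".*33\]", "[33]")))    # True
```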

Juggling Lab regular expression summary

Characters

   Char          Matches any identical character

Character Classes

   \[abc\]       Simple character class
   \[a-zA-Z\]    Character class with ranges
   \[^abc\]      Negated character class

Predefined Classes

   .             Matches any character other than newline
   \d            Matches a digit character
   \D            Matches a non-digit character

Boundary Matchers

   ^             Matches only at the beginning of a pattern
   $             Matches at the end of a pattern, or of any throw in exclude filters (see Difference 2 above)

Greedy Closures

   A*            Matches A 0 or more times (greedy)
   A+            Matches A 1 or more times (greedy)
   A?            Matches A 0 or 1 times (greedy)
   A{n}          Matches A exactly n times (greedy)
   A{n,}         Matches A at least n times (greedy)
   A{n,m}        Matches A at least n but not more than m times (greedy)

Reluctant Closures

   A*?           Matches A 0 or more times (reluctant)
   A+?           Matches A 1 or more times (reluctant)
   A??           Matches A 0 or 1 times (reluctant)

Logical Operators

   AB            Matches A followed by B
   A\|B          Matches either A or B
   \(A\)         Used for subexpression grouping
   \(?:A\)       Used for subexpression clustering (just like grouping but no backrefs)

Backreferences

   \1            Backreference to 1st parenthesized subexpression
   \2            Backreference to 2nd parenthesized subexpression
   \3            Backreference to 3rd parenthesized subexpression
   \4            Backreference to 4th parenthesized subexpression
   \5            Backreference to 5th parenthesized subexpression
   \6            Backreference to 6th parenthesized subexpression
   \7            Backreference to 7th parenthesized subexpression
   \8            Backreference to 8th parenthesized subexpression
   \9            Backreference to 9th parenthesized subexpression

You can refer to the contents of a parenthesized subexpression from within the regular expression itself. This is called a 'backreference'. The first parenthesized subexpression in a regular expression is referenced by \1, the second by \2, and so on. So the expression:

\(\[0-9\]+\)=\1

will match any string of the form n=n (like 0=0 or 2=2).
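For readers who want to experiment outside Juggling Lab, the same idea can be tried in Python's re module. This sketch uses standard regex syntax, so the grouping parentheses and the character class brackets are written without backslashes; the second expression is an added illustration, not a filter taken from Juggling Lab.

```python
import re

# Standard-syntax equivalent of the Juggling Lab expression \(\[0-9\]+\)=\1
pattern = re.compile(r"([0-9]+)=\1")

for s in ["0=0", "2=2", "12=12", "1=2"]:
    print(s, bool(pattern.fullmatch(s)))
# 0=0 True
# 2=2 True
# 12=12 True
# 1=2 False

# A backreference can also find the same digit twice in a row.
repeat = re.compile(r"(\d)\1")
print(repeat.search("5331").group())   # 33
print(repeat.search("531"))            # None
```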

All closure operators (+, *, ?, {n,m}) are greedy by default, meaning that they match as many elements of the string as possible without causing the overall match to fail. If you want a closure to be reluctant (non-greedy), follow it with a '?'. A reluctant closure matches as few elements of the string as possible when finding matches. {n,m} closures don't currently support reluctant matching.
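The difference between greedy and reluctant closures is easiest to see on a string with two possible stopping points. The sketch below uses Python's re module with standard regex syntax, so literal brackets are escaped; in Juggling Lab syntax the same expressions would be written [.*] and [.*?]. The sample string is only illustrative.

```python
import re

s = "4[33]4[22]"

# Greedy: .* runs as far as it can, so the match ends at the last ].
greedy = re.search(r"\[.*\]", s).group()
print(greedy)      # [33]4[22]

# Reluctant: .*? stops as early as it can, so the match ends at the first ].
reluctant = re.search(r"\[.*?\]", s).group()
print(reluctant)   # [33]
```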