Solving the Circle City Con 5.0 regex crossword challenge (Part 1)

Circle City Con runs a CTF. This year, I was the first solve of the Regex Crossword.

Post challenge, I was encouraged to describe how I solved it.

But first, the challenge.

If you’ve never seen one before, this is an entirely opaque problem. It is a regex crossword. Many examples of this exist at https://regexcrossword.com. Of the categories listed on the site, this would be a double cross problem.

Solving a regex crossword is usually best done by working from the outsides in. The matches for a given space are going to be most constrained by the character classes and grouping near the outside of each regex. When hunting matches, it’s best to start with positive character classes. The top line, right regex,

(.CTF|C.TF|CT.F|CTF.).?[^flag]?(\(|\[|\{)[AEIOUY]

had a very strong match sequence, showing me that in the first line I would be finding the characters ‘C’, ‘T’, and ‘F’ in that order.  Further more, there would be one of the set ([{ followed by one of the set AEIOUY.  The CTF sequence would have to come before it, and there would be one other character inside that sequence.

A reminder, ? means zero or 1 of the preceding sequence.  This means the sequence .? is anything maybe.

A less commonly known operation is character set inversion.  The match [^flag] means anything that’s not the characters in the set flag.  The following ? means zero or one copies of it.

Combine the above sequence with my knowledge that the crossword puzzle is only 6 characters wide, and I have a greatly reduced search space.  .? and [^flag]? both drop out to zero instances of them and I’m left with matching

(.CTF|C.TF|CT.F|CTF.)(\(|\[|\{)[AEIOUY]

From that, one needs to start eliminating options, so start working through the column expressions in the same way.

I started from the left most column solves, looking for something that would constrain the CTF sequence.

.?(Lt2P|B0T|NuL|m0H)(P|o|p)+[^a-m,n-y,z](\\\|v|//)

[IN|CCC]?[^A-Z]+\w[p4SsW0RdHa5h]*.?

Well, that’s not helpful.  Both of these regex’s start with ? blocks that may or may not exist, and both could match C, or a wide variety of anything else.  Next!

[FL4IR]*(M|0|R|3)*([abcde])(1|3|5|7|9)\2

(L|4|V|A)*[FluBb3R]+\w[^aeiou]

Okay, here’s a more interesting case, but still difficult to deal with.  The leading match classes in both cases have the * (zero or more) operator applied to them.  The only fixed sequence in the first string is ([abcde])(1|3|5|7|9)\2 .  That is an interesting sequence however, as it uses back references.  I know that the first and third characters of that sequence will match.

Alright, on to the third column.

\w+(SG|Sg|sG|sg)[jobs][WITH][hammers]

.?[vegas]+[G0LD]+[overnumerousness]+[\\\W]\h*[garbage]?

Again, not much to work with.  I could tell that I was going to lead with a ‘word’ character (\w+) one or more times.  .? is again something of ‘might exist’.  This is getting frustrating.  On to column 4.

[fail|FAIL]?[variableS^*]*\d(2|5|Z|S)+[^cone].+

[A-F,I-L](e|d|b)[^0-2,3-5,6-8][A-Z]+[encode|decode][\\\|\(|\)|\-|\=]

Well, finally, something that’s vaguely constrained.  The second regex has 6 character classes, with one which is marked as 1 or more  (+ operator).  The crossword has 7 spaces there.

At this point, I choose to make an assumption.  I choose to assume that these Regex’s are designed to consume the entire string, which is not a requirement generally.  I make that choice because otherwise, I’m going to end up with almost no constraints given the number of ?, +, and * operators that are in this problem.

With that assumption in place, I can see that the first Character is going to be an A or an F.  The overlap of the sets [fail|fail] and [A-F,I-L].  Looking back to the row one reduced regex,

(.CTF|C.TF|CT.F|CTF.)(\(|\[|\{)[AEIOUY]

I can see F would be valid in position 4 no matter what.  I declare that to be what I’m aiming at, and write it down.

I can also take a pretty serious stab at the second character of the column, as the set of (e|d|b) overlaps with [variableS^*] in one case, b.

Continuing on, I have column 5’s regex’s

(\(|\[|\{)[GOODLUCK]*\b?[52SZ]?(l|m|n|o)\D+

(.)+[^!LABEL]?[p-q,a-l,m-z]([SiGnS]*)(\1|.)

Well, great, I kinda knew that already, that it was going to be one of the set of ([{, and of course (.) matches anything.  The back reference operator \1  at the end is amusing, but is parred with an | operator of a . , thus anything could match there.

Assumption, most of the flags in this game were in the form CCC{something} . The problem indicated that the flag was in a non-standard format, but you would know it when you saw it.  I just started assuming that this character was a { and moved forward.

Column 6 regexes are:

[AEIOUYaeiouy0][Version2.0]*[Out|In]?[Blashpemy]+[V|^|v|/\]+\D?]

(M|0|N|E|Y)[VERIFY]+[^a-z0-9,.][birdlaw]+(\)|\})

Okay there, here’s something I can work with.  The first character match is the intersection of [AEIOUYaeiou0] and (expressed as a character set) [M0NEY].  The intersection of those two sets is [EY0].  My first row matching final set is [AEIOUY], which reduces the set to [EY].  Now the second match for first row has the restriction in the final group of (Y|O|U).  One overlap, which is the character Y.  At least one character isn’t ambiguous.

The solve at this point looks like this

???F{Y
???b??
??????
??????
??????
??????
??????

Well, that’s not much to work with.  Time to reprocess the column 1 -3 lines with my expanded assumptions.

.?(Lt2P|B0T|NuL|m0H)(P|o|p)+[^a-m,n-y,z](\\\|v|//)

[IN|CCC]?[^A-Z]+\w[p4SsW0RdHa5h]*.?

C could match either one of these as the leading character.  Sadly, so could anything that’s a capital letter.  So, this either matches on C.TF or .CTF for the row regex.  Can I eliminate C from the second column?

[FL4IR]*(M|0|R|3)*([abcde])(1|3|5|7|9)\2

(L|4|V|A)*[FluBb3R]+\w[^aeiou]

Again, given the assumption that the match will consume the entire line, I can tell that the first character matched on the second column will be in the set [FL4IRM0R3] which is the full matching set of the first two clauses of the first regex match, as each is followed by a *, zero or more matches is possible, so either could be the leading character.  What’s not in that list?  C.  C can’t be in the second position, so it must be in the first.  Thus we are matching on either C.TF or CT.F as the opening string.

At this point, I also learn that T can’t be in that location, which means I’m matching the string C.TF as the prefix, and I can fill in more of the solve matrix.

C?TF{Y
???b??
??????
??????
??????
??????
??????

Great, now what is the content of the second column.

[FL4IR]*(M|0|R|3)*([abcde])(1|3|5|7|9)\2

(L|4|V|A)*[FluBb3R]+\w[^aeiou]

If I assume that the * groups will consume at least one character on the match, The intersection of the two results in [L4].

The regex for row one demands a digit in the second location

[^foxtrot]\d[^qwertyUIOP](F|4|1|L)\W(Y|O|U)

which reduces the set to 4.

The solve matrix now looks like this

C4TF{Y
???b??
??????
??????
??????
??????
??????

This string matches the left and right row one regex’s, and provisionally matches all of the column regexes.

Problems when solving this problem:

  • The font selected to write the regex’s in was miserably hard to differentiate upper case L from upper case I from lowercase l. The difference between lowercase l and an uppercase I was about 2 pixels.
  • Not knowing if the regexes are anchored to the edge of the puzzle makes this much, much harder.  See the assumptions I’ve needed to add.
  • Extensive use of * and ? radically increase the search space.

Thus ends part 1, where I push out the partial work done as an introduction.  Part 2 completes the walk through of the solve.