CS8395 — Homework 1

Virtual Machines, Stack Smashing, and Metasploit

This assignment is due Wednesday, February 7 at 11:59PM Central time.

Introduction

This class focuses on the discussion of cybersecurity research. This homework introduces foundational tools and concepts relevant to cybersecurity:

Virtual Machines — A Virtual Machine (VM) is a tool that emulates an entire computer system as an application within another computer. Examples include installing Windows on your macOS device or Linux on your Windows device (e.g., via VirtualBox, VMware, QEMU, or Parallels).
Stack Smashing — A foundational software security attack in which an adversary hijacks execution by overwriting the return address saved on the stack as part of a function activation.
Metasploit — A suite of tools for bundling together and generating malicious payloads, shellcode, and exploits.

While you should read this spec in its entirety, you can find the specific items to turn in by searching for "Turn-in".

Turn-in for HW1

Read this assignment document and complete the tasks described. The final deliverable is a zip file containing your exploit.py, badfile, and badfile-shell files, as well as a written PDF report. There is no length requirement or limit for the report, but I expect this will take 3 pages or fewer (depending on the size of your screenshots where appropriate).

Ensure that your name, VUNetID, and email address appear at the top of each page.

Ensure that your report has three sections, labeled Task 1, Task 2, and Task 3.

I strongly recommend the use of LaTeX to complete this (and all) assignments. LaTeX is the lingua franca for scientific communication in computer science — every peer-reviewed publication I have submitted and reviewed has been written in LaTeX. You can use overleaf.com to help you write LaTeX documents. I have also created a template that you can copy, available here.

Task 1: Virtual Machines and Linux Fundamentals

A Virtual Machine is an emulation of a computer system. Loosely, you can think of a VM as a program that can run a whole virtual computer system. Virtual machines are powerful software systems that enable running software for one operating system inside another.

For example, you can use your Windows host computer to run a Virtual Machine that contains a Linux operating system. Consider the image below:

This is a Windows 10 host computer running three different Virtual Machine guests. The guest instances are complete (virtual) environments that are isolated from the host. All of the guests share the host's hardware as they execute — each window in the screenshot above lets you interact with a separate emulated guest.

Thus, even though the host is a Windows computer, you can use one of the guests to execute Linux software inside the guest. Virtual Machines can be used in many combinations. You can have a Windows, Linux, or Mac host computer, and run arbitrary numbers and combinations of Linux and Windows guests. Finally, guests are stored as files in the host computer — this means you can move your VM guest from one host to another by transferring the file around.

Virtual Machines are a critical part of computer security research and practice. VM guest instances allow analysts to safely execute certain malicious code without damaging underlying system software or data. Moreover, VMs can be instrumented to analyze execution of malware samples to measure what damage the sample does. Note that the use of VMs in this manner is often referred to as sandboxing.

Kali Linux

VMs are useful in a variety of contexts for myriad purposes. For example, in cloud computing, a provider like Amazon or Microsoft can create and lease VM guests to paying customers. As a more specific example, a malware analysis pipeline may involve creating multiple VM environments in which to run and examine many thousands of malware samples in sequence.

However, VMs are also very useful for Penetration Testing ("pentesting," "ethical hacking," "white hat hacking"). A large variety of tools are available for pentesting, including debuggers (e.g., gdb), disassemblers (e.g., Ghidra), network monitoring (e.g., Wireshark), intrusion detection (e.g., FireEye), and more. Because Penetration Testing is such an important part of cybersecurity, an organization called Offensive Security has packaged many of these tools together into a Linux distribution called Kali Linux.

Out of the box, Kali Linux includes a set of tools used in pentesting. It is an invaluable asset when getting started with cybersecurity research and practice. For example, Kali contains a dictionary of commonly-used passwords for conducting dictionary attacks, located at /usr/share/wordlists/rockyou.txt.gz. We will use this to demonstrate some basic shell scripting.

Warning: If you are using an ARM-based host like Apple M1 or M2, you may want to consider the following alternatives:

You may not be able to run Kali Linux natively on ARM. Previous students have reported success using UTM on an Apple M1 host to run an x86-64 Kali guest, but this is slow.
You may want to use Amazon EC2. You can get a free EC2 instance and there is a Kali Linux distribution that will work out of the box on EC2 (see here). This may be challenging if you are not familiar with the command line interface in Linux.
While not advised, it is nonetheless possible to run these applications natively on ARM without Kali. I strongly discourage this, since it would require disabling ASLR on your own host. Only consider this option if you really understand what you are doing.

Turn-in for Task 1: VM Setup and Linux Fundamentals

Your first task is to set up a Virtual Machine that runs Kali Linux. There are several options, including VirtualBox, VMware, and QEMU. VirtualBox and QEMU are both freely available, and VMware is a professional product that may require a paid commercial license. For this assignment, I strongly recommend VirtualBox, but you are welcome to use any platform.

Once you have installed a Virtual Machine platform, use it to create a VM guest and install Kali Linux on it.

Document that you have downloaded and installed an appropriate Virtual Machine platform with a Kali Linux guest running. A screenshot will suffice.
Once your VM is set up and running, use it to search the password dictionary for all passwords that contain three consecutive pairs of repeated letters but do not end in a digit (e.g., "bookkeeper" counts because it contains oo, kk, ee in sequence, but "bookkeeper1" does not). Sort the list alphabetically and report the first 10 such passwords. Include them in your writeup.

As a hint, you can complete this in a single line on the terminal. Consider using zcat, egrep, sort, and head utilities, as well as pipes in the terminal.
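If you want a second implementation to sanity-check your shell one-liner, the same zcat | egrep | sort | head shape can be sketched in Python. This is a hedged sketch: the helper names read_wordlist and first_matches are hypothetical, and the example pattern in the usage note (a lowercase word followed by four digits) is deliberately unrelated to the Task 1 criteria.

```python
import gzip
import re
from typing import Iterable, Iterator, List


def read_wordlist(path: str = "/usr/share/wordlists/rockyou.txt.gz") -> Iterator[str]:
    """Stream lines from a gzipped wordlist, like `zcat PATH`."""
    # latin-1 maps every byte to a character, so unusual bytes never raise.
    with gzip.open(path, "rt", encoding="latin-1") as fh:
        for line in fh:
            yield line.rstrip("\n")


def first_matches(lines: Iterable[str], pattern: str, n: int = 10) -> List[str]:
    """Rough equivalent of `egrep PATTERN | sort | head -n N` on lines."""
    regex = re.compile(pattern)
    return sorted(line for line in lines if regex.search(line))[:n]
```

For example, first_matches(read_wordlist(), r"^[a-z]+[0-9]{4}$") returns the first ten such entries. Python's sorted compares code points, which matches LC_ALL=C sort; plain sort under another locale may order some entries differently.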

Task 2: Smashing the Stack

For this task, you will work with a small C program that contains a stack overflow vulnerability. You will craft a malicious input that exploits the vulnerability to execute a payload that you will build. This consists of several steps:

Download and build a vulnerable C project.
Analyze the C code to understand where the vulnerability exists.
Construct an input that leads to unexpected behavior (a segfault).
Refine the input to hijack execution at the vulnerability.
Develop and deploy a malicious payload.

After you complete this task, Task 3 will go through the use of Metasploit to automate the generation of exploits and payloads.

Runtime Organization Primer

On almost every modern computer system, programs are provided a common runtime environment consisting of a stack and a heap (plus regions for executable code, libraries, file handles, etc.). When a user runs a program, the operating system allocates pages of memory for that program to use (e.g., to store variables). The program uses the stack to pass parameters to functions and to allocate space for local (function-scoped) variables. The stack grows and shrinks as the program calls and returns from functions.

Runtime Stack and Activation Records

Whenever a program needs to call a function, it follows a calling convention, which is a set of rules the program must follow to ensure it can properly interface with the target function. This is especially important when importing library functions written by others: if your program does not follow the convention used by existing library code, the program cannot properly use the library's functions. Calling conventions are an important part of system integration, and often depend on the operating system, compiler, optimization levels, and architecture.

While many calling conventions exist, one of the most commonly used is the C calling convention (sometimes called cdecl). In cdecl on 32-bit x86, a function call consists of several steps:

The program (the caller) pushes the parameters to the target function onto the stack in reverse order (consider: why do parameters get pushed in reverse order?)
The program issues a call instruction to the target function, which pushes the address of the next instruction onto the stack (the return address)
Inside the target function, the program saves the caller's stack frame by pushing the current base pointer (ebp) and then starts a new frame by copying the stack pointer into the base pointer (mov ebp, esp)
Inside the target function, the program allocates space for local variables (usually by subtracting from the current stack pointer, esp)
After allocating space for locals, the function executes.
At the end of the function, the program deallocates its local stack space and restores the caller's base pointer (e.g., mov esp, ebp followed by pop ebp, often combined in a single leave instruction) (consider: how does this relate to scope of variables?)
The program exits the function with the ret instruction, which pops the return address off of the stack and resumes execution from the saved address. In cdecl, the caller then removes the parameters it pushed.
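To make the resulting layout concrete, here is a small Python model (not real machine code) that replays a 32-bit cdecl call in the order the processor performs it: argument pushes, then call, then the callee's prologue, recording where each item lands. The function name cdecl_frame, the starting esp, and all addresses are hypothetical, and the sketch ignores alignment padding, saved registers, and stack canaries that real compilers may add.

```python
from typing import Dict

WORD = 4  # size of one stack slot on 32-bit x86


def cdecl_frame(num_args: int, local_bytes: int,
                esp: int = 0xBFFFF000) -> Dict[str, int]:
    """Replay a cdecl call on a downward-growing stack; map each item to its address."""
    layout: Dict[str, int] = {}
    # Caller: push arguments right to left, so arg0 ends up at the lowest address.
    for i in reversed(range(num_args)):
        esp -= WORD
        layout[f"arg{i}"] = esp
    # Caller: `call` pushes the return address.
    esp -= WORD
    layout["return address"] = esp
    # Callee prologue: push ebp; mov ebp, esp.
    esp -= WORD
    layout["saved ebp"] = esp
    layout["ebp"] = esp
    # Callee: sub esp, local_bytes reserves space for locals.
    esp -= local_bytes
    layout["locals"] = esp
    return layout
```

With two arguments and 16 bytes of locals, the model places arg0 at ebp+8, the return address at ebp+4, and the locals starting at ebp-16. A local buffer that starts at the bottom of that region and is filled toward higher addresses reaches the saved ebp after 16 bytes and the return address 4 bytes later.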

For an illustration, see the figure below.

Runtime stack overview

In the code on the left, main calls foo, which calls bar, each with some number of parameters.

In a typical runtime environment, when the program calls a function, it pushes the parameters onto the stack, pushes the return address (the address of the instruction after the call), and jumps to the called function, which saves the old base pointer on the stack. This is visualized on the right side of the figure, where the stack grows down in memory as elements are pushed onto the stack.

When a function returns, the saved address and base pointer are restored. Because the return address is saved on the stack, an attacker can overwrite it with an address of their choosing if the called function does not properly check input sizes.

Stack Overflow Vulnerabilities

The runtime organization inherently requires mixing control flow with data — the address of the instruction to execute after a function returns is saved on the stack (i.e., the return address). If the stack is controllable by a malicious user, they can smash the stack by overwriting the return address with a carefully-controlled address of their choosing.

Stack smashing vulnerabilities typically emerge when a program accepts user input (e.g., through a function like gets, which reads a line from standard input without any limit on its length). A program may allocate a fixed-size buffer in which to store the user's input. If the program does not carefully check or restrict the size of the input provided by the user, the buffer may be too small to store all of it. In the runtime environment described above, an input that is too large will extend beyond the end of the buffer, overwriting (or smashing) the return address saved on the stack.

Example vulnerable code

In the code above, the vulnerable function accepts an arbitrarily-long string as input, which is copied into the stack-allocated variable buffer. Note that buffer is declared to be only 12 bytes long. Thus, if vulnerable is called with an input that does not fit in those 12 bytes (for a C string, remember to count the terminating null byte), the copy writes past the end of buffer and smashes the stack. Under the correct circumstances, the return address saved before calling vulnerable can be overwritten.
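To see what an unchecked copy does to the saved values, here is a hedged Python model of the frame. It assumes the simplest possible layout for vulnerable (the 12-byte buffer directly below the saved ebp and the return address, with no padding or canary) and a strcpy-style copy that also writes the terminating null byte; the function name ret_after_copy and both saved values are hypothetical.

```python
import struct

BUF_SIZE = 12  # the 12-byte `buffer` from the example


def ret_after_copy(user_input: bytes, saved_ebp: int = 0xBFFFF0F8,
                   ret_addr: int = 0x080484D2) -> int:
    """Model copying `user_input` into `buffer` with no bounds check.

    Simulated frame, low to high addresses:
        [ buffer (12 bytes) | saved ebp (4) | return address (4) ]
    Returns the return address that `ret` would pop afterwards.
    """
    frame = bytearray(BUF_SIZE) + struct.pack("<II", saved_ebp, ret_addr)
    data = user_input + b"\x00"  # a strcpy-style copy also writes the NUL terminator
    # A real copy keeps writing past this frame; the model simply stops at its end.
    n = min(len(data), len(frame))
    frame[:n] = data[:n]
    (ret,) = struct.unpack_from("<I", frame, BUF_SIZE + 4)
    return ret
```

ret_after_copy(b"hi") leaves 0x080484D2 untouched, while ret_after_copy(b"A" * 20) returns 0x41414141, the "AAAA" pattern that typically shows up in gdb after such a crash. The bytes are stored little-endian, so the first input byte past the saved ebp becomes the lowest byte of the new return address.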

Hijacking control

A key idea we have discussed so far is that an attacker can overwrite a stored return address, causing the program to begin executing instructions at a location specified by the attacker. Consider what that means in the example vulnerable code above: an attacker can place instructions on the stack, then overwrite the return address with the address of those instructions, so that the program executes instructions the attacker specifies. A successful stack smashing attack results in the attacker executing instructions they provide.

Notes and setup for Task 2

This task contains a package of starter code that you can download. Inside your Kali VM, download the following file: kjl.name/cs8395/hw1-baked.zip.

Download the starter package and unzip it.
Inside, consult the readme file.
Build the bof.c program and ensure you can run it with a clean input (the file goodinput is provided).
Disable address space layout randomization with the following commands:
sudo su
echo 0 > /proc/sys/kernel/randomize_va_space
exit
Use gdb to understand the vulnerable function, the runtime stack, and how the parameter to vulnerable gets placed on the stack.
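Because the ASLR setting from the step above lives in an ordinary /proc file, you can double-check it before each debugging session. A minimal Python sketch; the helper names describe_aslr and current_aslr are hypothetical, and the meanings of 0, 1, and 2 follow the Linux kernel's sysctl documentation for randomize_va_space. Writing to this file is not persistent, so expect to repeat the commands after rebooting the VM.

```python
from pathlib import Path

ASLR_SETTING = Path("/proc/sys/kernel/randomize_va_space")

# Meanings per the kernel's sysctl documentation for randomize_va_space.
MEANINGS = {
    0: "disabled",
    1: "partial: stack, mmap base, and VDSO are randomized",
    2: "full: partial, plus heap (brk) randomization",
}


def describe_aslr(raw: str) -> str:
    """Translate the contents of randomize_va_space into a short description."""
    level = int(raw.strip())
    return MEANINGS.get(level, f"unrecognized setting {level}")


def current_aslr() -> str:
    """Read the live setting (Linux only)."""
    return describe_aslr(ASLR_SETTING.read_text())
```

After running the commands, print(current_aslr()) should print disabled.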

As a simplifying assumption, you can use the included invoke.sh script so that the environment is the same between runs:
./invoke.sh ./a.out (no debugger)
./invoke.sh -d ./a.out (attach gdb)

Use the exploit.py script to craft a malicious input that, when provided to the vulnerable program, hijacks control of execution and causes it to run echo Hello world. Executing the exploit.py script will produce a file called badfile which you can provide to the program as input.
Note that, when executed, the vulnerable program looks in the same directory for a file called input. You are responsible for renaming (or copying) your badfile to input as you test your payload.
You are responsible for using gdb (or other tools) to determine the values required in the exploit.py script:
- The start location at which to place the payload. The malicious input is a large NOP sled with the payload inside it. How far into the input should the payload be placed?
- The return address, which should be the address of a location on the stack that leads to the shellcode. This can simply be a pointer to an address on the stack near where the payload sits; it is the address the CPU jumps to when it returns from the vulnerable function.
- The offset within the malicious input at which the new return address is placed. This offset lets you tune exactly where the return address lands so that it overwrites the saved return address on the stack.
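The three values listed above fit together roughly as follows. This Python sketch is not the provided exploit.py, and its function name build_input is hypothetical; it only shows one way to lay out a 512-byte input with a 0x90 sled, a payload at start, and a little-endian return address at offset. The numbers in the usage note are placeholders, not answers: you must derive real ones with gdb. The placeholder payload uses 0xCC bytes (the x86 int3 breakpoint instruction), which can be handy while tuning, because reaching it stops the program with SIGTRAP instead of running real shellcode.

```python
import struct

NOP = 0x90


def build_input(payload: bytes, start: int, ret: int, offset: int,
                size: int = 512) -> bytes:
    """Return `size` bytes: a 0x90 sled with `payload` at `start` and a
    4-byte little-endian return address `ret` at `offset`."""
    if start < 0 or offset < 0:
        raise ValueError("start and offset must be non-negative")
    if start + len(payload) > size or offset + 4 > size:
        raise ValueError("payload or return address does not fit in the input")
    # The regions [start, start+len) and [offset, offset+4) must not overlap.
    if start < offset + 4 and offset < start + len(payload):
        raise ValueError("return address would overwrite the payload")
    content = bytearray([NOP] * size)
    content[start:start + len(payload)] = payload
    content[offset:offset + 4] = struct.pack("<I", ret)
    return bytes(content)
```

For example, build_input(b"\xcc\xcc", start=400, ret=0xBFFFF1A0, offset=36), with placeholder numbers, returns 512 bytes whose bytes 36 to 39 are A0 F1 FF BF.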

Turn-in for Task 2: Stack overflow proof of concept

As noted in the readme file, you must include a copy of your functioning exploit.py and badfile files in your final zip submission.
In your written report, describe your approach to determining the correct values that yielded a successful stack smashing attack. Include a screenshot of a successful run of the vulnerable program with your payload.
Note that exploit.py starts by creating a buffer of 512 bytes filled with value 0x90. What does 0x90 mean? Why do we use it when constructing stack smashing attacks? In addition, explain how your approach would have to change if you did not have access to the invoke.sh script.

Task 3: Metasploit

The Metasploit Framework is a powerful suite of tools that can be used to automatically craft malicious payloads and exploits against a wide variety of existing software. For example, it is possible to use Metasploit to produce a malicious input that exploits well-known software in just a few commands. Metasploit also contains a variety of useful payloads, including a TCP shell server; that is, you can use Metasploit to create a payload that, when executed, creates a server that an attacker can connect to, enabling remote access to a victim computer. In Task 3, you will revise the payload you were provided to create a remotely-accessible server within the context of the vulnerable program.

Metasploit is included with Kali out of the box. In a terminal, just type msfconsole to launch the Metasploit console. It takes a bit to load, but once finished, you can type help to list out the various features it supports.

Metasploit payloads

Metasploit has a number of payloads built in, targeting various use cases (e.g., dropping to a shell to run commands, opening a socket server, creating a remote desktop server, injecting arbitrary programs, and more for 32- and 64-bit Linux and Windows systems). To see a list of the available payloads, type show payloads in the Metasploit console.

For Task 3, we will use the payload/linux/x86/shell_bind_tcp payload. This payload, when executed, causes the vulnerable program to create a TCP server that allows an attacker to connect to the victim's computer remotely and control it. Doing so allows an attacker to exfiltrate valuable data from the victim system (by viewing file contents that would normally be inaccessible). To work with this payload, type use payload/linux/x86/shell_bind_tcp into the Metasploit console.

While I encourage you to work in the x86 environment, if you decide to or need to use aarch64 (e.g., Mac M1 or M2 silicon, or other ARM-based CPU hosts), then you may consider other payloads supported by Metasploit, such as the payload/linux/aarch64/shell_reverse_tcp. (Thanks to Aadi Bajpai for completing the assignment for aarch64).

Metasploit allows you to configure payloads for specific circumstances. For example, the shell_bind_tcp payload creates a shell server to which the attacker can connect. Under the hood, Metasploit lets you configure the TCP port that it binds to (so that the attacker knows which port to connect to), and (optionally) lets you specify the remote host from which a connection should come so that only the attacker can remotely access the host. You can type show options to see which options are configurable for the payload you selected. For the shell_bind_tcp payload, you need only specify the LPORT, which is 4444 by default.

You generate the payload by typing generate LPORT=4444 (i.e., you include whatever options you need to set). The resulting output is the raw payload: a sequence of machine-code bytes (shellcode) that must be placed on the stack during the attack.
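If you copy the payload out of the Metasploit console as text, you will need to turn it back into raw bytes before splicing it into your input. A hedged Python sketch, assuming the payload was printed as a sequence of \xNN escapes; the helper names parse_hex_escapes and bad_byte_positions are hypothetical. The second helper flags byte values that a string-based copy would mishandle: if the copy in bof.c stops at a null byte (as strcpy does), any 0x00 inside the payload would truncate it, so check the output and consider the encoding options Metasploit offers if you find one.

```python
import re
from typing import List

HEX_ESCAPE = re.compile(r"\\x([0-9a-fA-F]{2})")


def parse_hex_escapes(text: str) -> bytes:
    """Collect every \\xNN escape in `text`, in order, as raw bytes.

    Quotes, variable names, and line breaks around the escapes are ignored,
    so a multi-line dump can be pasted in unchanged.
    """
    return bytes(int(h, 16) for h in HEX_ESCAPE.findall(text))


def bad_byte_positions(payload: bytes, bad: bytes = b"\x00") -> List[int]:
    """Indices of bytes in `payload` whose value appears in `bad` (default: NUL)."""
    return [i for i, b in enumerate(payload) if b in bad]
```

An empty list from bad_byte_positions means the payload has none of the listed bytes; any index it returns marks a byte that would need to be avoided.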

When this payload is executed by the vulnerable program, it opens a shell server that allows remote connection. You can connect to the shell that is created by using telnet localhost 4444 in a separate terminal in the Kali VM. In practice, attackers embed other properties like callbacks to help track which victim IP addresses are ready to connect to. For this assignment, you can contain everything within your Kali VM, so working with localhost or 127.0.0.1 is fine. Once connected via telnet, you can issue arbitrary commands. For this assignment, use cat /etc/passwd to display the accounts created on your Kali VM.
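One pitfall when debugging Task 3: a bind shell may accept only a single connection, so probing port 4444 with a throwaway connection can use it up before telnet gets a turn. A non-intrusive alternative is to read the kernel's socket table, which is what the ss -ltn command also reports. The Python sketch below is an assumption-labeled helper (names listening_ports and is_listening are hypothetical) that only checks IPv4 listeners in /proc/net/tcp; the turn-in still requires the telnet session.

```python
from typing import Iterable, Set

TCP_LISTEN = "0A"  # socket state code for LISTEN in /proc/net/tcp


def listening_ports(rows: Iterable[str]) -> Set[int]:
    """Return the local ports in LISTEN state from /proc/net/tcp-style rows."""
    ports: Set[int] = set()
    for row in rows:
        fields = row.split()
        # Skip the header row; data rows start with a slot number such as "0:".
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        local_address, state = fields[1], fields[3]
        if state == TCP_LISTEN:
            # local_address is HEXIP:HEXPORT, e.g. 00000000:115C for port 4444.
            ports.add(int(local_address.rsplit(":", 1)[1], 16))
    return ports


def is_listening(port: int = 4444, path: str = "/proc/net/tcp") -> bool:
    """Check for an IPv4 listener on `port` without connecting to it."""
    with open(path) as fh:
        return port in listening_ports(fh)
```

Run is_listening() in a second terminal after launching the exploit; True suggests the payload ran and is waiting for your telnet connection.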

Turn-in for Task 3: Metasploit

Generate a shell_bind_tcp payload using Metasploit in your Kali VM.

In your report, include a copy of the payload output from Metasploit (a screenshot will suffice).
Adapt this payload to work with the bof.c program provided in Task 2. Demonstrate that you can start a shell by exploiting the stack overflow vulnerability. Include a copy of the corresponding input file, named badfile-shell, in your submitted zip.
Provide a screenshot showing two terminals: one in which you run the vulnerable program and execute your payload, and another in which you connect to the shell that is created using telnet. As indicated above, connect to the shell and run cat /etc/passwd to show the accounts on your Kali VM.

What to turn in for HW1

You must submit a single .zip file called vunetid.zip. While you can work with others in the class conceptually, please submit your own copy of the assignment. Your zip file must contain:

hw1.pdf, a single PDF file containing the written elements required in this assignment. You should have a Task 1, Task 2, and Task 3 section in your PDF.
exploit.py, the modified version of the exploit.py file that contains your modifications to get the stack smashing exploit to work by running echo Hello World in the vulnerable application context.
badfile, a file containing the malicious input generated by your exploit.py script that, when fed to your bof.c program, causes the payload to execute.
badfile-shell, a file containing the modified payload produced by Metasploit that creates a shell that is remotely accessible via TCP.

Use the submission system (VU login required) to submit your zip file.