Utilizing QELP for Rapid ESXi Analysis

thatdfirdude

Welcome back, fellow DFIRs! This blog will be terse, tactical, and to the point. It focuses strictly on a fantastic tool by Stroz Friedberg that is exactly that: a tool that just works, doesn’t have complicated syntax, and outputs files like you’d expect. It’s as easy as pointing it at a collection (I’ll go into more detail on this soon), letting it generate its output, and reviewing the results. Basically, it lets you get to the important data even faster!

The Rundown

QELP stands for Quick ESXi Log Parser
QELP was created by Stroz Friedberg
QELP can be downloaded from here - https://github.com/strozfriedberg/QELP
The tool is designed to be pointed at a UAC (Unix-like Artifact Collector) collection or an ESXi Support Bundle, and it outputs relevant findings that an analyst should review to identify malicious activity
QELP will not parse ALL data; it focuses on very specific logs and TTPs (Tactics, Techniques and Procedures) that Stroz often saw in their engagements
Some of the events QELP looks for include SSH activity, logins, password events, file transfer events that may suggest exfiltration, and new user creation for persistence
QELP parses the important data found in standard ESXi logs such as
- Hostd
- Syslog
- auth
- Shell
- and others
Once executed, QELP will generate multiple CSVs from the various logs it parsed; for example, authd.csv
My favorite feature: QELP will create a super timeline of all the events it pulled from the relevant logs; essentially, it places all the findings into a single CSV. An analyst can then review the super timeline to see what happened before and after an event, across all parsed logs, in a single view

If you’ve worked a ransomware case, then you’ve likely conducted some sort of ESXi analysis; if not, then buckle up! If you have an ESXi environment and haven’t experienced a ransomware incident yet, then kudos! However, be prepared that when you experience one, your ESXi is going to be a prime target. Why is this? Well… it's a quick and dirty way that malicious actors can encrypt a large amount of data in a very short amount of time.

How do they encrypt all of this data so quickly? It’s simple! All they need to do is navigate to your ESXi server, power down all VMs with a simple command line, and then encrypt the VMDK/VHD files themselves! So rather than encrypting a large number of files on a Windows host, like what you see in a standard ransomware incident, they only need to encrypt a single file per VM (the VMDK/VHD), and then boom! You’re locked out! In my own personal experience, I have yet to work an engagement where ransomware occurred and ESXi was not impacted! If it's accessible to the threat actor, you better believe they are going to hit it, and quickly.

In my cases, threat actors don’t spend much time on the ESXi server; typically, they connect to the server remotely over SSH using a CLI-based tool such as PuTTY, power down the VMs, encrypt the VMDK/VHD, drop their ransom note, and they’re done. However, from a forensics perspective, there’s a handful of logs you’ll want to review to quickly identify how they moved to the server, what they did, and where they came from. QELP helps with exactly this! It quickly parses the relevant logs and helps you answer the questions of who, what, when and where! Once you identify where they came from and what account they leveraged, you can begin working backwards to continue identifying root cause and blast radius!

Additionally, from the cases I’ve worked on, these logs and artifacts align with what you’d need to review in most cases to answer the important questions fast; as you know, in an engagement, things are going to move fast and the faster you can parse and analyze the data, the better! Especially if you have to report findings to C-Suite or clients.

Now, it goes without saying that if you see additional activity or have reason to believe that you need to perform a deeper review of the ESXi server, then you can’t rely only on QELP; this tool is specifically designed to answer very specific questions from very specific logsets. With this said though, you can most definitely utilize it to determine if an additional deep-dive is needed.

Collecting the data

Okay, let’s get started by collecting the data. There are two common ways to collect triage, and I stress “triage” here, from an ESXi server. First, you can run the well-known UAC tool, or Unix-Like Artifact Collector, which I have a write-up on here - http://thedfirspot.com/post/linux-forensics-collecting-a-triage-image-using-the-uac-tool. Another approach is creating an ESXi Support Bundle: navigate to your vSphere client/portal, select actions, and then select “generate support bundle”. This will create a zip folder that contains relevant logs that are useful not only for troubleshooting, but also for forensic investigations of ESXi servers. More information for this can be found here and here.

Once you have your triage collection (UAC or the support bundle), you can run QELP against the data!

Executing QELP

As mentioned, the command line for QELP is very straightforward. Point it at the directory containing your UAC/Support Bundle archive, provide an output folder, and you're good to go! It's very important to note that QELP must point to the DIRECTORY containing your UAC/Support Bundle archive; it cannot point to the file itself. My example shows a ".tar", but this was actually a directory for my testing purposes. If you do not specify a directory, QELP will error. One change I would recommend for QELP is to let the user provide a -f flag for a single file or a -d flag for a directory! Because QELP takes a directory, you can point it at a directory that contains multiple ESXi collections. Fast analysis at scale?! Yes, please!
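Since QELP wants a directory rather than a single file, one way to prepare a run is to copy your collection archives into a dedicated input directory first. The following is a minimal sketch of that staging step only; it does not invoke QELP, and the function name and the example file names are my own assumptions, not part of the tool.

```python
import shutil
from pathlib import Path
from typing import Iterable, List


def stage_collections(archives: Iterable[Path], input_dir: Path) -> List[Path]:
    """Copy collection archives into one directory and return the staged paths.

    The resulting directory (not the individual files) is what you would
    then hand to QELP as its input.
    """
    input_dir = Path(input_dir)
    input_dir.mkdir(parents=True, exist_ok=True)

    staged = []
    for archive in map(Path, archives):
        # Fail early on typos or on directories passed by mistake.
        if not archive.is_file():
            raise FileNotFoundError(f"not a file: {archive}")
        destination = input_dir / archive.name
        shutil.copy2(archive, destination)  # copy2 keeps file timestamps
        staged.append(destination)
    return staged
```

For example, calling stage_collections([Path("esx01.tgz"), Path("esx02.tgz")], Path("qelp_input")) would leave both (hypothetical) archives in qelp_input, and that directory is what you would pass to QELP.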

Once complete, QELP will create multiple CSVs, one per log type it analyzed, within your specified output directory. "Extracted_logs" contains the raw logs that QELP analyzed. Note that the majority of these logs are usually located in /var/log or /var/run/log, though they can exist in other locations depending on the configuration and version.

Although I won't go into detail on each of these logs, I want to quickly explain why they're important:

hostd - This contains login information and potentially files accessed
shell - Contains commands run via an interpreter, such as BASH
syslog - Contains logins, file activity, system events, and more
vobd - Contains SSH activity and system events
auth - Contains login information
vmauthd - Login events

Now, let's take a look at one of them and see what's inside! Let's start with the syslog.csv file. Here, we can see logons occurring for the "root" user. As you can see, events are categorized by "access type". In the below example, we see logon, logoff, and user_activity events. If you have a user of interest you're investigating, you can quickly come here and identify the logon activity!
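If you'd rather script that filtering than scroll through a spreadsheet, here is a minimal sketch using Python's csv module. The column names "user" and "access_type" are assumptions on my part for illustration; check the header row of your own syslog.csv and pass the real names in.

```python
import csv
from typing import Iterable, List, TextIO


def read_events(handle: TextIO) -> List[dict]:
    """Load a CSV (for example syslog.csv) into a list of row dictionaries."""
    return list(csv.DictReader(handle))


def logon_activity(
    rows: Iterable[dict],
    user: str,
    access_types: Iterable[str] = ("logon", "logoff"),
    user_col: str = "user",          # assumed column name
    type_col: str = "access_type",   # assumed column name
) -> List[dict]:
    """Return the rows for one user whose access type is in access_types."""
    wanted = {t.lower() for t in access_types}
    matches = []
    for row in rows:
        # Missing cells come back as None, so normalise them to "".
        row_user = (row.get(user_col) or "").strip().lower()
        row_type = (row.get(type_col) or "").strip().lower()
        if row_user == user.lower() and row_type in wanted:
            matches.append(row)
    return matches
```

You would open the file with open("syslog.csv", newline="") and pass the handle to read_events, then call logon_activity(rows, "root") to pull just the logon and logoff rows for root.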

How about reviewing the shell.log? For reference, shell.log contains commands that were executed via a shell, such as bash. Typically, this is where you'll see the TA executing recon commands, shutting down the VMs, and running their encryptor! As we can see, this is categorized as "bash_activity" and has all the commands I was running; yes, I run "ls" each and every time I run cd. It's a habit.
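To triage a long list of shell commands, a simple pattern match can surface the ones worth reading first. This is a rough sketch, not anything QELP does itself: the pattern list below is my own small, illustrative selection (VM enumeration, VM power-off, and marking a file executable) and is nowhere near exhaustive.

```python
import re
from typing import Iterable, List, Tuple

# Illustrative patterns only; extend these for your own cases.
ILLUSTRATIVE_PATTERNS = {
    "vm_enumeration": re.compile(
        r"\b(vim-cmd\s+vmsvc/getallvms|esxcli\s+vm\s+process\s+list)\b"
    ),
    "vm_power_off": re.compile(
        r"\b(vim-cmd\s+vmsvc/power\.off|esxcli\s+vm\s+process\s+kill)\b"
    ),
    "make_executable": re.compile(r"\bchmod\s+\+x"),
}


def flag_commands(commands: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (label, command) pairs for commands matching any pattern."""
    hits = []
    for command in commands:
        for label, pattern in ILLUSTRATIVE_PATTERNS.items():
            if pattern.search(command):
                hits.append((label, command))
    return hits
```

Feed it the command strings from the shell CSV (whatever column holds them in your output) and review the flagged entries alongside their timestamps. Anything it misses still needs a human eye, so treat it as a shortcut and not a verdict.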

Now, as I mentioned at the beginning of the blog, one of my favorite features of this tool is the fact that it creates a "super timeline". A super timeline is essentially a single file/timeline that contains all events from every log source in temporal (time) based order. So in this case, it will contain the contents of shell.log, esxcli, shell, syslog, and vobd in a single CSV file. This can be used to quickly get context on what occurred before and after an event.

From the above image, we can see exactly what happened from all log sources! We see an SSH session was established from 192.168.202.1 with the root account, they ran ls, navigated to /vmfs, volumes and a specific datastore. It can't get much easier than that!
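The "what happened before and after" question can also be answered in code by slicing the super timeline around a pivot time. Below is a minimal sketch under a few assumptions of mine: the timestamp column is named "timestamp", its values are ISO 8601 strings that datetime.fromisoformat can parse, and the pivot uses the same timezone convention as the rows. Adjust the column name to match your CSV header.

```python
from datetime import datetime, timedelta
from typing import Iterable, List


def events_around(
    rows: Iterable[dict],
    pivot: datetime,
    minutes: int = 5,
    time_col: str = "timestamp",  # assumed column name
) -> List[dict]:
    """Return rows within +/- `minutes` of `pivot`, sorted by time."""
    window = timedelta(minutes=minutes)
    selected = []
    for row in rows:
        raw = (row.get(time_col) or "").strip()
        try:
            stamp = datetime.fromisoformat(raw)
        except ValueError:
            continue  # skip rows whose timestamp does not parse
        if abs(stamp - pivot) <= window:
            selected.append((stamp, row))
    # Sort on the parsed time so out-of-order rows still read chronologically.
    selected.sort(key=lambda pair: pair[0])
    return [row for _, row in selected]
```

Load the timeline with csv.DictReader, pick the time of the suspicious SSH logon as the pivot, and you get every event from every source in the minutes on either side of it.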

As you can see, if you're investigating an incident and need findings fast on an ESXi server, this is a fantastic tool to run. Again, it's not designed to answer every single question and find all evil; it's specifically designed to focus on relevant logs, keywords, and events from ESXi logs to get you the important data quickly and in a friendly format. As with any of my blogs, I always mention potential caveats; here, a big one is that a threat actor CAN clear these various event logs, but you're forwarding all of your logs via syslog to a centralized server, right?... RIGHT?!

The DFIR Spot
