
Cloud Incident Response: Investigating AWS Incidents

Let’s change it up a bit. With the ever-evolving threat landscape and infrastructure increasingly moving to the cloud, we have to talk about investigations involving cloud platforms, right? This will be a slightly unique discussion, as depending on your role, you may be involved in different aspects of cloud IR. Whether you’re a contractor learning a customer’s cloud environment on the fly or you’re reading this to better respond to incidents in your own environment, I’m hoping that everyone can learn something from this post. In the future, I’m hoping to discuss additional cloud investigations, as there are many platforms, such as Google Cloud, Azure, Microsoft 365, and Google Workspace, and of course many ways that cloud IR can be conducted. I won’t be able to cover every investigation, feature, or attack here, but let’s jump right in with Amazon Web Services (AWS)!


WARNING! This post will be lots of words! Unfortunately, with learning cloud platforms, it’s not as easy as spinning up a VM and teaching yourself. So bear with me here!


The rundown:

  • Nearly everything in the cloud has a cost. Be aware of your data center and region location if you’ll be transferring anything out of AWS

  • Understand your current policies and roles within your AWS environment

  • The built-in IAM Policy Simulator can help you understand the scope of an incident and the access a Threat Actor (TA) may have had

  • EC2 instances are AWS Virtual Machines (VMs)

  • S3 Buckets are AWS Cloud Storage and can be used to store forwarded AWS logs

  • AWS logs are called “CloudTrail” logs

  • GuardDuty and AWS Detective are paid AWS services that automate CloudTrail analysis

  • A Virtual Private Cloud (VPC) can contain network-related virtual appliances/data

    • This includes flow logs, load balancers, web-app firewalls, gateways, virtual firewall rules, etc.

  • Glue and Athena are paid AWS services used to collect and search through AWS logs stored in S3 buckets

  • AWS uses Lambda functions for event triggers and alerts


Okay, you entirely understand AWS, right? Probably not! As mentioned earlier, there’s a TON of information to learn when it comes to understanding cloud platforms. However, my main goal here is to touch on the key things to understand and look for when responding to AWS incidents in particular.


When it comes to AWS, it’s important to understand the levels of access users have. If you’re responding to a compromised user/system, you want to know what level of access that user had, right? This can help drive your investigative roadmap and begin answering what a TA can do with that level of access. Will they need to escalate privileges in order to access more sensitive data, or did this user have full access right off the bat? I’m sure many of us are familiar with Active Directory within Windows environments, right? Well, what is this called on other platforms, such as AWS? AWS refers to this as “Identity and Access Management” (IAM). This is a term you should be familiar with not only in AWS, but in nearly anything security related. In AWS specifically, IAM allows you, as an administrator, to attach various roles and policies to a user account, defining what the user has access to and can do.
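
To make that concrete, here’s a minimal sketch, using boto3, of pulling the policies and group memberships attached to a potentially compromised IAM user. The username is hypothetical, and this assumes you already have credentials configured with permission to read IAM.

```python
import boto3

iam = boto3.client("iam")
suspect = "compromised-analyst"  # hypothetical username under investigation

# Managed policies attached directly to the user
for policy in iam.list_attached_user_policies(UserName=suspect)["AttachedPolicies"]:
    print("Managed policy:", policy["PolicyName"], policy["PolicyArn"])

# Inline policies embedded directly in the user
for name in iam.list_user_policies(UserName=suspect)["PolicyNames"]:
    print("Inline policy:", name)

# Group memberships (groups can carry additional policies of their own)
for group in iam.list_groups_for_user(UserName=suspect)["Groups"]:
    print("Member of group:", group["GroupName"])
```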


A unique feature within AWS is something called the “IAM Policy Simulator”. You may think this is a game like The Sims, but it’s actually a neat feature. Let’s say you have a compromised user you’re investigating and want to see what that user has access to. This can help you scope other impacted services/resources. You can run simulations against services and resources to see whether the compromised user, or which other users, have or had access to them. Pretty neat, right?
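
Under the hood, the simulator is also exposed via the IAM API. Here’s a hedged sketch, with a hypothetical user ARN and bucket, that asks whether the compromised user could have read objects from a particular S3 bucket.

```python
import boto3

iam = boto3.client("iam")

response = iam.simulate_principal_policy(
    PolicySourceArn="arn:aws:iam::123456789012:user/compromised-analyst",  # hypothetical user ARN
    ActionNames=["s3:ListBucket", "s3:GetObject"],
    ResourceArns=[
        "arn:aws:s3:::sensitive-bucket",    # hypothetical bucket
        "arn:aws:s3:::sensitive-bucket/*",
    ],
)

# Each result is allowed, explicitDeny, or implicitDeny
for result in response["EvaluationResults"]:
    print(result["EvalActionName"], "->", result["EvalDecision"])
```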


Now, just like Google Cloud, Microsoft 365, etc., there are often numerous ways to access these platforms. It’s important to understand your overall attack surface and think like an attacker! Identify how your users can access your AWS portal/environment and how they can authenticate. Is it with a specific role, credentials, tokens, or keys, such as API keys? Remember: two-factor authentication on all things where applicable!
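
One quick way to get a handle on programmatic access is to review a user’s access keys and when they were last used; attacker-created or long-dormant keys tend to stand out. A small sketch, again with a hypothetical username:

```python
import boto3

iam = boto3.client("iam")
suspect = "compromised-analyst"  # hypothetical username

for key in iam.list_access_keys(UserName=suspect)["AccessKeyMetadata"]:
    last_used = iam.get_access_key_last_used(AccessKeyId=key["AccessKeyId"])["AccessKeyLastUsed"]
    print(
        key["AccessKeyId"],
        key["Status"],  # Active / Inactive
        "created:", key["CreateDate"],
        "last used:", last_used.get("LastUsedDate", "never"),
        "service:", last_used.get("ServiceName", "n/a"),
    )
```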

The majority of this blog will be focused on CloudTrail logs, as these are the cornerstone of investigating AWS incidents. This is what AWS calls their “event logs”. The great thing about these is that they’re enabled by default! Wooohoo! Thanks, Amazon! Far too often do I hop into an environment for analysis and find that logging was disabled by default! Now, much like other cloud logs, keep in mind that these have a retention period of 90 days, or three months. It’s better than most that have 30 days, but it can still be detrimental if you’re investigating an incident that goes back further than that. So, you’re asking yourself, “Great. I have to pay more money to increase the retention...”, well yes, but there’s another solution. FORWARD ALL THINGS. Consider forwarding these logs to an S3 bucket or another log collection platform, such as Splunk or the Elastic Stack. AWS calls this a “Trail”, which allows you to forward your CloudTrail logs to an S3 bucket for unlimited retention.
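
If you’re setting this up yourself, a Trail can be created in a few lines. This is a minimal sketch with hypothetical trail and bucket names; it assumes the S3 bucket already exists and has a bucket policy that lets CloudTrail write to it.

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.create_trail(
    Name="ir-archive-trail",               # hypothetical trail name
    S3BucketName="my-cloudtrail-archive",  # hypothetical, pre-created bucket
    IsMultiRegionTrail=True,               # capture events from every region
)

# A trail doesn't record anything until logging is started
cloudtrail.start_logging(Name="ir-archive-trail")
```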


Take note that although CloudTrail records API events, such as the creation of a new IAM policy or role, or even the creation of an EC2 instance, it does not log what occurs within your VMs, such as operating system event logs and OS-based artifacts. Those in-guest logs are known as CloudWatch Logs.


Amazon provides a great diagram describing CloudTrail logs at the source below.


Source: https://aws.amazon.com/cloudtrail/

What do these logs collect? Great question! They essentially collect all API calls within AWS. The great thing with AWS is that the majority of activity makes some sort of API call! AWS splits these calls into Management and Data events. Note that Management events are recorded by default, while Data events generally have to be enabled (a small sketch for that follows the breakdown below).


Think of it like this:

Management – resource creation, such as a VM or a user

Data – Accessing some database or service
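
Because Data events aren’t captured out of the box, they have to be switched on per trail. Here’s a hedged sketch, reusing the hypothetical trail and bucket names from above, that enables S3 object-level Data events:

```python
import boto3

cloudtrail = boto3.client("cloudtrail")

cloudtrail.put_event_selectors(
    TrailName="ir-archive-trail",  # hypothetical trail name
    EventSelectors=[{
        "ReadWriteType": "All",           # record both reads and writes
        "IncludeManagementEvents": True,  # keep recording Management events too
        "DataResources": [{
            "Type": "AWS::S3::Object",
            "Values": ["arn:aws:s3:::sensitive-bucket/"],  # hypothetical bucket to watch
        }],
    }],
)
```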


For informational purposes, I'll go ahead and list the event fields that you'd see when analyzing CloudTrail logs. I won't go into depth on each one; based on the name, you'll likely understand what each one is. A small parsing sketch follows the list.


EventTime

UserIdentity

EventSource

EventName

AwsRegion

SourceIPAddress

ARN

UserAgent

RequestParameters

EventID

EventType

Resources
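
Here's the parsing sketch I mentioned. When CloudTrail delivers logs to S3, they arrive as gzipped JSON files with a top-level Records array, and the field names above appear in camelCase (eventTime, sourceIPAddress, and so on). The file path here is hypothetical.

```python
import gzip
import json

# Hypothetical CloudTrail log file downloaded from an S3 bucket
with gzip.open("cloudtrail-log.json.gz", "rt") as fh:
    records = json.load(fh)["Records"]

# Print a quick timeline of the core fields listed above
for event in records:
    print(
        event.get("eventTime"),
        event.get("eventSource"),
        event.get("eventName"),
        event.get("awsRegion"),
        event.get("sourceIPAddress"),
        event.get("userAgent"),
        event.get("userIdentity", {}).get("arn"),
    )
```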


One field that I will expand on, because of its importance for AWS IR, is 'ARN', or Amazon Resource Name. This is a unique identifier attached to a particular identity within AWS. Keep in mind that this is globally unique and ties activity to a specific resource, user, and instance within AWS. This is a great indicator when hunting for a compromised user or scoping an incident. Amazon's documentation describes the various parameters found within an ARN; more information can be found at the source below.


Source: https://docs.aws.amazon.com/IAM/latest/UserGuide/reference-arns.html
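
To illustrate the layout, an ARN breaks down as arn:partition:service:region:account-id:resource. A tiny sketch with a hypothetical ARN:

```python
# Hypothetical ARN for the user under investigation
arn = "arn:aws:iam::123456789012:user/compromised-analyst"

prefix, partition, service, region, account_id, resource = arn.split(":", 5)
print("partition:", partition)   # aws
print("service  :", service)     # iam
print("region   :", region)      # empty for global services such as IAM
print("account  :", account_id)  # 123456789012
print("resource :", resource)    # user/compromised-analyst
```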

The CloudTrail dashboard itself packs some nice features for IR. This includes the ability to search events by time, user, IP address, event type, user agent, and more. Although, as voiced earlier, cloud services can be a money pit, with nearly every service and action costing money. Keep in mind that, although I won’t be going into depth about them here, there are additional paid AWS services, such as GuardDuty, Detective, and others. As an analyst, you’d likely be using CloudTrail to search for unauthorized access and suspicious persistence being created or modified, such as API keys.
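
The same kind of search the console offers is available through the LookupEvents API. A hedged sketch that pulls the last seven days of events for a hypothetical suspect user:

```python
from datetime import datetime, timedelta, timezone

import boto3

cloudtrail = boto3.client("cloudtrail")
end = datetime.now(timezone.utc)
start = end - timedelta(days=7)

paginator = cloudtrail.get_paginator("lookup_events")
for page in paginator.paginate(
    LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": "compromised-analyst"}],
    StartTime=start,
    EndTime=end,
):
    for event in page["Events"]:
        print(event["EventTime"], event["EventName"], event.get("EventSource"))
```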


As I briefly mentioned earlier, not all events are considered to be "CloudTrail" logs. There are many events that may appear or be recorded in other AWS logs. As investigators, our goal is to understand what logs exist for forensic purposes. As per AWS, I wanted to list the various log sources that you'll find within an AWS environment. More information regarding these can be found here.


CloudTrail - Various API events

CloudWatch Events - AWS Resource changes

AWS Config - Resource Configurations

S3 Access Logs - Specific events within buckets, such as uploading, downloading, or modifying data

CloudWatch Logs - Application and OS logs (e.g., Windows event logs)

VPC Logs - Flow logs related to network traffic

WAF Logs - Logs from Web App Firewalls

Route 53 Logs - DNS events


Understand that many of you may be working incidents that involve a compromised system within an AWS environment. These VMs are known as “EC2 instances”, or Elastic Compute Cloud. Incidents like this may include generic malware execution, maldocs, trojans, etc. In this case, you can treat these VMs similarly to a physical system. For example, you may want to collect a triage image of the VM using something such as KAPE by Eric Zimmerman. Since these are VMs, you may also want to hunt for anomalous creation of EC2 instances within your environment. This can be a great way for a TA to maintain persistence or possibly utilize your resources for a botnet or even a coin miner. Be sure to consider creating alerts for the creation of EC2 instances. There are many types of EC2 instances that can be created, so understand what is normal within your environment.
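
One way to wire up such an alert is an EventBridge rule that matches RunInstances API calls recorded by CloudTrail. This is a sketch with a hypothetical rule name; you’d still need to attach a target (an SNS topic, a Lambda function, etc.) with put_targets for the alert to actually go anywhere.

```python
import json

import boto3

events = boto3.client("events")

events.put_rule(
    Name="alert-on-runinstances",  # hypothetical rule name
    EventPattern=json.dumps({
        "source": ["aws.ec2"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {"eventName": ["RunInstances"]},
    }),
    State="ENABLED",
)
```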


Since AWS EC2 instances are VMs, the first thing that might come to mind is the potential to create snapshots. Can an analyst take a snapshot of an EC2 instance? Yep! These can also be downloaded and analyzed or mounted within your favorite forensics tool. Just keep in mind the costs associated with moving content outside of your cloud environment.
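
A minimal sketch, with a hypothetical instance ID, that snapshots every EBS volume attached to a suspect instance so the evidence is preserved before anyone touches the box:

```python
import boto3

ec2 = boto3.client("ec2")
instance_id = "i-0123456789abcdef0"  # hypothetical suspect instance

reservations = ec2.describe_instances(InstanceIds=[instance_id])["Reservations"]
for reservation in reservations:
    for instance in reservation["Instances"]:
        for mapping in instance.get("BlockDeviceMappings", []):
            volume_id = mapping["Ebs"]["VolumeId"]
            snapshot = ec2.create_snapshot(
                VolumeId=volume_id,
                Description=f"IR snapshot of {instance_id} / {volume_id}",
            )
            print("Created snapshot:", snapshot["SnapshotId"])
```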


As analysts, hopefully we all ask for various forms of network logs, right? You have your firewall logs, web-app firewall logs, proxy, NetFlow, IDS/IPS, load balancer, and more! Ugh, so many logs! Which is great, but exhausting! The great thing about cloud forensics is that these logs exist as well! Many cloud platforms utilize some variation of a virtual network. In AWS, this is known as a VPC, or Virtual Private Cloud. This can logically manage your firewall and load balancer, but can also produce logs such as VPC Flow Logs! Anytime you hear the word “flow”, don’t think of the Progressive commercial. Think of summarized network traffic data. Essentially metadata. So, although you won’t have packet content, you’ll have bytes sent/received, IP addresses, times, ports, protocols, etc. When investigating an AWS incident, be sure to collect these VPC-based logs, as they can be a great way to further scope the incident if you’ve identified a DNS request, IP address, user agent, etc. Additionally, this can be a great source for analysis into potential data exfiltration. Although exfiltration is tough to confirm definitively, flow data can be a great resource for spotting large amounts of bytes being sent.
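
As a first pass at the “who sent the most bytes” question, here’s a small sketch that parses the default VPC Flow Log format (version, account-id, interface-id, srcaddr, dstaddr, srcport, dstport, protocol, packets, bytes, start, end, action, log-status) from a hypothetical exported file and lists the top talkers:

```python
from collections import Counter

bytes_by_pair = Counter()

with open("vpc-flow.log") as fh:  # hypothetical exported flow log file
    for line in fh:
        fields = line.split()
        if len(fields) < 14 or fields[0] == "version":
            continue  # skip the header or malformed lines
        srcaddr, dstaddr, byte_count = fields[3], fields[4], fields[9]
        if byte_count.isdigit():
            bytes_by_pair[(srcaddr, dstaddr)] += int(byte_count)

# Top ten source/destination pairs by total bytes
for (src, dst), total in bytes_by_pair.most_common(10):
    print(f"{src} -> {dst}: {total} bytes")
```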


Lastly, let's talk about S3 buckets. I'll start right off the bat by saying that these are no longer public by default. S3 buckets don't need to be complicated, though they can be. Essentially, think of these as just a virtual file share within AWS. You can place nearly anything within an S3 bucket. As an analyst, the key takeaway here is that these can be great for storing logs and data. If you're managing your own environment, consider using an S3 bucket as log storage, for example, forwarding your CloudTrail logs there for extended retention. Once logs are sent to an S3 bucket, you can then use other AWS features, such as Athena, to search through the logs. Note that with the basic tier of AWS, you're allowed to create one CloudTrail "Trail" for free, so there's no reason why you shouldn't have a bucket for log collection!

A great comparison for this is the Elastic Stack (formerly the ELK Stack). There are log shippers known as "beats" that look for changes in a directory and forward those logs to a centralized location. Well, AWS has a similar feature known as "Glue". This will essentially collect and normalize logs from buckets and make them searchable with Athena. Basically, Glue and Athena go hand in hand. Let's think of it like this: you forward all logs within AWS to an S3 bucket. Glue actively collects data from these buckets and places it within a normalized database. Athena is used to perform searches against that large dataset.
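
To tie it together, here's a hedged sketch of kicking off an Athena query against CloudTrail logs that Glue has already catalogued. The database, table, and results bucket names are all hypothetical, and the table schema assumes the standard CloudTrail layout with a useridentity struct.

```python
import boto3

athena = boto3.client("athena")

query = """
SELECT eventtime, eventname, sourceipaddress, useragent
FROM cloudtrail_logs
WHERE useridentity.arn LIKE '%compromised-analyst%'
ORDER BY eventtime
"""

execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "ir_logs"},                          # hypothetical Glue database
    ResultConfiguration={"OutputLocation": "s3://athena-results-bucket/"},  # hypothetical results bucket
)
print("Query execution ID:", execution["QueryExecutionId"])
```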


Please note that if you're looking to investigate S3 bucket access, it is recorded in two locations. API access will be in CloudTrail logs, whereas HTTP/GUI access will be within S3 Access Logs.


I'm terrible at charts, but Amazon has a fantastic one explaining this more in depth. In short, these "events" are collected and aggregated into "CloudTrail" logs.


As I've mentioned many times already, there are obviously many more things to learn regarding AWS and cloud platforms as a whole! However, I feel the items discussed here are great takeaways from a DFIR standpoint. Remember: forward all things, understand your environment, and know your logs!


I want to give a HUGE shoutout to David Cowen for the content provided in the FOR509 SANS course. You have done amazing research and I wouldn't have been able to understand, let alone write about any of this content without your course!
