Uncomplicating Cloud Security - Infrastructure Protection (Part 4)
As humans, we all know we have our own problems and vulnerabilities. Let’s not let our cloud accounts overwhelm us with theirs too!
Like many people nowadays, I’m really into podcasts. Over the last few years I have listened to conversations more than music, something my 16-year-old music-addicted self would find very surprising and would definitely judge me for. I get a lot of enjoyment and generally learn a lot from the wide array of shows I tune into. A feature I find really engaging in some top podcasts is that hosts such as Tim Ferriss and Sam Harris regularly spend a large part of the initial conversation establishing the notions that both interlocutors hold to be true, and they usually insist on hashing out definitions that they both agree on. Agreeing on the conceptual basis of a conversation and having a clear definition of what you are talking about is the only way to have a fruitful conversation.
This introduction aims to do just that. I want to plant a flag in the ground and start off this exploration of infrastructure protection in the context of AWS by stating the three main concepts we need to have clear in our minds if we want to have a productive conversation.
They are infrastructure layers, firewalls, and vulnerabilities.
Once we understand the different networking infrastructure layers we can then talk about protecting them by exploring the wide array of firewalls we can leverage. Then finally we will talk about how important it is to know how to detect, patch, and remove the vulnerabilities of your infrastructure. It’s through the understanding and combination of these three pillars, the infrastructure protection trifecta if you will, that we can consider ourselves great custodians of our infrastructure.
Infrastructure layers
When it comes to infrastructure layers there are many different analogies we can use to illustrate why it’s important to understand them well and know how to harden each one individually.
An analogy I’ll use in this case is as follows. Imagine trying to make sure your kids are safe when they get into bed at night. You would never send them to bed with a yellow reflective jacket and a helmet so they don’t get run over by a car. This would be completely unnecessary: there are so many layers of protection between them and a car when they are in bed that this type of protection is ridiculous. In order for a car to threaten them, it would have to run off the road, break down the front door of the house, and shrink to be able to drive up the stairs. I could go on, but as you can see it’s not a worthwhile worry to think about. If anything, you would want to think of measures that are bedroom specific. Are their windows closed properly? Do they have enough blankets? Did you assure them the boogeyman isn’t under the bed?
We can apply this form of layer-specific threat segmentation to cloud infrastructure layers by aligning them to our use case. For example, if you have an application made up of a series of microservices, gather the components that don’t need internet access and place them in a subnet in a VPC that has no internet access. By eliminating all internet access, you have removed the possibility of a configuration mistake that would exist if you were depending on a more resource-level measure like a security group. Cutting off internet access to the full subnet is similar to removing any threat of being run over by a car.
Speaking of layers, the main ones we should have in our mind are Regions, Availability zones, VPCs, and Subnets.
In the given diagram, an AWS Organization is shown to be composed of multiple AWS accounts. One of these accounts, Account-1, is shown to contain three VPCs that span three availability zones across three cloud regions. These VPCs are connected via a multi-region VPC peering connection. Additionally, there are two example managed services within Account-1 that do not belong to any particular region. These services can be used by the resources within the account.
Let’s take a closer look at each layer.
Regions
A region is a cluster of data centers in a physical area of the world. Each region has a minimum of three isolated groups of data centers, which are called Availability Zones (AZs). Decisions about which region to place your application in might depend on the location of your users, on regulatory restrictions around where user data can legally be stored, or on availability concerns, in which case you might want to run your application in multiple regions in the event that part of the world suffers an outage.
Availability zones
An AZ consists of one or more data centers that are connected to offer high availability, scaling, and high-throughput, low-latency networking performance that simply wouldn’t be possible in a single traditional data center. Network traffic between AZs is encrypted. There are always at least three AZs per region, and they are independent from each other: each is individually self-sufficient and fault tolerant, and all are located within 100 km (60 miles) of each other.
VPCs
A Virtual Private Cloud (VPC) is a virtual network in which you control the IP ranges, subnets, and routing. Only when an IP range and subnets have been allocated can we provision resources that can be routed to. VPCs can contain public and private subnets and have a wide selection of security features embedded in them. We can create VPC endpoints to communicate privately with other resources, attach gateways, set up VPC peering, and enable VPC flow logs, to name some of the many features VPCs provide. The VPC layer is a crucial security layer.
Subnets
A subnet is a range of IP addresses in your VPC. You can launch AWS resources into a specified subnet. Use a public subnet for resources that must be connected to the internet, and a private subnet for resources that won't be connected to the internet. This is the lowest rung in the infrastructure layer hierarchy. We can further protect resources inside a subnet using firewalls like Security groups or network access control lists.
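To make the idea of "a range of IP addresses in your VPC" concrete, here is a minimal sketch using only Python's standard `ipaddress` module. The `10.0.0.0/16` VPC range, the `/24` subnet size, and the subnet names are assumptions chosen for illustration; they do not come from the diagram.

```python
import ipaddress
from typing import Optional

# Hypothetical VPC range, chosen only for illustration.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the /16 into /24 blocks and keep the first four as subnets.
blocks = list(vpc.subnets(new_prefix=24))[:4]

# Illustrative names: two public and two private subnets.
subnets = {
    "public-a": blocks[0],
    "public-b": blocks[1],
    "private-a": blocks[2],
    "private-b": blocks[3],
}


def subnet_for(address: str) -> Optional[str]:
    """Return the name of the subnet that contains the address, if any."""
    ip = ipaddress.ip_address(address)
    for name, network in subnets.items():
        if ip in network:
            return name
    return None


for name, network in subnets.items():
    # AWS reserves five addresses in every subnet, so fewer are usable.
    print(f"{name}: {network} ({network.num_addresses} addresses in range)")
```

Whether a subnet is actually public or private is not decided by its name or range but by its routing, which is covered in the firewalls section.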
Amazon Web Services (AWS) provides managed services that make it easier to connect different layers of your infrastructure. One such service is AWS Transit Gateway, which acts as a central hub for connecting VPCs and on-premises networks. Traffic between the transit gateway and your VPC stays on the AWS private network rather than crossing the public internet. You can also connect AWS regions using AWS Transit Gateway inter-region peering, which encrypts traffic between regions, or link VPCs directly with VPC peering. Another method of integration is to use VPC endpoints, which allow a VPC to communicate privately with AWS-managed services; interface endpoints are powered by AWS PrivateLink. All of these methods leverage the AWS backbone, which is a private, encrypted, and secure internal networking infrastructure. By using these methods and minimizing internet-facing resources, you can reduce your exposure to potential threats.
Firewalls
Firewalls are like nightclub bouncers. They are the network controls in charge of protecting the resources behind them and keeping out, or denying, unwanted traffic. We have a number of firewalls at our disposal, which come with different characteristics and use cases.
Security Groups
A security group is a stateful firewall, meaning that if an inbound request is allowed, the response to it is automatically allowed back out regardless of the outbound rules. Security groups are usually applied directly to EC2 instances. You can reference allowed IP addresses in the inbound rule, but it’s best practice to be as buttoned down and specific as possible and reference the security groups that are allowed to communicate with the target security group. In the example below, only the resources attached to security group SG-2a can communicate with the resources attached to security group SG-3a. By default, a new security group denies all inbound traffic and allows all outbound traffic.
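The idea of referencing a security group instead of an IP range can be sketched as a small toy model in Python. This is not the AWS API: the `InboundRule` and `SecurityGroup` classes, the `inbound_allowed` helper, and port 5432 are all hypothetical, and only the group names SG-2a and SG-3a come from the example. Statefulness (automatic return traffic) is deliberately left out.

```python
import ipaddress
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(frozen=True)
class InboundRule:
    port: int
    # Either a CIDR such as "10.0.1.0/24" or a group ID such as "SG-2a".
    source: str


@dataclass
class SecurityGroup:
    group_id: str
    # An empty list means no inbound traffic is allowed.
    inbound: List[InboundRule] = field(default_factory=list)


def inbound_allowed(target: SecurityGroup, port: int,
                    source_ip: str, source_groups: Set[str]) -> bool:
    """Return True if any inbound rule on the target group matches."""
    for rule in target.inbound:
        if rule.port != port:
            continue
        if "/" in rule.source:
            # CIDR-based rule: match on the caller's IP address.
            network = ipaddress.ip_network(rule.source)
            if ipaddress.ip_address(source_ip) in network:
                return True
        elif rule.source in source_groups:
            # Group-based rule: match on the caller's group membership.
            return True
    return False


# SG-3a only accepts traffic from members of SG-2a (port is an assumption).
sg_3a = SecurityGroup("SG-3a", [InboundRule(port=5432, source="SG-2a")])

print(inbound_allowed(sg_3a, 5432, "10.0.1.10", {"SG-2a"}))
print(inbound_allowed(sg_3a, 5432, "10.0.1.10", {"SG-1a"}))
```

The appeal of the group-based rule is that it keeps working when instances are replaced and their IP addresses change, since membership rather than address decides access.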
NACLs
A network access control list (NACL) is a stateless firewall (return traffic is not automatically allowed, so an inbound rule needs a matching outbound rule) that is applied at the subnet level. It determines which network traffic is allowed into and out of the subnet. NACLs add an extra layer of protection over security groups. The default NACL that comes with a VPC allows all inbound and outbound traffic, whereas a custom NACL denies all traffic until you add rules.
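Two properties of NACLs are easy to miss: rules are evaluated in ascending rule-number order with the first match deciding, and, being stateless, a request only completes if the reply is also allowed in the other direction. The following Python sketch models both. The `NaclRule` class, the rule numbers, and the port ranges are hypothetical values picked for illustration, and protocol and CIDR matching are omitted to keep it short.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class NaclRule:
    number: int
    port_from: int
    port_to: int
    allow: bool


def evaluate(rules: List[NaclRule], port: int) -> bool:
    """Apply rules in ascending rule-number order; the first match decides."""
    for rule in sorted(rules, key=lambda r: r.number):
        if rule.port_from <= port <= rule.port_to:
            return rule.allow
    # Nothing matched: fall through to the implicit final deny.
    return False


def round_trip_ok(inbound_rules: List[NaclRule], outbound_rules: List[NaclRule],
                  server_port: int, client_port: int) -> bool:
    """Stateless check: the request must get in AND the reply must get out."""
    return (evaluate(inbound_rules, server_port)
            and evaluate(outbound_rules, client_port))


# Allow HTTPS in, and allow replies out to the client's ephemeral port.
inbound = [NaclRule(100, 443, 443, True)]
outbound = [NaclRule(100, 1024, 65535, True)]

print(round_trip_ok(inbound, outbound, 443, 50000))
print(round_trip_ok(inbound, [], 443, 50000))
```

The second call fails even though the inbound rule is in place, which is the practical difference from a stateful security group.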
Routing tables
Routing tables are lists of directions that can be assigned to gateways, routers, and subnets. I include them in the firewall section since they are the routes that allow connections between entities. Without a route, there is no possible communication. When it comes to routing tables in the context of security, we want to make sure not to misconfigure them by adding overly permissive routes, such as a 0.0.0.0/0 route to an internet gateway on a subnet that is meant to be private. Note that routes match on destination IP range only, so restricting traffic to a port such as 80 (HTTP) is a job for security groups and NACLs.
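A route table can be thought of as a lookup from destination range to target, where the most specific matching route wins. The sketch below models that lookup and adds a simple check for a catch-all route to an internet gateway. The table contents, the `igw-example` target name, and both helper functions are hypothetical, and the table keys are assumed to be written in canonical CIDR form.

```python
import ipaddress
from typing import Dict, Optional

# Hypothetical route table: destination CIDR -> target.
routes = {
    "10.0.0.0/16": "local",
    "0.0.0.0/0": "igw-example",
}


def pick_route(table: Dict[str, str], destination_ip: str) -> Optional[str]:
    """Return the target of the most specific route matching the destination."""
    ip = ipaddress.ip_address(destination_ip)
    matches = [
        ipaddress.ip_network(cidr)
        for cidr in table
        if ip in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None
    # Longest prefix (largest prefix length) is the most specific route.
    best = max(matches, key=lambda net: net.prefixlen)
    return table[str(best)]


def has_internet_route(table: Dict[str, str]) -> bool:
    """Flag a catch-all route that points at an internet gateway."""
    return any(
        cidr == "0.0.0.0/0" and target.startswith("igw-")
        for cidr, target in table.items()
    )


print(pick_route(routes, "10.0.3.4"))
print(pick_route(routes, "8.8.8.8"))
print(has_internet_route(routes))
```

Running a check like `has_internet_route` against the tables of subnets that are meant to be private is one way to catch the misconfiguration before it matters.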
AWS WAF
No matter how creative hackers think they are, many penetration techniques are the same. AWS WAF can help you protect yourself against some of the most common hacking techniques, such as SQL injection or cross-site scripting (XSS).
VPC Peering
Adding VPC peering to the firewalls list makes sense because it’s another way to take advantage of the AWS backbone for network communication. We can set up peering connections to link our VPCs, even across different regions. We only need to make sure that the VPCs don’t have overlapping IP ranges. Once the peering connection is established and routes are added, resources in the connected VPCs can communicate without traversing the public internet.
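The overlap requirement is easy to verify before creating a peering connection. Here is a minimal sketch with Python's standard `ipaddress` module; the VPC names and CIDR ranges are made up for the example.

```python
import ipaddress
from itertools import combinations
from typing import Dict, List, Tuple


def overlapping_pairs(vpc_cidrs: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return every pair of VPCs whose IP ranges overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in vpc_cidrs.items()}
    return [
        (a, b)
        for a, b in combinations(nets, 2)
        if nets[a].overlaps(nets[b])
    ]


# Hypothetical VPCs: vpc-c sits inside vpc-a's range, so they cannot peer.
vpcs = {
    "vpc-a": "10.0.0.0/16",
    "vpc-b": "10.1.0.0/16",
    "vpc-c": "10.0.128.0/17",
}

print(overlapping_pairs(vpcs))
```

Planning non-overlapping ranges up front is much cheaper than re-addressing a VPC later, so a check like this fits well in whatever tooling provisions new VPCs.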
Virtual Private Network (VPN)
A virtual private network, or VPN, is a type of network that allows you to create a secure connection to another network over the Internet. This is done using virtual tunneling protocols, which encrypt the data that is sent over the VPN connection so that it cannot be accessed by unauthorized users. A VPN is often used to secure connections to sensitive information or to access resources on a private network, such as a corporate network, from a remote location. A VPN can be used to connect to a VPC in order to securely access resources that are running in the VPC.
Vulnerabilities
Cloud security vulnerabilities are a significant concern and they come in many shapes and sizes. One type of vulnerability that is particularly concerning is the code dependency vulnerability. These vulnerabilities arise when an application or system relies on a third-party code library or module that contains a security flaw. If this flaw is not discovered and addressed, it can be exploited by attackers to gain unauthorized access to sensitive data or disrupt operations. We can use tools such as Snyk and OWASP Dependency-Check to help us with this type of vulnerability scanning by integrating them with our source code repositories.
By removing vulnerable dependencies we are one step closer to what is sometimes referred to as reducing our attack surface. Doing so makes it more difficult for potential attackers to access your systems and data. This can also involve a variety of measures, such as implementing strong authentication and access controls, regularly patching and updating your software, and using encryption to protect sensitive information. By reducing the number of entry points and vulnerabilities in your cloud infrastructure, you can help protect your systems and data from being compromised by attackers.
So if we imagine our attack surface as all of the possible ways we can be breached or attacked, by putting effort into reducing the number of vulnerabilities we are less likely to be successfully attacked. Since each environment is different and there is not one complete answer that works for everyone, an argument can be made that the AWS Well-Architected Framework is just a long list of ways to harden your systems and help you reduce your attack surface.
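At its core, dependency scanning compares the versions you have pinned against a list of known-vulnerable versions. The Python sketch below shows that comparison in its simplest form. It is not how Snyk or any real scanner is implemented: the `ADVISORIES` data, the package names, and the helper functions are entirely made up, and real tools rely on maintained advisory databases and far more robust version parsing.

```python
from typing import Iterable, List, Tuple

# Entirely made-up advisory data: package name -> first fixed version.
ADVISORIES = {
    "examplelib": (2, 4, 1),
    "demo-http": (1, 0, 9),
}


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '2.3.0' into (2, 3, 0). Only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def vulnerable(requirements: Iterable[str]) -> List[str]:
    """Return the pinned packages whose version is older than the fixed one."""
    findings = []
    for line in requirements:
        line = line.strip()
        # Skip blanks, comments, and anything that is not an exact pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        fixed_in = ADVISORIES.get(name.strip().lower())
        if fixed_in and parse_version(version.strip()) < fixed_in:
            findings.append(name.strip())
    return findings


pins = ["examplelib==2.3.0", "demo-http==1.0.9", "# a comment", "other==0.1"]
print(vulnerable(pins))
```

Running a check of this kind on every commit is what turns it into a first line of defense, since a vulnerable pin is flagged before it is ever built or deployed.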
The Trifecta in action
A vulnerability-conscious CICD pipeline
Below we can find a generic CICD pipeline that starts with a commit pushed to a Git repository. The commit triggers the build of a new artifact, which passes through a series of unit and integration tests and, upon completion, moves on to the deployment phase. The package, which we can assume is encapsulated in some flavor of container, is deployed to a series of segmented EKS clusters that correspond to the different environments (Dev, Staging, and Production).
In this diagram I want to focus more on the two infrastructure protection methods we are using:
1 - Snyk
We have integrated the open-source version of Snyk to run a vulnerability scan every time there is a commit to the source code repository. We can also use the Snyk console to run scans on the repo on a dedicated schedule. With every scan, Snyk will detect code and library dependency vulnerabilities, open PRs on your behalf, and serve as the first level of defense against known package dependency issues that might expose your code.
2 - AWS Systems Manager Patch Manager
In the diagram below, our AWS account is made up of three distinct EKS clusters, which constitute our development and production environments. It’s safe to assume that the EC2 instances that make up each cluster are part of an auto-scaling group and are provisioned dynamically. As time passes, some Amazon Machine Images (AMIs) turn out to contain vulnerabilities, and we don’t want to have to patch the instances built from them manually. We can leverage the AWS Systems Manager Patch Manager feature to automate the task of patching those instances when needed.
A vulnerability-conscious 3-tiered diagram
Let’s imagine an application made up of a front end and a backend that persists data to a series of highly available RDS instances. This is a typical 3-tiered architecture, and it can utilize firewalls to keep a high level of network and infrastructure protection by adopting measures like the ones in the example.
1 - Internet Gateway
We have an internet gateway that serves as our connection to the internet. Any traffic originating from the public internet will have to pass through it.
2 - Routing table
The internet gateway is connected to a router that has a routing table that designates all of the possible routes incoming traffic can take. Here we would add routes and integrate them with the NAT Gateway in the public subnet.
3 - Public NACL
We have the first Elastic Load Balancer, which will direct traffic inside the first public subnet. Notice that it sits inside the public network access control list. If this is the default NACL with its allow-all rules, all traffic that is directed to the load balancer will reach it, but we can add inbound and outbound rules to curate this traffic.
4 - Security Groups
We can see an example of a security group in action applied to an EC2 instance that houses the front end of the application in the example. It more than likely has an inbound rule that only allows traffic from the subnet’s Elastic Load Balancer, so any traffic that reaches the EC2 instance must pass through the load balancer first (even though the instance is in a public subnet).
5 - NAT Gateway
In subnet 1a we have a NAT Gateway, a managed AWS service that lets resources in the private subnets initiate outbound connections through the public subnet while blocking connections initiated from the internet.
6 - Private Subnet
The IP addresses in these subnets are private and cannot be reached directly from the internet. Resources here can initiate outbound connections through the NAT Gateway, provided that routes exist for them in the subnet routing table and that no NACL rule or security group blocks the communication.
7 - VPC Endpoints
Some of our resources might depend on other AWS services. If this is the case, it makes no sense to traverse the open internet to communicate and transfer data between them. Instead, use VPC endpoints, an integration method that AWS provides to connect some AWS-managed services with resources inside your VPC. Since interface VPC endpoints use AWS PrivateLink, which routes traffic over the internal, encrypted AWS backbone, this traffic stays private. This removes a large surface that could otherwise have been open to exploitation.
Conclusion
When it comes to cloud security, the main goal is to prevent attacks, exploitation, and other security incidents. One way to do this is to understand the attack surface of your systems and identify potential vulnerabilities. Then, you can use security measures such as firewalls and appropriate architecture designs to protect your systems. Using tools like AWS PrivateLink and the AWS backbone can also help by increasing the share of data transfer that is private and encrypted. By taking these steps, you can help ensure the security and protection of your cloud infrastructure.
Whether you are a Developer, DevOps, or Cloud engineer, dealing with the cloud can be tough at times, especially on your own. If you are using Tailwarden or Komiser and want to share your thoughts, doubts, and insights with other cloud practitioners, feel free to join our Tailwarden Discord server, where you will find tips, community calls, and much more.