Test Driven Development for Secure Infrastructure | ScaleSec

Written by Eric Evans | Sep 24, 2020 7:00:00 AM

TDD: A Cloud Security Perspective

This is the second in a series of posts about using Test Driven Development (TDD) for secure cloud infrastructure. For an introduction to what TDD is and how it can benefit cloud infrastructure, please read the first article of the TDD series. In this article, we will examine how to use a common test framework, Chef InSpec, to test Google Cloud Platform (GCP) security configurations, and how infrastructure can gain significant security benefits from a test-driven approach. Keep in mind that many other testing frameworks are available for a variety of clouds (see the first article of the series) and that the principles explored here generally apply to those tools as well. Where applicable, we will mention other security controls that can be used alongside TDD best practices for a proper defense-in-depth strategy in the cloud.

Read Getting Started with TDD in AWS

Write Tests First Before Deploying Infrastructure

Implementing Security Requirements using a TDD Approach

Security mandates typically come from cloud security and/or compliance teams (which are themselves typically driven by business or sales requirements). The implementation of the controls associated with these mandates typically falls on the shoulders of cloud practitioners. Using TDD, tests can be written prior to deploying cloud infrastructure so that the proper security controls are in place from the start, rather than misconfigurations being detected after deployment. Writing tests first helps ensure that each security control has been implemented properly and can be consistently verified as changes occur. In addition, writing tests first fosters a deeper and earlier understanding of cloud infrastructure requirements, bringing a visible focus on security as early as possible in the process, a concept typically referred to as shift-left security. Tests are expected to fail at first, or not even compile, because the resources they are checking do not yet exist. This is expected: the goal is then to build the infrastructure that makes the written tests pass.

Reducing Public Exposure

Reducing public exposure of resources in the cloud is an effective way to mitigate risk. The likelihood of a vulnerability causing damage to a cloud resource goes down if there isn’t a network path to reach it. Using tests with your infrastructure as code (IaC) helps verify that your resources limit their exposure to the public internet.

External Compute IP Addresses

An effective way to protect Google Compute Engine (GCE) instances from being reachable from the internet (and thus exposed to reconnaissance and public vulnerability scanning) is to ensure that they do not have a public-facing IP address. Not assigning an external IP address to a GCE instance is a security control referenced in several security and compliance frameworks.

control "limit-public-exposure" do
  impact 1.0
  title "Instance does not have an External IP address"
  desc "Check that compute instances do not have an external IP address"
  describe google_compute_instance(project: gcp_project_id, zone: gcp_zone, name: gcp_instance_name) do
    its('first_network_interface_type') { should_not eq "one_to_one_nat" }
  end
end

Note that this test can be paired with the organization policy constraint constraints/compute.vmExternalIpAccess, which acts as a preventative measure by restricting which GCE instances may be assigned an external IP address.
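As its name suggests, the first_network_interface_type property looks at the first network interface of the instance. To illustrate the underlying idea, here is a minimal plain-Ruby sketch that checks every interface. It assumes the instance is available as a Hash shaped like the Compute Engine API's instance JSON (networkInterfaces and accessConfigs); the helper name and the sample data are hypothetical and are not part of InSpec.

```ruby
# Hypothetical helper (not an InSpec API): returns true if any network
# interface has an access config of type ONE_TO_ONE_NAT, which is how the
# Compute Engine API represents an external IPv4 address on an instance.
# `instance` is assumed to be a Hash parsed from the API's instance JSON.
def external_ip?(instance)
  interfaces = instance.fetch('networkInterfaces', [])
  interfaces.any? do |nic|
    nic.fetch('accessConfigs', []).any? do |cfg|
      cfg['type'] == 'ONE_TO_ONE_NAT'
    end
  end
end

# Illustrative data only; names and addresses are made up.
private_vm = {
  'name' => 'private-vm',
  'networkInterfaces' => [{ 'networkIP' => '10.0.0.2' }]
}
public_vm = {
  'name' => 'public-vm',
  'networkInterfaces' => [
    { 'networkIP' => '10.0.0.3' },
    { 'networkIP' => '10.0.1.3',
      'accessConfigs' => [{ 'type' => 'ONE_TO_ONE_NAT', 'natIP' => '203.0.113.10' }] }
  ]
}

puts external_ip?(private_vm) # false
puts external_ip?(public_vm)  # true
```

In this sketch the second instance is flagged even though its first interface is private, which is the case a first-interface-only check could miss.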

Follow the Principle of Least Privilege

The Principle of Least Privilege is a security best practice that states that a system should have only the access required to perform its tasks and no more. In the cloud, this is extremely important when it comes to Identity and Access Management (IAM). When IAM is restricted correctly, risks such as a large blast radius from compromised resources and privilege escalation are mitigated, since the permissions held by each identity are limited.

Default Compute Service Accounts

By default, GCE VM instances use the default service account. This account is overly permissive, having been granted the primitive Editor role in IAM, which allows the instance to change existing resources and create or delete resources for most Google Cloud services. In addition, simply deleting this default service account is not recommended, since any applications that depend on that service account’s credentials can fail.

To ensure our GCE instances are not provisioned with the default service account, a custom service account can be attached to the instance (which overrides the default). The test below checks that the instance’s service account is not nil (that is, null or non-existent) to help ensure that the default service account credentials are not used on this instance.

control "no-default-service-account" do
  impact 1.0
  title "Instance does not use the default service account"
  desc "Check that compute instance has a custom service account attached to it and not default (nil)"
  describe google_compute_instance(project: gcp_project_id, zone: gcp_zone, name: gcp_instance_name) do
    its('service_accounts') { should_not be_nil }
  end
end

Note that the organization policy constraint constraints/iam.automaticIamGrantsForDefaultServiceAccounts can be used in conjunction with an infrastructure test suite to ensure that the default App Engine and GCE service accounts are not automatically granted the Editor role when they are created.
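A non-nil check confirms that some service account is attached, but an instance that has the default service account attached would also report a non-nil value. As a stricter complement, the sketch below recognizes the default Compute Engine service account by its email format, PROJECT_NUMBER-compute@developer.gserviceaccount.com. It is plain Ruby operating on a list of email strings; the helper names and sample emails are hypothetical, and wiring it into an InSpec control is left as an exercise.

```ruby
# The Compute Engine default service account has an email of the form
# PROJECT_NUMBER-compute@developer.gserviceaccount.com.
DEFAULT_COMPUTE_SA = /\A\d+-compute@developer\.gserviceaccount\.com\z/

# Hypothetical helper: true when the email looks like a default compute
# service account.
def default_compute_service_account?(email)
  DEFAULT_COMPUTE_SA.match?(email.to_s)
end

# Hypothetical helper: true when at least one service account is attached
# and none of the attached accounts is the default one.
def custom_service_account_only?(emails)
  list = Array(emails)
  !list.empty? && list.none? { |email| default_compute_service_account?(email) }
end

# Illustrative emails only; the project number and names are made up.
puts custom_service_account_only?(['web-frontend@example-project.iam.gserviceaccount.com']) # true
puts custom_service_account_only?(['123456789012-compute@developer.gserviceaccount.com'])   # false
puts custom_service_account_only?(nil)                                                      # false
```

Treating an empty list as a failure keeps the behavior of the original control, which expects a service account to be present.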

Ensuring Proper Data Protection

Stating that storing data in the cloud is becoming a trend is an understatement: according to the “Data Age 2025” International Data Corporation (IDC) white paper, worldwide data will grow from 45 zettabytes last year to 175 zettabytes by 2025, with as much data residing in the cloud as in on-premises data centers. In light of numerous public breaches that have occurred in the cloud, having proper controls for data is a growing concern, and rightfully so.

Proper Environment & Data Classification Labels

The advent of labels (or tags) for resources has given cloud practitioners a useful tool to store attributes about their resources and data. Labels facilitate automation through tag filtering, tag-based targeting, and so on. For example, updates can be sent to resources with an environment tag of “staging” before being rolled out to instances in “production.” From a data protection standpoint, information can be classified using labels.

The test below is a good example of combining multiple checks to satisfy a control. We are first ensuring that the keys environment and data_classification exist, then verifying that the values for these keys match an expected environment or data classification level. Regular expressions (regex) can be used to test that proper tagging has occurred.

control "proper-tagging" do
  impact 1.0
  title "Proper Tagging for Compute Resources"
  desc "Check that compute instances have proper tags"
  describe google_compute_instance(project: gcp_project_id, zone: gcp_zone, name: gcp_instance_name) do
    its('labels_keys') { should include 'environment' }
    its('labels_keys') { should include 'data_classification' }
  end
  describe google_compute_instance(project: gcp_project_id, zone: gcp_zone, name: gcp_instance_name).label_value_by_key('environment') do
    it { should match '^(test|staging|production)$' }
  end
  describe google_compute_instance(project: gcp_project_id, zone: gcp_zone, name: gcp_instance_name).label_value_by_key('data_classification') do
    it { should match '^(public|sensitive|secret|top_secret)$' }
  end
end
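The two-step logic of this control (key exists, then value matches) can be sketched in plain Ruby as follows. This is an illustration only: the rule table and helper name are hypothetical, and the labels are assumed to be a simple Hash of strings. The patterns use \A and \z, which anchor to the whole string in Ruby, whereas ^ and $ anchor to line boundaries.

```ruby
# Required label keys and the values each may take (hypothetical rule table
# mirroring the environment and data_classification checks).
LABEL_RULES = {
  'environment' => /\A(test|staging|production)\z/,
  'data_classification' => /\A(public|sensitive|secret|top_secret)\z/
}.freeze

# Hypothetical helper: returns a list of human-readable problems.
# An empty list means the labels satisfy every rule.
def label_violations(labels)
  LABEL_RULES.each_with_object([]) do |(key, pattern), problems|
    value = labels[key]
    if value.nil?
      problems << "missing label: #{key}"
    elsif !pattern.match?(value)
      problems << "invalid value for #{key}: #{value.inspect}"
    end
  end
end

# Illustrative label sets only.
good = { 'environment' => 'production', 'data_classification' => 'secret' }
bad  = { 'environment' => 'prod' }

puts label_violations(good).inspect # []
puts label_violations(bad).inspect
```

Returning a list of problems rather than a single boolean makes it easier to report every labeling issue on a resource in one pass.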

Test IaC for Security Posture Using a TDD Approach

In this article, we went over a few examples of cloud best practices and how to test for them using a common infrastructure testing tool. We have also shown how cloud security and compliance controls can be accounted for using a TDD approach, with several practical code examples of how to use Chef InSpec to test for proper security controls in GCP. Using TDD, infrastructure can be verified in test environments before the IaC is promoted to staging or production environments. Combining TDD with other security controls can provide a robust defense-in-depth strategy for the cloud.

Acknowledgements

I would like to extend a special thanks to Jason Dyke for his editorial contributions to this blog article.