Event driven programming has grown increasingly popular with the rise of cloud computing. As application architectures embrace serverless, many applications are run purely by event driven services. AWS Lambda, Azure Functions, and Google Cloud Functions are all popular cloud services that allow applications to execute arbitrary code in response to specific events.
Reference: https://cloud.google.com/images/products/functions/how-it-works.svg
At ScaleSec, automation is one of our guiding principles when working with customers to improve their security posture in the cloud. As DevOps continues to take over the application development ecosystem, we consistently see security programs struggle to keep pace. To close this gap, the concepts of event driven programming can be integrated into DevOps processes to build automated responses to security related events.
A common problem InfoSec teams encounter when running applications in the cloud is virtual machines (VMs) whose virtual firewalls allow unfettered access from the internet. These VMs can create big problems in a matter of minutes, not hours or days. Internet exposed machines are exponentially more vulnerable to adversarial takeover, simply on the basis of increased attack surface.
To combat internet exposed VMs, Google Cloud Platform users can leverage the following services to automatically respond to the creation or update of a firewall rule and ensure open access to the Internet is not allowed:

- Stackdriver Logging (Admin Activity logs and an aggregate export sink)
- Cloud Pub/Sub
- Cloud Functions
In this section, we will describe how each of the above services integrates with the others to create an automated security workflow for remediating firewall rules that allow SSH access from the Internet.
Visual Diagram of Automated Workflow
Every time an interaction occurs with a GCP resource, the API call made is logged in Google Stackdriver as an Admin Activity Log.
In our solution, we leverage these logs to alert us when certain security related events occur, such as an update to an existing firewall rule. Admin Activity logs are always enabled, but to use them in a workflow we must first create an aggregate export sink.
Aggregate export sinks are set at the organization or folder level and include all child objects (i.e. all projects in a folder). These sinks allow users to select log entries based on a filter and send them to BigQuery, Cloud Pub/Sub, or Cloud Storage.
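As an illustration, an aggregated sink can be created at the organization level with the gcloud CLI; the sink name, project, topic, and organization ID below are placeholders you would replace with your own values:

```shell
# Create an org-level aggregated sink that forwards matching logs to Pub/Sub.
# The sink name, project, topic, and organization ID are illustrative.
gcloud logging sinks create firewall-event-sink \
  pubsub.googleapis.com/projects/my-project/topics/firewall-events \
  --organization=123456789012 \
  --include-children \
  --log-filter='resource.type:gce_firewall_rule'
```

After creating the sink, remember to grant the sink's writer identity permission to publish to the destination Pub/Sub topic.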
In our solution, the Stackdriver filter we will use to capture logs which relate to firewall rule creation or update is:
logName:logs/compute.googleapis.com%2Factivity_log
resource.type:gce_firewall_rule
jsonPayload.event_subtype: (compute.firewalls.insert OR compute.firewalls.update OR compute.firewalls.patch)
jsonPayload.event_type:GCE_API_CALL
This filter can be translated into English as the following criteria:

- The log name contains the Compute Engine activity log
- The resource type is a GCE firewall rule
- The event subtype is a firewall insert, update, or patch
- The event type is a GCE API call
Additional information on how to create advanced Stackdriver filters can be found here.
Cloud Pub/Sub is a real-time messaging service that allows independent systems to publish or subscribe to messages on a topic. In our demonstration workflow, a Pub/Sub topic is used by the aggregate export sink as the destination for logs that match our filter.
Our Pub/Sub topic also serves as the trigger for our Cloud Function. The topic only triggers the Cloud Function when it receives a log entry, providing greater efficiency and minimal compute cost compared to earlier implementations, where persistent VMs would periodically poll APIs to check configuration settings.
Cloud Functions allow us to write small, purpose-built functions that are triggered when a specified event occurs. In our solution, a Cloud Function is triggered whenever a firewall rule is created or updated.
Let’s step through the Cloud Function to understand how it works.
The activity log generated during the firewall event is passed to the Cloud Function in the data field of the Pub/Sub message. Pub/Sub message payloads are base64 encoded JSON, so we must decode the data and parse the resulting JSON.
This is shown in the first lines of our process_log_entry function in main.py.
import base64
import json
import googleapiclient.discovery
import time

def process_log_entry(data, context):
    # base64 decode the data field
    data_buffer = base64.b64decode(data['data'])
    # Load JSON so we can parse it
    log_entry = json.loads(data_buffer)
    # Get the firewall name by parsing the log
    firewall_name = log_entry['jsonPayload']['resource']['name']
    # Get the project ID by parsing the log
    project_id = log_entry['resource']['labels']['project_id']
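The decoding steps above can be exercised standalone with a fabricated Pub/Sub event; the sample log entry below is purely illustrative and contains only the two fields the function reads:

```python
import base64
import json

# A simplified stand-in for an Admin Activity log entry (illustrative only)
sample_entry = {
    "jsonPayload": {"resource": {"name": "allow-ssh"}},
    "resource": {"labels": {"project_id": "demo-project"}},
}

# Pub/Sub delivers the log entry as base64-encoded bytes in the 'data' field
event = {"data": base64.b64encode(json.dumps(sample_entry).encode())}

# Decode and parse exactly as process_log_entry does
log_entry = json.loads(base64.b64decode(event["data"]))
firewall_name = log_entry["jsonPayload"]["resource"]["name"]
project_id = log_entry["resource"]["labels"]["project_id"]
print(firewall_name, project_id)
```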
With the firewall name and the project ID, we can use the Google Cloud Python SDK to describe the firewall.
    # Create a client for the Compute Engine API
    service = create_service()
    print('Describing Firewall')
    # Check if the firewall is disabled. Returns True/False
    disabled = check_for_disabled(project_id, service, firewall_name)
    # Get the allowed source ranges for the firewall rule. Returns a list of CIDR blocks
    source_ranges = get_source_ranges(project_id, service, firewall_name)
    # Check if the firewall is an "allow all" rule. Returns True/False
    allow_all = check_for_allowed_all(project_id, service, firewall_name)
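The helper functions above live in main.py alongside process_log_entry; they fetch the firewall resource through the Compute Engine API and inspect the result. As a rough sketch (not the exact code from our repo), once the resource has been retrieved with `service.firewalls().get(project=..., firewall=...).execute()`, the checks reduce to inspecting standard fields on the returned dict:

```python
def check_for_disabled(firewall):
    # The 'disabled' field may be absent; treat a missing field as enabled
    return firewall.get('disabled', False)

def get_source_ranges(firewall):
    # List of CIDR blocks allowed by the rule
    return firewall.get('sourceRanges', [])

def check_for_allowed_all(firewall):
    # An "allow all" rule permits every protocol with no port restriction
    return any(rule.get('IPProtocol') == 'all'
               for rule in firewall.get('allowed', []))

# Illustrative firewall resource, shaped like the Compute API response
firewall = {
    'name': 'allow-all-demo',
    'disabled': False,
    'sourceRanges': ['0.0.0.0/0'],
    'allowed': [{'IPProtocol': 'all'}],
}
print(check_for_allowed_all(firewall))  # True
```

Note that these sketches take the already-fetched resource dict, whereas the helpers in main.py take the project ID, service client, and firewall name and perform the API call themselves.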
After we have the necessary information about the firewall resource, we can assess whether the firewall allows SSH from the internet.
    # If the firewall rule allows all traffic, call the disable firewall function
    if allow_all == True:
        # Short sleep required for the demo: it takes a few seconds between log
        # creation and firewall resource completion; the API returns a 400 error otherwise
        time.sleep(20)
        disable_firewall(project_id, service, firewall_name)
        print("Firewall %s Disabled" % firewall_name)
    else:
        # Get all of the allowed ports for the firewall rule. Returns a list of ports and ranges
        allowed_ports = get_allowed_ports_list(project_id, service, firewall_name)
        # Check if SSH is allowed. Returns True or False
        ssh_allowed = check_for_port_22(allowed_ports)
        # If TCP port 22 is allowed, 0.0.0.0/0 is in the source ranges, and the
        # firewall is not disabled, disable the firewall
        if ssh_allowed == True and '0.0.0.0/0' in source_ranges and disabled == False:
            time.sleep(20)
            disable_firewall(project_id, service, firewall_name)
            print("Firewall %s Disabled" % firewall_name)
        # If the rule allows SSH from the Internet but is already disabled, do nothing
        elif ssh_allowed == True and '0.0.0.0/0' in source_ranges and disabled == True:
            print("Firewall %s allows SSH from the Internet but is disabled" % firewall_name)
        # Otherwise do nothing, as SSH is not allowed from the internet
        else:
            print('Firewall %s does not allow SSH inbound from the internet' % firewall_name)
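The port check itself is simple list logic. A minimal sketch of a `check_for_port_22` helper, assuming the Compute API's port notation of single ports and dash-separated ranges (e.g. `'22'` or `'20-25'`), could look like this; it is an illustration, not necessarily the exact code in our repo:

```python
def check_for_port_22(allowed_ports):
    """Return True if TCP port 22 falls within any allowed port or range.

    allowed_ports is a list like ['80', '20-25']; an empty list on a TCP
    rule means all ports are allowed.
    """
    if not allowed_ports:
        return True  # no ports listed means every port is allowed
    for entry in allowed_ports:
        if '-' in entry:
            # Range entry such as '20-25'
            low, high = entry.split('-')
            if int(low) <= 22 <= int(high):
                return True
        elif int(entry) == 22:
            return True
    return False

print(check_for_port_22(['20-25']))    # True
print(check_for_port_22(['80', '443']))  # False
```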
We have provided a Terraform module on our GitHub that provisions all the resources necessary to demonstrate the concepts discussed in this article.