By now, many organizations have realized that there are significant benefits to moving to the cloud and have moved past their initial fears. However, teams are often still not taking full advantage of the new capabilities available to them, specifically in the area of Security Operations and automation. Many organizations have carried over manual processes from on-premises environments, which are typically filled with red tape. Teams are now even more strained because they may have to manage on-premises and/or multiple cloud environments with the same-sized staff. One way to move past this hurdle and optimize a security program in the cloud is to modernize those old manual processes. In the cloud, this typically means automating as much of each process as is reasonably possible.
There are major benefits to teams that can incorporate automation into their daily security processes. Just a few include:
Are you looking to continually increase the system capacity for your SIEM, and in turn increase costs? Do you want your team members to spend significant amounts of time monitoring and investigating CloudTrail events, VPC flow logs, or low-priority findings that appear in Security Hub? Should they spend time manually reverting changes that developers accidentally made to security groups or NACLs? Probably not. Security teams are typically understaffed, so anything that reduces their workload and removes repetitive tasks is welcome. This is where process automation can drastically improve your response time and allow a small team to handle a much larger workload. For the remainder of this article we will walk through a specific example of automation that can be used as a starting point for implementing more advanced automation workflows.
A first step many organizations perform when configuring a security baseline in AWS is to implement the Center for Internet Security (CIS) AWS Foundations Benchmark Standards. The benchmark provides guidelines for a secure infrastructure and application architecture. In addition, it includes some basic automation recommendations, specifically around automated alerting. Standards 4.1 through 4.15 in the Foundations Benchmark v1.3.0 dictate how to implement automated alerting when specific AWS API actions that may represent malicious activity occur. This is a great first step when wading into automation, but CIS’s implementation guidance has some limitations as you try to mature your automation suite.
CIS’s recommended approach for automating alerts on potentially malicious API calls is to use CloudWatch Logs and metric filters. This implementation requires CloudTrail events to be forwarded to a CloudWatch Logs log group. One metric filter per alert is then applied to that log group. The filters monitor the incoming CloudTrail events and count how often they match defined patterns that may signal incidents or policy violations. When a specific number of matches is detected in a predefined time period, a CloudWatch alarm on that metric fires and publishes a message to SNS stating that the threshold has been reached. This approach works well for automating alerts, but it does not lend itself to advanced workflows such as automated remediation. Specifically:
How can we address the limitations in the CIS recommended implementation? The key is taking advantage of AWS’s EventBridge service to create a fully event-driven SecOps architecture instead of relying on CloudWatch Logs. EventBridge is a serverless event bus and the successor to CloudWatch Events. It provides a default event bus to which many AWS services, over 90 at this time, automatically emit a variety of events. This includes CloudTrail, which streams every event it logs to the bus in near real time. Every event delivered to the bus can be filtered by rules and then acted upon. When an event matches the pattern defined in a rule, the rule initiates an action against a target, such as sending a notification to an SNS topic, invoking a Lambda function, running an ECS task, posting to a custom API, or using one of about 20 other available targets. We can take advantage of EventBridge rules to monitor CloudTrail events in place of CloudWatch metric filters. The benefits realized by this approach include:
Knowing the benefits that EventBridge can provide, let’s walk through how using EventBridge for CIS alerting works.
CloudTrail events are automatically written to the default EventBridge bus. This means there is no need to set up CloudTrail forwarding to CloudWatch Logs for this solution. Instead, we will create EventBridge rules that monitor the default bus and trigger when a pattern match occurs. When a rule triggers, it sends a copy of the entire CloudTrail event record to an SNS topic, and anyone subscribed to that topic receives the alert. In addition to SNS, the rules also forward the CloudTrail event record to an auto-remediation Lambda function. In our example, this Lambda only remediates a single type of event (the stopping of CloudTrail logging), but it can easily be modified to address other findings you may want to auto-remediate. The high-level architecture deployed is shown below.
We have created a GitHub repository that contains all the code necessary to deploy all 15 of the CIS alerting rules and supporting resources using either CloudFormation or Terraform. It implements the architecture shown above. See the README files in the repository for specific steps to use the templates.
For demonstration purposes, let’s walk through the creation of one EventBridge rule via the console instead of using the IaC options available in our GitHub repository. Specifically, we will create a rule to alert on changes to CloudTrail trails and implement a remediation rule that automatically restarts logging on a trail if it is stopped (CIS Benchmark control 4.5 in v1.3.0). These steps assume that you already have an SNS topic and a Lambda function in place. The IaC templates we provide will create these for you automatically. Also, when deploying anything, keep in mind that EventBridge is a region-specific service.
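For readers who prefer the API to the console, the same rule can be sketched in Python. This is a minimal illustration, not the code from our repository: the function name `create_alert_rule`, the target IDs, and the rule description are hypothetical, and only `StopLogging` is named in this article, so the remaining event names are an assumption based on the CIS 4.5 control. The `events_client` argument is expected to behave like a boto3 EventBridge client.

```python
import json

# Event pattern for CloudTrail trail changes. Only StopLogging is named in the
# article; the other event names are assumed from the CIS 4.5 control.
CLOUDTRAIL_CHANGE_PATTERN = {
    "source": ["aws.cloudtrail"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["cloudtrail.amazonaws.com"],
        "eventName": [
            "CreateTrail",
            "UpdateTrail",
            "DeleteTrail",
            "StartLogging",
            "StopLogging",
        ],
    },
}


def create_alert_rule(events_client, rule_name, sns_topic_arn, lambda_arn):
    """Create one rule on the default bus with an SNS and a Lambda target.

    events_client is assumed to expose put_rule and put_targets the way a
    boto3 "events" client does. All names here are illustrative.
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(CLOUDTRAIL_CHANGE_PATTERN),
        State="ENABLED",
        Description="Alert on CloudTrail configuration changes",
    )
    # Both targets receive the full matched event from the same rule.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[
            {"Id": "alert-topic", "Arn": sns_topic_arn},
            {"Id": "remediation-function", "Arn": lambda_arn},
        ],
    )
```

In practice you would pass `boto3.client("events", region_name=...)` as the client, once per region you want covered. The SNS topic policy and the Lambda function's resource policy also need to allow `events.amazonaws.com` to publish and invoke, which this sketch does not set up.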
Our rule will listen for events with the source aws.cloudtrail that appear on the default EventBridge bus. When those events appear, the rule will run a pattern match against each event, looking for API calls that include StopLogging and others. You can see a screenshot of the example CloudTrail pattern we want to match against below. There are many other predefined patterns available for other use cases, including many services besides CloudTrail. We strongly recommend taking a look at the predefined patterns to get an idea of what else is possible with EventBridge.

With this setup we have improved on the CIS approach of CloudWatch metric filters and alarms by sending enriched data to SNS and Lambda that includes everything necessary to perform remediation actions. We are also able to send SNS notifications and execute a Lambda with a single rule, eliminating the need for complex chaining. To really productionize this, you will want to further develop the Lambda to handle other event types and integrate the SNS notifications with any desired communication channels (email, Slack, SIEMs, PagerDuty, etc.). You could also create another Lambda that builds a formatted notification message containing only the required data and delivers it to a communication channel, instead of posting the raw message to SNS.
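To make the remediation step concrete, here is a minimal sketch of what such a Lambda handler could look like. It is not the function from our repository: the names `remediate` and `lambda_handler` are illustrative, and it assumes the rule delivers the unmodified CloudTrail event, where a `StopLogging` call carries the trail name or ARN in `requestParameters.name`.

```python
def remediate(event, cloudtrail_client):
    """Restart logging if the event is a successful CloudTrail StopLogging call.

    Returns the trail identifier that was restarted, or None when the event
    is something this sketch does not handle. cloudtrail_client is assumed
    to behave like a boto3 "cloudtrail" client.
    """
    detail = event.get("detail", {})
    if detail.get("eventSource") != "cloudtrail.amazonaws.com":
        return None
    if detail.get("eventName") != "StopLogging":
        return None
    # A failed StopLogging call did not change anything, so leave it alone.
    if detail.get("errorCode"):
        return None
    # Assumption: the trail name or ARN is in requestParameters.name.
    trail = (detail.get("requestParameters") or {}).get("name")
    if not trail:
        return None
    cloudtrail_client.start_logging(Name=trail)
    return trail


def lambda_handler(event, context):
    # Imported lazily so remediate() can be exercised without AWS access.
    import boto3

    return remediate(event, boto3.client("cloudtrail"))
```

Keeping the decision logic in `remediate` and passing the client in makes it straightforward to add branches for other event types later and to test each one with a stub client. The function's execution role would need permission for `cloudtrail:StartLogging`.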
This was just a basic example of how EventBridge can be used to improve compliance with CIS monitoring and alerting standards. But EventBridge can be used to automate a multitude of event-driven security processes and use cases. You can use it to do things such as:
If you are interested in learning more about building out security automation with EventBridge, or implementing general best-practice SecOps capabilities in AWS, reach out and we can discuss how we can help. ScaleSec is an AWS Advanced Partner with Security Competency in Governance, Risk, and Compliance. We can help accelerate your SecOps maturity with our training, analysis, architecture, and engineering capabilities.