Automation is all the rage these days, especially when it comes to your cloud infrastructure. Terraform is one of the most popular Infrastructure as Code (IaC) tools. Many businesses use Terraform to manage not just their cloud infrastructure but also Kubernetes clusters, applications, and even their Domino’s Pizza orders.
How can you keep IaC automated while also keeping it secure? Manual review is out of the question because it is error-prone and would break the automation flow. To keep things moving quickly, we also have to automate the security checks. This is where Policy as Code (PaC) comes in.
This article looks at two popular PaC tools, Open Policy Agent (OPA) and Semgrep, and uses each to check a Terraform resource for compliance with our security requirements. We will compare them through the lens of usability, simplicity, tooling, and performance. Here is the Terraform code we will be evaluating:
resource "google_compute_firewall" "ingress" {
  name        = "ingress"
  network     = "default"
  project     = "notmyproject"
  target_tags = ["test"]
  provisioner "local-exec" {
    command = "echo 'bypass'"
  }
  allow {
    protocol = "icmp"
  }
  allow {
    protocol = "tcp"
    ports    = ["80", "443"]
  }
  allow {
    protocol = "tcp"
    ports    = ["22"]
  }
}
This Terraform code creates a GCP firewall rule. Because a misconfigured firewall rule can expose services to the internet or other unwanted sources, we need guardrails to keep the firewall parameters compliant.
OPA policies are written in Rego, a declarative language built specifically for querying JSON-like data structures. In Rego, generally, every statement that isn’t an assignment is a boolean condition, and a rule only produces a value when every statement in its body evaluates to true. OPA also natively supports unit testing (via opa test), so you can validate that your rule changes work as expected.
To illustrate how simple OPA can be for data structures, I wrote the same logic in both Python and Rego. While both accomplish the same task, notice how little Rego needs to read the JSON input, iterate over an array, evaluate a condition, and then emit a message when every statement in the rule evaluates to true.
import json
# Read test.json and print the index of every False value in "somelist"
with open("test.json", "r") as file:
    json_input = json.load(file)
for i, val in enumerate(json_input["somelist"]):
    if val is False:
        print(f"{i}: False")
package main
# Deny for every false value in somelist
deny[msg] {
    val := input.somelist[i]
    not val
    msg := sprintf("%d: %s", [i, val])
}
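A quick way to check that the Python version behaves as described is to run its logic on a small input. The sketch below wraps the loop in a function (find_false_entries is a name introduced here, not part of the original) and feeds it a hypothetical stand-in for test.json.

```python
import json


def find_false_entries(doc: dict) -> list[str]:
    """Return an "index: False" string for every False value in doc["somelist"]."""
    return [f"{i}: False" for i, val in enumerate(doc["somelist"]) if val is False]


# Hypothetical sample input standing in for the contents of test.json.
sample = json.loads('{"somelist": [true, false, true, false]}')

for line in find_false_entries(sample):
    print(line)
```

The Rego rule would produce one deny message for each of the same indexes (1 and 3), though its message text is formatted by sprintf rather than an f-string.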
While OPA is less flexible than Python, it is very easy to supplement OPA with small Python scripts. For example, you can use a cloud SDK in Python to make API calls, write the JSON output to a file, and then use OPA to evaluate the file. This way, you can always use the right tool for the job.
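As a rough sketch of that workflow, the Python below writes data to a JSON file that OPA (or Conftest) can then evaluate. The fetch_firewall_rules function is hypothetical: it returns hard-coded sample records where a real script would call your cloud provider’s SDK. The top-level "firewall_rules" key is also an assumption, chosen so a policy could read input.firewall_rules.

```python
import json
from pathlib import Path


def fetch_firewall_rules() -> list[dict]:
    """Hypothetical stand-in for a cloud SDK call; returns hard-coded sample data.

    A real script would call the provider's SDK here and reduce the response
    to plain dicts and lists that json.dumps can serialize.
    """
    return [
        {"name": "ingress", "project": "notmyproject", "ports": ["80", "443"]},
        {"name": "ssh", "project": "myproject", "ports": ["22"]},
    ]


def write_opa_input(records: list[dict], path: Path) -> Path:
    """Write records under a top-level "firewall_rules" key as pretty-printed JSON."""
    path.write_text(json.dumps({"firewall_rules": records}, indent=2))
    return path
```

Calling write_opa_input(fetch_firewall_rules(), Path("firewall_rules.json")) produces a file you could then check with something like conftest test --policy firewall.rego firewall_rules.json, where firewall.rego is a hypothetical policy written against input.firewall_rules. Keeping the SDK code and the policy in separate files means each can be tested on its own.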
To evaluate the Terraform code, we must first generate a plan as JSON:
# Initialize terraform
terraform init
# Generate a plan into a file
terraform plan -out=tfplan.bin
# Transform the binary output into JSON
terraform show -json tfplan.bin > tfplan.json
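Before writing the policy, it helps to know where the values live in tfplan.json. Each planned change appears under resource_changes, with an address and a change.after object holding the planned attribute values; that is the path the OPA rules below read. A minimal Python sketch for peeking at it follows. The SAMPLE dict is a hand-trimmed stand-in, not real plan output, which contains many more keys.

```python
import json


def planned_values(plan: dict) -> dict[str, dict]:
    """Map each resource address in a JSON plan to its planned ("after") attribute values."""
    return {
        # "after" is null for resources being destroyed, so fall back to an empty dict.
        rc["address"]: rc["change"].get("after") or {}
        for rc in plan.get("resource_changes", [])
    }


def load_plan(path: str) -> dict:
    """Read the JSON produced by `terraform show -json tfplan.bin > tfplan.json`."""
    with open(path, "r") as f:
        return json.load(f)


# Tiny hand-written stand-in for tfplan.json.
SAMPLE = {
    "resource_changes": [{
        "address": "google_compute_firewall.ingress",
        "change": {"actions": ["create"],
                   "after": {"name": "ingress", "project": "notmyproject"}},
    }]
}

for address, after in planned_values(SAMPLE).items():
    print(address, "->", after.get("project"))
```

Against a real plan you would call planned_values(load_plan("tfplan.json")) instead of using SAMPLE.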
Let’s write a simple OPA package that evaluates this firewall rule against a few common misconfigurations:
package main
import input as tfplan
# Restrict all resources to one project
required_project = "myproject"
# Ports that must never be opened
banned_ports = ["80", "22"]
# True if arr contains elem
array_contains(arr, elem) {
    # Iterate over arr and assert that some item equals elem
    arr[_] = elem
}
# Deny if project does not match
deny[msg] {
    resource := tfplan.resource_changes[_]
    project_id := resource.change.after.project
    not project_id == required_project
    msg := sprintf("%q: Project %q is not allowed. Must be %q", [resource.address, project_id, required_project])
}
# Block protocols that aren't TCP
deny[msg] {
    resource := tfplan.resource_changes[_]
    allow := resource.change.after.allow[_]
    not allow.protocol == "tcp"
    msg := sprintf("%q: Protocol %q is not allowed. Must be tcp", [resource.address, allow.protocol])
}
# Block banned ports
deny[msg] {
    resource := tfplan.resource_changes[_]
    allow := resource.change.after.allow[_]
    port := allow.ports[_]
    array_contains(banned_ports, port)
    msg := sprintf("%q: Port %q is not allowed.", [resource.address, port])
}
# Require targeting a service account
deny[msg] {
    resource := tfplan.resource_changes[_]
    accounts := resource.change.after.target_service_accounts
    accounts == null
    msg := sprintf("%q: A service account must be used as a target.", [resource.address])
}
# Block provisioners (they only appear in the plan's configuration section)
deny[msg] {
    resource := tfplan.configuration.root_module.resources[_]
    provisioner := resource.provisioners[_]
    msg := sprintf("%s: Provisioners are not allowed - Provisioner: %q", [resource.address, provisioner.type])
}
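For readers more comfortable in Python, the checks in the policy above can be approximated as follows. This is a sketch for comparison, not a replacement for the Rego policy: the EXAMPLE_PLAN dict is a hypothetical, heavily trimmed plan shaped like terraform show -json output for this article’s firewall rule, and json.dumps is used as a stand-in for Rego’s %q quoting, which agrees only for simple strings. Note that real plans may list the allow blocks in a different order, so message order can vary.

```python
import json

# Values copied from the Rego policy above.
REQUIRED_PROJECT = "myproject"
BANNED_PORTS = {"80", "22"}


def q(value) -> str:
    """Quote a value roughly the way Rego's %q verb does (simple strings only)."""
    return json.dumps(value)


def deny(plan: dict) -> list[str]:
    """Approximate the Rego deny rules above against a parsed tfplan.json."""
    msgs = []
    for rc in plan.get("resource_changes", []):
        addr = rc["address"]
        # "after" is null for deletions; an empty dict makes every check skip.
        after = rc.get("change", {}).get("after") or {}

        # Deny if project does not match
        if "project" in after and after["project"] != REQUIRED_PROJECT:
            msgs.append(f"{q(addr)}: Project {q(after['project'])} is not allowed. "
                        f"Must be {q(REQUIRED_PROJECT)}")

        for rule in after.get("allow") or []:
            # Block protocols that aren't TCP
            if "protocol" in rule and rule["protocol"] != "tcp":
                msgs.append(f"{q(addr)}: Protocol {q(rule['protocol'])} is not allowed. Must be tcp")
            # Block banned ports
            for port in rule.get("ports") or []:
                if port in BANNED_PORTS:
                    msgs.append(f"{q(addr)}: Port {q(port)} is not allowed.")

        # Require targeting a service account (attribute present but null)
        if "target_service_accounts" in after and after["target_service_accounts"] is None:
            msgs.append(f"{q(addr)}: A service account must be used as a target.")

    # Block provisioners, which only appear in the configuration section of the plan
    root = plan.get("configuration", {}).get("root_module", {})
    for res in root.get("resources", []):
        for prov in res.get("provisioners", []):
            msgs.append(f"{res['address']}: Provisioners are not allowed - "
                        f"Provisioner: {q(prov['type'])}")
    return msgs


# Hypothetical, heavily trimmed plan for the firewall rule in this article.
EXAMPLE_PLAN = {
    "resource_changes": [{
        "address": "google_compute_firewall.ingress",
        "change": {"actions": ["create"], "after": {
            "project": "notmyproject",
            "target_service_accounts": None,
            "allow": [
                {"protocol": "icmp", "ports": []},
                {"protocol": "tcp", "ports": ["80", "443"]},
                {"protocol": "tcp", "ports": ["22"]},
            ],
        }},
    }],
    "configuration": {"root_module": {"resources": [{
        "address": "google_compute_firewall.ingress",
        "provisioners": [{"type": "local-exec"}],
    }]}},
}

for msg in deny(EXAMPLE_PLAN):
    print("FAIL -", msg)
```

The Python needs explicit loops and key-existence checks that Rego handles implicitly: in Rego, a reference to a missing key is simply undefined, and the rule quietly produces nothing.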
Now, we run OPA against this plan using Conftest (a CLI wrapper for OPA which is especially useful in a pipeline):
conftest test --policy main.rego tfplan.json
You should see the evaluation fail with the following output:
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Project "notmyproject" is not allowed. Must be "myproject"
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Protocol "icmp" is not allowed. Must be tcp
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Port "22" is not allowed.
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Port "80" is not allowed.
FAIL - tfplan.json - main - "google_compute_firewall.ingress": A service account must be used as a target.
FAIL - tfplan.json - main - google_compute_firewall.ingress: Provisioners are not allowed - Provisioner: "local-exec"
6 tests, 0 passed, 0 warnings, 6 failures, 0 exceptions
Semgrep is a static analysis tool, loosely based on the ideas of grep. While it has explicit support for programming languages such as Python, it can also be used to pattern-match on any text. Because of this, we can use pattern-matching to create the same rules as we did in OPA.
Semgrep rules, unfortunately, are written in YAML, unlike OPA policies, which use a dedicated DSL. While YAML is simple and well known, it is far less expressive than a DSL: loops and other programmatic constructs are difficult to write, if the engine supports them at all. On the other hand, because Semgrep pattern-matches against the source files directly, we don’t need to generate a JSON Terraform plan before evaluating the rules.
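Semgrep’s engine is far more capable than a regular expression, but the core idea (matching patterns in the raw source rather than in an evaluated plan) can be sketched in a few lines of Python. To be clear, this is plain re-based matching, not Semgrep, and SAMPLE_TF is trimmed from the Terraform example above. It also hints at the trade-off: a text pattern knows nothing about HCL structure, so it could, for instance, match a provisioner header that appears inside a heredoc string.

```python
import re

# Matches a provisioner block header such as:  provisioner "local-exec" {
# [ \t]* (not \s*) keeps the match from starting on an earlier blank line.
PROVISIONER_RE = re.compile(r'^[ \t]*provisioner\s+"([^"]+)"', re.MULTILINE)


def find_provisioners(hcl_source: str) -> list[tuple[int, str]]:
    """Return (line_number, provisioner_type) for each provisioner header in raw HCL text."""
    results = []
    for m in PROVISIONER_RE.finditer(hcl_source):
        # 1-based line number: count newlines before the match start.
        line_no = hcl_source.count("\n", 0, m.start()) + 1
        results.append((line_no, m.group(1)))
    return results


SAMPLE_TF = '''\
resource "google_compute_firewall" "ingress" {
  name    = "ingress"
  network = "default"
  provisioner "local-exec" {
    command = "echo 'bypass'"
  }
}
'''

for line_no, kind in find_provisioners(SAMPLE_TF):
    print(f"main.tf:{line_no}: provisioner {kind!r} is not allowed")
```

No terraform init or plan is needed, which is the main appeal of this approach; the cost is that the match depends on how the file is written rather than on what Terraform will actually do.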
Let’s write the above OPA ruleset in Semgrep:
rules:
  - id: wrong-project
    patterns:
      - pattern-inside: resource "google_compute_firewall" "..." {...}
      - pattern-inside: project="..."
      - pattern-not: project = "myproject"
    languages:
      - generic
    paths:
      include:
        - 'main.tf'
    message: |
      Firewall rule must use myproject as the target project.
    severity: ERROR
  - id: banned-protocol
    patterns:
      - pattern-inside: allow { ... }
      - pattern-not-inside: allow { ... protocol = "tcp" ... }
    languages:
      - generic
    paths:
      include:
        - 'main.tf'
    message: |
      Firewall rule must use TCP as protocol
    severity: ERROR
  - id: banned-80
    patterns:
      - pattern-inside: allow { ... }
      - pattern: "80"
    languages:
      - generic
    paths:
      include:
        - 'main.tf'
    message: |
      Firewall rule must not allow port 80
    severity: ERROR
  - id: banned-22
    patterns:
      - pattern-inside: allow { ... }
      - pattern: "22"
    languages:
      - generic
    paths:
      include:
        - 'main.tf'
    message: |
      Firewall rule must not allow port 22
    severity: ERROR
  - id: no-serviceaccount
    patterns:
      - pattern-inside: resource "google_compute_firewall" "..." {...}
      - pattern-not-inside: resource "google_compute_firewall" "..." {... target_service_accounts=[...] ... }
    languages:
      - generic
    paths:
      include:
        - 'main.tf'
    message: |
      Firewall rule must use a service account as the target.
    severity: ERROR
  - id: no-provisioners
    patterns:
      - pattern-inside: provisioner "..."
    languages:
      - generic
    paths:
      include:
        - 'main.tf'
    message: |
      Provisioners are not allowed.
    severity: ERROR
Running these rules against main.tf gives us the same failures as OPA did.
I found Semgrep very frustrating and unintuitive, at least with the generic pattern-matching engine. For example, depending on how your rules are written, you will need to order the Terraform code with simple attributes at the top of each resource and nested blocks at the bottom for the patterns to match. Semgrep’s documentation outlines the syntax and includes an example for each operator, but the examples are very simplistic, and third-party articles and tutorials are sparse, especially for Terraform. OPA, by contrast, is simple to write once you understand the DSL, its examples are detailed, and third-party tutorials are abundant.
Here are the average times of each tool over three runs:
OPA/Conftest
❯ time conftest test --policy ../opa/main.rego tfplan.json
0.03s user 0.01s system
Semgrep
❯ time semgrep --config ../semgrep/main.yaml
0.39s user 0.12s system
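Semgrep used roughly 13 times as much CPU time as OPA (about 0.51s versus 0.04s of combined user and system time). While the absolute difference is small, we are only running six simple checks against a single resource; a gap of this size could become noticeable in a production pipeline with many more rules and files.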
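The ratio behind that claim is simple arithmetic on the timings above, sketched here in Python. Only user and system CPU time were reported, so this says nothing about wall-clock time, and with three runs of such short commands the numbers are rough.

```python
# CPU times in seconds, copied from the `time` output above (wall-clock time was not shown).
opa = {"user": 0.03, "system": 0.01}
semgrep = {"user": 0.39, "system": 0.12}


def cpu_total(t: dict) -> float:
    """Sum user and system CPU time."""
    return t["user"] + t["system"]


total_ratio = cpu_total(semgrep) / cpu_total(opa)   # 0.51 / 0.04
user_ratio = semgrep["user"] / opa["user"]          # 0.39 / 0.03
print(f"combined CPU ratio: {total_ratio:.1f}x, user-only ratio: {user_ratio:.1f}x")
```

Either way you slice it, the gap is a bit more than an order of magnitude.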
Clearly, OPA is the winner for usability and performance. Semgrep is a better choice when you need extremely simple static parsing. Overall, I would highly recommend using OPA over Semgrep for PaC.
In the near future, part two of this article will continue the battle and evaluate even more PaC tools.