Battle of Policy as Code Tools: OPA vs. Semgrep

Automation is all the rage these days, especially when it comes to your cloud infrastructure. Terraform is the most popular Infrastructure as Code (IaC) tool. Many businesses use Terraform to manage not just their cloud infrastructure but also Kubernetes clusters, applications, and even their Domino's Pizza orders.

How can you keep IaC automated while also keeping it secure? Manual review is out of the question because it is error-prone and would break the automation flow. To keep things moving quickly, we also have to automate the security checks. This is where Policy as Code (PaC) comes in.

This article looks at two popular PaC tools, Open Policy Agent (OPA) and Semgrep, and uses each to check a Terraform resource for compliance with our security requirements. We will compare them through the lens of usability, simplicity, tooling, and performance. Here is the Terraform code we will be evaluating:

	resource "google_compute_firewall" "ingress" {
	  name    = "ingress"
	  network = "default"

	  project     = "notmyproject"
	  target_tags = ["test"]

	  provisioner "local-exec" {
	    command = "echo 'bypass'"
	  }

	  allow {
	    protocol = "icmp"
	  }

	  allow {
	    protocol = "tcp"
	    ports    = ["80", "443"]
	  }

	  allow {
	    protocol = "tcp"
	    ports    = ["22"]
	  }
	}

This Terraform code creates a GCP firewall rule. Since a firewall rule can dangerously expose traffic to the internet or other unwanted sources, we need guardrails to keep the firewall parameters compliant.

Open Policy Agent (OPA)

OPA policies are written in Rego, a simple declarative language built specifically to work with JSON data structures. In Rego, generally, every statement that isn’t a value assignment is a boolean condition. For a rule or function to return a value, every statement in its body must evaluate to true. OPA also natively supports automated unit testing, so you can validate that your rule changes work as expected.

To illustrate how simple OPA can be for data structures, I wrote the same logic in both Python and Rego. Both accomplish the same task, but notice how little Rego needs in order to read JSON, iterate over an array, evaluate a condition, and emit a message when every statement in the rule body evaluates to true.

	import json

	# Read a JSON file and print the index of every value that is false
	with open("test.json", "r") as file:
	    json_input = json.load(file)

	for i, val in enumerate(json_input["somelist"]):
	    if val is False:
	        print(f"{i}: False")

And the same logic in Rego:

	package main

	# Deny for every element of somelist that is false
	deny[msg] {
	    val := input.somelist[i]
	    not val
	    # %v formats any value, including the boolean held in val
	    msg := sprintf("%d: %v", [i, val])
	}

While OPA is less flexible than Python, it is very easy to supplement OPA with small Python scripts. For example, you can use a cloud SDK in Python to make API calls, write the JSON output to a file, and then use OPA to evaluate the file. This way, you can always use the right tool for the job.
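As a rough sketch of that workflow, the script below collects some data and writes it to a JSON file that a policy could then read as `input`. The `fetch_firewall_rules` function, the `firewall_rules` key, and the `firewalls.json` filename are all hypothetical; the function returns hard-coded sample data where a real script would call a cloud SDK.

```python
import json
from pathlib import Path


def fetch_firewall_rules():
    """Hypothetical stand-in for a cloud SDK call; returns hard-coded sample data."""
    return [
        {"name": "ingress", "protocol": "tcp", "ports": ["22"]},
        {"name": "web", "protocol": "tcp", "ports": ["443"]},
    ]


def export_for_opa(path):
    """Write the collected data as JSON so a policy can evaluate it as `input`."""
    document = {"firewall_rules": fetch_firewall_rules()}
    Path(path).write_text(json.dumps(document, indent=2))
    return document


if __name__ == "__main__":
    export_for_opa("firewalls.json")
```

The resulting file can then be evaluated like any other JSON document, provided the policy is written against this particular shape rather than against a Terraform plan.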

To evaluate the Terraform code, we must first generate a plan as JSON:

# Initialize terraform
terraform init

# Generate a plan into a file
terraform plan --out tfplan.bin

# Transform the binary output into JSON
terraform show -json tfplan.bin > tfplan.json
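Before writing policy against the plan, it can help to see which parts of the JSON matter. The sketch below walks `resource_changes` and reads each resource's `change.after` values. `SAMPLE_PLAN` is a hand-written, heavily trimmed stand-in for `tfplan.json` (an assumption, not real Terraform output), and `summarize` is a hypothetical helper.

```python
import json

# Hand-written, heavily trimmed stand-in for tfplan.json (an assumption, not real output).
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {
      "address": "google_compute_firewall.ingress",
      "type": "google_compute_firewall",
      "change": {
        "actions": ["create"],
        "after": {
          "project": "notmyproject",
          "allow": [
            {"protocol": "icmp", "ports": []},
            {"protocol": "tcp", "ports": ["80", "443"]},
            {"protocol": "tcp", "ports": ["22"]}
          ]
        }
      }
    }
  ]
}
""")


def summarize(plan):
    """Return (address, project, opened ports) for every resource change."""
    rows = []
    for rc in plan.get("resource_changes", []):
        after = rc["change"].get("after") or {}
        ports = [p for block in after.get("allow", []) for p in block.get("ports") or []]
        rows.append((rc["address"], after.get("project"), ports))
    return rows


for row in summarize(SAMPLE_PLAN):
    print(row)
```

To inspect a real plan, replace `SAMPLE_PLAN` with the result of `json.load` on `tfplan.json`.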

Let’s write a simple OPA package to evaluate this firewall rule against some common vulnerabilities:

	package main

	import input as tfplan

	# Restrict all resources to one project
	required_project = "myproject"

	# Ban ports
	banned_ports = ["80", "22"]

	# check if array contains element
	array_contains(arr, elem) {
	    # iterate over arr, assert it contains elem
	    arr[_] = elem
	}

	# Deny if project does not match
	deny[msg] {
	    resource := tfplan.resource_changes[_]

	    project_id := resource.change.after.project
	    not project_id == required_project

	    msg := sprintf("%q: Project %q is not allowed. Must be %q", [resource.address, project_id, required_project])
	}

	# Block protocols that aren't TCP
	deny[msg] {
	    resource := tfplan.resource_changes[_]

	    allow := resource.change.after.allow[_]

	    not allow.protocol == "tcp"

	    msg := sprintf("%q: Protocol %q is not allowed. Must be tcp", [resource.address, allow.protocol])
	}

	# Block banned ports
	deny[msg] {
	    resource := tfplan.resource_changes[_]

	    allow := resource.change.after.allow[_]
	    port := allow.ports[_]

	    array_contains(banned_ports, port)

	    msg := sprintf("%q: Port %q is not allowed.", [resource.address, port])
	}

	# Require targeting a service account
	deny[msg] {
	    resource := tfplan.resource_changes[_]

	    accounts := resource.change.after.target_service_accounts

	    accounts == null

	    msg := sprintf("%q: A service account must be used as a target.", [resource.address])
	}

	# block provisioners
	deny[msg] {
	    config := tfplan.configuration[_]
	    # root_module := config.root_module[_]
	    resources := config.resources[_]

	    provisioners := resources.provisioners[_]
	    # Check if provisioners is true
	    provisioners

	    msg := sprintf("%s: Provisioners are not allowed - Provisioner: %q", [resources.address, provisioners.type])
	}

Now, we run OPA against this plan using Conftest (a CLI wrapper for OPA which is especially useful in a pipeline):

conftest test --policy main.rego tfplan.json

You should see the evaluation fail with the following output:

FAIL - tfplan.json - main - "google_compute_firewall.ingress": Project "notmyproject" is not allowed. Must be "myproject"

FAIL - tfplan.json - main - "google_compute_firewall.ingress": Protocol "icmp" is not allowed. Must be tcp

FAIL - tfplan.json - main - "google_compute_firewall.ingress": Port "22" is not allowed.

FAIL - tfplan.json - main - "google_compute_firewall.ingress": Port "80" is not allowed.

FAIL - tfplan.json - main - "google_compute_firewall.ingress": A service account must be used as a target.

FAIL - tfplan.json - main - google_compute_firewall.ingress: Provisioners are not allowed - Provisioner: "local-exec"

6 tests, 0 passed, 0 warnings, 6 failures, 0 exceptions
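In a pipeline, checking the command's exit status is normally enough to fail the build. If you also want the counters for reporting, a minimal sketch like the one below can pull them out of the summary line. It assumes the summary keeps the wording shown above; `parse_summary` is a hypothetical helper, not part of Conftest.

```python
import re

# Matches the final summary line, assuming the wording shown in the output above.
SUMMARY = re.compile(
    r"(\d+) tests?, (\d+) passed, (\d+) warnings?, (\d+) failures?, (\d+) exceptions?"
)


def parse_summary(output):
    """Extract the counters from the summary line, or return None if it is absent."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    keys = ("tests", "passed", "warnings", "failures", "exceptions")
    return dict(zip(keys, map(int, match.groups())))


result = parse_summary("6 tests, 0 passed, 0 warnings, 6 failures, 0 exceptions")
print(result)
```

Parsing human-readable output is brittle, so treat this as a convenience for dashboards rather than as the gate itself.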

Semgrep

Semgrep is a static analysis tool, loosely based on the ideas of grep. While it has explicit support for programming languages such as Python, it can also be used to pattern-match on any text. Because of this, we can use pattern-matching to create the same rules as we did in OPA.

Semgrep rules, unfortunately, are written in YAML, whereas OPA uses a dedicated DSL. While YAML is simple and well-known, it is less expressive than a DSL, and it makes loops and other programmatic constructs quite difficult to write, if the engine supports them at all. On the other hand, because the engine pattern-matches on the source text, we don’t need to generate a JSON Terraform plan to evaluate rules.

Let’s write the above OPA ruleset in Semgrep:

	rules:
	  - id: wrong-project
	    patterns:
	      - pattern-inside: resource "google_compute_firewall" "..." {...}
	      - pattern-inside: project="..."
	      - pattern-not: project = "myproject"
	    languages:
	      - generic
	    paths:
	      include:
	        - 'main.tf'
	    message: |
	      Firewall rule must use myproject as the target project.
	    severity: ERROR

	  - id: banned-protocol
	    patterns:
	      - pattern-inside: allow { ... }
	      - pattern-not-inside: allow { ... protocol = "tcp" ... }
	    languages:
	      - generic
	    paths:
	      include:
	        - 'main.tf'
	    message: |
	      Firewall rule must use TCP as protocol
	    severity: ERROR

	  - id: banned-80
	    patterns:
	      - pattern-inside: allow { ... }
	      - pattern: "80"
	    languages:
	      - generic
	    paths:
	      include:
	        - 'main.tf'
	    message: |
	      Firewall rule must not allow port 80
	    severity: ERROR

	  - id: banned-22
	    patterns:
	      - pattern-inside: allow { ... }
	      - pattern: "22"
	    languages:
	      - generic
	    paths:
	      include:
	        - 'main.tf'
	    message: |
	      Firewall rule must not allow port 22
	    severity: ERROR

	  - id: no-serviceaccount
	    patterns:
	      - pattern-inside: resource "google_compute_firewall" "..." {...}
	      - pattern-not-inside: resource "google_compute_firewall" "..." {... target_service_accounts=[...] ... }
	    languages:
	      - generic
	    paths:
	      include:
	        - 'main.tf'
	    message: |
	      Firewall rule must use a service account as the target.
	    severity: ERROR

	  - id: no-provisioners
	    patterns:
	      - pattern-inside: provisioner "..."
	    languages:
	      - generic
	    paths:
	      include:
	        - 'main.tf'
	    message: |
	      Provisioners are not allowed.
	    severity: ERROR

Running these rules against main.tf gives us the same failures as OPA did.

Conclusion

Usability

I found Semgrep very frustrating and unintuitive, at least using the generic pattern-matching engine. Keep in mind that depending on how your Semgrep rules are written, you will need to put Terraform’s simple fields at the top and blocks at the bottom. Semgrep’s documentation outlines the syntax and has some examples for each function, but the examples are very simplistic. Third-party articles and tutorials are sparse, especially for Terraform. Meanwhile, OPA is simple to write (once you understand the DSL), the examples are detailed, and third-party tutorials are abundant.

Performance

Here are the average times of each tool over three runs:

OPA/Conftest

❯ time conftest test --policy ../opa/main.rego tfplan.json

0.03s user 0.01s system

Semgrep

❯ time semgrep --config ../semgrep/main.yaml

0.39s user 0.12s system

Semgrep is roughly ten times slower than OPA. The absolute difference is small, but we are only running six simple tests. A ten-fold performance difference will be apparent in a production environment.
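As a quick sanity check on that ratio, the sketch below adds the user and system times reported above and divides one total by the other. It compares CPU time only, since the output shown does not include wall-clock time; `cpu_seconds` is a hypothetical helper.

```python
def cpu_seconds(user, system):
    """Total CPU time for one run: user time plus system time, in seconds."""
    return user + system


# Timings copied from the runs above.
opa = cpu_seconds(0.03, 0.01)
semgrep = cpu_seconds(0.39, 0.12)

# How many times more CPU time Semgrep used than OPA/Conftest.
ratio = semgrep / opa
print(f"Semgrep used about {ratio:.0f}x the CPU time of OPA/Conftest")
```

With timings this small, a ratio computed from two-decimal figures is only a rough indication, which is why the comparison is stated as an order of magnitude.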

Clearly, OPA is the winner for usability and performance. Semgrep is a better choice when you need extremely simple static parsing. Overall, I would highly recommend using OPA over Semgrep for PaC.

In the near future, part two of this article will continue the battle and evaluate even more PaC tools.
