Battle of Policy as Code Tools: OPA vs. Semgrep
Automation is all the rage these days, especially when it comes to your cloud infrastructure. Terraform is one of the most popular Infrastructure as Code (IaC) tools. Many businesses use Terraform to manage not just their cloud infrastructure but also Kubernetes clusters, applications, and even their Domino's Pizza orders.
How can you keep IaC automated while also keeping it secure? Manual review is out of the question because it is error-prone and would break the automation flow. To keep things moving quickly, we also have to automate the security checks. This is where Policy as Code (PaC) comes in.
This article will use two popular PaC tools to check a Terraform resource for compliance with our security requirements. We will compare them through the lens of usability, simplicity, tooling, and performance. Here is the Terraform code we will be evaluating:
This Terraform code creates a GCP firewall rule. Since a firewall rule can dangerously expose traffic to the internet or other unwanted sources, we need guardrails to keep the firewall parameters compliant.
Open Policy Agent (OPA)
OPA policies are written in Rego, a declarative language built specifically to work with structured data such as JSON. In Rego, generally, every statement that isn’t a value assignment is a boolean condition. For a rule or function to return a value, all the statements in its body must evaluate to true. OPA also natively supports automated unit testing, so you can validate that your rule changes work as expected.
To illustrate how simple OPA can be for data structures, I wrote the same logic in both Python and Rego. While both accomplish the same task, notice how simple it is to read JSON, iterate over an array, evaluate a condition, and then print out the data if everything in the function evaluates to True.
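As a minimal sketch of what the Python side of such a comparison might look like: the input data, the field names, and the `insecure_servers` helper below are hypothetical and are not taken from the original listing. The sketch reads JSON, iterates over an array, evaluates a condition, and prints the entries that match.

```python
import json

# Hypothetical input: the field names and values are made up for illustration.
RAW = """
{
  "servers": [
    {"name": "web", "protocols": ["https", "ssh"]},
    {"name": "db", "protocols": ["mysql"]},
    {"name": "legacy", "protocols": ["telnet", "http"]}
  ]
}
"""

# Assumed set of protocols we want to flag.
INSECURE = {"telnet", "http"}


def insecure_servers(raw: str) -> list:
    """Return names of servers exposing at least one insecure protocol."""
    data = json.loads(raw)
    names = []
    for server in data["servers"]:
        if any(proto in INSECURE for proto in server["protocols"]):
            names.append(server["name"])
    return names


if __name__ == "__main__":
    for name in insecure_servers(RAW):
        print(name)
```

In Rego, the loop and the condition would typically collapse into a few lines of rule body, which is the readability difference this comparison is about.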
While OPA is less flexible than Python, it is very easy to supplement OPA with small Python scripts. For example, you can use a cloud SDK in Python to make API calls, write the JSON output to a file, and then use OPA to evaluate the file. This way, you can always use the right tool for the job.
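The workflow described here could be sketched roughly as follows. Everything in it is an assumption for illustration: `fetch_firewall_rules` is a stand-in that returns hard-coded data instead of calling a real cloud SDK, and the file name `firewalls.json` and the policy path `main.rego` are placeholders.

```python
import json
import shutil
import subprocess
from pathlib import Path


def fetch_firewall_rules() -> list:
    """Stand-in for a cloud SDK call; returns hard-coded, hypothetical data."""
    return [{"name": "ingress", "allowed": [{"protocol": "tcp", "ports": ["443"]}]}]


def write_input(rules: list, path: Path) -> Path:
    """Write the data as JSON so a policy tool can read it from disk."""
    path.write_text(json.dumps({"firewalls": rules}, indent=2))
    return path


def conftest_command(policy: str, input_file: Path) -> list:
    """Build a `conftest test` invocation for the given policy and input file."""
    return ["conftest", "test", "--policy", policy, str(input_file)]


if __name__ == "__main__":
    target = write_input(fetch_firewall_rules(), Path("firewalls.json"))
    command = conftest_command("main.rego", target)
    if shutil.which("conftest"):
        # A non-zero exit code means a check failed or conftest hit an error.
        print("exit code:", subprocess.run(command).returncode)
    else:
        print("conftest not found; would run:", " ".join(command))
```

Keeping the data collection in Python and the decision logic in Rego means each half can be tested on its own.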
To evaluate the Terraform code, we must first generate a plan as JSON:
# Initialize terraform
terraform init
# Generate a plan into a file
terraform plan --out tfplan.bin
# Transform the binary output into JSON
terraform show -json tfplan.bin > tfplan.json
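Before writing policy against the plan, it can help to see what it contains. The sketch below lists each entry under the plan's `resource_changes` key. `SAMPLE_PLAN` is a trimmed, hypothetical stand-in for a real `tfplan.json`, and `summarize` is an illustrative helper, not part of any tool.

```python
import json

# Trimmed, hypothetical plan document. Real output from `terraform show -json`
# contains many more fields; only the ones read below are included.
SAMPLE_PLAN = json.loads("""
{
  "resource_changes": [
    {
      "address": "google_compute_firewall.ingress",
      "type": "google_compute_firewall",
      "change": {"actions": ["create"], "after": {"project": "notmyproject"}}
    }
  ]
}
""")


def summarize(plan: dict) -> list:
    """Return (address, type, actions) for every resource change in the plan."""
    return [
        (rc["address"], rc["type"], tuple(rc["change"]["actions"]))
        for rc in plan.get("resource_changes", [])
    ]


if __name__ == "__main__":
    # To inspect a real plan, load tfplan.json with json.load() instead.
    for address, rtype, actions in summarize(SAMPLE_PLAN):
        print(address, rtype, ",".join(actions))
```

The planned attribute values that a policy would inspect sit under `change.after` for each resource.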
Let’s write a simple OPA package to evaluate this firewall rule against some common vulnerabilities:
Now, we run OPA against this plan using Conftest (a CLI wrapper for OPA which is especially useful in a pipeline):
conftest test --policy main.rego tfplan.json
You should see the evaluation fail with the following output:
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Project "notmyproject" is not allowed. Must be "myproject"
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Protocol "icmp" is not allowed. Must be tcp
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Port "22" is not allowed.
FAIL - tfplan.json - main - "google_compute_firewall.ingress": Port "80" is not allowed.
FAIL - tfplan.json - main - "google_compute_firewall.ingress": A service account must be used as a target.
FAIL - tfplan.json - main - google_compute_firewall.ingress: Provisioners are not allowed - Provisioner: "local-exec"
6 tests, 0 passed, 0 warnings, 6 failures, 0 exceptions
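To make the logic behind these messages concrete, here is a rough Python sketch of the first five checks. It is not the Rego policy used in this article. The allowed port set, the shape of `SAMPLE_AFTER` (which imitates the `change.after` values for the firewall resource), and the `violations` helper are all assumptions. The provisioner check is left out.

```python
# Assumptions (not from the article's policy): the allowed values below and the
# shape of SAMPLE_AFTER, which imitates `change.after` for the firewall resource.
ALLOWED_PROJECT = "myproject"
ALLOWED_PROTOCOL = "tcp"
ALLOWED_PORTS = {"443"}

SAMPLE_AFTER = {
    "project": "notmyproject",
    "allow": [
        {"protocol": "icmp", "ports": []},
        {"protocol": "tcp", "ports": ["22", "80"]},
    ],
    "target_service_accounts": None,
}


def violations(after: dict) -> list:
    """Return one message per rule that the planned firewall values break."""
    msgs = []
    project = after.get("project")
    if project != ALLOWED_PROJECT:
        msgs.append(f'Project "{project}" is not allowed. Must be "{ALLOWED_PROJECT}"')
    for block in after.get("allow") or []:
        protocol = block.get("protocol")
        if protocol != ALLOWED_PROTOCOL:
            msgs.append(f'Protocol "{protocol}" is not allowed. Must be {ALLOWED_PROTOCOL}')
        for port in block.get("ports") or []:
            if port not in ALLOWED_PORTS:
                msgs.append(f'Port "{port}" is not allowed.')
    if not after.get("target_service_accounts"):
        msgs.append("A service account must be used as a target.")
    return msgs


if __name__ == "__main__":
    for message in violations(SAMPLE_AFTER):
        print("FAIL -", message)
```

Each `if` here corresponds to one independent deny condition, which is the same way the failures are reported one per line above.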
Semgrep
Semgrep is a static analysis tool, loosely based on the ideas of grep. While it has explicit support for programming languages such as Python, it can also be used to pattern-match on any text. Because of this, we can use pattern-matching to create the same rules as we did in OPA.
Semgrep rules, unfortunately, are written in YAML, unlike OPA policies, which use a dedicated DSL. While YAML is simple and well-known, it is less expressive than a DSL and makes loops and other programmatic constructs quite difficult to write, if the engine supports them at all. On the other hand, because this engine uses pattern-matching on the source text, we don’t need to generate a JSON Terraform plan to evaluate rules.
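The idea of checking Terraform source text directly, with no plan step, can be illustrated with a plain regular expression. This is only an analogy and not how Semgrep's engine works. The Terraform snippet, the regex, and the `disallowed_projects` helper are hypothetical.

```python
import re

# Hypothetical Terraform source, made up for illustration.
TF_SOURCE = '''
resource "google_compute_firewall" "ingress" {
  name    = "ingress"
  project = "notmyproject"
}
'''

# Matches lines such as: project = "some-value"
PROJECT_RE = re.compile(r'^\s*project\s*=\s*"([^"]*)"', re.MULTILINE)


def disallowed_projects(source: str, allowed: str = "myproject") -> list:
    """Return every project value in the source text that is not the allowed one."""
    return [value for value in PROJECT_RE.findall(source) if value != allowed]


if __name__ == "__main__":
    for project in disallowed_projects(TF_SOURCE):
        print(f'Project "{project}" is not allowed.')
```

A text-level check like this only sees literal values. Anything supplied through a variable or computed at plan time is invisible to it, which is the trade-off for skipping plan generation.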
Let’s write the above OPA ruleset in Semgrep:
Running the above code will give us the same failures as OPA did.
I found Semgrep very frustrating and unintuitive, at least using the generic pattern-matching engine. Keep in mind that depending on how your Semgrep rules are written, you will need to put Terraform’s simple fields at the top and blocks at the bottom. Semgrep’s documentation outlines the syntax and has some examples for each function, but the examples are very simplistic. Third-party articles and tutorials are sparse, especially for Terraform. Meanwhile, OPA is simple to write (once you understand the DSL), the examples are detailed, and third-party tutorials are abundant.
Here are the average times of each tool over three runs:
❯ time conftest test --policy ../opa/main.rego tfplan.json 0.03s user 0.01s system
❯ time semgrep --config ../semgrep/main.yaml 0.39s user 0.12s system
By these measurements, Semgrep used more than ten times as much CPU time as OPA. While this is not a monumental absolute difference, we are only running six simple tests. A gap of that size is likely to be apparent in a production environment.
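The ratio can be checked from the reported numbers. The short sketch below copies the user and system CPU times from the two `time` outputs above. Note that these are CPU times, not wall-clock times, and that the helper names are illustrative.

```python
# CPU times copied from the two `time` outputs above (seconds).
OPA = {"user": 0.03, "system": 0.01}
SEMGREP = {"user": 0.39, "system": 0.12}


def cpu_total(times: dict) -> float:
    """Combined user and system CPU time."""
    return times["user"] + times["system"]


def ratio(slow: dict, fast: dict) -> float:
    """How many times more CPU time `slow` used than `fast`."""
    return cpu_total(slow) / cpu_total(fast)


if __name__ == "__main__":
    print(f"user-time ratio: {SEMGREP['user'] / OPA['user']:.1f}x")
    print(f"total CPU ratio: {ratio(SEMGREP, OPA):.1f}x")
```

Both ratios come out at roughly 13x, so "ten times" is, if anything, a slight understatement for this small sample.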
Clearly, OPA is the winner for usability and performance. Semgrep is a better choice when you need extremely simple static parsing. Overall, I would highly recommend using OPA over Semgrep for PaC.
In the near future, part two of this article will continue the battle and evaluate even more PaC tools.
The information presented in this article is accurate as of 6/17/2021. Follow the ScaleSec blog for new articles and updates.