Henry Phillips · Jun 18, 2025 · 5 min read

Jules and the Rise of Agentic AI

What if your AI coding assistant did more than autocomplete or respond to prompts? What if it could take entire tickets, write the code, run the tests, and push the changes to a branch in your repository? That’s no longer a hypothetical. That’s what it looks like to work with Jules, Google's agentic AI for software development. The real question is: can it truly hold its own alongside human engineers?

When I first heard about Jules, I imagined a near-autonomous engineering partner that could quietly manage tasks, produce its own code, and free up time for humans to focus on high-level design. I put that idea into practice to see how well Jules performs in real-world, secure software engineering tasks.

Meet Jules: Google’s Agentic AI for Developers

Jules is an experimental automated coding agent developed by Google that helps developers fix bugs, add documentation, and build new features. Unlike traditional autocomplete tools, Jules integrates directly with GitHub, understands your codebase, and operates asynchronously, allowing you to continue working while it executes tasks in the background.

Once connected to your GitHub account, Jules clones your repository into a secure virtual machine, installs dependencies, and starts modifying files according to your prompt. The workflow starts with a plan. You describe a task in natural language, and Jules responds with an implementation strategy for your approval. If accepted, it proceeds to edit your code autonomously and, with permission, pushes the changes to a new branch.

Unlike traditional chat-based LLMs, Jules excels at managing full tasks within structured repositories. You choose the repo, describe the objective, and Jules gets to work. This level of integration and task planning sets it apart from prompt-only tools like Gemini and Claude, highlighting its potential in advancing agentic AI within real-world software development workflows.

Putting Jules to the Test

To evaluate Jules, I assigned a real-world engineering task: write unit tests for a specific Go file that interacts with Google Cloud’s Secret Manager. This was not a simple code generation exercise. It required mocking external dependencies, abstracting interfaces, and following idiomatic Go practices. These are the kinds of secure coding challenges that engineers face every day.

The initial implementation from Jules fell short. It included incorrect imports, argument mismatches, and an unexported interface that caused visibility issues across packages. I offered inline feedback, and Jules responded with a revised commit that addressed some, but not all, of the concerns. Despite re-prompting, the final implementation still required manual intervention, reinforcing that AI development tools like Jules can scaffold, but not yet finalize, non-trivial work without oversight.
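
To give a sense of the pattern the task called for, here is a minimal sketch of the shape the fix took once I intervened: an exported interface that abstracts the Secret Manager client so unit tests can inject a mock instead of a live client. The package, type, and function names below are my own illustration rather than code from the actual repository, and the method signature assumes a recent version of the `cloud.google.com/go/secretmanager/apiv1` client.

```go
package secrets

import (
	"context"

	secretmanagerpb "cloud.google.com/go/secretmanager/apiv1/secretmanagerpb"
	gax "github.com/googleapis/gax-go/v2"
)

// SecretAccessor is exported so that other packages, including test packages,
// can implement it. The real *secretmanager.Client satisfies this interface.
type SecretAccessor interface {
	AccessSecretVersion(ctx context.Context, req *secretmanagerpb.AccessSecretVersionRequest, opts ...gax.CallOption) (*secretmanagerpb.AccessSecretVersionResponse, error)
}

// FetchSecret reads a secret payload through whichever accessor is injected,
// which keeps the function unit-testable without network access.
func FetchSecret(ctx context.Context, client SecretAccessor, name string) ([]byte, error) {
	resp, err := client.AccessSecretVersion(ctx, &secretmanagerpb.AccessSecretVersionRequest{Name: name})
	if err != nil {
		return nil, err
	}
	return resp.GetPayload().GetData(), nil
}

// fakeAccessor is a hand-rolled test double that returns canned responses.
type fakeAccessor struct {
	resp *secretmanagerpb.AccessSecretVersionResponse
	err  error
}

func (f *fakeAccessor) AccessSecretVersion(_ context.Context, _ *secretmanagerpb.AccessSecretVersionRequest, _ ...gax.CallOption) (*secretmanagerpb.AccessSecretVersionResponse, error) {
	return f.resp, f.err
}
```

With the interface exported, a test can construct a `fakeAccessor`, hand it to `FetchSecret`, and assert on the returned bytes without ever touching Google Cloud.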


Bootstrapping with Jules

Aside from modifying existing code, I wanted to see how Jules would perform with no scaffolding at all. I created a completely empty repository and asked Jules to build a command line interface tool from scratch. The tool was meant to recursively scan a given directory and output file metadata in JSON or CSV format. Jules responded with its typical task plan, outlining a clean file structure and scaffolding unit tests and a sample `README`.
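
For context, here is a condensed sketch of the kind of tool I had in mind, trimmed to JSON output only. The `-dir` flag, the output fields, and the structure are illustrative assumptions on my part, not Jules’s actual scaffolding.

```go
package main

import (
	"encoding/json"
	"flag"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"time"
)

// fileInfo is the metadata record emitted for each file found during the walk.
type fileInfo struct {
	Path     string `json:"path"`
	SizeB    int64  `json:"size_bytes"`
	Modified string `json:"modified"`
}

func main() {
	root := flag.String("dir", ".", "directory to scan recursively")
	flag.Parse()

	var results []fileInfo
	err := filepath.WalkDir(*root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			return err
		}
		results = append(results, fileInfo{
			Path:     path,
			SizeB:    info.Size(),
			Modified: info.ModTime().UTC().Format(time.RFC3339),
		})
		return nil
	})
	if err != nil {
		log.Fatal(err)
	}

	// Emit the collected metadata as pretty-printed JSON on stdout.
	enc := json.NewEncoder(os.Stdout)
	enc.SetIndent("", "  ")
	if err := enc.Encode(results); err != nil {
		log.Fatal(err)
	}
}
```

Running something like `go run . -dir ./testdata` prints an indented JSON array of paths, sizes, and modification times.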

The initial output wasn’t perfect. It skipped generating a requested folder for test data and introduced a few syntax errors. After I re-prompted, it acknowledged the issues and began addressing them in follow-up iterations. What surprised me most was watching Jules run `go build` and `go test` commands in its environment, interpret the resulting errors, and adjust code accordingly. It repeated this cycle until the console output was free of errors.

While the process was not flawless, it demonstrated something meaningful. Jules is not simply generating code based on a prompt. It is actively engaging with the secure development environment, responding to feedback, and attempting to validate its own work. This kind of iterative behavior suggests a future where AI engineering tools become active contributors to software development workflows.

Where Jules Excels and Falls Short

Jules demonstrated strengths in:

  • Applying strong unit test mocking patterns
  • Adhering to common Go conventions
  • Writing clean commit messages and PR summaries

But limitations were also evident:

  • Required help with basic concepts like exporting interfaces
  • Occasionally included unused imports or referenced non-existent libraries
  • Struggled with nuanced judgment on naming and code structure

While Jules is highly capable at executing tasks, it lacks the engineering intuition and secure coding judgment to identify architectural flaws or evaluate trade-offs. It often falters when abstraction is introduced—an area where human developers still have the edge.

Implications for Teams and Engineers

From task assignment to PR submission, each exercise took less than 30 minutes. That is significantly faster than a human engineer would typically complete a task of similar complexity.

Jules performs best in environments with strong CI/CD pipelines, well-defined conventions, and thorough test coverage. In these contexts, AI-powered development workflows gain another layer of validation, helping catch mistakes that human review might miss.

That being said, speed should not come at the expense of trust. Agentic AI in software engineering is not a substitute for oversight, and it should never be left to operate unsupervised in production workflows.

For teams:

  • Use agentic AI for narrowly scoped, boilerplate-heavy tasks
  • Always pair it with human reviewers and code owners
  • Implement feedback loops and enforce PR quality gates
  • Treat AI output as a draft, not a deployment-ready artifact

For engineers:

  • Think of agentic AI as an eager junior teammate
  • Offload repetitive tasks, but review and refine every submission
  • Use it to accelerate delivery, not to bypass design thinking
  • Stay in the loop. Your judgment is the safety net.

Conclusion

While Jules exhibits clear value for scoped, repetitive tasks, I remain cautious about its application to more complex or nuanced software development projects.

Like all other LLM implementations, Jules lacks implicit context and situational awareness. In mature codebases, guardrails such as unit tests and established CI/CD pipelines can help mitigate minor issues. But in legacy or early-stage systems, ambiguity and edge cases are the norm.

In these environments, success often depends on the human ability to recognize patterns, infer intent, and reconcile conflicting signals across a large, and oftentimes messy, codebase. This capacity for knowing when something feels off, even if the compiler is satisfied, is not captured in documentation or linters. It comes from experience with system failures and deep familiarity with the system’s history.

Jules cannot and should not replace that intuition. But with guidance, structure, and clearly defined boundaries, it can become a secure, reliable contributor to engineering workflows.

If your team is exploring how to safely integrate AI into software development, reach out to us at ScaleSec. We can help you build secure systems with confidence.


The information presented in this article is accurate as of 6/9/25. Follow the ScaleSec blog for new articles and updates.