The Other Copilot: Coding with AI

Posted on 26 June 2025

It seems Copilots are ten a penny these days. While our CoPilot accountant partners offer a human way to support you in using FreeAgent, the aviation term has really taken off(!) as branding for AI tools, GitHub’s Copilot being just one of them.

The rise of AI tooling for developers has sparked plenty of discussion and controversy of late. It’s also seen rapid development and innovation, with several code editors adding agent modes that allow large language models (LLMs) to work iteratively on their own output. These modes give editors the power to make code changes and run commands on a user’s behalf more broadly than before. Combined with context over whole codebases, this allows them to develop features, resolve bugs, and write tests.

While I find the fervour around generative AI tiring in much the same way as the hype around cryptocurrency, GenAI as a tool feels different. Its capability strikes me as genuinely useful, and I’ve seen its potential to improve my productivity. So while I’m not overly keen – and certainly conscious of the ethical concerns associated with it – I don’t want to turn a blind eye to it.

What have I been working on?

I’ve personally made an effort to experiment with GitHub Copilot’s Agent mode within Visual Studio Code as part of my workflow. I’ve been using Claude Sonnet 4, though that (at the time of writing) falls under GitHub’s premium requests, so bear that in mind.

I’ve been doing this for a few weeks, often using it to ship code changes. For the changes where I’ve used it, I’d say it’s generated upwards of half of the code, and it can sometimes produce an accurate, satisfactory change almost entirely on its own. It hasn’t all been sunshine and rainbows, but I feel that with some careful tweaks it could make shipping code much easier.

New ground

One of the first and most significant ways I used Copilot in Agent mode was when spiking out an internal tool. I only had a few days available, but I still wanted to ship a handy command line interface for automation. I had a vision for what I wanted this to look like, but did Copilot meet it?

There was one implementation detail it handled significantly differently from what I was picturing. I perhaps hadn’t been as explicit as I should have been about how that would look at the lower level, so it took some liberties with my suggestion.

It was often preoccupied with backwards compatibility and making sure that old truths still held up in the code, when, this being a greenfield project, I was all too keen to throw away things that weren’t working. It didn’t do a convincing job of covering its tracks with test coverage, and it certainly didn’t take a test-first approach, which I found worrying. It did prompt me – both literally and figuratively – to test manually more often, though.

The end result, from the outside, was the working tool I’d envisioned and asked for, and Sonnet had written most of the code. My self-imposed deadline made this an enlightening crash course in using Agent mode, and I’m impressed I got a working project in that time.

Taking on tickets

After this, I was fairly convinced that Copilot in Agent mode might be useful enough to trial in my regular workflow. At first I used it just to critique and tweak my code as I went, and occasionally to debug. Quickly, though, I found that with sensible context it could implement code changes I was satisfied with, and in good time too. But it’s certainly not as easy as asking another developer you trust – there are several things to be mindful of when working with it:

It needs to know how fundamental features work

We’ve built our own solutions for a few things at FreeAgent – feature flags are just one example. Unless Copilot is told how our feature flag system works, it has to trawl the broader codebase to figure it out. You can “teach” it about these kinds of things using the .github/copilot-instructions.md file, or sets of instructions you keep in .github/instructions.
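
As an illustration, a repository-wide instructions file might look something like this – the feature flag helper named here is hypothetical rather than our real API:

```markdown
<!-- .github/copilot-instructions.md -->
# Working in this codebase

- Gate new functionality behind a feature flag. Flags are checked
  with the (hypothetical) helper `FeatureFlags.enabled?(:flag_name, account)` –
  never read flag state from storage directly.
- Every change ships with tests, covering the flag both on and off.
- Prefer small, focused commits with imperative-mood messages.
```

Copilot picks these instructions up automatically with each request, so conventions like this don’t need restating in every prompt.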

The more context you give it, the less it needs to go hunting

When I first started working with Copilot in Agent mode, I asked it things quite naively, without much context behind them. But it thrives on extra information and will go looking for it if something isn’t clear. So whether you’re marking open files in your editor as available to it as part of your request, or giving a very detailed written picture of what you’d like to see, both save Copilot from having to dig that context up itself.

Where a human can generally infer things from their broad and worldly context and what they’ve been taught, an LLM is capable but not comparable – it can’t confidently run a mile in the direction of the inch you give it, since it’s a stranger to human convention. It replies in English, but humans send signals in other ways, and many of the rules and guidelines we share live only in our heads. All of that is context that needs giving to an LLM explicitly.

It still varies whether it feels faster and more accurate than doing the work myself. But I’m definitely finding that the more context it receives, and the more accurately that context reflects what I’m after, the better its solution will be. This means the first step to letting an agent write code for you is a careful and thorough summary of exactly what you want (and what you don’t want!), alongside any useful information that can’t cleanly be derived from the code. That doesn’t take long at all, but it’s a change to how we’d usually do things.
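
To make that concrete, here’s a sketch of the kind of up-front summary I mean – the feature, file paths, and conventions are all invented for illustration:

```
Add a “download as CSV” button to the invoices index page.

- The view lives in app/views/invoices/index.html.erb, backed by
  InvoicesController#index.
- Follow the existing export pattern used for bank transactions.
- Don’t introduce a new route namespace – extend the existing
  invoices resource.
- Add request specs covering both empty and populated accounts.
```

Writing down the “don’t”s feels unnatural at first, but it spares you a round of corrections later.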

It’s overly positive, and needs keen oversight

Its tone of voice is very self-validating, and it’ll often rationalise things it has done using the kinds of software engineering principles you may already be familiar with, like the Single Responsibility Principle. It can bias you towards positivity quite convincingly, so perhaps the most important thing to do while working with it is to review its work. Being watchful as it works, and reviewing what it’s done in the same way you might review a pull request, are paramount.

Where to from here?

I think there’s still more I need to learn about how to use Copilot consistently – as I’ve mentioned, it doesn’t always feel faster or more accurate than just doing things myself. The caveats above are worth keeping in mind the whole time you’re working with it.

One concern I share with folks on my team is that in some situations – particularly when we’re building things from scratch – using an LLM means losing context that ought to sit with a developer. This is something I want to be watchful of in the long term.

But for now, I’m trying to keep an open mind. In the coming weeks, hopefully I’ll develop a more concrete understanding of where agent tools might sit in my workflow. I hope my reflections on them are useful to you in the here and now, and might help you to use them effectively!
