AI Coding Assistants – The Good, the Bad and the Ugly

For most of the last few decades, AI has over-promised and under-delivered. Behind the scenes, however, significant advances in deep learning have had a revolutionary impact within important but narrow domains. In 2022 these technologies became generally usable with the release of ChatGPT.

ChatGPT, together with similar products built on generative large language models (LLMs), such as Google Bard, can parse complex human language requests and produce human language outputs that are astonishingly similar (in many cases) to what a human expert would produce.

Technologies such as ChatGPT are called "generative AI" because their objective is to generate new content. This contrasts with other AI systems that may be optimized to solve equations, recognize faces, drive a car, and so on.

Put simply, a text-based generative AI is designed to work out "what comes next" after certain inputs. Determining what comes next is the job of a neural network that has been "trained" on large amounts of text. In the case of a large language model, the training data is a large subset of the internet, including sites such as GitHub that store masses of computer code.
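The core idea can be illustrated with a toy sketch. Real LLMs use deep neural networks over enormous vocabularies, not lookup tables, and the word counts below are invented purely for illustration, but the essential task is the same: turn evidence about "what tends to come next" into a probability distribution and pick a likely continuation.

```python
# Toy illustration of next-token prediction. Real LLMs use deep neural
# networks, not lookup tables; the counts below are invented.
# Suppose these words followed the phrase "the cat" in some training text:
next_word_counts = {"sat": 7, "ran": 2, "slept": 1}

def next_word_probs(counts):
    """Convert raw follow-on counts into a probability distribution."""
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}

probs = next_word_probs(next_word_counts)
best = max(probs, key=probs.get)  # the most likely continuation
print(best, probs[best])
```

Always taking the single most likely word is called "greedy" decoding; as the next paragraph explains, real systems deliberately add some randomness instead.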

When you use ChatGPT, it starts with the most probable opening words and then keeps working out what should come next. A certain amount of randomness is built into the response to avoid stilted or repetitive output. This is why ChatGPT will occasionally "hallucinate": having randomly started down an inaccurate path, it will "double down" as it expands upon that initial bad response.

As impressive as LLMs are with human language, they are arguably even more powerful when generating and analyzing computer code. Computer code is less ambiguous, more deterministic, and more pattern-based than natural language, so code generated by an LLM is less likely to contain "hallucinations" or otherwise incorrect results.

Microsoft's GitHub has released Copilot—an AI coding assistant powered by OpenAI models. Copilot can generate complete programs or code snippets from human language requests, or it can offer a sort of "autocomplete on steroids" by analyzing the code that has come before and predicting the most likely next lines of code. Similar offerings from other companies have emerged over the past year, some based on OpenAI models, others based on different LLMs.

Using Copilot in this autocomplete mode evokes some of the same astonishment most of us felt when first using ChatGPT. It's amazing how good some of the recommended next lines of code are and how intelligent the suggestions seem. Sure, some of the suggestions are wrong, but—at least for me—the majority are spot on, and as good as I would have written myself.

When generating whole programs or subroutines, the results are still impressive but somewhat less accurate. I found that Copilot would sometimes generate code based on obsolete coding standards or reliant on deprecated libraries.

Studies to date have reported improvements in programmer productivity of as much as 100% (i.e., a doubling of productivity) for routine tasks.

So, the Good—in my opinion—is that AI programming assistants can radically improve the productivity of software developers—perhaps the biggest improvement in productivity since the emergence of high-level languages in the late 1950s.

The potential Bad is that programmers may use AI to generate code that they don't fully understand and therefore can't maintain.

And the Ugly? We’re potentially entering a phase of automation in which software can write software. This could result—eventually—in a decrease in demand for software developers. Certainly not immediately, but a few more iterations of OpenAI technology could result in an AI that can write better code than the average developer.

Looking further ahead, many are concerned that once we develop an artificial general intelligence (AGI) that can program as well as a human, it will be able to rewrite its own code. At that point, a feedback loop of increasingly sophisticated AI could emerge, and a superintelligence far beyond the control of us poor humans may be born.

If these dystopian visions are accurate, then GitHub Copilot could be the first step in the creation of an intelligence that ultimately replaces us. Of course, as Doc Brown says in Back to the Future: "Granted, that's a worst-case scenario!"