ChatGPT and Databases

The biggest technology news of the year so far has undoubtedly been the release of OpenAI's ChatGPT. ChatGPT is a chatbot-style AI that can generate amazingly knowledgeable and human-like conversations. It can also generate essay-type answers to exam questions, write halfway decent song lyrics, or even generate computer code. I was going to explain how ChatGPT works, but instead, I’ll let ChatGPT tell you.

Here’s how it replied when I asked, “Explain how ChatGPT works in one paragraph suitable for a 12-year-old”:

ChatGPT is a chatbot that can talk to you like a human and help you with tasks such as composing emails, essays, and code. It is a member of the generative pre-trained transformer (GPT) family of language models. It was developed by OpenAI and launched in November 2022. You can use ChatGPT by logging in with your OpenAI account.

Most people have an initial “OMG!” response to the sophistication and accuracy of ChatGPT. And to be sure, ChatGPT is a new high-water mark in practical AI. However, in some ways, it’s just a more sophisticated internet search.

ChatGPT sources existing material on the internet and synthesizes responses that appear coherent. It creates nothing new. And this kind of technology can go wrong in weird and amusing ways. When a New York Times reporter spent a few hours chatting with Microsoft's Bing chatbot, which is built on OpenAI technology, it eventually started insisting that it loved the reporter and that the reporter should leave his wife. Other conversations have resulted in these chatbots expressing a frightening “dark side.”

However, as a database professional, I’m fascinated by how this technology might help us interact with databases. Can ChatGPT help? I asked ChatGPT to write me an SQL statement to sum sales by department. The “conversation” went like this:

  • Write me a PostgreSQL-compatible SQL statement that will break down sales by month and department.
  • Now add a rank for each department showing how it compares with other departments in the same month.
  • Now show the percentage increase for each department compared with the previous month.

After each query, I got an increasingly sophisticated SQL statement.

The final iteration included window functions that partitioned the data by department and month and used the LAG function to retrieve the previous month's figure. I have more than 30 years of experience in SQL programming and would find that sort of SQL a little challenging.
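
To give a sense of what such a statement looks like, here is a minimal sketch of my own; it is not the SQL that ChatGPT produced. The sales table, its columns, and the sample rows are all invented for illustration, and the query runs against SQLite (version 3.25 or later) through Python's built-in sqlite3 module so that it can be tried without a server. On PostgreSQL, the strftime call would need to be replaced with something like to_char(sale_date, 'YYYY-MM'); the window functions themselves are standard SQL.

```python
import sqlite3

# Hypothetical schema and data, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (department TEXT, sale_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('Toys',  '2023-01-10', 100), ('Toys',  '2023-01-20', 100),
  ('Toys',  '2023-02-05', 300),
  ('Books', '2023-01-15', 400),
  ('Books', '2023-02-11', 250);
""")

QUERY = """
WITH monthly AS (
    -- Step 1: total sales per month and department
    SELECT strftime('%Y-%m', sale_date) AS month,
           department,
           SUM(amount) AS total_sales
    FROM sales
    GROUP BY month, department
)
SELECT month,
       department,
       total_sales,
       -- Step 2: rank departments against each other within a month
       RANK() OVER (PARTITION BY month ORDER BY total_sales DESC) AS dept_rank,
       -- Step 3: percentage change versus the same department's previous month
       ROUND(100.0 * (total_sales - LAG(total_sales) OVER w)
             / LAG(total_sales) OVER w, 1) AS pct_change
FROM monthly
WINDOW w AS (PARTITION BY department ORDER BY month)
ORDER BY month, dept_rank
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The first month has no predecessor, so LAG returns NULL there and the percentage change comes back as None rather than as a misleading zero.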

ChatGPT also generated a similar set of MongoDB commands. The MongoDB aggregation framework is notoriously complex for advanced queries, and the ChatGPT-generated aggregation ran to almost 100 lines of code. I’d probably have to spend an hour or so working on similar code, while ChatGPT completed it in about 20 seconds.
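
For comparison, the same three steps can be sketched as a MongoDB aggregation pipeline, written here as a Python list in the form a driver such as pymongo accepts. Again, this is my own illustration and not ChatGPT's output: the field names (sale_date, department, amount) are hypothetical, and the sketch leans on the $setWindowFields stage, which assumes MongoDB 5.0 or later, so it is more compact than the aggregation described above. Treat it as a starting point to verify against a real deployment.

```python
# Hypothetical field names: sale_date (a date), department, amount.
pipeline = [
    # Step 1: total sales per month and department
    {"$group": {
        "_id": {
            "month": {"$dateToString": {"format": "%Y-%m", "date": "$sale_date"}},
            "department": "$department",
        },
        "total_sales": {"$sum": "$amount"},
    }},
    # Step 2: rank departments against each other within a month
    {"$setWindowFields": {
        "partitionBy": "$_id.month",
        "sortBy": {"total_sales": -1},
        "output": {"dept_rank": {"$rank": {}}},
    }},
    # Step 3a: fetch the same department's previous-month total
    {"$setWindowFields": {
        "partitionBy": "$_id.department",
        "sortBy": {"_id.month": 1},
        "output": {
            "prev_sales": {"$shift": {"output": "$total_sales", "by": -1}},
        },
    }},
    # Step 3b: percentage change, left null when there is no usable previous month
    {"$addFields": {"pct_change": {"$cond": [
        {"$gt": ["$prev_sales", 0]},
        {"$multiply": [100, {"$divide": [
            {"$subtract": ["$total_sales", "$prev_sales"]},
            "$prev_sales",
        ]}]},
        None,
    ]}}},
    {"$sort": {"_id.month": 1, "dept_rank": 1}},
]

stage_names = [next(iter(stage)) for stage in pipeline]
print(stage_names)
```

With a driver connection in hand, the list would be passed to the collection's aggregate method, for example db.sales.aggregate(pipeline), where db.sales is a hypothetical collection.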

The implications for ad hoc database queries are exciting. It’s long been hoped that a “conversational” database query language could be created. However, all previous attempts failed because the “natural” language had to be sufficiently unambiguous to be compiled into database access commands. The result was a language that was far from natural. ChatGPT, by contrast, can draw on a massive body of existing database query patterns and select the one that best fits the question. As with most ChatGPT capabilities, the results seem strikingly accurate.

Before the relational database and SQL emerged, IT departments had vast backlogs of report requests that required experienced programmers to resolve. SQL relieved much of that pressure because it was accessible to a broader range of professionals.

With ChatGPT, we might finally be approaching the point where anybody can query a database by simply asking for the information in human language.