In the Age of AI-Written Code: What We Must Decide

The Moment the Author of Code Changes

For a long time, the software industry has operated on a clear assumption: code is written by humans. On top of this simple fact, copyright systems were built, open-source licenses were designed, and the culture of developer communities was formed. When someone writes code, copyright belongs to that person, and once the code is released, it can be reused under specific licensing rules. The legal structure of software and its technical ecosystem have both functioned stably on this foundation.

However, in recent years, this assumption has quietly begun to shift. With the rise of generative AI, writing code is no longer exclusive to humans. Developers still write code, but they also request it from models. Within development environments, AI increasingly takes part in producing code, sometimes generating entire functions or modules. The effect of this change goes beyond improved productivity. Once the author of code becomes ambiguous, existing copyright and licensing systems are inevitably called into question.

This series has been a journey through those questions.

The Problem of AI-Rewritten Code

The starting point of the first discussion was a real controversy in the open-source community. A project attempted to rewrite existing code using AI and then change its license. On the surface, this might look like simple refactoring. If the original code is discarded and rewritten, applying a new license may seem reasonable.

But there was no easy agreement on whether AI-generated code is truly new code. AI does not typically copy existing code verbatim, yet it is also difficult to claim that it produces completely independent creations. Models generate new code based on statistical patterns learned from vast amounts of existing code, so the relationship between the generated code and existing code becomes highly ambiguous.

Ultimately, the core question raised by this case is simple: Is AI-rewritten code an independent creation, or merely another form of the original code? There is still no clear answer. What is certain, however, is that in the AI era, software copyright can no longer be judged solely by whether code has been copied.

Copyleft and the New Problem of Training Data

The second question concerns Copyleft licenses. Licenses like the GPL are not just open-source licenses; they are mechanisms designed to preserve software freedom. The principle that software derived from GPL code must, when distributed, also be released under the GPL has been a cornerstone of the free software ecosystem for decades.

But with the emergence of AI models, this structure faces new challenges. If a model has learned from GPL-licensed code, should the code it generates also be subject to the GPL? The question is complex both legally and technically. There is no clear standard yet for whether training is merely data analysis or part of the process of creating a derivative work.

AI does not copy code directly, but neither does it generate code in complete isolation. Inside the model, countless code patterns exist in compressed form, ultimately derived from training data. As a result, the question of how the licenses of training data influence generated outputs is likely to remain an ongoing debate.

A Past Solution: Clean-Room Implementation

To better understand this issue, we must briefly look back at software history. Similar debates around copyright and implementation boundaries have existed before. A well-known example is clean-room implementation.

In a clean-room approach, two teams work in complete separation. One team analyzes the original code and produces a functional specification, while the other team writes new code based solely on that specification. Because the implementation team never sees the original code, the result can be legally argued to be an independent implementation. This method played a crucial role in reimplementations of the IBM PC BIOS and of other system software.

Interestingly, AI code generation bears some resemblance to this structure. Models generate code based on learned patterns without directly referencing a specific codebase. However, there is a critical difference. In clean-room implementation, analysis and implementation are clearly separated. AI models, in contrast, absorb entire datasets and generate code from within that integrated knowledge, so there is no barrier between what was read and what is written. This distinction may carry significant legal implications.

The Longstanding Principle of Human Authorship

Another key concept in AI-era copyright discussions is the Human Authorship principle. Most copyright systems assume that the creator is human. Creation is seen as a human intellectual activity, and rights belong to that human creator.

However, when AI generates code or text, this assumption becomes problematic. Questions arise about whether AI-generated outputs can be copyrighted and, if so, who owns the copyright. These debates have already begun in multiple jurisdictions. So far, the prevailing view is that outputs generated by AI without human creative contribution are not eligible for copyright protection.

If this interpretation continues, a large portion of AI-generated code may fall into a legally unprotected domain. This could have unexpected consequences for open-source licensing, since licenses rely on the existence of copyright. Without copyright, licensing itself may lose its legal foundation.

A New Form of Software Supply Chain

The final question explored in this series was more structural. Are AI models merely development tools, or are they a new form of software supply chain?

Modern software development already operates on complex dependency structures. Applications are built from countless interconnected libraries and packages. Developers often rely more on external dependencies than on their own code.

AI models introduce another layer to this structure. Some of the code developers use may now be generated by models, which themselves are trained on vast collections of open-source projects. In this sense, models act as compressed repositories of prior code ecosystems.

From this perspective, AI is not just a code generator. It can be seen as a new supply chain that reconstructs the existing software ecosystem.
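
To make the supply-chain view concrete, the following is a minimal sketch that treats a model as one more node in a dependency graph and traces which licenses sit upstream of an application. Every name in it (Component, upstream_licenses, the example components and their license sets) is hypothetical and chosen only for illustration. It shows license exposure, not legal obligation: whether training-data licenses actually carry over to generated code is the open question discussed above.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """A node in a hypothetical supply-chain graph (app, package, model, or corpus)."""
    name: str
    kind: str
    licenses: frozenset = frozenset()
    upstream: list = field(default_factory=list)


def upstream_licenses(component, _seen=None):
    """Collect every license attached to the component or to anything upstream of it."""
    seen = set() if _seen is None else _seen
    if component.name in seen:
        # Already visited; avoids double counting and cycles.
        return set()
    seen.add(component.name)
    found = set(component.licenses)
    for parent in component.upstream:
        found |= upstream_licenses(parent, seen)
    return found


# Illustrative data only: a training corpus feeds a model, and the model sits
# next to an ordinary library as a source of code for the application.
corpus = Component("public-code-corpus", "corpus",
                   frozenset({"GPL-3.0", "MIT", "Apache-2.0"}))
model = Component("code-model", "model", upstream=[corpus])
library = Component("http-library", "package", frozenset({"MIT"}))
app = Component("my-app", "app", frozenset({"Proprietary"}),
                upstream=[library, model])

print(sorted(upstream_licenses(app)))
# ['Apache-2.0', 'GPL-3.0', 'MIT', 'Proprietary']
```

In this sketch the library contributes only MIT, while the model brings every license found in its training corpus into view. That is the sense in which a model behaves like a compressed dependency rather than a neutral tool.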

The Question That Remains

This series has followed a range of legal and technical questions: Who owns AI-rewritten code? How does GPL apply to training data? What distinguishes clean-room implementation from AI generation? Does AI-generated code have copyright? What is the future of Copyleft?

But behind all these questions lies a more fundamental one: What kind of software ecosystem do we want to build?

Open source was not just a technical model—it was a social agreement. Developers chose to share code, and that choice created today’s vast ecosystem. The same applies to AI. Technology itself does not determine direction. The rules we establish and the principles we uphold are ultimately human decisions.

In an era where AI writes code, the role of developers does not disappear. Instead, more important decisions remain in human hands. Deciding which code should be used, which licenses should apply, and what philosophical foundation software should be built upon remains a human responsibility.

At the Beginning of Change

The impact of AI on the software industry is still in its early stages. Over the coming years, new legal interpretations and technical standards are likely to emerge. Open-source licenses may evolve into new forms as well.

But one thing is already clear: the most fundamental questions about software are returning. Who is the author of code? How should knowledge be shared? What will the concept of free software mean in the future?

This series has been an attempt to organize those questions. And this story is likely far from over. In fact, it may have only just begun.

In an age when AI writes code, what we must reconsider is not the technology itself, but the new software order that will be built upon it.