AI Begins Writing Code — The Question That Emerges

The software industry has long evolved around the creative work of human developers. Programmers analyze problems, design algorithms, and express them in the form of code. That code reflects the developer’s intent, choices, structural design, and style of expression. For this reason, software code has been regarded not merely as a technical output, but as a form of creative work. Copyright systems are built on this very assumption. In other words, the human who writes the code holds its copyright, and various licensing systems are constructed on top of that right. Licenses such as MIT, Apache, and GPL exist based on the idea that “the creator of the code defines how it may be used.”

However, generative AI tools that have emerged in recent years are beginning to challenge this long-standing assumption. Many developers now rely on AI to generate code rather than writing it directly. Tools like GitHub Copilot, ChatGPT, and Claude Code can generate entire functions, write test cases, and even propose complex algorithm implementations. Developers are increasingly shifting from being the authors of code to becoming its reviewers or editors. While this change is widely seen as a positive development in terms of productivity, it introduces new issues from a legal perspective.

The problem begins with a surprisingly simple question.

Does code generated by AI have copyright?

This is not merely a legal curiosity. If AI-generated code has no copyright, it may effectively belong to no one. On the other hand, if such code does have copyright, a new question arises: who owns that right? Is it the company that created the AI model, the user who provided the prompt, or the original authors whose code was included in the training data? No matter which answer is chosen, it leads to conflicts with the existing copyright framework.

This issue is already appearing in real-world cases. Attempts to rewrite existing open source projects using AI and change their licenses, as well as disputes over AI-generated code resembling existing projects, are increasingly common within the community. These cases go beyond simple developer disagreements—they reveal that the current software legal framework is struggling to keep pace with new technological realities. And at this point, we are led back to a fundamental legal question.

What is copyright actually meant to protect?

To understand the issue of AI-generated code, we must first understand what copyright law is designed to protect. Many people think of copyright simply as a law that protects creative works, but in reality, it is a far more precise system. Copyright does not protect the work itself—it protects the expression embodied in that work. Conversely, ideas themselves are not protected by copyright. For example, the concept of an algorithm or a problem-solving approach can be freely used by anyone. However, the specific way that algorithm is expressed in code is subject to copyright protection.

This principle is critically important in the software industry. It explains why multiple programs can exist that perform the same function. Even if two developers write programs based on the same algorithm, their implementations may differ, and each can be recognized as an independent work. This structure enables competition and reimplementation in software. Technologies such as database systems or web servers have been independently implemented by different companies and projects, yet they coexist legally because their code expressions differ.

Another important point is that software licenses are essentially copyright-based agreements. Licenses such as MIT or GPL are not merely declarations—they are contractual structures through which the copyright holder defines how their code may be used. Because copyright exists, licenses can be established, and through those licenses, usage conditions are determined. For example, the MIT license allows relatively free use, while GPL requires that derivative works maintain the same license. All of this only makes sense under the assumption that the code itself is protected by copyright.

This is where the issue of AI-generated code arises. If code is created by a non-human entity, does that code have copyright? And if it does not, what meaning does a license hold? Since the open source ecosystem operates on the foundation of copyright, this question extends far beyond individual projects—it has the potential to impact the entire software industry. This is why the debate around AI-generated code has attracted attention not only in developer communities but also in legal scholarship.

To find a clue to this question, we need to examine one of the most fundamental principles of copyright law: human authorship.

One of the most fundamental assumptions in the history of copyright law is that copyright applies only to human creations. This principle is a long-established legal tradition, more deeply rooted than many people realize. The U.S. Copyright Office has consistently maintained that copyright protection is granted only when there is human creative contribution. In other words, outputs generated by natural phenomena or purely mechanical processes are, in principle, not eligible for copyright protection. This principle applies not only to photography, music, and literature, but also to software.

A frequently cited example illustrating this principle is the so-called Monkey Selfie case. The case involves a photograph taken in an Indonesian forest, where a photographer set up a camera and a curious monkey ended up triggering it. The image became widely popular online, but it also sparked a legal debate: who owns the copyright? The photographer argued that the photo existed because of his equipment and setup, but the court reached a different conclusion. Since the actual act of taking the photograph was performed by a non-human—the monkey—the image could not be granted copyright. As a result, the photograph was effectively treated as being in the public domain, rather than belonging to any individual.

While this case may seem like an interesting anecdote, it carries an important message in copyright law. Copyright does not arise simply because something has been created. It requires that the work be produced through human creative choices and expression. If the creator is not human, the law does not recognize copyright in the result.

Now, consider how this principle applies to AI-generated code. If an AI model generates code, can that code be considered a human creation? Or is it more analogous to the monkey selfie scenario? If AI-generated code is not regarded as a human-authored work, it may not be eligible for copyright protection. In that case, the entire structure of software licensing could be called into question.

At this point, we are led to the next question.

How does the process of AI code generation actually work?

To answer this, we first need to understand the technical structure behind AI code generation. And within that structure, another critical issue in the AI copyright debate begins to emerge.

The Actual Structure of AI Code Generation

To discuss the copyright of AI-generated code, we must first understand one basic fact. When we say “AI writes code,” the reality of what AI is actually doing is often oversimplified. Many people imagine AI as an intelligent programmer, but large language models do not operate that way. An LLM is fundamentally a massive probabilistic model. It learns from vast amounts of text and code data, and when given an input, it predicts the most likely next token. Code generation is simply an extension of this mechanism.

Looking at this structure more closely, the model is trained on large datasets consisting of publicly available code and documentation. This may include GitHub repositories, open source projects, technical documents, and programming Q&A sites. The model does not learn these as raw text, but breaks them down into units called tokens. A token can be a word or a fragment of code syntax. During training, the model statistically learns which tokens are likely to appear in a given context. Through this process, patterns such as algorithms, coding styles, and function structures are compressed into the model’s internal weights.

During code generation, these learned patterns are applied. When a user provides a prompt such as “write a Python function to send an HTTP request,” the model identifies similar patterns from its training data and generates code by predicting the most probable sequence of tokens. The key point is that this process does not simply copy existing code. In most cases, the model produces a new sequence of tokens based on learned patterns rather than reproducing exact code. As a result, the generated code may be structurally similar to existing code, but it is unlikely to be identical.

However, this is precisely where an important question arises. If the model is not directly copying existing code, how should the legal status of that code be determined? If the patterns learned by the model effectively reconstruct the expression of existing code, is the output an independent creation, or is it a derivative of prior works? This is not merely a technical question—it is directly tied to legal interpretation, because one of the core concepts in copyright law is the derivative work.

Once we understand how AI code generation works, the next question naturally follows. Should AI-generated code be considered simply “new code,” or should it be viewed as a derivative influenced by existing code? This question lies at the heart of the debate over AI and software copyright.

Applying the Human Authorship principle to AI-generated code leads to an intuitive conclusion. If code is created not by a human but by an AI, it may not be eligible for copyright protection. In fact, the U.S. Copyright Office has repeatedly stated in its guidelines that outputs without human creative input cannot be protected by copyright. If this principle is applied strictly, AI-generated code may fall outside the scope of copyright protection.

However, this conclusion introduces a far more complex set of problems. If there is no copyright, then no legal rights exist over that code. No one can claim ownership, and no one can assert exclusive rights to its use. This situation is often referred to as a “copyright vacuum”—a state in which a work exists, but no one holds rights over it. In such cases, the output effectively resembles something in the public domain.

Here, an interesting paradox emerges. Open source licenses are built on copyright. Whether it is MIT or GPL, these licenses function as contractual frameworks that define usage conditions based on the existence of a copyright holder. But if AI-generated code has no copyright, the license itself may lose its legal meaning. For example, even if AI-generated code is distributed under the MIT license, there may be no entity with the legal authority to grant that license. In this case, the license becomes little more than a declaration, lacking binding legal force.

Another issue is the “ownership void.” If AI-generated code has no copyright, then who owns it? It is difficult to attribute ownership to the user, since the user did not directly write the code. It is also problematic to assign ownership to the company that created the AI model, since the model merely performs a probabilistic generation process. For this reason, some legal scholars argue that AI-generated works may ultimately exist in a state close to the public domain.

Yet the issue does not end with the conclusion that “AI-generated code has no copyright.” Most AI-generated code is not created in complete isolation—it is based on patterns learned from existing code. If the generated code reconstructs the expression of prior works, it may still be considered a derivative work. At this point, the next debate emerges: is AI code generation merely a reworking of existing code, or is it an entirely new form of creation?

The Derivative Work Problem — Is AI Rewriting Existing Code

In copyright law, a derivative work refers to a new creation based on an existing work. Examples include adapting a novel into a film or modifying an existing program to add new features. While derivative works involve new creativity, they are still influenced by the original work, meaning the original copyright holder’s rights continue to apply. This concept is precisely why Copyleft licenses such as GPL have strong legal force. If GPL code is modified to create a new program, that program must also comply with the GPL.

One of the most critical questions in the debate over AI-generated code lies here. If code produced by AI is influenced by existing open source code, should it be considered a derivative work? The answer is not straightforward. AI models are trained on vast amounts of open source code, which may include a wide range of licenses such as GPL, MIT, and Apache. While the model does not store code directly, it learns patterns. However, if those patterns effectively reconstruct the expression of existing code, there is a possibility that the output could be interpreted as a derivative work.

This issue has already surfaced in the GitHub Copilot controversy. Some developers have identified cases where Copilot reproduced code from specific open source projects almost verbatim. If such cases occur, the issue extends beyond simple code generation and may constitute copyright infringement. On the other hand, the Copilot team argues that the model merely learns general coding patterns, and that most generated outputs are not identical to existing code. As of now, this debate has not reached a clear legal conclusion.

The problem cannot be resolved simply by comparing code similarity. The very structure of how AI models learn makes the relationship between generated code and existing code inherently complex. If AI-generated code is recognized as a form of rewriting existing code, then the entire practice of AI-assisted coding could face widespread license contamination risks. Conversely, if AI-generated code is treated as entirely independent creation, the existing copyright framework may need to be reinterpreted for the AI era. At this point, we are confronted with another question.

How should copyright be determined for code created collaboratively by humans and AI?

This question becomes even more important in real-world development environments, because most code generation today is not performed by AI alone, but through collaboration between humans and AI.

So far, the discussion has largely assumed cases where AI generates code “on its own.” In reality, however, such situations rarely exist. Most code generation occurs through collaboration between human developers and AI tools. Developers define the problem, craft prompts, review the generated code, and refine or refactor it as needed. AI may propose code or generate initial implementations, but human judgment and decision-making are continuously involved before the result becomes part of a real project. This collaborative structure makes the copyright debate around AI-generated code even more complex.

One of the key concepts in copyright law is creative contribution. Copyright does not arise merely because a result exists—it requires human creative choices in the process of creation. For example, in photography, copyright is not granted simply by pressing the shutter button; it depends on creative decisions such as composition, lighting, and subject selection. Applying this logic to AI-generated code, one could argue that designing prompts, selecting outputs, and modifying code may qualify as human creative contribution.

However, the issue is not that simple. If the human input is minimal—for instance, a prompt like “write a Python function to send an HTTP request”—how much human creativity is actually involved in the result? On the other hand, if the final code is produced through complex prompt design and iterative refinement, the human contribution could be considered substantial. In such cases, it becomes difficult to determine whether AI is merely a tool or a co-participant in the creative process.

This question is not entirely new. Similar debates arose in the past with the introduction of compilers, macro systems, and code generators. At that time, the conclusion was relatively clear: these were tools, and the human developer remained the author. However, generative AI differs in a fundamental way. It does not simply execute predefined rules—it generates code structures and produces new forms of expression. This difference has led to ongoing debate over whether existing legal frameworks can still be applied as they are.

Ultimately, most AI-generated code falls into a gray area—neither purely AI-created nor purely human-created. How copyright should be interpreted in this space is likely to become a major legal issue in the future. And this leads to another important question. If human creativity is not sufficiently recognized in AI-generated code, what is its legal status? Some scholars suggest that in such cases, AI-generated code may effectively exist in a state close to the public domain.

The Possibility of AI Code Becoming Public Domain

One of the most intriguing scenarios in the debate over AI-generated code is the possibility that such code may fall into the public domain. Public domain refers to works that are not owned exclusively by any individual or organization. Anyone can freely use, modify, and distribute them without license restrictions. Classic literature and older music, whose copyright terms have expired, are typical examples.

In the case of AI-generated code, the situation is slightly different. As discussed earlier, if the output is not considered a human creation, it may not qualify for copyright protection. If no copyright exists, the code may effectively exist in a state similar to the public domain from the outset. This creates a subtle tension with open source licensing structures. Licenses such as MIT or GPL rely on the existence of a copyright holder—without copyright, the legal force of these licenses may collapse.

This scenario could lead to significant changes in the software industry. In an era where AI can generate large volumes of code, if that code exists in the public domain, the scarcity of code itself may diminish dramatically. Traditionally, writing code has been a costly creative activity, making code a valuable asset. But if AI can easily generate code that carries no exclusive rights, the very structure of value in software may shift.

This change also creates an interesting relationship with the open source movement. Open source has historically evolved by preserving copyright while enabling code sharing. Copyleft licenses like GPL allow free use of code while requiring derivative works to remain open. However, if AI-generated code exists in the public domain, such licensing strategies may lose their relevance. If code can be freely used by anyone without restriction, there is little need to impose license conditions.

Of course, this scenario is not yet a settled legal conclusion. Many legal scholars argue that some level of human creative contribution still exists in AI-assisted code generation, and therefore copyright may still apply. However, one point is clear: AI code generation is weakening the traditional assumption of “creative scarcity” that underpins software copyright systems. And this shift leads to a broader question about the future of copyright itself.

The debate over AI-generated code is not merely about a specific project or tool. It raises a broader question: how should the copyright system adapt to a new technological environment? Most modern copyright laws are built on frameworks established in the mid-20th century. These laws assume that humans create works, and other humans copy or distribute them. However, generative AI fundamentally challenges this assumption.

AI models can produce content at speeds far beyond human capability. Not only code, but also text, images, music, and video are now being generated by AI. In this context, it is increasingly unclear whether the traditional copyright system can remain unchanged. Some scholars argue that a new form of copyright framework is needed for AI-generated works. Proposals include applying limited protection periods or providing compensation to the original copyright holders whose works were used in training AI models.

On the other hand, a contrasting perspective suggests that AI may actually reduce the need for copyright protection. If the cost of generating code becomes extremely low, the competitive advantage may shift away from owning code toward providing services and operational capabilities. From this viewpoint, AI could drive a transition in the software industry—from a “code-centric economy” to a “service-centric economy.”

This debate has not yet reached a conclusion. Legal precedents are still limited, and copyright policies vary across countries. What is clear, however, is that AI code generation is not just changing development tools—it is redefining the relationship between law and technology. And within this transformation, one of the core philosophies of the open source movement, Copyleft, is facing a new challenge.

In an era where AI generates code, can Copyleft still retain its meaning?
Or will its role gradually diminish as code generation technologies evolve?

This question leads directly into the next topic.

Can Copyleft survive in the age of AI?

So far, we have explored multiple layers of issues centered around a single question. From the technical structure of AI code generation, to what copyright protects, to the Human Authorship principle, and finally to derivative works and licensing, all of these discussions converge on one conclusion: generative AI is not merely a new development tool—it is fundamentally challenging the assumptions of the software copyright system. For a long time, the software industry has operated on the premise that human developers write code. Writing code required time and effort, and the result was treated as a creative work protected by copyright. This structure allowed software to become an asset, and licensing systems built around that asset became a core foundation of the industry.

However, with the rise of generative AI, this structure is becoming increasingly unstable. AI can generate code at a speed incomparable to humans, and much of that code is already of sufficient quality to be used in real development environments. As the cost of producing code drops dramatically, the scarcity of code itself is diminishing. At the same time, the concept of human creativity—on which copyright depends—is becoming blurred. It is no longer clear where to draw the boundary between a human-written prompt and an AI-generated output. This is not just a technical issue—it is a legal and philosophical one.

Who owns code created by AI?

There is still no definitive answer to this question. Some legal scholars emphasize that AI-generated code may not qualify for copyright because it is not a human creation. Others argue that prompt design and human selection may constitute sufficient creative contribution. Still others suggest that AI-generated code could be interpreted as a derivative work of existing code. Each interpretation leads to different legal outcomes. Depending on which view is adopted, the structure of AI code licensing, the open source ecosystem, and even the economic foundations of the software industry could change.

This debate is particularly important because the software industry has already embraced AI code generation. Many developers now use tools like Copilot or Claude in their daily workflows, and code generation is no longer experimental—it is part of production environments. In this reality, the gap between law and technology continues to widen. While the law remains grounded in human authorship, actual code production is rapidly shifting toward human–AI collaboration. How this gap will be resolved remains uncertain.

One thing, however, is clear. AI code generation is forcing us to revisit the fundamental questions of copyright. What is creation? What is expression? And who owns code? These are no longer abstract legal theories—they must now be redefined within the realities of modern technology. The traditional copyright system was built for a world where humans create and humans copy. But we are now entering an era where humans and machines create together.

This transformation also has significant implications for the open source movement. Open source licensing is built on copyright as a mechanism for enabling code sharing. Yet as AI-generated code becomes more widespread, the meaning of ownership and licensing itself is being questioned. If AI-generated code is not protected by copyright, how can licensing structures be maintained? Conversely, if such code is considered derivative of existing works, how should Copyleft licenses be applied?

These questions ultimately lead to another critical issue, which will be explored in the next article.

Can Copyleft survive in the age of AI?