The code generated by Copilot currently seems not to be a work, due to how the Berne Convention works. That is in line with earlier opinions of the US Copyright Office and verdicts of US courts, and was confirmed in the most recent case. (1)(2) US law is not binding on courts elsewhere, but:
- For the matter discussed, copyright law in the United States is the primary concern, as this is where “everything happens”.
- That is tightly bound to the very core of the Berne Convention, which applies virtually everywhere on this planet.
If that is not a work, I find the entire discussion about licensing breaches by the generated code moot.
What is more important, in my opinion, is whether a trained model is considered a derived work. I find that much more interesting: assuming only binary answers are possible, I can conceive of three options:
- A trained model is not a work at all.
  Implies: under the existing copyright regime it has no protection. A new set of IP laws may be forged, but that takes time and gives an opportunity to influence them considerably more easily than is possible with copyright.
- A trained model is a work, and is a derived work.
  Implies: a requirement to abide by licensing terms, including both attribution and granting various rights to licensees.
- A trained model is a work, and is not a derived work.
  Implies: opposition on these grounds is not possible, and creators of later models may use works for training in a similar fashion.
One of the contention points with Copilot is also that it was trained on FOSS sources and data leeched from the community, but avoided touching proprietary works to which GitHub also has access. Just because that is possibly legal does not mean it is perceived as acceptable.

My own gripe with Copilot is of a different nature. Machine learning at that scale is a relatively new subject facing many philosophical challenges. I think it is still too early to say definitively that Copilot differs significantly, in qualitative terms, from a programmer acquiring knowledge by reading sources. The pain, which I find understated by the opposition, is the possibility of making software development dependent on such tools. You may say “it's like a calculator to a mathematician”. But it becomes a problem if, in order to be able to compete with other developers, you are forced to use that calculator, and it is almost guaranteed there will only be a few calculator manufacturers, which use their position to push abusive licensing terms.
Returning to the first paragraph, that situation gives rise to another interesting and very complex question. If Copilot's output is not a copyrightable work, any program written with it seems to contain fragments that cannot be protected by copyright. What could that imply if the owner of the entire program claims infringement, but the infringement is found to apply only to such a fragment? What if the defendant could prove that? If that is a possibility, whose obligation is it to make the proof, and what should it look like? Though hypothetical and a thought experiment in nature, extreme cases of that problem are pretty intriguing.
(1) https://www.theverge.com/2022/2/21/22944335/us-copyright-office-reject-ai-generated-art-recent-entrance-to-paradise
(2) Second Request for Reconsideration for Refusal to Register A Recent Entrance to Paradise, US Copyright Office (February 2022)