Tag Archives: License Compliance

GPL compliance and permissive training data theory

This is the second post within a new series that I might start one day, about how companies abuse common misunderstanding of the GNU General Public License (GPL) to sell their stuff. Today, a slightly scary example. Scary, as it is so off the point.

The company Exafunction, Inc. claims that with their product “Codeium” they can provide intelligent programming assistance, based on a large language model (LLM). Just like Copilot of GitHub, Inc. and even better as they do not infringe any license and specifically not the GPL. Their writeup “GitHub Copilot Emits GPL. Codeium Does Not.” provides an adventurous interpretation about the GPL: You need consent to use it in a commercial context. Moreover training your model on purely permissive-licensed code will free you of any legal trouble.

Things are slightly different. Strange that nobody told them in their “… early conversations with the open source community”.

The GPL does not restrict commercial use. It does not even refer to it at all. You are fine in any fields of endeavour as long as you respect and fulfil its obligations.

The main problem with generative AI and the current ML-based programming assistants is that you cannot trace verbatim copies of code to its origin. Due to that you cannot fulfil the most essential obligation of any Free and Open Source Software license: attribution. Calling out the original authors.

It does not help if you train your model with just permissive-licensed code. You will infringe the underlying licensing terms if you do not provide any reference to the original authors and license(s). No matter if it is a permissive or copyleft license. Either way you will not have a valid legal base, speak license, to re-use the original work and it is as bad as any copyright violation with all of its consequences.

For more details or before starting the marketing campaign of your new programming assistant, it could be worth to take a closer look, for example at the ongoing GitHub Copilot litigation and its underlying motivation.

GPL compliance and the persistent cancer theory

In the golden age of Open Source compliance offerings, one of the key marketing argument still appears to be: “The General Public License (GPL) is sooo risky. In case of GPL infringement, you will have to release all of your code – speak your intellectual property (IP) – under the same terms. Take our license scanner as we are the best to protect you against such nightmares.”

That statement simply is not correct. But very effective if you want to sell your services. Which company wants to be forced to release its valuable IP into the public only by not following specific license terms?

This myth was supposedly framed by Steve Balmer of Microsoft who once said back in 2001: “The way the license is written, if you use any open-source software, you have to make the rest of your software open source. […] Linux is a cancer that attaches itself in an intellectual property sense to everything it touches. That’s the way that the license works.”

His general understanding of one of the basic principles of Free Software and the GPL – reciprocity – speaks of great intellectual power. However this muddle-headed theory in total is utterly wrong but still persistent today serving as one of the main arguments to sell license compliance offerings.

Even infringing the terms of the GPL will never force you to put your own source code under the same license. Simple as that.

Sure, in the worst case you have violated a software license. In this aspect there is no difference between the GPL or any other even proprietary license. Copyright infringement claims are caused by

  • the actual violation of the license and
  • the unlicensed use of software.

You have to cope with its consequences. Legal remedies are

  • punitive damages and
  • injunction to not distribute your product any further.

Not more, not less.

Continue reading GPL compliance and the persistent cancer theory