Tag Archives: Copyright

GitHub Copilot – Your AI-powered accomplice to steal code?

Last week GitHub and its parent company Microsoft announced “GitHub Copilot – their/your new AI pair programmer”. The New Stack, The Verge, and CNBC, among others, have reported on it extensively. There is a lot of buzz around this new service, especially within the Open Source and Free Software world, not only among its developers but also among its supporting lawyers and legal experts. Yet the actual news is not that groundbreaking, because Copilot is not the first of its kind. Similar ML-/AI-based offerings such as Tabnine, Kite, CodeGuru, and IntelliCode are already out there, and they have also been trained on public code.

Copilot is currently in “technical preview” and, according to GitHub, is planned to be offered as a commercial version.

Illustration: GitHub Inc. © 2021

The core of it appears to be OpenAI Codex, a descendant of the famous GPT-3 for natural language processing. According to its homepage it “[…] has been trained on a selection of English language and source code from publicly available sources, including code in public repositories on GitHub”. Update 2021/07/08: GitHub Support appears to have confirmed that all public code at GitHub was used as training data.

GitHub is the platform where the majority of the global Open Source community’s source code has accumulated by now: 65+ million developers and 200+ million repositories (as of 2021), or 23+ million owners of 128+ million public repositories (as of 2020). Alternatives have become scarce unless you want to host your code yourself.

Great, what amazing times we are living in! It sounds as if, with Copilot, you no longer need your human co-programmers, who assisted you during the good old times in the form of pair programming or code review. Lucky you, and especially your employer. On top of that, you will save precious time because it will help you to directly fix a bug, write typical functions or even “[…] learn how to use a new framework without spending most of your time spelunking through the docs or searching the web”. Not to mention copying & pasting useful code fragments from Stack Overflow or other publicly available sources like GitHub.

At the same time, two essential questions arise if you care at all about authorship:

  1. Did the training of the AI infringe any copyright of the original authors who actually wrote the code that was used as training data?
  2. Will you violate any copyright by including Copilot’s code suggestions in your source code?

Let’s not talk about another aspect that GitHub mentions in their FAQs – personal data: “[…] In some cases, the model will suggest what appears to be personal data – email addresses, phone numbers, access keys, etc. […]”


Open Source Legal Notes

In his post “Is it time to revise the Open Source Definition?”, Red Hat’s legal counsel Richard Fontana argues that the Open Source Definition (OSD) might need some review and improvement:

  • Aiming at OSD #7: Patents should be addressed to prevent recent (mis)interpretations that Open Source licenses are “Copyright only”.
  • Aiming at OSD #9: Unwanted licensing effects on unrelated software should be excluded upfront to prevent future disputes such as the one about the SSPL.
  • Freedom 0 of the Free Software Definition – “to run the program as you wish” – should be included in the OSD for reasons of clarity.

The Software Freedom Conservancy received a $100,000 grant from the Amateur Radio Digital Communications (ARDC) for GPL enforcement.

A few days ago the oral hearing in the lawsuit between Oracle and Google was held at the U.S. Supreme Court, after it had been delayed by COVID-19. McCoy Smith shares his observations and interpretation in a detailed post “Oracle/Google” at Lex Pan Law. The litigation is over the copyrightability and, if so, the infringement of certain parts of Java (mainly APIs) that were used within Android. If Oracle wins, it will have a significant impact on the whole software world and especially on Open Source. Ultimately, any API (or use of it) would become subject to copyright.