Response to: Is GPT-3 Useful for Natural Language Generation?

Professor Ehud Reiter (my PhD supervisor) just wrote a piece on OpenAI’s GPT-3, arguing that its applications will be quite limited.

While I broadly agree with the claims he makes, I noticed a couple of points of disagreement, so I took this chance to explore why our models of the world differ.

Let me first state our common ground:

  • Without humans aiding it, GPT-3 produces inconsistent results
  • Its context window (its memory span) is quite limited, so it cannot produce long coherent texts on its own
  • GPT-3 will most likely not be able to accurately summarize and explain data on its own
  • GPT-3 probably cannot be used to produce fully automatic propaganda, fake news, or any other kind of writing
  • There is probably a lot of hype around GPT-3 based on misunderstandings of its capabilities

So I do actually agree a lot with Ehud’s post!

But there are two parts where I noticed a clash of beliefs:

Fantastic PR/Marketing

Ehud framed the staged release of GPT-3’s older sibling GPT-2 as a publicity stunt.

I think this is too uncharitable: OpenAI has brought on board a team of ethicists and policy experts of the caliber of Amanda Askell and Miles Brundage, who are doing very fruitful work on publication norms in AI.

But regardless of the intentions of its creators, I am more interested in the actual consequences of the staged release of GPT-2:

  1. First of all, let me claim something contentious: as an isolated case, the staged release of GPT-2 was probably unnecessary. OpenAI’s experts could have assessed and concluded on their own that the potential downsides were reasonably bounded in time and magnitude.
  2. But it started a practical and much-needed conversation around publication norms in AI. When dealing with cutting-edge technology, regulation lags behind research and development. For something as deeply transformative as AI, this means that researchers need to step up to assess the risks and consequences and cooperate to avoid bad outcomes. The degree of openness in publications is a decision that involves complex trade-offs.

The staged release of GPT-2 did not, in my opinion, reduce a lot of danger in expectation. In the worst-case scenario, it prevented a few months of hypothetical disarray as we raced to develop cultural and technological solutions to deal with automated writing (a worry that in the end did not materialize).

But the staged release sets a precedent for starting a community-wide conversation around potentially dangerous technologies, and that is a space for innovation we should take seriously.


Ehud discusses some applications, pointing out that GPT-3 is probably not consistent and reliable enough to produce marketable fiction, dialogue, or advertising by itself.

But I think this assessment plays to GPT-3’s weaknesses rather than its strength as an amplifier for human thought.

There are plenty of times when I have a feeling for what I want to say, but I struggle to write it: a sort of writer’s block that internally feels like “I will know it when I see it”. If a language model were to offer me multiple sentence completions and let me choose which one to keep, I am fairly sure I would use it in my academic writing, when writing code, and when writing fiction.

There is plenty of precedent for this: my writing is already augmented by spell checkers, my phone autocompletes my sentences (with some annoyingly consistent mistakes, but on net it is good enough that I keep it on), and my compiler gives me plenty of helpful messages to fix my code.

I think this is the context where we will see GPT-X-like language models shine, and how we may think of them for the time being: as autocompleters, turned up to 11.

GPT-3 has undoubtedly been an important development in our field. It is an empirical demonstration of the scaling laws posited by OpenAI, and an exciting avenue of research.

As with many new technologies, it is hard right now to anticipate what its consequences will be. That does not mean we should not try, but we should at least attempt to calibrate our predictions.

I am personally looking forward to seeing what applications developers will come up with for GPT-3, and to maintaining a healthy conversation about responsible disclosure norms in our research community.

ESR NL4XAI. Math and computer science expert.
