
Podcast: Jeremy Howard is bearish on LLMs

Jeremy Howard was recently[1] interviewed on the Machine Learning Street Talk podcast: YouTube link, interactive transcript, PDF transcript.

Jeremy co-invented LLMs in 2018, and taught the excellent fast.ai online course, which I found very helpful back when I was learning ML. And he uses LLMs all the time: e.g., 90% of his new code is typed by an LLM (see below). So I think his “bearish”[2] take on LLMs is an interesting datapoint, and I’m putting it out there for discussion.

Some relevant excerpts from the podcast, focusing on the bearish-on-LLMs part, are copied below! (These are not 100% exact quotes; I cleaned them up for readability.)

So you know Piotr Woźniak, who’s a guy I really respect, who kinda rediscovered spaced repetition learning, built the SuperMemo system, and is the modern-day guru of memory: the entire reason he’s based his life around remembering things is that he believes creativity comes from having a lot of stuff remembered. Which is to say, putting together stuff you’ve remembered in interesting ways is a great way to be creative. LLMs are actually quite good at that. But there’s a kind of creativity they’re not at all good at, which is, you know, moving outside the distribution.

…You have to be so nuanced about this stuff, because if you say “they’re not creative”, it can give the wrong idea, because they can do very creative-seeming things. But if it’s like, well, can they really extrapolate outside the training distribution? The answer is no, they can’t. But the training distribution is so big, and the number of ways to interpolate within it is so vast, that we don’t really know yet what the limitations of that are. But I see it every day, because my work is R&D. I’m constantly on the edge of, and outside, the training data. I’m doing things that haven’t been done before.
And there’s this weird thing, I don’t know if you’ve ever seen it before, but I see it multiple times every day, where the LLM goes from being incredibly clever to, like, worse than stupid, like not understanding the most basic fundamental premises about how the world works. And it’s like, oh, whoops, I fell outside the training data distribution. It’s gone dumb. And then there’s no point having that discussion any further, because you’ve lost it at that point.

…I mean, I think they can’t go outside their distribution, because it’s just something that that type of mathematical model can’t do. I mean, it can do it, but it won’t do it well. You know, when you look at the kind of 2D case of fitting a curve to data, once you go outside the area that the data covers, the curves disappear off into space in wild directions. And that’s all we’re doing, but we’re doing it in multiple dimensions.

I think Margaret Boden might be pretty shocked at how far “compositional creativity” can go when you can compose the entirety of the human knowledge corpus. And I think this is where people often get confused. So for example, I was talking to Chris Lattner yesterday about how Anthropic had got Claude to write a C compiler. And they were like, “oh, this is a clean-room C compiler. You can tell it’s clean-room because it was created in Rust.” So, Chris created what is probably the most widely used C/C++ compiler nowadays, Clang, on top of LLVM, which is the most widely used foundation for compilers. They’re like: “Chris didn’t use Rust. And we didn’t give it access to any compiler source code. So it’s a clean-room implementation.” But that misunderstands how LLMs work. Right? Which is: all of Chris’s work was in the training data. Many, many times. LLVM is used widely, and lots and lots of things are built on it, including lots of C and C++ compilers. Converting it to Rust is an interpolation between parts of the training data.
It’s a style transfer problem. So it’s definitely compositional creativity at most, if you can call it creative at all. And you actually see it when you look at the repo that it created. It copied parts of the LLVM code about which today Chris says, like, “oh, I made a mistake. I shouldn’t have done it that way. Nobody else does it that way.” Oh, wow. Look. The Claude C compiler is the only other one that did it that way. That doesn’t happen accidentally. That happens because you’re not actually being creative. You’re actually just finding the kind of nonlinear average point in your training data between, like, Rust things and building-compiler things.

…I’m much less familiar with math than I am with computer science, but from talking to mathematicians, they tell me that that’s also what’s happening with, like, the Erdős problems and stuff. Some of them are newly solved. But they are not sparks of insight. You know, they’re solving the ones that you can solve by mashing together very closely related things that humans have already figured out.

…The thing is, none of these guys have been software engineers recently. I’m not sure Dario’s ever been a software engineer at all. Software engineering is an unusual discipline, and a lot of people mistake it for being the same as typing code into an IDE. Coding is another one of these style transfer problems. You take a specification of the problem to solve, and you can use your compositional creativity to find the parts of the training data which, interpolated between, solve that problem, and interpolate that with the syntax of the target language, and you get code.

There’s a very famous essay by Fred Brooks written many decades ago, No Silver Bullet, and it almost sounded like he was talking about today.
He was pointing to something very similar, which is, in those days, it was all like, “oh, what about all these new fourth-generation languages and stuff like that, you know, we’re not gonna need any software engineers anymore, because software is now so easy to write, anybody can write it.” And he said, well, he guessed that you could get at maximum a 30% improvement. He specifically said a 30% improvement in the next decade, but I don’t think he needed to limit it that much. Because the vast majority of work in software engineering isn’t typing in the code.

So in some sense, parts of what Dario said were right: for quite a few people now, most of their code is being typed by a language model. That’s true for me. Say, like, maybe 90%. But it hasn’t made me that much more productive, because that was never the slow bit. It’s also helped me a lot with the research, and with figuring out, you know, which files are gonna be touched. But any time I’ve made any attempt at getting an LLM to, like, design a solution to something that hasn’t been designed lots of times before, it’s horrible. Because what it actually gives me, every time, is the design of something that looks on its surface a bit similar. And often that’s gonna be an absolute disaster, because I’m literally trying to create something new to get away from the similar thing. It’s very misleading.

…The difference between pretending to be intelligent and actually being intelligent is entirely unimportant, as long as you’re in the region in which the pretense is actually effective. So it’s actually fine, for a great many tasks, that LLMs only pretend to be intelligent, because for all intents and purposes it just doesn’t matter, until you get to the point where it can’t pretend anymore. And then you realize, like, oh my god, this thing’s so stupid.

[1] The podcast was released March 3, 2026. Not sure exactly when it was recorded, but it was definitely within the previous month, since they talk about a blog post from Feb. 5.

[2] I mean, he’s “bearish” compared to the early-2026 LessWrong zeitgeist, which really isn’t saying much!
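As an aside: the 2D curve-fitting picture Jeremy invokes is easy to demonstrate. Here’s a little sketch of my own (not from the podcast): fit a degree-7 polynomial to noisy samples of sin(2πx) on [0, 1], then evaluate it inside and just outside that range. Inside, the interpolation tracks the true function; outside, the curve “disappears off into space in wild directions.”

```python
import numpy as np

# Fit a degree-7 polynomial to 20 noisy samples of sin(2*pi*x) on [0, 1].
rng = np.random.default_rng(0)
x_train = np.linspace(0.0, 1.0, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.05, x_train.shape)

poly = np.poly1d(np.polyfit(x_train, y_train, deg=7))

# Inside the training range: the fit interpolates well.
print(abs(poly(0.5) - np.sin(np.pi)))      # small error

# Outside the training range: the polynomial diverges wildly
# from the true value sin(4*pi) = 0.
print(abs(poly(2.0) - np.sin(4 * np.pi)))  # error orders of magnitude larger
```

His claim is that an LLM is doing this same thing in a space of vastly higher dimension, where the covered region is so enormous that the failure mode only shows up at the genuine frontier.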

