Transcript
PUAdj3w3wO4 • François Chollet: Measures of Intelligence | Lex Fridman Podcast #120
The following is a conversation with François Chollet, his second time on the podcast. He's both a world-class engineer and a philosopher in the realm of deep learning and artificial intelligence. This time we talk a lot about his paper titled "On the Measure of Intelligence," which discusses how we might define and measure general intelligence in our computing machinery. Quick summary of the sponsors: Babbel, MasterClass, and Cash App. Click the sponsor links in the description to get a discount and to support this podcast.

As a side note, let me say that the serious, rigorous, scientific study of artificial general intelligence is a rare thing. The mainstream machine learning community works on very narrow AI with very narrow benchmarks. This is very good for incremental, and sometimes big incremental, progress. On the other hand, the outside-the-mainstream, renegade, you could say, AGI community works on approaches that verge on the philosophical and even the literary, without big public benchmarks. Walking the line between the two worlds is a rare breed, but it doesn't have to be. I ran the AGI series at MIT as an attempt to inspire more people to walk this line. DeepMind and OpenAI for a time, and still on occasion, walk this line. François Chollet does as well. I hope to also. It's a beautiful dream to work towards and to make real one day.

If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman. As usual, I'll do a few minutes of ads now and no ads in the middle. I try to make these interesting, but I give you timestamps so you can skip. But still, please do check out the sponsors by clicking the links in the description; it's the best way to support this podcast.

This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to babbel.com and use code LEX to get three months free. They offer 14 languages, including Spanish,
French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts. Let me read a few lines from the Russian poem by Alexander Blok that you'll start to understand if you sign up to Babbel. Now, I say that you'll start to understand this poem, because Russian starts with the language and ends with the vodka. The latter part is definitely not endorsed or provided by Babbel, and will probably lose me this sponsorship, although it hasn't yet. But once you graduate with Babbel, you can enroll in my advanced course of late-night Russian conversation over vodka. No app for that yet. So get started by visiting babbel.com and use code LEX to get three months free.

This show is also sponsored by MasterClass. Sign up at masterclass.com/lex to get a discount and to support this podcast. When I first heard about MasterClass, I thought it was too good to be true. I still think it's too good to be true. For $180 a year, you get an all-access pass to watch courses from, to list some of my favorites: Chris Hadfield on space exploration (I hope to have him on this podcast one day), Neil deGrasse Tyson on scientific thinking and communication, Will Wright, creator of SimCity and The Sims, on game design, Carlos Santana on guitar, Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com/lex to get a discount and to support this podcast.

This show, finally, is presented by Cash App, the number one finance app in the App Store. When you get it, use code LEXPODCAST. Cash App lets you send money to friends, buy Bitcoin, and invest in the stock market with as little as one dollar. Since Cash App allows you to send and receive money digitally, let me mention a surprising fact related to physical money: of all the currency in the world, roughly eight
percent of it is actually physical money; the other 92 percent only exists digitally, and that's only going to increase. So again, if you get Cash App from the App Store or Google Play and use code LEXPODCAST, you get ten bucks, and Cash App will also donate ten dollars to FIRST, an organization that is helping to advance robotics and STEM education for young people around the world. And now, here's my conversation with François Chollet.

What philosophers, thinkers, or ideas had a big impact on you, growing up and today?

So one author that had a big impact on me when I read his books as a teenager was Jean Piaget, who is a Swiss psychologist considered to be the father of developmental psychology, and he has a large body of work about, basically, how intelligence develops in children. It's really old work; most of it is from the 1930s and 1940s, so it's not quite up to date. It's actually been superseded by many newer developments in developmental psychology. But to me, it was very interesting, very striking, and it actually shaped the early ways in which I started thinking about the mind and the development of intelligence as a teenager.

His actual ideas, or the way he thought about it, or just the fact that you could think about the developing mind at all?

I guess both. Jean Piaget is the author that introduced me to the notion that intelligence and the mind is something that you construct throughout your life, and that children construct it in stages. I thought that was a very interesting idea, which is, of course, very relevant to AI, to building artificial minds. Another book that I read around the same time that had a big impact on me, and there was actually a little bit of overlap with Piaget as well, and I read it around the same time, is Jeff Hawkins's "On Intelligence," which is a classic. He has this vision of the mind as a multi-scale hierarchy of temporal prediction modules, and these ideas really resonated with
me, like the notion of a modular hierarchy of, potentially, compression functions or prediction functions. I thought it was really interesting, and it reshaped the way I started thinking about how to build minds.

The hierarchical nature, which aspect? Also, he's a neuroscientist, so he was thinking about the actual brain.

Yes, he's basically talking about how our mind works. The notion that cognition is prediction was an idea that was kind of new to me at the time, and that I really loved at the time. And the notion that there are multiple scales of processing in the brain, the hierarchy.

Yes. This is before deep learning.

These ideas of hierarchy in AI have been around for a long time, even before "On Intelligence." I mean, they've been around since the 1980s. And yes, that was before deep learning, but of course, I think these ideas really found their practical implementation in deep learning.

What about the memory side of things? I think he was talking about knowledge representation. Do you think about memory a lot? One way you can think of neural networks is as a kind of memory, you're memorizing things, but it doesn't seem to be the kind of memory that's in our brains; it doesn't have the same rich complexity, the long-term nature that's in our brains.

Yes, the brain is more of a sparse-access memory, so that you can retrieve very precisely, like, bits of your experience.

The retrieval aspect, you can introspect, you can ask yourself questions.

Yes, you can program your own memory, and language is actually the tool you use to do that. I think language is a kind of operating system for the mind, and one of the uses of language is as a query that you run over your own memory. You use words as keys to retrieve specific experiences, specific concepts, specific thoughts. Language is the way you store thoughts, not just in writing, in the physical world, but also in your own mind, and it's also how you reason with them. Like, imagine
if you didn't have language. Then you would not really have a self-internally-triggered way of retrieving past thoughts. You would have to rely on external experiences. For instance, you see a specific sight, you smell a specific smell, and it brings up memories, but you would not naturally have a way to deliberately access these memories without language.

Well, the interesting thing you mentioned is you can also program the memory. You can change it, probably, with language.

Yeah, using language, yes.

Well, let me ask you a Chomsky question, which is: first of all, do you think language is fundamental? Like, there's turtles, what's at the bottom of the turtles? It can't be turtles all the way down. Is language at the bottom of cognition, of everything? Is language the fundamental aspect of what it means to be a thinking thing?

No, I don't think so.

You disagree with Noam Chomsky?

Yes. Language is a layer on top of cognition. So it is fundamental to cognition in the sense that, to use a computing metaphor, I see language as the operating system of the brain, of the human mind. And the operating system, you know, is a layer on top of the computer. The computer exists before the operating system, but the operating system is how you make it truly useful.

And the operating system is most likely Windows, not Linux, because language is messy.

Yeah, it's messy, and it's pretty difficult to inspect it, introspect it.

How do you think about language? We use human-interpretable language, but is there something deeper, closer to, like, logical types of statements? What is the nature of language, do you think? Is there something deeper than the syntactic rules we construct? Is there something that doesn't require utterances or writing and so on?

Are you asking about the possibility that there could exist languages for thinking that are not
made of words?

Yeah, exactly.

I think so, I think so. The mind is layers, right? And language is almost like the outermost, the uppermost layer. But before we think in words, I think we think in terms of emotion and space, and we think in terms of physical actions. And I think babies, in particular, probably express their thoughts in terms of the actions that they've seen or that they can perform, and in terms of the motions of objects in their environment, before they start thinking in terms of words.

It's amazing to think about that as the building blocks of language, like the kind of actions and ways the babies see the world as more fundamental than the beautiful Shakespearean language you construct on top of it. And we probably don't have any idea what that looks like, right? Which is important for trying to engineer it into AI systems.

I think visual analogies and motion are a fundamental building block of the mind, and you actually see it reflected in language. Language is full of spatial metaphors. And when you think about things, I consider myself very much a visual thinker, you often express your thoughts by doing things like visualizing concepts in 2D space, or you solve problems by imagining yourself navigating a concept space. I don't know if you have this sort of experience.

You said visualizing concept space. So I certainly visualize mathematical concepts, but you mean in concept space, visually, you're embedding ideas into some three-dimensional space you can explore with your mind, essentially?

Yeah, 2D.

You're a flatlander, okay. No, I do not. Before I jump from concept to concept, I have to put it back down on paper, and it has to be on paper. I can only travel on 2D paper, not inside my mind. You're able to move inside your mind?

But even if you're writing, like, a paper, for instance, don't you
have a spatial representation of your paper? Like, you visualize where ideas lie topologically in relationship to other ideas, kind of like a subway map of the ideas in your paper?

Yeah, that's true. I mean, in papers, I don't know about you, but it feels like there's a destination. There's a key idea that you want to arrive at, and a lot of it is in the fog, and you're trying to, it's almost like, what's that called, when you do a path-planning search from both directions, from the start and from the end, and then you find where they join? You do, like, shortest path. In game playing, you do this with A* from both sides, and you see where they meet. So you kind of do that, at least for me. First of all, just exploring from the start, from first principles: what do I know, what can I start proving from that, right? And then from the destination, you start backtracking: if I want to show some set of ideas, what would it take to show them? And you kind of backtrack. But yeah, I don't think I'm doing all that in my mind, though. I'm putting it down on paper.

Do you use mind maps to organize your ideas?

Yeah, I like mind maps.

Let's get into this. I've been so jealous of people, I haven't really tried it. I've been jealous of people that seem to get this fire of passion in their eyes, because everything starts making sense. It's like Tom Cruise in the movie, moving stuff around. Some of the most brilliant people I know use mind maps. I haven't tried, really. Can you explain what the hell a mind map is?

I guess a mind map is a way to take the connected mess inside your mind and just put it on paper, so that you gain more control over it. It's a way to organize things on paper, and as kind of a consequence of organizing things on paper, it starts being more organized inside your own mind.

What does that look like? Do you have an example?
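The path-planning idea mentioned above, searching from both the start and the goal and stopping where the two frontiers meet, is bidirectional search. Here is a minimal sketch in Python; the graph representation, function name, and example graph are all illustrative, not anything from the conversation:

```python
from collections import deque

def bidirectional_search(graph, start, goal):
    """Breadth-first search from both ends; stop when the frontiers meet.

    `graph` maps each node to a list of neighbors (assumed undirected:
    every edge appears in both adjacency lists). Returns the node where
    the two searches meet, or None if start and goal are disconnected.
    """
    if start == goal:
        return start
    seen_fwd, seen_bwd = {start}, {goal}
    frontier_fwd, frontier_bwd = deque([start]), deque([goal])
    while frontier_fwd and frontier_bwd:
        # Expand the smaller frontier first, a common optimization.
        if len(frontier_fwd) <= len(frontier_bwd):
            frontier, seen, other_seen = frontier_fwd, seen_fwd, seen_bwd
        else:
            frontier, seen, other_seen = frontier_bwd, seen_bwd, seen_fwd
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for neighbor in graph.get(node, []):
                if neighbor in other_seen:
                    return neighbor  # the two searches have met
                if neighbor not in seen:
                    seen.add(neighbor)
                    frontier.append(neighbor)
    return None
```

The appeal for the paper-writing analogy is that each frontier only has to cover half the distance, which is why working from both first principles and the destination can feel faster than either alone.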
What's the first thing you write on paper? What's the second thing you write?

Typically, you draw a mind map to organize the way you think about a topic, so you would start by writing down the key concept for that topic. Like, you would write "intelligence" or something, and then you would start adding associative connections: what do you think about when you think about intelligence? What do you think are the key elements of intelligence? So maybe you would have "language," for instance, and "motion," and so you would start drawing nodes with these things. And then you would see, what do you think about when you think about motion, and so on, and you would go like that, like a tree.

It's a tree, mostly? Or is it a graph, too, not just a tree?

Oh, it's more of a graph than a tree, and it's not limited to just writing down words; you can also draw things. And it's not supposed to be purely hierarchical, right? The point is that, once you start writing it down, you can start reorganizing it so that it makes more sense, so that it's connected in a more effective way.

See, but I'm so OCD that you just mentioned intelligence, language, and motion, and I would start becoming paranoid that the categorization isn't perfect, that I'll become paralyzed with the mind map. Even though you're just doing associative kinds of connections, there's an implied hierarchy that's emerging, and I would start becoming paranoid that it's not the proper hierarchy. So one way to see mind maps is that you're putting thoughts on paper, like a stream of consciousness, but then you can also start getting paranoid: is this the right hierarchy?

Sure. It's a mind map, it's your mind map. You're free to draw anything you want, you're free to draw any connection you want, and you can just make a different mind map if you think the central node is not the
right node.

Yeah. So I suppose there's a fear of being wrong.

If you want to organize your ideas by writing down what you think, which I think is very effective (like, how do you know what you think about something if you don't write it down, right?), the thing is that it imposes a much more syntactic structure over your ideas, which is not required with a mind map. A mind map is kind of a lower-level, more freehand way of organizing your thoughts, and once you've drawn it, then you can start actually voicing your thoughts in terms of, you know, paragraphs.

There's a two-dimensional aspect of the layout, too, right?

Yeah. It's kind of a flower, I guess; you usually want to start with a central concept.

Yes. Typically it ends up more like a subway map, so it ends up more like a graph, a topological graph, without a root node. Like in a subway map, there are some nodes that are more connected than others, and there are some nodes that are more important than others, right? So there are destinations, but it's not going to be purely like a tree, for instance.

Yeah. It's fascinating to think that there might be something to that about the way our mind thinks. By the way, I just remembered an obvious thing: I have probably thousands of documents in Google Docs at this point that are bullet-point lists, which, you can probably map a mind map to a bullet-point list. It's the same?

No, it's not. It's a tree.

It's a tree, yeah. So I create trees, but also they don't have the visual element. I guess I'm comfortable with the structure; the narrowness, the constraints feel more comforting.

If you have thousands of documents with your own thoughts in Google Docs, why don't you write some kind of search engine, maybe a mind-mapping piece of software, where you write down a concept and then it gives you sentences or paragraphs from your thousands of Google Docs documents that match that concept?
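The tool described here, type in a concept and get back matching passages from your notes, can be roughed out as naive keyword matching. This is only a sketch (every name in it is hypothetical), and it runs straight into the limitation raised in the conversation: it matches words, not meaning, so a paragraph that poetically evokes motion without ever saying "motion" scores zero:

```python
import re

def score(paragraph, query):
    """Count how many distinct query words appear in the paragraph."""
    words = set(re.findall(r"[a-z']+", paragraph.lower()))
    return sum(1 for w in set(re.findall(r"[a-z']+", query.lower())) if w in words)

def search(documents, query, top_k=3):
    """Return up to top_k paragraphs across all documents that best match `query`.

    `documents` is a list of strings; paragraphs are split on blank lines.
    Purely lexical, so it has no notion of semantic similarity.
    """
    paragraphs = [p for doc in documents for p in doc.split("\n\n") if p.strip()]
    ranked = sorted(paragraphs, key=lambda p: score(p, query), reverse=True)
    return [p for p in ranked[:top_k] if score(p, query) > 0]
```

A real version of this idea would swap the lexical `score` for embedding-based similarity, which is exactly the semantic-search problem discussed next.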
The problem is, unlike mind maps, it's so deeply rooted in natural language, so it's not semantically searchable, I would say. Because the categories, you kind of mentioned intelligence, language, and motion, are semantically very strong. It feels like the mind map forces you to be semantically clear and specific. The bullet-point lists I have are sparse, disparate thoughts that poetically represent a category like "motion," as opposed to saying "motion." So, unfortunately, that's the same problem as with the internet. That's why the idea of the semantic web is difficult to realize: most language on the internet is a giant mess of natural language that's hard to interpret.

So do you think there's something to mind maps, you actually originally brought this up as we were talking about cognition and language, do you think there's something to mind maps about how our brain actually thinks, reasons about things?

It's possible. I think it's reasonable to assume that there is some level of topological processing in the brain, that the brain is very associative in nature. And I also believe that a topological space is a better medium to encode thoughts than a geometric space.

So what's the difference between a topological and a geometric space?

Well, if you're talking about topology, then points are either connected or not, so a topology is more like a subway map, and geometry is when you're interested in the distance between things. In subway maps, you don't really have the concept of distance; you only have the concept of whether there is a train going from station A to station B. And what we do in deep learning is that we're actually dealing with geometric spaces. We are dealing with concept vectors, word vectors, that have a distance between them, expressed in terms of dot products. We are not really building topological models, usually.

I think you're absolutely right.
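The geometric-versus-topological distinction can be made concrete in a few lines of Python: in a geometric space, similarity is graded (a dot product or cosine between vectors), while in a topological space, the only question is whether two points are connected. The toy word vectors and subway graph below are made up purely for illustration:

```python
import math

def cosine(u, v):
    """Geometric view: similarity is a graded quantity, an angle between vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def connected(graph, a, b):
    """Topological view: only reachability matters, not distance."""
    seen, stack = set(), [a]
    while stack:
        node = stack.pop()
        if node == b:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return False

# Toy 2D "word vectors" (made up): cat is more similar to dog than to car.
vectors = {"cat": [1.0, 0.9], "dog": [0.9, 1.0], "car": [-1.0, 0.2]}
# Subway-map style graph: a station is either reachable or it isn't.
subway = {"a": ["b"], "b": ["c"], "c": [], "d": []}
```

Note that `cosine` is differentiable in its inputs while `connected` is not, which is exactly the point made next about why deep learning lives in geometric spaces.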
Distance is of fundamental importance in deep learning. I mean, it's the continuous aspect of it.

Yes, because everything is a vector, and everything has to be a vector because everything has to be differentiable. If your space is discrete, it's no longer differentiable; you cannot do deep learning in it anymore. Well, you could, but you could only do it by embedding it in a bigger continuous space. So if you do topology in the context of deep learning, you have to do it by embedding your topology in a geometry.

Right. Yeah. Well, let me zoom out for a second. Let's get into your paper, "On the Measure of Intelligence," that you put out in 2019.

Yes. November 2019, yeah. That was a different time.

Yeah, I remember. It feels like a different world. You could travel, you could actually go outside and see friends.

Yeah. Let me ask the most absurd question. I think there's some non-zero probability there'll be a textbook one day, like 200 years from now, on artificial intelligence, or it'll be called just "intelligence," because humans will already be gone. It'll be your picture with a quote, you know, one of the early biological systems to consider the nature of intelligence, and there'll be a definition of how they thought about intelligence, which is one of the things you do in your paper, "On the Measure of Intelligence": to ask, well, what is intelligence, and how do we test for intelligence, and so on. So is there a spiffy quote about what intelligence is? What is the definition of intelligence, according to François Chollet?

Yes. So do you think the superintelligent AIs of the future will want to remember us, the way we remember humans from the past? And do you think they would be, you know, ashamed of having a biological origin?

No, I think it would be a niche topic. It won't be that interesting, but it'll be like the people that study, in certain contexts, historical civilizations that no longer exist.
The Aztecs and so on; that's how it'll be seen. And it'll be studied also in the context of social media. There will be hashtags about the atrocities committed to human beings when the robots finally got rid of them. It was a mistake; it'll be seen as a giant mistake, but ultimately in the name of progress, and it created a better world, because humans were overconsuming the resources, they were not very rational, and they were destructive in the end, in terms of productivity and putting more love in the world. And so, within that context, there'll be a chapter about these biological systems.

You seem to have a very detailed vision of that future. You should write a sci-fi novel about it.

I'm working on a sci-fi novel currently, yes.

Self-published, yeah. The definition of intelligence, so...

Intelligence is the efficiency with which you acquire new skills at tasks that you did not previously know about, that you did not prepare for. So intelligence is not skill itself. It's not what you know, it's not what you can do; it's how well and how efficiently you can learn new things.

New things.

Yes. The idea of newness there seems to be fundamentally important.

Yes. So you would see intelligence on display, for instance, whenever you see a human being, or an AI creature, adapt to a new environment that it has not seen before, that its creators did not anticipate. When you see adaptation, when you see improvisation, when you see generalization, that's intelligence. In reverse, if you have a system that, when you put it in a slightly new environment, cannot adapt, cannot improvise, cannot deviate from what it's hardcoded to do, or what it has been trained to do, that is a system that is not intelligent. There's actually a quote from Einstein that captures this idea, which is: "The measure of intelligence is the ability to change." I like that quote. I think it captures at least part of this
idea.

You know, there might be something interesting about the difference between your definition and Einstein's. I mean, he's just being Einstein and clever, but: acquisition of the ability to deal with new things, versus the ability to just change. What's the difference between those two things? Just changing itself, do you think there's something to that, just being able to change?

Yes, being able to adapt. So not just change, but change in the right direction, being able to adapt yourself to your environment, whatever the environment. That's a big part of intelligence, yes. And intelligence is, more precisely, how efficiently you're able to adapt, how efficiently you're able to basically master your environment, how efficiently you can acquire new skills. And I think there's a big distinction to be drawn between intelligence, which is a process, and the output of that process, which is skill. So, for instance, if you have a very smart human programmer that considers the game of chess and writes down a static program that can play chess, then the intelligence is the process of developing that program, but the program itself is just encoding the output artifact of that process. The program itself is not intelligent. And the way you can tell it's not intelligent is that, if you put it in a different context, if you ask it to play Go or something, it's not going to be able to perform well without human involvement, because the source of intelligence, the entity that is capable of that process, is the human programmer. So we should be able to tell the difference between the process and its output. We should not confuse the output and the process. It's the same as, you know, do not confuse a road-building company and one specific road, because one specific road takes you from point A to point B, but a road-building company can make a path from anywhere to anywhere else.

Yeah, that's beautifully put. But also, to play devil's advocate a little bit, it's possible
that there's something more fundamental than us humans. So you kind of said the programmer creates the difference between the acquirer of the skill and the skill itself. There could be something, you could argue, the universe is more intelligent. The deeper, base intelligence that we should be trying to measure is something that created humans. We should be measuring God, or the source, the universe, as opposed to humans. There could be a deeper intelligence.

Sure. There's always a deeper intelligence, you can argue that, but that does not take anything away from the fact that humans are intelligent, and you can tell that because they are capable of adaptation and generality. And you see that in particular in the fact that humans are capable of handling situations and tasks that are quite different from anything that any of our evolutionary ancestors has ever encountered. So we are capable of generalizing very much out of distribution, if you consider our evolutionary history as being, in a way, our training data.

Of course, evolutionary biologists would argue that we're not going that far out of the distribution; we're mapping the skills we've learned previously, desperately trying to jam them into these new situations.

I mean, there's definitely a little bit of that, but it's pretty clear to me that most of the things we do on any given day in our modern civilization are things that are very, very different from what our ancestors a million years ago would have been doing on a given day, and our environment is very different. So I agree that everything we do, we do with cognitive building blocks that we acquired over the course of evolution, right, and that anchors our cognition to a certain context, which is the human condition, very much. But still, our mind is capable of a pretty remarkable degree of generality, far beyond anything we can create in artificial systems today. Like,
the degree to which the mind can generalize away from its evolutionary history is much greater than the degree to which a deep learning system today can generalize away from its training data.

And the key point you're making, which I think is quite beautiful, is that, if we talk about measurement, we shouldn't measure the skill; we should measure the creation of the new skill, the ability to create that new skill.

Yes.

But it's tempting, it's weird, because the skill is a little bit of a window into the system, so whenever you have a lot of skills, it's tempting to measure the skills.

Yes. I mean, the skill is the only thing you can objectively measure. But the thing to keep in mind is that, when you see skill in a human, it gives you a strong signal that that human is intelligent, because you know they weren't born with that skill, typically. Like, you see a very strong chess player; maybe you're a very strong chess player yourself.

I think you're saying that because I'm Russian. Now you're prejudiced, you assume...

Yeah, it's a bias.

I'm biased, yeah, okay.

So if you see a very strong chess player, you know they weren't born knowing how to play chess, so they had to acquire that skill with their limited resources, with their limited lifetime, and they did that because they are generally intelligent, and so they may as well have acquired any other skill. You know they have this potential. On the other hand, if you see a computer playing chess, you cannot make the same assumptions, because you cannot just assume the computer is generally intelligent. The computer may be born knowing how to play chess, in the sense that it may have been programmed by a human that has understood chess for the computer and has just encoded the output of that understanding in a static program, and that program is
not intelligent.

So let's zoom out just for a second. What is the goal of the "On the Measure of Intelligence" paper? What do you hope to achieve with it?

So the goal of the paper is to clear up some long-standing misunderstandings about the way we've been conceptualizing intelligence in the AI community, and the way we've been evaluating progress in AI. There's been a lot of progress recently in machine learning, and people are extrapolating from that progress that we are about to solve general intelligence. And if you want to be able to evaluate these statements, you need to precisely define what you're talking about when you're talking about general intelligence, and you need a formal way, a reliable way, to measure how much intelligence, how much general intelligence, a system possesses. And ideally, this measure of intelligence should be actionable. It should not just describe what intelligence is; it should not just be a binary indicator that tells you the system is intelligent or it isn't. It should be actionable, it should have explanatory power, right? So you could use it as a feedback signal; it would show you the way towards building more intelligent systems.

So at the first level, you draw a distinction between two divergent views of intelligence, as we just talked about: intelligence as a collection of task-specific skills, and intelligence as a general learning ability. So what's the difference between this kind of memorization of skills and a general learning ability? We've talked about it a little bit, but can you linger on this topic for a bit?

Yeah. So the first part of the paper is an assessment of the different ways we've been thinking about intelligence and the different ways we've been evaluating progress in AI. The history of cognitive science has been shaped by two views of the human mind. One view is the evolutionary psychology view, in which the mind is a collection of fairly static, special-purpose, ad hoc mechanisms that
have been hard coded by evolution over our our history as a species over a very long time and um early ai researchers people like marvin minsky for instance they clearly subscribed to this view and they saw they saw the mind as a kind of you know collection of static programs uh similar to the programs they would they would run on like mainframe computers and in fact they i think they very much understood the mind uh through the metaphor of the mainframe computer because that was the tool they they were working with right and so you had the static programs this collection of very different static programs operating over a database like memory and in this picture learning was not very important learning was considered to be just memorization and in fact learning is basically not featured in ai textbooks until the 1980s with the rise of machine learning it's kind of fun to think about that learning was the outcast like the the weird people were learning like the mainstream ai world was um i mean i don't know what the best term is but it's non-learning it was seen as like reasoning yes would not be learning based yes it was seen it was considered that the mind was a collection of programs that were primarily logical in nature and that's all you needed to do to create a mind was to write down these programs and they would operate over your knowledge which would be stored in some kind of database and as long as your database would encompass you know everything about the world and your logical rules were uh comprehensive then you would have in mind so the other view of the mind is the brain as a sort of blank slate right this is a very old idea you find it in john locke's writings this is the tabulata and this is this idea that the mind is some kind of like information sponge that starts empty it starts blank and that absorbs uh knowledge and skills from experience right so it's uh it's a sponge that reflects the complexity of the world the complexity of your life 
experience, essentially — everything you know and everything you can do is a reflection of something you found in the outside world. This is a very old idea that was not very popular, for instance, in the 1970s, but that has gained a lot of vitality recently with the rise of connectionism, in particular deep learning. So today deep learning is the dominant paradigm in AI, and I feel like lots of AI researchers are conceptualizing the mind via a deep learning metaphor: they see the mind as a kind of randomly initialized neural network that starts blank when you're born and then gets trained, acquiring knowledge and skills via exposure to training data.

By the way — a small tangent — I feel like people who are seriously thinking about intelligence are not conceptualizing it that way. I actually haven't met too many people who believe that a neural network will be able to reason, who seriously, rigorously think that. I think it's actually an interesting worldview, and we'll talk about it more, but it's been impressive what neural networks have been able to accomplish, and — you might disagree — to me it's an open question whether scaling size might eventually lead to incredible results that to us mere humans will appear as if they're general.

I mean, if you ask people who are seriously thinking about intelligence, they will definitely not say that all you need is — that the mind is just a neural network. However, that view is actually very popular, I think, in the deep learning community; many people are kind of conceptually, intellectually lazy about it. But that's exactly what I'm saying — I haven't met many people, and I think it would be interesting to meet a person who is not intellectually lazy about this particular topic and still believes that neural networks will go all the way. I think january is probably closest to that.

There are definitely people who argue that current deep learning techniques are already the way to general artificial intelligence, and that all you need to do is scale them up to all the available training data. If you look at the waves that OpenAI's GPT-3 model has made, you see echoes of this idea.

So on that topic: GPT-3, similar to GPT-2, has captivated some part of the imagination of the public. There's a bunch of hype of different kinds — I would say it's emergent, not artificially manufactured; people just get excited for some strange reason. In the case of GPT-3, which is funny, there was, I believe, a couple of months' delay from release to hype — maybe I'm not historically correct on that, but it feels like there was a little bit of a lack of hype and then a phase shift into hype. Nevertheless, there are a bunch of cool applications that seem to captivate the imagination of the public about what this language model — trained in an unsupervised way, without any fine-tuning — is able to achieve. So what do you make of that? What are your thoughts about GPT-3?

Yeah, I think what's interesting about GPT-3 is the idea that it may be able to learn new tasks after just being shown a few examples. If it's actually capable of doing that, that's novel and very interesting, and something we should investigate. That said, I must say I'm not entirely convinced that we have shown it's capable of doing that. It's very likely, given the amount of data the model is trained on, that what it's actually doing is pattern-matching a new task you give it against tasks it's been exposed to in its training data — just recognizing the task, instead of developing a model of the task.

To interrupt — there's a parallel to what you said before, which is that it's possible to see the prompt given to GPT-3 as a kind of SQL
query into this thing that it's learned — similar to what you said before, that language is used to query the memory. Yes. So is it possible that a neural network is a giant memorization machine, but if it gets sufficiently giant, it will memorize sufficiently large amounts of things about the world that it becomes — that intelligence becomes a querying machine? I think it's possible that a significant chunk of intelligence is this giant associative memory. I definitely don't believe that intelligence is just a giant associative memory, but it may well be a big component.

So do you think GPT-3, 4, 5 — GPT-10 — where's the ceiling? Do you think it will be able to reason? No — that's a bad question. "What is the ceiling" is the better question: how well is it going to scale, how good is GPT-n going to be? I believe GPT-n is going to improve on the strength of GPT-2 and GPT-3, which is that it will be able to generate ever more plausible text in context. Just monotonically improving performance? Yes — if you train a bigger model on more data, then your text will be increasingly context-aware and increasingly plausible, in the same way that GPT-3 is much better at generating plausible text compared to GPT-2. But that said, I don't think just scaling up the model to more transformer layers and more training data is going to address the flaw of GPT-3, which is that it can generate plausible text, but that text is not constrained by anything other than plausibility. In particular, it's not constrained by factualness, or even consistency, which is why it's very easy to get GPT-3 to generate statements that are factually untrue, or to generate statements that are even self-contradictory — because its only goal is plausibility, and it has no other constraints. It's not constrained to be self-consistent, for instance.

For this reason, one thing I thought was very interesting with GPT-3 is that you can prime the answer it will give you by asking the question in a specific way, because it's very responsive to the way you ask the question — since it has no understanding of the content of the question. If you ask the same question in two different ways that are basically adversarially engineered to produce certain answers, you will get two different, contradictory answers. So it's very susceptible to adversarial attacks, essentially. Potentially, yes.

In general, the problem with these generative models is that they are very good at generating plausible text, but that's just not enough. One avenue I think would be very interesting for making progress is to make it possible to write programs over the latent space these models operate on: you would rely on these self-supervised models to generate a sort of pool of knowledge and concepts and common sense, and then you would be able to write explicit reasoning programs over it. Because the current problem with GPT-3 is that it can be quite difficult to get it to do what you want. If you want to turn GPT-3 into a product, you need to put constraints on it, you need to force it to obey certain rules — you need a way to program it explicitly.

Yeah — so if you look at its ability to do program synthesis, it generates, like you said, something that's plausible. Yes. If you try to make it generate programs, it will perform well for any program that it has seen in its training data. But because program space is not interpolative, it's not going to be able to generalize to problems it hasn't seen before.

Now — and this is sort of an absurd but, I think, useful intuition builder — GPT-3 has 175 billion parameters; a human brain has about a thousand times that, or more, in terms of number of synapses. Do you think — obviously
very different kinds of things, but there is some degree of similarity — what do you think GPT will look like when it has a hundred trillion parameters? Do you think our conversation might be different in nature? Because you've criticized GPT-3 very effectively just now.

No, I don't think so. To begin with, the bottleneck in scaling up GPT models — generative pre-trained transformer models — is not going to be the size of the model or how long it takes to train it. The bottleneck is going to be the training data, because OpenAI is already training GPT-3 on a crawl of basically the entire web, and that's a lot of data. You could imagine training on more data than that — Google could train on more data than that — but it would still be only incrementally more data. I don't recall exactly how much more data GPT-3 was trained on compared to GPT-2, but it's probably at least 100x, maybe even 1000x — I don't have the exact number. You're not going to be able to train the model on 100x more data than what you're already doing.

That's brilliant — it's easier to think of compute as the bottleneck and then argue that we can remove that bottleneck. We can remove the compute bottleneck — I don't think it's a big problem. If you look at the pace at which we've improved the efficiency of deep learning models in the past few years, I'm not worried about training-time bottlenecks or model-size bottlenecks. The bottleneck in the case of these generative transformer models is absolutely the training data.

What about the quality of the data? Yeah, the quality of the data is an interesting point. The thing is, if you're going to want to use these models in real products, then you want to feed them data that's as high-quality, as factual, and, I would say, as unbiased as possible — although, you know, there's not really such a thing as unbiased data in the first place
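The data-bottleneck argument here — that only incrementally more data than a full web crawl is available, and that more data helps less and less — can be sketched with a toy power-law curve. This is a minimal illustration under an assumed functional form: `toy_loss`, its coefficient, and its exponent are invented for the sketch, not taken from any published scaling-law study.

```python
# Toy sketch of diminishing returns from scaling training data alone.
# Test loss for large models is often described empirically as a power
# law in dataset size D: loss(D) = a * D**(-alpha). The constants here
# are purely illustrative assumptions.

def toy_loss(num_examples: int, a: float = 10.0, alpha: float = 0.1) -> float:
    """Hypothetical test loss as a function of dataset size."""
    return a * num_examples ** (-alpha)

if __name__ == "__main__":
    sizes = [10**6, 10**7, 10**8, 10**9]
    losses = [toy_loss(n) for n in sizes]
    # Absolute improvement bought by each successive 10x of data:
    gains = [losses[i] - losses[i + 1] for i in range(len(losses) - 1)]
    for n, loss in zip(sizes, losses):
        print(f"{n:>13,} examples -> toy loss {loss:.3f}")
    # Each extra 10x of data helps less than the previous 10x did.
    assert all(g1 > g2 for g1, g2 in zip(gains, gains[1:]))
```

Under any curve of this shape, each further 10x of data buys a smaller absolute loss reduction than the last — which is why removing the compute bottleneck alone would not remove the diminishing returns.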
but you probably don't want to train it on Reddit, for instance — that sounds like a bad plan. From my personal experience working with large-scale deep learning models: at some point I was working on a model at Google that was trained on about 150 million labeled images. It was an image classification model — that's a lot of images, probably most publicly available images on the web at the time. And it was a very noisy dataset, because the labels were not originally annotated by hand by humans; they were automatically derived from things like tags on social media, or just keywords on the same page the image was found on, and so on. It was very noisy, and it turned out you could easily get a better model — not just by training on more of the noisy data, which gives you an incrementally better model but very quickly hits diminishing returns. Rather, if you train on a smaller dataset with higher-quality annotations — annotations actually made by humans — you get a better model, and it also takes less time to train.

That's fascinating. Is self-supervised learning a way to do better with the automated labeling? Yeah, you can enrich or refine your labels in an automated way — that's correct.

Do you have hope for — I don't know if you're familiar with it — the idea of the Semantic Web? For people who are not familiar: the Semantic Web is the idea of attaching semantic meaning to the words, sentences, and paragraphs on the internet, so as to convert the information on the internet, or some fraction of it, into something that's interpretable by machines. That was the dream of the Semantic Web papers in the '90s: the internet is full of rich, exciting information — even just looking at Wikipedia — and we should be able to use it as data for machines. But that information is not really in a format that's available to machines.

No, I don't think the Semantic Web will ever work, simply because it would be a lot of work to provide that information in structured form, and there is not really any incentive for anyone to do that work. I think the way forward to make the knowledge on the web available to machines is actually something closer to unsupervised deep learning. GPT-3 is actually a bigger step in the direction of making the knowledge of the web available to machines than the Semantic Web was.

Yeah — though perhaps, in a human-centric sense, it feels like GPT-3 hasn't learned anything that could be used to reason. But that might be just the early days. I think that's correct. I think the forms of reasoning that you see it perform are basically just reproducing patterns it has seen in its training data. Of course, if you're trained on the entire web, then you can produce an illusion of reasoning in many different situations, but it will break down if it's presented with a novel situation.

That's the open question — between the illusion of reasoning and actual reasoning. Yes — the power to adapt to something that is genuinely new. Because the thing is, even if you could train on every bit of data ever generated in the history of humanity, such a model would be capable of anticipating many different possible situations, but it remains that the future is going to be something different. For instance, if you trained a GPT-3 model on data from the year 2002 and then used it today, it would be missing many things — many common-sense facts about the world, even vocabulary, and so on. Yeah, it's interesting that GPT-3, I think, doesn't have any information about the coronavirus. Yes — which is why, you know, you can tell that a system is
intelligent when it's capable of adapting. So intelligence is going to require some amount of continuous learning, but it's also going to require some amount of improvisation. It's not enough to assume that what you're going to be asked to do is something you've seen before, or a simple interpolation of things you've seen before. In fact, that model breaks down even for tasks that look relatively simple from a distance — like L5 self-driving, for instance. Google had a paper a couple of years back showing that something like 30 million different road situations were actually completely insufficient to train a driving model — it wasn't even L2. And that's a lot of data — a lot more data than the 20 or 30 hours of driving a human needs to learn to drive, given the knowledge they've already accumulated.

Let me ask you on that topic about Elon Musk and Tesla Autopilot — one of the only companies, I believe, really pushing for a learning-based approach. Are you skeptical that that kind of network can achieve level four? L4 is probably achievable; L5 is probably not. What's the distinction there? L5 is complete autonomy — you can just fall asleep. L5 is basically human-level. Well, you have to be careful saying human-level, because — yeah, better than most drivers. That's the clearest example: cars will most likely be much safer than humans in many situations where humans fail, and vice versa.

I'll tell you — the thing is, the amount of training data you would need to anticipate pretty much every possible situation you'll encounter in the real world is such that it's not entirely unrealistic to think that at some point in the future we'll develop a system trained on enough data, especially provided that we can simulate a lot of that data — we don't necessarily need actual cars on the road for everything. But it's a massive effort, and it turns out you can create a system that's much more adaptive, that can generalize much better, if you just add explicit models of the surroundings of the car, and if you use deep learning for what it's good at, which is to provide perceptual information. In general, deep learning is a way to encode perception and a way to encode intuition, but it is not a good medium for any sort of explicit reasoning. And in AI systems today, strong generalization tends to come from explicit models — from abstractions in the human mind that are encoded in program form by a human engineer. These are the abstractions you can actually generalize with, not the sort of weak abstractions learned by a neural network.

And the question is how much reasoning, how much strong abstraction, is required to solve particular tasks like driving. Or for human life, for existence: how much strong abstraction does existence require? But more specifically on driving — that seems to be a coupled question about intelligence: how do you build an intelligent system, and, the coupled problem, how hard is the problem — how much intelligence does the problem actually require? And we get to cheat, right, because we get to look at the problem. It's not like we close our eyes, completely new to driving. We get to do what we do as human beings: for the majority of our life, before we ever learn, quote-unquote, to drive, we get to watch other people drive, we get to be in cars, we get to see movies about cars — we get to observe all this stuff. That's similar to what neural networks are doing — getting a lot of data. And the question is: how many leaps of reasoning genius are required to be able to actually, effectively drive?

In the example of driving — I mean, sure, you've seen a
lot of cars in your life before you learn to drive. But let's say you've learned to drive in Silicon Valley, and now you rent a car in Tokyo. Well, now everyone is driving on the other side of the road, the signs are different, the roads are more narrow, and so on. It's a very different environment, and a smart human — even an average human — should be able to just zero-shot it, to be operational in this very different environment right away, despite having had no contact with the novel complexity contained in it. And that novel complexity is not just an interpolation over the situations you've encountered previously, like learning to drive in the US.

The reason I ask is that one of the most interesting active tests of intelligence we have today, in terms of having an impact on the world, is driving. When do you think we'll pass that test of intelligence?

I don't think driving is that much of a test of intelligence, because, again, there is no task for which skill at that task demonstrates intelligence — unless it's a kind of meta-task that involves acquiring new skills. I think you can actually solve driving without any real amount of intelligence. For instance, if you really did have infinite training data, you could literally train an end-to-end deep learning model that drives — provided infinite training data. The only problem with that whole idea is collecting a dataset that's sufficiently comprehensive, that covers the very long tail of possible situations you might encounter — it's really just a scale problem. So I think there's nothing fundamentally wrong with this plan, with this idea; it's just that it strikes me as a fairly inefficient thing to do, because you run into this scaling issue with diminishing returns. Whereas if, instead, you took a more manual engineering approach, where you use deep learning modules in combination with engineering an explicit model of the surroundings of the car, and you bridge the two in a clever way, your model will actually start generalizing much earlier and more effectively than the end-to-end deep learning model. So why would you not go with the more manual, engineering-oriented approach?

Even if you created that system — either the end-to-end deep learning system with infinite data, or the slightly more human-engineered system — I don't think achieving L5 would demonstrate general intelligence, or intelligence of any generality at all. Again, the only possible test of generality in AI would be a test that looks at skill acquisition over unknown tasks. For instance, you could take your L5 driver and ask it to learn to pilot a commercial airplane, and then you would look at how much human involvement and how much training data is required for the system to learn to pilot an airplane. That gives you a measure of how intelligent that system is.

Well, that's a big leap — I get you, but I'm more interested in driving as a problem. To me, driving is a black box that can generate novel situations at some rate — what people call edge cases. So it does have newness that we keep being confronted with — let's say once a month. Once a month is a very long time. Yes, long-term. That doesn't mean you cannot solve it just by training a statistical model on a lot of data — a huge amount of data. It's really a matter of scale. But I guess what I'm saying is: if you have a vehicle that achieves level five, it is going to be able to deal with new situations. Or — I mean, the data is so large that the rate of new situations is very low. Yes, but that's not intelligence. If we go back to your definition of intelligence, it's the efficiency with which you can adapt to new situations — truly new situations, not situations you've seen before, not situations that could be anticipated by your creators, by the creators of
the system, but truly new situations. The efficiency with which you acquire new skills: if, in order to pick up a new skill, you require a very extensive training dataset of most possible situations that can occur in the practice of that skill, then the system is not intelligent — it is mostly just a lookup table. Likewise, if, in order to acquire a skill, you need a human engineer to write down a bunch of rules covering most or every possible situation, likewise the system is not intelligent. The system is merely the output artifact of a process that happens in the minds of the engineers creating it. It is encoding an abstraction produced by a human mind, and intelligence would actually be the process of autonomously producing this abstraction. If you take an abstraction and encode it on a piece of paper, or in a computer program, the abstraction itself is not intelligent; what's intelligent is the agent that's capable of producing these abstractions.

It feels like there's a little bit of a gray area, because you're basically saying that deep learning forms abstractions too, but those abstractions do not seem to be effective for generalizing far outside of the things it's already seen — yet they generalize a little bit. Yeah, absolutely — deep learning does generalize a little bit. Generalization is not binary; it's more like a spectrum. And there's a certain point — it's a gray area — where there's an impressive degree of generalization that happens. I guess exactly what you were saying is: intelligence is how efficiently you're able to generalize far outside the distribution of things you've seen already. Yes. So it's both the distance — how radically new something is — and how efficiently you handle it. Yes, absolutely.

You can think of intelligence as a measure of an information conversion ratio. Imagine a space of possible situations, and you've covered some of them — so you have some amount of information about your space of possible situations. That information is provided by the situations you already know, and also by the prior knowledge the system brings to the table, the prior knowledge embedded in the system. So the system starts with some information about the problem, about the task. Intelligence is about going from that information to a program — what you would call a skill program, a behavioral program — that can cover a large area of possible-situation space. And essentially, the ratio between that area and the amount of information you start with is intelligence. So a very smart agent can make efficient use of very little information about a new problem, and very little prior knowledge as well, to cover a very large area of potential situations in that problem, without knowing what those future new situations are going to be.

One of the other big things you talk about in the paper — we've touched on it a little already, but let's talk about it some more — is actual tests of intelligence. If we look at human and machine intelligence: do you think tests of intelligence should be different for humans and machines? Or are these fundamentally the same kinds of intelligence we're after, and therefore the tests should be similar?

If your goal is to create AIs that are more human-like, then it would obviously be super valuable to have a test that's universal, that applies to both AIs and humans, so that you could establish a comparison between the two and tell exactly how intelligent, in terms of human intelligence, a given system is. That said, the constraints that apply to artificial intelligence and to human intelligence are very different, and your test should account for this difference — because if you look
at artificial systems, it's always possible for an experimenter to buy arbitrary levels of skill at arbitrary tasks — either by injecting hard-coded prior knowledge into the system, via rules and so on that come from the minds of the programmers, or by buying higher levels of skill just by training on more data. For instance, you could generate an infinity of different Go games and train a Go-playing system that way, but you could not directly compare it to human Go-playing skill, because a human who plays Go had to develop that skill in a very constrained environment: they had a limited amount of time, a limited amount of energy, and of course they started from a different set of priors — from innate human priors. So I think if you want to compare the intelligence of two systems, like the intelligence of an AI and the intelligence of a human, you have to control for priors — you have to start from the same set of knowledge priors about the task — and you have to control for experience, that is to say, for training data.

What are priors? A prior is whatever information you have about a given task before you start learning about the task. And how is that different from experience? Well, experience is acquired. For instance, if you're learning to play Go, your experience with Go is all the Go games you've played, or seen, or simulated in your mind, let's say. Your priors are things like: Go is a game on a 2D grid, and we have lots of hard-coded priors about the organization of 2D space — the rules, the dynamics, the physics of the game in this 2D space. Yes — and the idea of what winning is. Yes, exactly. And all the other board games that share some similarities with Go — if you've played those board games, then, with respect to the game of Go, that would be part of your priors about the game.

Well, it's interesting to think about how many priors are actually brought to the table in the game of Go. When you look at self-play, reinforcement-learning-based mechanisms that do the learning, it seems like the number of priors is pretty low. Yes — but there's a 2D spatial prior in the convnet. Right — and you're saying we should be clear about making those priors explicit. Yes. In particular, I think if your goal is to measure a human-like form of intelligence, then you should clearly establish that you want the AI you're testing to start from the same set of priors that humans start with.

So — to me personally, but I think to a lot of people, the human side of things is very interesting. Testing intelligence for humans: what do you think is a good test of human intelligence? Well, that's the question that psychometrics is interested in — there's an entire subfield of psychology that deals with this question. What's psychometrics? Psychometrics is the subfield of psychology that tries to measure and quantify aspects of the human mind — in particular cognitive abilities, intelligence, and personality traits as well.

This might be a weird question, but what are the first principles of psychometrics that it operates on — what are the priors it brings to the table? It's a field with a fairly long history. You know, psychology sometimes gets a bad reputation for not having very reproducible results, but psychometrics actually has some fairly solid, reproducible results. The ideal goals of the field are that tests should be reliable — which is a notion tied to reproducibility; they should be valid — meaning they should actually measure what you say they measure. For instance, if you're saying that you're measuring intelligence, then your test results should be correlated with things you expect to be correlated with intelligence, like success in
school or success in the workplace and so on. It should be standardized, meaning you can administer your test to many different people under the same conditions. And it should be free from bias — meaning, for instance, that if your test involves the English language, you have to be aware that this creates a bias against people who have English as a second language, or who can't speak English at all. Of course, these principles for creating psychometric tests are quite old. I don't think every psychometric test is really reliable, valid, or free from bias, but at least the field is aware of these weaknesses and is trying to address them.

So it's kind of interesting. Ultimately you're only able to measure, like you said previously, skill — but you're trying to take a bunch of measures of different skills that correlate strongly with some general concept of cognitive ability. So what's the g factor?

Right, so there are many different kinds of tests of intelligence, and each of them is interested in different aspects of intelligence: some deal with language, some with spatial vision, maybe mental rotations, numbers, and so on. When you run these very different tests at scale, what you start seeing is that there are clusters of correlations among test results. For instance, if you look at homework at school, you'll see that people who do well at math are also likely, statistically, to do well in physics. What's more, people who do well at math and physics are also statistically likely to do well at things that sound completely unrelated, like writing an English essay. When you see clusters of correlations, in statistical terms you explain them with a latent variable, and the latent variable that would explain the relationship between being good at math and being good at physics would be cognitive ability. The g factor is the latent variable that explains the fact that every test of intelligence you can come up with ends up having correlated results. There's a single variable that explains these correlations — that's the g factor. So it's a statistical construct; it's not really something you can directly measure in a person. But it's there — it's there at scale.

And that's one thing I want to mention about psychometrics. When you talk about measuring intelligence in humans, some people get a little worried. They'll say that sounds dangerous, maybe it's potentially discriminatory, and so on — and they're not wrong. The thing is, personally I'm not interested in psychometrics as a way to characterize one individual person. If I get your psychometric personality assessment or your IQ, I don't think that actually tells me much about you as a person. I think psychometrics is most useful as a statistical tool — most useful at scale, when you start getting test results for a large number of people and cross-correlating them, because that gives you information about the structure of the human mind, particularly the structure of human cognitive abilities. At scale, psychometrics paints a certain picture of the human mind, and that's what's interesting and what's relevant to AI: the structure of human cognitive abilities.

Yeah, it gives you an insight into it. I remember when I learned about the g factor, it seemed like it would be impossible for it to even be real, even as a statistical variable. It felt kind of like astrology — like wishful thinking among psychologists. But the more I learned, the more I realized there's something to it. I'm not sure what to make of the fact, about human beings, that the g factor is real.
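Chollet's description of g as a latent variable — a statistical construct inferred from clusters of correlations rather than measured directly — can be sketched numerically. Below is a minimal illustration with entirely synthetic scores; extracting the first principal component of the correlation matrix serves here as a crude stand-in for the factor-analysis methods psychometricians actually use:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic scores for 500 people on four tests (math, physics,
# verbal, spatial): one shared latent factor "g" plus test-specific
# noise, so every test correlates with every other.
n_people = 500
g = rng.normal(size=n_people)
loadings = np.array([0.8, 0.7, 0.6, 0.75])
scores = np.outer(g, loadings) + rng.normal(scale=0.5, size=(n_people, 4))

# Cluster of positive correlations (the "positive manifold").
corr = np.corrcoef(scores, rowvar=False)

# First principal component of the correlation matrix: the single
# latent variable that best summarizes the shared variance.
eigvals, _ = np.linalg.eigh(corr)
explained = eigvals[-1] / eigvals.sum()
```

With scores generated this way, every pair of tests correlates positively, and a single component accounts for most of the shared variance — the statistical signature that the g factor summarizes.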
That there's a commonality across the whole human species — that there's destined to be a strong correlation between cognitive abilities — is kind of fascinating.

Yeah. Actually, human cognitive abilities have a structure. The most mainstream theory of the structure of cognitive abilities is called CHC theory — Cattell-Horn-Carroll, named after the psychologists who contributed key pieces of it. It describes cognitive abilities as a hierarchy with three levels. At the top you have the g factor. Then you have broad cognitive abilities — for instance fluid intelligence — each of which encompasses a broad set of related kinds of tasks. And at the last level you have narrow cognitive abilities, which are closer to task-specific skill. There are actually different theories of the structure of cognitive abilities that emerge from different statistical analyses of IQ test results, but they all describe a hierarchy with a kind of g factor at the top. And you're right that the g factor is not quite real in the sense that it's not something you can observe and measure like your height, for instance — but it's real in the sense that you see it in a statistical analysis of the data.

One thing I want to mention is that the existence of a g factor does not mean that human intelligence is general in a strong sense. It does not mean human intelligence can be applied to any problem at all, and that someone with a high IQ is going to be able to solve any problem — that's not quite what it means. One popular analogy for understanding this is the sports analogy. Consider the concept of physical fitness. It's a concept very similar to intelligence: it's a useful concept, something you can intuitively understand. Some people are fit, maybe like you; some people are not as fit, maybe like me. But none of us can fly.

Absolutely — it's so constrained.

Even if you're very fit, that doesn't mean you can do anything at all in any environment. You obviously cannot fly, you cannot survive at the bottom of the ocean, and so on. And if you were a scientist who wanted to precisely define and measure physical fitness in humans, you would come up with a battery of tests — running a hundred meters, playing soccer, playing table tennis, swimming, and so on — and if you ran these tests over many different people, you would start seeing correlations in test results. For instance, people who are good at soccer are also good at sprinting. You would explain these correlations with physical abilities that are strictly analogous to cognitive abilities. Then you would also start observing correlations with biological characteristics — maybe lung volume is correlated with being a fast runner — in the same way that there are neurophysiological correlates of cognitive abilities. And at the top of this hierarchy of physical abilities, you would have a g factor — a physical g factor — which would map to physical fitness.

As you just said, that doesn't mean people with high physical fitness can fly. It doesn't mean human morphology and human physiology are universal. They're actually super specialized: we can only do the things we evolved to do. You could not exist on Venus, or Mars, in the void of space, or at the bottom of the ocean. That said, one thing that's really striking and remarkable is that our morphology generalizes far beyond the environments we evolved for. In a way, you could say we evolved to run after prey in the savanna — that's very much where human morphology comes from — and yet we can do a lot of things that are completely unrelated to that. We can climb mountains, we can swim across lakes, we can play table tennis. Table tennis is very different from what we evolved to do. So our bodies, our sensorimotor affordances, have a degree of generality that is absolutely remarkable. And I think cognition is very similar. Our cognitive abilities have a degree of generality that goes far beyond what the mind was initially supposed to do — which is why we can play music, write novels, go to Mars, and do all kinds of crazy things. But it's not universal. In the same way that human morphology is not appropriate for most of the universe by volume, the human mind is not naturally appropriate for most of potential problem space by volume. We have very strong cognitive biases, which means there are certain types of problems we handle very well, and certain types of problems we are completely unadapted for. That's really how to interpret the g factor: it's not a sign of strong generality; it's just the broadest cognitive ability. Our abilities — whether sensorimotor or cognitive — remain very specialized within the human condition.

Right — within the constraints of human cognition, they're general.

Yes, absolutely. But the constraints, as you're saying, are very limiting. So — our cognition and our body evolved in very specific environments, and because our environment was so variable, fast-changing, and unpredictable, part of the constraints that drove our evolution is generality itself. In a way, we evolved to be able to improvise in all kinds of physical or cognitive environments. For this reason, it turns out that the minds and bodies we ended up with can be applied to a much, much broader scope than what they evolved for. That's truly remarkable, and it's a degree of generalization far beyond anything you can see in artificial systems today. It does not mean, though, that human intelligence is anywhere near universal.

Yeah, it's not general. You know, it's kind of an exciting topic, even for people outside of artificial intelligence — IQ tests, Mensa, the different degrees of difficulty of questions. We talked about this offline a little bit too. What makes a question on an IQ test more difficult or less difficult, do you think?

The thing to keep in mind is that there's no such thing as a question that's intrinsically difficult. It has to be difficult with respect to the things you already know and the things you can already do. An IQ test question would typically be structured as a set of demonstrations — input and output pairs — and then you're given a test input, a prompt, and you need to recognize or produce the corresponding output. In that narrow context, you could say a difficult question is one where the input prompt is very surprising and unexpected given the training examples, even in the nature of the patterns you observe in it. For instance, say you have a rotation problem: you must rotate the shape by 90 degrees. If I give you two examples and then give you a prompt which is actually one of the two training examples, there is zero generalization difficulty — it's a trivial task. You just recognize that it's one of the training examples and produce the same answer. If it's a more complex shape, there's a little more generalization, but you're still doing the same thing at test time as was demonstrated at training time. A difficult task starts to require some amount of test-time adaptation, some amount of improvisation.
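The rotation example above can be made concrete. A small sketch (hypothetical grids, not an actual test item): a pure-retrieval "solver" handles a prompt identical to a training example, but only applying the inferred rule handles a novel one:

```python
# Toy version of the rotation task described above: the rule is
# "rotate the grid 90 degrees clockwise", shown via two demonstrations.
def rotate90(grid):
    """Rotate a grid (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

train = [
    ([[1, 0], [0, 0]], rotate90([[1, 0], [0, 0]])),
    ([[0, 2], [0, 0]], rotate90([[0, 2], [0, 0]])),
]

# A pure-retrieval solver: memorize the demonstrations verbatim.
memory = {str(x): y for x, y in train}

seen_prompt = [[1, 0], [0, 0]]                    # identical to a training example
novel_prompt = [[3, 0, 0], [0, 0, 0], [0, 0, 4]]  # never demonstrated

# Retrieval answers the seen prompt but has nothing for the novel one;
# only actually applying the inferred rule generalizes.
retrieved = memory.get(str(seen_prompt))
generalized = rotate90(novel_prompt)
```

Retrieval succeeds with zero generalization difficulty on the seen prompt; the novel prompt is absent from memory and requires executing the rule itself.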
Right. So consider — I don't know — you're teaching a class on quantum physics or something. If you wanted to test the understanding the students have of the material, you would come up with an exam that's very different from anything they've seen on the internet while they were cramming. On the other hand, if you wanted to make it easy, you would just give them something very similar to the mock exams they've taken — a simple interpolation of questions they've already seen. That would be an easy exam: very similar to what you've been trained on. A difficult exam is one that really probes your understanding, because it forces you to improvise — it forces you to do things different from what you were exposed to before. That said, it doesn't mean an exam that requires improvisation is intrinsically hard. Maybe you're a quantum physics expert, so when you take the exam, this is stuff that, despite being new to the students, is not new to you. A question can only be difficult with respect to what the test-taker already knows, and with respect to the information the test-taker has about the task. That's what I mean by controlling for priors — the information you bring to the table — and for experience, which is the training data. In the case of the quantum physics exam, that would be the course material itself and all the mock exams the students might have taken online.

Yeah, it's interesting, because I sent you an email asking this curious question: what's a really hard IQ test question? And I've been talking to people who design IQ tests — there are a few folks on the internet; it's a thing people are really curious about. First of all, most of the IQ tests they design — they religiously protect the correct answers. You can't find the correct answers anywhere. In fact, a question is ruined once you know even the approach you're supposed to take.

So the approach is implicit in the training examples — once you've seen the training examples, it's over. Which is why in ARC, for instance, there is a test set that is private and no one has seen it.

No — for really tough IQ questions, the approach isn't obvious, partly because of ambiguity; you have to look for it. Take some number sequences: it's not completely clear. When you look at a number sequence — I don't know, say the Fibonacci sequence — if you look at the first few numbers, that sequence could be completed in a lot of different ways. And if you think deeply, some completions are more correct than others — there's a kind of intuitive simplicity and elegance to the correct solution.

Yes. I am personally not a fan of ambiguity in test questions, actually. I think you can have difficulty without requiring ambiguity, simply by making the test require a lot of extrapolation over the training examples.

But the beautiful question is difficult, yet gives away everything when you give the training examples.

Basically, yes. Meaning that the tests I'm interested in creating are not necessarily difficult for humans, because human intelligence is the benchmark. They're supposed to be difficult for machines in ways that are easy for humans. I think an ideal test of human and machine intelligence is a test that is actionable, that highlights the need for progress, and that highlights the direction in which you should be making progress.

I think we'll talk about the ARC challenge and the test you've constructed. You have these elegant examples that highlight: this is really easy for us humans, but it's really hard for machines.
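The point above about number sequences being completable in many ways is easy to demonstrate: the prefix 1, 1, 2, 3 is consistent both with the Fibonacci rule and with a cubic polynomial through the same four points, and the two rules disagree on the very next term. A quick sketch:

```python
import numpy as np

prefix = [1, 1, 2, 3]               # consistent with Fibonacci...
fib_next = prefix[-1] + prefix[-2]  # ...whose rule predicts 5

# ...but a cubic polynomial also passes exactly through the same
# four points (positions 0..3) and predicts a different continuation.
coeffs = np.polyfit(range(4), prefix, deg=3)
poly_next = round(np.polyval(coeffs, 4))
```

Both rules fit the observed data perfectly, yet they diverge immediately afterward — which is why the "correct" answer to such a question rests on an intuition of simplicity rather than on the data alone.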
But designing an IQ test for IQs higher than 160, say — you have to take that and put it on steroids. You have to think about what is hard for humans, and that's a fascinating exercise in itself. It's an interesting question, what it takes to create a really hard question for humans, because you have to go through the same process you mentioned: find something that, given the experience you're likely to have accumulated throughout your whole life — even if you've prepared for IQ tests, which is a big challenge — will still be novel for you.

Yeah, novelty is a requirement. You should not be able to practice for the questions you're going to be tested on. That's important, because otherwise what you're doing is not exhibiting intelligence — what you're doing is just retrieving what you've been exposed to before. It's the same thing as a deep learning model: if you train a deep learning model on all the possible answers, it will ace your test — in the same way that a student without understanding can still ace a test by cramming for it. They memorize a hundred different mock exams and hope the actual exam will be a simple interpolation of those mock exams. That student could just be a deep learning model at that point. You can do that without any understanding of the material, and in fact many students pass exams in exactly this way. If you want to avoid that, you need an exam that's unlike anything they've seen — one that really probes their understanding.

So how do we design an IQ test for machines — an intelligence test for machines?

In the paper I outline a number of requirements you would expect of such a test. In particular, we should start by acknowledging the priors we expect to be required in order to perform the test — we should be explicit about the priors. And if the goal is to compare machine intelligence and human intelligence, then we should assume human cognitive priors. Secondly, we should make sure we are testing for skill-acquisition ability — skill-acquisition efficiency, in particular — and not for skill itself, meaning that every task featured in the test should be novel and should not be something you can anticipate. For instance, it should not be possible to brute-force the space of possible questions — to pre-generate every possible question and answer. The tasks should not be anticipatable, not just by the system itself, but by the creators of the system.

You know what's fascinating — one of my favorite aspects of the paper, and of the work you've done with the ARC challenge, is the process of making priors explicit. Even that act alone is really powerful. It's a really powerful question to ask of us humans: what are the priors we bring to the table? The next step is, once you have those priors, how do you use them to solve a novel task — but just making the priors explicit is a really difficult and really powerful step. That's a visually beautiful and conceptually, philosophically beautiful part of the work you did, and I guess continue to do, with the paper and the ARC challenge. Can you talk about some of the priors we're talking about here?

Yes. A researcher who has done a lot of work on exactly which knowledge priors are innate to humans is Elizabeth Spelke, from Harvard. She developed the core knowledge theory, which outlines four different core knowledge systems — systems of knowledge we are either born with or hardwired to acquire very early in our development. And there's no strong distinction between the two: if you are primed to acquire a certain type of knowledge in just a few weeks, you might as well be born with it — it's just part of who you are.

So there are four core knowledge systems. The first is the notion of objectness, and basic physics. You recognize that something that moves coherently, for instance, is an object. We intuitively, naturally, innately divide the world into objects, based on this notion of physical coherence. And in terms of elementary physics, there's the fact that objects can bump against each other and occlude each other. These are things we are essentially born with, or at least acquire extremely early because we're hardwired to acquire them.

So a bunch of points, pixels, that move together are part of the same object?

Yes. I mean — I don't smoke weed, but if I did, that's something I could sit and think about all night. I remember reading about that in your paper — just objectness. I wasn't self-aware, I guess, of that particular prior. It's such a fascinating prior, and that's the most basic one.

Yes, just object identity. It's very basic, I suppose, but it's fundamental to human cognition.

Yeah. And the second prior that's also fundamental is agentness — which is not a real word. Agentness: the fact that some of these objects you segment your environment into are agents. What's an agent? Basically, an object that has goals — that is capable of pursuing goals. For instance, if you see two dots moving in a roughly synchronized fashion, you will intuitively infer that one of the dots is pursuing the other — that one dot is an agent whose goal is to avoid the other dot, and the other dot is an agent whose goal is to catch the first one. Spelke has shown that babies as young as three months identify agentness and goal-directedness in their environment.

Another prior is basic geometry and topology — the notion of distance, the ability to navigate in your environment, and so on. This is fundamentally hardwired into our brain; it's in fact backed by very specific neural mechanisms, like grid cells and place cells. So it's something literally hard-coded at the neural level, in the hippocampus.

And the last prior would be the notion of numbers. Numbers are not a cultural construct: we are intuitively, innately able to do some basic counting and to compare quantities. That doesn't mean we can do arbitrary arithmetic — it's counting like one, two, three, and then maybe "more than three." You can also compare quantities: if I show you three dots and five dots, you can tell that the side with five dots has more. So this is an innate prior.

That said, the list may not be exhaustive. Spelke is still pursuing the potential existence of new knowledge systems — for instance, knowledge systems that would deal with social relationships.

Yeah — which is much less relevant to something like ARC, or IQ testing.

Right. So there could be stuff that's missing — like you said, rotation and symmetry are really interesting. Speaking of rotation, it's very likely that there is, in the brain, a hard-coded system capable of performing rotations. One famous experiment — I don't remember who ran it exactly, but it was in the 70s — found that if you give people two different shapes, where one is a rotated version of the other, and ask them "is this shape a rotated version of the first or not?", the time it takes people to answer is linearly proportional to the angle of rotation. It's almost as if you have, somewhere in your brain, a turntable with a fixed rotation speed: to know whether two objects are rotated versions of each other, you put one on the turntable, let it turn for a while, and stop when you have a match. That's really interesting.

So what's the ARC challenge?

In the paper I outline all these principles that a good test of machine intelligence — and human intelligence — should follow, and the ARC challenge is one attempt to embody as many of those principles as possible. I don't think it's anywhere near a perfect attempt — it does not actually follow every principle — but it is what I was able to do given the constraints. The format of ARC is very similar to classic IQ tests, in particular Raven's Progressive Matrices.

Raven's Progressive Matrices — if you've taken an IQ test in the past, you've probably at least seen one, even if you don't know what it's called.

Right. So you have a set of tasks — that's what they're called — and for each task you have training data: a set of input and output pairs. An input or output pair is a grid of colors, and the size of the grid is variable. You're given an input, and you must transform it into the proper output. So you're shown a few demonstrations of a task in the form of existing input/output pairs, then you're given a new input, and you must produce the correct output. The assumption in ARC is that every task should require only core knowledge priors and no outside knowledge. So, for instance, no language — no English, nothing like that — and no concepts taken from human experience, like trees, dogs, cats, and so on. Only reasoning tasks built on top of the core knowledge priors.
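For reference, the publicly released ARC tasks are stored as JSON objects with a "train" list of demonstration pairs and a "test" list, each grid being rows of integers 0-9 denoting colors. Below is a toy task in that shape, with a deliberately trivial rule (recolor 1 to 2) and a naive solver that only handles such plain recolorings — real ARC tasks are far harder than this:

```python
# A toy ARC-style task: "train" holds demonstration input/output
# grid pairs, "test" holds inputs whose outputs must be produced.
task = {
    "train": [
        {"input": [[0, 1], [1, 0]], "output": [[0, 2], [2, 0]]},
        {"input": [[1, 1], [0, 0]], "output": [[2, 2], [0, 0]]},
    ],
    "test": [{"input": [[1, 0], [0, 1]]}],
}

def infer_color_map(pairs):
    """Infer a cell-wise color mapping from the demonstrations.
    Only works for tasks that really are plain recolorings."""
    mapping = {}
    for pair in pairs:
        for row_in, row_out in zip(pair["input"], pair["output"]):
            for a, b in zip(row_in, row_out):
                mapping[a] = b
    return mapping

mapping = infer_color_map(task["train"])
prediction = [[mapping[c] for c in row] for row in task["test"][0]["input"]]
```

This shows the data shape and the demonstration/test split; a solver like this one fails the moment a task requires anything beyond a fixed recoloring, which is exactly the point of requiring novel tasks.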
And some of the tasks are explicitly trying to probe specific forms of abstraction. Part of the reason I wanted to create ARC is that I'm a big believer in this: when you're faced with a problem as murky as understanding how to autonomously generate abstraction in a machine, you have to co-evolve the solution and the problem. So part of the reason I designed ARC was to clarify my own ideas about the nature of abstraction. Some of the tasks are actually designed to probe bits of that theory, and they turned out to be very easy for humans to perform, including young kids — but near impossible for machines.

So what did you learn about the nature of abstraction from designing that? Can you clarify what you mean? One of the things you wanted was to try to understand this idea of abstraction.

Yes — clarifying my own ideas about abstraction by forcing myself to produce tasks that would require the ability to form that kind of abstraction in order to solve them.

Got it. And by the way — people should check it out; I'll probably overlay it if you're watching the video — the grid input/output with the different colors on the grid. It's a very simple world, but it's kind of beautiful.

It's very similar to classic IQ tests — it's not very original in that sense. The main difference from IQ tests is that we make the priors explicit, which is not usually the case: we make it explicit that everything should be built only out of core knowledge priors. I also think it's generally more diverse than IQ tests, and it perhaps requires a bit more manual work to produce solutions, because you have to click around on a grid for a while — sometimes the grids can be as large as 30 by 30 cells.

So how did you come up with the questions, if you can reveal that? What was the process — was it mostly you who came up with them? How difficult is it to come up with a question? Is it scalable to a much larger number? With IQ tests for humans you might not need it to be scalable, but with machines you could argue it needs to be.

So there are a thousand tasks, including the test set and the private test set. I think creating them is fairly difficult, in the sense that a big requirement is that every task should be novel, unique, and unpredictable. You don't want to create your own little world that is simple enough that a human could reverse-engineer it and write down an algorithm that generates every possible ARC task and its solution — that would completely invalidate the test.

So you're constantly having to come up with new stuff.

Yeah, you need a source of novelty — of unanticipated novelty. And one thing I found is that, as a human, you are not a very good source of unanticipated novelty, so you have to pace the creation of these tasks quite a bit. There are only so many unique tasks you can come up with in a given day.

So that means coming up with truly original new ideas. Did psychedelics help you at all? I'm just — I mean, it's fascinating to think about. Would you be out walking, constantly thinking of something totally new?

Yes — it's hard. I'm not saying I've done anywhere near a perfect job at it. There is some amount of redundancy, and there are many imperfections in ARC. You should consider ARC a work in progress — the ARC tasks today are not the definitive state of the test. I want to keep refining it in the future. I also think it should be possible to open up the creation of tasks to a broad audience — to do crowdsourcing. That would involve several levels of filtering, obviously, but I think it's possible to apply crowdsourcing to develop a much bigger and much more diverse ARC dataset, which would also be free of some of my own personal biases.

But there would always need to be a part of ARC that's hidden — the test?

Yes, absolutely. It is imperative that the test set you use to actually benchmark algorithms is not accessible to the people developing those algorithms, because otherwise what's going to happen is that the human engineers will just solve the tasks themselves and encode their solutions in program form. But then what you're seeing is the process of intelligence happening in the mind of the human, and you're just capturing its crystallized output. That crystallized output is not the same thing as the process that generated it — it's not intelligent in itself.

That's right. By the way, the idea of crowdsourcing it is fascinating. I think the creation of questions is really exciting for people — there are a lot of really brilliant people out there who love creating this kind of stuff.

Yeah, one thing that kind of surprised me — that I wasn't expecting — is that lots of people seem to actually enjoy ARC as a kind of game. I was really seeing it as a test, a benchmark of fluid general intelligence, and lots of people, including kids, just started enjoying it as a game. I think that's encouraging.

Yeah, I'm fascinated by it. There's a world of people who create IQ questions — I think that's a cool activity, for machines and for humans. And humans are themselves fascinated by taking the questions, by measuring their own intelligence. That's just really compelling to me too. One of the cool things about ARC — you said it's kind of inspired by IQ tests, follows a similar process — but because of its nature, because of the context in which it lives, it immediately forces you to think about the nature of intelligence, as opposed to it just being a test of your own. It forces you to really think. I don't know if that's inherent in the questions, or just the fact that it lives in a context that's supposed to be a test of machine intelligence.

Absolutely. As you solve ARC tasks as a human, you will be forced to introspect on how you come up with solutions, and that forces you to reflect on the human problem-solving process — on the way your own mind generates abstract representations of the problems it's exposed to. I think it's due to the fact that the set of core knowledge priors ARC is built upon is so small: it's all a recombination of a very, very small set of assumptions.

Okay, so what's the future of ARC? You held ARC as a challenge, as part of a Kaggle competition. Do you think that's something that continues for five years, ten years — that just continues growing?

Yes, absolutely. ARC itself will keep evolving. I've talked about crowdsourcing; I think that's a good avenue. Another thing I'm starting is a collaboration with folks from the psychology department at NYU to do human testing on ARC. There are lots of interesting questions you can start asking, especially as you start correlating machine solutions to ARC tasks with the characteristics of human solutions. For instance, you can try to see if there's a relationship between the human-perceived difficulty of a task and some measure of machine-perceived difficulty.

Yeah, it's a nice big playground in which to explore that difference. It's the same thing we talked about with autonomous vehicles: the things that are difficult for humans might be very different from the things that are difficult for machines.
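The correlation just mentioned — between human-perceived and machine-perceived difficulty — would be straightforward to compute once both measurements exist. A sketch with invented per-task failure rates (all numbers are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical per-task difficulty measures: the fraction of human
# testers who failed each task, and the fraction of machine attempts
# that failed the same task. (Numbers are invented.)
human_fail = np.array([0.05, 0.10, 0.40, 0.60, 0.90])
machine_fail = np.array([0.95, 0.20, 0.99, 0.30, 1.00])

# Pearson correlation between the two difficulty rankings.
r = np.corrcoef(human_fail, machine_fail)[0, 1]
```

A weak correlation, as in this made-up data, would be evidence that what's hard for humans and what's hard for machines come apart — exactly the difference worth formalizing.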
things that yes absolutely and uh formalizing or making explicit that difference in difficulty will teach us something may teach us something fundamental about intelligence so one thing i think we did well uh with arc is that it's proving to be a very uh actionable test in the sense that uh machine performance and arcs started at very much zero initially while you know humans found actually the tasks very easy and that that alone was like a big red flashing light saying that something is going on and that we are missing something and at the same time uh machine performance did not stay at zero for very long actually within two weeks of the carol competition we started having a non-zero number and now the state of the art is around uh twenty percent of the test set uh solved um and so arc is actually a challenge where our capabilities start at zero which indicates the need for progress but it's also not an impossible change it's not accessible you can start making progress basically right away at the same time we are still very far from having solved it and that's actually a very positive outcome of the competition is that the competition has has proven that there was no obvious shortcut to solve these tasks right yeah so the test held up yeah exactly that was the primary reason to do the cargo competition is to check if some some you know clever person was going to hack the benchmark and that did not happen right like people who are solving the tasks are essentially doing it uh uh well in a way they're they're they're actually exploiting some flaws of art that we will need to address in the future especially they're essentially anticipating what sort of uh tasks may be contained in the test sets right right um which is kind of yeah that's the kind of hacking it's it's human hacking of the town yes that that said you know uh uh with the state of the art it's like uh 20 percent we're still very very far uh from even level which is closer to and so and i i do believe 
And I do believe that it will take a while until we reach human parity on ARC, and that by the time we have human parity, we will have AI systems that are probably pretty close to human level in terms of general fluid intelligence. They're not going to be necessarily human-like; you would not necessarily recognize them as being an AGI; but they would be capable of a degree of generalization that matches the generalization performed by human fluid intelligence. Sure. This is a good point, in terms of general fluid intelligence, to mention: in your paper you describe different kinds of generalization, local, broad, extreme, and there's a kind of hierarchy that you form. So when we say generalization, what are we talking about? What kinds are there? Right, so generalization is a very old idea; it's even older than machine learning. In the context of machine learning, you say a system generalizes if it can make sense of an input it has not yet seen. That's what I would call system-centric generalization: generalization with respect to what is novel for the specific system you're considering. I think a good test of intelligence should actually deal with developer-aware generalization, which is slightly stronger than system-centric generalization. Developer-aware generalization would be the ability to generalize to novelty or uncertainty that not only the system itself has not had access to, but that the developer of the system could not have had access to either. That's a fascinating meta-definition. So it's basically the edge-case thing we were talking about with autonomous vehicles: neither the developer nor the system knows about the edge cases, so the system should be able to generalize to things that nobody expected, neither the designers of the training data nor, obviously, the contents of the
training data. That's a fascinating definition. So you can see generalization, degrees of generalization, as a spectrum, and the lowest level is what machine learning is trying to do: the assumption that any new situation is going to be sampled from a static distribution of possible situations, and that you already have a representative sample of that distribution, which is your training data. So in machine learning you generalize to a new sample from a known distribution, and the ways in which your new sample will be new or different are ways that are already understood by the developers of the system; you are generalizing to known unknowns for one specific task. That's what you would call robustness: you are robust to things like noise, small variations, and so on, for one fixed, known distribution that you know through your training data. A higher degree would be flexibility in machine intelligence. Flexibility would be something like an L5 self-driving car, or maybe a robot that can pass the coffee cup test, which is the notion that you would be given a random kitchen somewhere in the country and you would have to go make a cup of coffee in that kitchen. So flexibility would be the ability to deal with unknown unknowns, dimensions of variability that could not possibly have been foreseen by the creators of the system, within one specific task. Generalizing to the long tail of situations in self-driving, for instance, would be flexibility. So you have robustness, flexibility, and finally you would have extreme generalization, which is basically flexibility, but instead of just considering one specific domain, like driving or domestic robotics, you're considering an open-ended range of possible domains. A robot would be capable of extreme generalization if, let's say, it's designed and trained for cooking, for instance, and if I buy the robot and it's able to teach itself gardening
in a couple of weeks, it would be capable of extreme generalization, for instance. So the ultimate goal is extreme generalization. Yes: creating a system that is so general that it could essentially achieve human skill parity over arbitrary tasks and arbitrary domains, with the same level of improvisation and adaptation power as humans when it encounters new situations, and it would do so over basically the same range of possible domains and tasks as humans, using essentially the same amount of training experience, of practice, as humans would require. That would be human-level extreme generalization. I don't actually think humans are anywhere near the optimal intelligence bound, if there is such a thing. So you think, for humans, or in general? In general. I think it's quite likely that there is a hard limit to how intelligent any system can be, but at the same time I don't think humans are anywhere near that limit. Yeah, last time we talked I think you had this idea that we're only as intelligent as the problems we face. Yes, we are upper-bounded by the problems; in a way, we are bounded by our environments and we are bounded by the problems we try to solve. Yeah. What do you make of Neuralink and outsourcing some of the brain power, brain-computer interfaces? Do you think we can expand, augment our intelligence? I am fairly skeptical of neural interfaces, because they are trying to fix one specific bottleneck in human-machine cognition, which is the bandwidth bottleneck, the input and output of information in the brain, and my perception of the problem is that bandwidth is not at this time a bottleneck at all, meaning that we already have sensors that enable us to take in far more information than we can actually process. Well, to push back on that a little bit, to play devil's advocate a little bit: if you look at the internet, Wikipedia, let's say Wikipedia, I would say
that humans after the advent of Wikipedia are much more intelligent. Yes, I think that's a good point, but that's also about externalizing our intelligence via information processing systems, external information processing systems, which is very different from brain-computer interfaces. Right, but the question is whether our brain has direct access to Wikipedia. Well, your brain already has direct access to Wikipedia: it's on your phone, and you have your hands and your eyes and your ears and so on to access that information, and the speed at which you can access it is bottlenecked by cognition itself; I think it's already fairly close to optimal, which is why speed reading, for instance, does not work: the faster you read, the less you understand. But maybe that's because it uses the eyes, so maybe. I don't believe so; I think the brain is very slow. The fastest things that happen in the brain operate at the level of 50 milliseconds; forming a conscious thought can potentially take entire seconds, and you can already read pretty fast. So I think the speed at which you can take information in, and even the speed at which you can act on information, can only be very incrementally improved. If you're a very fast typist, a very trained typist, the speed at which you can express your thoughts is already the speed at which you can form your thoughts. Right, so that's the idea that there are fundamental bottlenecks to the human mind. But it's possible that everything we have in the human mind is just what was needed to survive in the environment, and there's a lot more room to expand; maybe, as you said, the speed of thought. Yeah, I think augmenting human intelligence is a very valid and very powerful avenue, and that's what computers are about; in fact, that's what all of culture and civilization is about: culture is externalized
cognition, and we rely on culture to think, constantly. Yeah, and that's not just computers, not just phones and the internet; all of culture, like language for instance, is a form of externalized cognition, and books are obviously externalized cognition. That's right, and you can scale that externalized cognition far beyond the capability of the human brain. You could see civilization itself as having capabilities that are far beyond any individual brain, and we will keep scaling it, because it's not bound by individual brains; it's a different kind of system. Yeah, and that system includes non-humans: first of all it includes all the other biological systems, which are probably contributing to the overall intelligence of the organism, and then computers are part of it. Non-human biological systems are probably not contributing much, but AIs are definitely contributing to that; Google search, for instance, is a big part of it. Yeah, a huge part, a part we probably can't introspect: how the world has changed in the past 20 years is probably very difficult for us to understand. Until, of course, whoever created the simulation we're in measures the progress; there was probably a big spike in performance; they're enjoying this. So what are your thoughts on the Turing test and the Loebner Prize, which is one of the most famous attempts at a test of human intelligence, sorry, of artificial intelligence, via a natural language open dialogue that's judged by humans as far as how well the machine did? I'm not a fan of the Turing test itself or any of its variants, for two reasons. First of all, it's really copping out of trying to define and measure intelligence, because it's entirely outsourcing that to a panel of human judges, and these human judges may not themselves have any proper methodology, they
may not themselves have any proper definition of intelligence, and they may not be reliable. So the Turing test is already failing one of the core psychometrics principles, which is reliability, because you have biased human judges; it's also violating the standardization requirement and the freedom-from-bias requirement. So it's really a cop-out, because you are outsourcing everything that matters, which is precisely describing intelligence and finding a standalone test to measure it; you're outsourcing everything to people. And by the way, we should keep in mind that when Turing proposed the imitation game, he did not mean for the imitation game to be an actual goal for the field of AI, an actual test of intelligence. He was using the imitation game as a thought experiment in a philosophical discussion in his 1950 paper: he was trying to argue that it should theoretically be possible for something very much like the human mind, indistinguishable from the human mind, to be encoded in a Turing machine. At the time that was a very daring idea; it was stretching credibility. But nowadays I think it's fairly well accepted that the mind is an information processing system and that you could probably encode it into a computer. Another reason why I'm not a fan of this type of test is that the incentives it creates are not conducive to proper scientific research. If your goal is to trick, to convince, a panel of human judges that they're talking to a human, then you have an incentive to rely on tricks and prestidigitation, in the same way that, let's say, you're doing physics and you want to solve teleportation, and the test that you set out to pass is that you need to convince a panel of judges that teleportation took place, and they're just sitting there and watching what you're doing. That is something that David
Copperfield could achieve in his show in Vegas. What he's doing is very elaborate, but it's not physics; it's not making any progress in our understanding of the universe. To push back on that: the hope with these kinds of subjective evaluations is that it's easier to solve the problem generally than it is to come up with tricks that convince a large number of judges. That's the hope. In practice, it turns out that it's very easy to deceive people, in the same way that you can do magic in Vegas; you can actually very easily convince people that they're talking to a human when they're actually talking to a machine. I just disagree with that. I wouldn't say it's very easy; it's doable, but not very easy. It is very easy, because we are biased: we have theory of mind, we are constantly projecting emotions, intentions, agentness. Agentness is one of our core innate priors: we are projecting these things on everything around us; if you paint a smiley on a rock, the rock becomes happy. And because we have this extreme bias that permeates everything we see around us, it's actually pretty easy to trick people like this. I so totally disagree with that. You brilliantly put it: there's the anthropomorphization that we naturally do, the agentness. Is that a real word? No, it's not a real word, but I like it. It's exactly why it's useful. Well, it's a useful word; let's make it real. That's a huge part of it, but I still think it's really difficult to convince, if you do something like the Alexa Prize formulation, where you talk for an hour; there are formulations of the test you can create where it's very difficult. I like the Alexa Prize better, because it's more pragmatic, it's more practical; it's actually incentivizing developers to create
something that's useful as a human-machine interface, so that's slightly better than just the imitation game. I like your idea of a test which hopefully helps us in creating intelligent systems as a result: if you create a system that passes it, it'll be useful for creating further intelligent systems. Yes, at least. Yeah. Just to comment: I'm a little bit surprised how little inspiration people draw from the Turing test today. The media and the popular press might write about it every once in a while, the philosophers might talk about it, but most engineers are not really inspired by it, and I know you don't like the Turing test, but we'll have this argument another time; there's something inspiring about it. I think that as a philosophical device in a philosophical discussion, there is something very interesting about it; in practical terms, I don't think it's conducive to progress. And one of the reasons why is that I think being very human-like, being indistinguishable from a human, is actually the very last step in the creation of machine intelligence. The first AIs that will show strong generalization, that will actually implement human-like broad cognitive abilities, will not actually look anything like humans; human-likeness is the very last step in that process, and a good test is a test that points you towards the first step on the ladder, not towards the top of the ladder. Right. So to push back on that, I usually agree with you on most things, and I remember you at some point tweeting something about the Turing test being counterproductive, or something like that, and I think a lot of very smart people agree with that. I, comparatively speaking a not-very-smart person, disagree with that, because I think there's some magic to the interactivity, the interaction
with other humans. So to play devil's advocate on your statement: it's possible that in order to demonstrate the generalization abilities of a system, you have to show your ability in conversation, show your ability to adjust, to adapt to the conversation, not just as a standalone system but through the process of the interaction, game-theoretically, where you really are changing the environment by your actions. In the ARC challenge, for example, you're an observer: you can't steer the test into changing, you can't talk to the test, you can't play with it. So there's some aspect of that interactivity that becomes highly subjective, but it feels like it could be conducive to generalization. You make a great point: interactivity is a very good setting to force the system to show adaptation, to show generalization. That said, at the same time it's not something very scalable, because you rely on human judges, and it's not something reliable, because the judges may not be reliable. So you don't like human judges. Basically, yes. I do love the idea of interactivity, though. I initially wanted an ARC test that had some amount of interactivity, where your score on a task would not be one or zero, whether you can solve it or not, but would be the number of attempts you can make before you hit the right solution, which means that now you can start applying the scientific method as you solve ARC tasks: you can start formulating hypotheses and probing the system to see whether the observations match your hypotheses or not. It would be amazing if you could also, even higher-level than that, measure the quality of your attempts, which of course is impossible, but again, that gets subjective: how good was your thinking, how efficient was it?
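The attempt-count scoring idea described above could be sketched as follows; `attempts_score` is a hypothetical helper for illustration, not part of the actual ARC evaluation (which ended up binary per task):

```python
def attempts_score(guesses, target, max_attempts=10):
    """Score a solver by the number of attempts it needs to reach the
    correct output; fewer attempts means more efficient reduction of
    uncertainty. Returns None if the budget is exhausted first."""
    for n, guess in enumerate(guesses[:max_attempts], start=1):
        if guess == target:
            return n
    return None

# Hypothetical solver output: the correct grid appears on the 3rd guess.
guesses = [[[1, 0]], [[0, 0]], [[0, 1]], [[1, 1]]]
print(attempts_score(guesses, [[0, 1]]))  # 3
```

Under a scheme like this, a more ambiguous task simply admits more plausible hypotheses, and a good solver is one whose successive guesses eliminate them quickly.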
One thing that's interesting about this notion of scoring you by how many attempts you need is that you can start producing tasks that are far more ambiguous, because with multiple attempts you can actually probe that ambiguity. Right, so in a sense it's how well you can adapt to the uncertainty and reduce the uncertainty. Yes, it's how fast, it's the efficiency with which you reduce uncertainty in program space. Exactly. Very difficult to come up with that kind of test, though. Yeah, I would love to be able to create something like this; in practice it would be very, very difficult, but yes. I mean, what you've done with the ARC challenge is brilliant. I'm also surprised that it's not more popular, but I think it's picking up. Is it? It is, yeah. What are your thoughts about another test, by Marcus Hutter: he has the Hutter Prize for compression of human knowledge, and the idea is to quantify, to reduce the test of intelligence purely to the ability to compress. What are your thoughts about this, intelligence as compression? It's a very fun test, because it's such a simple idea: you're given English Wikipedia, basically, and you must compress it. It stems from the idea that cognition is compression, that the brain is basically a compression algorithm. This is a very old idea, and a very striking and beautiful one, I think. I used to believe it; I eventually had to realize that it was very much a flawed idea, so I no longer believe that cognition is compression. But I can tell you what the difference is. It's very easy to believe that cognition and compression are the same thing, because, for instance, Jeff Hawkins says that cognition is prediction, and of course prediction is basically the same thing as compression, just including the temporal axis. And it's very easy to believe this because compression is something that we do all the time, very naturally.
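As a toy version of the Hutter Prize setup (the real prize compresses roughly a gigabyte of English Wikipedia; the repetitive stand-in text below is just for illustration), the quantity being optimized, compression ratio, is easy to measure:

```python
import zlib

# Stand-in corpus; the actual Hutter Prize uses ~1 GB of English Wikipedia.
text = ("the quick brown fox jumps over the lazy dog " * 200).encode("utf-8")

compressed = zlib.compress(text, level=9)
ratio = len(compressed) / len(text)
print(f"{len(text)} -> {len(compressed)} bytes, ratio {ratio:.3f}")

# Repetitive text compresses extremely well: a compressor exploits
# regularities in past data, which is exactly the property Chollet
# goes on to argue falls short of cognition, since it cannot hedge
# for future novelty.
```

The prize ranks entries by compressed size (plus decompressor size), so "intelligence" in that framing is literally how small a number this ratio can be driven to on the fixed corpus.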
We are constantly compressing information; we have this bias towards simplicity; we're constantly trying to organize things in our minds and around us to be more regular. So it's a beautiful idea, and it's very easy to believe, but there is a big difference between what we do with our brains and compression. Compression is a tool in the human cognitive toolkit that is used in many ways, but it's just a tool: it is a tool for cognition, it is not cognition itself. And the big fundamental difference is that cognition is about being able to operate in future situations that include fundamental uncertainty and novelty. For instance, consider a child at age 10: they have 10 years of life experience; they've gotten pain, pleasure, rewards, and punishment over that period of time. If you were to generate the shortest behavioral program that would have optimally run that child over these 10 years, the shortest optimal behavioral program given the experience of that child so far, well, that compressed program, which is what you would get if the mind of the child were a compression algorithm, would be utterly inappropriate to process the next 70 years in the life of the child. So in the models we build of the world, we are not trying to make them optimally compressed. We are using compression as a tool to promote simplicity and efficiency in our models, but they are not perfectly compressed, because they need to include things that are seemingly useless today, that have seemingly been useless so far, but that may turn out to be useful in the future, because you just don't know the future. That's the fundamental principle that cognition, that intelligence, arises from: you need to be able to run appropriate behavioral programs, except you have absolutely no idea what sort of context, environment, or situation they are going to be
running in, and you have to deal with that uncertainty, with that future novelty. An analogy you can make is with investing. If I look at the past 20 years of stock market data and I use a compression algorithm to figure out the best trading strategy, it's going to be something like: you buy Apple stock, or maybe for the past few years you buy Tesla stock. But is that strategy still going to be true for the next 20 years? Actually, probably not, which is why, if you're a smart investor, you're not just going to follow the strategy that corresponds to compression of the past; you're going to have a balanced portfolio, because you just don't know what's going to happen. I guess in that same sense, compression is analogous to what you talked about as local or robust generalization, versus extreme generalization; it's much closer to that side, being able to generalize in a local sense. That's why, as humans, when we are children, a lot of our education is driven by play, even by curiosity. We are not efficiently compressing things; we're actually exploring. We are retaining all kinds of things from our environment that seem to be completely useless, because they might turn out to be eventually useful, and that's what cognition is really about. What makes it antagonistic to compression is that it is about hedging for future uncertainty, and hedging is inefficient from a compression standpoint. Yes, especially the hedging. So cognition leverages compression as a tool to promote efficiency. And in that sense, in our models, it's like Einstein said: make it as simple as possible, but not simpler, however that quote goes. Compression simplifies things, but you don't want to make it too simple. Yes, so a good model of the world is going to include all kinds
of things that are completely useless, actually, just in case. Yes, because you need diversity, in the same way that in your portfolio you need all kinds of stocks that may not have performed well so far; you need diversity, and the reason you need diversity is because fundamentally you don't know what you're doing. The same is true of the human mind: it needs to behave appropriately in the future, and it has no idea what the future is going to be like; it's not going to be like the past, so compressing the past is not appropriate, because the past is not predictive of the future. Yeah, history repeats itself, but not perfectly. I don't think I asked you last time the most absurd of questions. We've talked a lot about intelligence, but the bigger question, beyond intelligence, is one of meaning. Intelligent systems are kind of goal-oriented; they're optimizing for a goal. If you look at the Hutter Prize, actually, there's always a clean formulation of a goal, but the natural question for us humans, since we don't know our objective function, is: what is the meaning of it all? So the absurd question is: what, François Chollet, do you think is the meaning of life? What's the meaning of life? Yeah, that's a big question, and I think I can give you my answer, at least one of my answers. So one thing that's very important in understanding who we are is that everything that makes up ourselves, that makes up who we are, even your most personal thoughts, is not actually your own. Even your most personal thoughts are expressed in words that you did not invent, and are built on concepts and images that you did not invent. We are very much cultural beings; we are made of culture; that's what makes us different from animals, for instance. Everything about ourselves is an echo of the past, an echo of people
who lived before us. That's who we are. And in the same way, if we manage to contribute something to the collective edifice of culture, a new idea, maybe a beautiful piece of music, a work of art, a grand theory, a new word maybe, that something is going to become a part of the minds of future humans, essentially forever. So everything we do creates ripples that propagate into the future, and in a way this is our path to immortality: as we contribute things to culture, culture in turn becomes future humans, and we keep influencing people thousands of years from now. So our actions today create ripples, and these ripples, I think, basically sum up the meaning of life. In the same way that we are the sum of the interactions between many different ripples that came from our past, we are ourselves creating ripples that will propagate into the future. And that's why, and this seems like perhaps a naive thing to say, we should be kind to others during our time on earth, because every act of kindness creates ripples, and in reverse every act of violence also creates ripples, and you want to carefully choose which kind of ripples you want to create and propagate into the future. And in your case, first of all, beautifully put, but in your case, creating ripples into future humans and future AGI systems. Yes. It's fascinating. I don't think there's a better way to end it. François, as always, for the second time, and I'm sure many times in the future, it's been a huge honor; you're one of the most brilliant people in the machine learning and computer science world. Again, it's a huge honor; thanks for talking today. It's been a pleasure; thanks a lot for having me, I really appreciate it. Thanks for listening to this conversation with François Chollet, and thank you to our sponsors: Babbel, Masterclass, and Cash App. Click the sponsor links in the description to get a discount
and to support this podcast. If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman. And now, let me leave you with some words from René Descartes, written in 1637, an excerpt of which François includes in his "On the Measure of Intelligence" paper: "If there were machines which bore a resemblance to our bodies and imitated our actions as closely as possible for all practical purposes, we should still have two very certain means of recognizing that they were not real men. The first is that they could never use words, or put together signs, as we do in order to declare our thoughts to others. For we can certainly conceive of a machine so constructed that it utters words, and even utters words that correspond to bodily actions causing a change in its organs. But it is not conceivable that such a machine should produce different arrangements of words so as to give an appropriately meaningful answer to whatever is said in its presence, as the dullest of men can do." Here Descartes is anticipating the Turing test, and the argument still continues to this day. Secondly, he continues, "even though some machines might do some things as well as we do them, or perhaps even better, they would inevitably fail in others, which would reveal that they are acting not from understanding but only from the disposition of their organs." This is an incredible quote. "For whereas reason is a universal instrument which can be used in all kinds of situations, these organs need some particular disposition for each particular action; hence it is, for all practical purposes, impossible for a machine to have enough different organs to make it act in all the contingencies of life in the way in which our reason makes us act." That's the debate: mimicry, memorization, versus understanding. So thank you for listening, and I hope to see you next time.