Transcript
aGBLRlLe7X8 • Oriol Vinyals: Deep Learning and Artificial General Intelligence | Lex Fridman Podcast #306
Kind: captions Language: en

At which point is the neural network a being versus a tool?

The following is a conversation with Oriol Vinyals, his second time on the podcast. Oriol is the research director and deep learning lead at DeepMind, and one of the most brilliant thinkers and researchers in the history of artificial intelligence. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Oriol Vinyals.

You are one of the most brilliant researchers in the history of AI, working across all kinds of modalities. Probably the one common theme is that it's always sequences of data, whether we're talking about language, images, even biology, and games, as we talked about last time. So you're a good person to ask this: in your lifetime, will we be able to build an AI system that's able to replace me as the interviewer in this conversation, in terms of the ability to ask questions that are compelling to somebody listening? And the further question is: are we close? Will we be able to build a system that replaces you as the interviewee, in order to create a compelling conversation? How far away are we, do you think?

It's a good question. I think, partly, I would say: do we want that? I really like, now that we start with very powerful models, interacting with them and thinking of them as closer to us. The question is, if you remove the human side of the conversation, is that an interesting artifact? And I would say probably not. For instance, last time we spoke, we were talking about StarCraft, and creating agents that play games involves self-play, but ultimately what people cared about was how the agent behaves when the opposite side is a human. So, without a doubt, we will probably be more empowered by AI. Maybe you can source some questions from an AI system; even today, I would say it's quite plausible that, with your creativity, you might actually find very interesting questions that you can filter. We call this cherry-picking sometimes in the field of language. And likewise, if I had the tools on my side, I could say: look, you're asking this interesting question; from this answer I like the words chosen by this particular system. But completely replacing it feels not exactly exciting to me. Although, in my lifetime, given the trajectory, I think it's possible that perhaps there could be interesting, maybe self-play interviews, as you're suggesting, that would look or sound quite interesting, and you could probably learn a topic through listening to one of these interviews, at a basic level at least.

So you said it doesn't seem exciting to you, but what if exciting is part of the objective function, the thing that is optimized over? There's probably a huge amount of data, if you look correctly, of humans communicating online, and there are probably ways to measure the degree of, as they call it, engagement. So you could probably optimize for the question that most created an engaging conversation in the past. So actually, if you strictly use the word exciting, there is probably a way to create optimally exciting conversations that involve AI systems, at least with one side being AI.

Yeah, that makes sense. Maybe looping back a bit to games and the game industry: when you design algorithms, you're thinking about winning as the objective, the reward function. But in fact, when we discussed this with Blizzard, the creators of StarCraft in this case, I think what's exciting is fun. If you could measure that, and optimize for that, that's probably why we play video games, or why we interact or listen or look at cat videos on the internet. So it's true that modeling reward beyond the obvious reward functions we're used to in reinforcement learning is definitely very exciting, and again, there is some progress
actually toward a particular aspect of AI which is quite critical, which is, for instance: is a conversation, or is some information, truthful? So you could start trying to evaluate that from, say, the internet, which has lots of information. And then, if you can learn a function — automated, ideally, so you can also optimize it more easily — then you could actually have conversations that optimize for non-obvious things such as excitement. So yeah, that's quite possible, and in that case I would say it would definitely be a fun exercise, and quite unique, to have at least one side that is fully driven by an excitement reward function. But obviously there would still be quite a lot of humanity in the system, both from whoever is building the system, of course, and also, ultimately, if we think of labeling for excitement, those labels must come from us, because it's just hard to have a computational measure of excitement. As far as I understand, there's no such thing.

You mentioned truth. I would actually venture to say that excitement is easier to label than truth, or perhaps has lower consequences of failure. But there is perhaps the humanness that you mentioned that's part of a thing that could be labeled, and that could mean an AI system that's doing dialogue, doing conversations, should be flawed, for example. That's the thing you optimize for: have inherent contradictions by design, have flaws by design. Maybe it also needs to have a strong sense of identity, so it has a backstory it told itself that it sticks to. It has memories — not in terms of how the system is designed, but it's able to tell stories about its past. It's able to have mortality, and a fear of mortality, in the following way: it has an identity, and if it says something stupid and gets cancelled on Twitter, that's the end of that system. It's not like you get to rebrand yourself; that system is done. So maybe the high-stakes
nature of it — because you can't say anything stupid now, Oriol, or you'll be cancelled on Twitter, and there are stakes to that — I think that's part of what makes it interesting. And then you have a perspective that you've built up over time, that you stick with, and people can disagree with you. So holding a perspective strongly, holding maybe a controversial, or at least a strong, opinion: all of those elements feel like they can be learned, because there's a lot of data on the internet of people having opinions. Combine that with a metric of excitement, and you can start to create something that, as opposed to optimizing for grammatical clarity and truthfulness, for factual consistency over many sentences, is optimized for humanness. And there's obviously data for humanness on the internet. So I wonder if there's a future where that's part of it. I sometimes wonder that about myself. I'm a huge fan of podcasts, and I listen to some podcasts and I think: what is interesting about this? What is compelling? The same way you watch other people play games, like you said — watch someone play StarCraft, or Magnus Carlsen play chess. I'm not a chess player, but it's still interesting to me. And what is that? The stakes of it, maybe; the end of a domination, a series of wins. All of those elements somehow connect to a compelling conversation, and I wonder how hard that is to replace, because ultimately all of that connects to the initial proposition of how to test whether an AI is intelligent or not — the Turing test — which I guess is where my question comes from, the spirit of that test.

Yes. I actually recall, I was just listening to our first podcast, where we discussed the Turing test. So I would say, from a neural network, AI-builder perspective, usually you try to map many of these interesting topics you discuss
to benchmarks, and also to actual architectures: how these systems are currently built, how they learn, what data they learn from, what they are learning — we're talking about the weights of a mathematical function. And then, looking at the current state of the game: what leaps forward do we need to get to this ultimate stage of lifetime experience, fears — things on which we're currently barely seeing progress? Because what's happening today is you take all these human interactions — a large, varied corpus of human interactions online — and you're distilling these sequences, going back to my passion: sequences of words, letters, images, sound; there are more modalities at play here. And then you're trying to learn a function, through a neural network, that maximizes the likelihood of seeing all this data.

Now, I think there are a few places where the way we currently train these models would clearly need to change to develop the kinds of capabilities you describe. I'll tell you maybe a couple. One is the lifetime of an agent or a model. You learn from this data offline, so you're just passively observing and maximizing it. It's almost like a landscape of mountains: everywhere there's data of humans interacting in a certain way, you're trying to make the landscape higher, and lower where there's no data. These models generally don't then experience things themselves; they're passive observers of the data. Then we put them to generate data when we interact with them, but that's very limiting. The experience they actually have — where they could maybe be optimizing or further optimizing their weights — we're not even doing that. So, to be clear, and again mapping to AlphaGo and AlphaStar: we train the model, and when we deploy it to play against humans, or in this case to interact with
humans, like language models, it doesn't even keep training. They're not learning, in the sense that the weights learned from the data don't keep changing. Now, there's something that feels a bit more magical, but is understandable if you're into neural nets, which is that they might not learn in the strict sense of the weights changing — maybe that's what maps to how neurons interconnect and how we learn over our lifetime — but it's true that the context of the conversation that takes place when you talk to these systems is held in their working memory. It's almost like when you start a computer: it has a hard drive with a lot of information, it has access to the internet, which has probably all the information, but there's also a working memory that these agents, as we call them, or start calling them, build upon.

Now, this memory is very limited. To be concrete, right now we're talking about on the order of 2,000 words that we hold; beyond that, we start forgetting what we've seen. So you can see there's some short-term coherence already — going back to what you said, this very interesting topic of an agent having consistency: if you say, "oh, what's your name?", it could remember that, but it might forget beyond 2,000 words, which is not that long a context if we think even of these podcasts; books are much longer. So, technically speaking, there's a limitation there — super exciting for people who work on deep learning to be working on — but I would say we lack the benchmarks and the technology to have this lifetime-like experience of memory that keeps building up. However, the way it learns offline is clearly very powerful. If you had asked me three years ago, I would have said, oh, we're very far. I think we've seen the power of this imitation at internet scale, which has made it feel like at least the basic knowledge about the world is now incorporated into the weights.
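The limited working memory described here can be sketched in a few lines. This is a toy illustration, not any real system's implementation: the window of 2,048 tokens stands in for the roughly 2,000 words mentioned, and "forgetting" is simply truncation — nothing updates the model's weights.

```python
def truncate_context(tokens, max_len=2048):
    """Working memory: keep only the most recent max_len tokens.

    Anything older falls out of the window and is 'forgotten' --
    the conversation never changes the model's weights, only what
    the model conditions on.
    """
    return tokens[-max_len:]

# Toy illustration: a 5,000-token conversation history.
history = list(range(5000))
window = truncate_context(history)
# The model only ever sees the most recent 2,048 tokens;
# the first 2,952 tokens of the conversation are gone.
```

The "what's your name" example maps directly onto this: the name survives as long as it sits inside the window, and is lost the moment enough new tokens push it out.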
But then this experience is lacking, and in fact, as I said, we don't even train them while we're talking to them. Other than their working memory being affected, of course — that's the dynamic part — they don't learn in the same way that you and I have learned, basically from when we were born, and probably before. So, lots of fascinating questions you asked there. The one I mentioned is this idea of memory and experience, versus just observing the world and learning its knowledge; for that, I would argue there are lots of recent advancements that make me very excited about the field.

The second issue I see is that all these models, we train them from scratch. That's something I would have complained about three years ago, or six, or ten. If we take inspiration from how we got here — how the universe evolved us, and how we keep evolving — it feels like a missing piece that we should be training models from scratch every few months. There should be some way in which we can grow models, much as we as a species, and many other elements in the universe, build on previous iterations. And from a purely neural network perspective, even though we would like to make it work, it has proven very hard not to throw away the previous weights — this landscape we learned from the data — and refresh them with a brand-new set of weights, given, say, a recent snapshot of the data sets we train on, or even a new game we're learning. So that feels like something fundamentally missing. We might find it, but it's not very clear what it will look like. There are many ideas, and it's super exciting as well.

Yes. Just for people who don't know: when you approach a new problem in machine learning, you come up with an architecture that has a bunch of weights, and then you initialize them somehow, which in most cases is some version of random. That's what you mean by starting from scratch. And it seems like a waste: every time you solve the game of Go, or chess, StarCraft, protein folding — surely there's some way to reuse the weights, as we grow this giant database of neural networks that have solved some of the toughest problems in the world. So part of the question is: what is that method? How do you reuse weights, how do you learn to extract what's generalizable, or at least has a chance to be, and throw away the other stuff? And maybe the neural network itself should be able to tell you that. What ideas do you have for better initialization of weights?

Maybe stepping back: if we look at the field of machine learning, and especially deep learning, at the core of deep learning there's this beautiful idea that a single algorithm can solve any task. It's been proven over and over, with an increasing set of benchmarks, and things that were thought impossible being cracked, by this basic principle: you take a neural network of uninitialized weights — a blank computational brain — and then you give it, in the case of supervised learning, a lot of examples, ideally, of "here is what the input looks like, and the desired output should look like this." Image classification is a very clear example: images mapped to maybe one of a thousand categories; that's what ImageNet is like. But many, if not all, problems can be mapped this way. And then there's a generic recipe you can use, with very little change — and I think that's the core of deep learning research: what is the recipe that is universal, that for any new given task I'll be able to use without thinking, without having to work very hard on the problem at stake? We have not found this recipe, but I think the field is excited to find fewer of the tweaks and tricks that people discover when they work on specific important problems, and more of a general algorithm.
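The contrast being drawn — random initialization versus reusing what was already learned — can be sketched minimally in NumPy. This is an illustrative assumption, not anyone's actual training setup: the shapes and the 0.02 scale are arbitrary, and "growing" a model by embedding an old weight matrix inside a larger one is just one naive form the idea could take.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_from_scratch(shape, scale=0.02):
    """A 'blank computational brain': weights drawn from a small Gaussian."""
    return rng.normal(0.0, scale, size=shape)

def grow_from_pretrained(old_weights, new_shape, scale=0.02):
    """Naive weight reuse: embed the old matrix in a bigger, randomly
    initialized one, instead of throwing the old weights away."""
    new_weights = rng.normal(0.0, scale, size=new_shape)
    rows, cols = old_weights.shape
    new_weights[:rows, :cols] = old_weights  # keep everything already learned
    return new_weights

old = init_from_scratch((4, 4))       # "from scratch": pure noise
grown = grow_from_pretrained(old, (8, 8))  # old knowledge survives in a corner
```

As the conversation notes, in practice this kind of transfer has proven surprisingly hard to make work well — the sketch only shows what "not starting from scratch" would even mean mechanically.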
So at an algorithmic level, I would say we have something general already, which is this formula of training a very powerful model — a neural network — on a lot of data. In many cases you need some specificity for the actual problem you're solving. Protein folding, being such an important problem, had a basic recipe inherited from before: Transformer models, graph neural networks, ideas coming from NLP, like something called BERT, which is a kind of loss you can put in place to help the model; knowledge distillation is another technique. So this is the formula — we still had to find some particular things that were specific to AlphaFold. That's very important, because protein folding is such a high-value problem that, as humans, we should solve it no matter whether we need to be a bit specific, and it's possible that some of those learnings will apply to the next iteration of this recipe that deep learners care about. But it is true that, so far, the recipe is what's common; the weights you generally throw away, which feels very sad.

Although, especially in the last two or three years — and when we last spoke I mentioned this area of meta-learning, the idea of learning to learn — some progress has been made, starting, I would say, mostly from GPT-3, in the language domain only. There you could conceive of a model that is trained once, and this model is not narrow: it doesn't only know how to translate a pair of languages, or only know how to assign sentiment to a sentence. These tasks you can actually teach it, by what is called prompting. Prompting is essentially just showing it a few more examples, almost like the input-output examples you show, algorithmically speaking, to the process that creates the model — but now you're doing it through language, which is a very natural way for us to learn from one another: I tell you, hey, you should do this new task, I'll tell you a bit more, maybe you ask me some questions, and now you know the task.
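The kind of prompting described — teaching a frozen model a task by placing input-output examples in its context — can be sketched as plain string construction. The "Input:"/"Output:" template is an illustrative convention of this sketch, not the format any particular model requires.

```python
def build_few_shot_prompt(examples, query):
    """Teach a task in-context: a few solved examples, then the new input.

    The model's weights never change -- the 'teaching' lives entirely
    in the text the model conditions on.
    """
    parts = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

# Hypothetical sentiment task taught with two examples; a language model
# would be asked to continue this string after the final "Output:".
prompt = build_few_shot_prompt(
    [("great movie, loved it", "positive"),
     ("boring and far too long", "negative")],
    "what a fantastic soundtrack",
)
```

The same scaffold works for translation, classification, or anything else expressible as input-output pairs — which is exactly the "single model, many tasks" point being made.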
Right — you didn't need to retrain it from scratch. And we've seen these almost magical moments in this way of doing few-shot prompting through language, in the language-only domain. Then, in the last two years, we've seen it expanded beyond language: adding vision, adding actions and games. Lots of progress still to be had. But if you ask me how we're going to crack this problem, this is perhaps one way: you have a single model. The problem with this model is that it's hard to grow it in weights or capacity, but the model is certainly powerful enough that you can teach it some tasks in this way that I would teach you — say a text-based task, or a classification-style vision task. It still feels like more breakthroughs are needed, but it's a great beginning. We have a good baseline, we have an idea that this maybe is the way we want to benchmark progress towards AGI — and in my view it's critical to always have a way to benchmark; the community is sort of converging on this overall, which is good to see. And this is actually what excites me in terms of next steps for deep learning: how to make these models more powerful, how do you train them, how do you grow them if they must grow, should they change their weights as you teach them a task or not. There are some interesting questions, many still to be answered.

Yeah, you've opened the door to a bunch of questions I want to ask, but let's first return to your tweet, and read it like Shakespeare. You wrote: "Gato is not the end, it's the beginning," and then you wrote "Meow," and an emoji of a cat. So, two questions: first, can you explain the meow and the cat emoji, and second, can you explain what Gato is and how it works?

Right, indeed. Thanks for reminding me that we're all exposed on Twitter, permanently.

Yes, permanently. One of the greatest AI researchers of all time: "Meow," and a cat emoji.

Yes, there you go.

Can you imagine Turing tweeting "Meow" and a cat?

He probably would. Probably.

So, yeah, the tweet is important, actually. I put thought into my tweets, I hope people... which part do you think...?

Okay, so there are three sentences: "Gato is not the end, Gato is the beginning." "Meow." Cat emoji. Which is the important part?

The meow? No, no — definitely that it is the beginning. I was probably just explaining a bit where the field is going. But let me tell you about Gato. First, the name Gato comes from a sequence of releases DeepMind had that used animal names for models based on this idea of large sequence models — initially language-only, but expanding to other modalities. So we had Gopher and Chinchilla; these were language-only. Then, more recently, we released Flamingo, which adds vision to the equation, and then Gato, which adds vision and also actions into the mix. As we discussed, actions — especially discrete actions like up, down, left, right; I just told you the actions, but they're words — so you can see how actions naturally map to sequence modeling of words, at which these models are very powerful. As for how Gato was named — I can only tell this from memory, and these things always happen with an amazing team of researchers behind them — before the release we had a discussion about which animal we would pick, and I think because of the words "general agent" we were playing with the "GA" words, and then, you know, "gato" arose, and cats. "Gato" is, obviously, the Spanish word for cat. I had nothing to do with it, although I'm from Spain.

Wait, sorry — how do you say cat in Spanish?

"Gato."

Oh, God. Okay. Yeah, okay, okay, I see, I see. Now it all makes sense.
Okay, how do you say meow in Spanish?

I think you say it the same way, but you write it "miau."

Okay. It's universal.

Yeah.

All right, so then, how does the thing work? You said general — so: language, vision, action. Can you explain what kind of neural networks are involved, what the training looks like, and maybe some beautiful ideas within the system?

Yeah. So maybe the basics of Gato are not that dissimilar from much of the work that came before; here is where the recipe hasn't changed too much. There is a Transformer model — that's the kind of sequence neural network that essentially takes a sequence of modalities, observations, which could be words, could be vision, or could be actions — and the objective you train it on is to predict what the next anything is. And "anything" means: what's the next action, if the sequence I'm showing you in training is a sequence of actions and observations? Then you're predicting the next action and the next observation. So you think of this really as a sequence of bytes. Take any sequence — words, interleaved words and images, or observations that are images and moves in Atari, up, down, left, right — and you just think of them as bytes, and you're modeling what the next byte is going to be. You might interpret that as an action, and then play it in a game, or you might interpret it as a word, and write it down if you're chatting with the system, and so on.

So Gato's inputs are images, text, video, actions; it also actually takes input from proprioception sensors, from robotics, because robotics is one of the tasks it's been trained to do. And at the output, similarly, it outputs words and actions. It does not output images — that's just by design; we decided not to go that way for now. That's also in part why it's the beginning.
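The predict-the-next-anything loop just described can be sketched as follows. Here `model` is a stand-in for the trained Transformer, and the token values are toy integers, not Gato's real vocabulary — the point is only the shape of the loop: predict one token, append it, repeat.

```python
def rollout(model, context, n_steps, max_ctx=1024):
    """Autoregressive generation over an interleaved token stream.

    Each predicted token might later be decoded as a word, an image
    patch, or an action such as 'up'/'down' -- to the model it is
    all just 'the next byte'.
    """
    for _ in range(n_steps):
        next_token = model(context[-max_ctx:])  # condition on recent history
        context.append(next_token)              # feed the prediction back in
    return context

# Toy stand-in 'model' that always predicts (last token + 1).
toy_model = lambda ctx: ctx[-1] + 1
seq = rollout(toy_model, [0], n_steps=5)
# seq is now [0, 1, 2, 3, 4, 5]
```

Whether a given token is then "played" in an environment or "written down" in a chat is purely a matter of how the integer is interpreted downstream, which is exactly the point made above.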
Because there's more to do, clearly. But that's what Gato is: this brain that, essentially, you give any sequence of these observations and modalities, and it outputs the next step; and then off you go — you feed that step back in and predict the next one, and so on. Now, it is more than a language model, because even though you can chat with Gato, like you can chat with Chinchilla or Flamingo, it also is an agent — that's the "A" in Gato's description as a general agent. And it's also general: it's not an agent trained to be good at only StarCraft, or only Atari, or only Go; it's been trained on a vast variety of data sets.

So what makes an agent, if I may interrupt — the fact that it can generate actions?

Yes. It's a good question: when do we call something a model — everything is a model — and what is an agent? In my view, it's indeed the capacity to take actions in an environment; you send the action to the environment, the environment might return a new observation, and then you generate the next action.

This actually reminds me of the question, from the side of biology: what is life? Which is actually a very difficult question as well. What is living, when you think about life here on planet Earth? And — a question interesting to me about aliens — what is life when we visit another planet? Would we be able to recognize it? This sounds perhaps silly, but I don't think it is: at which point is the neural network a being versus a tool? And it feels like the ability to take actions, to modify its environment, is that fundamental leap.

Yeah. I think it certainly feels like action is a necessary condition to be more alive, but probably not a sufficient one.

Yeah, sadly. The consciousness thing, whatever.

Yeah, we can get back to that later. But anyway, going back to the meow and to Gato: one of the leaps forward, and what took the team a lot of effort and time, was — as you were asking — how Gato has been trained. So, I told you Gato is this Transformer neural network that models sequences of actions, words, and so on, and the way we train it is essentially by pulling together data sets of observations. It's a massive imitation learning algorithm. It imitates, obviously, what the next word is, from the usual data sets we used before — these web-scale-style data sets of people writing on the web, or chatting, or whatnot — that's an obvious source, which we use in all language work. But then we also took a lot of agents that we have at DeepMind — as you know, at DeepMind we're quite interested in reinforcement learning and in learning agents that play in different environments — and we created a data set of these trajectories, as we call them, or agent experiences. In a way, these are other agents we trained for a single-minded purpose — to, say, control a 3D game environment and navigate a maze — and we had all the experience created through that one agent interacting with its environment, and we added it to the data set. As I said, we see all of it — sequences of words, sequences of an agent interacting with an environment, agents playing Atari, and so on — as the same kind of data. So we mixed these data sets together, and we trained Gato. That's the "G" part: it's general because it really has mixed data. It doesn't have different brains for each modality or each narrow task; it has a single brain. And it's not that big a brain compared to most of the neural networks we see these days: it has about one billion parameters, while some models we're seeing are getting into the trillions, and certainly 100 billion feels like a very common size for these training jobs. So the actual agent is relatively small, but it's
been trained on a very challenging, diverse data set, containing not only all of the internet but also all this agent experience of playing very different, distinct environments. So this brings us to the part of the tweet that says this is not the end, it's the beginning. It feels very cool to see that Gato, in principle, is able to control any sort of environment — especially the ones it's been trained on: these 3D games, Atari games, all sorts of robotics tasks, and so on. But obviously it's not as proficient as the teachers it learned from in those environments.

It's not obvious that it couldn't be more proficient.

It's just the current, "beginning" part: the performance is such that it's not as good as a model specialized to the task.

So it's not as good — although I would argue size matters here. The fact that—

I would argue size always matters.

Yeah, that's a different conversation. But for neural networks, certainly, size does matter. So it's the beginning because it's relatively small. Obviously, scaling this idea up might make the connections that exist between text on the internet and playing Atari and so on more synergistic with one another, and you might gain from that. That moment we didn't quite see — but that's why it's the beginning.

That synergy might emerge with scale?

It might emerge with scale, and also, I believe, through some new research, or through ways in which you prepare the data. You might need to make it clearer to the model that you're not merely playing Atari, starting from a bare screen, and here is "up," and a screen, and "down." You can think of playing Atari as requiring some context for the agent before it starts seeing: "oh, this is an Atari screen, I'm going to start playing." You might require, for instance, that it be told in words: "hey, in this sequence I'm showing you, you're going to be playing an Atari game." So text might actually
be a good driver to enhance the data, so that these connections can be made more easily. That's an idea we're starting to see in language, but obviously going beyond it is going to be needed to be effective — it's not that I show you a screen and, from scratch, you're supposed to learn a game; there's a lot of context we might set. So there might be some work needed there, to set that context. But anyway, there's a lot of work to do.

Yeah. So that context puts all the different modalities on the same level ground.

Exactly — it provides the context. Yes.

So maybe on that point: there's this task, which may not be trivial, of tokenizing the data — converting the data into pieces, into basic atomic elements that can then cross modalities somehow. What is tokenization? How do you tokenize text, how do you tokenize images, and how do you tokenize games, actions, robotics tasks?

Yeah, that's a great question. Tokenization is the entry point to making all the data look like a sequence, because tokens are these little puzzle pieces: we break anything down into these pieces, and then we just model what the puzzle looks like when you lay it down in a line, so to speak — in a sequence. In Gato, for text — and there's a lot of prior work here — you tokenize usually by looking at commonly used substrings. "ing" in English is a very common substring, so that becomes a token. Tokenizing text is a quite well-studied problem, and Gato just used the standard techniques that have been developed over many years, even starting from n-gram models in the 1950s.

Just for context: how many tokens — what order of magnitude number of tokens — is required for a word? What are we talking about?

For a word in English — every language is very different — the current granularity of tokenization generally means maybe two
I don't know the statistics exactly, but to give you an idea: we don't tokenize at the level of letters. Then it would probably be, well, I don't know what the average length of a word in English is, but that would be the minimum set of tokens you could use.

Bigger than letters, smaller than words.

Yes, yes. And you could think of very common words, like "the"; that would be a single token. But very quickly you're talking two, three, four tokens.

Have you ever tried to tokenize emojis?

Emojis are actually just sequences of letters.

Maybe to you, but to me they mean so much more.

Yeah, you can render the emoji, but if you actually... yeah, this is a philosophical question: is an emoji an image or text? The way we do these things is that they're actually mapped to small sequences of characters. So you can actually play with these models and input emojis, and they will output emojis back, which is actually quite a fun exercise. You can probably find tweets about this out there. But yeah, anyway, for text it's very clear how this is done. Then in Gato, what we did for images is we compressed images, so to speak, into something that looks less like every pixel with every intensity; that would mean a very long sequence. If we were talking about 100-by-100-pixel images, that would make the sequences far too long. So what was done there is to use a technique that essentially compresses an image into maybe 16-by-16 patches of pixels, and then that is mapped, again, tokenized: you essentially quantize this space into a special word that maps to that little patch of pixels, and then you put the patches together in some raster order, and that's how you get out, or in, the image that you process.

But there's no semantic aspect to that? You don't need to understand anything
about the image in order to tokenize it.

Currently, no. You're only using this notion of compression. You're trying to find common patterns; it's like JPEG or all those algorithms. It's actually very similar at the tokenization level: all we're doing is finding common patterns and then making sure, in a lossy way, we compress these images given the statistics of the images contained in all the data we deal with.

Although you could probably argue that JPEG does have some understanding of images, because visual information, compressing crudely based on color, does capture something important about an image: about its meaning, not just about some statistics.

Yeah. JPEG, as I said, the algorithms look actually very similar; JPEG uses the cosine transform. The approach we usually take in machine learning, when we deal with images and do this quantization step, is a bit more data-driven. Rather than have some sort of Fourier basis for how frequencies appear in the natural world, we actually just use the statistics of the images and quantize them based on those statistics, much like you do with words: common substrings are allocated a token. Images are very similar, but there's no connection in the token space. If you think of it, the tokens are just integers at the end of the day. So now, say, I don't know the exact numbers, but let's say we have 10,000 tokens for text, certainly more than characters, because we have groups of characters and so on. So from 1 to 10,000, those represent all the language and the words we'll see. And then images occupy the next set of integers, completely independently: from 10,001 to 20,000, those are the tokens that represent the other modality, images. And that is an interesting aspect that makes them orthogonal. So what connects these concepts is the data.
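A minimal sketch of this integer layout, with vocabulary sizes assumed for illustration (the 10,000-per-modality split mirrors the rough numbers mentioned above, not Gato's exact configuration):

```python
# Each modality gets its own disjoint range of integer token ids.
# Sizes are illustrative, not Gato's actual vocabulary sizes.
TEXT_VOCAB = 10_000
IMAGE_VOCAB = 10_000

def text_token(i):
    return i                               # ids 0 .. 9,999

def image_token(i):
    return TEXT_VOCAB + i                  # ids 10,000 .. 19,999

def action_token(i):
    return TEXT_VOCAB + IMAGE_VOCAB + i    # ids 20,000 and up

# An interleaved episode: a caption token, two image-patch tokens, an action.
sequence = [text_token(42), image_token(7), image_token(8), action_token(3)]
print(sequence)  # [42, 10007, 10008, 20003]
```

Nothing in the ids themselves relates "cat" the word to a cat patch; as the conversation notes next, only the learning algorithm connects the ranges.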
Right. Once you have a dataset, for instance, that captions images, that tells you "this is someone playing frisbee on a green field," now the model will need to predict, from the tokens of the text "green field," the pixels, and that will start making the connections between the tokens. These connections happen as the algorithm learns. And then, lastly, if we think of these integers: the first few are words, the next few are images, and in Gato we also allocated the highest order of integers to actions, which we discretize. Actions are very diverse: in Atari there are, I don't know, 17 discrete actions; in robotics, actions might be torques and forces that we apply. So we use similar ideas to compress these actions into tokens, and that's how we map the whole space to this sequence of integers. They occupy different ranges, and what connects them is the learning algorithm. That's where the magic happens.

So the modalities are orthogonal to each other in token space. In the input, everything you add, you add as extra tokens, and then you're shoving all of that into one place: the transformer. And that transformer tries to look at this gigantic token space and form some kind of representation, some kind of unique wisdom, about all of these different modalities. How is that possible? If you were to put your psychoanalysis hat on and try to psychoanalyze this neural network: is it schizophrenic? Does it try, given these very few weights, to represent multiple disjoint things and somehow have them not interfere with each other? Or is it about building on the joint strength, on whatever is common to all the different modalities? If you were to ask: is it schizophrenic, or is it of one mind?

I mean, it is one mind, and it's actually the simplest algorithm, which, in a way, is how it feels; the field
hasn't changed since backpropagation and gradient descent were proposed for learning neural networks. There are obviously details in the architecture; this has evolved, and the current iteration is still the transformer, which is a powerful sequence-modeling architecture. But the goal, setting these weights to predict the data, is essentially the same as what we described a few years ago for AlphaStar, for language modeling and so on. We take, let's say, an Atari game, we map it to a string of numbers that will probably all be image space and action space interleaved, and all we're going to do is say: given the numbers 10,001, 10,004, 10,005, the next number that comes is 20,006, which is in the action space. And you're just optimizing these weights via a very simple gradient; mathematically, it's almost the most boring algorithm you could imagine. We set the weights so that, given this particular instance, they maximize the probability of having seen this particular sequence of integers for this particular game. And then the algorithm does this for many, many iterations, looking at different modalities, different games; that's the mixture of the dataset we discussed. So in a way it's a very simple algorithm, and the weights are all shared. In terms of whether it's focusing on one modality or not: the intermediate weights that convert this input of integers to the target integer you're predicting next, those weights certainly are common. And then, in the way the tokenization happens, there is a special place in the neural network where we map an integer, like number 1,001, to a vector of real numbers. We can optimize those with gradient descent; the functions we learn are, actually surprisingly, differentiable, which is why we can compute gradients. So this step is the only one where this
orthogonality you mentioned applies. In mapping a certain token, for text or image or actions, each of these tokens gets its own little vector of real numbers that represents it. If you look back at the field many years ago, people were talking about word vectors, or word embeddings. These are the same: we have word vectors or embeddings, we have image vectors or embeddings, and action vectors or embeddings. And the beauty here is that as you train this model, if you visualize these little vectors, it might be that they start aligning, even though they're independent parameters and could be anything. You might take the word "gato," or "cat," which maybe is common enough that it actually has its own token, and then you take pixels that contain a cat, and you might start seeing that these vectors look like they align. So by learning from this vast amount of data, the model is realizing the potential connections between the modalities. Now, I will say there would be another way, at least in part, to not have these different vectors for each modality. For instance, when I tell you about actions in a certain space, I'm defining the actions with words. So you could imagine a world in which I'm not learning that the action "up" in Atari is its own number; the action "up" in Atari maybe is literally the word, or the sentence, "up in Atari." And that would mean we now leverage much more from the language. This is not what we did here, but it certainly might make these connections much easier to learn, and also make it easier to teach the model to correct its own actions and so on. All this is to say that Gato is indeed the beginning, that it is a radical idea to do it this way, but there's probably a lot more to be done for the results to be more impressive, not only through scale but also through some new research that will hopefully come in the years ahead.

So just to elaborate quickly: you mean one possible next step, or one of the paths that you might take next,
is doing the tokenization fundamentally as a kind of linguistic communication. So you convert even images into language, doing something like a crude semantic segmentation, trying to assign a bunch of words to an image, almost like a dumb annotator explaining as much as it can about the image. You convert that into words, and you convert games into words, and you provide the context in words, and all of it, eventually getting to a point where everybody agrees with Noam Chomsky that language is actually at the core of everything, that it's the base layer of intelligence and consciousness and all that kind of stuff. Okay. You mentioned early on that it's hard to grow. What did you mean by that? Because scale might change things, and we'll talk about this too: there's emergence, certain things about these neural networks, certain performance, we can see only with scale, and there's some threshold of scale. So why is it hard to grow something like this meow network?

So the meow network is not hard to grow if you retrain it. What's hard is: we now have one billion parameters, we trained them for a while, we spent some amount of work building these weights that are an amazing initial brain for the kinds of tasks we care about. Could we reuse the weights and expand to a larger brain? That is extraordinarily hard, but also exciting from a research perspective and from a practical point of view. So there's this notion of modularity in software engineering, and we're starting to see some examples and work that leverage modularity. In fact, if we go back one step from Gato to a work that trained a much larger, much more capable network, called Flamingo: Flamingo did not deal with actions, but it definitely dealt with images in an interesting way, kind of akin to what Gato did, but with a slightly
different technique for tokenizing; we don't need to go into that detail. But what Flamingo also did, which Gato didn't do, and that just happens because these projects are different, it's a bit of the exploratory nature of research, which is great...

The research behind these projects is also modular.

Yes, exactly. And it has to be, right? We need to have creativity, and sometimes you need to protect pockets of people, researchers, and so on.

But we believe in humans.

Yes. Okay, and also, in particular, researchers, and maybe even further, DeepMind or other such labs, and then the neural networks themselves.

So it's modularity all the way down.

All the way down. So the way we did modularity very beautifully in Flamingo is: we took Chinchilla, which is a language-only model, not an agent, if we think of actions as being necessary for agency. We took the weights of Chinchilla and we froze them: these don't change. We had trained them to be very good at predicting the next word, a very good language model, state of the art at the time we released it, etc. Now we were going to add a capability to see, to add the ability to see to this language model. So we attached small pieces of neural networks at the right places in the model; it's almost like injecting the network with some weights and some substructures, in a good way, and you need the research to say what is effective: how do you add this capability without destroying others, and so on. So we created a small subnetwork, initialized not from random but from self-supervised learning, a model that understands vision in general. And then we took datasets that connect the two modalities, vision and language, and we froze the main part, the largest portion of the network, which was Chinchilla, 70 billion parameters. Then we added a few more parameters on top, trained from scratch.
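The freeze-and-extend recipe can be caricatured in a few lines. This is a toy sketch with invented two-parameter "networks," not Flamingo's actual architecture or training code; the point is only that the optimizer never touches the frozen weights:

```python
# Toy illustration of modular training: gradients update only the new
# adapter parameters, while the frozen backbone stays fixed.
# All names and values here are invented for the example.
backbone = {"chinchilla.w1": 0.5, "chinchilla.w2": -1.2}   # frozen
adapters = {"xattn.w1": 0.0, "xattn.w2": 0.0}              # trainable

def sgd_step(params, grads, lr=0.1):
    """Apply one in-place gradient-descent step."""
    for name, g in grads.items():
        params[name] -= lr * g

# Pretend we computed gradients for every parameter...
grads = {"chinchilla.w1": 0.3, "chinchilla.w2": -0.7,
         "xattn.w1": 1.0, "xattn.w2": -2.0}

# ...but only the adapter entries are ever handed to the optimizer.
trainable_grads = {k: v for k, v in grads.items() if k in adapters}
sgd_step(adapters, trainable_grads)

print(backbone)  # unchanged: frozen weights never receive updates
print(adapters)  # adapter weights moved: roughly {-0.1, 0.2}
```

In practice, frameworks express the same idea with a flag or a stop-gradient (e.g. marking parameters as non-trainable) rather than filtering gradient dictionaries by hand.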
And then some other parameters were pre-trained with the capacity to see; it was not tokenization in the way I described for Gato, but it's a similar idea. Then we trained the whole system: parts of it were frozen, parts of it were new. And all of a sudden we developed Flamingo, which is an amazing model: describing it, it's essentially a chatbot where you can also upload images and start conversing about them, a dialogue-style chatbot.

So the input is images and text, and the output is text?

Exactly.

And how many parameters? You said 70 billion?

70 billion for Chinchilla. And then the ones we add on top, which are almost a way to override its little activations so that when it sees vision, it does the correct computation of what it's seeing, mapping it back to words, so to speak; that adds an extra 10 billion parameters. So it's 80 billion in total, the largest one we released. You train it on a few datasets that contain vision and language, and once you interact with the model, you start seeing that you can upload an image and have a dialogue about it, which is very similar and akin to what we saw in language-only models: the prompting abilities it has. You can teach it a new vision task. It does things beyond the capabilities that, in theory, the datasets themselves provided, but because it leverages a lot of the language knowledge acquired from Chinchilla, it actually has this few-shot learning ability, these emergent abilities that we didn't even measure while we were developing the model. But once it was developed, as you play with the interface, you start seeing: wow, okay, this is cool. We can upload, I think one of the tweets about this was the image of Obama placing a weight while someone is weighing themselves,
a kind of joke-style image. And it's notable because I think Andrej Karpathy, a few years ago, said no computer vision system can understand the subtlety of this joke in this image, all the things that go on. So what we tried, and this is very anecdotal, it's not proof that we solved the issue, but it shows that you can now upload this image and start conversing with the model, trying to make out whether it gets that there's a joke: the person weighing themselves doesn't see that someone behind them is making the weight higher, and so on and so forth. So it's a fascinating capability, and it comes from this key idea of modularity, where we took a frozen brain and just added a new capability.

So the question is: should we? Even within DeepMind you can see that we have Flamingo, this modular approach, which could leverage scale a bit more reasonably, because we didn't need to retrain a system from scratch; and on the other hand we had Gato, which used the same datasets but trained from scratch. So I guess a big question for the community is: should we train from scratch, or should we embrace modularity? This goes back to modularity as a way to grow; reuse seems natural, and it was very effective.

Certainly. The next question is: if you go the way of modularity, is there a systematic way of freezing weights and joining different modalities, across not just two or three or four networks but hundreds of networks from all different kinds of places? Maybe an open-source network that looks at weather patterns, and you shove that in somehow; then you have networks that play StarCraft and all the other video games, and you can keep adding them in without significant effort, so that maybe the effort scales linearly or something like that, as opposed to: the more networks you add, the more you have to worry
about the instabilities created.

Yeah, that vision is beautiful, I think. There's still the question, within single modalities: Chinchilla was reused, but when we train the next iteration of language models, are we going to reuse Chinchilla or not?

Yeah, how do you swap out Chinchilla?

Right, so there are still big questions. But that idea is really akin to software engineering, where we're not re-implementing libraries from scratch; we're reusing them and then building ever more amazing things, including neural networks, with the software we reuse. So this idea of modularity, I like it. I think it's here to stay, and that's also why I mentioned it's just the beginning, not the end.

You mentioned meta-learning. Given this promise of Gato, can we try to redefine this term? It's almost akin to "consciousness," in that it has meant different things to different people throughout the history of artificial intelligence. What do you think meta-learning is and looks like now, and in five, ten years? Will it look like a system like Gato but scaled? What does meta-learning look like, do you think, with all the wisdom we've learned so far?

Great question. Maybe it's good to give another data point, looking backwards rather than forward. When we talked in 2019, "meta-learning" meant something that has since changed, mostly through the revolution of GPT-3 and beyond. What meta-learning meant at the time was driven by the benchmarks people cared about in meta-learning, and the benchmarks were about the capability to learn object identities. So it was very much overfitted to vision and object classification. The part that was "meta" about it was: we're not just learning the thousand categories that ImageNet tells us to learn; we're going to learn object categories that can be defined as we interact with the model. So it's interesting to see the evolution. The way this started was: we
had a special language, which was a dataset, a small dataset that we prompted the model with, saying: here is a new classification task; I'll give you one image and its name, which was an integer at the time, then a different image, and so on. So you have a small prompt in the form of a dataset, a machine-learning dataset, and then you get a system that can classify the objects you just defined, on the fly. Fast forward: it was revealed that language models are few-shot learners. That's the title of the paper, and sometimes titles are really good; this one is really good, because that is the point of GPT-3. It showed that, sure, we can focus on object classification and on what meta-learning means within the space of learning object categories, which goes back to Omniglot, before ImageNet, and so on, a few benchmarks. But now, all of a sudden, we're unlocked from benchmarks, because through language we can define tasks. We're literally telling the model some logical task, some little thing we want it to do; we prompt it much like we did before, but now we prompt it through natural language. And then, not perfectly, these models have failure modes and that's fine, these models are doing a new task. They meta-learned these new capabilities. So that's where we are now. Flamingo expanded this to vision and language, but it basically has the same abilities. For instance, an emergent property was that you can take pictures of numbers and do arithmetic with them, just by teaching it: when I show you "3 + 6," I want you to output "9." You show it a few examples, and now it does that. So it went way beyond the ImageNet-style categorization of images that we were a bit stuck on before this revelation moment that happened in,
I believe, 2019, but it was after we chatted.

And in that way, it solved meta-learning as it was previously defined.

Yes, and it expanded what it meant; that's what you asked: what does it mean? It's an evolving term. But here is maybe, looking forward now at what's happening in the community with more modalities, what we can expect, and I would certainly hope to see the following; it's a pretty drastic hope. In five years, maybe we chat again, and we have a system, a set of weights, that we can teach to play StarCraft, maybe not at the level of AlphaStar, but to play StarCraft, a complex game, teaching it through interaction, through prompting. You can certainly prompt a system, that's what Gato shows, to play some simple Atari games. So imagine if you start talking to a system, teaching it a new game, showing it examples: in this particular game, this user did something good. Maybe the system can even play and ask you questions: "Hey, I just played this game. Did I do well? Can you teach me more?" So in five, maybe ten years, these capabilities, what "meta-learning" means, will be much more interactive, much richer, and across domains where we used to specialize. You see the difference: we built AlphaStar specialized to play StarCraft; the algorithms were general, but the weights were specialized. What we're hoping is that we can teach a network to play games, to play any game, just using games as an example, through interacting with it, teaching it, uploading the Wikipedia page of StarCraft. This is on the horizon. Obviously there are details to be filled in and research to be done, but that's how I see meta-learning evolving: it's going to be beyond prompting, it's going to be more interactive. The system might ask us for feedback after it makes a mistake or loses a game. But it's nonetheless very exciting, because if you think about it
this way, the benchmarks are already there; we just repurpose them. In a way, I like to map the space of what AGI maybe means like this: okay, we got, say, 100 percent performance in Go, in chess, in StarCraft; the next iteration might be 20 percent performance across, quote-unquote, all tasks. And even if it's not as good, that's fine; we actually have ways to measure progress, because we have those specialized agents and so on. So to me this is very exciting, and these next-iteration models are definitely hinting at that direction of progress, which hopefully we can reach. There are obviously things that could go wrong: we might not have the tools, maybe transformers are not enough, and then there are breakthroughs still to come, which makes the field more exciting to people like me as well, of course. But if you ask me about five to ten years, you might see these models start to look more like weights that are already trained, and then it's more about teaching, or meta-learning, what you're trying to induce in terms of tasks, well beyond the simple tasks we're starting to see emerge now, like small arithmetic tasks and so on.

A few questions around that; this is fascinating. So that kind of teaching, that interactivity, is beyond prompting; interacting with the neural network is different from the training process, different from the optimization over differentiable functions. It's already trained, and now you're teaching it. It's almost akin to the brain: the neurons are already set with their connections, and on top of that, you're using that infrastructure to build up further knowledge. That's a really interesting distinction, and it's actually not obvious from a software-engineering perspective that there's a line to be drawn, because you always think that for a neural network to learn, it has to be trained and retrained.
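Teaching-by-prompting, in its simplest form, is just assembling examples into the context. A toy sketch; the prompt format and the arithmetic examples are invented for illustration, not an actual Gato or Flamingo interface:

```python
# Build a few-shot prompt: the "teaching" happens entirely in the input
# context, with no gradient updates to the model's weights.
def build_prompt(examples, query):
    lines = [f"Q: {q}\nA: {a}" for q, a in examples]
    lines.append(f"Q: {query}\nA:")
    return "\n\n".join(lines)

# A handful of demonstrations defines the task on the fly.
examples = [("3 + 6", "9"), ("2 + 2", "4")]
prompt = build_prompt(examples, "5 + 1")
print(prompt)
# Q: 3 + 6
# A: 9
#
# Q: 2 + 2
# A: 4
#
# Q: 5 + 1
# A:
```

A pretrained model conditioned on this string would, ideally, continue with "6"; the interactive version sketched above would then append the model's answer and the user's feedback back into the same growing context.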
But maybe prompting is a way of teaching: you give the network a little bit of context about whatever you're trying to get it to do, so maybe you can expand this prompting capability by making it interactive. That's really interesting.

Yeah, and by the way, this is not new if you look way back at different ways to tackle even classification tasks; it comes from long-standing literature in machine learning. What I'm suggesting could sound to some a bit like nearest neighbor. Nearest neighbor is almost the simplest algorithm, one that does not require learning, which is interesting: you don't need to compute gradients. What nearest neighbor does is: you, quote-unquote, have a dataset, or upload a dataset, and all you need is a way to measure the distance between points. To classify a new point, you simply compute the closest point in this massive amount of data, and that's your answer. So you can think of prompting, in a way, as uploading, except not just simple points, and the metric is not the distance between images or something simple; it's something you compute that's much more advanced. But in a way it's very similar: you're simply uploading some knowledge to this pre-trained system. In nearest neighbor, maybe the metric is learned or not, but you don't need to further train it, and you immediately get a classifier out of it. So this is just an evolution of that very classical concept in machine learning: classifying through whatever point is closest by some distance, and that's it. It's an evolution of that. And I will say, the way I saw meta-learning when we worked on a few ideas in 2016 was precisely through the lens of nearest neighbor, which is very common in the computer vision community: there's a very active area of research on how you compute the distance between two images.
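Nearest neighbor as described, uploading labeled points, choosing a distance, and classifying by the closest match, fits in a few lines. A toy sketch with made-up 2-D points and Euclidean distance as the assumed metric:

```python
import math

# "Upload" a labeled dataset: no training, no gradients.
data = [((0.0, 0.0), "cat"), ((0.1, 0.2), "cat"), ((5.0, 5.0), "dog")]

def distance(a, b):
    """Euclidean distance; in practice this metric could itself be learned."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(point):
    """1-nearest-neighbor: return the label of the closest stored point."""
    _, label = min(data, key=lambda item: distance(point, item[0]))
    return label

print(classify((0.2, 0.1)))  # cat
print(classify((4.0, 6.0)))  # dog
```

The prompting analogy: `data` plays the role of the uploaded prompt, and `distance` plays the role of the much richer computation a pretrained model performs over its context.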
But if you have a good distance metric, you also have a good classifier. All I'm saying is that now the distances, and the points, are not just images; they're words, or sequences of words and images and actions that teach you something new. It might be that, technique-wise, those ideas come back. And I will say it's not necessarily true that you would never train the weights a bit further: some techniques in meta-learning do a bit of fine-tuning, as it's called; they train the weights a little when they get a new task. As to the "how," how we're going to achieve this, as a deep learner I'm very skeptical of betting on any one answer: we're going to try a few things, whether it's a bit of training, adding a few parameters, thinking of it as nearest neighbor, or simply thinking of a sequence of words as a prefix, and that's the new classifier. We'll see; that's the beauty of research. But what's important is that it's a good goal in itself, one I see as very worthwhile pursuing for the next stages of not only meta-learning; I think this is basically what's exciting about machine learning, period, to me.

Well, and the interactive aspect of that is also very interesting: the interactive version of nearest neighbor, to help you pull the classifier out of this giant thing. Okay. Is this the way we can go, in five, ten-plus years, from many tasks to any task? And what does that mean? What does it need to actually be trained on? At which point has the network had enough? What does a network need to learn about this world in order to be able to perform any task? Is it as simple as language, images, and actions, or do you need some set of representative images? If you only ever see land images, will you know anything about underwater? Is that somehow fundamentally different?

I don't know; those are open questions, I would say. The way you put it,
let me maybe further your example: if all you see is land images, but you're reading all about land and water worlds in books, would that be enough? Good question; we don't know. But I guess maybe you can join us, if you want, in our quest to find out.

That's precisely... Waterworld. Yeah.

Yes, that's precisely, I mean, the beauty of research, and the research business we're in, I guess, is to figure this out, ask the right questions, and then iterate with the whole community, publishing findings and so on. It's not the only question, but it's certainly, as you ask it, on my mind constantly. So we'll need to wait maybe, let's say, five years, let's hope it's not ten, to see what the answers are. Some people largely believe in unsupervised or self-supervised learning of single modalities and then crossing them; some people might think end-to-end learning is the answer; maybe modularity is the answer. We don't know, but we're definitely excited to find out. It feels like this is the right time, and we're at the beginning of this.

Yeah, we're finally ready to do these kinds of general big models and agents. What's a specific technical thing about Gato, Flamingo, Chinchilla, Gopher, any of these, that is especially beautiful, that was surprising? Of course there's the general thing of: you didn't think it was possible, and then you realized it was, in terms of generalizability across modalities and all that; or maybe how small a network, relatively speaking, Gato is. But is there some weird little thing that was surprising?

Look, I'll give you an answer that's very important, because maybe people don't quite realize it: the teams behind these efforts, the actual humans. Yeah, that's maybe the
surprising bit, in an obviously positive way. Anytime you see these breakthroughs, it's easy to map them to a few people; there are people who are great at explaining things and so on, and that's very nice. But maybe the learnings, or the meta-learnings, I take as a human from this are: sure, we can move forward, but the surprising bit is how important all the pieces of these projects are and how they come together. So I'll give you some of the ingredients of success that are common across these, but not the obvious machine-learning ones; I can always give you those too. Basically, engineering is critical, very good engineering, because ultimately we're collecting datasets. The engineering of data, and then of deploying the models at scale onto some compute cluster: that cannot be overstated. It is a huge factor in success, and it's hard to believe how much the details matter. We would like to believe, and it is true, that there is more and more of a standard formula, as I was saying, this recipe that works for everything. But when you zoom into each of these projects, you realize the devil is indeed in the details, and the teams have to work together towards these goals. So engineering of data, and obviously clusters and large scale, is very important. And then one that is often overlooked, though nowadays it is clearer, is benchmark progress. We're talking here about multiple months of tens of researchers, and the people trying to organize the research, working together, and you don't know that you can get there. That is the beauty: if you're not taking the risk of trying something that feels impossible, you're not going to get there. But you need a way to measure progress, so the benchmarks you build are critical. I've seen this beautifully play out in many projects.
Maybe the one where I've seen it most consistently, meaning we established the metric, actually the community did, and then we leveraged it massively, is AlphaFold. This is a project where the data and the metrics were all there, and all it took, and it's easier said than done, was an amazing team working not to find some incremental improvement and publish, which is one valid way to do research, but to aim very high and work literally for years to iterate over that process. And working for years with a team is tricky; that also happened to be partly during a pandemic and so on. So I think my meta-learning from all this is that the teams are critical to the success. Then, going to the machine learning, the part that's surprising is this: we like architectures, like neural networks, and I would say that was a very rapidly evolving field until the Transformer came. So attention might indeed be all you need, which is also the title, a good title, although only in hindsight; I don't think at the time I thought it was a great title for a paper. But that architecture is proving that in the dream of modeling sequences of any bytes, there is something there that will stick. And among these advances in architectures, in how neural networks are architected to do what they do, it's been hard to find one that has been so stable and has changed so little since it was invented five or so years ago. So that is a surprise that keeps recurring in other projects. Can you, on a philosophical or technical level, introspect on what the magic of attention is? What is attention? Among people who study cognition, human attention, I think there are giant wars over what attention means and how it works in the human mind. So looking at what attention is in a neural network, from the days of "Attention Is All You Need" but broader, do you think there's a general principle
that's really powerful here? Yeah. So, a distinction between Transformers and LSTMs, which were what came before: there was a transitional period where you could use both. In fact, when we talked about AlphaStar, we used Transformers and LSTMs; it was still the beginning of Transformers, they were very powerful, but LSTMs were also still very powerful sequence models. The power of the Transformer is that it has built in what we call an inductive bias of attention. When you think of a sequence of integers, like we discussed before, say a sequence of words, and you have to do very hard tasks over those words, this could be translating a whole paragraph, or predicting the next paragraph given the ten paragraphs before, there's some loose intuition from how we do it as humans that is very nicely mimicked, structurally speaking, in the Transformer, which is this idea that you're looking for something. You just read a piece of text, and now you're thinking about what comes next; you might want to re-look at the text, or look at it from scratch, literally, because there's no recurrence, you're just thinking about what comes next. And it's almost hypothesis-driven. If I'm thinking the next word I'll write is "cat" or "dog", the way the Transformer works, almost philosophically, is that it has these two hypotheses: is it going to be "cat", or is it going to be "dog"? Then it says, okay, if it's "cat", I'm going to look for certain words in the past, not necessarily "cat" itself, although "cat" is an obvious word to look for, to see whether it makes more sense to output "cat" or "dog". Then it does some very deep computation over the words and beyond; it combines the words, but it has the query, as we call it, that is "cat", and similarly for "dog". So it's a very computational way to think about it:
if I'm thinking deeply about text, I need to go back and look at all the text, attend over it. But it's not just attention: what is guiding the attention? The key insight, from an earlier paper, is that it's not how far away something is. Distance matters, what I just wrote about is critical, but what you wrote about ten pages ago might also be critical, so you're looking not positionally but content-wise. Transformers have this beautiful way to query for certain content and pull it out in a compressed way, so that you can make a more informed decision. That's one way to explain Transformers, and I think it's a very powerful inductive bias. Some details might change over time, but I think that is what makes Transformers so much more powerful than the recurrent networks that came before, which were more recency-biased; that obviously works for some tasks, but it has major flaws. The Transformer itself has flaws too, and I think the main challenge is that these prompts we were just talking about can be a thousand words long, but if I'm teaching you StarCraft, I'll have to show you videos, point you to whole Wikipedia articles about the game, and we'll have to interact; as you play, you'll ask me questions. The context required for me to be a good teacher to you on the game, if you wanted to do that with a model, goes well beyond the current capabilities. So the question is how we benchmark this, and then how we change the structure of the architectures. There are ideas on both sides, but we'll have to see empirically what ends up working. And as you said, some of the ideas could be keeping that length constraint in place but forming hierarchical representations, so you can start being much cleverer in how you use those thousand tokens.
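The content-based querying described here, a query like "cat" scoring every past token by similarity rather than by position, can be sketched as scaled dot-product attention. This is a generic NumPy illustration of the mechanism, not DeepMind's code; the toy dimensions and random embeddings are made up.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: each query scores every position
    by content similarity, not by distance."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)       # (n_query, n_key) similarities
    weights = softmax(scores, axis=-1)    # each row sums to 1
    return weights @ V, weights           # weighted pull of content

rng = np.random.default_rng(0)
seq, d = 6, 8
X = rng.normal(size=(seq, d))             # toy token embeddings
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, w = attention(X @ Wq, X @ Wk, X @ Wv)
print(out.shape, w.shape)                 # (6, 8) (6, 6)
```

Each row of `w` shows how strongly one query position pulls content from every other position, which is the "looking content-wise, not positionally" idea; position information enters a real Transformer separately, through positional encodings.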
Interesting. But it's also possible that with this attention mechanism, where you don't have a recency bias but look back more generally, you make the mechanism by which you look into the past learnable. We're at the very beginning of that, because you might become smarter and smarter in the way you query the past: the recent past, the distant past, and maybe the very, very distant past. So the attention mechanism will almost have to improve and evolve, as will the tokenization mechanism, so that you can represent long-term memory somehow. Yes, and hierarchies: "hierarchy" is a very nice word that sounds appealing, and there's lots of work on adding hierarchy to the memories. In practice, though, it does seem like we keep coming back to the main formula, the main architecture, and that sometimes tells us something. There's a sentence a friend of mine told me: it's whether the idea wants to work or not. The Transformer was clearly an idea that wanted to work. And then I think there are some principles we believe will be needed, but finding the exact details, and details matter so much, is going to be tricky. I love the idea that there's you as a human being, you want some ideas to work, and then there's the model, which wants some ideas to work, and you get to have a conversation to see which is more likely. The model will win in the end, because it's the one that has to do the work, not you, so you should listen to the model. And I really love this idea you raised about the humans in this picture. If I could just briefly ask: one is what you said about the benchmarks and the humans working on this. The benchmarks provide sturdy ground for the wish to do these things that seem impossible; in the darkest of times, they give you hope, because you get little signs of improvement. Yes, you're
not lost if you have metrics to measure your improvement. And then there's another aspect: you said elsewhere, and here today, that titles matter. I wonder how much humans matter in the evolution of all this, meaning individual humans, something about their interactions, something about their ideas, how much they change the direction of it all. If you changed the humans in this picture, is it that the model is sitting there and wants some idea to work, or is it the humans? Or maybe the model is providing twenty ideas that could work, and depending on the humans you pick, they'll only be able to hear some of those ideas. Because you now direct all of deep learning at DeepMind, you get to interact with a lot of projects, a lot of brilliant researchers. How much variability is created by the humans in all of this? Yeah, I do believe humans matter a lot, at the very least at the time scale of years, in when things happen and in what order. You get to interact with people, and as you mentioned, some people really want a particular idea to work and will persist, while others might be more practical: I don't care which idea works, I care about cracking protein folding. Those at least seem like opposite styles; we need both, and we've clearly had both historically, and that made certain things happen earlier or later. So definitely the humans involved in this endeavor have had, I would say, years' worth of effect on the ordering of how things happened, which breakthroughs came before which other breakthroughs, and so on. One other axis of distinction is what I'd call, and this term is most commonly used in reinforcement learning, the exploration-exploitation trade-off, although that's not exactly what I mean here, just quite related. So when you start trying to help others,
when you become a bit more of a mentor to a large group of people, be it a project, or the deep learning team, or the community when you interact with people at conferences, you quickly identify some efforts as explorative and others as exploitative, and it's tempting to try to guide people. That's what our experience brings: we try to shape things, sometimes wrongly, and there are many times I've been wrong in the past. But it would be wrong to dismiss any of the research styles I'm observing. I often get asked: well, you're in industry, you have access to large compute and so on, so there are certain kinds of research we almost have a responsibility to do. We have the particle accelerator here, so to speak, as in physics, so we need to use it; we need to answer the questions that we should be answering right now for scientific progress. But at the same time, I look at many advances, including attention, which was discovered in Montreal initially because of a lack of compute. We were working on sequence-to-sequence with my friends at Google Brain at the time, using, I think, eight GPUs, which was somehow a lot back then, and Montreal was a bit more limited in scale, but they discovered this content-based attention concept that then triggered things like the Transformer. Not everything starts with the Transformer; there's always a history, and it's important to recognize it, because then you can make sure to help those who might feel, well, we don't have so much compute, to optimize for the kind of research that might actually produce amazing change. Perhaps it's not as short-term as some of these advancements, or perhaps it's on a different time scale. But the people and the
diversity of the field are quite critical, and we should maintain them; at times, especially when mixed with hype and other things, it's a bit tricky to observe maybe too much of the same thinking across the board. But the humans definitely are critical, and I can think of quite a few personal examples where someone told me something that had a huge effect on some idea. That's why I'm saying that at least on the time scale of years, some things do happen because of particular people. Yeah, and it's also fascinating how constraints are somehow essential for innovation. And on the other thing you mentioned, engineering: I have a sneaking suspicion, maybe because my own love is engineering, that a large percentage of the genius is in the tiny details of engineering. We like to think the genius is in the big ideas, but I've seen the genius in the details of engineering make the night-and-day difference, and I wonder whether those kinds of contributions have a ripple effect over time. Taking the engineering perspective: sometimes that quiet innovation at the level of an individual engineer, or a few engineers, can make all the difference, and it scales, because we're working on computers shared across large groups; one engineering decision can lead to ripple effects. Yes, it's interesting to think about. With engineering there's also a historical element that can feel a bit random: if you think of how deep learning and neural networks took off, GPUs happened to be there at the right time, built for a different purpose, which was playing video games. So even the engineering that goes into the hardware matters, and its time frame might be very
different. GPUs evolved over many years when we weren't even looking at them. So even at that level, that revolution, so to speak, we'll see when the ripples stop. In terms of why this is happening, when I try to categorize it into things that might not be so obvious: clearly there's a hardware revolution, and we're surfing thanks to it. Data centers as well: at Google, for instance, they obviously serve Google, but thanks to having built such amazing data centers, we can train these models. Software is an important one: if I look at how I used to have to implement things to realize my ideas, and how I discarded ideas because they were too hard to implement, clearly times have changed, and thankfully we're in a much better software position as well. Then obviously there's research happening at scale, and more people entering the field, which is great to see, but it's almost enabled by these other things. And last but not least is data: curating data sets, labeling data sets, building these benchmarks. We think maybe we'll eventually want all the benchmarks in one system, but it's still very valuable that someone put the thought, the time, and the vision into building certain benchmarks that we've seen progress thanks to. And we repurpose the benchmarks; that's the beauty of Atari. We solved it, in a way, but then we used it in Gato, where it was critical, and I'm sure there's still a lot more to do thanks to that amazing benchmark that someone took the time to build, even though at the time the field may have only recognized thinking about the next iteration of architectures. That's another thing we need to balance in terms of the humans behind all this: we need to recognize all these aspects
because they're all critical, and we tend to think only of the genius scientists and so on. But I'm glad you said that; I know you have a strong engineering background. And I'm also a lover of data, so let me push back on the engineering comment: ultimately it could be the creators of benchmarks who have the most impact. Andrej Karpathy, who you mentioned, has recently been talking a lot of trash about ImageNet, which he has the right to do given how essential he was to the development and success of deep learning around ImageNet. He's saying that that benchmark is now holding the field back. Especially in his context, Tesla Autopilot, which looks at the real-world behavior of a system, there's something fundamentally missing in ImageNet: it doesn't capture the real-worldness of things. We need data sets, benchmarks, that have the impressive unpredictability, the edge cases, whatever the heck it is that makes the real world so difficult to operate in. But just think about the impact of ImageNet as a benchmark; that puts a lot of emphasis on the importance of benchmarks, both internally at DeepMind and in the community. One side is coming at it from within: how do I create a benchmark for myself, to mark and make progress? And how do I make a benchmark for the community, to mark and push progress? You co-authored an amazing survey paper called "Emergent Abilities of Large Language Models". Again, there's a philosophy here I'd love to ask you about. What's the intuition about the phenomenon of emergence in neural networks, Transformers, language models? Is there a magic threshold beyond which we start to see certain performance, and is it different from task to task? Is it us humans just being poetic and romantic, or is there literally some level at which we
start to see breakthrough performance? Yeah, this is a property we're starting to see in certain systems. In machine learning traditionally, going back to benchmarks, if you have a task with a single input and a single output, then when you train these systems you generally see reasonably smooth curves when you analyze how the data set size, the model size, or the training time affects performance. If we think of ImageNet, the training curves look fairly smooth and predictable, and I would say that's probably because it's a kind of one-hop reasoning task: here is an input, you think for a few hundred milliseconds as a human, and then you tell me, yes, there's an alpaca in this image. In language, we're seeing benchmarks that require more pondering and more thought. You need to look for subtleties in the input, and even if the input is just a sentence describing a mathematical problem, there's a bit more processing and introspection required, even for a human. So I think the way these benchmarks work means there is actually a threshold. Going back to how Transformers work, this way of querying for the right questions to get the right answers might mean that performance stays random until the right question is asked by the querying system of a Transformer, or of a language model like a Transformer, and only then might you start seeing performance go from random to non-random. This is more empirical; there's no formalism or theory behind it yet, although there might be something quite important there. But empirically we're seeing these phase transitions of random performance until some threshold.
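One toy way to illustrate such a phase transition: suppose a benchmark task only succeeds when several required subskills all succeed, while each subskill improves smoothly with scale. The per-subskill difficulty thresholds below are invented numbers purely for illustration, not measurements from any model.

```python
import math

def subskill_acc(scale, difficulty):
    """Hypothetical smooth (logistic) improvement of one subskill with log-scale."""
    return 1 / (1 + math.exp(-(math.log10(scale) - difficulty)))

def task_acc(scale, difficulties):
    """The task succeeds only if every required subskill succeeds (an AND of steps),
    so smooth per-subskill gains compose into a sharp end-to-end transition."""
    p = 1.0
    for d in difficulties:
        p *= subskill_acc(scale, d)
    return p

difficulties = [7.5, 8.0, 8.5, 9.0]   # made-up thresholds in log10(parameters)
for scale in [1e6, 1e7, 1e8, 1e9, 1e10, 1e11]:
    avg = sum(subskill_acc(scale, d) for d in difficulties) / len(difficulties)
    print(f"{scale:8.0e}  avg subskill {avg:.2f}  task {task_acc(scale, difficulties):.4f}")
```

The average subskill accuracy climbs smoothly, but the end-to-end task accuracy sits near zero and then jumps, which is one candidate mechanism for why "low-order bits of thought" can hide smooth progress under an apparently emergent benchmark curve.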
A certain scale of the model is reached, and then it goes beyond random. And it might be that you need to fit a few low-order bits of thought before you can make progress on the whole task; if you could actually measure that breakdown of the task, maybe you would see something smoother: once you get this piece and this piece and this piece, then you start making progress on the task. But it's somehow a bit annoying, because it means that certain questions we might ask about architectures can possibly only be answered at a certain scale. One thing that, conversely, I've seen great progress on in the last couple of years is this notion of a science of deep learning, and a science of scale in particular. On the negative side, there are some benchmarks for which progress might need to be measured at, at minimum, a certain scale before you can see which details of the model matter for making performance better. That's a bit of a con. But what we've also seen is that you can empirically analyze the behavior of models at smaller scales. To give an example, we had this Chinchilla paper that revised the so-called scaling laws of models, and that whole study was done at a reasonably small scale, maybe hundreds of millions up to one billion parameters. The cool thing is that you extract trends from the data you see: okay, it looks like the amount of data required to train a 10x larger model would be this. So far these laws, these extrapolations, have helped us save compute and get to a better place in terms of the science of how we should run these models at scale: how much data, how much depth, and all sorts of questions we start answering by extrapolating from small scale. But then this emergence means, sadly, that not everything can be extrapolated from small scale; it depends on the benchmark, and maybe the harder benchmarks are not so good.
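The small-scale-to-large-scale workflow described here, fitting trends on cheap runs and extrapolating, can be sketched as a power-law fit in log-log space. The run sizes, losses, and exponent below are invented for illustration; they are not the actual Chinchilla measurements or coefficients.

```python
import numpy as np

# Hypothetical small-scale runs: (parameter count, final loss). Invented numbers
# generated to follow an exact power law L = A * N^-alpha for the demo.
params = np.array([1e8, 2e8, 4e8, 8e8, 1e9])
loss = 5.0 * params ** -0.08

# Fit a power law in log-log space: log L = log A - alpha * log N.
slope, intercept = np.polyfit(np.log(params), np.log(loss), deg=1)
alpha, A = -slope, np.exp(intercept)
print(f"fit: L ~ {A:.2f} * N^-{alpha:.3f}")

# Extrapolate to a model 10x larger than anything actually trained.
predicted = A * 1e10 ** -alpha
print(f"predicted loss at 1e10 params: {predicted:.3f}")
```

Because the demo data follow an exact power law, the fit recovers the generating exponent; with real training runs the fit is noisy, and, as noted above, the extrapolation can break down entirely on benchmarks with emergent thresholds.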
Extracting these laws is harder there, but we have a variety of benchmarks, at least. So I wonder to what degree the threshold, the phase-shift scale, is a function of the benchmark. Some of the science of scale might be about engineering benchmarks where that threshold is low: taking a main benchmark and reducing it somehow so that the essential difficulty is left but the scale at which the emergence happens is lower, just for the science aspect of it, versus the actual real-world aspect. Yeah, so luckily we have quite a few benchmarks, some of which are simpler; maybe people might call this System 1 versus System 2 style. What we're seeing, luckily, is that extrapolations from the slightly smoother or simpler benchmarks are translating to the harder ones. That is not to say this extrapolation won't hit its limits, and when it does, how much we scale, or how we scale, will sadly be a bit suboptimal until we find better laws. And these laws, again, are very empirical laws, not like physical laws of models, although I wish there were better theory about these things as well. So far, I would say, empirical theory, as I call it, is way ahead of actual theory of machine learning. Let me ask you, almost for fun, and this is not Oriol as a DeepMind person, or anything to do with DeepMind or Google, just as a human being, about this news of a Google engineer who claimed that, I guess, the LaMDA language model was sentient. I still need to look into the details, but he made sort of an official report with the claim that he believes there's evidence the system has achieved sentience. I think this is a really interesting case, on a human level, a psychological level, a technical machine learning level, of how language models transform our world, and also on a philosophical level, about the role of AI systems in a human world. So,
what do you find interesting? What's your take on all of this, as a machine learning engineer and researcher, and also as a human being? Yeah, quite a few reactions, actually. Have you ever, even briefly, thought, "is this thing sentient?" Never? Even with AlphaStar: wait a minute... Sadly, I have not. I think any of the current models, although very useful and very good, I think we're quite far from that. And there's kind of a converse side to the story. One of my passions is science in general, and I feel I'm a bit of a failed scientist; that's why I came to machine learning, because you start seeing that machine learning is maybe the science that can help other sciences, as we've seen. It's such a powerful tool. So thanks to that angle: I love science, I love astronomy, I love biology, but I'm not an expert in those, and I decided the thing I could do better at was these computers. But especially when I was a bit more involved in AlphaFold, learning a bit about proteins, about biology, about life, the complexity, if you start looking at what's going on at the atomic level... Obviously we are maybe inclined to think of neural networks as like the brain, but the amount of magic I feel when looking at biological systems, as opposed to these computational brains, and I'm not an expert, so it naturally feels more magical, just makes me go: wow, there's such a difference in the level of complexity, still orders of magnitude. Sure, these weights, we train them and they do nice things, but they're not at the level of biological entities, brains, cells. It just feels like it's just not possible
to achieve the same level of complexity of behavior. And my belief when I talk to other beings is certainly shaped by this amazement at biology, which maybe I don't have about machine learning because I know too much about it. So I certainly feel it's very far-fetched, and far in the future, to be calling, or thinking, that this mathematical function that is differentiable is in fact sentient, and so on. There's something on that point that's very interesting. You know enough about machines, and enough about biology, to know that there are many orders of magnitude of difference in complexity, and you know how machine learning works. So the interesting question is about human beings who interact with the system without knowing about the underlying complexity, and I've seen people, probably including myself, fall in love with things that are quite simple. So maybe complexity is one part of the picture, but maybe it's not a necessary condition for sentience, or for the perception or emulation of sentience. Right, and I guess the other side of this is that's how I feel personally; you asked me about the personal side. Now, it's very interesting to see how other humans feel about these things. Again, I'm not as amazed about this as about that other thing, maybe because of how I got to learn about it, and because I see the curve as a bit more smooth, having watched the progress of language models since Shannon in the 50s. Actually, on that time scale, the progress is not that fast; what people were thinking at the time, almost a hundred years ago, is not that dissimilar to what we're doing now. But at the same time, obviously others have their own personal experience, and I think no one should tell others how they should feel;
feelings are very personal, so how others might feel about the models and so on, that's one part of the story that is important to understand. For me personally, as a researcher, when I disagree, or don't understand, or see that maybe this is not something I think is reasonable right now, knowing all that I know, one of the things, and perhaps partly why it's great to be talking to you and reaching out to the world about machine learning, is: let's demystify the magic a bit and try to see a bit more of the math, and the fact that literally, to create these models, if we had the right software it would be ten lines of code and then a dump of the internet. Versus the complexity of the creation of humans from their inception, and the complexity of the evolution of the whole universe to where we are, which feels orders of magnitude more complex and fascinating to me. So maybe the main thing I'm trying to say is that I think it helps to explain the magic a bit. There is a bit of magic; it's good to be in love with what you do at work, and I'm certainly fascinated and surprised quite often as well. But hopefully experts in biology will tell me their field is not as magical either, and I'd be happy to learn that. Through interactions with the larger community, we can also reach a certain level of education that will matter in practice, because one question is how you feel about this, but another very important one is that people are starting to interact with these systems in products and so on, and it's good to understand a bit of what's going on and what's not, what's safe and what's not. Otherwise the technology will not be used properly for good, which is obviously the goal of all of us, I hope. Let me then ask the next question. Do you think, in order to solve intelligence, or to replace the Lex bot that does
interviews, as we started this conversation with, do you think the system needs to be sentient, needs to achieve something like consciousness? And do you think about what consciousness is in the human mind, and whether that could be instructive for creating AI systems? Yeah, honestly, I think probably not, to get to that degree of intelligence: this brain that can learn, can be extremely useful, can challenge you, can teach you, and that you can teach to do things. I'm not sure consciousness is necessary, personally speaking. But if consciousness, or any other biological or evolutionary lesson, can be repurposed to influence our next set of algorithms, that is a great way to actually make progress, in the same way I tried to explain Transformers a bit through how it feels we operate when we look at text. Those insights are very important. So there's a distinction between the details of how the brain might be doing computation, where my understanding is, sure, there are neurons and some resemblance to neural networks, but we don't understand the brain in enough detail to replicate it, and, zooming out a bit, how our thought process works, how memory works, maybe even how evolution got us here, what exploration versus exploitation looks like, how these things happen. That second level clearly can inform algorithmic research, and I've seen examples of it being quite useful to guide the research, even if sometimes for the wrong reasons. So I think biology, and what we know about ourselves, can help a whole lot in building, essentially, what we call AGI, this general intelligence, the real Gato, the last step of the chain, hopefully. But consciousness in particular, I don't myself think too hard about how to add it to the system, though maybe my understanding of what it means is also very personal. I think even that in itself is a
long debate that I know people often have, and maybe I should learn more about. Yeah, and personally I notice the magic often, especially with physical systems like robots. I have a lot of legged robots now in Austin that I play with, and even when you've programmed them yourself, when they do things you didn't expect, there's an immediate anthropomorphization; you notice the magic, and you start to think about things like sentience. That has to do more with effective communication and less with any of the dramatic philosophical questions. The perception of consciousness seems like a useful part of communication for us humans: we treat each other more seriously, we're able to do a nearest-neighbor shoving of that entity into our memory correctly, all that kind of stuff. It seems useful at least to fake it, even if you never make it. So, mirroring the question back, since you talk to a few people: do you think we'll need to figure something out there in order to achieve intelligence in the grander sense of the word? Yeah, I personally believe yes, but I don't think it'll be a separate island we'll have to travel to; I think it will emerge quite naturally. Okay, that's easier for us, then, thank you. But the reason I think it's important to think about is that, as with this Google engineer, you'll start seeing this a lot more, especially when you have AI systems that are actually interacting with human beings who don't have an engineering background, and we have to prepare for that. Because I do believe there will be a civil rights movement for robots, as silly as it is to say. There's going to be a large number of people who realize: there are these intelligent entities with whom I have a deep relationship, and I don't want to lose them; they've come to be a part of my life, and they mean a lot; they have a name, they have a story, they have a memory. And we start
to ask questions about ourselves. Well, this thing sure seems like it's capable of suffering, because it tells all these stories of suffering, and it doesn't want to die, and all those kinds of things. And we have to start asking ourselves: what is the difference between a human being and this thing? And so, from an engineering perspective, at DeepMind or anywhere that builds systems, there might be laws in the future where you're not allowed to engineer systems with displays of sentience unless they're explicitly designed to be that, unless it's a pet. So if you have a system that's just doing customer support, you're legally not allowed to display sentience. We'll start to ask ourselves that question, and then that's going to be part of the software engineering process: which features do we include, one of them being communication essentials. But it's important to start thinking about that stuff, especially given how much it captivates public attention. Yeah, absolutely. It's definitely a topic that is important to think about. And in a way, not every movie is equally on point with certain things, but science fiction in this sense has at least prepared society to start thinking about certain topics, and even if it's too early to talk about them, as long as we are reasonable, it's certainly going to prepare us for both the research to come and how to handle it. There are many important challenges and topics that come with building an intelligent system, many of which you just mentioned. So I think we're never going to be fully ready unless we talk about this, and unless we start, as I said, expanding the people we talk to, to not include only our own researchers. And in fact, at places like DeepMind, but elsewhere too, there are more interdisciplinary groups forming to start asking, and really
working with us on these questions. Because obviously this is not initially what your passion is when you do your PhD, but certainly it is coming. So it's fascinating; it brings me to one of my passions, which is learning. In this sense, this is a new area that, as a learning system myself, I want to keep exploring. And I think it's great to see parts of the debate, and even a level of maturity, in the conferences that deal with AI. If you look from five years ago to now, the number of workshops and so on has changed so much; it's impressive to see how much topics of safety, ethics and so on come to the surface, which is great. And if we were too early, that's clearly fine; it's a big field and there are lots of people with lots of interest who will make progress. And obviously I don't believe we're too late, so in that sense I think it's great that we're doing this already. It's better to be too early than too late when it comes to superintelligent systems. Let me ask, speaking of sentient AIs: you gave props to your friend Ilya Sutskever for being elected a Fellow of the Royal Society, so just as a shout-out to a fellow researcher and friend, what's the secret to the genius of Ilya Sutskever? And also, do you believe that his tweets, as you hypothesized and Andrej Karpathy did as well, are generated by a language model? Yeah, so Ilya is going to visit in a few weeks, actually, so I'll ask him in person. But will he tell you the truth? Yes, of course, absolutely. I mean, ultimately we all share paths, and there are friendships that go beyond institutions and so on, so I hope he tells me the truth. Well, maybe the AI system is holding him hostage somehow. Maybe it has some videos of him he doesn't want released, so maybe it has taken control over him. Well, if I
see him in person, then he will know. Yeah. But I think, knowing Ilya's personality for a while: everyone on Twitter, I guess, gets a different persona, and Ilya's does not surprise me. Knowing Ilya from before social media, and before AI was so prevalent, I recognize a lot of his character, so that's something that makes me feel good: a friend who hasn't changed, who is still true to himself. Obviously, there is the fact that your field becomes more popular, and he is one of the main figures in the field, having driven a lot of its advancement. So I think the tricky bit here is how to balance your true self with the responsibility that your words carry. In this sense, I appreciate the style and I understand it, but it has created debates around some of his tweets. Maybe it's good we have those debates early anyway, but then the reactions are usually polarizing. I think we're just seeing the reality of social media there as well, reflected on that particular topic, or set of topics, he's tweeting about. Yeah. It's funny that you speak to this tension: he was one of the early seminal figures in the field of deep learning, and so there's a responsibility with that. But he's also, from having interacted with him quite a bit, just a brilliant thinker about ideas, as are you, and there's a tension between becoming the manager versus actually thinking through very novel ideas: the scientist versus the manager. And he's one of the great scientists of our time. This was quite interesting. Also, people tell me he's quite silly, which I haven't quite detected yet. In private, we'll have to see about that. Yeah. I mean, just on that point: Ilya has been an inspiration. Of the quite a few colleagues I can
think of that shaped the person you are, Ilya certainly gets the top spot, or close to it. And if we go back to the question about people in the field, and how their role changed the field or not, I think Ilya's case is interesting, because he really has a deep belief in the scaling up of neural networks. There was a talk, still famous to this day, from the sequence-to-sequence paper, where he was claiming: just give me supervised data and a large neural network, and you'll solve basically all the problems. That vision was already there many years ago. So it's good to see someone who is, in this case, very deeply into this style of research, and who clearly has had a tremendous track record of successes. The funny bit about that talk is that we rehearsed it in a hotel room beforehand, and the original version would have been even more controversial. So maybe I'm the only person who has seen the unfiltered version of the talk, and maybe when the time comes, we should revisit some of the skipped slides from that talk with Ilya. But I really think a deep belief in a certain style of research pays off. It's good to be practical sometimes, and I actually think Ilya and I are practical, but it's also good to have some sort of long-term belief and trajectory. Obviously there's a bit of luck involved, but if it turns out that's the right path, then you are clearly ahead of, and hugely influential on, the field, as he has been. Do you agree with the intuition, written about by Rich Sutton in "The Bitter Lesson", that the biggest lesson that can be read from 70 years of AI research is that general methods that leverage computation are ultimately the most effective? Do you think that intuition is ultimately correct: general methods that leverage computation, allowing the
scaling of computation to do a lot of the work, so that the basic task of us humans is to design methods that are more and more general, versus more and more specific to the tasks at hand? I certainly think this essentially mirrors a bit of the deep learning research philosophy: on the one hand, we want to be data-agnostic, we don't want to pre-process datasets, we want to see the bytes, the true data as it is, and then learn everything on top. So I very much agree with that, and I think scaling up feels, at the very least, necessary for building incredibly complex systems. It's possibly not sufficient, in that we may need a couple more breakthroughs. Rich Sutton mentioned search being part of the equation: scale and search. Search, in my experience, has been more mixed. From that lesson in particular, search is a bit more tricky, because it is very appealing to search in domains like Go, where you have a clear reward function with which you can discard some search traces, but in some other tasks it's not very clear how you would do that. Although recently, one of our works, which was mostly a continuation of AlphaStar, with the teams and the people involved intersecting quite a bit, was AlphaCode, in which we actually saw the bitter lesson: the scale of the models, plus a massive amount of search, yielded the very interesting result of human-level performance in coding competitions. So I've seen examples of it mapping literally to search and scale. I'm not so convinced about the search bit, but I'm certainly convinced scale will be needed. So we need general methods, we need to test them, and maybe we need to make sure that we can scale them given the hardware we have in practice. But then maybe we should also shape how the hardware looks based on which methods might need to scale. And that's an
interesting contrast with the GPU story: we got GPUs almost for free, because games were using them. But maybe now, if sparsity is required, we don't have the hardware, although many people are building different kinds of hardware these days. There's a bit of this notion of a hardware lottery for scale, which might actually have an impact, at least on the order of years, on how fast we'll make progress toward a version of neural nets, or whatever comes next, that might enable truly intelligent agents. Do you think in your lifetime we will build an AGI system, one that would undeniably achieve human-level intelligence and go far beyond? I definitely think it's possible that it will go far beyond, but I'm definitely convinced that it will reach human-level intelligence. I'm hesitant about the beyond, because the beyond bit is tricky to define, especially when we look at the current formula of starting from this imitation learning standpoint. We can certainly imitate humans, at language and beyond, so getting to human level through imitation feels very possible. Going beyond will require reinforcement learning and other things, and in some areas that has certainly already paid off; Go is my favorite example so far of going beyond human capabilities. But in general, I'm not sure we can define reward functions that, starting from a seed of imitating human-level intelligence, are general and then go beyond. That bit is not so clear in my lifetime, but certainly human level, yes. And that in itself is already quite powerful, I think. Going beyond, it's obviously not that we're not going to try; if we get there, then we get to superhuman performance and discovery and advancing the world. But at least human level, in general, is also very, very powerful. Well, especially if human level, or slightly beyond, is integrated deeply with human society, and there's
billions of agents like that. Do you think there's a singularity moment beyond which our world will be very deeply transformed by these kinds of systems? Because now you're talking about intelligent systems; this is no longer just going from horse and buggy to the car. It feels like a very different kind of shift in what it means to be a living entity on Earth. Are you afraid, or are you excited, about this world? I'm afraid if there's a lot more of them; so I think, if we truly get there, we'll need to think about limited resources. Humanity clearly hits some limits, and then there's some balance, hopefully, that the planet is imposing biologically, and we should actually try to get better at this; as we know, there are quite a few issues with having too many people coexisting in a resource-limited way. For digital entities, it's an interesting question. I think such a limit maybe should exist, but maybe it's going to be imposed by energy availability, because this also consumes energy; in fact, most systems are less efficient than we are in terms of energy required. But definitely, I think as a society we'll need to work together to find what would be reasonable in terms of growth, or how we coexist, if that is to happen. I am very excited about the aspects of automation that give people who don't have access to certain resources or knowledge that access. Those are the applications I'm most excited to see, and to personally work towards. Yeah, there are going to be significant improvements in productivity and the quality of life across the whole population, which is very interesting. But I'm looking even further, beyond us becoming a multi-planetary species. And just as a quick bet, last question: as humans become a multi-planetary species, go outside our solar system, all that kind of stuff, do you think there will be more humans or
more robots in that future world? So will humans be the quirky intelligent beings of the past, or is there something deeply fundamental to human intelligence that's truly special, such that we will be part of those other planets, not just AI systems? I think we're all excited to build AGI to empower us, or make us more powerful, as a human species. That's not to say there might not be some hybridization; this is obviously speculation, but there are companies trying that too. The same way medicine is making us better, maybe there are other things yet to happen there. But if the ratio is not at most one to one, I would not be happy. So I would hope that we are part of the equation; maybe a one-to-one ratio feels possible, constructive and so on, but it would not be good to have an imbalance, at least from my core beliefs and the reasons why I do what I do when I go to work and research what I research. Well, this is how I know you're human, and this is how you've passed the Turing test: you are one of the special humans. It's a huge honor that you have talked with me, and I hope we get the chance to speak again, maybe once before the singularity and once after, and see how our view of the world changes. Thank you again for talking today, and thank you for the amazing work you do; you're a shining example of a researcher and a human being in this community. Thanks a lot, Lex. Looking forward to before the singularity, certainly, and maybe after. Thanks for listening to this conversation with Oriol Vinyals. To support this podcast, please check out our sponsors in the description. And now, let me leave you with some words from Alan Turing: those who can imagine anything can create the impossible. Thank you for listening, and hope to see you next time.