Transcript
YxKMiVqqBgU • George Hotz: Winning - A Reinforcement Learning Approach | AI Podcast Clips
You've said that the meaning of life is to win. If you look five years into the future, what does winning look like?

So, I can go into technical depth on what I mean by "to win." I was criticized for that in the comments: "Doesn't this guy want to, like, save the penguins in Antarctica?" Okay, listen to what I'm saying. I'm not talking about, like, having a yacht or something.

I am an agent. I am put into this world, and I don't really know what my purpose is. But if you're an intelligent agent and you're put into a world, what is the ideal thing to do? Well, the ideal thing mathematically, and you can go back to Schmidhuber's theories about this, is to build a maximally compressive model of the world, and to explore the world such that your exploration function maximizes the derivative of compression of the past. Schmidhuber has a paper about this, and I took that as kind of a personal goal function.

So what I mean by "to win": maybe this is religious, but I think that in the future I might be given a real purpose, or I may decide this purpose myself, and at that point I know what the game is and I know how to win. I think right now I'm still just trying to figure out what the game is. But once I know...

So you have imperfect information. You have a lot of uncertainty about the reward function, and you're discovering it.

Exactly. What the purpose is, that's a better way to put it. The purpose is to maximize it while you have a lot of uncertainty around it, and you're both reducing the uncertainty and maximizing at the same time.

Yeah. And so that's at the technical level: if you believe in the universal prior, what is the universal reward function? That's the better way to put it.

So that "win" is interesting. I think I speak for everyone in saying that I wonder what that reward function is for you, and I look forward to seeing that in five years and ten years. I think a lot of people, including myself, are cheering you on, man. So I'm happy you exist, and I wish you the best of luck.
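
The "derivative of compression" idea mentioned in the conversation can be illustrated with a rough sketch. This is not taken from the conversation or from Schmidhuber's paper; it assumes a toy setting where the "world model" is just a smoothed symbol-frequency table, and the names `code_length_bits` and `compression_progress` are hypothetical. The reward for a batch of new observations is the number of bits saved when the same data is re-encoded after the model has learned from it.

```python
import math
from collections import Counter

def code_length_bits(data, counts, alphabet_size):
    """Bits needed to encode `data` under a Laplace-smoothed frequency model.

    `counts` holds the symbol counts the model was fit on (a toy stand-in
    for a learned world model; this is an illustrative assumption).
    """
    total = sum(counts.values()) + alphabet_size
    return sum(-math.log2((counts[s] + 1) / total) for s in data)

def compression_progress(history, new_obs, alphabet_size):
    """Bits saved on all data seen so far once the model learns from new_obs."""
    data = history + new_obs
    old_model = Counter(history)   # model before seeing new_obs
    new_model = Counter(data)      # model after seeing new_obs
    before = code_length_bits(data, old_model, alphabet_size)
    after = code_length_bits(data, new_model, alphabet_size)
    return before - after

# Observations that teach the model something new yield more progress
# than observations it could already predict well.
novel = compression_progress("aaaaaaaa", "bbbbbbbb", alphabet_size=2)
familiar = compression_progress("aaaaaaaa", "aaaaaaaa", alphabet_size=2)
print(novel > familiar)
```

In this toy version, an agent that chose what to observe next by maximizing `compression_progress` would be drawn toward data that improves its model rather than data it has already learned.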