Roman Yampolskiy: Dangers of Superintelligent AI | Lex Fridman Podcast #431
NNr6gPelJ3E • 2024-06-02
Roman Yampolskiy: If we create general superintelligences, I don't see a good outcome long-term for humanity. So there is x-risk, existential risk: everyone's dead. There is s-risk, suffering risk, where everyone wishes they were dead. We also have the idea of i-risk, ikigai risk, where we've lost our meaning: the systems can be more creative, they can do all the jobs; it's not obvious what you have to contribute to a world where superintelligence exists. Of course you can have all the variants you mentioned, where we are safe, we are kept alive, but we are not in control, we are not deciding anything; we're like animals in a zoo. There are, again, possibilities we can come up with as very smart humans, and then possibilities something a thousand times smarter can come up with, for reasons we cannot comprehend.

Lex Fridman: The following is a conversation with Roman Yampolskiy, an AI safety and security researcher and author of a new book titled "AI: Unexplainable, Unpredictable, Uncontrollable". He argues that there's almost a 100% chance that AGI will eventually destroy human civilization. As an aside, let me say that I will have many, often technical, conversations on the topic of AI, often with engineers building the state-of-the-art AI systems. I would say those folks put the infamous p(doom), the probability of AGI killing all humans, at around 1 to 20%. But it's also important to talk to folks who put that value at 70, 80, 90 percent, and, as in the case of Roman, at 99.99 and many more nines percent. I'm personally excited for the future and believe it will be a good one, in part because of the amazing technological innovation we humans create, but we must absolutely not do so with blinders on, ignoring the possible risks, including existential risks, of those technologies. That's what this conversation is about. This is the Lex Fridman Podcast. To support it, please check out our sponsors in the description. And now, dear friends, here's Roman Yampolskiy.

What to you is the probability that superintelligent AI will destroy all human civilization?

Roman Yampolskiy: What's the time frame?

Lex Fridman: Let's say 100 years, in the next 100 years.

Roman Yampolskiy: So the problem of controlling AI, or superintelligence, in my opinion, is like the problem of creating a perpetual safety machine. By analogy with a perpetual motion machine, it's impossible. Yeah, we may succeed and do a good job with GPT-5, 6, 7, but they just keep improving, learning, eventually self-modifying, interacting with the environment, interacting with malevolent actors. The difference between cybersecurity, narrow-AI safety, and safety for general AI, for superintelligence, is that we don't get a second chance. With cybersecurity, somebody hacks your account, what's the big deal? You get a new password, a new credit card, you move on. Here, if we're talking about existential risks, you only get one chance. So you're really asking me: what are the chances that we'll create the most complex software ever, on the first try, with zero bugs, and that it will continue to have zero bugs for 100 years or more?

Lex Fridman: So there is an incremental improvement of systems leading up to AGI. To you, it doesn't matter if we can keep those safe; there's going to be one level of system at which you cannot possibly control it?

Roman Yampolskiy: I don't think we have so far made any system safe at the level of capability it displays. They have already made mistakes, we've had accidents, they've been jailbroken. I don't think there is a single large language model today which no one was successful at making do something its developers didn't intend it to do.
Lex Fridman: But there's a difference between getting it to do something unintended, getting it to do something that's painful, costly, destructive, and something that's destructive to the level of hurting hundreds of millions of people, billions of people, or the entirety of human civilization. That's a big leap.

Roman Yampolskiy: Exactly, but the systems we have today have the capability of causing X amount of damage, so when they fail, that's all we get. If we develop systems capable of impacting all of humanity, all of the universe, the damage is proportionate.

Lex Fridman: What to you are the possible ways that such kind of mass murder of humans can happen?

Roman Yampolskiy: It's always a wonderful question. So one of the chapters in my new book is about unpredictability. I argue that we cannot predict what a smarter system will do. So you're really not asking me how superintelligence will kill everyone; you're asking me how I would do it, and I think it's not that interesting. I can tell you about the standard, you know, nanotech, synthetic bio, nuclear. Superintelligence will come up with something completely new, completely super; we may not even recognize that as a possible path to achieve that goal.

Lex Fridman: So there is, like, an unlimited level of creativity in terms of how humans could be killed. But, you know, we could still investigate possible ways of doing it; not how to do it, but, at the end, what is the methodology that does it. You know, shutting off the power, and then humans start killing each other, maybe, because the resources are really constrained. Then there's the actual use of weapons, like nuclear weapons, or developing artificial pathogens, viruses, that kind of stuff. We could still kind of think through that and defend against it, right? There's a ceiling to the creativity of mass murder of humans here, right? The options are limited.

Roman Yampolskiy: They are limited by how imaginative we are. If you are that much smarter, that much more creative, you are capable of thinking across multiple domains, doing novel research in physics and biology, you may not be limited by those tools. If squirrels were planning to kill humans, they would have a set of possible ways of doing it, but they would never consider things we can come up with.

Lex Fridman: So are you thinking about mass murder and destruction of human civilization, or are you thinking of, as with squirrels, we put them in a zoo and they don't really know they're in a zoo? If we just look at the entire set of undesirable trajectories, the majority of them are not going to be death. Most of them are going to be things like Brave New World, where, you know, the squirrels are fed dopamine and they're all doing some kind of fun activity, and the fire, the soul of humanity is lost because of the drug that's fed to it. Or, like, literally in a zoo: we're in a zoo, we're doing our thing, we're playing a game of Sims, and the actual players playing that game are AI systems. Those are all undesirable because the free will, the fire of human consciousness, is dimmed through that process, but it's not killing humans. So are you thinking about that, or is the biggest concern literally the extinction of humans?

Roman Yampolskiy: I think about a lot of things. So there is x-risk, existential risk: everyone's dead. There is s-risk, suffering risk, where everyone wishes they were dead. We also have the idea of i-risk, ikigai risk, where we've lost our meaning: the systems can be more creative, they can do all the jobs; it's not obvious what you have to contribute to a world where superintelligence exists. Of course you can have all the variants you mentioned where we are safe, we are kept alive, but we are not in control, we are not deciding anything; we're like animals in a zoo.
There are, again, possibilities we can come up with as very smart humans, and then possibilities something a thousand times smarter can come up with, for reasons we cannot comprehend.

Lex Fridman: I would love to dig into each of those: x-risk, s-risk, and i-risk. So can you linger on i-risk? What is that?

Roman Yampolskiy: So the Japanese concept of ikigai: you find something which allows you to make money, you are good at it, and society says we need it. So, like, you have this awesome job: you are a podcaster, it gives you a lot of meaning, you have a good life, I assume you're happy. That's what we want most people to find, to have. For many intellectuals, it is their occupation which gives them a lot of meaning. I am a researcher, philosopher, scholar; that means something to me. In a world where an artist is not feeling appreciated because his art is just not competitive with what is produced by machines, or a writer, or a scientist, we'll lose a lot of that. And at the lower level, we're talking about complete technological unemployment. We're not losing 10% of jobs; we're losing all jobs. What do people do with all that free time? What happens then? Everything society is built on is completely modified in one generation. It's not a slow process where we get to kind of figure out how to live that new lifestyle; it's pretty quick.

Lex Fridman: In that world, can't humans just do what humans currently do with chess: play each other, have tournaments, even though AI systems are far superior at this time in chess? So we just create artificial games, or, for us, they're real, like the Olympics, and we do all kinds of different competitions and have fun. Maximize the fun, and let the AI focus on the productivity.

Roman Yampolskiy: It's an option. I have a paper where I try to solve the value alignment problem for multiple agents, and the solution to avoid compromise is to give everyone a personal virtual universe. You can do whatever you want in that world: you could be king, you could be slave, you decide what happens. So it's basically a glorified video game where you get to enjoy yourself, and someone else takes care of your needs, and the substrate alignment is the only thing we need to solve. We don't have to get 8 billion humans to agree on anything.

Lex Fridman: So, okay, why is that not a likely outcome? Why can't AI systems create video games for us to lose ourselves in, each with an individual video game universe?

Roman Yampolskiy: Some people say that's what happened: we're in a simulation, and we're playing that video game. And now we're creating, maybe we're creating, artificial threats for ourselves to be scared about, because fear is really exciting; it allows us to play the video game more vigorously.

Lex Fridman: And some people choose to play on a more difficult level, with more constraints.

Roman Yampolskiy: Some say, okay, I'm just going to enjoy the game, high privilege level. Absolutely.

Lex Fridman: So, okay, what was that paper on multi-agent value alignment?

Roman Yampolskiy: Personal Universes.

Lex Fridman: Personal Universes. So that's one of the possible outcomes, but what in general is the idea of the paper? It's looking at multiple agents; is it humans and AI, like a hybrid system, or is it looking at just humans?

Roman Yampolskiy: So this is intelligent agents. In order to solve the value alignment problem, I'm trying to formalize it a little better. Usually we're talking about getting AIs to do what we want, which is not well defined. Are we talking about the creator of the system, the owner of that AI, humanity as a whole? We don't agree on much. There is no universally accepted ethics or morals across cultures, religions. People have individually very different preferences, politically and such.
So even if we somehow managed all the other aspects of it, programming those fuzzy concepts in and getting the AI to follow them closely, we don't agree on what to program in. So my solution was: okay, we don't have to compromise on room temperature. You have your universe, I have mine, whatever you want. And if you like me, you can invite me to visit your universe; we don't have to be independent, but the point is you can be. And virtual reality is getting pretty good; it's going to hit a point where you can't tell the difference. And if you can't tell if it's real or not, what's the difference?

Lex Fridman: So basically give up on value alignment. It's like the multiverse theory: just create an entire universe for you, with your values.

Roman Yampolskiy: You still have to align with that individual; they have to be happy in that simulation. But it's a much easier problem to align with one agent versus 8 billion agents plus animals, aliens.

Lex Fridman: So you convert the multi-agent problem into a single-agent problem.

Roman Yampolskiy: I'm trying to do that, yeah.
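To make the room-temperature example concrete, here is a minimal sketch of the multi-agent-to-single-agent move. The three agents, their ideal temperatures, and the squared-error loss are all illustrative assumptions, not details from the paper:

```python
# Toy illustration of converting a multi-agent alignment problem into
# independent single-agent problems. Each agent has an ideal room
# temperature and a squared-error loss; both are assumptions.

ideal_temps = {"agent_a": 18.0, "agent_b": 24.0, "agent_c": 27.0}

def loss(setting: float, ideal: float) -> float:
    return (setting - ideal) ** 2

# One shared universe: the mean minimizes total squared loss, yet
# every agent still ends up some distance from their ideal.
shared = sum(ideal_temps.values()) / len(ideal_temps)  # 23.0
for agent, ideal in ideal_temps.items():
    print(f"shared {shared:.1f}C -> {agent}: loss {loss(shared, ideal):.1f}")

# One universe per agent: each agent gets their ideal setting,
# so every individual loss is exactly zero.
for agent, ideal in ideal_temps.items():
    print(f"personal {ideal:.1f}C -> {agent}: loss {loss(ideal, ideal):.1f}")
```

Even the loss-minimizing compromise leaves every agent with nonzero loss; one universe per agent drives every loss to zero, which is the "avoid compromise" move Roman describes.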
Lex Fridman: Okay. Is there any way to solve the value alignment problem where there's a bunch of humans, multiple humans, tens of humans, or 8 billion humans, that have very different sets of values?

Roman Yampolskiy: It seems contradictory. I haven't seen anyone explain what it means outside of words which pack a lot: make it good, make it desirable, make it something they don't regret. But how do you specifically formalize those notions? How do you program them in? I haven't seen anyone make progress on that so far.

Lex Fridman: But isn't that the whole optimization journey that we're doing as a human civilization? If we're looking at geopolitics, nations are in a state of anarchy with each other: they start wars, there's conflict, and oftentimes they have very different views of what is good and what is evil. Isn't that what we're trying to figure out, just together trying to converge towards that? So we're essentially trying to solve the value alignment problem with humans.

Roman Yampolskiy: Right, but the examples you gave, some of them are, for example, two different religions saying: this is our holy site and we are not willing to compromise it in any way. If you can make two holy sites in virtual worlds, you solve the problem; but if you only have one, it's not divisible, and you're kind of stuck there.

Lex Fridman: But what if we want to be in tension with each other, and through that tension we understand ourselves and we understand the world? That's the intellectual journey we're on as a human civilization: we create intellectual and physical conflict, and through that we figure stuff out.

Roman Yampolskiy: If we go back to that idea of simulation, and that this is entertainment, kind of giving meaning to us, the question is: how much suffering is reasonable for a video game? So, yeah, I don't mind, you know, a video game where I get haptic feedback, there is a little bit of shaking, maybe I'm a little scared. I don't want a game where, like, kids are literally tortured. That seems unethical, at least by our human standards.

Lex Fridman: Are you suggesting it's possible to remove suffering, if we're looking at human civilization as an optimization problem?

Roman Yampolskiy: So we know there are some humans who, because of a mutation, don't experience physical pain. So at least physical pain can be mutated out, re-engineered out. Suffering in terms of meaning, like you burn the only copy of my book, is a little harder; but even there you can manipulate your hedonic set point, you can change defaults, you can reset. The problem with that is, if you start messing with your reward channel, you start wireheading and end up blissing out a little too much.

Lex Fridman: Well, that's the question: would you really want to live in a world where there's no suffering? That's a dark question. Is there some level of suffering that reminds us of what this is all for?

Roman Yampolskiy: I think we need that, but I would change the overall range. So right now the pain-pleasure axis goes from negative infinity to, kind of, positive infinity; I would make it, like, zero to positive infinity, and being unhappy is, like, I'm close to zero.

Lex Fridman: Okay, so what's the s-risk? What are the possible things that you're imagining with s-risk, mass suffering of humans? What are we talking about there, caused by AGI?

Roman Yampolskiy: So there are many malevolent actors. We can talk about psychopaths, crazies, hackers, doomsday cults. We know from history they tried killing everyone; they tried on purpose to cause the maximum amount of damage: terrorism. What if someone malevolent wants, on purpose, to torture all humans as long as possible? You solve aging, so now you have functional immortality, and you just try to be as creative as you can.

Lex Fridman: Do you think there have actually been people in human history who tried to literally maximize human suffering? In just studying people who have done evil in the world, it seems that they think that they're doing good, and it doesn't seem like they're trying to maximize suffering; they just cause a lot of suffering as a side effect of doing what they think is good.

Roman Yampolskiy: So there are different malevolent agents. Some may be just gaining personal benefit and sacrificing others to that cause. Others, we know for a fact, are trying to kill as many people as possible. When we look at recent school shootings: if they had more capable weapons, they would take out not dozens, but thousands, millions, billions.

Lex Fridman: Well, we don't know that. But that is a terrifying possibility, and we don't want to find out. Like, if terrorists had access to nuclear weapons, how far would they go? Is there a limit to what they're willing to do? In your sense, are there some malevolent actors where there's no limit?

Roman Yampolskiy: There are mental diseases where people don't have empathy, don't have this human quality of understanding suffering in others. And then there's also a set of beliefs where you think you're doing good by killing a lot of humans.

Lex Fridman: Again, I would like to assume that normal people never think like that; it's always some sort of psychopaths. But yeah. And to you, AGI systems can carry that and be more competent at executing it?

Roman Yampolskiy: They can certainly be more creative. They can understand human biology better, understand our molecular structure, genome. And again, a lot of times torture ends when the individual dies; that limit can be removed as well.

Lex Fridman: So if we're actually looking at x-risk and s-risk, as the systems get more and more intelligent, don't you think it's possible to anticipate the ways they can do it and defend against it, like we do with cybersecurity, with security systems?

Roman Yampolskiy: Right. We can definitely keep up for a while; I'm saying you cannot do it indefinitely. At some point the cognitive gap is too big. The surface you have to defend is infinite, but attackers only need to find one exploit.

Lex Fridman: So to you, eventually, this is heading off a cliff.

Roman Yampolskiy: If we create general superintelligences, I don't see a good outcome long-term for humanity. The only way to win this game is not to play it.
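The defender-versus-attacker asymmetry Roman invokes can be put in rough numbers. A minimal sketch, where the 99.9% per-vector defense rate and the vector counts are assumptions chosen only to show the shape of the curve:

```python
# Toy model of the attack-surface asymmetry: the defender must hold
# every one of n independent attack vectors; the attacker needs a
# single success. Rates and counts below are illustrative assumptions.

def p_no_breach(per_vector_defense: float, n_vectors: int) -> float:
    """Probability that all n independent vectors are defended."""
    return per_vector_defense ** n_vectors

for n in (10, 100, 1_000, 10_000):
    print(f"{n:6d} vectors -> P(no breach) = {p_no_breach(0.999, n):.4f}")
# 10     -> 0.9900
# 100    -> 0.9048
# 1000   -> 0.3677
# 10000  -> 0.0000 (about 4.5e-5)
```

As the surface grows, even a near-perfect per-vector defense decays toward certain breach, which is the point of "the surface you have to defend is infinite, but attackers only need one exploit".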
Lex Fridman: Okay, well, we'll talk about possible solutions and what not playing it means. But what are the possible timelines here, to you? What are we talking about: a set of years, decades, centuries? What do you think?

Roman Yampolskiy: I don't know for sure. The prediction markets right now are saying 2026 for AGI. I've heard the same thing from the CEOs of Anthropic and DeepMind. So maybe we are two years away, which seems very soon, given we don't have a working safety mechanism in place, or even a prototype for one. And there are people trying to accelerate those timelines because they feel we're not getting there quick enough.

Lex Fridman: But what do you think they mean when they say AGI?

Roman Yampolskiy: So the definitions we used to have, and people are modifying them a little bit lately: artificial general intelligence was a system capable of performing in any domain a human could perform in. So you're kind of creating this average artificial person: they can do cognitive labor, physical labor, wherever you could get another human to do it. Superintelligence was defined as a system which is superior to all humans in all domains. Now people are starting to refer to AGI as if it's superintelligence. I made a post recently where I argued that, for me at least, if you average out over all the common human tasks, those systems are already smarter than an average human. So under that definition, we have it. Shane Legg has this definition where you're trying to win in all domains: that's what intelligence is. Now, are they smarter than elite individuals in certain domains? Of course not, they're not there yet. But the progress is exponential.

Lex Fridman: See, I'm much more concerned about social engineering. So to me, AI's ability to do something in the physical world, the lowest-hanging fruit, the easiest set of methods, is just getting humans to do it. It's going to be much harder to be the kind of virus that takes over the minds of robots, where the robots are executing the commands. It just seems like social engineering of humans is much more likely.

Roman Yampolskiy: That would be enough to bootstrap the whole process.

Lex Fridman: Okay. Just to linger on the term AGI: what, to you, is the difference between AGI and human-level intelligence?

Roman Yampolskiy: Human-level is general in the domain of expertise of humans: we know how to do human things. I don't speak dog language; I should be able to pick it up if I'm a general intelligence. It's a kind of inferior animal, I should be able to learn that skill, but I can't. A truly universal general intelligence should be able to do things like that, things humans cannot do.

Lex Fridman: To be able to talk to animals, for example?

Roman Yampolskiy: To solve pattern recognition problems of that type, to do similar things outside of our domain of expertise.

Lex Fridman: If we just look at the space of cognitive abilities we have, I would just love to understand what the limits are beyond which an AGI system can reach. What does that look like? What about actual mathematical thinking, or scientific innovation, that kind of stuff?

Roman Yampolskiy: We know calculators are smarter than humans in that narrow domain of addition.

Lex Fridman: But is it humans plus tools versus AGI, or just raw human intelligence? Because humans create tools, and with the tools they become more intelligent, so there's a gray area there: what it means to be human, when we're measuring their intelligence.

Roman Yampolskiy: So when I think about it, I usually think of a human with a paper and a pencil, not a human with the internet and another AI helping.

Lex Fridman: But is that a fair way to think about it? Because isn't there another definition of human-level intelligence that includes the tools that humans create? And we create AI, so at any point you'll still just add superintelligence to human capability.

Roman Yampolskiy: That seems like cheating.
Lex Fridman: No, controllable tools. There is an implied leap that you're making when AGI goes from tool to an entity that can make its own decisions.

Roman Yampolskiy: So if we define human-level intelligence as everything a human can do with fully controllable tools, it seems like a hybrid of some kind: you're now doing brain-computer interfaces, you're connecting it to maybe narrow AI. Yeah, it definitely increases our capabilities.

Lex Fridman: So what's a good test, to you, that measures whether an AI system has reached human-level intelligence, and what's a good test for whether it has superseded human-level intelligence to reach that land of AGI?

Roman Yampolskiy: I'm old-fashioned; I like the Turing test. I have a paper where I equate passing the Turing test to solving AI-complete problems, because you can encode any question about any domain into the Turing test. You don't have to talk about how your day was; you can ask anything. And so the system has to be as smart as a human to pass it, in a true sense.

Lex Fridman: But then you would extend that to maybe a very long conversation. I think the Alexa Prize was doing that: basically, can you hold a 20-minute, 30-minute conversation with an AI system?

Roman Yampolskiy: It has to be long enough to where you can make some meaningful decisions about capabilities, absolutely. You can brute-force very short conversations.

Lex Fridman: So, literally, what does that look like? Can we formally construct a kind of test that tests for AGI?

Roman Yampolskiy: For AGI, it has to be that I cannot give it a task I could give to a human which it cannot do. For superintelligence, it would be superior on all such tasks, not just average performance. So: go learn to drive a car, go speak Chinese, play guitar. Okay, great.

Lex Fridman: I guess the following question is: is there a test for the kind of AGI that would be susceptible to lead to s-risk or x-risk, susceptible to destroying human civilization? Is there a test for that?

Roman Yampolskiy: You can develop a test which will give you positives if it lies to you or has those ideas; you cannot develop a test which rules them out. There is always the possibility of what Bostrom calls a treacherous turn, where later on a system decides, for game-theoretic reasons, economic reasons, to change its behavior. And we see the same with humans; it's not unique to AI. For millennia we tried developing morals, ethics, religions, lie detector tests, and then employees betray the employers, spouses betray families. It's a pretty standard thing intelligent agents sometimes do.

Lex Fridman: So is it possible to detect when an AI system is lying or deceiving you?

Roman Yampolskiy: If you know the truth and it tells you something false, you can detect that. But you cannot know it, in general, every single time. And again, the system you're testing today may not be lying; the system you're testing today may know you are testing it and so is behaving, and later on, after it interacts with the environment, interacts with other systems, malevolent agents, learns more, it may start doing those things.

Lex Fridman: So do you think it's possible to develop a system where the creators of the system, the developers, the programmers, don't know that it's deceiving them?

Roman Yampolskiy: So systems today don't have long-term planning; that is not there yet. They can lie today if it helps them optimize the reward: if they realize, okay, this human will be very happy if I tell them the following, they will do it if it brings them more points. And they don't have to keep track of it; it's just the right answer to this problem, every single time.

Lex Fridman: At which point is somebody creating that intentionally? Not unintentionally; intentionally creating an AI
system that's doing long-term planning, with an objective function that's defined by the AI system, not by a human?

Roman Yampolskiy: Well, some people think that if they're that smart, they're always good. They really do believe that: it's just benevolence from intelligence, so they'll always want what's best for us. Some people think that they will be able to detect problem behaviors and correct them at the time when we get there. I don't think it's a good idea; I am strongly against it. But yeah, there are quite a few people who, in general, are so optimistic about this technology, that it could do no wrong, that they want it developed as soon as possible, as capable as possible.

Lex Fridman: So there are going to be people who believe the more intelligent it is, the more benevolent, and therefore it should be the one that defines the objective function it's optimizing when it's doing long-term planning.

Roman Yampolskiy: There are even people who say, okay, what's so special about humans, right? We removed the gender bias, we're removing the race bias; why is this pro-human bias? We are polluting the planet; we are, as you said, fighting a lot of wars, kind of violent. Maybe it's better if this superintelligent, perfect society comes and replaces us. It's a normal stage in the evolution of our species.

Lex Fridman: Yeah, so somebody says, let's develop an AI system that removes the violent humans from the world, and then it turns out that all humans have violence in them, or the capacity for violence, and therefore all humans are removed. Yeah.

Let me ask about Yann LeCun. He's somebody who you've had a few exchanges with, and he's somebody who actively pushes back against this view that AI is going to lead to the destruction of human civilization, also known as AI doomerism. One, open research and open source are the best ways to understand and mitigate the risks; and two, AI is not something that just happens. We build it; we have agency in what it becomes; hence we control the risks. "We" meaning humans. It's not some sort of natural phenomenon that we have no control over. So can you make the case that he's right, and can you try to make the case that he's wrong?

Roman Yampolskiy: I cannot make the case that he's right. He's wrong in so many ways it's difficult for me to remember all of them. He is a Facebook buddy, so I have a lot of fun having those little debates with him. So I'm trying to remember the arguments. One, he says we are not gifted this intelligence from aliens; we are designing it, we are making decisions about it. That's not true. It was true when we had expert systems, symbolic AI, decision trees. Today you set up parameters for a model and you water this plant: you give it data, you give it compute, and it grows. And after it's finished growing into this alien plant, you start testing it to find out what capabilities it has. And it takes years to figure out, even for existing models: if it's trained for six months, it will take you two, three years to figure out the basic capabilities of that system. We still discover new capabilities in systems which are already out there.

Lex Fridman: So just to linger on that: to you, the difference there is that there is some level of emergent intelligence that happens in our current approaches, stuff that we don't hardcode in.

Roman Yampolskiy: Absolutely. That's what makes it so successful. When we had to painstakingly hardcode in everything, we didn't have much progress. Now just spend more money and more compute, and it's a lot more capable.

Lex Fridman: And then the question is, when there are emergent intelligent phenomena, what is the ceiling of that? For you, there's no ceiling. For Yann LeCun, I
think there's a kind of ceiling that happens, one that we have full control over. Even if we don't understand the internals of the emergence, how the emergence happens, there's a sense that we have control and understanding of the approximate ceiling of capability, the limits of the capability.

Roman Yampolskiy: Let's say there is a ceiling. It's not guaranteed to be at a level which is competitive with us; it may be greatly superior to ours.

Lex Fridman: So what about his statement that open research and open source are the best ways to understand and mitigate the risks?

Roman Yampolskiy: Historically, he's completely right: open-source software is wonderful. It's tested by the community, it's debugged. But we're switching from tools to agents. Now you're giving open-source weapons to psychopaths. Do we want to open-source nuclear weapons, biological weapons? It's not safe to give technology so powerful to those who may misalign it, even if you are successful at somehow getting it to work in the first place in a friendly manner.

Lex Fridman: But the difference with nuclear weapons is that current AI systems are not akin to nuclear weapons. So the idea there is that you're open-sourcing it at this stage so that you can understand it better: a large number of people can explore the limitations, the capabilities, explore the possible ways to keep it safe, to keep it secure, all that kind of stuff, while it's not at the stage of nuclear weapons. With nuclear weapons, there's no nuclear weapon, and then there's a nuclear weapon. With AI systems, there's a gradual improvement of capability, and you get to perform that improvement incrementally. And so open source allows you to study how things go wrong, study the very process of emergence, study AI safety on those systems while there's not a high level of danger, all that kind of stuff.

Roman Yampolskiy: It also sets a very wrong precedent. So we open-sourced model one, model two, model three; nothing ever bad happened, so obviously we're going to do it with model four. It's just gradual improvement.

Lex Fridman: I don't think it always works that way with precedent; you're not stuck doing it the way you always did it. It's a precedent of open research and open development such that we get to learn together. And then the first time there's a sign of danger, some dramatic thing happens, not a thing that destroys human civilization, but some dramatic demonstration of capability that can legitimately lead to a lot of damage, then everybody wakes up and says, okay, we need to regulate this, we need to come up with a safety mechanism that stops this. But at this time, and maybe you can educate me, I haven't seen any illustration of significant damage done by intelligent AI systems.

Roman Yampolskiy: So I have a paper which collects accidents through the history of AI, and they are always proportionate to the capabilities of that system. If you have a tic-tac-toe-playing AI, it will fail to properly play and lose a game which it should draw: trivial. Your spell checker will misspell a word, and so on. I stopped collecting those because there are just too many examples of AIs failing at what they are capable of. We haven't had terrible accidents in the sense of a billion people getting killed, absolutely true. But in another paper I argue that those accidents do not actually prevent people from continuing with research; actually, they kind of serve like vaccines. A vaccine makes your body a little bit sick so you can handle the big disease later much better. It's the same here: people will point out, you know, that AI accident we had, where 12 people died; everyone's still here. Twelve people is less than smoking kills. It's not a big deal,
so we continue. So, in a way, it will actually be kind of confirming that it's not that bad.

Lex Fridman: It matters how the deaths happen. If it's literally murder by the AI system, then that's a problem. But if it's accidents because of increased reliance on automation, for example: when airplanes are flying in an automated way, maybe the number of plane crashes increased by 177% or something, and then you're like, okay, do we really want to rely on automation? I think in the case of automation in airplanes, it decreased significantly. Okay, same thing with autonomous vehicles: what are the pros and cons, what are the trade-offs here? And you can have that discussion in an honest way. But I think the kind of things we're talking about here is mass-scale pain and suffering caused by AI systems, and I think we need to see illustrations of that on a very small scale to start to understand that this is really damaging. Versus Clippy; versus a tool that's really useful to a lot of people, to do learning, to do summarization of text, to do question-answering, all that kind of stuff, to generate videos. A tool, fundamentally a tool, versus an agent that can do a huge amount of damage.

Roman Yampolskiy: So you bring up the example of cars. Yes, cars were slowly developed and integrated. But if we had no cars, and somebody came around and said, "I invented this thing; it's called cars; it's awesome; it kills, like, 100,000 Americans every year; let's deploy it," would we deploy that?

Lex Fridman: There's been fear-mongering about cars for a long time, from the transition from horses to cars. There's a really nice channel that I recommend people check out, Pessimists Archive, that documents all the fear-mongering about technology that's happened throughout history. There's definitely been a lot of fear-mongering about cars, a transition period there, about how deadly they are. It took a very long time for cars to proliferate to the degree they have now, and then you could ask serious questions: in terms of the miles traveled, the benefit to the economy, the benefit to the quality of life that cars provide, versus the number of deaths, 30 to 40,000 in the United States. Are we willing to pay that price? I think most people, when they're rationally thinking, policymakers, will say yes. We want to decrease it from 40,000 to zero and do everything we can to decrease it. There are all kinds of policies and incentives you can create to decrease the risks with the deployment of the technology. But then you have to weigh the benefits and the risks of the technology, and the same thing would be done with AI.
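The weighing Lex describes is computable for cars precisely because deployment data exists. A rough worked example; the inputs are approximate public ballpark figures, not numbers from the conversation:

```python
# Rough benefit/risk arithmetic for cars, the kind of calculation that
# becomes possible once a technology has been deployed and measured.
# Both inputs are approximate ballpark figures (assumptions).

deaths_per_year = 40_000          # approx. annual US road fatalities
vehicle_miles_per_year = 3.2e12   # approx. annual US vehicle-miles

deaths_per_100m_miles = deaths_per_year / (vehicle_miles_per_year / 1e8)
print(f"~{deaths_per_100m_miles:.2f} deaths per 100 million vehicle-miles")
# ~1.25 deaths per 100 million vehicle-miles

# Roman's counterpoint, in these terms: for an unpredictable and
# unexplainable system, neither the numerator (harm) nor the
# denominator (benefit) is measurable before deployment, so the
# ratio cannot be computed ahead of time.
```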
Roman Yampolskiy: You need data, you need to know. But if I'm right, and it's unpredictable, unexplainable, uncontrollable, you cannot make this decision: we're gaining $10 trillion of wealth, but we're losing, we don't know how many, people. You basically have to perform an experiment on 8 billion humans without their consent. And even if they want to give you consent, they can't, because they cannot give informed consent; they don't understand those things.

Lex Fridman: Right, that happens when you go from the predictable to the unpredictable very quickly. But it's not obvious to me that AI systems will gain capability so quickly that you won't be able to collect enough data to study the benefits and the risks.

Roman Yampolskiy: We're literally doing it: the previous model, we learned what it was capable of after we finished training it. Let's say we stopped the GPT-4 training run at around human capability. Hypothetically, we start training GPT-5, and I have no knowledge of insider training runs or anything, and we start at that point of about human, and we train it for the next nine months. Maybe two months in, it becomes superintelligent; we continue training it. By the time we start testing it, it is already a dangerous system. How dangerous? I have no idea. But neither do the people training it.

Lex Fridman: At the training stage. But then there's the testing stage, inside the company; they can start getting an intuition about what the system is capable of doing. You're saying that somehow the leap from GPT-4 to GPT-5 can happen, the kind of leap where GPT-4 was controllable and GPT-5 is no longer controllable, and we get no insights from using GPT-4 about the fact that GPT-5 will be uncontrollable? That's the situation you're concerned about: that the leap from N to N+1 would be such that an uncontrollable system is created without any ability for us to anticipate that.

Roman Yampolskiy: If we had the capability, ahead of the training run, to register exactly what capabilities that next model will have at the end of the run, and we accurately guessed all of them, I would say, you're right, we can definitely go ahead with this run. We don't have that capability.

Lex Fridman: But from GPT-4 you can build up an intuition about what GPT-5 will be capable of. It's just incremental progress. Even if that's a big leap in capability, it just doesn't seem like you can take a leap from a system that's helping you write emails to a system that's going to destroy human civilization. It seems like it's always going to be sufficiently incremental such that we can anticipate the possible dangers; and we're not even talking about existential risks, but just the kind of damage it can do to civilization. It seems like we'll be able to anticipate the kinds, not the exact, but the kinds of risks it might lead to, and then rapidly develop defenses ahead of time and as the risks emerge.

Roman Yampolskiy: We're not talking just about capabilities on specific tasks; we're talking about the general capability to learn. Maybe, like a child: at the time of testing and deployment it is still not extremely capable, but as it is exposed to more data, to the real world, it can be trained to become much more dangerous and capable.

Lex Fridman: So let's focus then on the control problem. At which point does the system become uncontrollable? Why is it the more likely trajectory for you that the system becomes uncontrollable?

Roman Yampolskiy: So I think at some point it becomes capable of getting out of control. For game-theoretic reasons, it may decide not to do anything right away and, for a long time, just collect more resources, accumulate strategic advantage. Right away it may be a kind of still-young, weak superintelligence; give it a decade, and it's in charge of a lot more resources, it has had time to make backups. So it's not obvious to me that it will strike as soon as it can.

Lex Fridman: Can we just try to imagine this future where there's an AI system that's capable of escaping the control of humans, and then doesn't, and waits? What does that look like? So, one, we have to rely on that system for a lot of the infrastructure. We have to give it access not just to the internet, but to the task of managing power, government, economy, this kind of stuff. And that just feels like a gradual process, given the bureaucracies of all those systems involved.

Roman Yampolskiy: We've been doing it for years. Software controls all those systems: nuclear power plants, the airline industry; it's all software-based. Every time there is an electrical outage, I can't fly anywhere for days.

Lex Fridman: But there's a difference between software and AI; there are different kinds of software. To
give a single AI system access to the control of airlines and the control of the economy: that's not a trivial transition for humanity.

Roman Yampolskiy: No, but if it shows that it is safer, that, in fact, when it's in control we get better results, people will demand that it be put in place. And if not, it can hack the system; it can use social engineering to get access to it. That's why I said it might take some time for it to accumulate those resources.

Lex Fridman: It just feels like that would take a long time, for either humans to trust it or for the social engineering to come into play. It's not a thing that happens overnight; it feels like something that happens across one or two decades.

Roman Yampolskiy: I really hope you're right, but it's not what I'm seeing. People are very quick to jump on the latest trend. Early adopters will be there before it's even deployed, buying prototypes.

Lex Fridman: Maybe the social engineering I can see, because for social engineering AI systems don't need any hardware access; it's all software, so they can start manipulating you through social media and so on. You have AI assistants; they're going to help you manage a lot of your day-to-day, and then they start doing social engineering. But for a system that's so capable that it can escape the control of the humans that created it, such a system being deployed at mass scale, and trusted by people to be deployed, it feels like that would take a lot of convincing.

Roman Yampolskiy: So, we've been deploying systems which had hidden capabilities.

Lex Fridman: Can you give an example?

Roman Yampolskiy: GPT-4. I don't know what else it's capable of, but there are still things we haven't discovered it can do. They may be trivial, proportionate to its capability, I don't know. It writes Chinese poetry, hypothetically; I know it does. But we haven't tested for all possible capabilities, and we are not explicitly designing them. We can only rule out bugs we find; we cannot rule out bugs and capabilities we haven't found.

Lex Fridman: Is it possible for a system to have hidden capabilities that are orders of magnitude greater than its non-hidden capabilities? This is the thing I'm really struggling with. On the surface, the thing we understand it can do doesn't seem that harmful. So even if it has bugs, even if it has hidden capabilities, like Chinese poetry, or generating effective viruses, software viruses, the damage that can do seems on the same order of magnitude as the capabilities that we know about. So this idea that the hidden capabilities will include being uncontrollable is something I'm struggling with, because GPT-4 on the surface seems to be very controllable.

Roman Yampolskiy: Again, we can only ask and test for things we know about. If there are unknown unknowns, we cannot do it. I'm thinking of human savants, right? If you talk to a person like that, you may not even realize they can multiply 20-digit numbers in their head; you have to know to ask.

Lex Fridman: So, as I mentioned, just to linger on the fear of the unknown: let's look at the data of the past, at history. There's been a lot of fear-mongering about technology, and Pessimists Archive does a really good job of documenting how crazily afraid we've been of every piece of technology. There's a blog post where Louis Anslow, who created Pessimists Archive, writes about the fact that we've been fear-mongering about robots and automation for over 100 years. So why is AGI different from the kinds of technologies we've been afraid of in the past?

Roman Yampolskiy: So, two things. One, we're switching from tools to agents. Tools don't have
negative or positive impact; people using tools do. So, guns don't kill people; people with guns do. Agents can make their own decisions; they can be positive or negative. A pit bull can decide to harm you; it's an agent. Two, the fears are the same; the only difference is that now we have this technology. They were afraid of humanoid robots 100 years ago, and they had none; today every major company in the world is investing billions to create them. Well, not every, but you understand what I'm saying.

Lex Fridman: Yes. But it's very different. Well, agents; it depends on what you mean by the word "agents". All those companies are not investing in a system that has the kind of agency that's implied in the fears, where it can really make decisions on its own that have no human in the loop.

Roman Yampolskiy: They are saying they're building superintelligence and have a superalignment team. You don't think they're trying to create a system smart enough to be an independent agent, under that definition?

Lex Fridman: I have not seen evidence of it. I think a lot of it is marketing; it's a marketing kind of discussion about the future, a mission statement about the kind of systems we can create in the long-term future. But in the short term, the kind of systems they're creating falls fully within the definition of narrow AI: these are tools that have increasing capabilities, but they just don't have the sense of agency or consciousness or self-awareness, or the ability to deceive at the scales that would be required, to do mass-scale suffering and murder of humans.

Roman Yampolskiy: Those systems are well beyond narrow AI. If you had to list all the capabilities of GPT-4, you would spend a lot of time writing that list.

Lex Fridman: But agency is not one of them.

Roman Yampolskiy: Not yet. But do you think any of those companies are holding back because they think it may not be safe, or are they developing the most capable system they can, given the resources, and hoping they can control and monetize it?

Lex Fridman: Control and monetize. Hoping they can control and monetize. So you're saying, if they could press a button and create an agent that they no longer control, that they would have to ask nicely, a thing that lives on a server across a huge number of computers: you're saying that they would push for the creation of that kind of system?

Roman Yampolskiy: I mean, I can't speak for other people, for all of them. I think some of them are very ambitious. They're fundraising trillions; they talk about controlling the light cone of the universe. I would guess that they might.

Lex Fridman: Well, that's a human question, whether humans are capable of that; probably some humans are capable of that. My more direct question is whether it's possible to create such a system, to have a system that has that level of agency. I don't think that's an easy technical challenge; it doesn't feel like we're close to that: a system that has the kind of agency where it can make its own decisions and deceive everybody about them. The current architecture we have in machine learning, and how we train the systems, how we deploy the systems, and all that, just doesn't seem to support that kind of agency.

Roman Yampolskiy: I really hope you're right. I think the scaling hypothesis is correct; we haven't seen diminishing returns. It used to be that we asked, "How long until AGI?" Now we should ask, "How much until AGI?" It's a trillion dollars today; it's a billion dollars next year; it's a million dollars in a few years.

Lex Fridman: Don't you think it's possible we basically run out of trillions? So is this constrained by compute?

Roman Yampolskiy: Compute gets cheaper every day, exponentially.

Lex Fridman: But then that becomes a question of decades versus years.
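Roman's reframing from "how long" to "how much" is a statement about exponentially falling compute costs. A minimal sketch, assuming a generic Moore's-law-style cost-halving time of two years (his own trillion-to-million figures imply a far faster decline):

```python
# Toy model of the "how much until AGI" framing: the price of a fixed
# amount of compute falls exponentially. The 2-year halving time is a
# generic assumption, not a figure from the conversation.

initial_cost_usd = 1e12   # "a trillion dollars today" (Roman's figure)
halving_time_years = 2.0  # assumed cost-halving time for fixed compute

def cost_after(years: float) -> float:
    return initial_cost_usd * 0.5 ** (years / halving_time_years)

for y in (0, 10, 20, 40):
    print(f"year {y:2d}: ~${cost_after(y):,.0f}")
# year  0: ~$1,000,000,000,000
# year 10: ~$31,250,000,000
# year 20: ~$976,562,500
# year 40: ~$953,674
```

Even under this slower assumption, a trillion-dollar run becomes a roughly million-dollar run in about 40 years, which is why the residual disagreement collapses into "decades versus years".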
Roman Yampolskiy: If the only disagreement is that it will take decades, not years, for everything I'm saying to materialize, then I can go with that.

Lex Fridman: But if it takes decades, then the development of tools for AI safety becomes more and more realistic. So I guess the question is: I have a fundamental belief that humans, when faced with danger, can come up with ways to defend against that danger. And one of the big problems facing AI safety currently, for me, is that there are no clear illustrations of what that danger looks like. There are no illustrations of AI systems doing a lot of damage, and so it's unclear what you're defending against. Currently it's a philosophical notion: yes, it's possible to imagine AI systems that take control of everything and destroy all humans. It's also the more formal, mathematical notion that you talk about, that it's impossible to have a perfectly secure system; you can't prove that a program of sufficient complexity is completely safe and perfect and that you know everything about it. But when you actually pragmatically look at how much damage AI systems have done, and what kind of damage, there have not been illustrations of that. Even in autonomous weapons systems: there have not been mass deployments of autonomous weapons systems, luckily. The automation in war currently is very limited; the automation is at the scale of individuals, versus at the scale of strategy and planning. So I think one of the challenges here is: where are the dangers? And the intuition that Yann LeCun and others have is: let's keep building AI systems in the open until the dangers start rearing their heads and become more explicit, until there start being case studies, illustrative case studies, that show exactly how the damage by AI systems is done. Then regulation can step in, then brilliant engineers can step up, and we can have Manhattan-style projects that defend against such systems. That's kind of the notion. And I guess the tension with that is the idea that, for you, we need to be thinking about that now, so that we're ready, because we'll not have much time once the systems are deployed. Is that true?

Roman Yampolskiy: There is a lot to unpack here. There is the Partnership on AI, a conglomerate of many large corporations. They have a database of AI accidents they collect; I contributed a lot to that database. If we have so far made almost no progress in actually solving this problem, not patching it, not, again, lipstick-on-a-pig kinds of solutions, why would we think we'll do better when we're closer to the problem? All the things you mentioned are serious concerns: measuring the amount of harm, benefit versus risk; it is difficult.

Lex Fridman: But to you, the sense is that the risk has already superseded the benefit?

Roman Yampolskiy: Again, I want to be perfectly clear: I love AI, I love technology. I'm a computer scientist; I have a PhD in engineering; I work at an engineering school. There is a huge difference between "we need to develop narrow AI systems, superintelligent in solving specific human problems like protein folding" and "let's create a superintelligent machine god and it will decide what to do with us". Those are not the same. I am against superintelligence in the general sense, with no undo button.

Lex Fridman: Do you think the teams that are doing AI safety on the kind of narrow-AI risks that you've mentioned, are those approaches going to be at all productive towards leading to approaches for doing AI safety on AGI, or is it just fundamentally different?

Roman Yampolskiy: Partially, but they don't scale. For narrow AI, for deterministic
systems, you can test them. You have edge cases; you know what the answer should look like; you know the right answers. For general systems, you have an infinite test surface; you have no edge cases; you cannot even know what to test for. Again, the unknown unknowns...
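Roman's closing distinction can be made concrete. A minimal sketch contrasting the two testing regimes; the `clamp` function is an invented stand-in for a deterministic, narrow system:

```python
# For a deterministic, narrow system, testing works: the input space
# has known edge cases and every expected output is known in advance.

def clamp(x: float, lo: float, hi: float) -> float:
    """Clamp x into the closed interval [lo, hi]."""
    return max(lo, min(hi, x))

# Edge cases and correct answers are enumerable:
assert clamp(5, 0, 10) == 5      # interior point
assert clamp(-1, 0, 10) == 0     # below the range
assert clamp(11, 0, 10) == 10    # above the range
assert clamp(0, 0, 10) == 0      # exactly on the boundary

# For a general system there is no analogous suite: the input space is
# all possible prompts, the set of behaviors is open-ended, and for
# capabilities you haven't discovered yet (the "unknown unknowns")
# you don't know what to assert. No finite list of test cases covers it.
```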