So let's talk about deep fakes, which is this sort of sliver of all of this. >> Yeah. >> So deep fakes is an umbrella term for using machine learning and AI to create, whole cloth, images, audio and video of things that have never existed or happened. So, for example, I can go to my favorite deep fake generator and say, "Give me an image of Hakee in a studio doing a podcast with Professor Hany Farid," >> and it actually would do a pretty good job because you have a presence online. I have somewhat of a presence online. It knows what we look like and it would generate an image that's not exactly this, but something like that. Or I can say, "Please," and by the way, I still say please when I ask AI for things. One of my students told me that this is a good idea because when the AI overlords come, they're going to remember you were polite to them. Ah, >> I actually really like this advice. >> Wait a minute. So, I read an article. >> Yes. It cost tens of millions of dollars. >> The energy ultimate. Yes. Just saying please and thank you. I still do it, by the way. And even in my head right there, when I was asked, I still in my head say please. >> Well, listen. I have AI connected to my AI, right? And so my AI corrects my AI prompts >> to proper grammar and it's like >> please. It puts please in there. >> I know. And it does cost tens of millions of dollars for that extra token. Okay. So, I will ask it for an image of a unicorn wearing a red clown hat walking down the street in Times Square and it will generate that image. I can ask it to generate audio of Professor Hany Farid saying the following, right? >> I can generate a video of me saying and doing things I never did. And you can clearly see the power of that technology from a creative perspective. If you and I are having a conversation and in post we said something we didn't mean to, we can just fill it in with AI now.
>> Well, here's the thing. You just mentioned how we're only two, three years into this. So, however good it is now, you know, >> this is the worst it will ever be, >> right? >> So I can tell you, by the way, how good it is. >> So, in addition to being trained as a computer scientist and applied mathematician, I've been somewhat trained as a cognitive neuroscientist. And we do perceptual studies. So what we do is we recruit participants. We show them images, audio clips and video. And we tell them half of the things you're going to look at are real. Half of the things are AI generated. We explain to them what AI generated is. We give them examples of that. >> And for images, as of last year, people are roughly at chance at distinguishing a real photo from an AI generated photo. >> So what you mean by that is, if you had a monkey behind a keyboard, >> flipping a coin. >> Flipping a coin. >> Yeah. Yeah. The monkey's probably better than you, by the way, I'm going to go off and guess. So with audio, we play a clip of somebody speaking, like you, and we play an AI generated version. They're slightly above chance, not like 65%. >> On images, at chance; on audio, slightly better than chance; and on video, they're a little bit better, but all of those trends are going towards chance. So here's what we know. In the next 12 months, 18 months, 24 months, I don't know what the number is, >> everything will be indistinguishable to the average person online, right? And that is >> that is a weird world we're living in because, first of all, the vast majority of Americans now get the majority of their information from online sources, and unfortunately from social media too. >> And because it is so easy to create this content, understand, all of this is a text prompt away. I type, "Please give me an image of this, generate this audio, generate this video."
There are dozens of services that will do this extremely inexpensively or for free. And you can carpet bomb the internet with fake images of the conflict in Gaza. >> Fake images. >> I have seen them too. Fake images of the flood in Texas. Fake images and video of the fires in, name it, across the board, right? Fake images of people stuffing ballot boxes. Now we have a threat to our democracy. >> Wow. So suddenly our sense of reality, coming back to your first very good question, is up in the air, because I can create whatever reality I want. And understand that there's sort of three things happening here when we talk about deep fakes. There's the creation of it. That's what we've been talking about. >> There's the distribution, which we democratized 20 years ago. So anybody can >> publish to the world, and that's very powerful and very terrifying because there's no editorial standards on social media. And then there's the amplification: we have become so polarized as a society that when you see things that conform to your world view, you are more than happy to click like and reshare. And now you have creation, distribution, amplification. >> Wow. >> That's the ball game, >> right? That's the ball game for spreading massive lies, conspiracies, and disinformation campaigns that affect our global health, our planet's health, our democracy, our economy, everything. Everything. So let's get into how these fakes are generated. So start with images. >> Good. So let's start with images because in some ways it's the easiest one, but all of these have a similar theme. One of my favorite techniques for generating images is called a generative adversarial network, or a GAN. And here's how it works. >> Wait a minute. Wait a minute. Adversarial. >> Adversarial. >> So that means that you're fighting your computer. >> Two computer systems are fighting each other. And this is sort of the genius of this technique. So here's how it works. >> You have two systems. 
One system's job is to make an image of a person or a landscape or whatever you want. Yeah. And so what it does, it starts by, this is literally true, it just splats down a bunch of random pixels. So I say, generate an image of a person, and it says, "Okay, here's a bunch of pixels." So think of the monkeys at the keyboard typing randomly: let's see if this is Shakespeare, >> right? And then it takes that image and it hands it to a second system and it says, "Is this a face?" And that system has access to millions and millions of images that it scraped from the internet that are faces. >> I see. >> And that system says, "That thing that you generated doesn't look like these things over here." >> And it gives the feedback to the generator and it says, "Nope, try again." >> Modify some pixels. Send it back to what's called the discriminator. Is it a face? No. Try again. >> And they work in this adversarial loop. So, it's like somebody's checking your homework. >> But it seems like it could get stuck, never getting to a face. >> You would think, and that's what's amazing about GANs, is that they converge. >> They converge. >> And part of that is the way they've been trained. But that's the genius of this: the generator is not very smart, because all it's doing is modifying pixels. And the discriminator is actually quite simple. It's simply saying, does this thing look like these things? And because you pit them against each other in this adversarial game, this sort of amazing thing happens out the other side. >> So here's the question. On average, how many iterations does it take? And then how much time does that translate to? >> That's a great question. So typically the time is in seconds. >> So there's two phases. First, you train the GAN. That's a really long process. But then what we call inference, which is running this thing, happens in seconds. 
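A toy version of the adversarial loop described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the method any real image generator uses: the "images" are single numbers, the generator learns one offset `b`, the discriminator is a logistic function with parameters `w` and `c`, and the name `train_toy_gan` and all hyperparameters are made up for the example. In real GANs both sides are neural networks, the feedback is a gradient rather than a plain yes or no, and the back-and-forth typically happens during training, after which the trained generator produces an image in a single pass.

```python
import math
import random


def sigmoid(t: float) -> float:
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def train_toy_gan(steps: int = 2000, batch: int = 32, lr: float = 0.05,
                  real_mean: float = 4.0, seed: int = 0):
    """Hypothetical 1-D GAN: returns the final (b, w, c).

    Real samples are numbers drawn from N(real_mean, 0.5).
    The generator outputs b + 0.5 * noise and only learns the offset b.
    The discriminator is D(x) = sigmoid(w * x + c).
    """
    rng = random.Random(seed)
    b = 0.0          # generator parameter, starts far from the real data
    w, c = 0.0, 0.0  # discriminator parameters

    for _ in range(steps):
        real = [rng.gauss(real_mean, 0.5) for _ in range(batch)]
        fake = [b + 0.5 * rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Discriminator step: gradient ascent on
        # log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            p = sigmoid(w * x + c)
            grad_w += (1.0 - p) * x
            grad_c += 1.0 - p
        for x in fake:
            p = sigmoid(w * x + c)
            grad_w -= p * x
            grad_c -= p
        w += lr * grad_w / batch
        c += lr * grad_c / batch

        # Generator step: gradient ascent on log D(fake),
        # i.e. shift b so the fakes look more "real" to the discriminator.
        grad_b = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        b += lr * grad_b

    return b, w, c
```

Calling `train_toy_gan()` returns the final `(b, w, c)`. On a toy like this the offset starts at 0 and is pushed toward the real mean of 4, but plain GAN training can oscillate around its target, so the exact final value is not guaranteed.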
And the reason it happens in seconds, and by the way that is hundreds of thousands of iterations, is that it's on a GPU, which is very powerful and very fast. And then there's these tricks to make it even faster. You start with small images and then you make them bigger over time. So there are these tricks, but it is literally seconds to make that image. >> Wow. >> And the brilliance of that is the two systems are competing with each other. >> And then this thing that seems like intelligence comes out, even though it's not. If you think about those two individual components, >> they're pretty basic. Pretty dumb. >> But then you have this, like, emergent behavior almost. It's like, you know how to generate images of people. That's amazing. >> So let's have a little fun. >> I understand >> good >> that you brought me some fakes and some real images. >> Good >> to put to the test. >> Good. >> To see if I can >> discern the difference. >> So I'm going to play for you a couple of audio clips. Before I do this, let me say I've been doing this for a long time and I'm pretty good at it. I'm pretty good at what I do. And I created three audio samples. I'm going to play them for you. >> Wait, are you allowed to say that you're good at what you do? I'll say that. Hany is really good. That's right. >> I said pretty good, by the way. >> She's amazing. >> But this is a true story, by the way. So, I made three audio clips for you of me talking. And you and I have been talking for a little while, so you now know what my voice sounds like. >> And I got off the plane and I was in the car coming over here and I wanted to make sure they worked. And I played all three of them. And I couldn't tell which one of me was real or fake. I wasn't 100% sure. Wow. >> And I do this for a living and it's my voice, >> right? >> So, okay. >> So, wait a minute. Which AI did you use? 
Was this something that you created or something generally available? >> So, here's the thing you have to understand about AI. This is so readily available. So, here's what I did. I went to a service. It's a commercial service. I uploaded, I think it was about 3 minutes of my voice. >> I said, please clone my voice. And it clones my voice. And what I mean by that is that it learns the patterns of my voice: what I sound like, the intonation, my cadence, how fast I speak, where I put the pauses, >> and then I can simply type >> and have it say anything I want to say. >> And so I'm going to have you listen to three sentences. >> Okay. >> And I'm going to give you a hint. One of them is fake and two are real. Okay. >> Okay. And let's see what you can do. Okay. Here we go. >> And in fairness, this is not the best speaker, but okay. >> Are there guardrails in our law? >> Ah, good. So first of all, when I went to this service, I uploaded my voice and there's a button that says, "Do you have permission to use this person's voice?" And I did, because it was my voice, but I can upload anybody's voice and click a button. >> The laws are very complicated and they actually vary state to state and of course internationally. Wow. >> So there are almost no guardrails on grabbing people's likeness, and even if there were, >> there's >> you can still do it anyway. >> There's no stopping this. There's no stopping it. Okay. All right. Number one. Oh, and by the way, the three are part of a talk I gave recently on deep fakes. So, you'll hear a consecutive thing. Okay. Ready? >> And if you invite me back next year, almost certainly everything will have changed. Uh the nature of creation of deep fakes, the risk of deep fakes, >> that's the deep fake right there, man. >> Is changing. >> Hold on. Hold on. That was good. 
>> It is a fast-moving field and we have to start thinking seriously and carefully about the threat of misinformation. >> Okay, >> good. And one more. We are living through an unprecedented time where we are relying more and more on the internet for information. For information that affects our health, our societies, our democracies, and our economies. >> Can I hear number one again? >> Yep. You're a little less sure than you were a minute ago. >> Yeah. >> And if you invite me back next year, almost certainly everything will have changed. Uh the nature of creation of deep fakes, the risk of deep fakes, and the detection of deep fakes is changing. >> I think it's the first one still. I got it right. >> Yeah. >> Yeah. I struggled with it, by the way. Honestly, I couldn't remember. I'm from the future. >> You're the time traveler, it turns out. >> Wow. Well, you know what? I started my media work in audio, right? Being a voice actor. And very quickly I was able to pick up on music and commercials and movies where they were dropping in >> you know, pickups. The reason I figured it out is there's a difference in the background noise. Like, one had more reverb than the other. Which is how I then remembered it. But you've got to admit all three of them sound like me. >> Oh, they all do. They all sound like you. >> Oh, by the way, so not only can >> Let me tell you what has gotten me recently: I'll get these social media announcements. Oh, there's a new song by Tupac and Eminem. And I start listening to it and halfway in I'm like, no, this is Yeah. But in the beginning they it's coming from music. Yeah, it's coming from the way. So, this is one of my favorite videos, by the way. Let me just show this to you. >> And if you invite me back next year, almost certainly everything will have changed. Uh the nature of the creation of deep fakes, the risk of deep fakes, that's real. Wait, wait for it. I don't speak and your mouth is doing it. 
I don't speak Japanese. Doesn't it sound like Indian? >> Yes, it does. >> I know. So, now I can do full-blown video. >> Any language. Any language. By the way, here's what's really cool about this. Here's a really cool application. I like foreign films a lot, but I can't stand bad lip syncing. It makes me crazy. But you don't need it anymore. >> You don't need it. >> We're now going to make videos in any language you want and it's going to be perfect. >> What? How did you do that? How? What? >> This is also commercial software. You upload a video, say that you have permission to do it, and you say, "Please translate this into Japanese, Korean, Spanish, French, German, anything you want." >> It's amazing. >> That is nuts. But the fact that the mouth changed to voice the words, >> by the way, the way this works, this is really amazing, is you upload a video of you talking and what it does is it takes the audio and transcribes it. So, it goes from audio to words >> and then it translates from English to Spanish and then it synthesizes a new audio in Spanish and then it puts that audio back into the video. Every one of those is an AI system, by the way. And it does that in about 3 minutes. >> Wow. >> And it's amazing. So, if you wanted to take this podcast, >> right, >> and distribute it in Spanish, French, German. >> Yeah. Yeah. >> Upload it. >> And I'm just hitting India, China, Southeast Asia, >> two and a half billion people. Done. Done. 10 cents each. We're good to go.
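The four steps described above (transcribe the audio, translate the text, synthesize new audio, put it back into the video) can be sketched as a simple chain of stages. This is a structural sketch only: `DubbingStages`, `dub_video`, and the stage signatures are hypothetical names invented for the example, not the API of the commercial tool mentioned in the conversation. Passing the original audio to the synthesis stage as a voice reference is also an assumption, made because the dubbed clip still sounds like the speaker.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DubbingStages:
    """Hypothetical bundle of the four AI systems, each supplied by the caller."""
    transcribe: Callable[[bytes], str]          # audio -> source-language text
    translate: Callable[[str, str], str]        # (text, target language) -> text
    synthesize: Callable[[str, bytes], bytes]   # (text, voice reference) -> audio
    resync_video: Callable[[bytes, bytes], bytes]  # (video, new audio) -> video


def dub_video(video: bytes, audio: bytes, target_lang: str,
              stages: DubbingStages) -> bytes:
    """Run the stages in the order described: words, translation, voice, video."""
    text = stages.transcribe(audio)
    translated = stages.translate(text, target_lang)
    new_audio = stages.synthesize(translated, audio)
    return stages.resync_video(video, new_audio)
```

Keeping each stage as a plain callable mirrors the point that every step is its own AI system: any one of them could be swapped out without touching the others.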