How AI Deepfakes Are Really Made | Hany Farid
syNN38cu3Vw • 2025-10-03
So let's talk about deep fakes which is
this sort of sliver of all of this.
>> Yeah.
>> So deep fakes is an umbrella term for
using machine learning AI to whole cloth
create images, audio and video of things
um that have never existed or happened.
So, for example, I can go to my favorite
deep fake generator and say, "Give me an
image of Hakeem in a studio doing a
podcast with Professor Hany Farid."
>> And it actually would do a pretty good job
because you have a presence online. I
have somewhat of a presence online. It
knows what we look like and it would
generate an image that's not exactly
this, but something like that. Or I can
say, "Please..." By the way, I still say
please when I ask AI for things.
One of my students told me that this is
a good idea because when the AI
overlords come, they're going to
remember you were polite to them. Ah,
>> I actually really like this advice.
>> Wait a minute. So, I read an article.
>> Yes. It cost tens of millions of
dollars.
>> The energy ultimate. Yes. Just saying
please and thank you. I still do it by
the way. And even in my head right there,
I still in
my head say please.
>> Well, listen. I have AI connected to my
AI, right? And so my AI corrects my AI
prompts
>> to proper grammar and it's like
>> please. It puts please in there.
>> I know. And it does cost tens of
millions of dollars for that extra
token. Okay. So, I will ask it for an
image of a unicorn wearing a red
clown hat walking down the street in
Times Square and it will generate that
image. I can ask it to generate audio
of Professor Hany Farid saying the
following, right?
>> Um I can generate a video of me saying
and doing things I never did. And you
can clearly see the power of that
technology from a creative perspective.
If you and I are having a conversation
and in post we said something we didn't
mean to, we can just fill it in with AI
now.
>> Well, here's the thing. You just
mentioned how we're only two or
three years into this. So, however good
it is now, you know,
>> this is the worst it will ever be,
>> right?
>> So I can tell
you, by the way, how good it is.
>> So, in addition to being trained as a
computer scientist and applied
mathematician, I've been somewhat
trained as a cognitive
neuroscientist. And we do perceptual
studies. So what we do is we recruit
participants. We show them images, audio
clips and video. And we tell them half
of the things you're going to look at
are real. Half of the things are AI
generated. We explain to them what AI
generated is. We give them examples of
that.
>> And for images as of last year, people
are roughly at chance at distinguishing
a real photo from an AI generated photo.
>> So what you mean by that is, if you
just had a monkey behind a
keyboard,
>> flipping a coin.
>> Flipping a coin.
>> Yeah. Yeah. The monkey's probably
better than you, by the way. I'm
going to go off and guess. Um, so with
audio, so we play a clip of somebody
speaking like you and we play an AI
generated version. They're slightly
above chance, not like 65%.
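For a sense of what "at chance" means numerically, here is a minimal sketch, not taken from the study described: an exact two-sided binomial test against 50% accuracy. The function name and the counts (52 and 65 correct out of 100) are hypothetical, chosen only for illustration.

```python
from math import comb


def binomial_two_sided_p(correct: int, trials: int) -> float:
    """Exact two-sided p-value against the null 'accuracy is 50%'."""
    # Under the null, a count k has probability C(trials, k) / 2**trials.
    observed = comb(trials, correct)
    # Add up every count that is at least as unlikely as the observed one.
    tail = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if comb(trials, k) <= observed
    )
    return tail / 2 ** trials


# Hypothetical counts, not data from the study.
print(binomial_two_sided_p(52, 100))  # large p-value: consistent with guessing
print(binomial_two_sided_p(65, 100))  # small p-value: better than guessing
```

A large p-value means the responses cannot be told apart from coin flipping; a small one means participants are doing measurably better than chance, even if not by much.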
>> On images, at chance; on audio, slightly
better than chance; and on video, they're a
little bit better, but all of those
trends are going towards chance. So
here's what we know. Everything in the
next 12 months, 18 months, 24 months, I
don't know what the number is,
>> it will be indistinguishable to the
average person online, right? And that
is
>> that is a weird world we're living in
because, first of
all, the vast majority of Americans now
get the majority of their
information from online sources and
unfortunately from social media too.
>> And because it is so easy to
create this content, understand all this
is a text prompt away. I type,
"Please give me an image of this,
generate this audio, generate this
video." There are dozens of services
that will do this extremely inexpensively
or for free. And you can carpet bomb the
internet with fake images of the
conflict in uh Gaza.
>> Fake images.
>> I have seen them too. Fake images of the
flood in Texas. Fake images and video of
the fires in, name it, across the board,
right? Fake images of people stuffing
ballot boxes. Now we have a threat to
our democracy.
>> Wow. So suddenly our sense of reality
coming back to your first very good
question is up in the air because I can
create whatever reality I want and
understand that there's sort of three
things happening here when we talk about
deep fakes. There's the creation of it.
That's what we've been talking about.
>> There's the distribution which we
democratized 20 years ago. So anybody
can
>> publish to the world and that's very
powerful and very terrifying because
there's no editorial standards on social
media. And then there's the
amplification that we have become so
polarized as a society that when you see
things that conform to your world view,
you are more than happy to click like,
reshare, and now you have creation,
distribution, amplification.
>> Wow.
>> That's the ball game,
>> right? That's the ballgame for spreading
massive lies, conspiracies, and
disinformation campaigns that affect our
global health, our planet's health, our
democracy, our economy, everything.
Everything. So let's get into how these
fakes are generated. So start with
images.
>> Good. So let's start with images because
in some ways it's the easiest one, but
all of these have a similar theme. And
one of my favorite techniques for
generating images is called a generative
adversarial network or a GAN. And here's
how it works.
>> Wait a minute. Wait a minute.
Adversarial.
>> Adversarial.
>> So that means that you're fighting your
computer.
>> Two computer systems are
fighting each other. And this is sort of
the genius of this technique. So here's
how it works.
>> You have two systems.
One system's job is to make an image of
a person or a landscape or whatever you
want. Yeah. And so what it does, it
starts by, this is literally true, it
just splats down a bunch of random
pixels. So I say, generate an image of
a person and it says, "Okay, here's a
bunch of..." So think of the monkeys at
the keyboard typing randomly. Let's see
if this is Shakespeare,
>> right? And then it takes that image and
it hands it to a second system and it
says, "Is this a face?" And that system
has access to millions and millions of
images that it scraped from the internet
that are faces.
>> I see.
>> And that system says, "That thing that
you generated doesn't look like these
things over here."
>> And it gives the feedback to the
generator and it says, "Nope, try again."
>> Modify some pixels. Send it back to
what's called the discriminator. "Is it a
face?" "No. Try again."
>> And they work in this adversarial loop.
So, it's like somebody's checking your
homework.
>> But it seems like it could get stuck
never getting to a face.
>> You would think, and that's what's
amazing about GANs: they
converge.
>> They converge.
>> And part of that is the way they've
been trained. But the
genius of this is that the generator is
not very smart because all it's doing is
modifying pixels. And the discriminator
is actually quite simple. It's simply
saying, does this thing look like these
things? And because you pit them against
each other in this adversarial game,
this sort of amazing thing happens out
the other side.
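The loop described above can be sketched on a toy problem. This is a minimal sketch under stated assumptions, not how image generators are actually built: the "image" is a single number, the real data are numbers near `REAL_MEAN`, the generator has one parameter, and the discriminator is a logistic classifier. All names and settings (`train_toy_gan`, `REAL_MEAN`, the learning rate, the small weight-decay term added for stability) are hypothetical. In standard GAN training the feedback adjusts the generator's parameters rather than the pixels of one image, and that is what the sketch does.

```python
import math
import random

REAL_MEAN = 2.0  # hypothetical "real data": numbers drawn around this value


def sigmoid(u: float) -> float:
    """Numerically stable logistic function."""
    if u >= 0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)


def train_toy_gan(steps=3000, batch=64, lr=0.05, decay=0.1, seed=0):
    rng = random.Random(seed)
    theta = 0.0      # generator parameter: fake sample = theta + noise
    w, c = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + c)

    for _ in range(steps):
        real = [rng.gauss(REAL_MEAN, 1.0) for _ in range(batch)]
        fake = [theta + rng.gauss(0.0, 1.0) for _ in range(batch)]

        # Discriminator step: ascend log D(real) + log(1 - D(fake)).
        grad_w = grad_c = 0.0
        for x in real:
            d = sigmoid(w * x + c)
            grad_w += (1.0 - d) * x
            grad_c += 1.0 - d
        for x in fake:
            d = sigmoid(w * x + c)
            grad_w -= d * x
            grad_c -= d
        w += lr * (grad_w / batch - decay * w)  # decay is an added stabilizer
        c += lr * (grad_c / batch)

        # Generator step: ascend log D(fake), the "try again" feedback.
        grad_theta = sum((1.0 - sigmoid(w * x + c)) * w for x in fake) / batch
        theta += lr * grad_theta

    return theta, w, c


theta, w, c = train_toy_gan()
print("generator centre:", theta, "real centre:", REAL_MEAN)
```

Neither side is sophisticated on its own: the discriminator only scores "does this look like the real samples", and the generator only nudges its one parameter in the direction that raises that score.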
>> So here's the question. On average,
how many iterations does it take? And
then how much time does that translate
to?
>> That's a great question. So typically
the time is in seconds.
>> So there's two phases. First, you train
the GAN. That's a really long process.
But then what we call inference, which
is running this thing, happens in
seconds. And the reason it happens in
seconds is by the way that is hundreds
of thousands of iterations but it's on a
GPU which is very powerful and very
fast. And then there's these tricks to
make it even faster. You start with
small images and then you make them
bigger over time. So there's these
tricks to make it faster, but it is literally
seconds to make that image.
>> Wow.
>> And the brilliance of that is that the
two systems are competing with each
other.
>> Um and then this thing that seems like
intelligence comes out even though it's
not. If you think about those two
individual components,
>> they're pretty basic. pretty dumb.
>> But then you have this like emergent
behavior almost. It's like you know how
to generate images of people. That's
amazing.
>> So let's have a little fun.
>> I understand good
>> that you brought me some fakes and some
real images.
>> Good
>> to put to the test.
>> Good.
>> To see if I can
>> discern the difference.
>> So I'm going to play for you a
couple of audio clips. Before I do this, let
me say I've been doing this for a long
time and I'm pretty good at
it. I'm pretty good at what I do. And I
had created three audio samples. I'm
going to play them for you.
>> Wait, are you allowed to say that
you're good at what you do? I'll
say that. Hany is really good. That's
right.
>> I said pretty good, by the way.
>> She's amazing.
>> But this is amazing. This is
a true story, by the way. So, I made
three audio clips for you of me talking.
And you and I have been talking for a
little while, so you now know what my
voice sounds like.
>> And uh I got off the plane and I was in
the car coming over here and I wanted to
make sure they worked. And I played all
three of them. And I couldn't tell which
one of me was real or fake. I wasn't
100% sure. Wow.
>> And I do this for a living and it's my
voice,
>> right?
>> So, okay. So, that is Okay.
>> So, wait a minute. Which AI did you use?
This was something that you created or
something generally available.
>> So, so here's the thing you have to
understand about AI. This is so readily
available. So, here's what I did. I went
to a service. It's a commercial service.
Um, I uploaded I think it was about 3
minutes of my voice.
>> I said, please clone my
voice. And it clones my voice. And
what I mean by that is that it learns
the patterns of my voice: what I sound
like, the intonation, my cadence, how
fast I speak, where I put the pauses,
>> and then I can simply type
>> and have it say anything I want to say.
>> And so I'm going to have you
listen to three
sentences.
>> Okay.
>> And I'm going to
give you a hint. One of them is fake and
two are real. Okay.
>> Okay. And let's see what we can do.
Okay. Here we go.
>> And in fairness, this is not the best uh
speaker, but Okay.
>> Are there guard rails in our law?
>> Ah, good. So first of all, when I
went to this service, I
uploaded my voice and there's a button
that says, "Do you have permission to
use this person's voice?" And I did
because it was my voice, but I can
upload anybody's voice and click a
button.
>> The laws are very complicated and they
actually vary state-to-state and of
course internationally. Wow.
>> So there are almost no guardrails on
grabbing people's likeness and even if
there were,
>> there's
>> you can still do it anyway.
>> There's there's no stopping this.
There's no stopping it. Okay. All right.
Number one. Oh, and by the way, the
three: this is part of a talk I gave
recently on deep fakes. So, you'll hear
a consecutive thing. Okay. Ready?
>> And if you invite me back next year,
almost certainly everything will have
changed. Uh the nature of creation of
deep fakes, the risk of deep fakes,
>> that's the deep fake right there, man.
>> Is changing.
>> Hold on. Hold on. That was good.
>> It is a fast-moving field and we have to
start thinking seriously and carefully
about the threat of misinformation.
>> Okay,
>> good. And one more. We are living
through an unprecedented time where we
are relying more and more on the
internet for information. For
information that affects our health, our
societies, our democracies, and our
economies.
>> Can I hear number one again?
>> Yep. You're a little less sure than you
were a minute ago.
>> Yeah.
>> And if you invite me back next year,
almost certainly everything will have
changed. Uh the nature of creation of
deep fakes, the risk of deep fakes, and
the detection of deep fakes is changing.
>> I think it's the first one still. I got
it right.
>> Yeah.
>> Yeah. I struggled with it, by the way.
Honestly, I couldn't remember. I'm from
the future.
>> You're the time traveler. It turns out.
>> Wow. Well, you know what? So, I
started my media work in audio, right?
Being a voice actor, and very quickly
I was able to pick up on music and
commercials and movies where they were
dropping in
>> uh you know, pickups. The reason I
figured it out is there's a difference in
the background noise. Like one had more
reverb than the other. Um which is how I
then remembered it. But you got to
admit all three of them sound like me.
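The background-noise cue mentioned above was judged by ear. As a rough illustration only, here is a minimal sketch of one crude way to compare background-noise level between clips: take the loudness of the quietest frames as a noise-floor estimate. It does not measure reverb, and the clips are synthetic; `frame_rms`, `noise_floor`, `synthetic_clip`, and every parameter value are hypothetical.

```python
import math
import random


def frame_rms(samples, frame=400):
    """Root-mean-square level of each non-overlapping frame."""
    return [
        math.sqrt(sum(s * s for s in samples[i:i + frame]) / frame)
        for i in range(0, len(samples) - frame + 1, frame)
    ]


def noise_floor(samples, frame=400, quantile=0.1):
    """Level of the quietest frames, a rough background-noise estimate."""
    levels = sorted(frame_rms(samples, frame))
    return levels[int(quantile * (len(levels) - 1))]


def synthetic_clip(noise_sd, seed, rate=16000):
    """One second of fake 'speech': tone bursts with pauses, plus noise."""
    rng = random.Random(seed)
    clip = []
    for n in range(rate):
        talking = (n // 2000) % 2 == 0  # alternate bursts and pauses
        voice = 0.5 * math.sin(2 * math.pi * 220 * n / rate) if talking else 0.0
        clip.append(voice + rng.gauss(0.0, noise_sd))
    return clip


quiet_room = synthetic_clip(noise_sd=0.01, seed=1)
noisy_room = synthetic_clip(noise_sd=0.05, seed=2)
print(noise_floor(quiet_room), noise_floor(noisy_room))
```

Two clips that sound like the same voice can still differ in this kind of measurement if they were recorded or generated under different conditions.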
>> Oh, they all do. They all sound like
you.
>> Oh, by the way, so not only can
>> Let me tell you what has gotten me
recently: I'll get these social
media announcements. Oh, there's a new
song by Tupac and Eminem. And I start
listening to it and halfway in I'm like,
no, this is Yeah. But in the beginning
they it's coming from music. Yeah, it's
coming from the way. So, this is one of
my favorite videos by the way. Let me
just show this to you.
>> And if you invite me back next year,
almost certainly everything will have
changed. Uh the nature of the creation
of deep fakes, the risk of deep fakes,
that's real. Wait, wait for it.
I don't speak
and your mouth is doing it. I don't
speak Japanese.
Doesn't it sound like Indian?
>> Yes, it does.
>> I know. So, now I can do full-blown
video.
>> Any language. Any language. By the way,
here's what's really cool about this.
Here's a really cool application. I like
foreign films a lot, but I can't stand
bad lip syncing. It makes me crazy. But
you don't need it anymore.
>> You don't need it.
>> We're now going to make videos in any
language you want and it's going to be
perfect.
>> What? How did you do that? How? What?
>> This is also commercial software. Um,
you upload a video, say that you have
permission to do it, and you say,
"Please translate this into Japanese,
Korean, Spanish, French, German,
anything you want."
>> It's amazing.
>> That is nuts. But the fact that the
mouth changes to voice the words,
>> by the way, the way this works, this is
really amazing, is you upload a video of
you talking and what it does is it takes
the audio and transcribes it. So, it
goes from audio to words
>> and then it translates from English to
Spanish and then it synthesizes a new
audio in Spanish and then it puts that
audio back into the video. Every one of
those is an AI system, by the way. And
it does that in about 3 minutes.
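The chain of steps described above (transcribe, translate, synthesize, put the audio back) can be sketched as a pipeline of interchangeable stages. This is a minimal sketch, not the commercial product's design: `Clip`, `dub`, and the toy stage functions are hypothetical stand-ins, and each real stage would be a separate AI model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Clip:
    video: str     # stand-in for the video frames
    audio: str     # stand-in for the audio track
    language: str  # language code of the audio, e.g. "en"


def dub(clip: Clip, target: str,
        transcribe: Callable[[str], str],
        translate: Callable[[str, str, str], str],
        synthesize: Callable[[str, str], str]) -> Clip:
    text = transcribe(clip.audio)                        # audio -> words
    translated = translate(text, clip.language, target)  # words -> target language
    new_audio = synthesize(translated, target)           # words -> new audio
    return Clip(clip.video, new_audio, target)           # audio back into the video


# Toy stand-ins so the sketch runs end to end.
PHRASEBOOK = {("en", "es"): {"hello": "hola"}}


def toy_transcribe(audio: str) -> str:
    return audio.split(":", 1)[1]


def toy_translate(text: str, source: str, target: str) -> str:
    return PHRASEBOOK[(source, target)].get(text, text)


def toy_synthesize(text: str, language: str) -> str:
    return f"audio[{language}]:{text}"


original = Clip(video="frames", audio="audio[en]:hello", language="en")
dubbed = dub(original, "es", toy_transcribe, toy_translate, toy_synthesize)
print(dubbed)
```

Passing the stages in as functions keeps the structure visible: any one stage can be swapped for a different model without touching the others.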
>> Wow.
>> And it's amazing. So, if you wanted to
take this podcast,
>> right,
>> and distribute it in Spanish, French,
German.
>> Yeah. Yeah.
>> Upload it.
>> And I'm just hitting India, China,
Southeast Asia,
>> two and a half billion people. Done.
Done. 10 cents each. We're good to go.