Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
J6XcP4JOHmk • 2019-08-27
Kind: captions
Language: en
the following is a conversation with
Jeremy Howard he's the founder of
fast.ai a research institute dedicated to
making deep learning more accessible
he's also a distinguished research
scientist at the University of San
Francisco a former president of Kaggle as
well as the top ranking competitor there
and in general he's a successful
entrepreneur educator researcher and an
inspiring personality in the AI
community when someone asks me how do I
get started with deep learning fast.ai
is one of the top places that I point them
to it's free it's easy to get started
it's insightful and accessible and if I
may say so it has very little BS that
can sometimes dilute the value of
educational content on popular topics
like deep learning fast.ai has a focus
on practical application of deep
learning and hands-on exploration of the
cutting edge that is incredibly both
accessible to beginners and useful to
experts this is the artificial
intelligence podcast if you enjoy it
subscribe on YouTube give it five stars
on iTunes support it on Patreon or
simply connect with me on Twitter at Lex
Fridman spelled F-R-I-D-M-A-N and now
here's my conversation with Jeremy
Howard what's the first program you've
ever written the first program I wrote that I
remember would be at high school I did
an assignment where I decided to try to
find out if there were some like better
musical scales than the normal twelve
tone twelve interval scale so I wrote a
program on my Commodore 64 in BASIC
that searched through other scale sizes
to see if it could find one where there
were more accurate you know harmonies
like mid tone like you want
an actual exactly 3 to 2 ratio
whereas with a 12 interval scale it's not
exactly 3 to 2 for example so that's
well tempered as they say you know
and BASIC on a Commodore 64 yeah where
was the interest in music from or is it
just I took music all my life so I
played
saxophone and clarinet and piano and
guitar and drums and whatever so how
does that thread go through your life
where's music today yeah it's not where
I wish it was I for various reasons
couldn't really keep it going
particularly because I had a lot of
problems with RSI with my fingers and so
I had to kind of like cut back anything
that used hands and fingers I hope one
day I'll be able to get back to it
health-wise so there's a love for music
underlying it all yeah what's your
favorite instrument saxophone sax
baritone saxophone well probably bass
saxophone but they're awkward well I
always love it when music is coupled
with programming there's something about
a brain that utilizes those that emerges
with creative ideas so you've used and
studied quite a few programming
languages can you give an overview of
what you've used what are the pros and
cons of each well my favorite
programming environment almost certainly
was Microsoft Access back in like the
earliest days so that was Visual Basic
for Applications which is not a good
programming language but the programming
environment is fantastic it's like the
ability to create you know user
interfaces and tie data and actions to
them and create reports and all that
I've never seen anything as good there's
things nowadays like Airtable which
are like small subsets of that which
people love for good reason but
unfortunately nobody's ever achieved
anything like that what is that if you
could pause on that for a second oh
Access is it a database it was a database
program that Microsoft produced part of
Office and it kind of withered you know
but basically it lets you in a totally
graphical way create tables and
relationships and queries and tie them
to forms and set up you know event
handlers and calculations and it was a
very
complete powerful system designed for not
massive scalable things but for like
useful little applications that I loved
so what's the connection between Excel
and Access so very close
so Access kind of was the relational
database equivalent if you like so
people still do a lot of that stuff that
should be in Access in Excel because they
don't know Excel is great as well
so but it's just not as rich a
programming model as VBA combined with a
relational database and so I've always
loved relational databases but today
programming on top of a relational
database is just a lot more of a
headache you know you generally either
need to kind of you know you need
something that connects that runs
some kind of database server unless you
use SQLite which has its own
issues
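An editorial aside on the point about SQLite, as a rough sketch rather than anything shown in the conversation: SQLite runs inside the program itself, so a relational model can be tried without a separate database server. The example uses Python's standard-library sqlite3 module; the table names, column names and rows are made up for illustration.

```python
import sqlite3


def invoice_totals():
    """Build a tiny relational model in memory and run one join query."""
    # ":memory:" keeps the whole database inside this process, no server involved
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE invoice ("
        "id INTEGER PRIMARY KEY, "
        "customer_id INTEGER REFERENCES customer(id), "
        "amount REAL)"
    )
    # Hypothetical sample rows
    conn.executemany(
        "INSERT INTO customer (id, name) VALUES (?, ?)",
        [(1, "Ada"), (2, "Grace")],
    )
    conn.executemany(
        "INSERT INTO invoice (customer_id, amount) VALUES (?, ?)",
        [(1, 10.0), (1, 5.5), (2, 7.0)],
    )
    # A relationship plus a query, the kind of thing Access set up graphically
    rows = conn.execute(
        "SELECT c.name, SUM(i.amount) "
        "FROM customer c JOIN invoice i ON i.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return rows
```

Calling invoice_totals() returns one (name, total) tuple per customer. Anything closer to the Access experience, such as forms, reports or an object model over the tables, still has to be layered on top, which is the awkwardness being described.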
then you kind of often if you want to get a
nice programming model you'll need to
like create and add an ORM on top and
then I don't know there's all these
pieces to tie together and it's just a lot
more awkward than it should be there are
people that are trying to make it easier
so in particular I think of F# you
know Don Syme who him and his team have
done a great job of making something
like a database appear in the type
system so you actually get like tab
completion for fields and tables and
stuff like that anyway so that was kind
of anyway so like that whole VBA Office
thing I guess was a starting point which
I still miss then I got into standard Visual
Basic that's interesting just to pause
on that for a second it's interesting
that you're connecting programming
languages to the ease of management of
data yeah so in your use of programming
languages you always had a love and a
connection with data I've always been
interested in doing useful things for
myself and for others which generally
means getting some data and doing
something with it and putting it out
there again so that's been my interest
throughout so I also did a lot of stuff
with AppleScript back in the early days
so it's kind of nice being able to get
the computer and computers to talk to
each other and to do things for you and
then I think that the
programming language I most loved then
would have been Delphi which was Object
Pascal created by Anders Hejlsberg who
previously did Turbo Pascal and then
went on to create .NET and then went
on to create TypeScript Delphi was amazing
because it was like a compiled fast
language that was as easy to use as
Visual Basic Delphi what is it similar
to in in more modern languages Visual
Basic Visual Basic yeah but a compiled
fast version so I'm not sure there's
anything quite like it anymore
if you took like C#
or Java and got rid of the virtual
machine and replaced it with something
you could compile to a small tight binary I
feel like it's where Swift could get
to with the new SwiftUI and the
cross-platform development going on like
that's one of my dreams is that we'll
hopefully get back to where Delphi was
there is actually a Free Pascal project
nowadays called Lazarus which is also
attempting to kind of recreate Delphi
so they're making good progress so
ok Delphi that's one of your favorite
programming languages well programming
environments again I'd say Pascal's not a
nice language if you wanted to know
specifically about what languages I like
I would definitely pick J as
being an amazingly wonderful language
what's J J are you aware of APL I am
not except from doing a little research
on the work you've done okay so not at all
surprising you're not familiar with it
cuz it's not well known but it's
actually one of the main families of
programming languages going back to the
late 50s early 60s so there was a couple
of major directions one was the kind of
lambda calculus Alonzo Church direction
which I guess kind of Lisp and
Scheme and whatever which has a history
going back to the early days of
computing the second was the kind of
imperative slash OO you know ALGOL Simula
going on to C C++ and so forth there was a
third which are called array oriented
languages which started with a paper by
a guy called Ken Iverson which was
actually a math theory paper not a
programming paper it was called notation
as a tool for thought and it was the
development of a new way a new type of
math notation and the idea is that this
math notation would be was much more
flexible expressive
and also well-defined than traditional
math notation which is none of those
things math notation is awful and so he
actually turned that into a programming
language and because this was the early
50s or the sorry late 50s
all the names were available so he
called his language A Programming
Language or APL APL so APL is an
implementation of notation as a tool for
thought by which he means math notation
and Ken and his son went on to do many
things but eventually they actually
produced you know a new language that
was built on top of all the learnings of
APL that was called J and J is the most
expressive composable language of you
know beautifully designed language I've
ever seen does it have
object-oriented components does it have that
kind of thing not really it's an
array oriented language it's
the third path are you saying
array array oriented yes what does it mean to be
array oriented so array oriented means that
you generally don't use any loops but
the whole thing is done with kind of an
extreme version of broadcasting if
you're familiar with that
NumPy slash Python concept so you do a
lot with one line of code it looks a lot
like math notation basically highly
compact
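An editorial aside to make the broadcasting remark concrete, as a minimal sketch in NumPy rather than in J or APL (it assumes NumPy is installed, and the function names and sample array are made up for illustration): the same column standardization written once with an explicit loop and once as a single broadcast expression.

```python
import numpy as np


def standardize_loop(x):
    """Loop version: for each column, subtract its mean and divide by its std."""
    out = np.empty_like(x, dtype=float)
    for j in range(x.shape[1]):
        col = x[:, j]
        out[:, j] = (col - col.mean()) / col.std()
    return out


def standardize_broadcast(x):
    """Array-oriented version: one expression, no explicit loop."""
    # x has shape (rows, cols); the per-column mean and std have shape (cols,).
    # Broadcasting stretches those 1-D arrays across every row of x.
    return (x - x.mean(axis=0)) / x.std(axis=0)


# Hypothetical sample data: three rows, two columns
x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
```

Both functions give the same result on x. The second reads much like the math it implements, which is the property being attributed to the array-oriented family, although J and APL take the idea a good deal further than NumPy does.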
mm-hm and the idea is that you can kind
of because you can do so much with one
line of code a single screen of code is
very likely to you very rarely need
more than that to express your
program and so you can kind of keep it
all in your head and you can kind of
clearly communicate it it's interesting
that APL created two main branches K
and J J is this kind of like open source
niche community of crazy enthusiasts
like me and then the other path K was
fascinating it's an astonishingly
expensive programming language which
many of the world's most ludicrously
rich hedge funds use so the entire
K machine is so small it sits inside level
3 cache on your CPU and and it easily
wins every benchmark I've ever seen in
terms of data processing speed
but you don't come across it very much
because it's like $100,000 per CPU to
run it yeah but it's like this
path of programming languages it's
just so much I don't know so much more
powerful in every way than the ones that
almost anybody uses every day
so it's all about computation
it's really focused pretty heavily
focused on computation I mean so much of
programming is data processing by
definition and so there's a lot of
things you can do with it but yeah
there's not much work being done on
making like user interface toolkits
or whatever I mean there's some but
they're not great at the same time
you've done a lot of stuff with Perl and
Python yeah so where does that fit into
the picture of J and K and APL well
you know it's much more pragmatic like
in the end you kind of have to end up
where the where the libraries are you
know like because to me my my focus is
on productivity I just want to get stuff
done and solve problems so Perl was
great for I created an email company
called Fastmail and Perl was great cuz
back in the late 90s early 2000s it just
had a lot of stuff it could do I still
had to write my own monitoring system
and my own web framework my own whatever
because like none of that stuff existed
but it was a super flexible language
to do that in and you used Perl
for Fastmail you used it as a back-end so
everything was written in Perl yeah yeah
everything everything was Perl why do
you think Perl hasn't succeeded or
hasn't dominated the market where Python
really takes over a lot yeah well I mean
Perl did dominate it was for a time
everything everywhere but then the guy
that ran
Perl Larry Wall kind of just didn't put
the time in anymore
and no project can be successful if
there isn't you know particularly
one that started with a strong leader
that loses that strong leadership
so then Python has kind of replaced it you
know Python is a lot less elegant
language in nearly every way but it has
the data science libraries and a lot of
them are pretty great so I kind of use
it because it's the best we have but
it's definitely not good enough what do
you think the future of programming looks
like what do you hope the future of
programming looks like if we zoom in on
the computational fields on data science
on machine learning I hope Swift is
successful because the goal of Swift the
way Chris Lattner describes it is to be
infinitely hackable and that's what I
want I want something where me and the
people I do research with and my
students can look at and change
everything from top to bottom there's
nothing mysterious and magical and
inaccessible unfortunately with Python
it's the opposite of that because
Python's so slow it's extremely
unhackable you get to a point where it's
like okay from here on down it's C so
your debugger doesn't work in the same
way your profiler doesn't work in the
same way your build system doesn't work
in the same way it's really not very
hackable at all what's the part you
would like to be hackable is it for the
objective of optimizing training of
neural networks inference of neural
networks is it performance of the system
or is there some non performance related
just it's everything I mean in
the end I want to be productive as a
practitioner so that means that so like
at the moment our understanding of deep
learning is incredibly primitive there's
very little we understand most things
don't work very well even though it
works better than anything else out
there there's so many opportunities to
make it better
so you look at any domain area like I
don't know speech recognition with deep
learning or natural language processing
classification with deep learning or
whatever every time I look at an area
with deep learning I always see like oh
it's terrible there's lots and lots of
obviously stupid ways to do things that
need to be fixed so then I want to be
able to jump in there and quickly
experiment and make them better do you think
the programming language has a role
in that a huge role yes so currently Python
has a big gap in terms of our ability to
innovate particularly around recurrent
neural networks and natural language
processing because it's so
slow the the actual loop where we
actually loop through words we have to
do that whole thing in CUDA C so we
actually can't innovate with the kernel
the heart of that most important
algorithm and it's just a huge problem
and this happens all over the place so
we hit you know research limitations
another example convolutional neural
networks which are actually the most popular
architecture for lots of things maybe
most things in deep learning we almost
certainly should be using sparse
convolutional neural networks but only
like two people are because to do it you
have to rewrite all of that CUDA C
level stuff and yeah just researchers
and practitioners don't so like there's
just big gaps in like what people
actually research on what people
actually implement because of the
programming language problem so you
think it's just too
difficult to write in CUDA C that a
higher level
programming language like Swift should
enable the easier fooling
around creative stuff with RNNs or
with sparse convolutional neural networks kind of
who's at fault who's in
charge of making it easy for a researcher
to play I mean no one's at fault just
nobody's got around to it yet or
it's just it's hard right and I mean
part of the fault is that we ignored
that whole APL kind of direction most
prominently everybody did for 60 years
50 years but recently people have been
starting to reinvent pieces of that and
kind of create some interesting new
directions in the compiler technology so
the place where that's particularly
happening right now is something called
MLIR which is something that again
Chris Lattner the Swift guy is leading
and because it's actually not gonna be
Swift on its own that solves this problem
because the problem is that currently
writing an acceptably fast you know GPU
program is too complicated regardless of
what language you use and that's just
because if you have to deal with the
fact that I've got you know 10,000
threads and I have to synchronize
between them all and I have to put my
thing into grid blocks and think about
warps and all this stuff it's just it's
just so much boilerplate to do that well
you have to be a specialist at that and
it's going to be a year's work to you
know optimize that algorithm in that way
but with things like Tensor
Comprehensions and Tile and MLIR and
TVM there's all these various projects
which are all about saying let's let
people create like domain-specific
languages for tensor computations these
are the kinds of things we do
generally on the GPU for deep
learning and then have a compiler which
can optimize that tensor computation a
lot of this work is actually sitting on
top of a project called Halide which
is a mind-blowing project where they
came up with such a domain-specific
language in fact two one
domain-specific language for expressing
this is what my tensor computation is
and another domain-specific language for
expressing this is the kind of the way I
want you to structure the compilation of
that like do it block by block and do
these bits in parallel
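An editorial aside on the two-language idea just described, as a toy sketch in plain Python with NumPy. This is not Halide's actual API, and every name in it is hypothetical. One function states what is computed (a 3-point blur), and two separate "schedules" decide how the work is ordered, either all at once or block by block, and both must produce the same output.

```python
import numpy as np


def blur_algorithm(padded, start, stop):
    """WHAT to compute: a 3-point average for output positions start..stop-1.

    `padded` is the input signal with one extra element on each side.
    """
    window = padded[start:stop + 2]
    return (window[:-2] + window[1:-1] + window[2:]) / 3.0


def schedule_all_at_once(signal):
    """HOW, option 1: evaluate every output position in a single pass."""
    padded = np.pad(signal, 1, mode="edge")
    return blur_algorithm(padded, 0, len(signal))


def schedule_block_by_block(signal, block=4):
    """HOW, option 2: evaluate the same algorithm one block at a time.

    The blocks are independent of each other, so a real compiler could
    run them in parallel; here they simply run in order.
    """
    padded = np.pad(signal, 1, mode="edge")
    out = np.empty(len(signal), dtype=float)
    for start in range(0, len(signal), block):
        stop = min(start + block, len(signal))
        out[start:stop] = blur_algorithm(padded, start, stop)
    return out


# Hypothetical sample input
signal = np.arange(10, dtype=float) ** 2
```

Changing the schedule changes only the order and grouping of the work, never the result, which is what lets a compiler search over schedules while the algorithm stays short.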
they were able to show how you can
compress the amount of code by 10x
compared to optimized GPU code and get
the same performance so that's like so
these other things are kind of sitting
on top of that kind of research and
MLIR is pulling a lot of those best
practices together and now we're
starting to see work done on making all
of that directly accessible through
Swift so that I could use Swift to kind
of write those domain-specific languages
and hopefully we'll get then Swift CUDA
kernels written in a very expressive and
concise way that looks a bit like J and
APL and then Swift layers on top of that
and then a SwiftUI on top of that and
you know it'll be so nice if we can get
to that point now does it all
eventually boil down to CUDA and NVIDIA
GPUs unfortunately at the moment it does
but one of the nice things about MLIR
if AMD ever gets their act together
which they probably won't is that they
or others could write MLIR backends
for other GPUs or other tensor
computation devices of which today there
are an increasing number like Graphcore
or Vertex.AI or whatever so yeah
being able to target lots of backends
would be another benefit of this and the
market really needs competition because at the
moment NVIDIA is massively overcharging
for their kind of enterprise class cards
because there is no serious competition
because nobody else is doing the
software properly in the cloud there is
some competition right but not really
other than TPUs perhaps but TPUs are
almost unprogrammable at the moment
so TPUs have the same problem that you can't
it's even worse so TPUs
Google actually made an explicit
decision to make them almost entirely
unprogrammable because they felt that
there was too much IP in there and if
they gave people direct access to
program them people would learn their
secrets yeah so you can't actually
directly
program the memory in a TPU you
can't even directly like create code
that runs on and that you look at on the
machine that has the TPU it all goes
through a virtual machine so all you can
really do is this kind of cookie cutter
thing of like plug in high-level stuff
together which is just super tedious and
annoying and totally unnecessary so what
was the tell me if you could the origin
story of fast.ai what is the motivation
its mission its dream so I guess the
founding story is heavily tied to my
previous startup which is a company
called Enlitic which was the first
company to focus on deep learning for
medicine and I created that because I
saw that was a huge opportunity to
there's about a 10x shortage
of the number of doctors in the world
in the developing world that we need
it's expected it would take about three
hundred years to train enough doctors to
meet that gap but I guessed that maybe if
we used deep learning for some of the
analytics we could maybe make it so you
don't need as highly trained doctors for
diagnosis for diagnosis and treatment
planning where's the biggest benefit
just before we get to fast.ai
where's the biggest benefit of AI in
medicine that you see today not much not much
happening today in terms of like stuff
that's actually out there it's very
early but in terms of the opportunity
it's to take markets like India and
China and Indonesia which have big
populations Africa small numbers of
doctors and provide diagnostic
particularly treatment planning and
triage kind of on device so that if you
do a you know test for malaria or
tuberculosis or whatever you immediately
get something that even a health care
worker that's had a month of training
can get a very high quality assessment
of whether the
patient might be at risk and tell you know
okay we'll send them off to a hospital
so for example in Africa outside of
South Africa there's only five pediatric
radiologists for the entire continent so
most countries don't have any so if your
kid is sick and they need something
diagnosed through medical imaging the person
even if you're able to get medical
imaging done the person that looks at it
will be you know a nurse at best yeah
but actually in India for example and in
China almost no x-rays are read by
anybody by any trained professional
because they don't have enough so if
instead we had an algorithm that could
take the most likely high-risk 5% and
say triage basically say okay somebody
needs to look at this it would massively
change the kind of what's
possible with medicine in the developing
world and remember
increasingly they have money they're the
developing world they're not the poor
world they're developing so they have the money so
they're building the hospitals
they're getting the diagnostic equipment
but there's no way for a very
long time will they be able to have the
expertise shortage of expertise
okay and that's where the deep learning
systems could step in and magnify the
expertise they do have exactly yeah so you do
see just to linger a little bit longer
on the interaction do you still see the
human expert still at the core of these
systems yeah absolutely is there
something in medicine that could be
automated almost completely I don't see
the point of even thinking about that
because we have such a shortage of
people why would we
want to find a way not to use them like
we have people so the idea of like even
from an economic point of view if you
can make them 10x more productive
getting rid of the person doesn't impact
your unit economics at all and it
totally ignores the fact that there are
things people do better than machines so
it's just to me that's not a useful way
of framing the problem I guess
just to clarify I guess I meant there
may be some problems where you can avoid
even going to the expert ever sort of
maybe preventive care or some basic
stuff the low-hanging fruit allowing the
expert to focus on the things that are
that are really that well that's what
the triage would do right so the triage
would say okay it's 99
percent sure there's nothing here right
so you know that can be done on device
and they can just say okay go home so
the experts are being used to look at
the stuff which has some chance it's
worth looking at which most things
it's not you know it's fine why do you
think we haven't quite made progress on
that yet in terms of the the scale of
how much AI is applied in the medical field
there's a lot of reasons I mean one is
it's pretty new I only started
Enlitic in like 2014 and before that like
it's hard to express to what degree the
medical world was not aware of the
opportunities here so I went to RSNA
which is the world's largest radiology
conference and I told everybody I could
you know like I'm doing this thing this
deep learning please come and check it
out and no one had any idea what I was
talking about and no one had any
interest in it so like we've come from
absolute zero which is hard and then the
whole regulatory framework education
system everything is just set up to
think of doctoring in a very different
way so today there is a small number of
people who are deep learning
practitioners and doctors at the same
time and we're starting to see the
first ones come out of their PhD
programs so Zak Kohane over in
Boston Cambridge has a number of
students now who are data science
experts deep learning experts and
actual medical doctors quite a few
doctors have completed
our fast.ai course now and are
publishing papers and creating journal
reading groups in the American Council
of radiology and like it's just starting
out but it's going to be a long process
the regulators have to learn how to
regulate this they have to build you
know guidelines and then the lawyers at
hospitals have to develop a new way of
understanding that sometimes it makes
sense for data to be you know looked at
in raw form in large quantities in order
to create world-changing results yeah so
the regulation around data all that it
sounds like that's probably the hardest
problem but sounds reminiscent of
autonomous vehicles as well many of the
same regulatory challenges many of the
same data challenges yeah I mean funnily
enough the problem is less the
regulation and more the interpretation
of that regulation by lawyers in
hospitals so HIPAA actually was
designed to the P in HIPAA
does not stand for privacy
it stands for portability it's actually
meant to be a way that data can be used
and it was created with lots of gray
areas because the idea is that would be
more practical and would help people to
use this this legislation to actually
share data in a more thoughtful way
unfortunately it's done the opposite
because when a lawyer sees a gray area
they see oh if we don't know we won't
get sued then we can't do it
so HIPAA is not exactly the problem the
problem is more that hospital
lawyers are not incented to make bold
decisions about data portability or even
to embrace technology that saves lives
no they more want to not get in trouble
for embracing that technology right but also it
also saves lives in a very abstract way
which is like oh we've been able to
release these hundred thousand anonymized
records I can't point at the
specific person whose life that's saved
I can say like oh we've ended up with
this paper which found this result which
you know diagnosed a thousand more
people than we would have
otherwise but it's like which ones were
helped it's very abstract and on
the counter side of that you may be able
to point to a life that was taken
because of something yeah or
a person whose privacy was violated
it was like oh this specific person you
know was de-identified and then
re-identified just a fascinating topic
we're jumping around I'll get back to
fast AI but on the question of privacy
data is the fuel for so much innovation
in deep learning what's your sense on
privacy whether we're talking about
Twitter Facebook YouTube just the
technologies like in the medical field
that rely on people's data in order to
create impact how do we get that right
respecting people's privacy and yet
creating technology that just learns
from data one of my areas of focus is on
doing more with less data which so most
vendors unfortunately are strongly
incented to find ways to require more
data and more computation so Google and
IBM being the most obvious IBM yeah so
Watson you know so Google and IBM both
strongly push the idea that you have to
be you know that they have more data and
more computation and more intelligent
people than anybody else and so you have
to trust them to do things because
nobody else can do it and Google's very
upfront about this like Jeff Dean has
gone out there and given talks and said
our goal is to require a thousand times
more computation but less people our
goal is to use the people that you have
better and the data you have better and
the computation you have better so one
of the things that we've discovered is
or or at least highlighted is that you
very very very often don't need much
data at all and so the data you already
have in your organization
we'll be enough to get state-of-the-art
results so like my starting point would
be to kind of say around privacy is a
lot of people are looking for ways to
share data and aggregate data but I
think often that's unnecessary they
assume that they need more data than
they do because they're not familiar
with the basics of transfer learning
which is this critical technique for
needing orders of magnitude less data is
your sense one reason you might want to
collect data from everyone is like in
the recommender system context where
your individual Jeremy Howard's
individual data is the most useful for
providing a product that's
impactful for you so for giving you
advertisements for recommending to you
movies for doing medical diagnosis is
your sense we can build with a small
amount of data general models that will
have a huge impact for most people that
we don't need to have data from each individual
on the whole I'd say yes I mean there are
things like you know recommender systems
have this cold-start problem where you
know Jeremy is a new customer we haven't
seen him before so we can't recommend
him things based on what else he's
bought and liked with us and there's
various workarounds to that like in a
lot of music programs we'll start out by
saying which of these artists do you like
which of these albums do you like which
of these songs do you like Netflix used
to do that nowadays they they tend not
to people kind of don't like that
because they think oh we don't want to
bother the user so you could work around
that by having some kind of data sharing
where you get my marketing record from
Acxiom or whatever and try to guess from
that to me the benefit to me and to
society of saving me five minutes on
answering some questions versus the
negative externalities of the privacy
issue doesn't add up so I think like a
lot of the time the places where people
are
invading our privacy in order to provide
convenience is really about just trying
to make them more money and they
move these negative externalities into
places that they don't have to pay for
them so when you actually see
regulations appear that actually cause
the companies that create these negative
externalities to have to pay for it
themselves
they say well we can't do it anymore so
the cost is actually too high right but
for something like medicine yeah I mean
the hospital has my you know medical
imaging my pathology studies my medical
records and also I own my medical data
so you can so I helped a startup
called doc.ai one of the things doc.ai
does is that it has an app you can
connect to you know Sutter Health and
LabCorp and Walgreens and download your
medical data to your phone and then
upload it again at your discretion to
share it as you wish so with that kind
of approach we can share our medical
information with the people we want to
yes so control I mean really being
able to control who you share it with and so on
yeah so that has a beautiful
interesting tangent but to return back
to the origin story of fast.ai
right so before I started fast.ai I
spent a year researching where are the
biggest opportunities for deep learning
because I knew from my time at Kaggle in
particular that deep learning had kind
of hit this threshold point where it was
rapidly becoming the state of the art
approach in every area that looked at
it and I've been working with neural
nets for over 20 years I knew that from
a theoretical point of view once it hit
that point it would do that in kind of
just about every domain and so I kind of
spent a year researching what are the
domains it's going to have the biggest
low-hanging fruit in the shortest time
period
I picked medicine but there were so many I could
have picked and so there was a kind of
level of frustration for me of like okay
I'm really glad we've opened up the
medical deep learning world and today it's
huge as you know but we can't do you
know I can't do everything I don't even
know like in medicine it
took me a really long time to even get a
sense of like what kind of problems do
medical practitioners solve what kind of
data do they have who has that data so I
kind of felt like I need to approach
this differently if I want to maximize
the positive impact of deep learning
rather than me picking an area and
trying to become good at it and building
something I should let people who are
already domain experts in those areas
and who already have the data do it
themselves mm-hmm so that was the reason
for fast AI is to basically try and
figure out how to get deep learning into
the hands of people who could benefit
from it and help them to do so in as
quick and easy and effective way as
possible got it so sort of empower
the domain experts yeah and like
partly it's because like unlike most
people in this field
my background is very applied and
industrial like my first job was
at McKinsey and company I spent 10 years
in management consulting I spent a lot
of time with domain experts you know so
I kind of respect them and appreciate
them and I know that's where the
value generation in society is and so I
also know how most of them can't code
and most of them don't have the time to
invest you know three years in a
graduate degree or whatever so it's like
how do I upskill those domain experts I
think it would be a super powerful thing
you know biggest societal impact I could
have so that yeah that was the thinking
so so much of fast AI students and
researchers and the things you teach are
pragmatically minded right practically
minded figuring out ways how to
solve
real problems and fast right so from
your experience what's the difference
between theory and practice of deep
learning well most of the research in
the deep learning world is a total waste
of time all right that's what I was
getting at yeah it's it's a problem in
science in general scientists need to be
published which means they need to work
on things that their peers are extremely
familiar with and can recognize in
advance in that area so that means that
they all need to work on the same thing
and so it really and the thing
they work on there's nothing to
encourage them to work on things that
are practically useful so you get just a
whole lot of research which is minor
advances in stuff that's been very
highly studied and has no significant
practical impact whereas the things
that really make a difference like I
mentioned transfer learning like if we
can do better at transfer learning then
it's this like world-changing thing
where suddenly like lots more people can
do world-class work with less resources
and less data but almost nobody
works on that or another example active
learning which is the study of like how
do we get more out of the human beings
in the loop that's my favorite topic
yeah so active learning is great but
there's almost nobody working on it because
it's just not a trendy thing right now
you know what sorry to
interrupt
you're saying that nobody is publishing
on active learning but there's people
inside companies anybody who actually
has to solve a problem they're going to
innovate on active learning yeah
everybody kind of reinvents active
learning when they actually have to work
in practice because they start labeling
things and they think gosh this is
taking a long time and it's very
expensive and then they start thinking
well why am I labeling everything
the machine's only making mistakes
on those two classes they're the hard
ones maybe I'll just start labeling
those two classes and then you start
thinking well why did I do that manually
why can't I just get the system to tell
me which things are going to be hardest
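[editor's sketch] The idea of getting the system to tell you which items are hardest is often implemented as uncertainty sampling. The snippet below is a minimal illustration of one common variant (least confidence), not the method used by any company or tool mentioned in the conversation. The function name least_confident and the example probabilities are made up for illustration.

```python
from typing import List, Sequence


def least_confident(probabilities: Sequence[Sequence[float]], k: int) -> List[int]:
    """Return indices of the k items the model is least sure about.

    probabilities[i] holds the model's predicted class probabilities for
    unlabeled item i. An item's confidence is its highest class probability,
    so a low value means the model is close to guessing.
    """
    confidence = [max(p) for p in probabilities]
    # Sort item indices from least to most confident and keep the first k.
    ranked = sorted(range(len(confidence)), key=lambda i: confidence[i])
    return ranked[:k]


# Hypothetical predictions for four unlabeled items over three classes.
preds = [
    [0.98, 0.01, 0.01],  # confident
    [0.40, 0.35, 0.25],  # very unsure
    [0.55, 0.44, 0.01],  # torn between two classes
    [0.90, 0.05, 0.05],  # confident
]

# Send only these items to the human labeler, retrain, and repeat.
to_label = least_confident(preds, k=2)
```

Here the two items chosen for labeling are the ones where the top probability is lowest, which matches the intuition in the conversation of spending labeling effort on the hard cases first.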
it's an obvious thing to do but
yeah it's just like transfer
learning it's under studied and the
academic world just has no reason to
care about practical results the funny
thing is like I've only really ever
written one paper I hate writing papers
and I didn't even write it it was my
colleague Sebastian Ruder who actually
wrote it I just did the research
for it but it was basically introducing
transfer learning successful transfer
learning to NLP for the first time the
algorithm is called ULMFiT and
I actually wrote it for the
course for the fast AI course I
wanted to teach people NLP and I
thought I only want to teach people
practical stuff and I think the only
practical stuff is transfer learning and
I couldn't find any examples of transfer
learning in NLP so I just did it and I
was shocked to find that as soon as I
did it which you know the basic prototype
took a couple of days it smashed the
state-of-the-art on one of the most
important data sets in a field that I
knew nothing about and I just thought
well this is ridiculous
and so I spoke to Sebastian about it and he
kindly offered to write up the
results and so it ended up being
published in ACL which is the top
computational linguistics
conference so like people do actually
care once you do it but I guess it's
difficult for maybe like junior
researchers or like I don't care
whether I get citations or papers
or whatever right there's nothing in
my life that makes that important which
is why I've never actually bothered to
write a paper myself now for people who
do I guess they have to pick the kind of
safe option which is like yeah make a
slight improvement on something that
everybody is already working on yeah
nobody does anything interesting or
succeeds in life with the safe option
I mean the nice thing is nowadays
everybody is now working on you know
transfer learning because since that
time we've had GPT and GPT-2 and BERT
and you know it's like yeah once
you show that something is possible
everybody jumps in I guess
I hope to be a part of and I hope to see
more innovation in active learning in
the same way I think yeah transfer learning
and active learning are fascinating
open problems I actually helped start
a startup called platform AI which is
really all about active learning and
yeah it's very interesting trying to
kind of see what research is out there
and make the most of it and there's
basically none so we've had to do all
our own research once again just as
you described can you tell the story of
the Stanford competition DAWNBench and
fast AI's achievement on it sure so
something which I really enjoy is that I
basically teach two courses a year
the practical deep learning for coders
which is kind of the introductory course
and then cutting-edge deep learning for
coders which is the kind of research
level course and while I teach those
courses I have a I basically have a big
office at the University of San
Francisco big enough for like 30 people
and I invite anybody any student who
wants to come and hang out with me while
I build the course and so generally it's
full and so we have twenty or thirty
people in a big office with nothing to
do but study deep learning so it was
during one of these times that somebody
in the group said oh there's a thing
called DAWNBench it looks interesting
and I was like what the hell is that
it's about some competition to see how
quickly you can train a model seems kind
of not exactly relevant to what we're
doing but it sounds like the kind of
thing which you might be interested in I
checked it out and I said oh crap
there's only ten days till it's over
it's pretty much too late and we're kind of
busy trying to teach this course but
we were like oh it would make an
interesting case study for the course
like it's all the stuff we're
already doing why don't we just put
together our current best practices and
ideas so me and I guess about four
students just decided to give it a go
and we focused on this small one called
CIFAR-10 which is little 32 by 32
pixel images can you say what DAWNBench is yeah
so it's a competition to train a model
as fast as possible it was run by
Stanford
and as cheap as possible too that's also
another one for as cheap as possible and
there were a couple of categories
ImageNet and CIFAR-10 so ImageNet is this
big 1.3 million image thing that took a
couple of days to train I remember a
friend of mine Pete Warden who's now at
Google I remember he told me how he
trained imagenet a few years ago and he
basically like had this little granny
flat out the back that he turned into
his ImageNet training center and he
figured you know after like a year of
work he figured out how to train it in
like ten days or something it's like
that was a big job whereas CIFAR-10 at
that time you could train in a few hours
you know it's much smaller and easier so
we thought we'd try CIFAR-10 and yeah
I'd really never done that before like
I'd never really like things like
using more than one GPU at a time was
something I tried to avoid because to me
it's like very against the whole idea of
accessibility you should be able to do things
with one GPU I mean have you asked in
the past before after having
accomplished something how do I do this
faster much faster oh always but
for me it's always how do I make
it much faster on a single GPU
that a normal person could afford in
their day-to-day life it's not how could
I do it faster by you know having a huge
data center because to me it's all
about like as many people should be able to
use something as possible without
fussing around with infrastructure so
anyway so in this case it's like well we
can use eight GPUs just by renting an AWS
machine so we thought we'd try that and
yeah basically using the stuff we were
already doing we were able to get you
know the speed you know within a few
days we had the speed down to I don't
know a very small number of
minutes I can't remember exactly how
many minutes it was but it might have been
like 10 minutes or something and so yeah
we found ourselves at the top of the
leaderboard easily for both time and
money which really shocked me because
the other people competing in this were
like Google and Intel and stuff who
like know a lot more about this stuff
than I think we do so then we were emboldened
we thought let's try the ImageNet one
too way out of our league but our goal
was to get under 12 hours yeah and we
did which was really exciting and but we
didn't put anything up on the
leaderboard but we were down to like 10
hours but then Google put in something like 5
hours or something and we were like oh
we're so screwed but we kind of
thought we'll keep trying you know if
Google can do it in five I mean Google did
it in five hours on something like a TPU pod
or something like a lot of hardware but
we kind of like had a bunch of ideas to
try like a really simple thing was why
are we using these big images they're
like 224 or 256 by 256 pixels you know why
don't we try smaller ones and just to
elaborate there's a constraint on the
accuracy that your trained model is
supposed to achieve yeah you got to
achieve 93% I think it was for imagenet
exactly which is very tough so you have
to yeah 93% like they think that they
picked a good threshold it was a little
bit higher than what the most commonly
used ResNet 50 model could achieve at
that time so yeah so it's quite a
difficult problem to solve but yeah we
realized if we actually just use 64 by
64 images it trained a pretty good model
and then we could take that same model
and just give it a couple of epochs to
learn 224 by 224 images and it was
basically already trained it makes a lot
of sense like if you teach somebody like
here's what a dog looks like and you
show them low res versions and then you
say here's a really clear picture of a
dog they already know what a dog looks
like so with that like we just jumped to the
front and we ended up winning
parts of that competition we actually
ended up doing a distributed version
over multiple machines a couple of
months later and ended up at the top of
the leaderboard we had 18 minute ImageNet
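[editor's sketch] The low-resolution-first trick described above (train on 64 by 64 images, then give the same model a couple of epochs at 224 by 224) can be written down as a simple schedule. This is only an illustration: the epoch counts are hypothetical, the names progressive_schedule, run and relative_pixel_cost are made up, and counting pixels is a rough assumed proxy for compute, not a measurement from the actual DAWNBench entry.

```python
from typing import Callable, List, Tuple

Stage = Tuple[int, int]  # (image side length in pixels, number of epochs)


def progressive_schedule(small: int, full: int,
                         total_epochs: int, full_epochs: int) -> List[Stage]:
    """Spend most epochs on small images and only the last few on full size."""
    if not 0 < full_epochs < total_epochs:
        raise ValueError("full_epochs must be between 1 and total_epochs - 1")
    return [(small, total_epochs - full_epochs), (full, full_epochs)]


def run(schedule: List[Stage], train_one_epoch: Callable[[int], None]) -> None:
    """Call train_one_epoch(size) once per epoch, reusing the same model."""
    for size, epochs in schedule:
        for _ in range(epochs):
            train_one_epoch(size)


def relative_pixel_cost(schedule: List[Stage], full: int) -> float:
    """Pixels processed relative to training every epoch at full size."""
    cost = sum(epochs * size * size for size, epochs in schedule)
    baseline = sum(epochs for _, epochs in schedule) * full * full
    return cost / baseline


# Hypothetical split: 8 epochs at 64 by 64, then 2 epochs at 224 by 224.
schedule = progressive_schedule(small=64, full=224, total_epochs=10, full_epochs=2)

# Stand-in for a real training step: just record which size each epoch used.
sizes_seen: List[int] = []
run(schedule, sizes_seen.append)

ratio = relative_pixel_cost(schedule, full=224)
```

Under these made-up numbers the schedule touches roughly a quarter of the pixels that ten full-size epochs would, which is the intuition behind why the small-image phase is so cheap.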
yeah and it was and people have just
kept on blasting through again and again
since then so so what's your view on
multi-gpu or multiple machine training
in general as as a way to speed code up
I think it's largely a waste of time
both multi-gpu on a single machine and
yeah particularly multi machines because
it's just clunky multi-GPU is less
clunky than it used to be but to me
anything that slows down your iteration
speed is a waste of time so you could
maybe do your very last you know
perfecting of the model on multiple GPUs if
you need to but so for example I think
doing stuff on imagenet is generally a
waste of time why test things on 1.3
million images most of us don't use 1.3
million images and we've also done
research that shows that doing things on
a smaller subset of images gives you the
same relative answers anyway so from a
research point of view why waste that
time so actually I released a couple of
new data sets
recently one is called Imagenette the
French ImageNet which is a small subset
of ImageNet which is designed to be
easy to classify how do you spell
Imagenette it's got an extra T and E at
the end because it's very French
okay and then another one
called Imagewoof which is a subset of
ImageNet that only contains dog
breeds
that's a hard one right that's a hard
one yeah and I've discovered that if you
just look at these two subsets you can
train things on a single GPU in ten
minutes and the results you get are directly
transferable to ImageNet nearly all the
time and so now I'm starting to see some
researchers start to use these smaller datasets
I so deeply love the way you think
because I think you might have written a
blog post saying that sort of going with
these big data sets is encouraging
people to not think creatively
absolutely so it sort of
constrains you to train on large
resources and because you have these
resources you think more resources will
be better and then you start like
somehow you kill the creativity
yeah and even worse than that Lex I keep
hearing from people who say I decided
not to get into deep learning because I
don't believe it's accessible to people
outside of Google to do useful work so
like I see a lot of people make an
explicit decision to not learn this
incredibly valuable tool because
they've drunk the Google kool-aid which
is that only Google's big enough and
smart enough to do it and I just find
that so disappointing and it's so wrong
and I think all the major breakthroughs
in AI in the next twenty years will be
doable on a single GPU
like I would say my sense is all the big
sort of well let's put it this way none
of the big breakthroughs of the last 20
years have required multiple GPUs
so like batch norm ReLU dropout
to demonstrate it every one of them
yeah none of them required multiple GPUs GANs
the original GANs didn't require
multiple GPUs well and we've actually
recently shown that you don't even need
GANs so we've developed GAN level
outcomes without needing GANs and we
can now do it with again by using
transfer learning we can do it in a
couple of hours on a single GPU
just a generator model like without the adversarial part
yeah
so we've found loss functions that work
super well without the adversarial part
and then one of our students a guy called
Jason Antic has created
a system called DeOldify which uses this technique
to colorize old black-and-white movies
you can do it on a single GPU colorize a
whole movie in a couple of hours and one
of the things that Jason and I did
together was we figured out how to add a
little bit of GAN at the very end which it
turns out for colorization makes it just
a bit brighter and nicer and then Jason
did masses of experiments to figure out
exactly how much to do but it's still
all done on his home machine on a single
GPU in his lounge room and like if you
think about like colorizing Hollywood
movies that sounds like something a huge
studio would have to do but he has
the world's best results on this there's
this problem of microphones we're just
talking to microphones now yeah it's
such a pain in the ass to have these
microphones to get good quality audio
and I tried to see if it's possible to
plop down a bunch of cheap sensors and
reconstruct higher quality audio from
multiple sources because right now I
haven't seen work on okay we can take
inexpensive mics automatically combining
audio from multiple sources to improve
the combined audio right people haven't
done that and that feels like a learning
problem alright so hopefully somebody
can well I mean it's it's eminently
doable and it should have been done by
now
I felt the same way about
computational photography four years ago
that's right
why are we investing in big lenses when
three cheap lenses plus actually a
little bit of intentional movement so
like hold it you know like take a few
frames gives you enough information to
get excellent sub pixel resolution which
particularly with deep learning
you would know exactly what you're meant to
be looking at we can totally do the same
thing with audio I think it's
madness that it hasn't been done yet has
there been progress on the photography
side yeah computational photography is
basically standard now so the Google
Pixel Night Sight I don't know if
you've ever tried it but it's
astonishing you take a picture in almost
pitch black and you get back a very high
quality image and it's not because of
the lens same stuff with like adding the
bokeh to the you know the
background blurring it's done
computationally this is the Pixel right here
yeah basically everybody now is
doing most of the fanciest stuff on
their pho