Transcript
J6XcP4JOHmk • Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
Kind: captions
Language: en
The following is a conversation with Jeremy Howard. He's the founder of fast.ai, a research institute dedicated to making deep learning more accessible. He's also a Distinguished Research Scientist at the University of San Francisco, a former president of Kaggle, as well as a top-ranking competitor there. And in general he's a successful entrepreneur, educator, researcher, and an inspiring personality in the AI community. When someone asks me how do I
get started with deep learning, fast.ai is one of the top places I point them to. It's free, it's easy to get started, it's insightful and accessible, and, if I may say so, it has very little of the BS that can sometimes dilute the value of educational content on popular topics like deep learning. fast.ai has a focus
on practical application of deep
learning and hands-on exploration of the
cutting edge, in a way that is both accessible to beginners and useful to experts. This is the Artificial
Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with Jeremy Howard.
What's the first program you've ever written?
The first program I wrote that I remember would be in high school. I did an assignment where I decided to try to find out if there were, like, better musical scales than the normal twelve-tone, twelve-interval scale. So I wrote a program on my Commodore 64, in BASIC,
to search through other scale sizes, to see if you could find one where there were more accurate harmonies. Like, where a fifth would be an actual, exact 3-to-2 ratio, whereas with a twelve-interval scale it's not exactly 3 to 2. That's, you know, the "well-tempered" scale.
And in BASIC, on a Commodore 64.
Yeah.
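(A rough modern reconstruction, in Python, of the kind of search that BASIC program might have done; the range of scale sizes and the just 3:2 fifth as the accuracy target are illustrative assumptions, not details from the conversation.)

```python
import math

# For each equal-division-of-the-octave scale size n, find the interval
# (k steps, frequency ratio 2**(k/n)) closest to a just perfect fifth (3:2),
# and report the error in cents (1200 cents = one octave).
TARGET = 3 / 2
for n in range(5, 31):
    best_err = min(abs(1200 * math.log2(2 ** (k / n) / TARGET)) for k in range(1, n))
    print(f"{n:2d}-interval scale: best fifth is off by {best_err:5.2f} cents")
```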
Where was the interest in music from? Or is it just...
I took music all my life, so I played saxophone and clarinet and piano and guitar and drums and whatever.
So how
does that thread go through your life? Where's music today?
Yeah, it's not where I wish it was. For various reasons, I
couldn't really keep it going
particularly because I had a lot of
problems with RSI with my fingers and so
I had to kind of cut back anything that used hands and fingers. I hope one day I'll be able to get back to it, health-wise.
So there's a love for music underlying it all?
Yeah.
What's your favorite instrument?
Saxophone.
Saxophone?
Baritone saxophone. Well, probably bass saxophone, but they're awkward.
Well, I
always love it when music is coupled
with programming; there's something about a brain that utilizes both that emerges with creative ideas. So you've used and
studied quite a few programming
languages. Can you give an overview of what you've used, and the pros and cons of each?
Well, my favorite
programming environment almost certainly
was Microsoft Access back in like the
earliest days. That was Visual Basic for Applications, which is not a good programming language, but the programming environment was fantastic. It's like the
ability to create you know user
interfaces and tie data and actions to
them, and create reports, and all that; I've never seen anything as good. There are things nowadays, like Airtable, which are like small subsets of that, which people love for good reason, but unfortunately nobody's ever achieved anything like it since.
What is that? If you
could pause on that for a second. Access... is it a database?
It was a database program that Microsoft produced as part of Office. It had, you know, wizards, but basically it lets you, in a totally
graphical way create tables and
relationships and queries and tie them
to forms and set up you know event
handlers and calculations. It was a very powerful system, designed not for massively scalable things but for, like, useful little applications, that I loved.
So what's the connection between Excel and Access?
Very close. Access kind of was the relational-database equivalent, if you like. People still do a lot of stuff that should be in Access in Excel, because they don't know better. I mean, Excel is great as well, but it's just not as rich a programming model as VBA combined with a relational database. And so I've always
loved relational databases but today
programming on top of a relational
database is just a lot more of a
headache. You know, you generally need something that runs some kind of database server, unless you use SQLite, which has its own issues.
Then, if you want to get a nice programming model, you'll often need to add an ORM on top, and then, I don't know, there are all these pieces to tie together, and it's just a lot more awkward than it should be. There are
people that are trying to make it easier
In particular, I think of F#; you know, Don Syme, who with his team has done a great job of making something
like a database appear in the type
system so you actually get like tab
completion for fields and tables and
stuff like that. Anyway, so that whole VBA/Office thing, I guess, was a starting point, which I still miss. Then I got into standard Visual Basic.
That's interesting; just to pause on that for a second, it's interesting
that you're connecting programming
languages to the ease of management of
data.
Yeah.
So in your use of programming languages, you've always had a love and a connection with data.
I've always been
interested in doing useful things for
myself and for others which generally
means getting some data and doing
something with it and putting it out
there again so that's been my interest
throughout so I also did a lot of stuff
with Apple script back in the early days
so it's kind of nice being able to get
the computer and computers to talk to
each other and to do things for you and
And then, I think, the programming language I most loved back then would have been Delphi, which was Object Pascal, created by Anders Hejlsberg, who previously did Turbo Pascal, then went on to create .NET, and then went on to create TypeScript. Delphi was amazing
because it was like a compiled fast
language that was as easy to use as
Visual Basic.
Delphi: what is it similar to in more modern languages?
Visual Basic?
Visual Basic, yeah, but a compiled, fast version. I'm not sure there's anything quite like it anymore. If you took, like, C# or Java and got rid of the virtual machine and replaced it with something that compiled to a small, tight binary... I
feel like that's where Swift could get to, with the new SwiftUI and the cross-platform development going on. That's one of my dreams: that we'll hopefully get back to where Delphi was.
There is actually a Free Pascal project nowadays, called Lazarus, which is also attempting to kind of recreate Delphi, and they're making good progress.
Okay, so Delphi is one of your favorite programming languages?
Programming environments. Again, Pascal's not a nice language. If you wanted to know specifically what languages I like, I would definitely pick J as being an amazingly wonderful language.
What's J?
J. Are you aware of APL?
I am not, beyond doing a little research on the work you've done.
Okay. So it's not at all surprising you're not familiar with it, because it's not that well known. But it's
actually one of the main families of
programming languages going back to the
late 50s, early 60s. There were a couple of major directions. One was the kind of lambda calculus, Alonzo Church direction, which became, I guess, Lisp and Scheme and whatever, which has a history going back to the early days of computing. The second was the kind of imperative/OO direction: you know, Algol, Simula, going on to C, C++, and so forth. There was a
third, which is called array-oriented languages, which started with a paper by a guy called Ken Iverson, which was actually a math theory paper, not a programming paper. It was called "Notation as a Tool for Thought". And it was
development of a new way a new type of
math notation and the idea is that this
math notation would be was was much more
flexible expressive
and also well-defined then traditional
math notation which is none of those
things math notation is awful and so he
actually turned that into a programming
language and because this was the early
50s although that's very late 50s
although names were available so he
called his language a programming
language or APL ABL APL is a
implementation of notation as a tool for
thought by which he means math notation
and Ken and his son went on to do many
things but eventually they actually
produced you know a new language that
was built on top of all the learnings of
APL, and that was called J. J is the most expressive, composable, beautifully designed language I've ever seen.
Does it have object-oriented components, that kind of thing?
Not really; it's an array-oriented language. It's the third path.
Array-oriented? What does it mean to be array-oriented?
Array-oriented means that you generally don't use any loops; the whole thing is done with kind of an extreme version of broadcasting, if you're familiar with that NumPy/Python concept. So you do a
lot with one line of code it looks a lot
like math notation, basically.
Highly compact.
Mm-hm. And the idea is that, because you can do so much with one line of code, a single screen of code... you very rarely need more than that for the rest of your program. And so you can kind of keep it all in your head and clearly communicate it.
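(For readers who know NumPy, the flavor of that "extreme broadcasting" style can be sketched like this; the arrays are arbitrary examples.)

```python
import numpy as np

x = np.random.rand(1000, 3)                          # 1000 three-dimensional vectors
unit = x / np.linalg.norm(x, axis=1, keepdims=True)  # normalize every row: no loop

# An outer product by broadcasting a column against a row: a 5x4 times table.
table = np.arange(1, 6)[:, None] * np.arange(1, 5)
```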
It's interesting that APL created two main branches, K and J. J is this kind of open-source, niche community of crazy enthusiasts like me. And then the other path, K, was fascinating: it's an astonishingly expensive programming language, which many of the world's most ludicrously rich hedge funds use. The entire system is so small it sits inside the level-3 cache on your CPU, and it easily
wins every benchmark I've ever seen in
terms of data processing speed
But you don't come across it very much, because it's, like, $100,000 per CPU to run. But this path of programming languages is just so much more powerful, in every way, than the ones that almost anybody uses every day.
So it's all about computation; it's really pretty heavily focused on computation?
I mean, so much of programming is data processing, by definition, so there's a lot of things you can do with it. But yeah, there's not much work being done on making, like, user-interface toolkits or whatever. I mean, there's some, but they're not great.
At the same time,
you've done a lot of stuff with Perl and
Python.
Yeah.
So where does that fit into the picture of J and K and APL?
Well, you know, it's much more pragmatic. In the end, you kind of have to end up where the libraries are, you know. Because, to me, my focus is
on productivity I just want to get stuff
done and solve problems. So Perl was great for... I created an email company called FastMail, and Perl was great because, back in the late 90s and early 2000s, it just had a lot of stuff it could do. I still had to write my own monitoring system and my own web framework, my own whatever, because none of that stuff existed, but it was a super flexible language to do that in.
And you used Perl for FastMail; you used it as the back end, so everything was written in Perl?
Yeah. Everything, everything was Perl.
Why do
you think Perl hasn't succeeded or
hasn't dominated the market, where Python really took over?
Yeah. Well, I mean, Perl did dominate. It was, for a time, everything, everywhere. But then the guy behind Perl, Larry Wall, kind of just didn't put the time in anymore.
And no project can be successful, particularly one that starts with a strong leader, if it loses that strong leadership. So then Python kind of replaced it. You know, Python is a lot less elegant
language in nearly every way but it has
the data science libraries and a lot of
them are pretty great so I kind of use
it because it's the best we have but
it's definitely not good enough.
What do you think the future of programming looks like? What do you hope it looks like, if we zoom in on the computational fields, on data science, on machine learning?
I hope Swift is
successful, because the goal of Swift, the way Chris Lattner describes it, is to be infinitely hackable, and that's what I
want I want something where me and the
people I do research with and my
students can look at and change
everything from top to bottom there's
nothing mysterious and magical and
inaccessible. Unfortunately, with Python it's the opposite of that: because Python's so slow, it's extremely unhackable. You get to a point where it's like, okay, from here on down it's C, so your debugger doesn't work in the same way, your profiler doesn't work in the same way, your build system doesn't work in the same way. It's really not very hackable at all.
What's the part you would like to be hackable? Is it for the objective of optimizing training of neural networks, inference in neural networks? Is it the performance of the system, or is there something non-performance-related?
It's a greater thing. In the end, I want to be productive as a practitioner. So that means that, so, like,
at the moment our understanding of deep
learning is incredibly primitive there's
very little we understand most things
don't work very well, even though they work better than anything else out there. There are so many opportunities to make it better. So you look at any domain area, like, I
don't know speech recognition with deep
learning or natural language processing
classification with deep learning or
whatever every time I look at an area
with deep learning I always see like oh
it's terrible there's lots and lots of
obviously stupid ways to do things that
need to be fixed so then I want to be
able to jump in there and quickly
experiment and make them better.
And the programming language has a role in that?
A huge role, yes. So currently, Python
has a big gap in terms of our ability to
innovate particularly around recurrent
neural networks and natural language
processing, because it's so slow. The actual loop where we loop through words... we have to do that whole thing in CUDA C, so we actually can't innovate with the kernel, the heart of that most important algorithm. It's just a huge problem, and this happens all over the place, so we hit, you know, research limitations.
Another example: convolutional neural networks, which are actually the most popular architecture for lots of things, maybe most things, in deep learning. We almost certainly should be using sparse convolutional neural networks, but only, like, two people are, because to do it you have to rewrite all of that CUDA C-level stuff. And, yeah, researchers
and practitioners don't. So, like, there are just big gaps between what people actually research and what people actually implement, because of the
programming language problem.
So you think it's just too difficult to write in CUDA C, and a higher-level programming language like Swift should enable easier fooling around, creative stuff, with RNNs or sparse convolutional neural networks? Who's at fault, who's in charge of making it easy for a researcher to play?
I mean, no one's at fault; just nobody's got around to it yet, or it's just hard. And, I mean,
part of the fault is that we ignored
that whole APL kind of direction; nearly everybody did, for 50 or 60 years. But recently people have been starting to reinvent pieces of that and kind of create some interesting new directions in compiler technology. The place where that's particularly happening right now is something called MLIR, which is something that, again, Chris Lattner, the Swift guy, is leading. Because it's actually not going to be Swift on its own that solves this problem; the problem is that currently, writing an acceptably fast, you know, GPU program is too complicated, regardless of what language you use. And that's just because you have to deal with the
fact that I've got you know 10,000
threads and I have to synchronize
between them all and I have to put my
thing into grid blocks and think about
warps and all this stuff. It's just so much boilerplate; to do that well, you have to be a specialist at it, and it's going to be a year's work to, you know, optimize that algorithm in that way.
But with things like Tensor Comprehensions, and Tile, and MLIR, and TVM, there are all these various projects which are all about saying: let's let people create, like, domain-specific languages for tensor computations, the kinds of things we generally do on the GPU for deep learning, and then have a compiler which can optimize that tensor computation. A lot of this work is actually sitting on top of a project called Halide, which is a mind-blowing project where they came up with such a domain-specific language; in fact, two: one domain-specific language for expressing "this is what my tensor computation is", and another domain-specific language for expressing "this is the way I want you to structure the compilation of that, like, do it block by block and do these bits in parallel". They were able to show how you can compress the amount of code by 10x compared to optimized GPU code and get the same performance.
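(Halide's separation of "algorithm" from "schedule" can be loosely illustrated in plain Python; this is not Halide's actual syntax, and the blur, block size, and thread pool below are hypothetical stand-ins.)

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# The "algorithm": what to compute -- a horizontal box blur over a block of rows.
def blur_rows(x, lo, hi):
    return (x[lo:hi, :-2] + x[lo:hi, 1:-1] + x[lo:hi, 2:]) / 3

# The "schedule": how to execute it -- blocking and parallelism are chosen
# independently of the math above, which is the separation Halide makes.
def run_blur(x, block=256, workers=4):
    bounds = [(i, min(i + block, x.shape[0])) for i in range(0, x.shape[0], block)]
    with ThreadPoolExecutor(workers) as pool:
        return np.vstack(list(pool.map(lambda b: blur_rows(x, *b), bounds)))
```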
So these other things are kind of sitting on top of that kind of research, and MLIR is pulling a lot of those best practices together. And now we're
starting to see work done on making all
of that directly accessible through
Swift so that I could use Swift to kind
of write those domain-specific languages
and hopefully we'll get Swift CUDA kernels written in a very expressive and concise way that looks a bit like J and APL, and then Swift layers on top of that, and then a Swift UI on top of that. And,
you know it'll be so nice if we can get
to that point.
Does it all eventually boil down to CUDA and NVIDIA GPUs?
Unfortunately, at the moment it does. But one of the nice things about MLIR, if AMD ever gets their act together, which they probably won't, is that they, or others, could write MLIR backends for other GPUs, or other tensor-computation devices, of which today there are an increasing number, like Graphcore or Cerebras or whatever. So, yeah,
being able to target lots of backends
would be another benefit of this, and the market really needs competition. At the moment NVIDIA is massively overcharging for their kind of enterprise-class cards, because there is no serious competition, because nobody else is doing the
software properly.
In the cloud there is some competition, right?
Not really, other than TPUs.
But TPUs are almost unprogrammable at the moment?
You can't... the TPUs have the same problem, and in their case it's even worse. So, TPUs: Google actually made an explicit decision to make them almost entirely unprogrammable, because they felt that there was too much IP in there, and if they gave people direct access to program them, people would learn their secrets. So you can't actually directly program the memory in a TPU; you can't even directly, like, create code that runs on, and that you can look at on, the machine that has the TPU. It all goes through a virtual machine. So all you can really do is this kind of cookie-cutter thing of plugging high-level stuff together, which is just super tedious and annoying and totally unnecessary.
So tell me, if you could, the origin story of fast.ai. What is its motivation, its mission, its dream?
So I guess the
founding story is heavily tied to my previous startup, which is a company called Enlitic, which was the first company to focus on deep learning for medicine. I created that because I saw there was a huge opportunity: there's about a 10x shortage in the number of doctors the developing world needs; it's expected to take about three hundred years to train enough doctors to meet that gap. But I guessed that maybe, if we used deep learning for some of the analytics, we could make it so you don't need such highly trained doctors.
For diagnosis?
For diagnosis and treatment planning.
Where's the biggest benefit, just before we get to fast.ai, where's the biggest benefit of AI in medicine that you see today?
Not much, not much
happening today in terms of like stuff
that's actually out there it's very
early. But in terms of the opportunity, it's to take markets like India and China and Indonesia, which have big populations, and Africa, which has small numbers of doctors, and provide diagnostics, and particularly treatment planning and
triage kind of on device so that if you
do a you know test for malaria or
tuberculosis or whatever you immediately
get something where even a health care worker who's had a month of training can get a very high-quality assessment of whether the patient might be at risk, and know, okay, we'll send them off to a hospital.
so for example in Africa outside of
South Africa there's only five pediatric
radiologists for the entire continent so
most countries don't have any. So if your kid is sick and needs something diagnosed via medical imaging, even if you're able to get the imaging done, the person who looks at it will be, you know, a nurse at best.
Yeah.
but actually in India for example and in
China almost no x-rays are read by
anybody by any trained professional
because they don't have enough. So if, instead, we had an algorithm that could take the most likely high-risk 5% and triage, basically say, okay, somebody needs to look at this, it would massively change what's possible with medicine in the developing
world. And remember, increasingly they have money in the developing world; they're not impoverished. So they have the money; they're building the hospitals; they're getting the diagnostic equipment. But for a very long time there's no way they'll be able to close the expertise shortage.
A shortage of expertise, okay. And that's where the deep learning systems can step in and magnify the expertise they do have?
Exactly, yeah.
So you do
still see, just to linger on it a little bit longer, the interaction: you still see the human expert at the core of these systems?
Yeah, absolutely.
Is there something in medicine that could be automated almost completely?
I don't see the point of even thinking about that,
because we have such a shortage of
people, why would we want to find a way not to use them? We have people. So even from an economic point of view: if you can make them 10x more productive, getting rid of the person doesn't impact your unit economics at all, and it totally ignores the fact that there are things people do better than machines. So to me, that's just not a useful way of framing the problem.
I guess,
just to clarify, I meant there may be some problems where you can avoid even going to the expert, ever; maybe preventive care or some basic stuff, allowing the expert to focus on the things that are really hard.
Well, that's what the triage would do, right? The triage would say, okay, it's ninety-nine percent sure there's nothing here; and that can be done on device, and it can just say, okay, go home. So the experts are being used to look at the stuff which has some chance it's worth looking at, which, for most things, it's not, you know, it's fine.
Why do you
think we haven't quite made progress on
that yet, in terms of the scale of how much AI is applied in medicine?
There's a lot of reasons. I mean, one is it's pretty new; I only started Enlitic in, like, 2014, and before that, it's hard to express to what degree the medical world was not aware of the opportunities here. So I went to RSNA,
conference and I told everybody I could
you know like I'm doing this thing this
deep learning please come and check it
out and no one had any idea what I was
talking about and no one had any
interest in it so like we've come from
absolute zero which is hard and then the
whole regulatory framework education
system everything is just set up to
think of doctoring in a very different
way. So today there is a small number of people who are deep learning practitioners and doctors at the same time, and we're starting to see the first ones come out of their PhD programs. So Zak Kohane, over in Boston, has a number of students now who are data science experts, deep learning experts, and actual medical doctors; quite a few doctors have completed the fast.ai course now and are publishing papers and creating journal reading groups in the American College of Radiology. And, like, it's just starting
they regulators have to learn how to
regulate this they have to build you
know guidelines and then the lawyers at
hospitals have to develop a new way of
understanding that sometimes it makes
sense for data to be you know looked at
in raw form in large quantities in order
to create world-changing results he has
a regulation around data all that it
sounds it was probably the hardest
problem but sounds reminiscent of
autonomous vehicles as well many of the
same regulatory challenges meaning the
same data challenges yeah I mean funnily
enough, that problem is less about the regulation and more about the interpretation of that regulation by lawyers in hospitals. HIPAA was actually designed to... the P in HIPAA does not stand for privacy; it stands for portability. It's actually meant to be a way that data can be used, and it was created with lots of gray areas, because the idea was that that would be more practical and would help people to use this legislation to actually share data in a more thoughtful way. Unfortunately it's done the opposite, because when a lawyer sees a gray area, they see: oh, if we don't know we won't get sued, then we can't do it. So today, HIPAA is not exactly the problem; the
problem is more that hospital lawyers are not incentivized to make bold decisions about data portability.
Or even to embrace technology that saves lives.
Right; they more want to not get in trouble. But also, it saves lives in a very abstract way, which is like: oh, we've been able to release these hundred thousand anonymized records. I can't point at the specific person whose life that saved. I can say, like, oh, we ended up with this paper which found this result, which, you know, diagnosed a thousand more people than we would have otherwise, but it's like, which ones were helped? It's very abstract.
And on
the counter side of that, you may be able to point to a life that was taken because of something.
Yeah. Or a person whose privacy was violated; it's like, oh, this specific person, you know, who was de-identified got re-identified.
Just a fascinating topic;
we're jumping around I'll get back to
fast AI but on the question of privacy
data is the fuel for so much innovation
in deep learning. What's your sense on privacy, whether we're talking about Twitter, Facebook, YouTube, or the technologies, like in the medical field, that rely on people's data in order to create impact? How do we get that right, respecting people's privacy and yet creating technology that learns from data?
One of my areas of focus is on
doing more with less data. Most vendors, unfortunately, are strongly incented to find ways to require more data and more computation, Google and IBM being the most obvious.
IBM?
Yeah, Watson, you know. So Google and IBM both strongly push the idea that they have more data and more computation and more intelligent people than anybody else, and so you have to trust them to do things, because nobody else can do it. And Google's very up-front about this; like, Jeff Dean has gone out there and given talks and said, our goal is to require a thousand times more computation, but less people.
Our
goal is to use the people that you have
better, and the data you have better, and the computation you have better. So one of the things that we've discovered, or at least highlighted, is that you very, very often don't need much data at all. And so the data you already have in your organization will be enough to get state-of-the-art results. So, like, my starting point around privacy would be to say: a lot of people are looking for ways to share data and aggregate data, but I think often that's unnecessary. They assume that they need more data than they do, because they're not familiar with the basics of transfer learning, which is this critical technique for needing orders of magnitude less data.
Is
your sense... one reason you might want to collect data from everyone is, like, in the recommender system context, where your individual, Jeremy Howard's individual, data is the most useful for providing a product that's impactful for you: for giving you advertisements, for recommending movies to you, for doing medical diagnosis. Is your sense that we can build general models, with a small amount of data, that will have a huge impact for most people, such that we don't need to have data from everyone?
On the whole, I'd say yes. I mean, there are
have this cold-start problem where you
know Jeremy is a new customer we haven't
seen him before so we can't recommend
him things based on what else he's
bought and liked with us. And there are various workarounds to that. Like, a lot of music programs will start out by asking which of these artists do you like, which of these albums do you like, which of these songs do you like. Netflix used to do that; nowadays they tend not to. People kind of don't like it, because they think, oh, we don't want to bother the user. So you could work around that by having some kind of data sharing where you get my marketing record from Acxiom or whatever and try to guess from that. To me, the benefit, to me and to society, of saving me five minutes on answering some questions, versus the negative externalities of the privacy issue: it doesn't add up. So I think, like, a lot of the time, the places where people are
invading our privacy in order to provide
convenience, it's really just about trying to make more money, and they move those negative externalities to places where they don't have to pay for them. So when you actually see regulations appear that cause the companies creating these negative externalities to have to pay for it themselves, they say, well, we can't do it anymore; so the cost was actually too high.
Right. But for something like medicine?
Yeah. I mean,
the hospital has my you know medical
imaging my pathology studies my medical
records, and also, I own my medical data. So I helped a startup called doc.ai. One of the things doc.ai does is it has an app: you can connect to, you know, Sutter Health and LabCorp and Walgreens, and download your medical data to your phone, and then upload it again, at your discretion, to share it as you wish. So with that kind of approach, we can share our medical information with the people we want to.
So you have control. I mean, really being able to control who you share it with, and so on.
Yeah.
So that has been a beautiful,
interesting tangent, but to return back to the origin story of fast.ai.
Right. So before I started fast.ai, I spent a year researching where the biggest opportunities for deep learning were, because I knew from my time at Kaggle in particular that deep learning had kind of hit this threshold point, where it was rapidly becoming the state-of-the-art approach in every area that looked at it. And having been working with neural nets for over 20 years, I knew from a theoretical point of view that once it hit that point, it would do that in just about every domain. And so I kind of spent a year researching which domains would have the biggest low-hanging fruit in the shortest time period.
I picked medicine, but there were so many I could have picked, and so there was a kind of level of frustration for me of, like: okay, I'm really glad we've opened up the medical deep learning world, and today it's huge, as you know, but I can't do everything. I don't even... like, in medicine, it took me a really long time to even get a sense of what kinds of problems medical practitioners solve, what kind of data they have, who has that data. So I kind of felt like I needed to approach this differently, if I wanted to maximize the positive impact of deep learning: rather than me picking an area and trying to become good at it and building something, I should let people who are already domain experts in those areas, and who already have the data, do it themselves.
Mm-hmm.
So that was the reason
for fast.ai: to basically try and figure out how to get deep learning into the hands of people who could benefit from it, and to help them do so in as quick and easy and effective a way as possible.
Got it. So it's all to sort of empower the domain experts.
Yeah. And, like, partly it's because, unlike most people in this field, my background is very applied and industrial: my first job was at McKinsey and Company; I spent 10 years in management consulting; I spent a lot of time with domain experts. So I kind of respect them and appreciate them, and I know that's where the value generation in society is. And I also know that most of them can't code, and most of them don't have the time to invest, you know, three years in a graduate degree or whatever. So it's like: how do I upskill those domain experts? I think that would be a super powerful thing, you know, the biggest societal impact I could have. So yeah, that was the thinking.
So much of what fast.ai students and
researchers and the things you teach are
pragmatically minded, right, practically minded: figuring out ways to solve real problems, and fast. So from your experience, what's the difference between theory and practice of deep learning?
Well, most of the research in the deep learning world is a total waste of time.
Right. That's what I was getting at.
Yeah, it's a problem in
science in general scientists need to be
published which means they need to work
on things that their peers are extremely
familiar with, and can recognize an advance in, that area. So that means they all need to work on the same thing, and the thing they work on... there's nothing to encourage them to work on things that are practically useful. So you get just a whole lot of research which is minor advances in stuff that's been very highly studied and has no significant practical impact. Whereas the things
that really make a difference like I
mentioned transfer learning like if we
can do better at transfer learning then
it's this, like, world-changing thing where suddenly lots more people can do world-class work with less resources and less data. But almost nobody
works on that. Or another example: active learning, which is the study of, like, how do we get more out of the human beings in the loop.
Which is my favorite topic.
Yeah. So active learning is great, but almost nobody's working on it, because it's just not a trendy thing right now.
You know what, sorry to interrupt: you're saying that nobody is publishing on active learning, but there are people inside companies; anybody who actually has to solve a problem is going to innovate on active learning.
Yeah,
everybody kind of reinvents active learning when they actually have to work in practice, because they start labeling things and they think: gosh, this is taking a long time, and it's very expensive. And then they start thinking: well, why am I labeling everything? The machine's only making mistakes on those two classes; they're the hard ones; maybe I'll just start labeling those two classes. And then you start thinking: well, why did I do that manually? Why can't I just get the system to tell me which things are going to be hardest? It's an obvious thing to do, but, yeah, it's just like transfer learning: it's understudied, and the academic world just has no reason to care about practical results.
The funny thing is, like, I've only really ever written one paper. I hate writing papers, and I didn't even write it; it was my colleague Sebastian Ruder who actually wrote it. I just did the research for it. But it was basically introducing successful transfer learning to NLP for the first time. The algorithm is called ULMFiT, and I actually wrote it for the course, for the fast.ai course. I wanted to teach people NLP, and I thought I only want to teach people practical stuff, and I think the only practical stuff is transfer learning; and I couldn't find any examples of transfer learning in NLP. So I just did it, and I was shocked to find that as soon as I did it, you know, the basic prototype took a couple of days, it smashed the state of the art on one of the most important datasets in a field that I knew nothing about. And I just thought: well, this is ridiculous. So I spoke to Sebastian about it, and he kindly offered to write up the results, and it ended up being published in ACL, which is the top computational linguistics conference. So, like, people do actually
care once you do it but I guess it's
difficult for, maybe, like, junior researchers. Like, I don't care whether I get citations or papers or whatever; there's nothing in my life that makes that important, which is why I've never actually bothered to write a paper myself. But for people who do, I guess they have to pick the safe option, which is, like: make a slight improvement on something that everybody is already working on.
Yeah, nobody does anything interesting, or succeeds in life, with the safe option.
I mean, the nice thing is, nowadays everybody is now working on transfer learning, because since that time we've had GPT, and GPT-2, and BERT. So, yeah: once you show that something is possible, everybody jumps in.
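(The ULMFiT-style recipe is now a few lines in the fastai library. A minimal sketch on fastai's small IMDB sample, using the fastai v2 API; the dataset, dropout multiplier, and epoch counts are just illustrative:)

```python
from fastai.text.all import *

path = untar_data(URLs.IMDB_SAMPLE)
dls = TextDataLoaders.from_csv(path, 'texts.csv', text_col='text', label_col='label')

# Transfer learning for NLP: start from a pretrained AWD-LSTM language model,
# then fine-tune it as a sentiment classifier.
learn = text_classifier_learner(dls, AWD_LSTM, drop_mult=0.5, metrics=accuracy)
learn.fine_tune(4, 1e-2)  # trains the new head first, then unfreezes the whole model
```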
And I guess I hope to be a part of it, and I hope to see more innovation in active learning in the same way. I think transfer learning and active learning are fascinating open areas.
Well, I actually helped start a startup called Platform.ai, which is really all about active learning, and
yeah it's very interesting trying to
kind of see what research is out there and make the most of it, and there's basically none, so we've had to do all our own research.
Once again. And just as you described: can you tell the story of the Stanford competition DAWNBench, and fast.ai's achievement on it?
Sure. So
something which I really enjoy is that I
basically teach two courses a year
Practical Deep Learning for Coders, which is kind of the introductory course, and then Cutting Edge Deep Learning for Coders, which is the kind of research-level course. And while I teach those courses, I basically have a big office at the University of San Francisco, big enough for, like, 30 people, and I invite any student who wants to, to come and hang out with me while I build the course. So generally it's full, and we have twenty or thirty people in a big office with nothing to do but study deep learning. It was
during one of these times that somebody
in the group said: oh, there's a thing called DAWNBench; it looks interesting. And I was like, what the hell is that? It's, like, some competition to see how quickly you can train a model; seems kind of not exactly relevant to what we're doing, but it sounds like the kind of thing you might be interested in. I checked it out, and I was like: oh crap, there's only ten days till it's over; it's probably too late, and we're kind of busy trying to teach this course. But maybe, oh, it would make an interesting case study for the course; it's all stuff we're already doing, so why don't we just put together our current best practices and ideas? So me and, I guess, about four students just decided to give it a go, and we focused on the smaller one, CIFAR-10, which is those little 32-by-32-pixel images.
Can you say what DAWNBench is?
Yeah, it's a competition to train a model as fast as possible. It was run by Stanford.
And as cheap as possible, too?
That's also another one: as cheap as possible. And
there were a couple of categories: ImageNet and CIFAR-10. ImageNet is that big 1.3-million-image thing that took a couple of days to train. I remember a friend of mine, Pete Warden, who's now at Google; I remember he told me how he trained ImageNet a few years ago. He basically had this little granny flat out the back that he turned into his ImageNet training center, and after, like, a year of work, he figured out how to train it in, like, ten days or something. It was a big job. Whereas CIFAR-10, at that time, you could train in a few hours; it's much smaller and easier. So we thought we'd try CIFAR-10. And, yeah,
I'd really never done that before. Like, things like using more than one GPU at a time was something I tried to avoid, because to me it's very much against the whole idea of accessibility; it's much better to do things with one GPU.
I mean, have you asked in the past, after having accomplished something: how do I do this faster, much faster?
Oh, always. But for me it's always: how do I make it much faster on a single GPU that a normal person could afford in their day-to-day life? It's not: how could I do it faster by having a huge data center? Because to me it's all about, like, as many people as possible should be able to use something, without fussing around with infrastructure. So
anyway, in this case it's like: well, we can use eight GPUs just by renting an AWS machine. So we thought we'd try that. And basically, using the stuff we were already doing, within a few days we had the speed down to, I don't know, some very small number of minutes; I can't remember exactly how many, but it might have been, like, ten minutes or something. And so, yeah, we found ourselves at the top of the leaderboard, easily, for both time and money, which really shocked me, because the other people competing in this were, like, Google and Intel and stuff, who, like, know a lot more about this than I think we do. So then we were emboldened:
we thought, let's try the ImageNet one too. Way out of our league, but our goal was to get under twelve hours, and we did, which was really exciting. We didn't put anything up on the leaderboard, but we were down to, like, ten hours. But then Google put in something like five hours, and we were like: oh, we're so screwed. But we kind of thought: we'll keep trying. You know, if Google can do it in five... I mean, Google did it in five hours on something like a TPU pod, like, a lot of hardware. But we had a bunch of ideas to try, like a really simple thing: why are we using these big images? They're, like, 224 or 256 by 256 pixels. You know, why don't we try smaller ones?
And just to
elaborate there's a constraint on the
accuracy that your trained model is supposed to achieve?
Yeah, you've got to achieve 93%, I think it was, for ImageNet.
Exactly.
Which is very tough.
Yeah, 93%. I think they picked a good threshold: it was a little bit higher than what the most commonly used ResNet-50 model could achieve at that time. So yeah, it's quite a difficult problem to solve. But we
realized that if we just used 64-by-64 images, it trained a pretty good model, and then we could take that same model and just give it a couple of epochs to learn 224-by-224 images, and it was basically already trained. It makes a lot of sense: if you teach somebody, like, here's what a dog looks like, and you show them low-res versions, and then you say, here's a really clear picture of a dog... they already know what a dog looks like. So with that, we jumped to the front, and we ended up winning parts of that competition.
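(That trick, progressive resizing, is easy to reproduce today with the fastai library; a rough sketch on Imagenette, using the fastai v2 API, with sizes and epoch counts as illustrative choices:)

```python
from fastai.vision.all import *

path = untar_data(URLs.IMAGENETTE_160)

# Phase 1: train cheaply on small 64x64 images.
dls_small = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(64))
learn = vision_learner(dls_small, resnet50, metrics=accuracy)
learn.fit_one_cycle(5)

# Phase 2: swap in 224x224 images; the model is "basically already trained".
learn.dls = ImageDataLoaders.from_folder(path, valid='val', item_tfms=Resize(224))
learn.fit_one_cycle(2)
```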
We actually ended up doing a distributed version over multiple machines a couple of months later, and ended up at the top of the leaderboard again, at 18 minutes. And people have just kept on blasting through, again and again, since then.
So what's your view on multi-GPU, or multiple-machine, training in general, as a way to speed things up?
I think it's largely a waste of time
both multi-GPU on a single machine and, particularly, multiple machines, because it's just clunky. Multi-GPU is less clunky than it used to be, but to me anything that slows down your iteration speed is a waste of time. You could maybe do your very last, you know, perfecting of the model on multiple GPUs if you need to. So, for example, I think
doing stuff on ImageNet is generally a waste of time. Why test things on 1.3 million images? Most of us don't use 1.3 million images, and we've also done research that shows that doing things on a smaller subset of images gives you the same relative answers anyway. So from a research point of view, why waste that time? So actually, I released a couple of new datasets recently. One is called Imagenette, the French ImageNet, which is a small subset of ImageNet designed to be easy to classify.
How would you spell Imagenette?
It's got an extra "t" and "e" at the end, because it's very French.
Am I saying it okay?
Yeah, you're okay. And then another one, called Imagewoof, which is a subset of ImageNet that only contains dog breeds.
That's a harder one, right?
That's a harder one, yeah. And I've discovered that if you just look at these two subsets, you can train things on a single GPU in ten minutes, and the results you get are directly transferable to ImageNet nearly all the time. And so now I'm starting to see some researchers start to use these datasets.
I so deeply love the way you think, because I think you might have written a blog post saying that sort of going for these big datasets is encouraging people to not think creatively.
Absolutely.
It sort of constrains you to train on large resources, and because you have these resources, you think more resources will be better, and then somehow you kill the creativity.
Yeah. And even worse than that, Lex, I keep
hearing from people who say I decided
not to get into deep learning because I
don't believe it's accessible to people
outside of Google to do useful work so
like I see a lot of people make an
explicit decision to not learn this
incredibly valuable tool, because they've drunk the Google Kool-Aid, which
is that only Google's big enough and
smart enough to do it and I just find
that so disappointing and it's so wrong
and I think all the major breakthroughs
in AI in the next twenty years will be
doable on a single GPU
I would say... well, let's put it this way: none of the big breakthroughs of the last 20 years required multiple GPUs. Like, batch norm, ReLU, dropout...
To demonstrate, every one of them?
Yeah, every one of them. Even the original GANs didn't require multiple GPUs.
And we've actually
recently shown that you don't even need GANs. We've developed GAN-level outcomes without needing GANs, and we can now do it; by using transfer learning, we can do it in a couple of hours on a single GPU.
Like, without the adversarial part?
Yeah.
so we've found loss functions that work
super well without the adversarial part
And then one of our students, a guy called Jason Antic, has created DeOldify, which uses this technique to colorize old black-and-white movies. You can do it on a single GPU: colorize a whole movie in a couple of hours. And one of the things that Jason and I did together was figure out how to add a little bit of GAN at the very end, which, it turns out, for colorization makes it just a bit brighter and nicer. And then Jason did masses of experiments to figure out exactly how much to do, but it's still all done on his home machine, on a single GPU, in his lounge room. And, like, if you think about colorizing Hollywood movies, that sounds like something a huge studio would have to do, but he has the world's best results on this.
There's
this problem of microphones; we're just talking to microphones now. It's such a pain in the ass to have these microphones to get good-quality audio, and I've tried to see if it's possible to plop down a bunch of cheap sensors and reconstruct higher-quality audio from multiple sources, because right now I haven't seen work on: okay, can we take inexpensive mics and automatically combine audio from multiple sources to improve the combined audio? People haven't done that, and it feels like a learning problem, so hopefully somebody can...
Well, I mean, it's eminently doable, and it should have been done by now.
I felt the same way about computational photography four years ago.
That's right.
Why are we investing in big lenses, when three cheap lenses, plus actually a little bit of intentional movement, or, you know, just taking a few frames, gives you enough information to get excellent sub-pixel resolution? And, particularly with deep learning, you would know exactly what you were meant to be looking at. We can totally do the same thing with audio; I think it's madness that it hasn't been done yet.
There's been progress on the photography side?
Yeah, that photography is basically standard now. So the Google Pixel Night Sight, I don't know if you've ever tried it, but it's astonishing: you take a picture in almost pitch black and you get back a very high-quality image, and it's not because of the lens. Same with stuff like adding the bokeh, the background blurring; it's done computationally.
Just the Pixel over here, yeah.
Basically, everybody now is doing most of the fanciest stuff on their phones with computational photography, and also increasingly people are putting more than one lens on the back of the camera. The same will happen for audio, for sure.
And there are
applications on the audio side. If you look at an Alexa-type device, most people have seen... especially, I worked at Google before; when you look at noise, background removal, you don't think of multiple sources of audio. People don't play with that as much as I would hope.
I mean, you can still do it even with one. Like, again, not much work is being done in this area. So we're actually
going to be releasing an audio library
soon, which hopefully will encourage development of this, because it's so underused. The basic approach we used for our super-resolution, which Jason uses in DeOldify, of generating high-quality images: the exact same approach would work for audio. No one's done it yet, but it would be a couple of months' work.
Okay.
Also, learning rate: in terms of DAWNBench, there's some magic on learning rate that you played around with?
Yeah, that's interesting. So this is all work that came from a guy called Leslie Smith. Leslie's a researcher who, like us, cares a lot about just the practicalities of training neural networks quickly and accurately, which, I think, is what everybody should care about, but almost nobody does. And he discovered something
very interesting, which he calls super-convergence, which is: there are certain networks that, with certain settings of hyperparameters, can suddenly be trained ten times faster by using a ten-times-higher learning rate. Now, no one would publish that paper, because it's not an area of active research in the academic world; no academic recognized that this is important. And also, deep learning in academia is not considered an experimental science. So, unlike in
physics, where you could say, like, "I just saw a subatomic particle do something which the theory doesn't explain", and you could publish that without an explanation, and then over the next 60 years people can try to work out how to explain it, we don't allow this in the deep learning world. So it's literally impossible for Leslie to publish a paper that says: "I've just seen something amazing happen; this thing trained ten times faster than it should have; I don't know why." The reviewers would be like: well, we can't publish that, because you don't know why.
So, anyway... that's important to pause on,
because there's so many discoveries that
would need to start like that.
Every other scientific field I know of works that way. I don't know why ours is uniquely uninterested in publishing unexplained experimental results, but there it is. So it wasn't published.
Having said that, I read a lot more unpublished papers than published papers, because that's where you find the interesting insights. So I absolutely read this paper, and I was just like: this is astonishingly mind-blowing and weird and awesome; like, why isn't everybody talking about this? Because if you can train these things ten times faster, they also generalize better: you're doing fewer epochs, which means you look at the data less, and you get better accuracy. So I've been kind of studying that ever since, and eventually Leslie kind of figured out a lot of how to get it done, and we added some
minor tweaks. And a big part of the trick is starting at a very low learning rate and very gradually increasing it. So, as you're training your model, you take very small steps at the start and gradually make them bigger and bigger, until eventually you're taking much bigger steps than anybody thought was possible. There are a few other little tricks to make it work, but basically we can now reliably get super-convergence. And so, for the DAWNBench thing, we were using just much higher learning rates than people expected to work.
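(This schedule, Leslie Smith's one-cycle policy, now ships in mainstream libraries. A minimal PyTorch sketch, with a toy model standing in for a real network; the learning rates and step count are arbitrary:)

```python
import torch

model = torch.nn.Linear(10, 2)  # toy model standing in for a real network
opt = torch.optim.SGD(model.parameters(), lr=0.04, momentum=0.9)
steps = 100
# Warm up from a low learning rate toward max_lr, then anneal back down.
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=0.4, total_steps=steps)

x, y = torch.randn(64, 10), torch.randn(64, 2)
for _ in range(steps):
    loss = torch.nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    sched.step()  # the learning rate changes after every step
```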
What do you think the future of... I mean, it makes so much sense for the learning rate, the one you vary, to be a critical hyperparameter. What do you think the future of learning rate magic looks like?
Well, there's been
a lot of great work in the last 12
months in this area. People are increasingly realizing that we just have no idea, really, how optimizers work. The combination of weight decay, which is how we regularize optimizers, and the learning rate, and then other things like the epsilon we use in the Adam optimizer: they all work together in weird ways, and with different parts of the model. This is
another thing we've done a lot of work
on: research into how different parts of the model should be trained at different rates, in different ways. So we do something we call discriminative learning rates, which is really important, particularly for transfer learning.
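(In raw PyTorch the same idea is just optimizer parameter groups: earlier, more general layers get smaller learning rates than the new head. A sketch; the layer names assume a torchvision ResNet, and the rates are illustrative:)

```python
import torch
from torchvision import models

model = models.resnet18()  # pretrained weights omitted; this only shows the groups
opt = torch.optim.AdamW([
    {"params": model.layer1.parameters(), "lr": 1e-5},  # early layers: tiny updates
    {"params": model.layer4.parameters(), "lr": 1e-4},  # later layers: a bit larger
    {"params": model.fc.parameters(),     "lr": 1e-3},  # new head: full rate
])  # remaining layer groups omitted for brevity
```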
So, really, I think in the last 12 months a lot of people have realized that all this stuff is important, and there's been a lot of great work coming out. And we're starting to see algorithms appear which have very, very few dials, if any, that you have to touch. So, like, I think what's going to happen is, the idea of a learning rate will, well, it almost already has, disappear in the latest research; and instead it's just, like: you know, we know enough about how to interpret the gradients, and the change of gradients we see, to know how to set every parameter the best way.
So do you see a future of deep learning where, really... where is the input of a human expert needed?
Well, hopefully the input of the human expert will be almost entirely unneeded from
the deep learning point of view so again
like, Google's approach to this is to try and use thousands of times more compute to run lots and lots of models at the same time, and hope that one of them is good; AutoML, neural-architecture-search kind of stuff, which I think is insane. When you better understand the mechanics of how models learn, you don't have to try a thousand different models to find which one happens to work best; you can just jump straight to the best one. Which means that it's more accessible in terms of compute, cheaper, and also, with fewer hyperparameters to set, it means you don't need deep learning experts to train your deep learning model for you. Which means that domain experts can do more of the work, which means that now you can focus the human time on interpretation, data gathering, identifying model errors, and stuff like that.
Yeah, the data side. How often
do you work with data these days, in terms of the cleaning, the looking at it, like Darwin looked at different species while traveling about? Do you look at data; have you, in your roots at Kaggle?
Always. Yeah, I look at data. I mean, it's a key part of
our course: before we train a model in the course, we see how to look at the data. And then, the first thing we do after we train our first model, which we fine-tune from an ImageNet model for five minutes, the thing we immediately do after that is learn how to analyze the results of the model: by looking at examples of misclassified images, by looking at a confusion matrix, and then by doing research on Google to learn about the kinds of things that it's misclassifying.
cool things about machine learning models in general is that when you interpret them, they tell you about things like what the most important features are, which groups you're misclassifying, and they help you become a domain expert more quickly, because you can focus your time on the bits that the model is telling you are important. So it lets you deal with things like data leakage, for example: if it says the main feature I'm looking at is customer ID, and you're like, oh, customer ID shouldn't be predictive, then you can talk to the people that manage customer IDs and they'll tell you, like, oh yes, as soon as a customer's application is accepted we add a one on the end of their customer number or something, you know.
Yeah, so looking at the data, particularly through the lens of which parts of the data the model says are important, is super important.
Yeah, and kind of using the model to almost debug the data.
Yeah, you learn more about it, exactly.
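For a sense of what that looks like in practice, here is roughly the pattern the course follows, written against fastai v2-style names; older versions spell some of these differently, so treat it as a sketch rather than the course's exact code.

```python
from fastai.vision.all import *

# Lesson-1-style setup: fine-tune an ImageNet-pretrained model on the pets data.
path = untar_data(URLs.PETS) / "images"
dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2,
    label_func=lambda fname: fname[0].isupper(),  # cats have capitalized filenames
    item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=accuracy)  # cnn_learner in older fastai
learn.fine_tune(1)

# Then immediately look at what the model gets wrong.
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_top_losses(9)        # the most badly misclassified images
interp.plot_confusion_matrix()   # which classes get confused with which
interp.most_confused(min_val=2)  # the worst class pairs
```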
What are the different cloud options for training neural networks? It's the last question related to DAWNBench.
Well, it's part of a lot of the work we do.
But from a perspective of performance, I think you've written about this in a blog post: there's AWS, there's the TPU from Google. What's your sense, what does the future hold, what would you recommend now?
Right. So from a hardware point of view, Google's TPUs and the best NVIDIA GPUs are similar. I mean, maybe the TPUs are like 30% faster, but they're also much harder to program with. There isn't a clear leader in terms of hardware right now, although much more importantly, the NVIDIA GPUs are much more programmable: they've got much more written for them. So, like, that's the clear leader for me, and where I would spend my time as a researcher and practitioner.
Moving on to the platform?
I mean, we're super lucky now with stuff like Google GCP, Google Cloud, and AWS that you can access a GPU pretty quickly and easily. But I mean, for AWS it's still too hard: like, you have to find an AMI and get the instance running and then install the software you want, blah blah blah. GCP is currently the best way to get started if you want a full server environment, because they have a fantastic fastai and PyTorch ready-to-go instance which has all the courses pre-installed. It has Jupyter Notebook pre-running, and Jupyter Notebook is this wonderful interactive computing system which everybody basically should be using for any kind of data-driven research. But then, even better than that, there are platforms like Salamander, which we own, and Paperspace, where literally you click a single button and it pops up a Jupyter notebook straight away without any kind of installation or anything, and all the course notebooks are pre-installed. So, like, for me, this is one of the things we spent a lot of time kind of curating and
working on because when we first started
our courses the biggest problem was
people dropped out of lesson one because
they couldn't get an AWS instance
running. So things are so much better now, and, like, if you go to course.fast.ai, the first thing it says is here's how to get started with your GPU, and there's, like, you just click on a link and you click start and it's going.
I have to confess, I've never used Google GCP.
Yeah, GCP gives you three hundred dollars of compute for free, which is really nice. But as I say, Salamander and Paperspace are even easier still.
Okay, so from the perspective of deep learning frameworks: you work with fast.ai, sort of the go-to framework, and PyTorch and TensorFlow. What are the strengths of each platform, from your perspective?
So in terms of what we've done our research on and taught in our course, we started with Theano and Keras, then we switched to TensorFlow and Keras, then we switched to PyTorch, and then we switched to PyTorch and fastai, and that kind of reflects the growth and development of the ecosystem of deep learning libraries. Theano and TensorFlow were great but were much harder to teach and do research and development on,
because they define what's called a computational graph up front, a static graph, where you basically have to say, here are all the things that I'm going to eventually do in my model, and then later on you say, okay, do those things with this data. And you can't, like, debug them, you can't do them step by step, you can't program them interactively in a Jupyter notebook, and so forth.
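As a tiny illustration of the define-by-run style he is contrasting that with (not code from the conversation, just ordinary PyTorch): every line is plain Python you can print, break on, and branch from, and gradients are computed for whatever actually ran.

```python
import torch

x = torch.randn(4, 3, requires_grad=True)
w = torch.randn(3, 2, requires_grad=True)

h = x @ w
print(h.shape)        # inspect intermediate values as you go, e.g. in a notebook
if h.mean() > 0:      # data-dependent control flow, no graph declared up front
    h = torch.relu(h)

loss = h.pow(2).mean()
loss.backward()       # gradients for this particular run
print(w.grad.shape)
```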
PyTorch was not the first, but PyTorch was certainly the strongest entrant to come along and say, let's not do it that way, let's just use normal Python, and everything you know about in Python is just going to work, and we'll figure out how to make that run on the GPU as and when necessary. That turned out to be a huge leap in terms of what we could do with our research and what we could do with our teaching.
Because it was limiting before?
Yeah, I mean, it was critical for us for something like DAWNBench to be able to rapidly try things. It's just so much harder to be a researcher and practitioner when you have to do everything up front and you can't inspect it.
The problem with PyTorch is that it's not at all accessible to newcomers, because you have to, like, write your own training loop and manage the gradients and all that stuff, and it's also not great for researchers, because you're spending your time dealing with all this boilerplate and overhead rather than thinking about your algorithm. So we ended up writing this very multi-layered API where at the top level you can train a state-of-the-art neural network in three lines of code, and which kind of talks to an API, which talks to an API, which talks to an API, which, like, you can deep dive into at any level and get progressively closer to the machine at each level of control. And this fastai library has been critical for us and for our students and for lots of people that have won big deep learning competitions with it and written academic papers with it. It's made a big difference.
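A hedged sketch of what that top level can look like, using fastai v2 names; the course-era library spelled some of these differently, and the dataset here is just an example.

```python
from fastai.vision.all import *

# The "three lines": data, learner, fit. Everything below this can be customized
# by dropping down a layer (DataBlock, callbacks, optimizer, training loop)
# without leaving the library.
dls = ImageDataLoaders.from_folder(
    untar_data(URLs.IMAGENETTE_160), valid="val", item_tfms=Resize(160))
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(1)
```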
We're still limited, though, by Python, and particularly this problem with things like recurrent neural nets, where you just can't change things unless you accept it going so slowly that it's impractical. So in the latest incarnation of the course, and with some of the research we're now starting to do, we're starting to do some stuff in Swift.
I think we're three years away from that
being super practical but I'm in no
hurry I'm very happy to invest the time
to get there. But, you know, with that we actually already have a nascent version of the fastai library for vision running on Swift for TensorFlow, because Python for TensorFlow is not going to cut it; it's just a disaster. What they did was they tried to replicate the bits that people were saying they like about PyTorch, this kind of interactive computation, but they didn't actually change their foundational runtime components. So they kind of added this, like, syntax sugar they call TF eager, TensorFlow eager, which makes it look a lot like PyTorch, but it's ten times slower than PyTorch to actually do a step, because they didn't invest the time in, like, retooling the foundations, because their code base is so horrible.
Yeah, I think it's probably very difficult to do that kind of retooling.
Yeah, well, particularly
the way tensorflow was written it was
written by a lot of people very quickly
in a very disorganized way. So, like, when you actually look in the code, as I do often, I'm always just like, oh god, what were they thinking? It's just pretty awful. So I'm really extremely negative about the potential future of TensorFlow. But Swift for TensorFlow can be a different beast altogether: it can basically be a layer on top of MLIR that takes advantage of, you know, all the great compiler stuff that Swift builds on with LLVM, and yeah, it could be, I think it will be, absolutely fantastic.
Well, you're inspiring me to try. I haven't truly felt the pain of TensorFlow 2.0 Python.
Yeah, it's fine, it does the job
if you're using like predefined things
that somebody's already written but if
you actually compare, you know, like I've had to do, because I've been having to do a lot of stuff with TensorFlow recently, you actually compare, like, okay, I want to write something from scratch, and I just keep finding it's like, oh, it's running ten times slower than PyTorch.
So if the biggest cost, let's throw running time out the window, is how long it takes you to program?
That's not too different now. Thanks to TensorFlow eager, that's not too different. But because so many things take so long
to run, yeah, you wouldn't run it at ten times slower. Like, you just go, oh, this is taking so long. And also, there's a lot of things which are just less programmable, like tf.data, which is the way they do data processing in TensorFlow: it's just this big mess, it's incredibly inefficient, and they kind of had to write it that way because of the TPU problems I described
earlier so I just you know I just feel
like they've got this huge technical
debt which they're not gonna solve
without starting from scratch so here's
an interesting question then if there's
a new student starting today what would
you recommend they use well I mean we
obviously recommend fastai and PyTorch
because we teach new students and that's
what we teach with so we would very
strongly recommend that because it will
let you get on top of the concepts much more quickly, so then you'll become an expert, and you'll also learn the actual state-of-the-art techniques, you know, so you actually get world-class results. Honestly, it doesn't much matter what library you learn, because switching from Chainer to MXNet to TensorFlow to PyTorch is going to be a couple of days' work as long as you understand the foundations well.
But do you think Swift will creep in there as a thing that people start using?
Not for a few years,
particularly because, like, Swift has no data science community or libraries, and the Swift community has a total lack of appreciation and understanding of numeric computing. So, like, they keep on making stupid decisions, you know; for years they've just done dumb things around performance and prioritization. That's clearly changing now, because the developer of Swift, Chris Lattner, is working at Google on Swift for TensorFlow, so, like, that's a priority. It'll be
interesting to see what happens with
Apple because like Apple hasn't shown
any sign of caring about
numeric programming in Swift, so I mean, hopefully they'll get off their ass and start appreciating this, because currently all of their low-level libraries are not written in Swift, they're not particularly Swifty at all, stuff like Core ML, they're really pretty rubbish. So yeah, there's a long way to go, but at least one nice thing is that Swift for TensorFlow can actually directly use Python code and Python libraries; you know, literally the entire lesson one notebook of fast.ai runs in Swift right now, in Python mode, so that's a nice intermediate thing.
How long does it take? If we look at the two fast.ai courses, how long does it take to get from point zero to completing both courses?
It varies a lot. Somewhere between two months and two years, generally.
So for two months, how many hours a day?
So I'd say, like, somebody who is a very competent coder can do 70 hours per course.
Seventy? Seven-zero?
Yeah.
That's it? Okay.
But a lot of people I know take a year off to study fast.ai full-time, and say at the end of the year they feel pretty competent, because generally there's a lot of other things you do: like, they'll generally be entering Kaggle competitions, they might be reading Ian Goodfellow's book, they'll be doing a bunch of stuff, and often, you know, particularly if they are a domain expert, their coding skills might be a little on the pedestrian side, so part of it is just, like, doing a lot more writing of code.
What do you find is the bottleneck for people, usually, except getting started and setting stuff up?
I would say coding. Yeah, I would say the people who are strong coders pick it up the best,
although another bottleneck is people
who have a lot of experience of classic
statistics can really struggle, because the intuition is so opposite to what they're used to: they're very used to
like trying to reduce the number of
parameters in their model and looking at
individual coefficients and stuff like
that so I find people who have a lot of
coding background and know nothing about
statistics are generally going to be the
best off.
So you've taught several courses on deep learning, and as Feynman says, the best way to understand something is to teach it. What have you learned about
deep learning from teaching it?
A lot. It's a key reason for me to teach the courses. I mean, obviously it's going to be necessary to achieve our goal of getting domain experts to be familiar with deep learning, but it was also necessary for me to achieve my goal of being really familiar with deep learning. I mean, seeing so many domain experts
from so many different backgrounds it's
definitely I wouldn't say taught me but
convinced me something that I like to
believe was true which was anyone can do
it so there's a lot of kind of
snobbishness out there about only
certain people can learn to code only
certain people are going to be smart
enough to like do AI that's definitely
bullshit you know I've seen so many
people from so many different
backgrounds get state-of-the-art results
in their domain areas. Now, it's
definitely taught me that the key
differentiator between people that
succeed and people that fail is tenacity
that seems to be basically the only
thing that matters the people a lot of
people give up and but if the ones who
don't give up pretty much everybody
succeeds you know even if at first I'm
just kind of like thinking like wow
they're really not quite getting it yet
are they but eventually people get it
and they succeed so I think that's been
any they're both things I'd like to
believe was true but I don't feel like I
really had strong evidence with them to
be true but now I can say I've seen it
again and again so what advice do you
have for someone who wants to get
started in deep learning?
Train lots of models. That's how you learn it. So, like, I would, you know... I think it's not just me, I think our course is very good, but also lots of people independently have said it's very good; it recently won the CogX award for AI courses as being the best in the world. So come to our course, course.fast.ai. And the thing I keep on harping on in my lessons is: train models, print out the inputs to the models, print out the outputs of the models, like, study them, change the inputs a bit, look at how the outputs vary, just run lots of experiments to get, you know, an intuitive understanding of what's going on.
To get hooked, do you think... you mentioned training, do you think just running the models, inference, if we talk about getting started?
No, you've got to fine-tune the models.
So that's the critical thing, because at that point you now have a model that's in your domain area. There's no point running somebody else's model, because it's not your model. So it only takes five minutes to fine-tune a model for the data you care about, and in lesson two of the course we teach you how to create your own data set from scratch by scripting Google Image Search. And we show you how to actually create a web application running online; so I create one in the course that differentiates between a teddy bear, a grizzly bear, and a brown bear, and it does it with basically a hundred percent accuracy. It took me about four minutes to scrape the images from Google search with the script. There are little graphical widgets we have in the notebook that help you clean up the data set, there are other widgets that help you study the results to see where the errors are happening, and so now we've got over a thousand replies in our share your
work here thread of students saying
here's the thing I built. And a lot of them are state of the art: like, somebody said, oh, I tried looking at Devanagari characters, and I couldn't believe it, the thing that came out was more accurate than the best academic paper, after lesson one. And then
there's others which are just more kind
of fun like somebody who's doing
Trinidad and Tobago hummingbirds she
said that's kind of their national bird
and she's got something that can now
classify Trinidad and Tobago
hummingbirds so yeah train models
fine-tune models with your data set and
then study their inputs and outputs
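A rough sketch of that lesson-2-style workflow: collect images, clean them, fine-tune, then poke at the inputs and outputs. The URL files and class names below are placeholders, and fastai's exact function names vary a bit between versions.

```python
from fastai.vision.all import *

path = Path("bears")
for cls in ["teddy", "grizzly", "brown"]:
    dest = path / cls
    dest.mkdir(parents=True, exist_ok=True)
    # urls_teddy.csv etc. are placeholder files of image URLs you collected,
    # e.g. by scripting Google Image Search as the course describes.
    download_images(dest, url_file=Path(f"urls_{cls}.csv"))

# Drop anything that isn't actually a readable image.
failed = verify_images(get_image_files(path))
failed.map(Path.unlink)

dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, item_tfms=Resize(224))
learn = vision_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(3)  # the "five minutes of fine-tuning" step

# Study inputs and outputs: feed in one image and look at the prediction.
pred, idx, probs = learn.predict(PILImage.create(get_image_files(path)[0]))
print(pred, probs[idx])
```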
How much does the fast.ai course cost?
It's free. Everything we do is free. We have no revenue sources of any kind; it's just a service to the community.
You're a saint. Okay, once a person understands the basics, trains a bunch of models, if we look at the scale of years, what advice do you have for someone wanting to eventually become an expert?
Train lots of models. Train lots of models in your domain area. So, an expert at what, right? We don't need more experts at, like, creating slightly evolutionary research in areas that everybody's studying. We need
experts at using deep learning to
diagnose malaria well we need experts at
using deep learning to analyze language
to study media bias. We need experts at analyzing fisheries to identify problem areas in the ocean, you know. That's what we need. So, like, become the expert in your passion area, and this is a tool which you can use for just about anything, and you'll be
able to do that thing better than other
people particularly by combining it with
your passion and domain expertise so
That's really interesting. Even if you do want to innovate on transfer learning or active learning, your thought is, and it's one I certainly share, that you also need to find a domain or a dataset that you actually really care about.
Right. If you're not working on a real problem that you understand, how do you know if you're doing it any good? How do you know if your results are good? How do you know if you're getting bad results, and why you're getting bad results? Is it a problem with the data? Like, how do you know you're doing anything useful?
yeah the only to me the only really
interesting research is not the only but
the vast majority of interesting
research is like try and solve an actual
problem and solve it really well.
So both understanding sufficient tools on the deep learning side and becoming a domain expert in a particular domain are really within reach for anybody?
Yeah. I mean, to me, I would compare it to, like, studying self-driving cars having never looked at a car, or been in a car, or turned a car on, right? You know, which is, like, the way it is for a lot of people: they'll study some academic data set where they literally have no idea about the domain.
By the way, I'm not sure how familiar you are with autonomous vehicles, but that is
literally, you describe a large percentage of robotics folks working on self-driving cars: they actually haven't considered driving, they haven't actually looked at what driving looks like, they haven't driven. And, you know, when you've actually driven, you know, like, these are the things that happened to me when I was driving. So there's nothing that beats real-world examples, just
experiencing them you've created many
successful startups. What does it take to create a successful startup?
The same thing as becoming a successful deep learning practitioner, which is not giving up. So you can run out of money, or time, or run out of something, you know, but if you keep costs super low and try and save up some money beforehand so you can afford to have some time, then just sticking with it is one important thing. Doing something you understand and care about is important. And by something, I don't mean... the biggest problem I see with deep learning people is they do a PhD in deep learning and then they try and commercialize their PhD. That is a waste of time, because that doesn't solve an actual problem. You
picked your PhD topic because it was an
interesting kind of engineering or math
or research exercise. But, yeah, if you've actually spent time as a recruiter and you know that most of your time was spent sifting through resumes, and you know that most of the time you're just looking for certain kinds of things, and you can try doing that with a model for a few minutes and see whether that's something which your model would be able to do as well as you could, then you're on the right track to creating a startup. And then I think, just, yeah, be pragmatic and stay away from venture capital money as long as possible, preferably forever.
So, yeah, on that point, do you... venture capital... so you were able to successfully run startups that were self-funded?
Yeah, my first two were
self-funded and that was the right way
to do it
Is that scary?
No, VC-backed startups are much more scary, because you have these people on your back who do this all the time and who have done it for years, telling you grow, grow, grow, grow. And they don't care if you fail, they only care if you don't grow fast enough, so that's scary.
Whereas doing the ones myself, well, with partners who were friends, is
nice because like we just went along at
a pace that made sense and we were able
to build it to something which was big
enough that we never had to work again
but was not big enough that any VC would
think it was impressive and that was
enough for us to be excited, you know. So I thought that's a much better way to do things than most people.
Generally speaking, that works for yourself, but how do you make money during that process? Do you cut into savings?
So, yeah, I started FastMail and Optimal Decisions at the same time in 1999 with two different friends, and for FastMail I guess I spent $70 a month on the server, and when the server ran out of space I put a payments button on the front page and said if you want more than 10 megs of space you have to pay $10 a year.
And so, run lean, like, keep your costs down?
Yes, I kept my costs down, and once I needed to spend more
money I asked people to spend the money
for me and that that was that basically
from then on oh we were making money and
I was profitable from then on. For Optimal Decisions it was a bit harder, because we were trying to sell something that was more like a million-dollar sale. But what we did was we would sell scoping projects, so kind of like prototype-y projects, but rather than doing them for free we would sell them for fifty to a hundred thousand dollars. So again,
we were covering our costs and also
making the client feel like we were
doing something valuable so in both
cases we were profitable from six months
in.
Yeah, but nevertheless, it's scary, I mean...
Yeah, sure, it's scary before you jump in. I just... I guess I was comparing it to the scariness of VC. I felt like with the VC stuff it was more scary: you're kind of much more in somebody else's hands, you know, will they fund you or not, and what do they think of what you're doing? I also found it very difficult with VC-backed startups to actually do the thing
which I thought was important for the
company rather than doing the thing
which I thought would make the VC happy
Now, VCs always tell you not to do the thing that makes them happy, but then if you don't do the thing that makes them happy, they get upset.
So, do you think optimizing for the, whatever they call it, the exit, is a good thing to optimize for?
It can be, but not at the VC level, because the VC exit needs to be, you know, a thousand x. Whereas with a lifestyle exit, if you can sell something for ten million dollars, you've made it, right? So it depends: if you want to build something that you're kind of happy to do forever, then fine; if you want to build something you want to sell in three years' time, that's fine too. I mean, they're both
perfectly good outcomes.
So you're learning Swift now, in a way. I mean, you were a writer, and I read that you use, at least in some cases, spaced repetition as a mechanism for learning new things.
Yeah, I use Anki quite a lot myself.
Sure. I actually never talk to anybody about it, don't know how many people do it, but it works incredibly well for me. Can you talk through your experience? Like, how did you, what do you... Okay, first of all, let's back it up: what is spaced repetition?
So spaced
repetition is an idea created by a psychologist named Ebbinghaus, must be a couple hundred years ago or something, a hundred and fifty years ago. He did something which sounds pretty damn
tedious he wrote down random sequences
of letters on cards and tested how well
he would remember those random sequences
a day later or a week later whatever he
discovered that there was this kind of a
curve where his probability of
remembering one of them would be
dramatically smaller the next day and
then a little bit smaller the next day a
little bit smaller next day what he
discovered is that if he revised those
cards after a day the probabilities
would decrease at a smaller rate and
then if he revised them again a week
later they would decrease at a smaller
rate again and so he basically figured
out a roughly optimal equation for when
you should revise something you want to
remember so spaced repetition learning
is using this simple algorithm just
something like revise something after a
day and then three days and then a week
and then three weeks and so forth
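Not something from the conversation, but the schedule he describes is simple enough to sketch: keep an interval per card, stretch it each time you remember, and reset it when you forget. The multipliers here are arbitrary placeholders, not Anki's actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Card:
    front: str
    back: str
    interval_days: float = 1.0  # how long until this card is shown again

def review(card: Card, remembered: bool) -> float:
    """Update and return the next interval: roughly a day, three days, a week, three weeks..."""
    if remembered:
        card.interval_days *= 3.0  # placeholder growth factor
    else:
        card.interval_days = 1.0   # forgot it, so the card starts over
    return card.interval_days

card = Card("你好", "hello")
for answer in [True, True, True, False, True]:
    print(review(card, answer))  # 3.0, 9.0, 27.0, 1.0, 3.0
```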
And so if you use a program like Anki, as you know, it will just do that for you, and it will say, did you remember this? And if you say no, it will reschedule it back to appear again, like, ten times faster than it otherwise would have. It's kind of a way of being guaranteed to learn something, because, by definition, if you're not learning it, it will be rescheduled to be revised more quickly. Unfortunately, though, it also doesn't let you fool yourself: if you're not learning something, you know, your revisions will just get more and more frequent. So you have to find ways to learn things productively and effectively, like, treat your brain well, so using, like, mnemonics and stories and context and stuff like that. So yeah, it's a super great technique. It's like, learning how to learn is something which everybody should learn before they actually learn anything, but almost nobody does.
What
have you... it certainly works well for learning new languages, for, I mean, for learning, like, small projects almost. But, you know, I started using it for... somebody who wrote a blog post about this inspired me, I'm not sure... I started, when I read papers, putting in all the concepts and ideas. Was it Michael Nielsen?
Yeah, Michael Nielsen. Michael started doing this recently and he's been writing about it. So the kind of modern-day Ebbinghaus is a guy called Piotr Wozniak, who developed a system called SuperMemo, and he's been basically trying to become, like, the world's greatest Renaissance man over the last few decades: he's basically lived his life with spaced repetition learning for everything. And, sort of, like, Michael's only very recently
getting excited about doing it for a lot
of different things for me personally I
actually don't use it for anything
except Chinese and the reason for that
is that Chinese is specifically a thing
I made a conscious decision that I want
to continue to remember even if I don't
get much of a chance to exercise it
because, like, I'm not often in China, so I don't get to use it. Whereas something like programming languages or papers, I have a very different approach, which is I try not to learn anything from them, but instead I try to identify the important concepts and, like, actually ingest them. So, like, really understand that concept deeply and study it carefully, and I will decide if it really is important. If it is, like, incorporate it into our library, you know, incorporate it into how I do things, or decide it's not worth it, you know. So I find I do remember the things that I care about, because I'm using them all the time. So for the last 25 years I've committed to spending at least half of every day learning or practicing something new, which all my colleagues have always hated, because it always looks like I'm not working on what I'm meant to be working on, but it
always means I do everything faster
because I've been practicing a lot of
stuff so I kind of give myself a lot of
opportunity to practice new things and
so I find now I don't
yeah I don't often kind of find myself
wishing I could remember something
because if it's something that's useful
then I've been using it a lot, or else it's easy enough to look it up on Google. But speaking Chinese, you can't look it up on Google.
So do you have advice for people learning new things? What have you learned as a process? Does it... I mean, does it all start with just making the hours in the day available?
Yeah, you've got to stick with it, which is, again, the number one thing that 99% of people don't do. So
the people I started learning Chinese
with, none of them were still doing it twelve months later. I'm still doing it ten years later. I tried to stay in touch with them, but, yeah, no one did it.
yeah for something like Chinese like
study how human learning works so my
every one of my Chinese flashcards is
associated with a story and that story
is specifically designed to be memorable
and we find things memorable which are
like funny or disgusting or sexy or
related to people that we know or care
about so I try to make sure all those
stories that are in my head have those
characteristics
yeah so you have to you know you won't
remember things well if they don't have
some context and yeah you won't remember
them well if you don't regularly
practice them whether it be just part of
your day to day life or the Chinese for
me, flashcards. I mean, the other thing is to let yourself fail sometimes,
so like I've had various medical
problems over the last few years and
basically my flashcards just stopped for
about three years and then they've been
other times I've stopped for a few
months, and it's so hard because you get back to it and it's like, you have 18,000 cards due, and so you just have to go,
alright well I can either stop and give
up everything or just decide to do this
every day for the next two years until I
get back to it
the amazing thing has been that even
after three years I you know the Chinese
were still in there like yeah it was so
much faster to relearn than it was to
learn the first time yeah absolutely
it's it's in there the same with with
guitar, with music and so on. It's sad because work sometimes takes you away, and then you won't play for a year, but really, if you then just get back to it every day, you're right back there again.
What do you think is the
next big breakthrough in artificial
intelligence what are your hopes in deep
learning or beyond that people should be
working on or you hope there'll be
breakthroughs I don't think it's
possible to predict I think yeah I think
what we already have is an incredibly
powerful platform to solve lots of
societally important problems that are
currently unsolved so I just hope that
people will lots of people will learn
this toolkit and try to use it I don't
think we need a lot of new technological
breakthroughs to do a lot of great work
right now and when do you think we're
going to create a human level
intelligence system, do you think? You know, how hard is it, how far away are we?
Don't know. I have no way to know. I don't know. Like, I don't know why people make predictions about this, because there's no data and nothing to go on. And it's just like, there's so many societally important problems to solve right now, I just don't find it a really interesting question to even answer. So,
in terms of societally important problems, what's a problem that is within reach?
Well, I mean, for example, there are problems that AI creates, right? So, most specifically, labor force displacement is going to be huge, and people keep making this frivolous econometrics argument of being like, oh, there's been other things that aren't AI that have come along before and haven't created massive labor force displacement, therefore AI won't either.
So that's a serious concern for you?
Oh yeah. Andrew Yang is running on it. Yeah, I'm desperately concerned. And you see already that the changing workplace has led to a hollowing out of the middle class. You're seeing that students coming out of school today have a less rosy financial future ahead of them than their parents did, which has never happened in the last few hundred years. You know, we've always had progress before, and you see this turning into anxiety and despair and even violence. So I very much worry about that. I worry quite a bit about ethics too. I do think
that every data scientist working with
deep learning needs to recognize they
have an incredibly high leverage tool
that they're using that can influence
society in lots of ways and if they're
doing research that that research is
going to be used by people doing this
kind of work and they have a
responsibility to consider the
consequences and to think about things
like how will humans be in the loop here
how do we avoid runaway feedback loops
how do we ensure an appeals process for
humans that are impacted by my algorithm
how do I ensure that the constraints of
my algorithm are ethically explained to
the people that end up using them
there's all kinds of human issues which
only data scientists are actually in the
right place to educate people about but
data scientists tend to think of themselves as just engineers and that they don't need to be part of that process, which is wrong.
well you're in the perfect position to
educate them better to read literature
to read history to learn from history
well Jeremy thank you so much for
everything you do for inspiring huge
amount of people getting them into deep
learning and having the ripple effects
the the flap of a butterfly's wings that
will probably change the world so thank
you very much yes