Jeremy Howard: fast.ai Deep Learning Courses and Research | Lex Fridman Podcast #35
J6XcP4JOHmk • 2019-08-27
The following is a conversation with Jeremy Howard. He's the founder of fast.ai, a research institute dedicated to making deep learning more accessible. He's also a distinguished research scientist at the University of San Francisco, a former president of Kaggle as well as a top-ranking competitor there, and in general he's a successful entrepreneur, educator, researcher, and an inspiring personality in the AI community. When someone asks me, "How do I get started with deep learning?", fast.ai is one of the top places that I point them to. It's free, it's easy to get started, it's insightful and accessible, and if I may say so, it has very little of the BS that can sometimes dilute the value of educational content on popular topics like deep learning. fast.ai has a focus on practical application of deep learning and hands-on exploration of the cutting edge that is incredibly both accessible to beginners and useful to experts. This is the Artificial Intelligence Podcast. If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support it on Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N. And now, here's my conversation with Jeremy Howard.

What's the first program you've ever written?

The first program I wrote that I remember would be at high school. I did an assignment where I decided to try to find out if there were some better musical scales than the normal twelve-tone, twelve-interval scale. So I wrote a program on my Commodore 64 in BASIC that searched through other scale sizes to see if it could find one where there were more accurate harmonies.

Like mid-tone?

Like, you want an actual, exactly 3-to-2 ratio, whereas with a 12-interval scale it's not exactly 3 to 2, for example. So that's well-tempered, as they say.

And BASIC on a Commodore 64.

Yeah.

Where was the interest in music from, or is it just...

I did music all my life. So I played saxophone and clarinet and piano and guitar and drums and whatever.
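The scale search described above can be sketched in a few lines. This is a minimal illustration in Python, not the original Commodore 64 BASIC program, which is not shown in the conversation; the function name `best_fifth`, the range of scale sizes, and the two-cent threshold are all assumptions made for the example. For each equal-tempered scale with n steps per octave, it finds the number of steps closest to a pure 3:2 fifth and reports the error in cents (hundredths of a 12-tone semitone).

```python
import math

def best_fifth(n):
    """For an n-step equal-tempered scale, return (steps, error_in_cents)
    for the interval that comes closest to a pure 3:2 fifth.
    Hypothetical helper written for this illustration."""
    target = 1200 * math.log2(3 / 2)   # pure fifth in cents (about 701.955)
    step = 1200 / n                    # size of one scale step in cents
    k = round(target / step)           # nearest whole number of steps
    return k, k * step - target        # negative error means a flat fifth

# Assumed search range and threshold: scale sizes 5..53, within two cents.
for n in range(5, 54):
    k, err = best_fifth(n)
    if abs(err) < 2.0:
        print(f"{n:2d} steps per octave: fifth = {k} steps, error {err:+.3f} cents")
```

In the familiar 12-step scale the fifth is seven steps, a frequency ratio of 2^(7/12), roughly 1.4983 rather than exactly 1.5, which is the small mismatch mentioned in the answer.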
How does that thread go through your life? Where's music today?

It's not where I wish it was. For various reasons I couldn't really keep it going, particularly because I had a lot of problems with RSI with my fingers, and so I had to kind of cut back anything that used hands and fingers. I hope one day I'll be able to get back to it, health-wise.

So there's a love for music underlying it all. What's your favorite instrument?

Saxophone.

Sax?

Baritone saxophone. Well, probably bass saxophone, but they're awkward.

Well, I always love it when music is coupled with programming. There's something about a brain that utilizes those that emerges with creative ideas. So you've used and studied quite a few programming languages. Can you give an overview of what you've used? What are the pros and cons of each?

Well, my favorite programming environment almost certainly was Microsoft Access, back in the earliest days. So that was Visual Basic for Applications, which is not a good programming language, but the programming environment is fantastic. It's like the ability to create user interfaces and tie data and actions to them and create reports and all that. I've never seen anything as good. There are things nowadays like Airtable, which are like small subsets of that, which people love for good reason, but unfortunately nobody's ever achieved anything like that.

What is that, if you could pause on that for a second?

Access?

Is it a database?

It was a database program that Microsoft produced, part of Office, and it kind of withered. But basically it lets you, in a totally graphical way, create tables and relationships and queries, and tie them to forms, and set up event handlers and calculations. It was a very complete, powerful system designed not for massive scalable things but for useful little applications, which I loved.

So what's the connection between Excel and Access?

Very close. So Access was kind of the relational database equivalent, if you like. So people
still do a lot of that stuff that should be in Access in Excel, because they don't know. Excel is great as well, but it's just not as rich a programming model as VBA combined with a relational database. And so I've always loved relational databases, but today programming on top of a relational database is just a lot more of a headache. You generally need something that connects, that runs some kind of database server, unless you use SQLite, which has its own issues. Then, if you want to get a nice programming model, you'll need to create and add an ORM on top. And then, I don't know, there's all these pieces to tie together, and it's just a lot more awkward than it should be. There are people that are trying to make it easier. In particular I think of F#, you know, Don Syme, who, him and his team have done a great job of making something like a database appear in the type system, so you actually get tab completion for fields and tables and stuff like that. Anyway, so that whole VBA Office thing, I guess, was a starting point, which I still miss. Then I got into standard Visual Basic.

That's interesting, just to pause on that for a second. It's interesting that you're connecting programming languages to the ease of management of data.

Yeah.

So in your use of programming languages, you always had a love and a connection with data?

I've always been interested in doing useful things for myself and for others, which generally means getting some data and doing something with it and putting it out there again. So that's been my interest throughout. I also did a lot of stuff with AppleScript back in the early days. It's kind of nice being able to get the computer, and computers, to talk to each other and to do things for you. And then I think the programming language I most loved then would have been Delphi, which was Object Pascal, created by Anders Hejlsberg, who
previously did Turbo Pascal and then went on to create .NET and then went on to create TypeScript. Delphi was amazing because it was like a compiled, fast language that was as easy to use as Visual Basic.

Delphi: what is it similar to in more modern languages?

Visual Basic.

Visual Basic?

Yeah, but a compiled, fast version. So I'm not sure there's anything quite like it anymore. If you took C# or Java and got rid of the virtual machine and replaced it with something you could compile to a small, tight binary. I feel like it's where Swift could get to with the new SwiftUI and the cross-platform development going on. That's one of my dreams, that we'll hopefully get back to where Delphi was. There is actually a Free Pascal project nowadays called Lazarus, which is also attempting to kind of recreate Delphi, and they're making good progress.

So, okay, Delphi, that's one of your favorite programming languages?

Well, programming environments. Again, I'd say Pascal's not a nice language. If you wanted to know specifically about what languages I like, I would definitely pick J as being an amazingly wonderful language.

What's J?

J. Are you aware of APL?

I am not, except from doing a little research on the work you've done.

Okay, so it's not at all surprising you're not familiar with it, because it's not well known, but it's actually one of the main families of programming languages, going back to the late 50s, early 60s. So there were a couple of major directions. One was the kind of lambda calculus, Alonzo Church direction, which I guess is kind of Lisp and Scheme and whatever, which has a history going back to the early days of computing. The second was the kind of imperative/OO, you know, ALGOL, Simula, going on to C, C++, and so forth. There was a third, which are called array-oriented languages, which started with a paper by a guy called Ken Iverson, which was actually a math theory paper, not a programming paper. It was called "Notation as a Tool of Thought", and it was the development of a new type of math
notation. The idea is that this math notation was much more flexible, expressive, and also well-defined than traditional math notation, which is none of those things. Math notation is awful. And so he actually turned that into a programming language, and because this was the early 50s, or rather the very late 50s, all the names were available, so he called his language "A Programming Language", or APL.

APL.

So APL is an implementation of notation as a tool of thought, by which he means math notation. And Ken and his son went on to do many things, but eventually they actually produced a new language that was built on top of all the learnings of APL, and that was called J. And J is the most expressive, composable, beautifully designed language I've ever seen.

Does it have object-oriented components? Does it have that kind of thing?

Not really. It's an array-oriented language. It's the third path.

Array-oriented? What does it mean to be array-oriented?

So array-oriented means that you generally don't use any loops, but the whole thing is done with kind of an extreme version of broadcasting, if you're familiar with that NumPy/Python concept. So you do a lot with one line of code. It looks a lot like math notation, basically highly compact. And the idea is that, because you can do so much with one line of code, you very rarely need more than a single screen of code to express your program, and so you can kind of keep it all in your head and you can kind of clearly communicate it. It's interesting that APL created two main branches, K and J. J is this kind of open-source niche community of crazy enthusiasts like me, and then the other path, K, was fascinating. It's an astonishingly expensive programming language which many of the world's most ludicrously rich hedge funds use. The entire K machine is so small it sits inside level 3 cache on your CPU, and
it easily wins every benchmark I've ever seen in terms of data processing speed. But you don't come across it very much because it's like $100,000 per CPU to run it. But this path of programming languages is just so much more powerful in every way than the ones that almost anybody uses every day.

So it's all about computation? It's really focused on computation?

It's pretty heavily focused on computation. I mean, so much of programming is data processing by definition, and so there's a lot of things you can do with it. But yeah, there's not much work being done on making, like, user interface toolkits or whatever. I mean, there's some, but they're not great.

At the same time, you've done a lot of stuff with Perl and Python. So where does that fit into the picture of J and K and APL?

Well, it's just much more pragmatic. In the end, you kind of have to end up where the libraries are, because to me my focus is on productivity. I just want to get stuff done and solve problems. So Perl was great. I created an email company called FastMail, and Perl was great because back in the late 90s, early 2000s, it just had a lot of stuff it could do. I still had to write my own monitoring system and my own web framework, my own whatever, because none of that stuff existed, but it was a super flexible language to do that in.

And you used Perl for FastMail? You used it as a back end? So everything was written in Perl?

Yeah, everything was Perl.

Why do you think Perl hasn't succeeded, or hasn't dominated the market, where Python really takes over a lot?

Well, I mean, Perl did dominate. It was, for a time, everything, everywhere. But then the guy that ran Perl, Larry Wall, kind of just didn't put the time in anymore. And no project can be successful, particularly one that started with a strong leader, if it loses that strong leadership. So then Python has kind of replaced it. Python is
a lot less elegant language in nearly every way, but it has the data science libraries, and a lot of them are pretty great. So I kind of use it because it's the best we have, but it's definitely not good enough.

What do you think the future of programming looks like? What do you hope the future of programming looks like, if we zoom in on the computational fields, on data science, on machine learning?

I hope Swift is successful, because the goal of Swift, the way Chris Lattner describes it, is to be infinitely hackable, and that's what I want. I want something where me and the people I do research with and my students can look at and change everything from top to bottom. There's nothing mysterious and magical and inaccessible. Unfortunately, with Python it's the opposite of that, because Python's so slow it's extremely unhackable. You get to a point where it's like, okay, from here on down it's C. So your debugger doesn't work in the same way, your profiler doesn't work in the same way, your build system doesn't work in the same way. It's really not very hackable at all.

What's the part you would like to be hackable? Is it for the objective of optimizing training of neural networks, inference of neural networks? Is it performance of the system, or is there some non-performance-related...

It's everything. I mean, in the end, I want to be productive as a practitioner. So at the moment our understanding of deep learning is incredibly primitive. There's very little we understand. Most things don't work very well, even though it works better than anything else out there. There's so many opportunities to make it better. So you look at any domain area, like, I don't know, speech recognition with deep learning, or natural language processing classification with deep learning, or whatever. Every time I look at an area with deep learning, I always see, like, oh, it's terrible. There's lots and lots of obviously stupid ways to do things that need to be fixed. So then I want to be able to jump in there and
quickly experiment and make them better.

You think the programming language has a role in that?

Huge role, yeah. So currently Python has a big gap in terms of our ability to innovate, particularly around recurrent neural networks and natural language processing. Because it's so slow, the actual loop where we actually loop through words, we have to do that whole thing in CUDA C. So we actually can't innovate with the kernel, the heart, of that most important algorithm. And it's just a huge problem, and this happens all over the place, so we hit research limitations. Another example: convolutional neural networks, which are actually the most popular architecture for lots of things, maybe most things in deep learning. We almost certainly should be using sparse convolutional neural networks, but only like two people are, because to do it you have to rewrite all of that CUDA C-level stuff. And researchers and practitioners don't. So there's just big gaps in what people actually research on, what people actually implement, because of the programming language problem.

So you think it's just too difficult to write in CUDA C, and that a higher-level programming language like Swift should enable the easier fooling around, creative stuff, with RNNs or with sparse convolutional neural networks? Kind of, who's at fault? Who's in charge of making it easy for a researcher to play around?

I mean, no one's at fault. Just nobody's got around to it yet. Or it's just, it's hard, right? And I mean, part of the fault is that we ignored that whole APL kind of direction. Nearly everybody did, for 60 years, 50 years. But recently people have been starting to reinvent pieces of that and kind of create some interesting new directions in the compiler technology. So the place where that's particularly happening right now is something called MLIR, which is something that, again, Chris Lattner, the Swift guy, is leading. And because it's actually not going to
be Swift on its own that solves this problem, because the problem is that currently writing an acceptably fast GPU program is too complicated, regardless of what language you use. And that's just because you have to deal with the fact that I've got 10,000 threads and I have to synchronize between them all, and I have to put my thing into grid blocks and think about warps and all this stuff. It's just so much boilerplate that to do that well you have to be a specialist at that, and it's going to be a year's work to optimize that algorithm in that way. But with things like Tensor Comprehensions, and Tile, and MLIR, and TVM, there's all these various projects which are all about saying, let's let people create domain-specific languages for tensor computations. These are the kinds of things we generally do on the GPU for deep learning. And then have a compiler which can optimize that tensor computation. A lot of this work is actually sitting on top of a project called Halide, which is a mind-blowing project where they came up with such a domain-specific language. In fact, two: one domain-specific language for expressing "this is what my tensor computation is", and another domain-specific language for expressing "this is the way I want you to structure the compilation of that", like, do it block by block and do these bits in parallel. And they were able to show how you can compress the amount of code by 10x compared to optimized GPU code and get the same performance. So these other things are kind of sitting on top of that kind of research, and MLIR is pulling a lot of those best practices together. And now we're starting to see work done on making all of that directly accessible through Swift, so that I could use Swift to kind of write those domain-specific languages. And hopefully we'll get, then, Swift CUDA kernels written in a very expressive and concise way that looks a bit like J and APL, and then Swift layers on top
of that, and then a SwiftUI on top of that. And, you know, it'll be so nice if we can get to that point.

Does it all eventually boil down to CUDA and NVIDIA GPUs?

Unfortunately, at the moment it does. But one of the nice things about MLIR, if AMD ever gets their act together, which they probably won't, is that they or others could write MLIR backends for other GPUs or other tensor computation devices, of which today there are an increasing number, like Graphcore or Vertex.AI or whatever. So yeah, being able to target lots of backends would be another benefit of this. And the market really needs competition, because at the moment NVIDIA is massively overcharging for their kind of enterprise-class cards, because there is no serious competition, because nobody else is doing the software properly.

In the cloud there is some competition, right?

Not really, other than TPUs, perhaps, but TPUs are almost unprogrammable at the moment.

So TPUs have the same problem, that you can't...

It's even worse. So with TPUs, Google actually made an explicit decision to make them almost entirely unprogrammable, because they felt that there was too much IP in there, and if they gave people direct access to program them, people would learn their secrets. So you can't actually directly program the memory in a TPU. You can't even directly create code that runs on, and that you look at on, the machine that has the TPU. It all goes through a virtual machine. So all you can really do is this kind of cookie-cutter thing of, like, plug high-level stuff together, which is just super tedious and annoying and totally unnecessary.

So tell me, if you could, the origin story of fast.ai. What is the motivation, its mission, its dream?

So I guess the founding story is heavily tied to my previous startup, which is a company called Enlitic, which was the first company to focus on deep learning for medicine. And I created that because I saw there was a huge opportunity: there's about
a 10x shortage of the number of doctors in the world, in the developing world, that we need. I expected it would take about 300 years to train enough doctors to meet that gap. But I guessed that maybe if we used deep learning for some of the analytics, we could maybe make it so you don't need as highly trained doctors.

For diagnosis?

For diagnosis and treatment planning.

Where's the biggest benefit, just before we get to fast.ai, where's the biggest benefit of AI in medicine that you see today?

Not much is happening today in terms of stuff that's actually out there. It's very early. But in terms of the opportunity, it's to take markets like India and China and Indonesia, which have big populations, and Africa, with small numbers of doctors, and provide diagnostics, particularly treatment planning and triage, kind of on device, so that if you do a test for malaria or tuberculosis or whatever, even a healthcare worker that's had a month of training can immediately get a very high-quality assessment of whether the patient might be at risk, and tell them, okay, we'll send them off to a hospital. So for example, in Africa, outside of South Africa, there's only five pediatric radiologists for the entire continent. So most countries don't have any. So if your kid is sick and they need something diagnosed through medical imaging, even if you're able to get medical imaging done, the person that looks at it will be, you know, a nurse at best. But actually, in India, for example, and in China, almost no x-rays are read by any trained professional, because they don't have enough. So if instead we had an algorithm that could take the most likely high-risk 5% and triage, basically say, okay, somebody needs to look at this, it would massively change what's possible with medicine in the developing world. And remember, increasingly they have money. They're the developing world; they're not the poor world. So they have the money, so
they're building the hospitals, they're getting the diagnostic equipment, but there's no way, for a very long time, that they'll be able to have the expertise.

Shortage of expertise, okay. And that's where the deep learning systems can step in and magnify the expertise they do have.

Exactly.

So you do see, just to linger on it a little bit longer, the interaction. Do you still see the human experts still at the core of these systems?

Yeah, absolutely.

Is there something in medicine that can be automated almost completely?

I don't see the point of even thinking about that, because we have such a shortage of people. Why would we want to find a way not to use them? Like, we have people. So even from an economic point of view, if you can make them 10x more productive, getting rid of the person doesn't impact your unit economics at all. And it totally ignores the fact that there are things people do better than machines. So to me that's just not a useful way of framing the problem.

I guess, just to clarify, I guess I meant there may be some problems where you can avoid even going to the expert ever, sort of maybe preventative care or some basic stuff, the low-hanging fruit, allowing the expert to focus on the things that are really...

Well, that's what the triage would do, right? So the triage would say, okay, it's 99% sure there's nothing here. So that can be done on device, and they can just say, okay, go home. So the experts are being used to look at the stuff which has some chance it's worth looking at, which most things don't. It's fine.

Why do you think we haven't quite made progress on that yet, in terms of the scale of how much AI is applied in the medical field?

There's a lot of reasons. I mean, one is it's pretty new. I only started Enlitic in like 2014, and before that, it's hard to express to what degree the medical world was not aware of the opportunities here. So I went to RSNA, which is the
world's largest radiology conference, and I told everybody I could, you know, like, I'm doing this thing with deep learning, please come and check it out. And no one had any idea what I was talking about, and no one had any interest in it. So we've come from absolute zero, which is hard. And then the whole regulatory framework, education system, everything is just set up to think of doctoring in a very different way. So today there is a small number of people who are deep learning practitioners and doctors at the same time, and we're starting to see the first ones come out of their PhD programs. So Zak Kohane over in Boston, in Cambridge, has a number of students now who are data science experts, deep learning experts, and actual medical doctors. Quite a few doctors have completed our fast.ai course now and are publishing papers and creating journal reading groups in the American College of Radiology. And it's just starting to happen, but it's going to be a long process. The regulators have to learn how to regulate this, they have to build guidelines, and then the lawyers at hospitals have to develop a new way of understanding that sometimes it makes sense for data to be looked at in raw form, in large quantities, in order to create world-changing results.

Yeah, the regulation around data, all that, it sounds like it's probably the hardest problem, but it sounds reminiscent of autonomous vehicles as well. Many of the same regulatory challenges, many of the same data challenges.

Yeah, I mean, funnily enough, the problem is less the regulation and more the interpretation of that regulation by lawyers in hospitals. So HIPAA was actually designed, the P in HIPAA does not stand for privacy, it stands for portability. It's actually meant to be a way that data can be used. And it was created with lots of gray areas, because the idea is that would be more practical and it would help people to use this legislation to actually share data in a more
thoughtful way. Unfortunately, it's done the opposite, because when a lawyer sees a gray area, they see, oh, if we don't know we won't get sued, then we can't do it. So HIPAA is not exactly the problem. The problem is more that hospital lawyers are not incented to make bold decisions about data portability.

Or even to embrace technology that saves lives. They more want to not get in trouble for embracing that technology.

Right. Also, it saves lives in a very abstract way, which is like, oh, we've been able to release these hundred thousand anonymized records. I can't point at the specific person whose life that saved. I can say, like, oh, we've ended up with this paper which found this result, which diagnosed a thousand more people than we would have otherwise, but it's like, which ones were helped? It's very abstract.

And on the counter side of that, you may be able to point to a life that was taken because of something that was...

Yeah, or a person whose privacy was violated. It's like, oh, this specific person, you know, they were de-identified and then re-identified.

It's just a fascinating topic. We're jumping around; we'll get back to fast.ai. But on the question of privacy, data is the fuel for so much innovation in deep learning. What's your sense on privacy, whether we're talking about Twitter, Facebook, YouTube, or just the technologies like in the medical field that rely on people's data in order to create impact? How do we get that right, respecting people's privacy and yet creating technology that learns from data?

One of my areas of focus is on doing more with less data. So most vendors, unfortunately, are strongly incented to find ways to require more data and more computation. So Google and IBM being the most obvious.

IBM?

Yeah, so Watson, you know. So Google and IBM both strongly push the idea that they have more data and more computation and more intelligent people than anybody else, and so you have to trust them to do things because nobody else can do it.
And Google's very upfront about this. Like, Jeff Dean has gone out there and given talks and said, "Our goal is to require a thousand times more computation, but fewer people." Our goal is to use the people that you have better, and the data you have better, and the computation you have better. So one of the things that we've discovered, or at least highlighted, is that you very, very often don't need much data at all, and so the data you already have in your organization will be enough to get state-of-the-art results. So my starting point would be to say, around privacy, a lot of people are looking for ways to share data and aggregate data, but I think often that's unnecessary. They assume that they need more data than they do, because they're not familiar with the basics of transfer learning, which is this critical technique for needing orders of magnitude less data.

Is your sense, one reason you might want to collect data from everyone is like in the recommender system context, where your individual, Jeremy Howard's individual, data is the most useful for providing a product that's impactful for you, so for giving you advertisements, for recommending movies to you, for doing medical diagnosis. Is your sense we can build with a small amount of data general models that will have a huge impact for most people, that we don't need to have data from each individual?

On the whole, I'd say yes. I mean, there are things like, you know, recommender systems have this cold-start problem, where Jeremy is a new customer, we haven't seen him before, so we can't recommend him things based on what else he's bought and liked with us. And there's various workarounds to that. Like, a lot of music programs will start out by saying, which of these artists do you like? Which of these albums do you like? Which of these songs do you like? Netflix used to do that. Nowadays they tend not to. People kind of don't like that, because they think, oh, we don't want to bother the user. So you could
work around that by having some kind of data sharing, where you get my marketing record from Acxiom or whatever and try to guess from that. To me, the benefit to me and to society of saving me five minutes on answering some questions, versus the negative externalities of the privacy issue, doesn't add up. So I think a lot of the time, the places where people are invading our privacy in order to provide convenience is really about just trying to make them more money, and they move these negative externalities into places where they don't have to pay for them. So when you actually see regulations appear that cause the companies that create these negative externalities to have to pay for them themselves, they say, well, we can't do it anymore. So the cost is actually too high.

Right. But for something like medicine...

Yeah, I mean, the hospital has my medical imaging, my pathology studies, my medical records, and also I own my medical data. So I helped a startup called doc.ai. One of the things doc.ai does is that it has an app you can connect to, you know, Sutter Health and LabCorp and Walgreens, and download your medical data to your phone, and then upload it again at your discretion to share it as you wish. So with that kind of approach, we can share our medical information with the people we want to.

Yeah, so control. I mean, really being able to control who you share it with and so on.

Yeah.

So that was a beautiful, interesting tangent, but to return back to the origin story of fast.ai.

Right. So before I started fast.ai, I spent a year researching where the biggest opportunities for deep learning were, because I knew from my time at Kaggle in particular that deep learning had kind of hit this threshold point where it was rapidly becoming the state-of-the-art approach in every area that looked at it. And I'd been working with neural nets for over 20 years. I knew that, from a theoretical point of view, once it hit that point it would do that in kind of
just about every domain. And so I kind of spent a year researching what are the domains that are going to have the biggest low-hanging fruit in the shortest time period. I picked medicine, but there were so many I could have picked, and so there was a kind of level of frustration for me of like, okay, I'm really glad we've opened up the medical deep learning world, and today it's huge, as you know, but I can't do everything. I don't even know, like, in medicine it took me a really long time to even get a sense of what kind of problems medical practitioners solve, what kind of data they have, who has that data. So I kind of felt like I need to approach this differently if I want to maximize the positive impact of deep learning. Rather than me picking an area and trying to become good at it and building something, I should let people who are already domain experts in those areas, and who already have the data, do it themselves. Mm-hmm. So that was the reason for fast.ai: to basically try and figure out how to get deep learning into the hands of people who could benefit from it, and help them to do so in as quick and easy and effective a way as possible. Got it. So, sort of empower the domain experts. Yeah. And partly it's because, unlike most people in this field, my background is very applied and industrial. My first job was at McKinsey & Company. I spent 10 years in management consulting. I spent a lot of time with domain experts, so I kind of respect them and appreciate them, and I know that's where the value generation in society is. And I also know how most of them can't code, and most of them don't have the time to invest three years in a graduate degree or whatever. So it's like, how do I upskill those domain experts? I think that would be a super powerful thing, the biggest societal impact I could have. So yeah, that was the thinking. So much of fast.ai, the students and researchers and the things you teach,
are pragmatically minded, practically minded, figuring out ways how to solve real problems, and fast. Right. So from your experience, what's the difference between theory and practice of deep learning? Well, most of the research in the deep learning world is a total waste of time. Right, that's what I was getting at. Yeah. It's a problem in science in general. Scientists need to be published, which means they need to work on things that their peers are extremely familiar with and can recognize an advance in that area. So that means that they all need to work on the same thing. And the thing they work on, there's nothing to encourage them to work on things that are practically useful. So you get just a whole lot of research which is minor advances in stuff that's been very highly studied and has no significant practical impact. Whereas the things that really make a difference, like I mentioned transfer learning, if we can do better at transfer learning, then it's this world-changing thing where suddenly lots more people can do world-class work with less resources and less data. But almost nobody works on that. Or another example, active learning, which is the study of how do we get more out of the human beings in the loop. Which is my favorite topic. Yeah. So active learning is great, but there's almost nobody working on it, because it's just not a trendy thing right now. You know what, sorry to interrupt, you're saying that nobody is publishing on active learning, but there are people inside companies, anybody who actually has to solve a problem, they're going to innovate on active learning. Yeah, everybody kind of reinvents active learning when they actually have to work in practice, because they start labeling things and they think, gosh, this is taking a long time and it's very expensive. And then they start thinking, well, why am I labeling everything? The machine's only making mistakes on those two classes, they're the
hard ones, maybe I'll just start labeling those two classes. And then you start thinking, well, why did I do that manually? Why can't I just get the system to tell me which things are going to be hardest? It's an obvious thing to do. But yeah, it's just like transfer learning: it's understudied, and the academic world just has no reason to care about practical results. The funny thing is, I've only really ever written one paper. I hate writing papers. And I didn't even write it; it was my colleague Sebastian Ruder who actually wrote it. I just did the research for it. But it was basically introducing successful transfer learning to NLP for the first time. The algorithm is called ULMFiT. And I actually wrote it for the course, for the fast.ai course. I wanted to teach people NLP, and I thought, I only want to teach people practical stuff, and I think the only practical stuff is transfer learning, and I couldn't find any examples of transfer learning in NLP, so I just did it. And I was shocked to find that as soon as I did it, and the basic prototype took a couple of days, it smashed the state of the art on one of the most important datasets in a field that I knew nothing about. And I just thought, well, this is ridiculous. And so I spoke to Sebastian about it, and he kindly offered to write up the results, and so it ended up being published in ACL, which is the top computational linguistics conference. So people do actually care once you do it, but I guess it's difficult for maybe junior researchers. Like, I don't care whether I get citations or papers or whatever. There's nothing in my life that makes that important, which is why I've never actually bothered to write a paper myself. But for people who do, I guess they have to pick the kind of safe option, which is, yeah, make a slight improvement on something that everybody is already working on. Yeah, nobody does anything interesting or succeeds
in life with the safe option. I mean, the nice thing is, nowadays everybody is working on NLP transfer learning, because since that time we've had GPT and GPT-2 and BERT. So yeah, once you show that something's possible, everybody jumps in, I guess. I hope to be a part of, and I hope to see, more innovation in active learning in the same way. I think transfer learning and active learning are fascinating, public, open work. I actually helped start a startup called platform.ai, which is really all about active learning. And yeah, it's been interesting trying to kind of see what research is out there and make the most of it, and there's basically none, so we've had to do all our own research. Once again, just as you described. Can you tell the story of the Stanford competition, DAWNBench, and fast.ai's achievement on it? Sure. So something which I really enjoy is that I basically teach two courses a year: Practical Deep Learning for Coders, which is kind of the introductory course, and then Cutting Edge Deep Learning for Coders, which is the kind of research-level course. And while I teach those courses, I basically have a big office at the University of San Francisco, big enough for like 30 people, and I invite anybody, any student who wants to come and hang out with me while I build the course. And so generally it's full, and so we have twenty or thirty people in a big office with nothing to do but study deep learning. So it was during one of these times that somebody in the group said, "Oh, there's a thing called DAWNBench, it looks interesting." And I was like, "What the hell is that?" And they said, "It's some competition to see how quickly you can train a model. Seems kind of not exactly relevant to what we're doing, but it sounds like the kind of thing which you might be interested in." I checked it out and I said, "Oh crap, there's only ten days till it's over, it's pretty much too late, and we're kind of busy trying to teach this course." But we were like, oh, it
would make an interesting case study for the course. It's all the stuff we're already doing; why don't we just put together our current best practices and ideas? So me and, I guess, about four students just decided to give it a go, and we focused on this small one called CIFAR-10, which is little 32 by 32 pixel images. Can you say what DAWNBench is? Yeah, so it's a competition to train a model as fast as possible. It was run by Stanford. And as cheap as possible, too. That's also another one, for as cheap as possible. And there were a couple of categories, ImageNet and CIFAR-10. So ImageNet is this big 1.3 million image thing that took a couple of days to train. I remember a friend of mine, Pete Warden, who's now at Google, told me how he trained ImageNet a few years ago, and he basically had this little granny flat out the back that he turned into his ImageNet training center, and after like a year of work he figured out how to train it in like ten days or something. That was a big job. Whereas CIFAR-10, at that time, you could train in a few hours. It's much smaller and easier. So we thought we'd try CIFAR-10. And yeah, I'd really never done that before. Like, things like using more than one GPU at a time was something I tried to avoid, because to me it's very against the whole idea of accessibility; it's better to do things with one GPU. I mean, have you asked in the past, after having accomplished something, how do I do this faster, much faster? Oh, always. But for me, it's always how do I make it much faster on a single GPU that a normal person could afford in their day-to-day life. It's not how could I do it faster by having a huge data center, because to me it's all about, as many people should be able to use something as possible without fussing around with infrastructure. So anyway, in this case it's like, well, we can use eight GPUs just by renting an AWS machine, so we thought we'd
try that. And yeah, basically using the stuff we were already doing, within a few days we had the speed down to, I don't know, a very small number of minutes. I can't remember exactly how many minutes it was, but it might have been like 10 minutes or something. And so yeah, we found ourselves at the top of the leaderboard easily for both time and money, which really shocked me, because the other people competing in this were like Google and Intel and stuff, who know a lot more about this stuff than I think we do. So then we were emboldened. We thought, let's try the ImageNet one too. I mean, it seemed way out of our league, but our goal was to get under 12 hours. And we did, which was really exciting. We didn't put anything up on the leaderboard, but we were down to like 10 hours. But then Google put in some like 5 hours or something, and we were just like, oh, we're so screwed. But we kind of thought, we'll keep trying, you know, if Google can do it in five... I mean, Google did it in five hours on something like a TPU pod, like a lot of hardware. But we kind of had a bunch of ideas to try. Like, a really simple thing was, why are we using these big images? They're like 224 or 256 by 256 pixels. Why don't we try smaller ones? And just to elaborate, there's a constraint on the accuracy that your trained model is supposed to achieve. Yeah, you've got to achieve 93%, I think it was, for ImageNet, exactly. Which is very tough, so you have to... Yeah, 93%. I think they picked a good threshold. It was a little bit higher than what the most commonly used ResNet-50 model could achieve at that time. So yeah, it's quite a difficult problem to solve. But we realized if we actually just use 64 by 64 images, it trained a pretty good model, and then we could take that same model and just give it a couple of epochs to learn 224 by 224 images, and it was basically already trained. It makes a lot of sense: if you teach somebody, here's what a dog looks like, and
you show them low-res versions, and then you say, here's a really clear picture of a dog, they already know what a dog looks like. So with that, we just jumped to the front, and we ended up winning parts of that competition. We actually ended up doing a distributed version over multiple machines a couple of months later and ended up at the top of the leaderboard. We had 18 minutes. ImageNet? Yeah. And people have just kept on blasting through again and again since then. So what's your view on multi-GPU or multiple-machine training in general as a way to speed code up? I think it's largely a waste of time. Both multi-GPU on a single machine and, yeah, particularly multi-machine, because it's just clunky. Multi-GPU is less clunky than it used to be, but to me, anything that slows down your iteration speed is a waste of time. So you could maybe do your very last, you know, perfecting of the model on multiple GPUs if you need to. So for example, I think doing stuff on ImageNet is generally a waste of time. Why test things on 1.3 million images? Most of us don't use 1.3 million images. And we've also done research that shows that doing things on a smaller subset of images gives you the same relative answers anyway. So from a research point of view, why waste that time? So actually, I released a couple of new datasets recently. One is called Imagenette, the French ImageNet, which is a small subset of ImageNet which is designed to be easy to classify. How do you spell Imagenette? It's got an extra T and E at the end, because it's very French. Okay. And then another one called Imagewoof, which is a subset of ImageNet that only contains dog breeds. That's a hard one, right? That's a hard one, yeah. And I've discovered that if you just look at these two subsets, you can train things on a single GPU in ten minutes, and the results you get are directly transferable to ImageNet nearly all the time. And so now I'm starting to see some researchers start to use these smaller datasets. I so
deeply love the way you think, because I think you might have written a blog post saying that sort of going with these big datasets is encouraging people to not think creatively. Absolutely. So it sort of constrains you to train on large resources, and because you have these resources, you think more resources will be better, and then somehow you kill the creativity. Yeah, and even worse than that, Lex, I keep hearing from people who say, "I decided not to get into deep learning because I don't believe it's accessible to people outside of Google to do useful work." So I see a lot of people make an explicit decision to not learn this incredibly valuable tool because they've drunk the Google Kool-Aid, which is that only Google's big enough and smart enough to do it. And I just find that so disappointing, and it's so wrong. And I think all the major breakthroughs in AI in the next twenty years will be doable on a single GPU. Like, I would say my sense is all the big sort of... Well, let's put it this way: none of the big breakthroughs of the last 20 years have required multiple GPUs. So like batch norm, ReLU, dropout, to demonstrate that there's something to them... Every one of them, yeah. None of them has required multiple GPUs. GANs, the original GANs, didn't require multiple GPUs. Well, and we've actually recently shown that you don't even need GANs. So we've developed GAN-level outcomes without needing GANs, and, again by using transfer learning, we can do it in a couple of hours on a single GPU. You're just using a generator model, like, without the adversarial part? Yeah. So we've found loss functions that work super well without the adversarial part. And then one of our students, a guy called Jason Antic, has created something called DeOldify, which uses this technique to colorize old black-and-white movies. You can do it on a single GPU, colorize a whole movie in a couple of hours. And one of the things that Jason and I did together was we figured out how to add a little bit of GAN at the very end, which
it turns out for colorization makes it just a bit brighter and nicer. And then Jason did masses of experiments to figure out exactly how much to do, but it's still all done on his home machine, on a single GPU in his lounge room. And if you think about colorizing Hollywood movies, that sounds like something a huge studio would have to do, but he has the world's best results on this. There's this problem of microphones. We're just talking to microphones now. Yeah. It's such a pain in the ass to have these microphones to get good quality audio, and I tried to see if it's possible to plop down a bunch of cheap sensors and reconstruct higher quality audio from multiple sources, because right now I haven't seen work on, okay, we can take inexpensive mics and automatically combine audio from multiple sources to improve the combined audio. Right. People haven't done that, and that feels like a learning problem. So hopefully somebody can... Well, I mean, it's eminently doable, and it should have been done by now. I felt the same way about computational photography four years ago. That's right. Why are we investing in big lenses when three cheap lenses, plus actually a little bit of intentional movement, so like holding, you know, take a few frames, gives you enough information to get excellent sub-pixel resolution, which, particularly with deep learning, you would know exactly what you're meant to be looking at. We can totally do the same thing with audio. I think it's madness that it hasn't been done yet. Has there been progress on the photography side? Yeah, computational photography is basically standard now. So the Google Pixel Night Sight, I don't know if you've ever tried it, but it's astonishing. You take a picture in almost pitch black and you get back a very high quality image, and it's not because of the lens. Same stuff with, like, adding the bokeh to the background blurring, it's done computationally. This is the Pixel right here. Yeah, basically the
everybody now is doing most of the fanciest stuff on their pho