Machine Learning at Spotify - Gustav Soderstrom | AI Podcast Clips
_TWgsvF4hBQ • 2019-10-09
Kind: captions
Language: en
there is an interesting statistic I saw
that
so Spotify has maybe you can correct me
but over 50 million songs tracks and
over 3 billion playlists so 50
million songs and 3 billion playlists
60 times more playlists what do you make
of that yeah so the way I think about it
is that from a statistician's
or machine learning point of view you
have all these if you want to think about
reinforcement learning where you have
this state space of all the tracks and
you can take different journeys through
this world and I
think of these as people helping
themselves and each other
creating interesting vectors through
this space of tracks and then it's not
so surprising that across you know many
tens of millions of kind of atomic units
there will be billions of paths that
make sense and we're probably
quite far away from having found all of
them so kind of our job now is users
when Spotify started it was really a
search box that was for that time pretty
powerful and then I like to refer to
this programming language called
playlisting where if you as you
probably were were pretty good at music
you knew your new releases you knew your
back catalog you knew your Stairway to
Heaven
you could create a soundtrack for
yourself using this playlisting tool
that's like a meta programming
language for music to soundtrack your
life and people who were good at music
it's back to how do you scale the
product for people who are good at music
that was actually enough if you had
the catalog and a good search tool you
could create your own sessions you could
create a really good soundtrack for your
entire life probably perfectly
personalized because you did it yourself
but the problem was most people many
people aren't that good at music they
just can't spend the time even if you're
very good at music it's gonna be hard to
keep up so what we did to try to scale
this was to essentially try to build you
can think of them as agents like
this friend that some people had that
helped them navigate this music catalog
that's what we're trying to do for you
but also there is something like 200
million active users on Spotify
so from the machine learning
perspective you have these two hundred
million people plus they're creating
it's really interesting to think of
playlists as I mean I don't know if you
meant it that way but it's almost like a
programming language it's at least a
trace of exploration of those individual
agents the listeners and you have
all these new tracks coming in so it's a
fascinating space that is ripe for
machine learning so
how can playlists be
used as data in terms of machine
learning and just to help Spotify
organize the music so we found in our
data not surprisingly that people who
playlisted a lot retained much better they
had a great experience and so our first
attempt was to playlist for users and so
we acquired this company called Tunigo
of editors and professional
playlisters and kind of leveraged the maximum of
human intelligence to help build
kind of these vectors through the track
space for people and that
broadened the product then the obvious
next and we you know used statistical
means where they could see when
they created a playlist how did that
playlist perform you know they could
see skips of the songs they could see
how the songs performed and they manually
iterated the playlist to maximize
performance for a large group of people
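
The editorial loop described above (watch skips per song, then revise the playlist) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the play log, the track names, the `skip_rates` and `weakest_tracks` helpers, and the 0.5 threshold are hypothetical and not taken from Spotify's tooling.

```python
from collections import defaultdict

# Hypothetical play log for one editorial playlist: (track, was_skipped).
plays = [
    ("track_a", False), ("track_a", False), ("track_a", True),
    ("track_b", True), ("track_b", True), ("track_b", False), ("track_b", True),
    ("track_c", False), ("track_c", False),
]

def skip_rates(plays):
    """Return the fraction of plays that were skipped, per track."""
    counts = defaultdict(lambda: [0, 0])  # track -> [skips, total plays]
    for track, skipped in plays:
        counts[track][1] += 1
        if skipped:
            counts[track][0] += 1
    return {track: skips / total for track, (skips, total) in counts.items()}

def weakest_tracks(plays, threshold=0.5):
    """List tracks whose skip rate exceeds the threshold, worst first."""
    rates = skip_rates(plays)
    flagged = [t for t, r in rates.items() if r > threshold]
    return sorted(flagged, key=lambda t: -rates[t])

# Tracks an editor might consider swapping out of the playlist.
print(weakest_tracks(plays))
```

An editor would review the flagged tracks by hand; the sketch only surfaces candidates and does not change the playlist itself.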
but there were never enough editors to
playlist for you personally so the
promise of machine learning was to go
from kind of group personalization using
editors and tools and statistics to
individualization and then what's so
interesting about the 3 billion playlists
we have is the truth is we
lucked out this was not an a priori strategy
as is often the case it looks
really smart in hindsight but it was dumb
luck we looked at these playlists and we
had some people in the company a person
named Erik Bernhardsson
who was really good at machine learning
already back then in like
2007-2008 back then it was mostly
collaborative filtering and so forth but
we realized that what this is is
people are grouping tracks for
themselves that have some semantic
meaning to them and then they actually
label it with a playlist name as well so
in a sense people were grouping tracks
along semantic dimensions and labeling
them and so could you use that
information to find that latent
embedding and so we started playing
around with collaborative filtering and
we saw tremendous success with it
basically trying to extract some of
these dimensions and
if you think about it it's not surprising
at all it'd be quite surprising if
playlists were actually random if they
had no semantic meaning for most people
they group these tracks for some reason
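
The idea of pulling a latent embedding out of playlists can be sketched on toy data. This is only an illustration under stated assumptions: it uses a truncated SVD of a binary playlist-by-track matrix as a simple stand-in for collaborative filtering, not the method Spotify actually used, and the playlists, track names, and the `track_embeddings` and `cosine` helpers are hypothetical.

```python
import numpy as np

# Hypothetical playlists: each one is a user-made, labeled grouping of tracks.
playlists = {
    "classic rock": ["stairway", "hotel_california", "bohemian"],
    "road trip": ["hotel_california", "stairway", "born_to_run"],
    "indie evenings": ["skinny_love", "holocene", "two_weeks"],
    "quiet indie": ["holocene", "skinny_love", "re_stacks"],
}

def track_embeddings(playlists, dim=2):
    """Embed tracks via truncated SVD of the playlist-by-track matrix."""
    tracks = sorted({t for ts in playlists.values() for t in ts})
    index = {t: i for i, t in enumerate(tracks)}
    # Binary matrix: rows are playlists, columns are tracks.
    matrix = np.zeros((len(playlists), len(tracks)))
    for row, ts in enumerate(playlists.values()):
        for t in ts:
            matrix[row, index[t]] = 1.0
    # Keep the top `dim` right singular vectors, scaled by singular values.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    vectors = vt[:dim].T * s[:dim]
    return {t: vectors[i] for t, i in index.items()}

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

emb = track_embeddings(playlists)
# Tracks that share playlists should land closer together than tracks that never do.
print(cosine(emb["stairway"], emb["hotel_california"]))
print(cosine(emb["stairway"], emb["holocene"]))
```

On this toy data the two groups never share a playlist, so the first similarity comes out high and the second near zero. Real playlist data is far sparser and noisier, and the number of dimensions would be a tuning choice.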
so we just happened across this
incredible data set where people have
taken these tens of millions of
tracks and grouped them along different
semantic vectors and the semantics being
outside the individual user so some kind
of universal there's a universal
embedding that holds across people on
this earth yes I do think that the
embeddings you find are gonna be
reflective of the people who playlisted
so if you have a lot of indie lovers
who playlist your embedding is gonna perform
better there but what we found was that
yes there were these latent
similarities they were very powerful and
we had them it was interesting
because I think that the people who
playlisted the most initially were these
so-called music aficionados who were
really into music and
their taste was often
geared towards a certain type of
music and so what surprised us if you
look at the problem from the outside you
might expect that the algorithms would
start performing best with mainstreamers
first because it somehow feels like an
easier problem to solve mainstream taste
than really particular taste it was the
complete opposite for us the
recommendations performed fantastically
for people who saw themselves
as having very unique taste that's
probably because all of them playlisted
and they didn't perform so well for
mainstreamers they actually thought they
were a bit too particular and unorthodox
so we had the complete opposite of what we
expected success with the hardest
problem first and then had to try to
scale to more mainstream recommendations
file updated 2026-02-13 13:23:52 UTC