Machine Learning at Spotify - Gustav Soderstrom

Machine Learning at Spotify - Gustav Soderstrom | AI Podcast Clips

_TWgsvF4hBQ • 2019-10-09

Transcript preview


Kind: captions
Language: en
There is an interesting statistic I saw: Spotify has, maybe you can correct me, over 50 million songs (tracks) and over 3 billion playlists. So 50 million songs and 3 billion playlists, 60 times more playlists. What do you make of that?

Yeah, so the way I think about it is that from a statistical or machine learning point of view, if you think about reinforcement learning, you have this state space of all the tracks, and you can take different journeys through this world. I think of these playlists as people helping themselves and each other create interesting vectors through this space of tracks. So it's not that surprising that across many tens of millions of atomic units there will be billions of paths that make sense, and we're probably still quite far away from having found all of them. So that's kind of our job now.

When Spotify started, it was really a search box that was, for that time, pretty powerful. And then there was what I like to call this programming language called playlisting, where if you were, as you probably were, pretty good at music, you knew your new releases, you knew your back catalog, you knew your "Stairway to Heaven".
You could create a soundtrack for yourself using this playlist thing, which was like a meta-programming language for music to soundtrack your life. And for people who were good at music (it's back to how you scale the product), that was actually enough: if you had the catalog and a good search tool, you could create your own sessions, you could create a really good soundtrack for your entire life, probably perfectly personalized because you did it yourself.
But the problem was that many people aren't that good at music, or they just can't spend the time. Even if you're very good at music, it's going to be hard to keep up. So what we did to try to scale this was essentially to build what you can think of as agents. There's this friend that some people had who helped them navigate the music catalog, and that's what we're trying to do for you.
But also, there are something like 200 million active users on Spotify. So, okay, from the machine learning perspective you have these 200 million people, plus what they're creating. It's really interesting to think of playlists as, I mean, I don't know if you meant it that way, but it's almost like a programming language. It's at least a trace of exploration by those individual agents, the listeners, and you have all these new tracks coming in. So it's a fascinating space that is ripe for machine learning. Is it possible, how can playlists be used as data for machine learning, just to help Spotify organize the music?

So we found in our data, not surprisingly, that people who playlisted a lot retained much better; they had a great experience. And so our first attempt was to playlist for users. We acquired this company called Tunigo, of editors and professional playlisters, and kind of leveraged the maximum of human intelligence to help build these vectors through the track space for people, and that broadened the product. Then the obvious next step: we used statistical means, so that when editors created a playlist they could see how that playlist performed. They could see skips of the songs, they could see how the songs performed, and they manually iterated the playlist to maximize performance for a large group of people.
But there were never enough editors to playlist for you personally. So the promise of machine learning was to go from group personalization, using editors, tools, and statistics, to individualization. And what's so interesting about the 3 billion playlists we have is that, the truth is, we lucked out. This was not an a priori strategy; as is often the case, it looks really smart in hindsight, but it was dumb luck. We looked at these playlists, and we had some people in the company, a person named Erik Bernhardsson, who was really good at machine learning already back then, in like 2007 to 2008, when it was mostly collaborative filtering and so forth. We realized that what this is, is people grouping tracks for themselves that have some semantic meaning to them, and then they actually label it with a playlist name as well. So in a sense, people were grouping tracks along semantic dimensions and labeling them. Could you use that information to find that latent embedding? So we started playing around with collaborative filtering, basically trying to extract some of these dimensions, and we saw tremendous success with it. If you think about it, it's not surprising at all; it would be quite surprising if playlists were actually random, if they had no semantic meaning. For most people, they group these tracks for some reason.
So we just happened to come across this incredible data set where people had taken these tens of millions of tracks and grouped them along different semantic vectors.

And the semantics being outside the individual user, so some kind of universal... there's a universal embedding that holds across people on this earth?

Yes, I do think that the embeddings are going to be reflective of the people who playlisted. So if you have a lot of indie lovers who playlist, your embeddings can perform better there. But what we found was that, yes, there were these latent similarities, they were very powerful, and we had them. It was interesting, because I think the people who playlisted the most initially were these so-called music aficionados who were really into music, and their taste was often geared towards a certain type of music. So here is what surprised us. If you look at the problem from the outside, you might expect that the algorithms would start performing best with mainstreamers first, because solving mainstream taste somehow feels like an easier problem than solving really particular taste. It was the complete opposite for us. The recommendations performed fantastically for people who saw themselves as having very unique taste, probably because all of them playlisted, and they didn't perform so well for mainstreamers, who actually thought they were a bit too particular and unorthodox. So we had the complete opposite of what we expected: success with the hardest problem first, and then we had to try to scale to more mainstream recommendations.

Summary


# The Evolution of Spotify: From a Search Box to the Machine Learning Behind Billions of Playlists

### Executive Summary
This video traces Spotify's evolution from what was essentially a search box for songs into a streaming service with more than 3 billion playlists. The discussion focuses on how Spotify's approach to music curation changed. It moved from relying on users' own skill and on professional editors to applying *machine learning* and *collaborative filtering*, so that intelligent agents can personalize the experience for each user.

### Key Takeaways
*   **Scale of the data:** Spotify has more than 50 million tracks and more than 3 billion playlists, a playlist-to-track ratio of about 60 to 1.
*   **Playlists as a concept:** A playlist is treated as a trace of a user's exploration, a "vector" or journey through music. It works much like a programming language for creating the *soundtrack* of one's life.
*   **User retention:** Users who playlist a lot retain much better and report a great experience.
*   **Evolution of curation:** The strategy moved from human editors (through the acquisition of Tunigo) to individual personalization using *machine learning*.
*   **Role of ML:** *Machine learning* addresses the curation problem, since many users are not skilled at assembling music. It acts as a "friend who knows music" for each user.

### Detailed Breakdown

#### 1. Spotify's Statistics and Ecosystem
Spotify has grown into a platform of massive scale, hosting more than 50 million tracks and more than 3 billion playlists. That is 60 times as many playlists as tracks. The platform also serves roughly 200 million active users. From a *machine learning* perspective, the track collection is a large *state space*, and playlists are the paths or vectors that users create to navigate that space.
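The sketch below illustrates the "playlists as paths" idea. It assumes each playlist is simply an ordered list of hypothetical track IDs. The code counts how often one track directly follows another, which is the simplest (first-order) way to read paths through a state space of tracks. The toy data and the names `transition_counts` and `most_likely_next` are illustrative assumptions, not Spotify's implementation.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each playlist is an ordered list of track IDs,
# i.e. one "path" a user took through the space of tracks.
PLAYLISTS = [
    ["t1", "t2", "t3"],
    ["t1", "t2", "t4"],
    ["t2", "t3", "t5"],
    ["t4", "t1", "t2"],
]


def transition_counts(playlists):
    """Count how often track b directly follows track a across all playlists."""
    counts = defaultdict(Counter)
    for playlist in playlists:
        # Consecutive pairs (a, b) are the individual steps along each path.
        for a, b in zip(playlist, playlist[1:]):
            counts[a][b] += 1
    return counts


def most_likely_next(counts, track):
    """Return the most frequent successor of `track`, or None if it never leads anywhere."""
    successors = counts.get(track)
    if not successors:
        return None
    return successors.most_common(1)[0][0]


counts = transition_counts(PLAYLISTS)
print(most_likely_next(counts, "t2"))  # t3
```

With this toy data, `t2` leads to `t3` in two playlists and to `t4` in only one, so `t3` is the most likely next step.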

#### 2. The Philosophy of "Playlisting" and the User Challenge
At first, Spotify was essentially a search box, and a powerful one for its time. Building a playlist is compared to programming: users write a *soundtrack* for themselves. But a key problem emerged. Many users lack the musical knowledge or the time to curate well. To address this, Spotify set out to build "agents" that act like a friend who knows music and help users find the right songs.

#### 3. Early Strategy: Human Curation and the Tunigo Acquisition
Spotify's first attempt at the curation problem relied on people. It acquired a company called Tunigo to bring in editors and professional *playlisters*. These editors used their judgment to build playlists. They then refined them manually based on statistics such as *skips* and per-song performance, aiming to maximize performance for large groups of listeners. The limitation was that there would never be enough editors to make playlists for every user personally.
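The editor workflow described above can be sketched as follows. The sketch assumes a hypothetical listening log of `(track_id, was_skipped)` pairs for one playlist. It computes each track's skip rate and lists the tracks above an arbitrary threshold, worst first, as candidates for an editor to replace. The log format, the 0.5 threshold, and the function names are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical listening log for one editorial playlist: (track_id, was_skipped).
# The format and the threshold below are assumptions, not Spotify's tooling.
PLAY_LOG = [
    ("a", False), ("a", False), ("a", True),
    ("b", True), ("b", True), ("b", False),
    ("c", False), ("c", False),
]


def skip_rates(log):
    """Fraction of each track's plays that ended in a skip."""
    plays = defaultdict(int)
    skips = defaultdict(int)
    for track, skipped in log:
        plays[track] += 1
        if skipped:
            skips[track] += 1
    return {track: skips[track] / plays[track] for track in plays}


def tracks_to_review(log, threshold=0.5):
    """Tracks whose skip rate exceeds `threshold`, worst first."""
    rates = skip_rates(log)
    flagged = [track for track, rate in rates.items() if rate > threshold]
    return sorted(flagged, key=lambda track: rates[track], reverse=True)


print(tracks_to_review(PLAY_LOG))  # ['b']
```

Track `b` is flagged because it was skipped on two of its three plays. An editor would then swap it out and watch whether the playlist's overall numbers improve on the next iteration.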

#### 4. The Shift to Machine Learning and Personalization
The limits of manual curation pushed Spotify toward the promise of *machine learning*. The focus shifted from *group personalization* (editor-built playlists for large audiences) to *individualization* (personalization for each person). Notably, the discovery that the 3 billion user playlists were valuable data is described as "*dumb luck*" rather than an a priori company strategy. Even so, this data turned out to be extremely rich material for algorithms.
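The sketch below contrasts the two approaches on toy data. The group version ranks tracks by total plays and gives everyone the same list. The individual version ranks each user's unheard tracks by how well they overlap with hypothetical genre tags from that user's own history. This tag-based scoring is a simple content-based stand-in chosen for brevity. It is not the collaborative filtering Spotify describes, and all data and names here are made up.

```python
from collections import Counter

# Hypothetical catalog: track -> descriptive tags. The tags are an assumption
# for illustration; the talk does not say which features Spotify uses.
TRACK_TAGS = {
    "t1": {"pop"},
    "t2": {"pop", "dance"},
    "t3": {"indie"},
    "t4": {"indie", "folk"},
    "t5": {"folk"},
}

# Hypothetical play history per user (one entry per play).
HISTORY = {
    "ana": ["t1", "t1", "t1", "t2", "t2"],
    "ben": ["t3", "t4"],
    "cara": ["t1", "t2", "t4"],
}


def group_playlist(history, n=3):
    """Group personalization: the same top-n most-played tracks for everyone."""
    plays = Counter(track for tracks in history.values() for track in tracks)
    return [track for track, _ in plays.most_common(n)]


def individual_playlist(user, history, tags, n=3):
    """Individualization: rank tracks the user has not heard by tag overlap
    with that user's own listening profile."""
    profile = Counter(tag for track in history[user] for tag in tags[track])
    heard = set(history[user])

    def score(track):
        return sum(profile[tag] for tag in tags[track])

    candidates = [t for t in tags if t not in heard and score(t) > 0]
    return sorted(candidates, key=score, reverse=True)[:n]


print(group_playlist(HISTORY))                          # ['t1', 't2', 't4']
print(individual_playlist("ben", HISTORY, TRACK_TAGS))  # ['t5']
```

Every user gets `['t1', 't2', 't4']` from the group list, dominated by the heaviest listeners. Ben, whose history is indie and folk, gets `['t5']`, the only unheard track that matches his own profile.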

#### 5. Technical History: Collaborative Filtering
Spotify was already working with machine learning around 2007-2008, when the field was dominated by *collaborative filtering*. One person at the company, Erik Bernhardsson, was already strong in machine learning at that time. The team realized that people group tracks into playlists that carry semantic meaning for them, and even label those groups with a playlist name. Applying collaborative filtering to these groupings let the team extract latent dimensions (an embedding) that capture relationships between tracks based on how they are used together.
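Here is a minimal sketch of item-item collaborative filtering on playlist data, assuming a few hypothetical labeled playlists. Each track is represented by the set of playlists it appears in. Two tracks count as similar when they are often grouped together, measured as the cosine similarity of their binary playlist-membership vectors. This sketch stops short of the learned latent embedding mentioned in the talk. It only computes the co-occurrence signal that such an embedding would compress. The data and function names are illustrative.

```python
import math
from collections import defaultdict

# Hypothetical playlists: name -> tracks. The names play the role of the
# human labels mentioned in the talk; this sketch only uses co-membership.
LABELED_PLAYLISTS = {
    "rainy day": ["t1", "t2", "t3"],
    "sad indie": ["t1", "t2"],
    "workout": ["t4", "t5"],
    "gym mix": ["t4", "t5", "t2"],
}


def track_to_playlists(playlists):
    """Invert the data: track -> set of playlist names it appears in."""
    index = defaultdict(set)
    for name, tracks in playlists.items():
        for track in tracks:
            index[track].add(name)
    return index


def cosine(index, a, b):
    """Cosine similarity of two tracks' binary playlist-membership vectors."""
    pa, pb = index.get(a, set()), index.get(b, set())
    if not pa or not pb:
        return 0.0
    return len(pa & pb) / math.sqrt(len(pa) * len(pb))


def similar_tracks(index, track, k=2):
    """Up to k tracks most often co-playlisted with `track`, best first."""
    scores = {t: cosine(index, track, t) for t in index if t != track}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [t for t in ranked[:k] if scores[t] > 0]


index = track_to_playlists(LABELED_PLAYLISTS)
print(similar_tracks(index, "t4"))  # ['t5', 't2']
```

Tracks `t4` and `t5` always appear together, so their similarity is 1.0. Track `t2` shares only the "gym mix" playlist with `t4`. A production system would more likely factorize the sparse track-by-playlist matrix into dense vectors (for example with matrix factorization), which is closer to the latent embedding described above.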

### Conclusion and Closing Message
The main conclusion of this part of the talk is that playlists are not just collections of songs. They are valuable data that reflect how people explore music. Spotify found that users who engage heavily with playlists are more loyal. By combining early human curation with *machine learning* techniques such as *collaborative filtering*, Spotify evolved from a search tool into a personal music assistant. The goal is for that assistant to understand and anticipate what each individual user wants.


file updated 2026-02-13 13:23:52 UTC