Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12

b7bStIQovcY • 2018-12-28

Transcript preview

Open

Kind: captions
Language: en
the following is a conversation with
Thomas sent home he's a professor same
you and co-creator of lebra's which is
the first AI system to be top human
players in the game of heads-up No Limit
Texas Hold'em he has published over 450
papers on game theory and machine
learning including a best paper in 2017
at nips now renamed to new reps which is
where I caught up with him for this
conversation his research and companies
have had wide reaching impact in the
real world especially because he and his
group not only proposed new ideas but
also build systems to prove that these
ideas work in the real world this
conversation is part of the MIT course
on artificial general intelligence and
the artificial intelligence podcast if
you enjoy subscribe on youtube itunes or
simply connect with me on Twitter at Lex
Friedman spelled Fri D and now here's my
conversation with Thomas sent home can
you describe at the high level the game
of poker Texas Hold'em heads-up Texas
Hold'em for people who might not be
familiar at this card game yeah happy to
so heads up No Limit Texas Hold'em has
really emerged in the AI community as a
main benchmark for testing these
application independent algorithms for
imperfect information game solving and
this is a game that's actually played by
humans you don't see that much on TV or
casinos because well for obvious reasons
but you do see it in some expert level
casinos and you see it in the best poker
movies of all time it's actually an
event in the World Series of Poker but
mostly it's played online and typically
for pretty big sums of money and this is
a game that usually only experts play so
if you recall to your home game on a
Friday night it probably is not gonna be
hits up no Limit Texas Hold'em it might
be no let me it takes us Hold'em in some
cases but typically for a big group and
it's not as competitive well heads up
means it's two-player so it's really
like me against you
Am I
you better much like chess or or or go
in that sense but an imperfect
information game which makes it much
harder because I have to deal with
issues of you knowing things that I
don't know and I know things that you
don't know
instead of pieces being nicely laid on
the board for both of us to see
so in Texas Hold'em there's a two cards
that you only see the game on to you
yeah there is they gradually lay out
some cards that add up overall to five
cards that everybody can see yeah the
imperfect nature of the information is
the two cards that you're holding on
front yeah so as you said you know you
first get two cards in private each and
then you this a betting round then you
get three clubs in public on the table
then there's a betting round then you
get the fourth card in public on the
table they're spitting around then you
get the five fifth card on the table
there's a bending drop so there's a
total of four betting rounds and four
torrontés of information revelation if
you will the only the first tranche is
private and they omits public from there
and this is probably probably by far the
most popular game in AI and just the
general public in terms of imperfect
information so it's probably the most
popular spectator game to watch right so
which is why it's a super exciting game
tackle so it sits on the order of chess
I would say in terms of popularity in
terms of AI setting it as the bar of
what is intelligence
so in 2017 labret does how do you
pronounce it Liberato lebra das lebra
does beats little laughing they're a
little bit Latin LeBron is beat a few
for expert human players can you
describe that event what you learned
from it what was it like what was the
process in general for people who have
not read the papers and study yeah so
the event was that we invited four of
the top 10 players with these are
specialist players in heads-up no Limit
Texas Hold'em which is very important
because this game is actually quite
different than the the multiplayer
version we brought me in to Pittsburgh
to play at the reverse casino
for twenty days we wanted to get a
hundred and twenty thousand hands in
because we wanted to get statistical
significance so it's a lot of hands for
humans to play even for this top pros
who play fairly quickly normally so we
couldn't just have one of them play so
many hands twenty days they were playing
basically morning to evening and he
raised two hundred thousand as a little
incentive for them to play and the
setting was so that they didn't all get
fifty thousand we actually paid them out
based on how they did against the AI
each so they had an incentive to play as
hard as they could whether they're way
ahead the way behind or right at the
mark of beating the AI and you don't
make any money unfortunately right no we
can't make any money so so originally a
couple of years earlier I actually
explored whether we could actually play
for money because that would be of
course interesting as well to play
against the top people for money but the
Pennsylvania Gaming Board said no so so
if we couldn't so this is much like an
exhibit like for a musician or a boxer
or something like that
nevertheless you're keeping track of the
money and brought us one close to two
million dollars I think so so if there
if it was for real money if you were
able to earn money that was a quite
impressive and inspiring achievement
just a few details what what were the
players looking at I mean were they
behind the computer what was the
interface like yes there they were
playing much like they normally do these
top players when they play this game
they play mostly online so they used to
playing through what UI yes and they did
the same thing here so there was this
layout you could imagine there's a table
on the screen this the the human sitting
there and then there's the AI sitting
there and the the screen source
everything is happening the cards coming
out and so the bets being made and we
also had the betting history for the
human so if the human for what what had
happened in the ham so far they could
actually reference back
and and and so forth is there a reason
they were given access to the betting
history for well we just uh it's a it
didn't really matter that they wouldn't
have forgotten anyway these are top
quality people but we just want to put
out there so it's not a question for
human for getting and the AI somehow
trying to get advantage of better memory
so what was that like I mean that was an
incredible accomplishment so what did it
feel like before the event did you have
doubt hope
where was your confidence at yeah that's
great so a great question so eighteen
months earlier I had organized the
similar brains versus AI competition
with our previous a I call clerical and
we couldn't beat the humans so this time
around it was only eighteen months later
and I knew that this new AI Lovato's was
way stronger but it's hard to say how
you'll do against the top humans before
you try so I thought we had about a
50/50 shot and the international betting
sites put us a us as a four to one or
five to one underdog so it's kind of
interesting that people really believe
in people and I get over AI not just
people people don't just believe over
believing themselves but they have
overconfidence in other people as well
compared to the performance of AI and
yeah so we were afford to 105 to 108
beating the humans in a row we were
still 50/50 on the international betting
sites do you think there's something
special and magical about poker and in
the way people think about it in a sense
you have I mean even in chess there's no
Hollywood movies poker is this the star
of many movies and there's this feeling
that certain human facial expressions
and body language eye movement all these
tells are critical to poker you can look
into somebody's soul understand their
betting strategy and so on there so
that's probably why the possibly do you
think that is why people have a
confidence that humans will outperform
because AI systems cannot in
construct perceive these kinds of tells
they're only looking at betting patterns
and and nothing else the betting
patterns and and statistics so what's
more important to you if you step back
and human players human versus human
what's the role these tells of these
ideas that we romanticize yeah so I
split it into two parts so one is why do
humans trust he much more than AI and
all have overconfidence in humans yes I
think that's that's not really related
to tell a question it's just that
they've seen these top players how good
they are and they're really fantastic so
it's just hard to believe therefore that
the Navy I could beat them yeah so I
think that's where that comes from and
and that's actually maybe a more general
lesson about the AI that until you've
seen it over perform a human it's hard
to believe it it could but then the
tails a lot of these top players they're
so good at hiding tails that among the
top players it's actually not really
worth it for them to invest a lot of
effort trying to find tails in each
other because there's a so good at
hiding them so yes at the kind of Friday
evening game tells are gonna be a huge
thing you can read other people and if
you're a good reader you you'll read
them like an open book but at the top
levels of poker no details become a list
of the much much smaller and smaller
aspect of the game as you go to the top
levels the the amount of strategies the
amount of possible actions is is very
large ten to the power of one hundred
plus so there has to be some I've read a
few the papers related it has it has to
form some abstractions of various hands
and actions so what kind of abstractions
are effective for the game of poker yeah
so you're exactly right so when you go
from a game tree that's ten to the 161
especially in an imperfect information
game it's way too large to solve
directly even with our fastest ik
finding algorithms so you wanna abstract
it first and abstraction in games is
much trickier than abstraction in mdps
or other single agent settings because
you have these abstraction pathologies
that if I have a finer grained
abstraction the strategy that I can get
from that for the real game might
actually be worse than the strategy I
can get from the coarse-grained
abstraction if you have to be very
careful now the the kinds of
abstractions just to zoom out we're
talking about there's the hands
abstractions and then there's betting
strategies yeah what I think actions
yeah baiting access or so there's
information obstruction to talk about
general games information abstraction
which is the abstraction of what chance
does and this would be the cards in the
case of poker and then there's action
abstraction which is abstracting the
actions of the actual players which
would be bits in the case of poker
yourself and the other players yes
yourself and other players and for
information abstraction we were
completely automated so these were these
are algorithms but they do what we call
potential aware abstraction where we
don't just look at the value of the hand
but also how it might materialize in the
good or bad hands over time and it's a
certain kind of bottom-up process with
integer programming there and clustering
and various aspects how do you build
build this abstraction and then in the
action abstraction there it's largely
based on how humans other and other AIS
have played this game in the past but in
the beginning we actually use an
automated action abstraction technology
which is provably convergent that it
finds the optimal combination of eight
sizes but it's not very scalable so we
couldn't use it for the whole game but
we used it for the first couple of
betting actions so what's more important
the strength of the hand so the
information retraction or the how you
play them the actions does
you know the romanticized notion again
is that it doesn't matter what hands you
have that the actions the betting may be
the way you win no matter what hands you
have yeah so that's why you have to play
a lot of hands so that the role of luck
gets smaller so you could otherwise get
lucky and get some good hands and then
you're gonna win the match even with
thousands of hands you can get lucky
because there's so much variance in No
Limit Texas Hold'em because if we both
go all-in it's a huge stack or variant
so there are these massive swings in No
Limit Texas Hold'em so that's why you
have to play not just thousands but over
a hundred thousand hands don't get
statistical significance let me ask
another way this question if you didn't
even look at your hands but they didn't
know that the your opponents didn't know
that how well would you be able to do oh
that's a good question there's actually
I heard this story that this is
Norwegian female poker player goal and
at uber stud who's actually won a
tournament by doing exactly that but
that would be extremely rare so so I
cannot really play well the hands do
have some role to play oh yes so LeBron
is does not use as far as I understand a
used learning methods deep learning is
there room for learning in you know
there's no reason why lab artist doesn't
you know combined with an alphago type
approach for estimating the quality for
function estimator what are your
thoughts on this maybe as compared to
another algorithm which I'm not that
familiar with deep stack the the engine
that does use deep learning that it's
unclear how well it does but
nevertheless uses deep learning so what
are your thoughts about learning methods
to aid in the way that teller Broadus
plays the game of poker yeah so as you
said Lee barratto's did not use learning
methods and played very well without
them since then we have actually
actually here we have a couple of papers
on things that do use learning technique
Saxon so and deep learning in particular
and the sort of the way you're talking
about where it's learning an evaluation
function but in imperfect information
games unlike let's say in Co or now now
also in chess and shogi it's not some
sufficient to learn an evaluation for a
state because the value of an
information set depends not only on the
exact state but it also depends on both
players beliefs like if I have a bad
hand I'm much better off if the opponent
thinks I'm have a good hand and vice
versa if I have a good hand I'm much
better off if the opponent believes I
have a bad hand so the value of a state
is not just a function of the cards it
depends on if you will the path of play
but only to the extent that is captured
in the belief distributions so so that's
why it's not as simple as as it is
imperfect information games another one
I'd say it's simple there either it's of
course very complicated computationally
there too but at least conceptually it's
very straightforward there's a state
there's an evaluation function you can
try to learn it here you have to do
something more and what we do is in one
of these papers we're looking at
allowing where we allow with the
opponent to actually take different
strategies at the leaf of the search
tree as F if you will and and that is a
different way of doing it and it doesn't
assume therefore a particular way that
the opponent plays but it allows
opponent to choose from a set of
different continuation strategies and
that forces us to not be too optimistic
in our local head search and that's
that's one way you can do sound look
ahead search in imperfect information
games which is very different difficult
and in us you were asking about deep
stack what they did it was very
different than what we do either in Lee
brothers or in this new work
they were gender and Umrah generating
various situations in the game then they
were doing Luca head from there to the
end of the game as if that was a start
of a different game and then they were
using deep learning to learn those
values of those states but the states
were not just the physical states they
include the belief distributions when
you talk about look ahead for deep stack
or with libertas does it mean
considering every possibility that the
game can involve is that we're talking
about extremely sort of like this
exponentially growth of a tree yes so
we're talking about exactly that
much like you do in Alpha Beta search or
want to crawl to research but with
different techniques so there's a
different search algorithm and then we
have to deal with the leaves differently
so if you think about what Lee brothers
did we didn't have to worry about this
because we only did it at the end of the
game so we would always terminate into a
real situation and we would know what to
payout this it didn't do this
depth limited loka heads but now in this
new paper which is called depth limited
I think it's called depth limited
research for imperfect information games
we can actually do sound depth limited
look at it so we can actually started
with a look ahead
from the beginning of the game on
because that's too complicated to do for
this whole long game
so in Lee brothers we were just doing it
for the end so and then the other side
this belief distribution so is it
explicitly modeled what kind of beliefs
that the opponent might have yeah yeah
it is explicitly modeled but it's not
assumed the beliefs are actually output
not input of course the starting beliefs
are input but they just fall from the
rules of the game because we know that
the dealer deals uniformly from the dick
so I know that every pair of cards that
you might have is equally likely I know
that for a fact
that's as follows from the rules of the
game of course except the two cards that
I have I know you don't have those yes
you have to take that into account
that's called card removal and that's
very important is the dealing always
coming from a single deck in the heads
up so you can assume single deck know
that if some if if I have the ace of
spades I know you don't have an ace of
spades
okay so in the beginning your belief is
basically the fact that it's a fair
dealing of hands but how do you adjust
start to adjust that belief well that's
a where this beauty of games here it
comes so nash equilibrium which john
nash introduced in 1950 introduces what
rational play is when you have more than
one player and these are pairs of
strategies where strategies are
contingency plans one for each player so
neither player wants to
deviate to a different strategy given
that the other doesn't deviate but as a
side effect you get the beliefs from
Bayes rule so Nash equilibrium really
isn't just deriving in these imperfect
information games Nash equilibrium
doesn't just define strategies it also
defines beliefs for both us and it
defines beliefs for each state so at the
each state it's if they take all
information sets at each information set
in the game there's a set of different
states that we might be in but I don't
know which one we're in Nash equilibrium
tells me exactly what is a probability
distribution over those real world
states in my mind how does naturally
give you that distribution so why I'll
do a simple example so you know the game
rock-paper-scissors so we can draw it as
player 1 moves first and then player 2
moves but of course it's important that
player 2 doesn't know what player 1
moved otherwise player 2 would win every
time so we can draw that as an
information set where player 1 makes one
of three moves first and then there's an
information set for player 2 so player 2
doesn't know which of those nodes the
world is it but once we know the
strategy for player 1 Nash equilibrium
will say that you play 1/3 Rock 1/3
paper 1/3 caesars from that I can derive
my beliefs of the information set that
they wanted 1/3 wants it though so Bayes
gives you that basis you but is that
specific to a particular player or is it
is there something you quickly update
with the game theory isn't really player
specific so that's what also why we
don't need any data we don't need any
history how these particular humans
played in the past or how any AI or even
had played before it's all about
rationality so we just think the AI just
thinks about what would a rational
opponent do and what would I do if I
were right I am rational and that that's
that's the idea of game theory so it's
really a data free opponent free
approach sir comes from the design of
the game as opposed to the design of the
player exactly if there's no opponent
modeling per se I mean we've done some
work on combining opponent modeling with
game theory so you couldn't exploit weak
players even more but that's another
strand and in the Lee brothers we didn't
turn that on because I decided that
these players are too good and when you
start to exploit an opponent you'll
typically open yourself up self up to
exploitation and these guys have so few
holes to exploit and they're world's
leading experts in counter exploitation
so I decided that we're not gonna turn
that stuff on actually I saw a few
papers exploiting opponents it sound
very interesting to explore do you think
there's room for exploitation generally
outside of LeBron us is is there subject
or people differences that could be
exploited maybe not just in poker but in
general interactions negotiations all
these other domains that yours
considering yeah I definitely we've done
some work on that and I really like
their work at hybridize is the two so
you figure out what would a rational
opponent do and by the way that's safe
in these zero-sum games two player
zero-sum games because if the opponent
does something irrational yes it might
show throw off my beliefs but the amount
that the player can gain by throwing off
my belief is always less than they lose
by playing poorly so so it's safe but
still if somebody's weak as a player you
might want to play differently to
exploit them more so that you can think
about it this way a game theoretic
strategies are unbeatable but it doesn't
maximally beat the other opponent so the
winnings per hand might be better with a
different strategy and the hybrid is
that you start from a game theoretic
approach and then as you gain data from
about the opponent in certain parts of
the game tree that in those parts of the
game tree you start to tweak your
strategy more and more towards
exploitation while still staying fairly
close to the game theoretic strategy so
as to not open yourself up to
exploitation too much
how do you do that do you try to vary up
strategies make it unpredictable it's
like what is it
tit-for-tat strategies in prisoner's
dilemma or well it doesn't that that's a
repeated game kind of prisoner's dilemma
repeats it games but but even there
there's no proof that says that that's
the best thing but experimentally it
actually does does does well so what
kind of games are there first of all I
don't know if this is something that you
could just summarize there's perfect
information games or all the
informations on the table there is
imperfect information games there's
repeated games you play over and over
there's zero-sum games there's
nonzero-sum games yeah and then there's
a really important distinction you're
making two-player versus more players so
what are what other games out there and
what's the difference for example with
this two-player game versus more players
yeah what are the key differences right
here so let me start from the the basic
so a repeated game is a game where the
same exact game is played over and over
in these extensive form games where
think about three form maybe with these
information says to represent incomplete
information you can have kind of
repetitive interactions even repeated
games are a special case of that by the
way but if the game doesn't have to be
exactly the same selectively sourcing
all trips yes we kind of see it the same
supply base year to year but what I'm
buying is a little different every time
and the supply base is a little
different every time and so on so it's
not really repeated so to find a purely
repeated game is actually very rare in
the world so they're really a very
coarse model of what's going on then if
you move up from repeat just repeated
simple repeated matrix games not all the
way to extensive form games but in
between
they're stochastic games where you know
this these think about it like these
little matrix games
and when you take an action and your
home takes an action they determine not
which next state I'm going to next game
I'm going to but the distribution over
next games where I might be going to so
so that's the stochastic game but it's
like matrix games repeated stochastic
games extensive form games that is from
less to more general and and poker is an
example of the last one so it's really
the most general setting extensive form
games and that's kind of what the AI
community has been working on and being
benched marked on with this heads-up No
Limit Texas Hold'em
can you describe extensive form games
what was the motto here yeah so if you
imagine with the tree form so it's
really the tree form like in chess
there's a search tree versus a matrix is
a matrix yeah and that's the new matrix
is called the matrix form or by matrix
form or normal form game and here you
have the tree form so you can actually
do certain types of reasoning there that
you'll lose the information when you go
to normal form there's a certain form of
equivalence like if you go from three
form and you say it every possible
contingency plan is the strategy then I
can actually go back to the normal form
but I lose some information from the
lack of sequentiality then the
multiplayer versus two-player
distinction is an important one
so two-player games in zero-sum are
conceptually easier and computationally
easier there's still huge like this one
this one but they're conceptually easier
and computationally easier in that
conceptually you don't have to worry
about which equilibrium is the other guy
going to play when there are multiple
because any equilibrium strategy is a
best response to any other equilibrium
strategy so I can play a different
equilibrium from you and we'll still get
the right values of the game that falls
apart even with two players when you
have general some games even without
cooperation just even without
cooperation so there's a big gap from
two player zero-sum to two-player
general sum or even to three player
zero-sum that's that's a big gap
at least in theory can you maybe not
mathematically provide the intuition why
it all falls apart with three or more
players it seems like you should still
be able to have a Nash equilibrium that
yeah that's instructive that holds okay
so it is true that all finite games have
a Nash equilibrium so this is what your
Nash actually proved so they do have a
Nash equilibrium that's not a problem
the problem is that there can be many
and then there's a question of which
equilibrium to select so and if you
select your strategy from a different
equilibrium and I select mind then did
what does that mean I and in this non
zero sum games we may lose some joint
benefits we hope by being just simply
stupid we could actually both be better
off if we did something else yes and in
three player you get other problems also
like collusion that maybe you and I can
get up on a third player and we can do
radically better by colluding so that
there are lots of issues that come up
there
so no Brown student you workers on this
has mentioned I looked through the AMA
and read it he mentioned that the
ability of poker players to collaborate
will make the game he was asked the
question of how would you make the game
of poker or both of you were asked the
question how would you make the game of
poker
beyond being solvable by current AI
methods and he said that there's not
many ways of making poker more difficult
but collaboration or cooperation between
players would make it extremely
difficult so can you provide the
intuition behind why that is if you
agree with that idea yeah so we've done
a lot of work coalitional games and we
actually have a paper here with my other
student cappella Farina and some other
collaborators on after net nips on that
actually just came back from the poster
session where we present life so when
you have a collusion it's a it's a
different problem yes and it typically
gets even harder then
even the game representations some of
the game representations don't really
allow go to computation so we actually
introduced a new game representation for
for that is that kind of cooperation
part of the model is are you do you have
do you have information about the fact
that other players are cooperating or is
it just this chaos that where nothing is
known so there's some something's
unknown can you give an example of a
collusion type game or Z you select
breach that so think about bridge it's
like when you and I are on a team
our payoffs are the same the problem is
that we can't talk so so when I get my
cards I can't whisper to you what my
cards are that would not be allowed so
we have to somehow coordinate our
strategies ahead of time and only ahead
of time and then there are certain
signals we can talk about but they have
to be such that the other team also
understands them so so that that's
that's an example where the coordination
is already built into the rules of the
game but in many other situations like
auctions or negotiations or diplomatic
relationships poker it's not really
built-in but it still can be very
helpful for the coders I've read you
right somewhere the negotiations you
come to the table with prior like a
strategy that like that you're willing
to do and not willing to do those kinds
of things so how do you start to now
moving away from poker movie beyond
poker into other applications like
negotiations how do you start applying
this to other and to other domains
yeah even real world domains that you've
worked on yeah I actually have two
start-up companies doing exactly that
one is called strategic machine and
that's for kind of build applications
gaming sports all sorts of things like
that any applications of this to
business and to sports and to gaming to
various types of things for in finance
electricity markets and so on and the
other is called strategy robot where we
are taking this to military secure
the cyber security and intelligence
applications I think you worked a little
bit in how he put it advertisement sort
of suggesting ad kind of thing yeah
auction that's another component
optimized markets optimized but that's
much more about a combinatorial market
and optimization based technology that's
not using these game theoretic reasoning
technologies I think okay so what sort
of high level do you think about our
ability to use game theoretic concepts
to model human behavior do you think do
you think human behavior is amenable to
this kind of modeling so outside of the
poker games and where have you seen it
done successfully in your work I'm not
sure the goal really is modeling humans
like for example if I'm playing a
zero-sum game yes I don't really care
that the opponent is actually following
my model of rational behavior because if
they're not that's even better for me
right so so they see with the opponents
and games there's a the prerequisite is
that you've formalized the interaction
in some way that can be amenable to
analysis and you've done this amazing
work with mechanism design designing
games that have certain outcomes but so
I'll tell you an example for my for my
world of autonomous vehicles right we're
studying pedestrians and pedestrians and
cars negotiating this nonverbal
communication there's this weird and
game dance of tension where pedestrians
are basically saying I trusted you won't
kill me and so as a jaywalker I will
step onto the road even though I'm
breaking the law and there's this
tension and the question is we really
don't know how to model that well in
trying to model intent and so people
sometimes bring up ideas of game theory
and so on do you think that aspect of
human behavior can use these kinds of
imperfect information approaches
modeling
how do we how do you start to attack a
problem like that when you don't even
know how the game design the game to
describe the situation in order to solve
it okay so I haven't really thought
about jaywalking but one thing that I
think could be a good application in an
autonomous vehicles is the following so
let's say that you have fleets of
autonomous cars operated by different
companies so maybe here's the way more
fleet and here's the uber fleet if you
think about the rules of the road they
define certain little rules but that
still leaves a huge strategy space open
like as a simple example when cars merge
you know how he must merge you know they
slow down and look at each other and try
to I try to merge wouldn't it be better
if these situations would all repeat
pre-negotiated so we can actually merge
at full speed and we know that this is
the situation this is how we do it and
it's all gonna be faster but there are
way too many situations to negotiate
manually so you could do use automated
negotiation this is the idea at least
you could use automated negotiation to
negotiate all of these situations or
many of them in advance and of course it
might be that hey maybe you're not gonna
always let me go first
maybe you said okay well in these
situations all let you go first but in
exchange you're gonna give me - how much
you're gonna let me go first in this
situation yes
so it's this huge combinatorial
negotiation and do you think there's
room in that example of merging to model
this whole situation is an imperfect
information game or do you really want
to consider it to be a perfect no that's
a good question yeah that's a good
question I'm paid the price of assuming
that you don't know everything yeah I
don't know it's certainly much easier
games with perfect information are much
easier so if you can get away with it
you should but if the real situation is
of imperfect information then you're
going to have to deal with in for
imperfect information great so what
lessons have you learned the annual
computer poker competition an incredible
accomplishment of AI you know you look
at the history of deep blue
go these kind of moments when I stepped
up in an engineering effort and a
scientific effort combined to beat the
best human players so what do you take
away from this whole experience what
have you learned about designing it has
systems that play these kinds of games
and what does that mean for sort of AI
in general for the future of IAI
development yeah so that's a good
question so there's so much to say about
it I do like this type of performance
oriented research although in my group
we go all the way from like idea to
theory to experiments to big system
fielding the commercialization so we
spend that spectrum but I think that in
a lot of situations in AI you really
have to build the big systems and
evaluate them at a scale before you know
what works and doesn't and we've seen
that in the computational game theory
community that there are a lot of
techniques that look good in the small
but then they cease to look good in the
large and we've also seen that there are
a lot of techniques that look superior
in theory and I really mean in terms of
convergence rates better like
first-order methods better convergence
rates like the CFR based based
algorithms yet the CFR pay based
algorithms are the fastest in practice
so it really tells me that you have to
test this in reality the theory isn't
tight enough if you will to tell you
which algorithms are better than the
others and you have to look at these
things that in the large because any
sort of projections you do from the
small and at least in this domain be
very misleading so that that's kind of
from from a kind of science and
engineering perspective from personal
perspective it's been just a wild
experience in that with the first poker
competition the first or first brains
versus AI man-machine poker competition
that we organized there had been by the
way for other poker games there had been
previous competitions but this was for
heads up No Limit this was the first and
I probably became the most hated person
in the world of Poker and I didn't mean
to III size that they cracked in the
game for yeah it was a lot of people
felt that it was a real
threat to the whole game the whole
existence of the game if AI becomes
better than humans people would be
scared to play poker because there are
the superhuman AI is running around
taking their money and you know all of
that so so I just it's just really
aggressive just in the comments were
super aggressive I got everything
it's just short of death threats do you
think the same was true for chess
because right now they just completed
the World Championships and chess and
humans just started ignoring the fact
that there's AI systems now that I'll
perform humans and they still enjoy the
game is still a beautiful game that's
what I think yeah and I think the same
thing happens in poker and so I didn't
think of myself as somebody was gonna
kill the game and I don't think I did
yeah I've really learned to love this
game I wasn't a poker player before but
learn so many new ones is about it from
these AIS and they've really changed how
the game is played by the way so they
have these very Martian ways of playing
poker and the top humans are now
incorporating those types of strategies
into their own play so if anything to me
our work has made poker a richer more
interesting game for humans to play not
something that is gonna steer him as
away from it entirely just a quick
comment and something you said which is
if I may say so in academia is a little
bit rare sometimes it's pretty brave to
put your ideas to the test in the way
you described
saying that sometimes good ideas don't
work when you actually try to apply them
at scale and so where does that come
from I mean what if you could do a
advice for people what what drives you
in that sense were you always this way I
mean it takes a brave person I guess is
what I'm saying to test their ideas and
to see if this thing actually works
against human top human players and so
on yeah I don't know about brave but it
takes a lot of work it takes a lot of
work and a lot of time to organize do
make something big and to organize an
event and stuff like that and what
drives you in that effort because you
could still I would argue get a best
paper award and nips as you did in 17
without doing this that's right yes
and so so in general I believe it's very
important to do things in in the real
world and at scale and that's really
where the the the pudding if you will
proof is in the pudding that's what
that's where it is in this particular
case it was kind of a competition
between different groups and for many
years as to who can be the first one to
beat the top humans that heads up No
Limit Texas Hold'em so it became it
became kind of a like a competition who
can get there yeah so a little friendly
competition could be I can do wonders
for progress yes so the topic of
mechanism design which is really
interesting also kind of new to me
except as an observer if I don't know
politics and any I'm an observer of
mechanisms but you write in your paper
an automated mechanism design that I
quickly read so mechanism design is
designing the rules of the game so you
get a certain desirable outcome and you
have this work on doing so in an
automatic fashion as opposed to
fine-tuning it so what have you learned
from those efforts if you look say I
don't know at complex it's like our
political system can we design our
political system to have in an automated
fashion to have outcomes that we want
can we design something like traffic
lights to be smart where it gets
outcomes that we want so what are the
lessons you draw from that work yeah so
I still very much believe in the
automated mechanism design direction yes
but it's not a panacea
there are impossibility results in
mechanism design saying that there is no
mechanism that accomplishes objective X
in Class C so so they it's not gonna
there's no way using any mechanism
design tools manual or automated
to do certain things in mechanism design
he can't describe that again so meaning
there it's impossible to achieve that
yeah yes it was likely impossible so so
so these are these are not statements
about human ingenuity who might come up
with something smart these are proofs
that if you wanna accomplish properties
X in Class C that is not to oppose with
any mechanism the good thing about
automated mechanism design is that we're
not really designing for a class we're
designing for specific settings at the
time so even if there's an impossibility
result for the whole class it just
doesn't mean that all of the cases in
the class are impossible it just means
that some of the cases are impossible so
we can actually carve these islands of
possibility within these known
impossible classes and we've actually
done that so what one of the famous
results in mechanism design is a Meyer
sham set its weight theorem for pi Roger
Myerson and Mark Satterthwaite from 1983
so it's an impossibility of efficient
trade under imperfect information we
show that you can in many settings avoid
that and get the efficient trade anyway
depending on how they design the game
okay so depending how you design the
game and of course it's not it doesn't
in any way any way contradict to
impossibility result or impossibility
results is still there but it just finds
spots within this impossible class where
in those spots you don't have time
possibility sorry if I'm going a bit
philosophical but what lessons you draw
towards like I mentioned politics or
human interaction and designing
mechanisms for outside of just these
kinds of trading or auctioning or purely
formal games our human interaction like
a political system what how do you think
it's applicable to yeah politics or to
business to negotiations these kinds of
things
designing rules that have certain
outcomes yeah yeah I do think so have
you seen success that successfully done
yes and really oh you mean mechanism
design or automated make automated
mechanism design but so so mechanism
design itself has had fairly limited
success so far there are certain cases
but most of the real-world situations
are actually not sound from a mechanism
design perspective even in those cases
where they've been designed by very
knowledgeable mechanism design people
the people are typically just taking
some insights from the theory and
applying those insights into the real
world rather than applying the
mechanisms directly so one famous
example of is the FCC spectrum auctions
so I've also had a small role in that
and very good economists have been where
excellent economists have been working
on that with no game theory yet the
rules that are designed in practice
they're they're such that bidding
truthfully is not the best strategy
usually mechanism design we try to make
things easy for the participants so
telling the truth is the best strategy
but but even in those very high stakes
auctions where you have tens of billions
of dollars worth of expect from being
auctioned truth-telling is not the best
strategy and by the way nobody knows
even a single optimal bidding strategy
for those auctions what's the challenge
of coming up with an optimum because
there's a lot of players and there's a
lot of players but many items for sale
and the these mechanisms are such that
even with just two items or one item
bidding truthfully wouldn't be the best
strategy if you look at the history of
AI it's marked by seminal events and
alphago being a world champion human go
player I would put librettist winning
the heads of no-limit hold'em as one of
such event thank you and what do you
think
is the next such event whether it's in
your life or in the broadly AI community
that you think might be out there that
would surprise the world so that's a
great question and I don't really know
the answer in terms of game solving hits
up No Limit Texas Hold'em really was the
one remaining widely agreed-upon
benchmark so that was the big milestone
now are there other things yes certainly
there are but there there is not one
that the community has kind of focused
on so what could be other things there
are groups working on StarCraft there
are groups working on dota2 these are
video games yes or you could have like
diplomacy or Hanavi you know things like
that these are like recreational games
but none of them are really acknowledged
that's kind of the main next challenge
problem like chess or go or heads-up No
Limit Texas Hold'em was so I don't
really know in the game solving space
what is or what will will be the next
benchmark
I hope kind of hope that there will be a
next benchmark because really the
different groups working on the same
problem really drove these application
independent techniques for put very
quickly over ten years do you think
there's an open problem that excites you
that you start moving away from games
into real world games like say the stock
market trading yeah that's that's kind
of how I am so I am probably not going
to work as hard on these recreational
benchmarks I'm doing to startups on game
solving technology strategic machine and
strategy robot and we're really
interested in pushing this stuff into
practice what do you think would be
really you know a powerful result that
would be surprising that would be if you
can say I mean you know five years ten
years from now something that
statistically you would say is not very
likely but if there's a breakthrough
what achieve yeah so I think that
overall we're in a very different
situation in game theory than we are in
let's say machine learning yes
so in machine learning it's a fairly
mature technology and it's very broadly
applied and proven success in the real
world in game solving there are almost
no applications yet
we have just become superhuman which
machine learning you could argue
happened in the 90s if not earlier and
at least some supervised learning at
certain complex supervised learning
applications
now I think a next challenge problem I
know you're not asking about this way
you're you're asking about the
technology breakthrough but I think the
big big breakthrough is to be able to
show it hey maybe most of let's say
military planning or most of business
strategy will actually be done
strategically using computational game
theory that that's what I would like to
see as a next five or ten year goal
maybe you can explain to me again
forgive me if this is an obvious
question but you know machine learning
methods neural networks are suffer from
not being transparent not being
explainable a game theoretic methods you
know Nash equilibria do they generally
when you see the different solutions are
they when you talk about military
operations are they once you see the
strategies do they make sense that they
explainable or do they suffer from the
same problems as neural networks do so
that's that's a good question I would
say a little bit yes and no and what I
mean by that is that these games are
ethic strategies let's say Nash
equilibrium it has provable properties
so it's unlike let's say deep learning
where you kind of cross your fingers
hopefully it'll work and then after the
fact when you have the weights you're
still crossing your fingers and I hope
it will work here you know that the
solution quality is there this provable
or Souls from quality guarantees now
that doesn't necessarily mean that the
strategies are human understandable
that's a whole other problem so that's
also I think it deep learning and
computational game theory are in the
same boat in that sense that both are
difficult to understand but at least the
game theoretic techniques they have this
guarantees of guarantee quality so did
you see business operations to achieve
your corporations or even military in
the future being at least the strong
candidates being proposed by automated
systems do you see that yeah I do I do
but that's more of a really belief than
a substantiated fact depending on where
you land and optimism or pessimism
that's a relief to me that's an exciting
future especially if they're provable
things
in terms of optimality so looking into
the future there's a a few folks worried
about the especially you look at the
game of poker which is probably one of
the last benchmarks in terms of games
being solved they they worry about the
future and the existential threats of
artificial intelligence so the negative
impact in whatever form on society is
that something that concerns you as much
are you more optimistic about the
positive impacts of AI oh I am much more
optimistic about the positive impacts so
just in my own work what we've done so
far we run the nationwide kidney
exchange hundreds of people are walking
around alive today who would it be and
it's increased employment you had you
have a lot of people now running kidney
changes and at the transplant centers
interacting with the kidney exchange you
have extra surgeons nurses
anesthesiologists hospitals all of that
as so so employment is increasing from
that and the world is becoming a better
place
another example is combinatorial
sourcing auctions we did 800 large-scale
combinatorial sourcing auctions from
2001 to 2010 in a previous startup of
mine called combine it and we increased
the supply chain efficiency on that
sixty billion dollars of spend by twelve
point six percent so that's over six
billion dollars of efficiency
improvement in the world and this is
also like shifting value from somebody
to somebody else just efficiency
improvement like in trucking less empty
driving so there's less waste less
carbon footprint and so on it's a huge
positive impact in the near term but
sort of to stay in it for a little
longer because I think game theory is a
role to play here well let me actually
come back and tell you this is one thing
I think Asia is also going to make the
world much safer so so so that's another
aspect that often gets overlooked well

Resume

Berikut adalah rangkuman komprehensif dan terstruktur dari transkrip video yang Anda berikan.

***

# Revolusi AI & Teori Permainan: Dari Kemenangan di Poker hingga Aplikasi Penyelamat Nyawa

### Inti Sari (Executive Summary)
Video ini membahas wawancara mendalam dengan Profesor Tuomas Sandholm mengenai penciptaan **Libratus**, kecerdasan buatan (AI) pertama yang berhasil mengalahkan pemain profesional elit dalam permainan *Heads-up No Limit Texas Hold'em*. Diskusi tidak hanya terbatas pada teknis di balik kemenangan AI tersebut, tetapi juga mengupas tuntas konsep teori permainan (*Game Theory*), *Nash Equilibrium*, dan bagaimana teknologi pemecahan permainan (*game solving*) ini diterapkan pada skenario dunia nyata yang kompleks, mulai dari lelang strategis hingga pertukaran ginjal untuk menyelamatkan nyawa.

### Poin-Poin Kunci (Key Takeaways)
*   **Kemenangan Bersejarah:** Libratus menjadi AI pertama yang mengalahkan pemain profesional terbaik dalam *Heads-up No Limit Texas Hold'em*, sebuah benchmark penting untuk permainan informasi yang tidak sempurna (*imperfect information*).
*   **Pendekatan Tanpa Data:** Berbeda dengan machine learning pada umumnya, pendekatan teori permainan yang digunakan bersifat "bebas data" (*data-free*), mengandalkan rasionalitas dan perhitungan strategi daripada data historis lawan.
*   **Pentingnya Abstraksi:** Untuk mengatasi kompleksitas permainan yang memiliki $10^{161}$ keputusan, AI menggunakan teknik abstraksi informasi dan aksi yang canggih.
*   **Aplikasi Nyata:** Teknologi ini telah melahirkan startup seperti *Strategic Machine* (bisnis/olahraga) dan *Strategy Robot* (militer/keamanan), serta digunakan untuk efisiensi rantai pasok dan pertukaran organ ginjal.
*   **Etika & Keamanan:** Narasumber menyuarakan optimisme mengenai dampak positif AI, dengan menekankan bahwa teori permainan memberikan jaminan kualitas solusi yang lebih transparan dibandingkan *deep learning*.

---

### Rincian Materi (Detailed Breakdown)

#### 1. Pengenalan Libratus dan Tantangan Poker
Video dibuka dengan perkenalan terhadap Profesor Tuomas Sandholm dari Carnegie Mellon University, pencipta Libratus. Pada tahun 2017, Libratus mencatat sejarah dengan mengalahkan empat pemain poker profesional terbaik dunia dalam pertandingan maraton 120.000 tangan.
*   **Kompleksitas Permainan:** *Heads-up No Limit Texas Hold'em* dianggap sebagai benchmark utama untuk AI karena tingkat kompleksitasnya yang jauh melampaui catur. Permainan ini melibatkan informasi yang tersembunyi (kartu lawan) dan strategi yang sangat luas.
*   **Faktor "Tells":** Di level permainan kasual, gerakan tubuh (*tells*) sangat penting. Namun, di level profesional, *tells* menjadi sangat minim karena pemain sangat pandai menyembunyikannya, sehingga fokus beralih pada strategi matematis dan psikologis murni.

#### 2. Teknologi di Balik Kecerdasan Buatan: Abstraksi dan Deep Learning
Menghadapi pohon permainan yang besarnya mencapai $10^{161}$, Libratus tidak mungkin menghitung setiap langkah secara langsung.
*   **Teknik Abstraksi:** AI menggunakan dua jenis abstraksi:
    1.  **Abstraksi Informasi:** Mengelompokkan kartu-kartu yang mirip (misalnya kartu yang memiliki peluang menang serupa) untuk disederhanakan.
    2.  **Abstraksi Aksi:** Menyederhanakan pilihan taruhan (misalnya hanya mempertimbangkan taruhan dalam kelipatan tertentu).
*   **Perbandingan dengan Deep Learning:** Meskipun *Deep Learning* (seperti pada AlphaGo) populer, Libratus menggunakan pendekatan yang berbeda. Abstraksi dalam permainan poker jauh lebih sulit ("abstraction pathologies") daripada dalam pengambilan keputusan tunggal (MDPs), karena abstraksi yang lebih halus tidak selalu menjamin strategi yang lebih baik.
*   **Peran Keberuntungan:** Meskipun poker melibatkan keberuntungan, keterampilan (*skill*) mendominasi dalam jangka panjang. Dengan memainkan 100.000 tangan, elemen varians (keberuntungan) dapat diminimalkan sehingga keunggulan skill AI terlihat jelas.

#### 3. Fondasi Teori Permainan dan Nash Equilibrium
Sandholm menjelaskan landasan matematis dari AI ini, yaitu *Nash Equilibrium* yang diperkenalkan John Nash pada tahun 1950.
*   **Keseimbangan Nash:** Situasi di mana setiap pemain memilih strategi terbaik mereka dengan mempertimbangkan strategi lawan, sehingga tidak ada insentif untuk mengubah strategi (tidak ada yang bisa mengeksploitasi yang lain).
*   **Pendekatan Rasional:** AI tidak mempelajari gaya bermain lawan secara spesifik, melainkan berpikir, "Apa yang akan dilakukan lawan rasional?" dan "Bagaimana saya meresponsnya?". Ini membuat strategi AI sangat kuat dan tidak bisa dieksploitasi.
*   **Strategi Hibrida:** Dalam situasi non-turnamen (seperti negosiasi bisnis), AI dapat menggunakan pendekatan hibrida: mulai dengan strategi teori permainan yang aman, lalu sedikit menyimpang untuk mengeksploitasi kelemahan lawan jika data yang cukup telah terkumpul.

#### 4. Tantangan Multi-Pemain dan Kolusi
Diskusi beralih ke tingkat kesulitan yang lebih tinggi: permainan dengan lebih dari dua pemain.
*   **Masalah Kolusi:** Dalam permainan tiga pemain atau lebih, pemain dapat bekerja sama (kolusi) melawan pemain lain. Ini membuat pemecahan permainan menjadi jauh lebih sulit.
*   **Representasi Baru:** Penelitian yang dipresentasikan di NeurIPS memperkenalkan representasi permainan baru untuk menangani koalisi ini, yang relevan untuk skenario seperti Bridge, lelang, dan diplomasi.

#### 5. Dampak pada Dunia Poker dan Masa Depan Benchmark
Kemenangan Libratus awalnya membuat Sandholm menjadi "orang yang paling dibenci" di dunia poker karena takut AI akan membunuh permainan.
*   **Evolusi Poker:** Mirip dengan catur, di mana manusia tetap memainkannya meskipun AI sudah lebih unggul, poker justru menjadi lebih kaya. Pemain profesional sekarang belajar dari strategi "alien" yang dibawa AI, membuat permainan menjadi lebih menarik.
*   **Desain Mekanisme Otomatis:** Teknologi ini juga digunakan untuk merancang aturan (*mechanism design*) guna mencapai hasil yang diinginkan, meskipun ada batasan-batasan matematis (teorema ketidakmungkinan) yang harus diakali.
*   **Benchmark Berikutnya:** Setelah poker, tantangan besar berikutnya untuk AI adalah game strategi real-time seperti *StarCraft*, *Dota 2*, atau *Diplomacy*, namun belum ada kesepakatan tunggal mengenai benchmark utama selanjutnya.

#### 6. Aplikasi Nyata: Bisnis, Medis, dan Keamanan
Teknologi pemecahan permainan (*game solving*) telah melampaui tahap akademis dan diterapkan di dunia nyata melalui dua startup: *Strategic Machine* dan *Strategy Robot*.
*   **Pertukaran Ginjal:** Salah satu aplikasi paling mulia adalah dalam *kidney exchange*. Algoritma ini telah membantu menyelamatkan ratusan nyawa dengan mencocokkan donor pasien secara efisien, menciptakan lapangan kerja bagi tenaga medis, dan mengurangi biaya.
*   **Efisiensi Rantai Pasok:** Teknologi lelang kombinatorial telah digunakan dalam 800 lelang skala besar, meningkatkan efisiensi pengeluaran sebesar $60 miliar dan menghemat lebih dari $6 miliar, sekaligus mengurangi jejak karbon.
*   **Kejelasan vs Deep Learning:** Berbeda dengan *neural networks* yang seringkali menjadi "kotak hitak", solusi berbasis teori permainan memiliki sifat yang dapat dijelaskan dan memiliki jaminan kualitas (*provable properties*).

#### 7. Masa Depan AI, Strategi Nuklir, dan Tantangan Industri
Bagian penutup membahas visi masa depan dan tantangan penerapan AI.
*   **Optimisme Keamanan AI:** Sandholm percaya AI akan membuat dunia lebih aman. Khawatir tentang "misalignment" nilai (di mana AI mengejar tujuan yang salah) dianggap terlalu teoretis dibandingkan dengan manfaat nyata yang sudah terlihat.
*   **Strategi Nuk

Read

file updated 2026-02-13 13:24:50 UTC