Edward Gibson: Human Language, Psycholinguistics, Syntax, Grammar & LLMs | Lex Fridman Podcast #426
F3Jd9GI6XqE • 2024-04-17
naively I certainly thought that all
humans would have words for exact
counting uh and the Pirahã don't okay so
they don't have any words for even one
there's not a word for one in their
language and so there's certainly not a
word for two three or four so that kind
of blows people's minds often yeah that's
blowing my mind that's pretty weird how
are you how are you going to ask I want
two of those you just don't and so
that's just not a thing you can possibly
ask in Pirahã it's not possible that is
there is no words for that
the following is a conversation with
Edward Gibson or Ted as everybody calls
him he is a psycholinguistics professor
at MIT he heads the MIT language lab
that investigates why human languages
look the way they do the relationship
between culture and language and how
people represent process and learn
language also he has a book
titled syntax a cognitive approach
published by MIT press coming out this
fall so look out for that this is the Lex
Fridman podcast to support it please check
out our sponsors in the description and
now dear friends here's Edward
Gibson when did you first become
fascinated with human language as a kid
in school when we had to structure
sentences in English grammar I I I found
that process interesting I found it
confusing as to what it was I was told
to do I didn't didn't didn't understand
what the theory was behind it but I
found it very interesting so when you
look at grammar you're almost thinking
about like a puzzle like almost like a
mathematical puzzle yeah I think that's
right I didn't know I was going to work
on this at all at that point I was
really just I was kind of a math geek
person computer scientist I really liked
computer science and then I found
language as a a neat puzzle to work on
from an engineering perspective actually
that's what I as a I I sort of
accidentally well I decided after I
finished my undergraduate degree which
was computer science and math in Canada
at Queen's University I decided to go to
grad school it's like that's what I
always thought I would do and I went to
Cambridge where they had a master's in a
master's program in computational
linguistics and I hadn't taken a single
language class before all I had taken
was CS computer science math classes
pretty much mostly as an undergrad and I
just oh this was an interesting thing to
do for a year
because it was a single year program and
um then I ended up spending my whole
life doing it so fundamentally your
journey through life was one of a
mathematician and a computer scientist
and then you kind of discovered the
puzzle the problem of language and
approached it from that angle uh to try
to understand it from that angle almost
like a mathematician or maybe even an
engineer as an engineer I'd say I mean
to be frank I had taken an AI class I
guess it was 83 or 84 5 somewhere 84 in
there a long time ago and there was a
natural language section in there and it
didn't impress me I thought there must
be more interesting things we can do
didn't it didn't seem very it seemed
just a bunch of uh hacks to me it didn't
seem like a real theory of things in any
way and so I just thought this was this
seemed like an interesting area where
there wasn't enough good work did you
ever come across like the the philosophy
angle of logic so if you think about the
80s with AI the expert systems where you
try to kind
of uh maybe sidestep the poetry of
language and some of the syntax and the
grammar and all that kind of stuff and
go to the underlying meaning that
language is trying to communicate and
try to somehow compress that in a
computer representable way did you ever
come across that in your studies I mean
I probably did but I wasn't as
interested in it I was I was trying to
do the easier problems first the ones I
could thought maybe were handleable
which is seems like the syntax is easier
like which is just the forms as opposed
to the meaning like you're talking when
you're starting talking about the
meaning that's very hard problem and
it's still is a really really hard
problem but the forms is is easier and
so I thought at least figuring out the
forms of human language which sounds
really hard but is actually maybe more
tractable so it's interesting you
think there is a big divide there's a
gap there's a distance between form and
meaning because that's a question you
have discussed a lot with LLMs mhm
because they're damn good at form yeah I
think that's what they're good at is
form exactly and that's that's why
they're good because they can do form
meaning's hard do you think there's oh
wow and I mean it's an open question
right yeah how close form and meaning
are we'll discuss it but I to me
studying form maybe it's a romantic
notion gives you form is like the
shadow of the the bigger meaning thing
underlying language CU I it form is is
language is how we communicate ideas we
communicate with each other using
language so in understanding the
structure of that communication I think
you start to understand the structure of
thought and the structure of meaning
behind those thoughts and communication
to me but to you big gap yeah what do
you find most beautiful about human
language maybe the form of human
language the expression of human
language what I find beautiful about
human language is the uh some of the
generalizations that um happen across
the human languages within and across a
language so let me give you an example
of something which I find kind of
remarkable that is if like a language if
it has um a word order such that the
verbs tend to come before their
objects and so that's like English does
that so we have the the first the
subject comes first in a in a simple
sentence so I say uh you know the the
dog chased the cat or or Mary kicked the
ball so the subject's first the and then
after the subject there's the verb and
then we have objects all these things
come after in English so it's a it's
generally a verb and most of the stuff
that we want to say comes after the
subject it's comes it's the it's the
objects there's a lot of things we want
to say that come after and and and
there's a lot of languages like that
about 40% of the languages of the world
are look like that they're um sub
subject verb object languages and then
um these languages tend to have um
prepositions these little markers on the
nouns that that connect nouns to other
nouns or nouns to verbs so I when I so
verb like sorry preposition like in or
on or of or about I say I talk about
something the something is the object of
that preposition that we have these
little markers come also just like verbs
they come before their their nouns okay
and then so now we look at other
languages that like Japanese or or Hindi
or some these are these are so-called
verb final languages
those are about maybe a little more than
40% maybe 45% of the world's languages
or more I mean 50% of the world's
languages are verb final those tend to
have um postpositions those markers the
same they have the same
kinds of markers as we do in English but
they put them after so uh uh sorry they
put them uh first the markers come first
so you say instead of um you know talk
about a book you say book about the
opposite order there in Japanese or in
Hindi you do the opposite and and the
talk comes at the end so the verb will
come at the end as well so instead of um
Mary kicked the ball it's Mary uh Ball
kicked and then uh says Mary kicked the
ball to John it's John to the to
little the marker there uh the
preposition it's a postposition in these
languages and so the interesting thing
fascinating thing to me is that within a
language this order
aligns it's
harmonic and so if it's one or the other
it's either verb initial or verb final
but then you then you'll have
prepositions prepositions or
postpositions and so that and that's
across the languages that we we can look
at we've got around a thousand languages
for for there's around 7,000 languages
around on on the Earth right now uh but
we have information about say word order
on around a thousand of those pretty
decent amount of information and for
those thousand which we know about um
about 95% fit that pattern so they will
have either verb so about it's about
half and half half are verb initial like
English and half are verb final like um
like Japanese so just to clarify verb
initial is subject verb object that's
correct verb final is still subject
object verb that's correct yeah the
subject is generally first that's so
fascinating I ate an apple or I apple ate
yes okay and this fascinating that
there's a pretty even division in the
world amongst those 40 45% yeah it's
pretty it's pretty even and and those
two are the most common by far those two
word orders the subject tends to be first
there's so many interesting things but
these things are the thing I find so
fascinating is there are these
generalizations within and across a
language and and not only those are the
and there's actually a simple
explanation I think for a lot of that
and that is um you're trying to like
minimize dependencies between words
that's basically the story I think
behind a lot of why word order looks the
way it is is you we're always connecting
what is it what is the thing I'm telling
you I'm I'm talking to you in sentences
you're talking to me in sentences these
are sequences of Words which are
connected and the connections are
dependencies between the words and and
it turns out that what we what we're
trying to do in a language is actually
minimize those dependency links it's
easier for me to say things if the words
that are connecting for their meaning
are close together it's easier for you
in understanding if that's also true if
they're far away it's it's hard as to
produce produce that and it's hard for
you to understand and the languages of
the world within a language and across
languages you know fit that
generalization which is you know so I
you know it turns out that having verbs
initial and then having prepositions
ends up making dependencies shorter and
and having verbs final and having
postpositions ends up making dependency
shorter then if you cross them if if you
cross them it ends up you just end up
it's possible you can do it I mean
within a language within a language you
can do it it just ends up with longer
dependencies than if you didn't so
languages tend to go that way they tend
to minim they say they call it harmonic
so it was observed a long time ago by uh
without the explanation by a guy called
Joseph Greenberg who's a um famous
typologist from Stanford he observes a
lot of generalizations about how word
order works and these are some of the
harmonic generalizations that he
observed
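The dependency-minimization argument above can be sketched with a toy calculation. This is only an illustration, not Gibson's actual analysis: the three-word phrase, the `HEADS` mapping, and the function name `total_dependency_length` are all hypothetical, and the sketch assumes the convention described in the conversation, where the marker (preposition or postposition) links the noun to the verb, so the noun depends on the marker and the marker depends on the verb.

```python
def total_dependency_length(words, heads):
    """Sum the linear distance between every word and the word it depends on.

    words: the words in spoken order (assumed unique in this toy example).
    heads: maps each word to its head; the root maps to None.
    """
    position = {word: index for index, word in enumerate(words)}
    return sum(
        abs(position[word] - position[head])
        for word, head in heads.items()
        if head is not None
    )


# Assumed structure for "talk about (a) book": book -> about -> talk.
HEADS = {"talk": None, "about": "talk", "book": "about"}

# The four ways of combining verb position with marker position.
ORDERS = {
    "verb-initial + preposition (harmonic)": ["talk", "about", "book"],
    "verb-final + postposition (harmonic)": ["book", "about", "talk"],
    "verb-initial + postposition (crossed)": ["talk", "book", "about"],
    "verb-final + preposition (crossed)": ["about", "book", "talk"],
}

lengths = {
    name: total_dependency_length(order, HEADS)
    for name, order in ORDERS.items()
}

for name, length in lengths.items():
    print(f"{name}: total dependency length = {length}")
```

Under these assumptions the two harmonic orders keep every connected pair adjacent, while the two crossed orders stretch the verb-to-marker link over the noun. A different choice of which word counts as the head would change the numbers, so the sketch shows the shape of the argument rather than a result.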
harmonic generalizations about word word
order there's so many things I want to
ask you okay let me uh just some
basics you you mentioned dependencies a
few times yeah what do you mean by
dependencies well what I mean is in um
in language there's kind of three
structures to three components to the
structure of language one is the sounds
so cat is k a and t in English I'm not
talking about that part I'm talking then
there's two meaning parts and those are
the words and and you were talking about
meaning earlier so words have a form and
they have a meaning associated with them
and so cat is a full form in English and
it has a meaning associated with
whatever a cat is and then the
combinations of words uh that's what
I'll call grammar or syntax and uh
that's like when I have a combination
like the cat or two cats okay so uh
where I take two different words there
and put them together and I get a
compositional meaning from putting those
two different words together and and so
that's the syntax and
in any sentence or utterance whatever
I'm talking to you you're talking to me
we have a bunch of words and we're
putting together in a sequence they it
turns out they
are connected so that every word is
connected to just one other word in that
in that sentence and so you end up with
what's what's called technically a tree
it's a tree structure so there where
there's a root of that of that utterance
of that sentence and then there's a
bunch of dependents like branches from
that root that go down to the words the
words are the leaves in this metaphor
for a tree so a tree is also sort of a
mathematical construct a graph
theoretical thing graph Theory thing uh
so in the it's fascinating that you can
break down a sentence into a tree and
then one every word is hanging on to
another this depending on right and and
everyone agrees on that so all linguists
will agree with that no one not
controversial that is not controversial
there's nobody sitting here mad at you I
don't think so okay there's no linguist
sitting there mad at this I think every
language I think everyone agrees that
all sentences are trees at some level
can I pause on that cuz it it's to me
just as a Layman it uh it's surprising
yeah that you can break down sentences
in many most all languages all languages
I into a tree I think so that's weird I
I've never heard of anyone disagreeing
with that that's weird the details of
the trees are what people disagree about
well okay so what's uh what's at the
root of a how do you construct how
hard is it what is the process of
constructing a tree from a sentence uh
well this is where you know depending on
what you're there's different
theoretical Notions I'm going to say the
simplest thing dependency grammar it's
like a bunch of people invented this
Tesnière was the first French guy back in
I mean the paper was published in 1959
but he was working on the 30s and stuff
so and and it goes back to uh you know
philologist Pāṇini was doing this in
ancient uh India okay and so you know
doing something like this the simplest
thing we can think of is that there's
just connections between the words to
make the the utterance and so just say I
have like two dogs entered a room okay
here's a sentence and so uh we're
connecting two and dogs together that's
like there's some dependency between
those words to make some bigger meaning
and then we're connecting dogs now to uh
entered right and we connect a room
somehow to entered and so I'm going to
connect uh a to room and then room back to
entered that's the tree is that the
root is entered that's the the thing is
like an entering event that's what we're
saying here and the the subject which is
whatever that dog is is two dogs it was
and and the connection goes back to dogs
which goes back to then that that goes
back to two I'm just that that's my tree
it it starts at entered goes to dogs
down to two and on the other side after
the verb the object it goes to room and
then that goes back to the the
determiner or article whatever you want
to call that word uh so there's a bunch
of categories of words here we're
noticing so there are verbs those are
these things that typically mark uh they
refer to events and states in the world
and they're nouns which typically refer
to people places and things is what
people say but they can refer to other
more they can refer to events themselves
as well they're they're they're marked
by you know how they how they get you
what the category the part of speech of
a word is how it gets used in language
it's like that's how you decide what the
what the category of a word is not not
by the meaning but how it's how it gets
used how it's used what's usually the
root is it going to be the verb that
defines the event usually usually yes
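The tree described for "two dogs entered a room" can be written down as a small data structure. This is a minimal sketch, not a parser: the links are entered by hand exactly as spoken above (two to dogs, dogs to entered, a to room, room to entered), and the names `HEAD_OF`, `find_root`, `path_to_root`, and `render` are hypothetical. It also assumes every word in the sentence is unique, since words are used as dictionary keys.

```python
SENTENCE = ["two", "dogs", "entered", "a", "room"]

# Each word points at the one word it depends on; the root points at None.
HEAD_OF = {
    "two": "dogs",
    "dogs": "entered",
    "entered": None,
    "a": "room",
    "room": "entered",
}


def find_root(head_of):
    """Return the single word that depends on nothing."""
    roots = [word for word, head in head_of.items() if head is None]
    if len(roots) != 1:
        raise ValueError(f"expected exactly one root, found {roots}")
    return roots[0]


def path_to_root(head_of, word):
    """Follow head links upward from a word until the root is reached."""
    path = [word]
    while head_of[path[-1]] is not None:
        path.append(head_of[path[-1]])
    return path


def render(head_of, order, node=None, depth=0):
    """Return the tree as indented lines, dependents listed in spoken order."""
    if node is None:
        node = find_root(head_of)
    lines = ["  " * depth + node]
    for word in order:
        if head_of[word] == node:
            lines.extend(render(head_of, order, word, depth + 1))
    return lines


print("root:", find_root(HEAD_OF))
print("path from 'two':", " -> ".join(path_to_root(HEAD_OF, "two")))
print("\n".join(render(HEAD_OF, SENTENCE)))
```

Because each word has exactly one head and there is exactly one root, following the links from any word always ends at the verb, which is what makes the structure a tree.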
yes okay yeah I mean if I don't say a
verb then there won't be a verb and so
it'll be something else what if you're
messing are we talking about language
that's like correct language what if
you're doing poetry and messing with
stuff is it then then rules go out the
window right then it's no you're still
no no no you're constrained by whatever
language you're dealing with probably
you have other constraints in poetry
such that you're like usually in poetry
there's multiple constraints that you
want to like you want to usually convey
multiple meanings is the idea and maybe
you have like a rhythm or a rhyming
structure as well and depending on so
but you usually are constrained by your
the rules of your language for the most
part and so you don't violate those too
much you can violate them somewhat but
not too much so it has to be
recognizable as your language like in
English I can't say dogs two entered room
ah I mean I meant the you know two dogs
entered a room and I I I can't mess with
the order of the the articles the
articles and the nouns you just can't do
that in some languages you can you can
mess around with the order of words much
more I mean you speak Russian Russian
has a much freer word order than English
and so in fact you can move around words
in you know I told you that English has
the subject verb object word order so
does Russian but Russian is much freer
than English and so you can actually
mess around with the word order so
probably Russian
poetry is going to be quite different
from English poetry because the word
order is much less constrained yeah
there's a much more extensive uh culture
of poetry throughout the history of the
last 100 years in Russia and I I always
wondered why that is but it seems that
there's more
flexibility in the way the language is
used there's more you're morphing the
language easier by altering the words
altering the order of the words messing
with it well you can just mess with
different things in each language and so
Russian you have case markers right on
the end which is there these endings on
the nouns which tell you how it connects
each noun connects to the verb right we
don't have that in English and so when I
say um Mary kissed John I don't know who
the agent or the patient is except by
the order of the words right in in
Russian you actually have a marker on
the end if you're using a Russian name
and each of those names you'll also say
is it you know agent it'll be the uh you
know nominative which is marking the
subject or an accusative will mark the
object and you could put them in the
reverse order you could put accusative
first as you could put subject you could
put um the patient first and then the
verb and then the the the subject and
that would be a perfectly good Russian
sentence and it would still mean Mary I
could say John kissed Mary meaning Mary
kissed John with as long as I use the
case markers in the right way you can't
do that in English and so uh I love the
terminology of agent and patient and uh
and the other ones you used those are
sort of linguistic terms correct those
are those are for like kind of meaning
those are meaning and and subject and
object are generally used for position
so subject is just like the thing that
comes before the the verb and the object
is one that comes after the verb the
agent is kind of like the thing doing it
that's kind of what that means right the
subject is often the person doing the
action right the thing so yeah okay this
is fascinating so how hard is it to form
a tree in general is there um is there a
procedure to it like if you look at
different languages is it supposed to be
a very natural like is it automatable or is
there some human genius involved in I
think it's pretty automatable at this
point people can figure out the words
are they can figure out the morphemes
which are the technically morphemes are
the the minimal meaning units within a
language okay and so when you say eats
or drinks it actually has two morphemes
and in English there's there's the
there's the root which is the verb and
then there's some ending on it which
tells you you know that's this third
person uh third person singular say what
morphemes are morphemes are just the minimal
meaning units within a language and a
word is just kind of the things we put
spaces between in English and they have a
little bit more they have the morphology
as well they have the endings this
inflectional morphology on the endings on
the roots they modify something about
the word that adds additional meaning
they tell you yeah yeah yeah and so we
have a little bit of that in English
very little much more in Russian for
instance and and uh but we have a little
bit in English and so we have a little
on the on the nouns you can say it's
either singular or plural and and you
can say uh same thing for um for for
verbs like simple past tense for example
like you know notice in English we say
drink drinks uh you know he drinks but
everyone else is I drink you drink we
drink it's unmarked in a way and then
but in the past tense it's just drank
there for everyone there's no morphology
at all for past tense it's there is
morphology it's marking past tense but
it's kind of it's an irregular now so we
don't even you know drink to drank you
know it's not even a regular word so in
most verbs many verbs there's an ed we
kind of add so walk to walked we add
that to say it's the past tense that I
just happen to choose an irregular
because it's a high frequency word and
the high frequency words tend to have
irregulars in English for what's an
irregular irregular it's just there's
there isn't a rule so drink to drank is
an is an irregular drink drank okay as opposed
to walk walked talk talked and there's a
lot of irregulars in English there's a
lot of irregulars in English the the the
frequent ones the common words tend to
be irregular the there's many many
more um low frequency words and those
tend to be the regular ones the
evolution of the irregulars is
fascinating it's essentially slang
that's sticky mhm cuz you're breaking
the rules and then everybody use it and
doesn't follow the rules yeah and they
they say screw it to the rules it's
fascinating so you said morphemes lots of
questions so morphology is what the
study of morphemes morphology is the is
the connections between the morphemes
onto the roots the roots so in English
we mostly have suffixes we have endings
on the words not very much but a little
bit and uh as opposed to prefixes some
words depending on your language can
have you know mostly prefixes mostly
suffixes or mostly or or both and then
even languages several languages have
things called infixes where you have
some kind of a uh
General uh form for the for the root and
you put stuff in the middle you change
the vowels that's fascinating that is
fascinating so wait so in general
there's what two morphemes per word
usually one or two or three well in
English it's it's it's one or two in
English it tends to be one or two there
can be more you know in in other
languages you know a lang language like
uh like Finnish which has a very uh
elaborate morphology there may be 10
morphemes on the end of a root okay and
so there may be Mill there be millions
of forms of a given word okay okay I I
will ask the same question over and over
but
uh how does a just sometimes to
understand things like morphemes it's
nice to just ask the question how does
these kinds of things evolve so you uh
have a great book studying sort of
the how how the cognitive processing how
language used for communication so the
the mathematical notion of how effective
language is for communication what role
that plays in the evolution of language
but just high level like how do we how
does a language evolve with where
English is two morphemes or one or two
morphemes per word and then Finnish has
infinity per word so what how does that
how does that happen is it just
that's a really good question yeah
that's a very good question is like why
do languages have more morphology versus
less morphology and and I don't think we
know the answer to this I don't I think
there's just like a lot of good
solutions to the problem of
communication so I like I believe as you
hinted that language is an invented
system by humans for communicating their
ideas and I think we it comes down to we
label things we want to talk about those
are the the the morphemes and words
those are the things we want to talk
about in the world and we invent those
things and then uh we put them together
in ways that are um easy for us to
convey to process but that that that's
like a naive view and I don't I mean I I
think it's probably right right it's
naive and probably right well I don't
know if it's naive I think it's simple
simple yeah I think naive is naive is an
indication that it's an incorrect
somehow it's a trivial to too simple I
think it could very well be correct but
it's interesting how sticky it feels
like two people got
together it just it just feels like once
you figure out certain aspects of a
language that just becomes sticky and
the tribe forms around that language
maybe the language maybe the tribe forms
first and then the language evolves and
then you just kind of agree and that you
stick to whatever that is I mean these
are very interesting questions we don't
know really about how words even words
get invented very much about you know we
don't really I mean assuming they get
invented they we don't really know how
that process works and how these things
evolve what we have is kind of a a
current picture a current picture of few
thousand languages a few thousand
instances we don't have any pictures of
really how these things are evolving
really and and then the evolution is
massively con you know uh confused by
contact right so as soon as one language
group one group runs into
another we are smart humans are smart and
they take on whatever is useful in the
other group and so any kind of contrast
which you're talking about which I find
useful I'm going to I'm going to start
using as well so I I worked a little bit
in um in in specific areas of words in
in number words and in in color words
and in color words that so we have in
English we have around 11 words that
everyone knows for
colors and uh and many more if you
happen to uh be interested in color for
some reason or other if you're a fashion
designer or an artist or something you
may have many many more words but we can
see Millions like if you have normal
color vision normal trichromatic color
vision you can see millions of
distinctions in colors so we don't have
millions of words you know the most
efficient no the most you know detailed
color vocabulary would have over a
million terms to distinguish all the
different colors that we can see but of
course we don't have that so it's
somehow it's been it's kind of useful
for English to have evolved in some way
to there's 11 terms that people find
useful to talk about you know black
white red uh blue green yellow purple uh
gray pink and I probably missed
something there anyway uh there there's
11 that everyone knows yeah and um and
depending on your and but you go to
different cultures um especially the
non-industrialized cultures and there'll
be many fewer so some cultures will have
only two believe it or not that the Dani
in Papua New Guinea have only two
labels that the that the group uses for
color those are roughly black and white
they are okay very very dark and very
very light which are roughly black and
white and you might think oh they're
dividing the whole color space into you
know light and dark or something and
that's not really true they mostly just
only label the light the black and the
white things they just don't talk about
the colors for the other ones and so and
and then there's other groups I've
worked with a group called the Tsimane'
down in um in Bolivia in South America
and they have three words that everyone
knows but there's a few others that are
that that several people like that many
people know and so they have me kind of
depending at how you count between three
and seven words that the group knows
okay and uh and again they're they're
black and white everyone knows those and
red red is you like that tends to be the
third word that everyone that that
cultures bring in if there's a word it's
always red the third one and then after
that it's kind of all bets are off about
what they bring in and so after that
they they bring in a sort of a big blue
green space grue they have one for that
and then they have uh and then you know
different people have different words
that they'll use for other parts of the
space and so anyway it's probably
related to what they want to talk what
they not what they not what they see
because they see the same colors as we
see so it's not like they have they
don't they have a a weak a low color
palette and the things they're looking
at they're looking at a lot of beautiful
scenery okay a lot of different colored
uh flowers and berries and things and
you know and so there's lots of things
of very bright colors but they just
don't label the color in those cases and
the reason probably we we don't know
this but we think probably what's going
on here is that what you do why you
label something is you need to talk to
someone else about it and and why do I
need to talk about a color well if I
have two things which are identical and
I want you to give me the one that's
different and and the only way it varies
is color
then I invent a word which tells you uh
you know this is the one I want so I
want the red sweater off the rack not
the not the green sweater right there's
two and and so those those things will
be identical ex because these are things
we made and they're dyed and there
there's nothing different about them and
so in in industrialized Society we have
you know everything everything we've got
is pretty much arbitrarily colored uh
but you go to non-industrialized group
that's not true and so they don't really
they're not interested in color you you
bring bright colored things to them they
like them just like we like them bright
colors are great they're beautiful they
are but they just don't need to don't
need to talk about them they don't have
so probably color words is a good
example of how language evolves from
sort of function when you need to
communicate the use of something I think
so then then you kind of invent
different variations and uh and
basically you can imagine that the
evolution of a language has to do with
what the early tribe is doing like what
what they want what what kind of
problems they're facing them and they're
quickly figuring out how to efficiently
communicate uh the solution to those
problems whether it's aesthetic or
functional all that kind of stuff
running away from a mammoth or whatever
um but you know it's so so I think what
you're pointing to is that we don't have
data on the evolution of language
because many languages have formed a
long time ago so you don't get the
chatter we have a little bit of like Old
English to Modern English because there
was a writing system and we can see how
how old English looked so the word order
changed for instance in Old English to
Middle English to Modern English and so
it you know we can see things like that
but most languages don't even have a
writing system so of the 7,000 only you
know a small subset of those have a
writing system and even if they have a
writing system they it's not a very
modern writing system and so they don't
have it so we just basically have for
Mandarin for Chinese we have a lot of a
lot of evidence from from for a long
time and for English and not for much
else not for in German a little bit but
not for a whole lot of like long-term um
language Evolution we don't have a lot
we just have snapshots is what we've got
of current languages yeah I you get an
inkling of that from the rapid
communication on certain platforms like
on Reddit there's different communities
and they'll come up with different slang
usually from my perspective driven by a
little bit of humor um or maybe mockery
or whatever it's you know just talking
and different kinds of ways and uh
you could see the
evolution of language there
because um I think a lot of things on
the internet you don't want to be the
boring mainstream so you like want to
deviate from the proper way of talking
MH and so you get a lot of deviation
like rapid deviation then when
communities Collide you get like uh just
like you said humans adapt to it and you
can see it through the lens of humor I mean
it's very difficult to study but you can
imagine like 100 years from now well if
there's a new language born for example
we'll get really high resolution data on
I mean English is changing English
changes all the time all languages
change all the time so you know there's
the famous um result about the queen's
English so if you look at the
Queen's vowels the queen's English is
supposed to be you know originally the
proper way to talk was sort of
defined by however the queen talked or
the king whoever was in charge and uh
and so if you look at how her
vowels changed uh from when she first
became queen in 1952 or 53 when she was
coronated I mean that's Queen
Elizabeth who died recently of
course uh until you know 50 years later
her vowels changed her vowels shifted a
lot and so that you know even in the
sounds of British English in her the way
she was talking was changing the vowels
were changing slightly so that's just in
the sounds there's change I don't know
what's you know we're we're I'm
interested we're all interested in
what's driving any of these changes the
the word order of English changed a lot
over a thousand years right so it used to
look like German you know it looks it
used to be a verb final language with
case marking and it shifted to a verb
medial language a lot of contact so a
lot of contact with French and it became
a verb medial language with no case
marking and so it became this you know
verb verb initially thing so and so
that's evolving we it totally evolved
and so it may very well I mean you know
it doesn't evolve maybe very much in 20
years is maybe what you're talking about
but over 50 and 100 years things change
a lot I think we'll now have good data
on it which is great that's for sure um
can you talk to what is syntax and what
is grammar so you wrote a book on syntax
I did you were asking me before about
what you know how do I figure out what a
dependency structure is I'd say the
dependency structures aren't that hard
to generally I think there's a lot of
agreement of what they of what they are
for almost any sentence in in most
languages I think people will agree on a
lot of
that there are other parameters in the
mix such that some people think there's
a more complicated grammar than just a
dependency structure and so you know
like Noam Chomsky he's the most famous
linguist ever uh and he is famous for
proposing a slightly more
complicated syntax and so he invented
phrase structure grammar so he's um well
known for many many things but in the
50s in early 60s like but late 50s he
was basically figuring out what's called
formal language Theory so and he uh
figured out sort of a framework for
figuring out how complicated you
know a certain type of language might be
so-called phrase structured grammars of
language might be and so he his his idea
was that maybe we can we can think about
the complexity of a language by how
complicated the rules are okay and the
rules will look like this they will have
a left hand side and will have a right
right hand side something on the left
hand side will expand to the thing on
the right hand side so we'll say we'll
start with an S which is like the
root which is a sentence okay and
then we're going to expand to things uh
like a noun phrase and a verb phrase is
what he would say for instance okay an S
goes to an NP and a VP is a kind of a
phrase structure Rule and then and we
figure out what an NP is an NP is a
determiner and a noun for instance and a
verb phrase is something else is a
verb and another noun phrase another
NP for instance those are the rules of
a very simple phrase structure okay and
and so he he proposed phrase structure
grammar as a way to sort of cover human
languages and then he actually figured
out that well depending on the
formalization of those grammars you
might get more complicated or less
complicated languages so you could he
could he said well you these are these
are things called you know um context
free languages that rule that he thought
you know human languages tend to be what
he calls context free languages um and
but there are simpler languages which
are so-called regular languages and they
have a more a more constrained form to
the rules of the of the phrase structure
of of these particular rules so he he
basically discovered and kind of
invented ways to describe the language
and those are phrase
structure grammars of human language and he
was mostly interested in English
initially in his his work in the 50s so
quick question around all this so
formal language theory is the big field
of just studying language formally yes
and it doesn't have to be human language
there we have computer languages any
kind of system which is generating a uh
a um
some set of um expressions in a language
and those could be like the the um you
know the statements in a in a computer
language for example so formal it could
be that or it could be human language so
technically you can study programming
languages and they have been heavily
studied using this formalism there
there's a big field of programming
languages within the formal language
okay and then phrase structure grammar
is this idea that you can break down
language into this S NP VP
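
The S, NP and VP rules described above can be sketched as a tiny phrase structure grammar. This is only an illustrative sketch: the three rules follow the conversation, but the toy lexicon and the names `RULES`, `LEXICON` and `expand` are assumptions made for the example, not anything from the speakers.

```python
from itertools import product

# Rules from the conversation: the left-hand side expands to the right-hand side.
RULES = {
    "S": [["NP", "VP"]],   # a sentence is a noun phrase plus a verb phrase
    "NP": [["Det", "N"]],  # a noun phrase is a determiner plus a noun
    "VP": [["V", "NP"]],   # a verb phrase is a verb plus another noun phrase
}

# Hypothetical toy lexicon, chosen only for illustration.
LEXICON = {
    "Det": ["the", "two"],
    "N": ["dogs", "room"],
    "V": ["entered"],
}


def expand(symbol):
    """Return every word sequence that the given symbol can expand to."""
    if symbol in LEXICON:
        return [[word] for word in LEXICON[symbol]]
    results = []
    for right_hand_side in RULES[symbol]:
        # Expand each child symbol, then combine one choice per child in order.
        child_options = [expand(child) for child in right_hand_side]
        for combo in product(*child_options):
            results.append([word for part in combo for word in part])
    return results


sentences = [" ".join(words) for words in expand("S")]
print(len(sentences))                               # 16
print("two dogs entered the room" in sentences)     # True
```

Because the rules never mention meaning, the grammar also generates odd strings such as "the room entered two dogs"; it only describes which word orders are well formed.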
it's a particular formalism for
describing language okay so and Chomsky
was the first one he's the one who
figured that stuff out back in the 50s
and that's
equivalent actually the context
free grammar is actually is kind of
equivalent in the sense that it
generates the same sentences as a
dependency grammar would you know as the
dependency grammar is a little simpler
in some way you just have a root and it
goes like we don't have any of these the
the rules are implicit I guess in and we
just have connections between words the
phrase structure grammar is a kind of a
different way to think about the the
dependency grammar it's slightly more
complicated but it's kind of the same in
some ways so to clarify dependency
grammar is the framework under which you
see language and you make the case that
this is a good way to describe language
that's correct and uh Noam Chomsky is
watching this is very upset right now so
let's uh I'm just kidding but uh what's
the difference between uh where's the
the place of disagreement um between
phrase structure grammar and dependency
grammar they're they're very close so
phrase structure grammar and dependency
grammar aren't that aren't that far
apart I I I like dependency grammar
because it's more perspicuous it's more
transparent about representing the
connections between the words it's just
a little harder to see in phrase
structure grammar you know the the place
where Chomsky sort of devolved or went
off from from from this is he also
thought there was um something called movement
okay and so so and that's where we
disagree okay that's the place where I
would say we disagree and and and I mean
we maybe we'll get into that later but
the idea is if you want to do you want
me to explain that now I would love for
you to explain movement okay so
you're saying so many interesting things
yeah yeah yeah okay so here's the
movement is Chomsky basically sees
English and he says okay I said um you
know we had that sentence earlier like
it was like two dogs enter the room it's
changed a little bit say two dogs will
enter the room and he notices that hey
English if I want to make a question a
yes no question from that same sentence
I I say instead of two dogs will enter
the room I say will two dogs enter the
room okay there's a different way to to
say the same idea and it's like well the
auxiliary verb that will thing it's at
the front as opposed to in the middle
okay and so and he looked you know if
you look at English you see that that's
true for all those modal verbs and for
other kinds of auxiliary verbs in
English you always do that you always
put an auxiliary verb at the front and
and what he when he saw that so you know
if I say um I can win this bet can I win
this bet right so I move a can to the
front so actually that's a theory I just
gave you a theory there I he he talks
about it as movement that word and he
thinks the declarative is the root is
the sort of default way to think about
the sentence and you move the auxiliary
verb to the front that's a movement
Theory okay and he he just thought that
was just so obvious that it must be true
that that that there's nothing more to
say about that that this is how
auxiliary verbs work in English there's
a movement rule such that you're move
like to get from the declarative to the
interrogative you're moving the
auxiliary to the front and it's a little
more complicated as soon as you go to
simple simple present and simple past
because you know if I say you know John
slept you have to say did John sleep not
slept John right and so it's you have to
somehow get an auxiliary verb and I
guess underlyingly it's like slept is
it's a little more complicated than that
but his that's his idea there's a
movement okay and and and so a different
way to think about that that isn't I
mean the then then he ended up showing
later so he proposed this theory of
grammar which has movement there's other
places where he thought there's movement
not just auxiliary verbs but things like
the passive in English and things like
um questions wh questions a bunch of
places where he thought there's also
movement going on and and in each each
one of those these things there's words
well phrases and words are moving around
from one structure to another what you
call Deep structure to surface structure
I mean there's like two different
structures in his in his theory okay um
there's a different way to think about
this um which is there's no movement at
all there's a lexical copying rule such
that the word will or the word can these
these auxiliary verbs they just have two
forms and and and one of them is the
declarative and one of them is
interrogative and you basically have the
declarative one and oh I form the
interrogative or I can form one from the
other it doesn't matter which direction
you go and and I just have a new entry
which has the same meaning which has a
slightly different argument structure
argument structure just a fancy word for
The Ordering of the words and so if I
say you it was um the the dogs two dogs
can or will enter the room the the
there's two forms of will one is Will
declarative and and then okay I've got
my subject to the left it comes before
me and the verb comes after me in that
one and then the will interrogative it's
like oh I go first interrogative will is
first and then have the subject
immediately after and then the verb
after that and so you just you can just
generate from one of those words another
word with a slightly different argument
structure with different ordering and
these are just lexical copies they
they're not necessarily moving from one
to another there's no movement there's a
romantic notion that you have like one
main way to use a word and then you
could move it around right right which
is essentially what movement is implying
yeah but that's that's the lexical
copying is similar so then
we do lexical copying for that same idea
that maybe the declarative is the source
and then we can copy it and so an
advantage uh for there's multiple
advantages of the lexical copying story
it's not my story this is like um Ivan
Sag linguists a bunch of linguists have
been proposing these stories as well you
know in tandem with the movement story
okay you know Ivan Sag died a
while ago but he was a one of the
proponents of the non-movement of the
lexical copying story and so that is
that um a great Advantage is well
Chomsky really famously in 1971 showed
that the movement story leads to
learnability problems it leads it leads
to problems for for how language is
learned it's really really hard to
figure out what the underlying structure
of a language is if you have both phrase
structure and movement it's like really
hard to figure out what came from what
there's like a lot of possibilities
there if you don't have that problem
learning that learning problem gets a
lot easier say there's lexical copies
and when we say the learning problem do
you mean like humans learning a new
language yeah just learning English so
baby is lying around in the
crib listening to me talk and is you
know how are they learning English or or
you know maybe it's a 2-year-old who's
learning you know interrogatives and
stuff or one you know there you how are
they doing that are they doing it from
like are they figuring out or like know
so Chomsky said it's impossible to
figure it out actually he said it's
actually impossible not not hard but
impossible MH and therefore that's that
that's where Universal grammar comes
from is that it has to be built in and
so what they're learning is uh that
there there's some built-in movement is
built in in his story is absolutely part
of your language module and uh and then
you are you're just setting parameters
you're you're said depending on English
is just sort of a variant of the
universal grammar and you're figuring
out oh which orders do does English do
these things that's the the non-movement
story doesn't have this it's like much
more
bottom up uh you're you're learning
rules you're learning rules one by one
and oh there's this this word is
connected to that word
another Advantage it's learnable another
advantage of it is that it predicts that
not all auxiliaries might move like it
it might depend on the word depending on
whether you and and and that turns out
to be true so there's words that um that
don't really work as auxiliaries they
work in declarative and not in in
interrogative so I can say um I'll give
you the opposite first if so I can say
aren't I invited to the party okay and
that's an that's an interrogative form
but it's not from I aren't invited to
the party there is no I aren't right so
that's that's interrogative only and and
then we also have forms like um ought uh
I I ought to do this and and I guess
some British old British people can say
exactly it doesn't sound right does it
for me it sounds ridiculous I don't even
think ought is great but I mean I totally
recognize I ought to I is not too bad
actually I can say I ought to do this
that sounds if I'm trying to sound
sophisticated maybe I don't know it just
sounds completely out to me I yeah
anyway so there are variants here
uh and a lot of these words just work in
one versus the other and that's
like fine under the lexical copying
story it's like well you just learn the
usage whatever the usage is is what you
is what you do with this with with this
word but um it doesn't it's a little bit
harder in the movement story The
Movement story like that's an advantage
I think of lexical copying in all these
different places there's there's all
these usage variants which make the
movement story um a little bit harder to
work so one of the main divisions here
is the movement story versus the copying story
that has to do with the auxiliary words
and so on but if you rewind to the
phrase structured grammar yeah versus
dependency grammar those are equivalent
in some sense in that for any dependency
grammar I can generate a
phrase structure grammar which generates
exactly the same sentences I just I just
like the dependency grammar uh formalism
because it makes something really
salient which is the
lengths of dependencies between Words
which isn't so obvious in in the phrase
in the phrase structure it's just kind
of hard to see it's in there it's just
very very it's opaque uh technically I
think phrase structure grammar is
mappable to dependency grammar and vice
versa and vice versa yeah there's like
these like little labels S NP VP yeah for
a particular dependency grammar you can
make a phrase structure grammar which
generates exactly those same sentences
and vice versa but there are many phrase
structure grammars which you can't
really make a dependency grammar I mean
there you can do a lot more in a phrase
structure grammar you get many more of
these extra nodes basically you you can
have more structure in there uh and and
some people like that and and maybe
there's value to that I I I don't like
it well for you so we should clarify so
so dependency grammar it's just uh well
one word depends on only one other word
and you form these trees and that makes
it really puts priority on those
dependencies just like as a as a tree
that you can then measure the distance
of the dependency from one word to the
other they can then map to uh the
cognitive processing of the of these
sentences how well how easy it is to
understand and all that kind of stuff so
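
The dependency distance mentioned above can be made concrete with a short sketch. The head assignments below are an assumed, simplified parse of one example sentence, and `heads` and `dependency_lengths` are hypothetical names used only for this illustration.

```python
words = ["two", "dogs", "entered", "the", "room"]

# heads[i] is the index of the word that words[i] depends on.
# None marks the root. Every other word has exactly one head, so this is a tree.
# Assumed parse: two -> dogs, dogs -> entered, the -> room, room -> entered.
heads = [1, 2, None, 4, 2]


def dependency_lengths(heads):
    """Distance in words between each dependent and its head (root skipped)."""
    return [abs(i - h) for i, h in enumerate(heads) if h is not None]


lengths = dependency_lengths(heads)
total_length = sum(lengths)
print(lengths)       # [1, 1, 1, 2]
print(total_length)  # 5
```

In a dependency representation these distances can be read directly off the word-to-word links, which is the transparency being argued for here.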
it just puts the focus on just like the
mathematical
um uh distance of dependence between
words so like it's just a different
focus absolutely just to continue on the
thread of Chomsky because it's really
interesting because it as you're
discussing
disagreement to the degree there's
disagreement you're also telling the
history of the study of language which
is really awesome so you mentioned context
free versus regular does that
distinction come into play for dependency
grammar no okay not at all I mean the
regular regular languages are too simple
for human languages they they are uh
they it's a part of the hierarchy but
human languages are in in the phrase
structure world are definitely they
they're at least context free maybe a
little bit more a little bit harder than
that but uh so there's something called
context sensitive as well where you can
have like this is the just the formal
language description in in a context
free grammar you have one this is like a
bunch of like formal language Theory
we're doing here but I love it okay so
you have you have a left- hand side
category and you're expanding