Transcript

Aq9UPIXbtKI • Manolis Kellis: Biology of Disease | Lex Fridman Podcast #133
The following is a conversation with Manolis Kellis, his third time on the podcast. He is a professor at MIT and head of the MIT Computational Biology Group. This time we went deep on the science, biology and genetics, so this is a bit of an experiment. Manolis went back and forth between the basics of biology and the latest state of the art in the research. He's a master at this, so I just sat back and enjoyed the ride. This conversation happened at 7 a.m., so it's yet another podcast episode after an all-nighter for me, and once again, since the universe has a sense of humor, this one was a tough one for my brain to keep up with. But I did my best, and I never shy away from a good challenge.

Quick mention of our sponsors, followed by some thoughts related to the episode. First is SEMrush, the most advanced SEO optimization tool I've ever come across. I don't like looking at numbers, but someone probably should; it helps you make good decisions. Second is Pessimists Archive. They're back! One of my favorite history podcasts, on why people resist new things, from recorded music to umbrellas to cars, chess, coffee, and the elevator. Third is Eight Sleep, a mattress that cools itself, measures heart rate variability, has an app, and has given me yet another reason to look forward to sleep, including the all-important power nap. And finally, BetterHelp, online therapy for when you want to face your demons with a licensed professional, not just by doing David Goggins-like physical challenges like I seem to do on occasion. Please check out the sponsors in the description to get a discount and to support this podcast.

As a side note, let me say that biology, in the brain and in the various systems of the body, fills me with awe every time I think about how such a chaotic mess, coming from its humble origins in the ocean, was able to achieve such incredibly complex and robust mechanisms of life that survived despite all the forces of nature that want to destroy it. It is so unlike the computing systems we humans have engineered that it makes me feel that, in order to create artificial general intelligence and artificial consciousness, we may have to completely rethink how we engineer computational systems.

If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon, or connect with me on Twitter @lexfridman. And now, here's my conversation with Manolis Kellis.

So your group at MIT is trying to understand the molecular basis of human disease.
What are some of the biggest challenges, in your view?

Don't get me started. I mean, understanding human disease is the most complex challenge in modern science, because human disease is as complex as the human genome, it is as complex as the human brain, and it is in many ways even more complex. The more we understand disease complexity, the more we start understanding genome complexity, and epigenome complexity, and brain circuitry complexity, and immune system complexity, and cancer complexity, and so on and so forth.
So traditionally, human disease was following basic biology. You would basically understand basic biology in model organisms, like, you know, mouse and fly and yeast. You would understand mammalian biology and animal biology and eukaryotic biology in progressive layers of complexity, getting closer to human phylogenetically, and you would do perturbation experiments in those species to see: if I knock out a gene, what happens? And based on the knocking out of these genes, you would then have a way to drive human biology, because you would understand the functions of these genes. And then if you find that a human gene locus, something that you've mapped from human genetics to that gene, is related to a particular human disease, you say, "Now I know the function of the gene from the model organisms; I can now go and understand the function of that gene in human."

But this is all changing. This has dramatically changed. That was the old way of doing basic biology: you would start with the animal models, the eukaryotic models, the mammalian models, and then you would go to human. Human genetics has been so transformed in the last decade or two that human genetics is now actually driving the basic biology. There is more genetic mutation information in the human genome than there will ever be in any other species.
What do you mean by mutation information?

So, perturbations are how you understand systems. An engineer builds systems, and then they know how they work from the inside out. A scientist studies systems through perturbations. You basically say: if I poke that balloon, what's going to happen? And I'm going to film it in super high resolution to understand, I don't know, aerodynamics, or fluid dynamics if it's filled with water, et cetera. So you can then experiment by perturbation, and then the scientific process is building models that best fit the data, designing new experiments that best test your models and challenge your models, and so forth. It's the same thing with biological science: if you're trying to understand it, you basically want to do perturbations that then drive the models.

So how do these perturbations allow you to understand disease?

So if you know that a gene is related to disease, you don't want to just know that it's related to the disease; you want to know what the disease mechanism is, because you want to go and intervene. The way that I like to describe it is that traditionally, epidemiology, which is basically the observational study of disease, has been about correlating one thing with another thing. So if you have a lot of people with liver disease who are also alcoholics, you might say, well, maybe the alcoholism is driving the liver disease, or maybe those who have liver disease self-medicate with alcohol. The connection could be either way. With genetic epidemiology, it's about correlating changes in the genome with phenotypic differences, and then you know the direction of causality. So if you know that a particular gene is related to the disease, you can basically say: okay, perturbing that gene in mouse causes the mice to have X phenotype, so perturbing that gene in human causes the humans to have the disease.
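The causal-direction argument can be illustrated with a toy simulation. The sketch below assumes a linear model in which a genotype G nudges an exposure X (think alcohol intake), while an unmeasured confounder U drives both X and the outcome Y. The names, effect sizes, and the Wald-ratio estimator are illustrative choices (this idea is usually called Mendelian randomization), not a method described in the conversation. Because G is fixed at conception and independent of U, the ratio cov(G, Y) / cov(G, X) targets the causal effect, while the naive regression of Y on X is inflated by confounding.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=200_000, true_effect=0.5, seed=1):
    """Toy linear model (assumed): G -> X -> Y, plus a confounder U acting on X and Y."""
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = 1 if rng.random() < 0.5 else 0           # carries a hypothetical variant?
        ui = rng.gauss(0, 1)                          # unmeasured confounder
        xi = 1.0 * gi + ui + rng.gauss(0, 1)          # exposure, e.g. alcohol intake
        yi = true_effect * xi + ui + rng.gauss(0, 1)  # outcome, e.g. a liver-damage score
        g.append(gi)
        x.append(xi)
        y.append(yi)
    return g, x, y


g, x, y = simulate()
naive = cov(x, y) / cov(x, x)  # observational slope: inflated by the confounder
wald = cov(g, y) / cov(g, x)   # genotype-anchored Wald ratio: targets the causal effect
print(f"naive slope: {naive:.2f}   genotype-based estimate: {wald:.2f}")
```

Under these assumed settings the naive slope lands around 0.94 (the true effect plus confounding bias), while the genotype-based estimate lands close to the true value of 0.5.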
So I can now figure out what the detailed molecular phenotypes in the human are that are related to that organismal phenotype in the disease. So it's all about understanding disease mechanism: understanding what the pathways are, what the tissues are, what the processes are that are associated with the disease, so that we know how to intervene. You can then prescribe particular medications that also alter these processes, you can prescribe lifestyle changes that also affect these processes, and so forth.

That's such a beautiful puzzle to try to solve: what kind of perturbations eventually have this ripple effect that leads to disease across the population. And then you study that in animals, in mice first, and then see how that might possibly connect to humans. How hard is that puzzle of trying to figure out how little perturbations might lead, in a stable way, to a disease?

In animals, we make the puzzle simpler, because we perturb one gene at a time. That's the beauty of it; it's the power of animal models. You can basically decouple the perturbations: you only do one perturbation, and you only do strong perturbations, one at a time. In human, the puzzle is incredibly complex, because, obviously, you don't do human experimentation. You wait for natural selection and natural genetic variation to basically do their own experiments, which they have been doing for hundreds and thousands of years in the human population, and for hundreds of thousands of years across the history leading to the human population. So you basically take this natural genetic variation that we all carry within us. Every one of us carries six million perturbations. So I've done six million experiments on you, six million experiments on me, six million experiments on every one of seven billion people on the planet.

What do the six million correspond to?

Six million unique genetic variants that are segregating in the human population. Every one of us carries millions of polymorphic sites. Poly, many; morph, forms: polymorphic means many forms. These variants basically mean that every one of us has single-nucleotide alterations that we have inherited from mom and from dad, and they can be thought of as tiny little perturbations. Most of them don't do anything, but some of them lead to all of the phenotypic differences that we see between us. The reason why two twins are identical is because these variants completely determine the way that I'm going to look at exactly 93 years of age.
How happy are you with this kind of data set? Is it large enough, the human population of Earth? Is that too big, too small?

Yeah, so "is it large enough" is a power analysis question, and in every one of our grants we do a power analysis based on what effect size I would like to detect and what the natural variation is in the two forms. Every time you do a perturbation, you're asking: I'm changing form A into form B. Form A has some natural phenotypic variation around it, and form B has some natural phenotypic variation around it. If those variances are large and the difference between the mean of A and the mean of B is small, then you have very little power. The further the means go apart (that's the effect size), the more power you have, and the smaller the standard deviation, the more power you have. So when you're asking whether that is sufficiently large: certainly not for everything, but we already have enough power for many of the stronger effects in the tighter distributions.

So that's a hopeful message, that there exist parts of the genome that have a strong effect with a small variance.

That's exactly right.
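The power argument above can be made concrete with the textbook normal approximation for comparing two group means. The sketch below assumes equal group sizes, a shared standard deviation, and a two-sided test at α = 0.05; the function name and the numbers in the usage lines are illustrative, not taken from the conversation. Power grows as the means move apart (the effect size) and as the standard deviation shrinks, which are exactly the two levers described.

```python
from statistics import NormalDist


def two_group_power(mean_a, mean_b, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing form A with form B.

    Simplifying assumptions: both forms share the standard deviation `sd`
    and each group has `n_per_group` individuals.
    """
    std_normal = NormalDist()
    effect = abs(mean_b - mean_a) / sd  # distance between means, in SD units
    se = (2 / n_per_group) ** 0.5       # SE of the difference, in SD units
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect / se
    # Probability that the test statistic falls beyond either critical value.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Same sample size in each scenario; only the means and the spread change.
wide = two_group_power(0.0, 0.1, sd=1.0, n_per_group=1000)   # small shift, wide spread
far = two_group_power(0.0, 0.3, sd=1.0, n_per_group=1000)    # means further apart
tight = two_group_power(0.0, 0.1, sd=0.3, n_per_group=1000)  # same shift, tighter spread
print(f"wide: {wide:.2f}  far: {far:.2f}  tight: {tight:.2f}")
```

At 1,000 individuals per group, the small, noisy difference is detected only about 61% of the time, while either moving the means apart or tightening the spread pushes power close to 1.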
Unfortunately, those perturbations are the basis of disease in many cases. So it's not always a hopeful message; sometimes it's a terrible message. It's basically: well, some people are sick. But if we can figure out what these contributors to sickness are, we can then help make them better, and help many other people who don't carry that exact mutation but who carry mutations on the same pathways. And that's what we like to call the allelic series of a gene: you basically have many perturbations of the same gene in different people, each with a different frequency in the human population, and each with a different effect on the individual organism.

So you said that in the past there would be these small experiments on perturbations in animal models. What does this puzzle-solving process look like today?
So we basically have something like seven billion people on the planet, and every one of them carries something like six million mutations. You basically have an enormous matrix of genotype by phenotype, by systematically measuring the phenotype of these individuals. The traditional way of measuring this phenotype has been to look at one trait at a time. You would gather families and you would paint the pedigrees of a strong effect, what we like to call a Mendelian mutation: a mutation that gets transmitted in a dominant or a recessive but strong-effect form, where basically one locus plays a very big role in that disease. And you could then look at carriers versus non-carriers in one family, carriers versus non-carriers in another family, and do that for hundreds, sometimes thousands, of families, and then trace these inheritance patterns and figure out which gene plays that role.

Is this the matrix that you're showing in talks or lectures?

So that matrix is the input to the stuff that I show in talks. That matrix has traditionally been strong-effect genes. What the matrix looks like now is, instead of pedigrees, instead of families, you have thousands and sometimes hundreds of thousands of unrelated individuals, each with all of their genetic variants and each with their phenotype, for example height, or lipids, or whether they're sick or not for a particular trait. That has been the modern view: instead of going to families, going to unrelated individuals, with one phenotype at a time. And what we're doing now, as we're maturing in all of these sciences, is doing this in the context of large medical systems or enormous cohorts that are very well phenotyped across hundreds of phenotypes, sometimes with their complete electronic health record. So you can now start relating not just one gene segregating in one family, not just thousands of variants segregating with one phenotype, but now you can do millions of variants versus hundreds of phenotypes.
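A minimal sketch of the modern setup just described: a genotype matrix (individuals by variants, coded as 0/1/2 copies of the alternate allele) scanned against a phenotype matrix (individuals by traits), producing one association statistic per variant-trait pair. Everything here is simulated and hypothetical, including the sizes, the single planted effect, and the use of plain Pearson correlation as the per-pair statistic; real studies use regression with covariates and far larger data.

```python
import random


def standardize(col):
    """Center a column to mean 0 and scale it to unit variance."""
    n = len(col)
    mean = sum(col) / n
    sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
    return [(v - mean) / sd for v in col]


def association_matrix(genotypes, phenotypes):
    """Pearson correlation for every (variant, trait) pair.

    genotypes: list of variant columns, each a list over the same individuals
    phenotypes: list of trait columns, each a list over the same individuals
    Returns a variants-by-traits matrix.
    """
    n = len(genotypes[0])
    g_std = [standardize(col) for col in genotypes]
    p_std = [standardize(col) for col in phenotypes]
    return [[sum(a * b for a, b in zip(g, p)) / n for p in p_std] for g in g_std]


rng = random.Random(0)
n_people, n_variants, n_traits = 2000, 50, 5
# Hypothetical allele frequencies; dosage = number of alternate alleles carried (0, 1 or 2).
freqs = [rng.uniform(0.1, 0.5) for _ in range(n_variants)]
genotypes = [[int(rng.random() < f) + int(rng.random() < f) for _ in range(n_people)]
             for f in freqs]
# Every trait is pure noise, except that variant 7 is planted to shift trait 2.
phenotypes = [[rng.gauss(0, 1) for _ in range(n_people)] for _ in range(n_traits)]
phenotypes[2] = [y + 0.5 * d for y, d in zip(phenotypes[2], genotypes[7])]

assoc = association_matrix(genotypes, phenotypes)
pairs = [(v, t) for v in range(n_variants) for t in range(n_traits)]
best = max(pairs, key=lambda vt: abs(assoc[vt[0]][vt[1]]))
print("strongest variant-trait pair:", best)
```

Scaling the same idea to millions of variants and hundreds of traits is mainly an engineering problem; the statistical question per cell of the matrix stays the same.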
And as a computer scientist, I mean, deconvolving that matrix, partitioning it into the layers of biology that are associated with every one of these elements, is a dream come true. It's like the world's greatest puzzle, and you can now solve that puzzle by throwing in more and more knowledge about the function of different genomic regions, and how these functions are changed across tissues and in the context of disease. And that's what my group and many other groups are doing. We're trying to systematically relate this genetic variation with molecular variation: at the expression level of the genes, at the epigenomic level of the gene-regulatory circuitry, and at the cellular level of what functions are happening in those cells, at the single-cell level, using single-cell profiling. And then relate all that vast amount of knowledge computationally with the thousands of traits that each of these thousands of variants are perturbing.

I mean, this is something we talked about, I think, last time. So there are these effects at different levels. You said that at a single-cell level you're trying to see things that happen due to certain perturbations. So it's not just a puzzle of perturbation and disease; it's perturbation, then effect at a cellular level, at an organ level, at the level of the body. How do you disassemble this into what your group is working on? You're basically taking a bunch of the hard problems in the space. How do you break a difficult disease apart into puzzles that you can now start solving?
So there's a struggle here. Computer scientists love hard puzzles, and they're like, "Oh, I want to build a method that just deconvolves the whole thing computationally." And that's very tempting, and it's very appealing. But biologists like to decouple that complexity experimentally, to peel off layers of complexity experimentally. And that's what many of these modern tools, which my group and others have both developed and used, make possible: we can now figure out tricks for peeling off these layers of complexity by testing one cell type at a time, or by testing one cell at a time. And you could basically say: what is the effect of this genetic variant associated with Alzheimer's on the human brain? "Human brain" sounds like, oh, it's an organ, of course, just go one organ at a time. But the human brain has, of course, dozens of different brain regions, and within each of these brain regions, dozens of different cell types. Every single type of neuron, every single type of glial cell, between astrocytes, oligodendrocytes, microglia, between all of the neural cells and the vascular cells and the immune cells that are co-inhabiting the brain, between the different types of excitatory and inhibitory neurons that are interacting with each other, between different layers of neurons in the cortical layers: every single one of these has a different type of function to play, in cognition, in interaction with the environment, in maintenance of the brain, in energetic needs, in feeding the brain with blood, with oxygen, in clearing out the debris resulting from the super-high energy production of cognition in humans. So all of these things are potentially deconvolvable computationally, but experimentally you can just do single-cell profiling of dozens of regions of the brain, across hundreds of individuals, across millions of cells, and then you have pieces of the puzzle that you can put back together to understand that complexity.
I mean, first of all, the cells in the human brain are the most... okay, maybe I'm romanticizing it, but cognition seems to be very complicated. So separating it into function, breaking Alzheimer's down to the cellular level, seems very challenging. Basically, you're trying to find a way that some perturbation in the genome results in some obvious, major dysfunction in the cell? You're trying to find something like that?

Exactly. So what does human genetics do? Human genetics basically looks at the whole path from genetic variation all the way to disease. Human genetics has basically taken thousands of Alzheimer's cases and thousands of controls, matched for age, for sex, for environmental backgrounds and so forth, and then looked at that map, where you're asking: what are the individual genetic perturbations, and how are they related all the way to Alzheimer's disease?
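For a single variant, the case-control comparison described here is often summarized as a 2×2 table of allele counts (risk allele versus other allele, in cases versus controls) and tested with a chi-square statistic. The sketch below is the generic textbook allelic test, not the specific pipeline used in the Alzheimer's studies mentioned; the counts in the usage line are invented for illustration.

```python
import math


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Chi-square (1 df) test on a 2x2 table of allele counts.

    Counts are alleles, so each individual contributes two.
    Returns (odds_ratio, chi2, p_value). Assumes no count is zero.
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return odds_ratio, chi2, p_value


# Hypothetical: 2,000 cases and 2,000 controls, i.e. 4,000 alleles per group.
print(allelic_test(case_risk=1400, case_other=2600, ctrl_risk=1000, ctrl_other=3000))
```

A variant that shows up at the same frequency in cases and controls gives a chi-square of zero and a p-value of one; the further the two allele frequencies diverge, the smaller the p-value.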
And that has actually been quite successful. We now have more than 27 different loci, these are genomic regions, that are associated with Alzheimer's at this end-to-end level. But the moment you break up that very long path into smaller levels, you can basically say: from genetics, what are the epigenomic alterations at the level of gene-regulatory elements, where that genetic variant perturbs the control region nearby? That effect is much larger.

You mean much larger in terms of this down-the-line impact, or much larger in terms of the measurable effect?

This A-versus-B variance is actually so much more cleanly defined when you go to the shorter branches, because for one genetic variant to affect Alzheimer's, that's a very long path. It basically means that, in the context of the six million variants that every one of us carries, that one single nucleotide has a detectable effect all the way to the end. I mean, it's just mind-boggling that that's even possible, but indeed there are such effects.

So the hope is, or, most scientifically speaking, the most effective place to detect the alteration that results in disease is earlier on in the pipeline, as early as possible?

It's a trade-off. If you go very early on in the pipeline, each of these epigenomic alterations, for example this enhancer control region being active maybe 50% less, is a dramatic effect. Now you can ask: well, how much does changing one regulatory region in the genome, in one cell type, change disease? Well, that path is now long. So if you instead look at expression, the path between genetic variation and the expression of one gene goes through many enhancer regions, and therefore it's a subtler effect at the gene level. But now you're closer, because one gene is acting in the context of only 20,000 other genes, as opposed to one enhancer acting in the context of two million other enhancers.

So you basically now have genetic; epigenomic, the circuitry; transcriptomic, the gene expression level; and then cellular, where you can say: I can measure various properties of those cells. What is the calcium influx rate when I have this genetic variation? What is the synaptic density? What is the electric impulse conductivity? And so on and so forth. So you can measure things along this path to disease, and you can also measure endophenotypes: you can measure your brain activity, you can do imaging in the brain, you can measure, I don't know, the heart rate, the pulse, the lipids, the amount of blood secreted, and so forth. And then through all of that, you can basically get at the path to causality, the path to disease.
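The trade-off between short and long branches can be sketched with a deliberately simplified model: a linear chain, variant -> enhancer activity -> gene expression -> cellular trait -> disease, where each link passes along only part of the signal. Under that assumption the correlation between the variant and a downstream measurement is the product of the per-link correlations, and the sample size needed to detect it grows roughly as 1/r². The per-link values below are hypothetical, chosen only to show the shape of the effect; the sample-size formula is the standard Fisher z approximation.

```python
import math
from statistics import NormalDist


def required_n(r, alpha=5e-8, power=0.8):
    """Approximate sample size to detect correlation r (Fisher z approximation).

    alpha defaults to 5e-8, the conventional genome-wide significance threshold.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Hypothetical per-link correlations along an assumed linear chain of layers.
links = [("enhancer activity", 0.6), ("gene expression", 0.4),
         ("cellular trait", 0.3), ("disease", 0.2)]

r = 1.0
for layer, link_r in links:
    r *= link_r  # in a linear chain, correlations multiply along the path
    print(f"variant -> {layer:18s} r = {r:.4f}   n ~ {required_n(r):,}")
```

With these made-up links, the variant-to-enhancer effect is detectable with under a hundred samples, while the end-to-end variant-to-disease effect needs on the order of a couple hundred thousand, which is why shorter branches give cleaner signals.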
And is there something beyond cellular? You mentioned lifestyle interventions or changes, being able to prescribe changes in lifestyle. What about organs, what about the function of the body as a whole?

Yeah, absolutely. So basically, when you go to your doctor, they always measure your pulse, they always measure your height, they measure your weight, your BMI. These are just very basic variables. But with digital devices nowadays, you can start measuring hundreds of variables for every individual. You can also phenotype cognitively, through tests. For Alzheimer's patients, there are cognitive tests that you typically do for cognitive decline, these mini-mental state exams with specific questions, and you can think of enlarging the set of cognitive tests. In the mouse, for example, you do experiments on how they get out of mazes, how they find food, whether they recall a fear, whether they shake in a new environment, and so forth. In the human, you can have much, much richer phenotypes: not just imaging at the organ level, and all kinds of other activities at the organ level, but you can also do it at the organism level. You can do behavioral tests: how did they do on empathy, how did they do on memory, how did they do on long-term memory versus short-term memory, and so forth.

I love how you're calling that a phenotype. I guess it is, but it's your behavior patterns that might change over a period of a life: your ability to remember things, your ability to be empathetic, or emotional intelligence, perhaps even.

Yeah, but intelligence has hundreds of variables. It could be your math intelligence, your literary intelligence, your puzzle-solving intelligence, your logic. It could be hundreds of things, and we're able to measure all of that better and better, and all of that could be connected to the entire pipeline.
We used to think of each of these as a single variable, like intelligence. I mean, that's ridiculous. It's basically dozens of different genes that are controlling every single variable. You can think of it, you know, imagine us in a video game, where every one of us has measures of strength, stamina, energy left, and so forth. But you could click on each of those five bars that are just the main bars, and each of those would then give you hundreds of bars. And you can basically say: okay, great, for my machine learning task, I want a human who has these particular forms of intelligence; I now require these 20 different things. And then you can combine those things and relate them, of course, to performance in a particular task, but you can also relate them to genetic variation that might be affecting different parts of the brain, for example your frontal cortex versus your temporal cortex versus your visual cortex, and so forth. So genetic variation that affects the expression of genes in different parts of your brain can affect your music ability, your auditory ability, your smell; dozens of different phenotypes can be broken down into hundreds of cognitive variables, and then you can relate each of those to the thousands of genes that are associated with them.

As somebody who loves RPGs, role-playing games, there are too few variables that we can control. So I'm excited, if we're in fact living in a simulation and this is a video game, by the quality of the video game. The game designer did a hell of a good job. So we're impressed.

Oh, I don't know, the sunset last night was a little unrealistic.

Yeah, yeah, the graphics. Exactly. Come on, NVIDIA.
To zoom back out, we've been talking about the genetic origins of diseases, but I think it's fascinating to talk about what the most important diseases to understand are, especially as it connects to the things that you're working on.

So it's very difficult to think about important diseases to understand; there are many metrics of importance. One is lifestyle impact. I mean, if you look at COVID, the impact on lifestyle has been enormous. So understanding COVID is important because it has impacted well-being in terms of the ability to have a job, the ability to have an apartment, the ability to go to work, the ability to have a circle of support, and all of that for millions of Americans. Huge, huge impact. So that's one aspect of importance. Mental disorders, Alzheimer's, have a huge importance in the well-being of Americans; whether or not they kill someone, for many, many years they have a huge impact. So the first measure of importance is just well-being.

Like impact on the quality of life.

Impact on the quality of life, absolutely. The second metric, which is much easier to quantify, is deaths. What is the number one killer? The number one killer is actually heart disease. It is killing 650,000 Americans per year. Number two is cancer, with 600,000 Americans. Number three, far, far down the list, is accidents, every single accident combined. So you read the news about accidents, like, you know, there was a huge car crash all over the news, but the number of deaths, number three by far, is 167,000. Lower respiratory disease, so that's asthma, not being able to breathe and so forth: 160,000. Alzheimer's, number four or number five, with [inaudible] thousand. And then stroke, brain aneurysms and so forth, that's 147,000. Diabetes and metabolic disorders, et cetera, that's 85,000. The flu is 60,000, suicide 50,000, and then overdose, et cetera, goes further down the list. So of course COVID has crept up to be the number three killer this year, with more than 100,000 Americans and counting. But if you think about what the most important diseases are, you have to understand both the quality of life and the sheer number of deaths, and just the number of years lost, if you wish.
And each of these diseases, and also including terrorist attacks and school shootings, for example, things which lead to fatalities, you can look at as problems that could be solved, and some problems are harder to solve than others. I mean, that's part of the equation. So maybe if you look at these diseases, if you look at heart disease or cancer or Alzheimer's, or just, like, schizophrenia and obesity, not necessarily things that kill you but things that affect the quality of life: which problems are solvable, which aren't, which are harder to solve?

I love your question, because it puts it in the context of a global effort rather than just a local effort. So basically, if you look at the global aspect, exercise and nutrition are two interventions that we as a society can do a much better job at. If you think about the availability of cheap food, it's extremely high in calories, it's extremely detrimental for you, like a lot of processed food, et cetera. So if we changed that equation, and as a society we made healthy food much, much easier to get, and charged for a burger at McDonald's the price that it costs the health system, then people would actually start buying more healthy food. So that's a societal intervention, if you wish. In the same way, increasing empathy, increasing education, increasing the social framework and support would lead to fewer suicides, it would lead to fewer murders, it would lead to fewer deaths overall. So that's something that we as a society can do.

You can also think about external factors versus internal factors. The external factors are basically communicable diseases, like COVID, like the flu, et cetera, and the internal factors are things like cancer and Alzheimer's, where your genetics will eventually drive you there. And then, of course, with all of these factors, every single disease has both a genetic component and an environmental component. So heart disease: huge genetic contribution. Alzheimer's: it's like 60%-plus genetic. I think it's like 79% heritability, so that basically means that genetics alone explains 79% of the variation in Alzheimer's incidence. And yes, there's a 21% environmental component, where you could enrich your cognitive environment, enrich your social interactions, read more books, learn a foreign language, go running, have a more fulfilling life. All of that will actually decrease Alzheimer's, but there's a limit to how much that can impact, because of the huge genetic footprint.

So this is fascinating. So each one of these problems has a genetic component and an environmental component. So when there's a genetic component, what can we do about some of these diseases? What have you worked on? What can you say in terms of problems that are solvable here, or understandable?

So my group works on the genetic component, but I would argue that understanding the genetic component can have a huge impact even on the environmental component. Why is that? Because genetics gives us access to mechanism, and if we can alter the mechanism, if we can impact the mechanism, we can perhaps counteract some of the environmental components.
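The heritability figure in this exchange can be read through the standard variance-partitioning definition, under the simplifying assumption that genetic and environmental contributions are additive and uncorrelated (an assumption made here for illustration; the conversation does not specify the model). With the 79% figure quoted:

```latex
\sigma_P^2 = \sigma_G^2 + \sigma_E^2, \qquad
h^2 = \frac{\sigma_G^2}{\sigma_P^2} = 0.79
\quad\Longrightarrow\quad
\frac{\sigma_E^2}{\sigma_P^2} = 1 - h^2 = 0.21
```

Under this model, even an intervention that removed all environmental variation would leave 79% of the variance in risk untouched, which is the limit described in the conversation. Heritability describes variation within a population, not the share of any one person's risk.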
Interesting.

So understanding the biological mechanisms leading to disease is extremely important in being able to intervene. But when you can intervene... The analogy that I like to give, for example for obesity, is to think of it as a giant bathtub of fat. There's fat coming in from your diet, and there's fat coming out from your exercise. Okay, that's an in-out equation, and that's the equation that everybody's focusing on. But your metabolism impacts that bathtub. Your metabolism controls the rate at which you're burning energy, it controls the rate at which you're storing energy, and it also teaches you about the various valves that control the input and output equation. So if we can learn the valves from the genetics, we can then manipulate those valves. And even if the environment is feeding you a lot of fat and getting a little fat out, you just poke another hole in the bathtub and get a lot of the fat out.

Yeah, that's fascinating.
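The bathtub analogy is a stock-and-flow model, and a few lines of simulation show why a metabolic "valve" matters as much as intake. The sketch below is a toy under stated assumptions: the fat stock F changes by intake minus an outflow proportional to the stock (rate k), so it settles where intake equals k·F, that is at F = intake / k. All numbers are arbitrary units, not physiology, and "poking another hole" is modeled as raising k.

```python
def simulate_bathtub(intake, outflow_rate, start=0.0, steps=2000, dt=0.1):
    """Euler-integrate dF/dt = intake - outflow_rate * F (toy, arbitrary units)."""
    fat = start
    for _ in range(steps):
        fat += dt * (intake - outflow_rate * fat)
    return fat


intake = 10.0  # the same diet in both scenarios
baseline = simulate_bathtub(intake, outflow_rate=0.05)
extra_hole = simulate_bathtub(intake, outflow_rate=0.10)  # "another hole in the bathtub"
print(f"baseline level ~ {baseline:.1f} (steady state {intake / 0.05:.1f})")
print(f"with extra outflow ~ {extra_hole:.1f} (steady state {intake / 0.10:.1f})")
```

Doubling the outflow rate halves the level the stock settles at, without touching the intake side of the equation.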
yeah so that we're not just passive
observers of our genetics
the more we understand the more we can
come up with actual treatments
and i think that's an important uh
aspect to realize
when people are thinking about strong
effect versus weak effect variants
so some variants have strong effects we
talked about these mendelian disorders
where a single gene has a sufficiently
large effect
pen and trans expressivity and so so
forth that basically
you can um trace it in families with
cases and not cases cases not cases and
so forth
but even the you know but
so so these are the genes that everybody
says oh that's the genes we should go
after
because that's a strong effect gene i
like to think about it slightly
differently
these are the genes where genetic
impacts that have a strong effect were
tolerated
because every single time we have a
genetic association with disease
it depends on two things number one the
obvious one
whether the gene has an impact on the
disease number two the more subtle one
is whether there is genetic in variation
standing and circulating and segregating
in the human population
that impacts that gene some genes
are so darn important that if you mess
with them
even a tiny little amount that person is
dead
so those genes don't have variation
you're not going to find the genetic
association if you don't have variation
that doesn't mean that the gene has no
role
it's simply that the gene it simply
means that the gene tolerates no
mutations
so that's actually a strong signal when
there's no variation that's so fascinating
exactly genes that have very little
variation
are hugely important you can actually
rank the importance of genes
based on how little variation they have
and those genes that have very little
variation but no association
with disease that's a very good metric
to say oh that's probably a
developmental gene
because we're not good at measuring
those phenotypes so it's genes that you
can tell
evolution has excluded mutations from
but yet we can't see them associated
with anything that we can measure
nowadays
it's probably early embryonic lethal
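Ranking genes by how little variation they tolerate can be sketched as an observed-versus-expected ratio. This is an illustrative sketch, not the speaker's method: the gene names, counts, expected values, and the 0.2 cutoff are made up, and real constraint scores derive the expected counts from a calibrated mutation model.

```python
from dataclasses import dataclass


@dataclass
class GeneVariation:
    name: str
    observed: int    # variants actually seen segregating in the population
    expected: float  # variants expected if mutations were tolerated (assumed mutation model)

    @property
    def ratio(self) -> float:
        return self.observed / self.expected


def rank_by_constraint(genes):
    """Most constrained first: the fewer variants relative to expectation, the higher the rank."""
    return sorted(genes, key=lambda g: g.ratio)


def candidate_developmental(genes, disease_associated, cutoff=0.2):
    """Highly constrained genes with no measured disease association (possible early lethality)."""
    return [g.name for g in rank_by_constraint(genes)
            if g.ratio < cutoff and g.name not in disease_associated]


# Hypothetical data for illustration only.
genes = [
    GeneVariation("GENE_A", observed=2, expected=40.0),
    GeneVariation("GENE_B", observed=38, expected=40.0),
    GeneVariation("GENE_C", observed=10, expected=50.0),
    GeneVariation("GENE_D", observed=5, expected=45.0),
]
print([g.name for g in rank_by_constraint(genes)])
print(candidate_developmental(genes, disease_associated={"GENE_D"}))
```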
what are all the words you just said
early embryonic what
lethal meaning meaning that if you don't
have it
okay there's a bunch of stuff that um
is required for a stable functional
organism
exactly across the board for our entire
for the entire species i guess if you
look at sperm
it expresses thousands of proteins
does sperm actually need thousands of
proteins no
but it's probably just testing them
so my speculation is that misfolding of
these proteins is an early test for
failure
so that out of the you know millions of
sperm
that are possible you select the subset
that are just not grossly misfolding
thousands of proteins
so it's kind of an assert uh that this
is folded
correctly correct yeah this uh just uh
because
if this little thing about the folding
of a protein is incorrect
that probably means somewhere down the
line there's a bigger issue
that's exactly right so fail fast so
basically if you look at
the mammalian investment in
a new born that investment is enormous
in terms of resources
so mammals have basically evolved
mechanisms
for fail fast where basically in those
early
months of development i mean it's it's
horrendous of course at the personal
level
when you lose a uh you know your future
child
but in some ways
there's so little hope for that child to
develop
and sort of make it through the
remaining months that sort of fail fast
is probably
a good evolutionary principle from an
evolutionary perspective for
mammals and of course humans
have a lot of medical resources that you
can sort of give those children a chance
and you know we have so much more
success in sort of giving folks who have
these strong carrier mutations a chance
but if they're not even making it
through the first three months
we're not going to see them so that's
why when we
when we say what are the most important
genes to focus on the ones that have a
strong effect
mutation or the ones that have a weak
effect mutation
well you know the jury might be out
because the ones that have a strong
effect mutation
are basically you know not mattering as
much
the ones that only have weak effect
mutations
by understanding through genetics that
they have a weak effect mutation
and understanding that they have a
causal role on the disease
we can then say okay great evolution has
only tolerated a two percent
change in that gene pharmaceutically
i can go in and induce a 70 percent change in
that gene
and maybe i will poke another hole at
the bathtub
that was not easy
to control in you know many of the other
sort of strong effect genetic variants
so okay so there's this beautiful map
of uh across the population of things
that
you're saying strong and weak effects so
stuff with a lot of
mutations and stuff with little
mutations with
no mutations and you have this map and
it's it lays out the puzzle
yeah so so when i say strong effect i
mean at the level of individual
mutations so so basically
genes where so so
you have to think of first the effect of
the gene on the disease remember how i
was sort of
painting that map earlier from genetics
all the way to phenotype
that gene can have a strong effect on
the disease
but the genetic variant might have a
weak effect on
the gene so basically when you ask
what is the effect of that genetic
variant on the disease
it could be that that genetic variant
impacts the gene by a lot
and then the gene impacts the disease by
a little or it could be that the genetic
variant
impacts the gene by a little and then
the gene impacts the disease by a lot
so what we care about is genes that
impact the disease a lot
but genetics gives us the full equation
and what i would argue
is if we couple the genetics
with expression variation to basically
ask what
genes change by a lot
and you know which genes correlate with
disease by a lot
even if the genetic variants change them
by a little
then those are the best places to
intervene
those are the best places where
pharmaceutically if i have
even a modest effect i will have a
strong effect on the disease
whereas those genetic variants that have
a huge effect on the disease i might not
be able to change that gene by this much
without affecting all kinds of other
things
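The distinction between a variant's effect on a gene and the gene's effect on disease can be written as a simple product. A minimal sketch, assuming effects compose linearly along the variant-to-gene-to-disease path; the `disease_effect` helper and all numbers are hypothetical.

```python
def disease_effect(change_in_gene, gene_to_disease):
    """Linear path model (assumption): disease effect = change in gene activity x disease sensitivity to that gene."""
    return change_in_gene * gene_to_disease


# Two genes whose variants give the same genetic signal (hypothetical numbers).
gene_1_sensitivity = 0.04  # disease barely depends on gene 1
gene_2_sensitivity = 1.0   # disease depends strongly on gene 2

variant_on_gene_1 = disease_effect(0.50, gene_1_sensitivity)  # variant changes gene 1 by 50%
variant_on_gene_2 = disease_effect(0.02, gene_2_sensitivity)  # evolution tolerated only a 2% change

# A drug that shifts either gene by 70% has very different payoffs.
drug_on_gene_1 = disease_effect(0.70, gene_1_sensitivity)
drug_on_gene_2 = disease_effect(0.70, gene_2_sensitivity)
print(variant_on_gene_1, variant_on_gene_2)
print(drug_on_gene_1, drug_on_gene_2)
```

Genetics alone sees the two variants as equally associated; only measuring how much each gene actually changes separates them, which is the argument for coupling genetics with expression variation.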
interesting so yeah okay so that's what
we're looking at then
what have we been able to find in terms
of
which disease could be helped again
don't get me started this is um
we have found so much our understanding
of disease
has changed so dramatically with
genetics i mean places that we had no
idea would be involved
so one of the worst things about my
genome is that i have a genetic
predisposition to
age-related macular degeneration amd
so it's a form of blindness that causes
you to lose the central part of your
vision
progressively as you grow older my
increased risk
is fairly small i have an eight percent
chance you only have a six percent
chance
you on average yeah by the way when
you say my you mean literally yours
you know this about you i know this
about me
yeah which is kind of uh
i mean uh philosophically speaking is a
pretty powerful thing
so to live with i mean maybe that's uh
so we agreed to talk again by the way
for the
listeners too we're going to try to
focus on science today and
a little bit of philosophy next time but
it's uh interesting to think about
the more you're able to know about
yourself from the genetic information in
terms of the diseases
how that changes your own view of life
yeah
so there's there's a lot of impact there
and there's a
something called genetic exceptionalism
which basically thinks of genetics as
something very very different
than everything else as a type of
determinism
and um you know let's talk about that
next time
so basically it's a good preview yeah so
let's go back to amd so basically with
amd
we have no idea what causes amd you know
it was it was a mystery
until the genetics were worked out and
now the fact that i know that i have a
predisposition
allows me to sort of make some life
choices number one
but number two the genes that lead to
that predisposition give us insights as
to how does it actually work
and that's a place where genetics gave
us something totally unexpected
so there's a complement pathway
which is an immune function pathway that
was in you know most of the loci
associated with amd and that basically
told us that wow there's an immune basis
to this eye disorder
that people had just not expected before
if you look at complement
it was recently also implicated in
schizophrenia
and there's a type of microglia
that is involved in synaptic pruning so
synapses are the connections between
neurons
and in this whole use it or lose it view
of
mental cognition and other capabilities
you basically have uh microglia which
are immune cells that are sort of
constantly traversing your brain
and then pruning neuronal connections
pruning synaptic connections
that are not utilized so
in schizophrenia there's thought to be
a change in the pruning that basically
if you don't prune your synapses the
right way
you will actually have an increased risk
of schizophrenia this is something that
was completely unexpected
for schizophrenia of course we knew it
has to do with neurons but the role of
the complement complex
which is also implicated in amd which is
now also implicated in schizophrenia was a
huge surprise what's the complement
complex
so it's basically a set of genes the
complement genes
that are basically having various immune
roles and as i was saying earlier our
immune system has been co-opted
for many different roles across the body
so they actually play
many diverse roles and somehow the
immune system
is connected to the synaptic pruning
process
exactly so immune cells were co-opted to
prune synapses how did you figure this
out
how does one go about figuring this
intricate connection
uh like pipeline of connections out yeah
let me give you another example
so so alzheimer's disease the first
place that you would expect it to act is
obviously the brain
so we had basically this roadmap
epigenomics consortium view of the human
epigenome
the largest map of the human epigenome
that has ever been built
across 127 different tissues and samples
with dozens of epigenomic marks measured
in you know
hundreds of donors so what we've
basically
learned through that is that you you
basically can map
what are the active gene regulatory
elements for every one of the tissues in
the body
and then we connected these gene
regulatory
active maps of basically what regions
of the human genome are turning on in
every one of different tissues
we then can go back and say where are
all
the genetic loci that are associated
with disease
this is something that my group i think
was the first to do back in 2010
in this ernst nature biotech paper
but basically we were for the first time
able to show that specific
chromatin states specific epigenomic
states in that case enhancers
were in fact enriched in
disease associated variants
we pushed that further in the ernst
nature paper a year later
and then in this roadmap epigenomics
paper
you know a few years after that but
basically that
matrix that you mentioned earlier was in
fact the first time that we could see
what genetic traits have genetic
variants that are enriched
in what tissues in the body
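The enrichment behind that trait-by-tissue matrix asks whether a trait's variants overlap a tissue's enhancers more often than chance. The sketch below uses a plain one-sided binomial test as a stand-in; it is not the statistic used in the papers mentioned, and the variant counts and genome fractions are invented. Real analyses also have to account for co-inherited variants and matched background sets.

```python
from math import comb


def enhancer_enrichment(n_variants, n_overlapping, genome_fraction):
    """Fold enrichment and one-sided binomial p-value for trait variants falling in a tissue's enhancers.

    Null model (simplifying assumption): each variant lands in the tissue's enhancers
    independently with probability equal to the fraction of the genome they cover.
    """
    expected = n_variants * genome_fraction
    fold = n_overlapping / expected
    # P(X >= n_overlapping) for X ~ Binomial(n_variants, genome_fraction)
    p_value = sum(
        comb(n_variants, k) * genome_fraction**k * (1 - genome_fraction) ** (n_variants - k)
        for k in range(n_overlapping, n_variants + 1)
    )
    return fold, p_value


# Hypothetical trait with 200 associated variants; each tissue's enhancers cover 2% of the genome.
for tissue, hits in [("t cells", 30), ("brain", 4)]:
    fold, p = enhancer_enrichment(200, hits, 0.02)
    print(f"{tissue}: {fold:.1f}x enrichment, p = {p:.2g}")
```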
and a lot of that map made complete
sense if you looked at
a diversity of immune traits like
allergies and type 1 diabetes and so
forth
you basically could see that they were
enriching that the genetic variants
associated with those traits
were enriched in enhancers in these gene
regulatory elements
active in t cells and b cells and
hematopoietic stem cells and so forth
so that basically gave us a
confirmation in many ways that those
immune traits were
indeed enriched in immune cells if you
look
if you if you looked at type 2 diabetes
you basically saw an enrichment in only
one type of sample and it was pancreatic
islets
and we know that type 2 diabetes in you
know sort of stems from the
dysregulation of insulin
in the beta cells of pancreatic islets
and that sort of was
you know spot on super precise if you
looked at blood pressure
where would you expect blood pressure to
occur
you know i don't know maybe in your
metabolism in ways that you process
coffee or something like that maybe in
your brain the way that you stress out
increases your blood pressure etc
what we found is that blood pressure
localized specifically
in the left ventricle of the heart so
the enhancers of the left ventricle of
the heart
contain a lot of genetic variants
associated with blood pressure
if you look at height we found an
enrichment specifically
in embryonic stem cell enhancers so the
genetic variants predisposing you to be
taller or shorter
are in fact acting in developmental stem
cells makes
complete sense if you looked at
inflammatory bowel disease
you basically found inflammatory which
is immune
and also bowel disease which is
digestive
and indeed we saw a double enrichment
both in the immune cells
and in the digestive cells so that
basically told us that
this is acting in both components
there's an immune component to
inflammatory bowel disease
and there's a digestive component and
the big surprise was for alzheimer's
we had seven different brain samples
we found zero enrichment in the brain
samples
for genetic variants associated with
alzheimer's and this is mind-boggling
our brains were literally hurting what
is going on
and what is going on is that the brain
samples are primarily
neurons oligodendrocytes and astrocytes
in terms of the cell types that make
them up
so that basically indicated that genetic
variants associated with alzheimer's
were probably not acting in
oligodendrocytes astrocytes or neurons
so what could they be acting in well the
fourth major cell type is actually
microglia
microglia are resident immune cells in
your brain
oh nice the immune oh wow
and they are cd14 plus which is the
sort of cell surface marker uh of those
cells
so they're cd14 plus cells just like
macrophages that are circulating
in your blood the microglia are
resident monocytes that are basically
sitting in your brain they're
tissue-specific
monocytes and every one of your tissues
like your your fat for example
has a lot of macrophages that are resident
and the m1 versus m2 macrophage ratio
has a huge role to play in obesity and
you know so basically again these immune
cells are everywhere but basically what
we found
through this completely unbiased view of
what are the tissues that likely
underlie different disorders
we found that alzheimer's was
humongously enriched in microglia but
not at all in the other cell types so
what are we supposed to make of that
if you
look at the tissues involved is that
simply
useful for indication of uh
propensity for disease or does it give
us somehow a pathway of treatment
it's very much the second if you look at
the
um the way to therapeutics you have to
start somewhere
what are you gonna do you're gonna
basically make assays
that manipulate those genes
and those pathways in those cell types
so before we know the tissue of action
we don't even know where to start
we basically are at a loss but if you
know the tissue of action and even
better if you know the pathway of action
then you can basically screen your small
molecules
not for the gene you can screen them
directly for the pathway
in that cell type so you can basically
develop a high throughput multiplexed
you know robotic system for testing
the impact of your favorite molecules
that you know are safe efficacious and
you know sort of
hit that particular gene and so forth
you can basically screen those molecules
against either a set of genes that act
in that pathway
or on the pathway directly by having a
cellular assay
and then you can basically go into mice
and do experiments and basically
sort of figure out ways to manipulate
these processes
that allow you to then go back to
humans and do a clinical trial that
basically says okay
i was able indeed to reverse these
processes in mice can i do the same
thing in humans
so the knowledge of the tissues
gives you the pathway
to treatment but that's not the only
part there are many
additional steps to figuring out the
mechanism of disease
i mean so that's really promising maybe
uh
to take a small step back you've you've
mentioned all these puzzles that were
figured out with the nature paper
for i mean you've mentioned a ton of
diseases
from obesity to alzheimer's even
schizophrenia i think you mentioned
and just what is the actual methodology
of figuring this out
so indeed i mentioned a lot of diseases
and and my lab works on a lot of
different disorders
and the reason for that is that
if you look at the
if you look at biology
it used to be you know zoology
departments and botany
departments and you know virology
departments and so on so forth and mit
was one of the first schools to
basically create a biology department
like oh we're going to study
all of life suddenly why was that even
the case
because the advent of dna and the genome
and the central dogma of dna makes rna
makes protein
in many ways unified biology you could
suddenly
study the process of transcription in
viruses
or in bacteria and have a huge impact on
yeast and fly and maybe even mammals
because of this realization of these
common underlying processes
and in the same way that dna unified
biology
genetics is unifying disease
studies so you used to have
um you used to have
uh you know i don't know um
cardiovascular disease department
and uh you know neurological disease
department
and neurodegeneration department and uh
you know um basically immune and cancer
and so forth
and all of these were studied in
different labs
you know because it made sense because
basically the first step was
understanding how the tissue functions
and we kind of knew the tissues involved
in cardiovascular disease and so forth
but what's happening with human genetics
is that all of that
all of these walls and edifices that we
had built are
crumbling and the reason for that is
that
genetics is in many ways revealing
unexpected connections so suddenly we
now have to bring the immunologists
to work on alzheimer's they were never
in the room they were in another
building altogether
the same way for schizophrenia we now
have to sort of worry about
all these interconnected aspects for
metabolic disorders we're finding
contributions from brain
so suddenly we have to call the
neurologist from the other building and
so forth
so in my view it makes no sense
anymore to basically say oh i'm a
geneticist
studying immune disorders i mean that's
that's ridiculous because i mean yeah of
course in many ways
you still need to sort of focus but what
what what we're doing is that we're
basically saying we'll go wherever the
genetics takes us
and by building these massive resources
by working on our latest map is now 833
tissues
sort of the the next generation of the
epigenomics roadmap which we're now
calling epimap
is 833 different tissues and using those
we've basically found enrichments in 540
different disorders
those enrichments are not like oh great
you guys work on that and we'll work on
this
they're intertwined amazingly so of
course there's a lot of modularity
but there's these enhancers that are
sort of broadly active and these
disorders that are broadly active
so basically some enhancers are active
in all tissues and some disorders are
enriching
in all tissues so basically there's
these multifactorial
and this other class which i like to
call polyfactorial diseases
which are basically lighting up
everywhere and
in many ways it's you know sort of
cutting across
these walls that were previously built
across these departments
and the polyfactorial ones were probably
the previous
structure of departments wasn't equipped to
deal with those
i mean again maybe it's a romanticized
question but you know there's
in physics there's a theory of
everything do you think
it's possible to move towards an almost
theory of everything of disease from a
genetic perspective
so if this unification continues is it
possible that
like do you think in those terms like
trying to arrive
at a fundamental understanding of how
disease emerges period
that unification is not just
foreseeable it's inevitable
i see it as inevitable we have to go
there
you cannot be a specialist anymore
if you're a genomicist you have to be a
specialist
in every single disorder and the reason
for that is that
the fundamental understanding of the
circuitry
of the human genome that you need to
solve
schizophrenia that fundamental circuitry
is hugely important to solve alzheimer's
and that same circuitry is hugely
important to solve metabolic disorders
and that same exact circuitry is uh
hugely important for solving immune
disorders and cancer
and you know every single disease so
all of them have the same subtask
and i teach dynamic programming in my
class dynamic programming is all about sort
of
not redoing the work it's reusing the
work that you do once
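The dynamic-programming point, solving a shared subproblem once and reusing the result, can be illustrated with memoization (top-down dynamic programming). A toy sketch: `solve_circuitry` and its enhancer-to-gene table are hypothetical placeholders for the expensive shared analysis, and the call counter exists only to show that the work happens once.

```python
from functools import lru_cache

call_count = 0


@lru_cache(maxsize=None)
def solve_circuitry():
    """Hypothetical expensive shared subtask; cached so every disease group reuses one result."""
    global call_count
    call_count += 1
    # Placeholder enhancer -> target-gene links (made up for illustration).
    return (("enhancer_1", "GENE_X"), ("enhancer_2", "GENE_Y"))


def study_disease(disease):
    links = solve_circuitry()  # computed on the first call, looked up from the cache afterwards
    return f"{disease}: using {len(links)} enhancer-gene links"


reports = [study_disease(d) for d in ("schizophrenia", "alzheimer's", "immune disorders")]
print(reports)
print("circuitry solved", call_count, "time(s)")
```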
so basically for us to say oh great you
know you guys in the immune building
go solve the fundamental circuitry of
everything and then you guys in the
schizophrenia building go solve the
fundamental circuitry of everything
separately is crazy so what we need to
do is come together
and sort of have a circuitry group the
circuitry building that sort of
tries to solve the circuitry of
everything and then
the immune folks who will apply this
knowledge
to all of the disorders that are
associated with
immune dysfunction and the schizophrenia
folks
will basically interact with both the
immune folks and with the neuronal folks
and all of them will be interacting with
the circuitry folks and so forth so
that's sort of the current
structure of my group if you wish so
basically what we're doing is
focusing on the fundamental circuitry
but at the same time we're the users of
our own tools
by collaborating with many other labs
in every one of these disorders that we
mentioned we basically have a heart
focus
on cardiovascular disease coronary
artery disease heart failure and so
forth
we have an immune focus on
several immune disorders we have a
cancer focus
on metastatic melanoma and immunotherapy
response
we have a psychiatric disease focus
on schizophrenia autism ptsd
and other psychiatric disorders we have
an alzheimer's and neurodegeneration
focus
on huntington disease als and
you know ad related disorders like
frontotemporal dementia and lewy body
dementia
and of course a huge focus on
alzheimer's we have a metabolic focus
on the role of exercise and diet
and sort of how they're impacting
metabolic
you know organs across the body and
across many different tissues
and all of them are interfacing
with the circuitry and the reason for
that
is another computer science principle of
eat your own dog food if everybody
ate their own dog food dog food would
taste a lot better
the reason why microsoft excel and word
and powerpoint was so important and so
successful is because the employees
that were working on them were using
them for their day-to-day tasks
you can't just simply build a circuitry
and say
here it is guys take the circuitry we're
done without being the
users of that circuitry because you then
go back and
because we span the whole spectrum from
profiling the epigenome
using comparative genomics finding the
important nucleotides in the genome
building the basic functional map of
what are the genes in the human
genome what are the gene regulatory
elements of the human genome
i mean over the years we've written a
series of papers on how do you find
human genes in the first place
using comparative genomics how do you
find the motifs
that are the building blocks of gene
regulation using comparative genomics
how do you then find how these motifs
come together
and act in specific tissues using
epigenomics
how do you link regulators to enhancers
and enhancers to their target genes
using
epigenomics and regulatory genomics so
through the years we've basically built
all this infrastructure for
understanding what i like to say
every single nucleotide of the human
genome
and how it acts in every one of the
major cell types and tissues of the
human body
i mean this is no small task this is an
enormous task that takes the entire
field
and that's something that my group has
taken on along with many other groups
and we have also and that sort of thing
sets my group perhaps apart
we have also worked with specialists in
every one of these disorders
to basically further our understanding
all the way down to disease
and in some cases collaborating with
pharma to go all the way down to
therapeutics
because of our deep deep understanding
of that basic circuitry
and how it allows us to now improve the
circuitry
not just treat it as a black box but
basically go and say okay we need a
better
cell type specific wiring that we now
have
at the tissue specific level so we're
focusing on that because we're
understanding
you know the needs from the disease
front so you have a sense of the entire
pipeline
i mean one maybe you can indulge me one
nice question to ask would be
how do you from the scientific
perspective
go from knowing nothing about the
disease
to going you said uh
to go into the entire pipeline and
actually have a drug
or or a treatment that cures that
disease
so that's an enormously long path
and an enormously great challenge and
what i'm trying to argue is that
it progresses in stages of understanding
rather than one gene at a time
the traditional view of biology was you
have one postdoc working on this gene
and another postdoc working on that
gene and
they'll just figure out everything about
that gene and that's their job
what we've realized is how polygenic the
diseases are so we can't have one
postdoc per gene anymore
we now have to have these
cross-cutting needs and
i'm going to describe the path to
circuitry
along those needs and every single one
of these paths
we are now doing in parallel across
thousands of genes
so the first step is you have a genetic
association
and we talked a little bit about sort of
the mendelian path
and the polygenic path to that
association so the mendelian path was
looking through families
to basically find gene regions
and ultimately genes that are underlying
particular disorders
the polygenic path is basically looking
at
unrelated individuals in this giant
matrix of genotype by phenotype
and then finding hits where a particular
variant impacts
disease all the way to the end and then
we now have
a connection not between a gene and a
disease
but between a genetic region and a
disease
and that distinction is not understood
by most people
so i'm going to explain it a little bit
more
why do we not have a connection
between a gene
and a disease but we have a connection
between a genetic region and a disease
the reason for that is that 93 percent
of genetic variants that are associated
with disease don't
impact the protein at all
so if you look at the human genome
there's 20 000 genes there's 3.2 billion
nucleotides
only 1.5 percent of the genome
codes for proteins
the other 98.5 percent
does not code for proteins if you now
look at where are the disease variants
located
93 percent of them fall in that
outside the genes portion of course
genes are enriched
but they're only enriched by a factor of
three
that means that still 93 percent of genetic
variants
fall outside the proteins
why is that difficult why is that a
problem the problem is that when a
variant falls outside the gene
you don't know what gene is impacted by
that variant you can't just say oh
it's near this gene let's just connect
that variant to the gene
and the reason for that is that the
genome circuitry
is very often long range
so you basically have that genetic
variant that could sit in the intron
of one gene and an intron is sort of
the place between the exons that code
for proteins so proteins are split up
into exons and introns and every exon
codes for a particular subset of amino
acids
and together they're spliced together
and then make the final protein
so that genetic variant might be sitting
in an intron of a gene it's transcribed
with the gene
it's processed and then excised but it
might not impact this gene at all it
might actually impact
another gene that's a million
nucleotides away so it's just riding
along even though it has nothing to do
with the
with this nearby neighborhood that's
exactly right
let me give you an example the strongest
genetic association with obesity
was discovered in this fto gene
fat and obesity-associated gene so
this fto gene was studied
ad nauseum people did tons of
experiments on
on it they figured out that fto is in
fact
an rna methylation transferase it
basically
it sort of impacts something that
we know that we call the
epitranscriptome just like the genome
can be modified
the transcriptome the transcript of the
genes can be modified
and we basically said oh great that
means that epitranscriptomics is
hugely involved in obesity because that
that gene fto is is you know uh clearly
where the genetic locus
is at my group studied
fto in collaboration with you know a
wonderful team
led by melina claussnitzer and what we
found
is that this fto locus even though it
is associated with obesity does not
implicate
the fto gene
the genetic variant sits in the first
intron of the fto gene
but it controls two genes irx3
and irx5 that are sitting 1.2
million nucleotides away several genes
away
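Why "it's near this gene" can mislead is easy to see in a toy example. The coordinates, gene names, and the `regulatory_links` table below are hypothetical (loosely echoing the FTO story, not real positions); in practice such links would have to come from measurements such as chromatin conformation and expression data.

```python
# Hypothetical gene spans (start, end) on one chromosome, in nucleotides.
gene_spans = {
    "HOST_GENE": (950_000, 1_100_000),  # the variant sits in this gene's first intron
    "DISTANT_GENE_1": (2_200_000, 2_210_000),
    "DISTANT_GENE_2": (2_300_000, 2_310_000),
}


def distance_to(pos, span):
    """Zero if the position lies inside the gene, else distance to the nearest end."""
    start, end = span
    if start <= pos <= end:
        return 0
    return min(abs(pos - start), abs(pos - end))


def nearest_gene(pos):
    return min(gene_spans, key=lambda name: distance_to(pos, gene_spans[name]))


variant_pos = 1_000_000
# Hypothetical measured long-range targets of the enhancer containing the variant.
regulatory_links = {variant_pos: ["DISTANT_GENE_1", "DISTANT_GENE_2"]}

naive_guess = nearest_gene(variant_pos)
measured_targets = regulatory_links[variant_pos]
print("nearest gene:", naive_guess)
print("measured targets:", measured_targets)
print("distance to targets:", [distance_to(variant_pos, gene_spans[g]) for g in measured_targets])
```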
oh boy uh what am i supposed to feel
about that because isn't that like super
complicated then
uh so so the way that i was introduced
at a conference a few years ago
was uh and here's manolis kellis who
wrote the most depressing paper
of 2015 and the reason for that is that
the entire pharmaceutical industry was
so comfortable
that there was a single gene in that
locus
because in some loci you basically have
three dozen genes that are all sitting
in the same region of association
and you're like gosh which ones of those
is it but even that question of which
ones of those
is it is making the assumption that it
is one of those
as opposed to some random gene just far
far away which is what our paper showed
so basically what our paper showed is
that you can't ignore the circuitry
you have to first figure out the
circuitry all of those long-range
interactions
how every genetic variant impacts the
expression of every gene
in every tissue imaginable across
hundreds of individuals
and then you now have one of the
building blocks not even all of the
building blocks
for then going and understanding disease
so okay so so embrace
the wholeness of the circuitry
correct but what
so back to the question of starting
knowing nothing
to the disease and and going to the
treatment so
what are the next steps so you basically
have to first figure out the tissue
and then describe how you figure out the
tissue you figure out the tissue by
taking all of these
non-coding variants that are sitting
outside proteins
and then figuring out what are the
epigenomic enrichments
and the reason for that you know
thankfully
is that there is convergence that
the same processes are impacted in
different ways
by different loci and that's
a saving grace for our field the fact
that
if i look at hundreds of genetic
variants associated with alzheimer's
they localize in a small number of
processes
can you clarify why that's helpful so
like they show up in the same exact way
in the in the specific set of processes
yeah so basically there's a small number
of biological processes
that underlie or at least that play the
the biggest role
in every disorder so in alzheimer's you
basically have
you know maybe 10 different types of
processes one of them is lipid
metabolism
one of them is immune cell function one
of them is
neuronal energetics so these are just a
small number of processes but you have
multiple lesions multiple genetic
perturbations that are associated with
those processes
so if you look at schizophrenia it's
excitatory neuron function it's
inhibitory neuron function it's synaptic
pruning it's calcium signaling and so
forth
so when you look at disease genetics
you have one hit here and one hit there
and one hit there and one hit there
completely different parts of the genome
but it turns out all of those
hits are calcium signaling proteins oh
cool
you're like aha that means that calcium
signaling is important
so those people who are focusing on one
doctors at a time cannot possibly
see that picture you have to become a
genomicist you have to
look at the omics the um the holistic
picture
to understand these enrichments but you
mentioned the convergence thing so the
the whatever the thing associated with
the disease
shows up so let me explain convergence
yeah convergence is such a beautiful
concept
so you basically have these four genes
that are converging on calcium signaling
so that basically means that they are
acting each in their own way
but together in the same process
but now in every one of these loci you
have
many enhancers controlling each of those
genes
that's another type of convergence where
dysregulation of seven different
enhancers
might all converge on dysregulation of
that one gene
which then converges on calcium
signaling
and in each one of those enhancers you
might have multiple genetic variants
distributed across many different people
everyone
has their own different mutation but all
of these mutations are impacting that
enhancer and all of these enhancers are
impacting that gene
and all of these genes are impacting
this pathway and all these pathways are
acting in the same tissue
and all these tissues are converging
together on the same biological process
of schizophrenia
and and you're saying the saving grace
is that that convergence seems to happen
for a lot of these diseases
for all of them basically that for every
single disease that we've looked at
we have found an epigenomic enrichment
how do you do that
you basically have all of the genetic
variants associated with the disorder
and then you're asking for all of the
enhancers active in a particular tissue
for 540 disorders we've basically found
that indeed
there is an enrichment that basically
means that there is commonality
and from the commonality we can just get
insights
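To make "an enrichment" concrete, the sketch below compares the fraction of disease-associated variants that fall in one tissue's enhancers with the fraction of all variants that do, and adds a one-sided hypergeometric tail probability. The helper name, the counts and the choice of test are illustrative assumptions, not the statistics used in the work being described.

```python
from math import comb

def enrichment(n_total, n_in_enhancers, n_disease, n_disease_in_enhancers):
    """Fold enrichment and one-sided hypergeometric p-value (hypothetical helper).

    n_total: all variants tested
    n_in_enhancers: variants overlapping enhancers active in the tissue
    n_disease: variants associated with the disorder
    n_disease_in_enhancers: disease variants overlapping those enhancers
    """
    expected = n_in_enhancers / n_total
    observed = n_disease_in_enhancers / n_disease
    fold = observed / expected
    # P(X >= observed count) if the disease variants had been drawn at random
    upper = min(n_disease, n_in_enhancers)
    tail = sum(
        comb(n_in_enhancers, k) * comb(n_total - n_in_enhancers, n_disease - k)
        for k in range(n_disease_in_enhancers, upper + 1)
    )
    return fold, tail / comb(n_total, n_disease)

# Made-up numbers: 5% of all variants sit in the tissue's enhancers,
# but 20 of 100 disease variants do.
fold, p = enrichment(10_000, 500, 100, 20)
print(f"fold enrichment {fold:.1f}, p = {p:.2e}")
```

The fold is observed over expected; the p-value asks how surprising 20 or more hits would be if the disease variants were scattered at random.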
so to explain in mathematical terms
we're basically
building an empirical prior
we're using a bayesian approach to
basically say great all of these
variants
are equally likely in a particular locus
to be important
and so in a genetic locus you
basically have
a dozen variants that are co-inherited
because the way that inheritance works
in the human genome is through all of
these recombination events
during meiosis you basically have
you know you inherit maybe
chromosome three for example in
your body it's inherited from four
different
parts one part comes from your dad
another part comes from your mom another
part comes from your dad and other part
comes from your mom so basically
the way that it i'm sorry from your
mom's mom
so you basically have one copy that
comes from your dad and one copy that
comes from your mom
but that copy that you got from your mom
is a mixture
of her maternal and her paternal
chromosome
and the copy that you got from your dad
is a mixture of his maternal and his
paternal chromosome
so these break points that happen when
chromosomes are lining up
are basically ensuring
through these crossover events they're
ensuring that every
uh child cell
during the process of meiosis where you
basically have
you know one spermatozoon that basically
couples with one ovum
to basically create one fertilized egg to basically
create the zygote
you basically have half of your genome
that comes from dad and half of your
genome that comes from mom
but in order to line them
up you basically have these crossover
events
these crossover events are basically
leading to
co-inheritance of that entire block
coming from your maternal
grandmother and that entire
block coming from your maternal
grandfather over many generations
these crossover events don't happen
randomly
there's a protein called prdm9 that
basically
guides the double-stranded breaks
and then leads to these crossovers
and that protein has a particular
preference to only a small number of hot
spots
of recombination which then lead to a
small number
of breaks between these co-inheritance
patterns
so even though there are six million
variants there aren't six million loci
you know this variation is
inherited in blocks
and every one of these blocks has like
two dozen genetic variants that are all
associated
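The collapse of millions of variants into far fewer co-inherited blocks can be sketched by cutting the chromosome at recombination hotspots. The positions are invented, and treating hotspots as sharp boundaries is a simplifying assumption; linkage disequilibrium in real data decays more gradually.

```python
from bisect import bisect_right
from collections import defaultdict

def assign_blocks(variant_positions, hotspots):
    """Group variant positions into co-inherited blocks separated by hotspots.

    hotspots: sorted positions of recombination hotspots.
    Block 0 holds variants before the first hotspot; block i holds variants
    at or after hotspot i-1 and before hotspot i.
    """
    blocks = defaultdict(list)
    for pos in sorted(variant_positions):
        blocks[bisect_right(hotspots, pos)].append(pos)
    return dict(blocks)

# Hypothetical positions (in kilobases) of five variants and two hotspots
print(assign_blocks([5, 12, 18, 25, 40], [10, 30]))
# {0: [5], 1: [12, 18, 25], 2: [40]}
```

All variants that land in the same block are inherited together, so association evidence alone cannot tell them apart.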
so in the case of fto it wasn't just one
variant
it was 89 common variants that were all
humongously associated with obesity
which one of those is the important one
well if you look at only one locus you
have no idea
but if you look at many loci you
basically say aha
all of them are enriching in the same
epigenomic map in that particular case
it was
mesenchymal stem cells so these are the
progenitor cells
that give rise to your brown fat
and your white fat progenitors are like
the early developmental precursors
so you start from one zygote and that's
a totipotent
cell type it can do anything you then
differ you know that cell divides
divides divides
and then every cell division is
leading to specialization where you now
have
a mesodermal lineage and ectodermal
lineage and endodermal lineage
that basically leads to different parts
of your body
the ectoderm will basically give rise to
your skin
ecto means outside derm is skin
so ectoderm but it also gives rise to
your neurons and your whole brain so
that's a lot
of ectoderm mesoderm gives rise to your
internal organs
including the vasculature and you know
your muscle and stuff like that
so you basically have this progressive
differentiation
and then if you look further further
down that lineage you basically have one
lineage that will give rise to both your
muscle
and your bone but also your fat
and if you go further down the lineage
of your fat
you basically have your white fat cells
these are the cells that store energy so
when you eat a lot but you don't
exercise too much there's an excess
set of calories a lot of excess energy
what you do with those
you basically create you spend a lot of
that energy to create these high-energy
molecules
lipids which you can then
burn when you need them on a rainy day
so that leads to obesity if you don't
exercise and if you overeat
because your body is like oh great i
have all these calories i'm going to
store them
more calories i'm going to store them
too oh more calories and
the you know 42 percent of european chromosomes
have a predisposition to storing fat
which was selected probably in
the you know food scarcity periods
like basically as we were exiting africa
you know before and during the ice ages
you know there was probably a selection
to those individuals who made it north
to basically be able to store energy you
know a lot more energy
so you basically now have this lineage
that is deciding whether you want to
store energy in your white fat
or burn energy in your beige fat
it turns out that your fat is you know
we
like we we have such a bad view of fat
fat is your best friend
fat can both store all these excess
lipids that would be otherwise
circulating through your
you know body and causing damage but it
can also burn calories directly
if you have too much of energy you can
just choose to just burn some of that as
heat
so basically when you're cold you're
burning energy
to basically warm your body up and
you're burning all these lipids and
you're burning all these fats
so what we basically found is that
across the board
genetic variants associated with obesity
across many of these regions were all
enriched
repeatedly in mesenchymal
stem cell enhancers so that gave us a
hint as to which of these genetic
variants
was likely driving this whole
association
and we ended up with this one genetic
variant
called rs1421085
and that genetic variant out of the 89
was the one that we predicted to be
causal for the disease
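The empirical-prior idea can be illustrated with a toy calculation. A minimal sketch, assuming per-variant Bayes factors are already available, that exactly one variant in the block is causal, and that the genome-wide tissue enrichment simply multiplies the prior of variants sitting in that tissue's enhancers; the function name and numbers are hypothetical, and real fine-mapping methods are considerably more involved.

```python
def fine_map(bayes_factors, in_enriched_enhancer, enrichment_fold):
    """Posterior probability that each variant in one block is causal.

    bayes_factors: association evidence per variant (nearly equal inside a block)
    in_enriched_enhancer: whether each variant overlaps an enhancer of the enriched tissue
    enrichment_fold: assumed prior boost for such variants
    Assumes exactly one causal variant in the block (posteriors sum to 1).
    """
    priors = [enrichment_fold if hit else 1.0 for hit in in_enriched_enhancer]
    weights = [prior * bf for prior, bf in zip(priors, bayes_factors)]
    total = sum(weights)
    return [w / total for w in weights]

# Four co-inherited variants with identical evidence; only the third
# overlaps an enhancer of the enriched tissue.
posterior = fine_map([10.0, 10.0, 10.0, 10.0], [False, False, True, False], 4.0)
print([round(p, 3) for p in posterior])  # [0.143, 0.143, 0.571, 0.143]
```

With a flat prior every variant would get 0.25; the tissue prior is what breaks the tie between variants that are inherited together.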
wow so going back to those steps first
step is figure out the relevant tissue
based on the global enrichment second
step is figure out the causal variant
among many variants in this linkage
disequilibrium in this co-inherited
block
between these recombination hotspots
these boundaries of these inherited
blocks
that's the second step the third step is
once you know that causal variant
try to figure out what is the motif that
is disrupted
by that causal variant basically how
does it act variants don't just disrupt
elements
they disrupt the binding of specific
regulators
so basically the third step there was
how do you find the motif
that is responsible like the gene
regulatory
word the building block of gene
regulation that is responsible
for that dysregulatory event and the
fourth step is finding out what
regulator normally binds that motif and
is now
no longer able to bind and then once you
have the regulator can you then try to
figure out how to
fix it
that's exactly right you now know how to
intervene you have basically
a regulator you have a gene that you can
then perturb and you say well maybe that
regulator
has a global role in obesity i can
perturb the regulator
just to clarify when we say perturb like
on the scale of a human life can a human
being be helped
of course of course yeah so i guess
understanding is the first step
no no but perturbed basically means you
now develop therapeutics pharmaceutical
therapeutics against that
or you develop other types of
intervention that affect the expression
of that gene
what do uh pharmaceutical therapeutics
look like
when your understanding is at a genetic
level
yeah sorry if it's a dumb question no no
it's a brilliant question but i want to
save it for a little bit later when we
start talking about therapeutics
perfect we've talked about the first
four steps there's two more
so basically the first step is figure
out i mean the zeroth step the starting
point is the genetics
the first step after that is figure out
the tissue of action
the second step is figuring out the
nucleotide
that is responsible or set of
nucleotides the third step is figure out
the motif
and the upstream regulator number four
number five and six
is what are the targets so number five
is great
now i know the regulator i know the
motif i know the tissue
and i know the variant what does it
actually do
so you have to now trace it to the
biological process
and the genes that mediate that
biological process
so knowing all of this can now allow you
to find the target genes
how by basically doing perturbation
experiments
or by looking at the folding of the
epigenome or by looking at the genetic
impact of that genetic variant on the
expression of genes
and we use all three so let me go
through them
basically one of them is physical links
this is the folding of the genome onto
itself
how do you even figure out the folding
it's a little bit of a tangent but it's
a super awesome technology
think of the genome as again this
massive packaging that we talked about
of taking two meters worth of dna and
putting it
in something that's a million times
smaller than 2 meters worth of dna
that's a single cell
you basically have this massive
packaging and this packaging basically
leads to
the chromosome being wrapped around in
sort of
tight ways in ways however that are
functionally
capable of being reopened and reclosed
so i can then go in and figure out that
folding
by sort of chopping up the spaghetti
soup
putting glue and ligating the segments
that were chopped up but nearby each
other
and then sequencing through these
ligation events to figure out
that this region of the chromosome and
that region of the chromosome were near
each other
that means they were interacting even
though they were far away on the genome
itself
so that chopping up sequencing and
re-gluing
is basically giving you folds
of the genome that we said
how does cutting it help you figure out
which ones were close in the original
folding so you have a
bowl of noodles go on
and in that bowl of noodles some some
noodles are
near each other yes so throw in a bunch
of glue
you basically freeze the noodles in
place throw in a cutter that chops up
the noodles into the little pieces
now throw in some ligation enzyme that
lets those pieces that were free
re-ligate near each other
in some cases they re-ligate what you
had just cut
but that's very rare most of the time
they will re-ligate
in whatever was proximal
you now have glued the red noodle that
was crossing the
blue noodle to each other you then
reverse the glue the glue goes away and
you just sequence the heck out of it
most of the time you'll find red segment
with
you know red segment but you can
specifically select for ligation events
that have happened
that were not from the same segment by
sort of marking a particular way
and then selecting those and then in your
sequencing you look for
red with blue matches of sort of things
that were glued
that were not immediately proximal to each
other
and that reveals the linking of the blue
noodle and the red noodle
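The noodle analogy describes chromosome conformation capture (Hi-C is its genome-wide form). A minimal sketch of the final counting step only, assuming each sequenced junction has already been mapped to a pair of (chromosome, bin) positions; the data layout and the `min_separation` cutoff are illustrative assumptions.

```python
from collections import Counter

def contact_counts(junctions, min_separation=2):
    """Count contacts between genomic bins from sequenced ligation junctions.

    junctions: iterable of ((chrom_a, bin_a), (chrom_b, bin_b)) pairs, the two
    pieces found glued together in one read. Same-chromosome pairs closer than
    min_separation bins are dropped as trivial re-ligation of neighbours.
    """
    counts = Counter()
    for a, b in junctions:
        (chrom_a, bin_a), (chrom_b, bin_b) = a, b
        if chrom_a == chrom_b and abs(bin_a - bin_b) < min_separation:
            continue  # neighbouring pieces: says nothing about folding
        counts[tuple(sorted((a, b)))] += 1  # order-independent key
    return counts

reads = [
    (("red", 1), ("red", 2)),   # adjacent pieces of the same noodle
    (("red", 5), ("blue", 3)),  # red noodle crossing the blue noodle
    (("blue", 3), ("red", 5)),  # same contact, read in the other order
    (("red", 1), ("red", 9)),   # distant parts of one noodle folded together
]
print(contact_counts(reads))
```

Summed over millions of reads, these counts form the contact map in which topologically associated domains show up as blocks of frequent contact.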
you're with me so far yeah good so we
you know we've done these experiments
physical
that's the physical that's step one of
the physical and what what the physical
revealed is topologically associated
domains basically big blocks of the
genome
that are you know topologically
connected together
that's the physical the second one is
the
genetic links it basically says
across individuals that have different
genetic
variants how are their genes expressed
differently
remember before i was saying that the
path between genetics and disease is
enormous
but we can break it up to look at the
path between genetics and
gene expression so instead of using
alzheimer's as a phenotype
i can now use expression of irx3 as the
phenotype
expression of gene a and i can look at
all of the humans who contain a g at
that location and all the humans who
contain a t
at that location and basically say wow
turns out that the expression of this
gene is higher
for the t humans than for the g humans
at that location
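The "genetic link" is what is usually called an expression quantitative trait locus (eQTL). A minimal sketch, assuming each person is coded by how many copies of the T allele they carry (0, 1 or 2, the usual additive coding) and using made-up expression values; a real analysis would add covariates and a significance test.

```python
from statistics import mean

def eqtl_slope(t_allele_counts, expression):
    """Least-squares slope of gene expression on T-allele count (0, 1 or 2).

    A positive slope means people carrying T express the gene more.
    """
    mx, my = mean(t_allele_counts), mean(expression)
    sxy = sum((x - mx) * (y - my) for x, y in zip(t_allele_counts, expression))
    sxx = sum((x - mx) ** 2 for x in t_allele_counts)
    return sxy / sxx

# Hypothetical expression of one gene in six people: GG, GG, GT, GT, TT, TT
counts = [0, 0, 1, 1, 2, 2]
expr = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
print(eqtl_slope(counts, expr))  # about 1.0 expression unit per T allele
```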
so that basically gives me a genetic
link between a genetic variant
a locus or region and the expression of
nearby genes good on the genetic link
i think so awesome so the third link is
the activity link
what's an activity link it basically
says if i look across 833 different
epigenomes
whenever this enhancer is active
this gene is active that gives me an
activity link
between this region of the dna and that
gene
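An activity link is a correlation across many samples. A minimal sketch using Pearson correlation on invented values for six epigenomes (the conversation mentions 833); the choice of Pearson correlation is an assumption, and other similarity measures could be used.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented activity of one enhancer and expression of one gene in six epigenomes
enhancer_activity = [0.1, 0.9, 0.2, 0.8, 0.0, 0.7]
gene_expression = [1.0, 5.2, 1.5, 4.8, 0.7, 4.1]
print(round(pearson(enhancer_activity, gene_expression), 3))
```

A coefficient near 1 means the enhancer and the gene tend to be on together and off together; on its own it says nothing about cause.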
and then the fourth one is perturbations
where i can go in and
you know blow up that region and see
what are the genes that change in
expression
or i can go in and over activate that
region and see what genes
change in expression uh so i guess
that's
similar to activity yeah yeah so that's
basically it's similar to activity i
agree but it's causal rather than
correlational
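What distinguishes the perturbation link is that the same cells are compared with and without the intervention. A minimal sketch of the readout as a per-gene log2 fold change, assuming expression has been measured before and after deleting an enhancer; the gene names, values and pseudocount are hypothetical, and a real screen would use replicates and a statistical test.

```python
from math import log2

def log2_fold_changes(control, perturbed, pseudocount=1.0):
    """Per-gene log2 fold change between perturbed and control cells.

    control, perturbed: dicts mapping gene name -> expression (e.g. read counts).
    The pseudocount avoids dividing by, or taking the log of, zero.
    """
    return {
        gene: log2((perturbed[gene] + pseudocount) / (control[gene] + pseudocount))
        for gene in control
    }

# Hypothetical counts before and after deleting one enhancer
control = {"GENE_A": 99.0, "GENE_B": 49.0, "GENE_C": 10.0}
enhancer_deleted = {"GENE_A": 24.0, "GENE_B": 49.0, "GENE_C": 10.0}
print(log2_fold_changes(control, enhancer_deleted))
# GENE_A drops to a quarter (log2 fold change -2.0); the others do not change
```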
again i'm a little weird like no no
you're 100 percent on
it's exactly the same but the
perturbation where i go and intervene
yes i basically take a bunch of cells so
you know crispr
right crispr is this genome guidance and
cutting
mechanism it's what george likes to
call
genome vandalism so you basically are
able to
one you can basically take a
guide rna that you put into the crispr
system and the crispr system will
basically use this guide rna scan the
genome
find wherever there's a match and then
cut the genome
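The "scan the genome, find a match, then cut" step can be sketched as a string search. This assumes the rules of the widely used Streptococcus pyogenes Cas9, which the conversation does not go into: a 20-nucleotide guide, a short NGG motif (the PAM) right after the target, and a cut about 3 base pairs upstream of it. The sequences are made up, and real guide design also scans the reverse strand and has to worry about near-matches elsewhere in the genome.

```python
def find_cut_sites(genome, guide, pam="NGG"):
    """Positions where a Cas9-style nuclease guided by `guide` would cut `genome`.

    Assumes SpCas9-like rules: the 20-nt target must be followed directly by an
    NGG PAM and the cut falls 3 bp upstream of the PAM. Only this strand is
    scanned and no mismatches are tolerated (simplifications).
    """
    genome = genome.upper()
    guide = guide.upper().replace("U", "T")  # RNA guide written as DNA letters
    sites = []
    for i in range(len(genome) - len(guide) - len(pam) + 1):
        target = genome[i:i + len(guide)]
        site_pam = genome[i + len(guide):i + len(guide) + len(pam)]
        pam_ok = all(p == "N" or p == b for p, b in zip(pam, site_pam))
        if target == guide and pam_ok:
            sites.append(i + len(guide) - 3)  # the cut lies just before this index
    return sites

guide = "GACGUUACCGGAUCAAGCUA"  # hypothetical 20-nt guide RNA
genome = "TTT" + "GACGTTACCGGATCAAGCTA" + "AGG" + "CCCC"
print(find_cut_sites(genome, guide))  # [20]
```

Retargeting means changing only the `guide` string, which is the point made later in the conversation about why typing in an RNA sequence beats engineering a new protein for every locus.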
so you know i digress but it's a
bacterial
immune defense system so basically
bacteria are constantly attacked by
viruses
but sometimes they win against the
viruses
and they chop up these viruses and
remember them as a trophy
inside their genome they have this locus
this crispr locus
that basically stands for clustered
regularly interspaced short palindromic repeats
so basically it's an
interspersed repeats
structure where basically you have a set
of repetitive regions
and then interspersed were these
variable segments
that were basically matching viruses so
when this was first discovered
it was basically hypothesized that this
is probably a bacterial immune system
that remembers
the trophies of the viruses that it managed
to kill
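The repeat-and-spacer layout, and the idea that a stored spacer lets the bacterium recognize a returning virus, can be sketched with plain string operations. The sequences are invented and much shorter than real ones, and the sketch ignores how spacers are acquired and how the guide RNAs are produced; it only illustrates the "memory" structure.

```python
def spacers(array, repeat):
    """Split a CRISPR array (repeat, spacer, repeat, spacer, ...) into its spacers."""
    return [piece for piece in array.split(repeat) if piece]

def recognizes(array, repeat, incoming):
    """True if any stored spacer (a 'trophy') occurs in an incoming sequence."""
    return any(s in incoming for s in spacers(array, repeat))

# Invented sequences: one repeat and two spacers from past infections
repeat = "GTTTTAGAGC"
array = repeat + "ACGTACGTAC" + repeat + "TTGGCCAATT" + repeat
returning_virus = "AAAA" + "TTGGCCAATT" + "CCCC"
new_virus = "GGGGCCCCGGGGCCCC"
print(spacers(array, repeat))  # ['ACGTACGTAC', 'TTGGCCAATT']
print(recognizes(array, repeat, returning_virus), recognizes(array, repeat, new_virus))
```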
and then the bacteria pass on you know
they sort of do lateral transfer of dna
and they pass on these memories so that
the next bacterium says oh you killed
that guy
when that guy shows up again i will
recognize him and the crispr system was
basically evolved
as a bacterial adaptive immune response
to sense foreigners that should not
belong
and to just go and cut their genome so
it's an rna guided
rna cutting enzyme or an rna guided dna
cutting
enzyme so there's different systems some
of them cut dna some of them cut
rna
but all of them remember this uh sort of
viral attack so what we have done now
as a field is you know through the work
of you know
uh jennifer doudna emmanuelle charpentier
feng zhang and many others
is co-opted that system
of bacterial immune defense as a way to
cut genomes
you basically have this
guiding system that allows you to use an
rna guide
to bring enzymes to cut dna at a
particular locus
that's so fascinating just so this is
like already a natural mechanism
a natural tool for cutting that was
useful
in this particular context yeah and we're
like well we can use that thing to
actually
it's a nice tool that's already in the
body yeah yeah it's not in our body it's
a bacterial body
it was discovered by the by the yogurt
industry
they were trying to make better yogurts
and they were trying to make their
bacteria
in their yogurt cultures more resilient
to viruses
and they were studying bacteria and they
found that wow this crispr system is
awesome
it allows you to defend against that and
then it was co-opted in mammalian
systems that don't use anything like
that
as a targeting way to
basically bring these dna cutting
enzymes
to any locus in the genome why would you
want to cut dna
to do anything the reason is that our
dna has a dna repair mechanism
where if a region of the genome gets
randomly cut you will basically scan the
genome for anything that matches
and sort of use it by homology
so the reason why we're diploid
because we now have a spare copy
as soon as my mom's copy is deactivated
i can use my dad's copy
and somewhere else if my dad's copy is
deactivated i can use my mom's copy
to repair it so this is called
homology
based repair so all you have to do is
the
the cutting and that's it you don't have
to do the fixing that's exactly right
you don't have to do the fixing
because it's already built in that's
exactly right but the fixing
can be co-opted by throwing in a bunch
of homologous
segments that instead of having your
dad's version
have whatever other version you'd like
to use
so the thing so you you then control the
fixing by throwing in a bunch of other
stuff exactly right
that's how you do genome editing so
that's what crispr is that's what's
wonderful in popular culture people use
the term i've never well that's
brilliant that's
a crisp explanation genome
vandalism
followed by a bunch of band-aids that
have the sequence that you'd like and
you can control the
the choices of band-aids correct yeah
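The "vandalism followed by band-aids" picture can be sketched as string surgery. This is a toy model, assuming repair simply copies whatever lies between two homology arms that match the sequence on either side of the break; the sequences are invented, the donor carries a single T-to-C change, and real repair in cells is much messier, often taking error-prone routes instead.

```python
def homology_directed_repair(sequence, cut, template, arm=6):
    """Toy repair of a double-strand break at index `cut` using `template`.

    The template's first and last `arm` bases (homology arms) must match the
    broken sequence before and after the cut; everything between the arms is
    copied from the template. The other parental copy restores the original;
    a donor carrying a chosen change writes that change in.
    """
    left_arm, right_arm = template[:arm], template[-arm:]
    start = sequence.rfind(left_arm, 0, cut)  # left arm must end before the cut
    end = sequence.find(right_arm, cut)       # right arm must start at or after the cut
    if start == -1 or end == -1:
        raise ValueError("template has no homology to both sides of the break")
    return sequence[:start] + template + sequence[end + arm:]

original = "CCACGTACGTTTGCAAGG"  # invented sequence; the break is before index 9
other_copy = "ACGTACGTTTGCAA"     # same region from the other parental chromosome
donor = "ACGTACGCTTGCAA"          # same arms, but T -> C at the edited position
print(homology_directed_repair(original, 9, other_copy))  # CCACGTACGTTTGCAAGG
print(homology_directed_repair(original, 9, donor))       # CCACGTACGCTTGCAAGG
```

Swapping the template is the whole trick: the cell's own repair machinery does the copying, and whoever supplies the template chooses what gets written back.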
and of course there's new generations of
crispr there's something that's called
prime editing that was
sort of very very much in the press
recently that basically instead of sort
of making a double stranded break
which again is genome vandalism you
basically make a single
stranded break you basically just nick
one of the two strands
enabling you to sort of peel off without
sort of completely breaking it up
and then repair it locally using a guide
that is coupled to your initial rna
that took you to that location dumb
question but
is crispr as awesome and cool as it
sounds
i mean technically speaking in terms of
like
as a tool for manipulating our genetics
in the positive uh meaning of the word
manipulating
or is there downsides drawbacks in this
whole context of therapeutics that we're
talking about yeah or understanding and
so
so so um when i teach my students about
crispr
i show them articles with the headline
genome editing tool revolutionizes
biology
and then i show them the date of these
two of these articles and they're 2004
like five years before crispr was
invented and the reason is that
they're not talking about crispr they're
talking about zinc finger enzymes
that are another way to bring these
cutters to the genome
it's a very difficult way of sort of
designing the right set of zinc finger
proteins the right set of
amino acids that will now target a
particular long stretch of dna
because you you know for every location
that you want to target you need to
design
a particular regulator a particular
protein
that will match that region well there's
another technology called talens
which are basically you know just a
different way of using
proteins to sort of you know guide these
cutters to a particular location of the
genome
these require a massive team of
engineers
of biological engineers to basically
design a set of amino acids
that will target a particular sequence
of your genome
the reason why crispr is amazingly
awesomely revolutionary
is because instead of having this team
of engineers
design a new set of proteins for every
locus that you want to target
you just type it in your computer and
you just synthesize an rna guide
the beauty of crispr is not the cutting
it's not the fixing
all of that was there before it's the
guiding
and the only thing that changed is that it
makes the guiding easier
by sort of you know just typing in the
rna sequence
which then allows the system to sort of
scan the dna to find that
so the coding the the engineering of the
cutter is
easier on the uh in terms of
that's kind of similar to the story of
deep learning versus uh
old school machine learning some of the
some of the challenging parts are
automated
okay so uh but crispr is just one
cutting technology exactly and
then there's that's part of the
challenges
and exciting opportunities of the field
is to design
different cutting technologies yeah yeah
so now um we
you know this was a big parenthesis on
crispr but now
you you know when we were talking about
perturbations you basically now have the
ability to not just look at correlation
between enhancers
and genes but actually go and either
destroy that enhancer and see if the
gene changes in expression
or you can use the crispr targeting
system
to bring in not vandalism and cutting
but you can couple the crispr system
with and the crispr system is called
usually crispr cas9 because cas9 is
the protein that will then come and cut
but there's a version of that protein
called dead cas9 where the cutting
part is deactivated
so you basically use d cas9 dead cas9
to bring in an activator or to bring in
a repressor so you can now ask is this
enhancer changing that gene
by taking this modified crispr
which is already modified from the
bacteria to be used in humans that you
can now modify the cas9 to be dead
cas9 and you can now further modify it
to bring in a regulator
and you can basically turn on or turn
off that enhancer and then see what is
the impact on that gene
so these are the four ways of linking
the locus
to the target gene and that's step
number five
okay step number five is find the target
gene and step number six is
what the heck does that gene do you
basically now go and manipulate that
gene
to basically see what are the processes
that change
and you can basically ask well you know
in this particular case in the fto locus
we found mesenchymal stem cells that are
the progenitors of white fat
and brown fat or beige fat
we found the rs1421085 nucleotide
variant as the
causal variant we found this
large enhancer this master regulator i
like to call it ob1
for uh obesity one like the strongest
enhancer associated with whatever
and ob1 was kind of chubby as the actor
i don't know if you remember him
[Laughter]
yeah so you basically are using this
jedi mind trick to basically find out
the uh
thank you the location of the genome
that is responsible
the enhancer that harbors it the motif
the upstream regulator which is arid5b
for at-rich interaction domain 5b
that's a protein that sort of comes and
binds normally that protein is normally
a repressor
it represses this super enhancer this
massive 12,000 nucleotide
master regulatory control region and it
turns
off irx3 which is a gene that's 600,000
nucleotides away
and irx5 which is 1.2 million
nucleotides away
so those are what's the effect of
turning them off that's exactly
the next question so step six is what do
these genes actually do
so we then asked what do irx3 and irx5 do
the first thing we did is look across
individuals for individuals that had
higher expression of irx3 or lower
expression of irx3
and then we looked at the expression of
all of the other genes in the genome
and we looked simply for correlation and
we found that irx3 and irx5 were both
correlated
positively with lipid metabolism
and negatively with mitochondrial
biogenesis you're like what the heck
does that mean
it doesn't sound related to obesity not
at all superficially
but lipid metabolism should because
lipids
are these high energy molecules that
basically store
fat so irx3 and irx5 are
positively correlated with lipid
metabolism so that basically means that
when they turn on they turn
on lipid metabolism and they're
negatively correlated with
mitochondrial biogenesis what do
mitochondria do
in this whole process again small
parenthesis what are mitochondria
mitochondria are little organelles
they arose they only are found in
eukaryotes eu means good karyo
means nucleus so truly like a true
nucleus so eukaryotes have a nucleus
prokaryotes are before the nucleus they
don't have a nucleus
so eukaryotes have a nucleus
compartmentalization
eukaryotes have also organelles
some eukaryotes have chloroplasts
these are the plants they
photosynthesize
some other eukaryotes like us have
another type of organelle
called mitochondria these
arose from an ancient species
that we engulfed this is an
endosymbiosis
event symbiosis bio means life sym
means together so symbiotes are things
that live together
endosymbiosis endo means inside so
endosymbiosis means you live together
holding the other one inside you so
the pre-eukaryotes engulfed
an organism that was very good at energy
production
and that organism eventually shed most
of its genome
to now have only 13 genes in the
mitochondrial genome
and those 13 genes are all involved in
energy production
the electron transport chain so
basically
electrons are these massive super energy
rich molecules
we basically have these organelles that
produce
energy and when your muscle exercises
you basically multiply your mitochondria
you basically sort of
you know use more and more mitochondria
and that's how you get beefed up so
basically the muscle sort of
learns how to generate more energy
so basically every single time your
muscles will you know overnight
regenerate and sort of become stronger
and amplify their mitochondria and so
forth
so what do mitochondria do
you use energy to sort of do any kind of
task
when you're thinking you're using energy
this energy comes from mitochondria
your neurons have mitochondria all over
the place basically this mitochondria
can multiply as organelles and they can
be spread along the body of your muscle
some of your muscle cells have actually
multiple nuclei they're polynucleated
but they also have multiple mitochondria
to basically uh
deal with the fact that your muscle is
enormous you can sort of span this super
super long length
and you need energy throughout the
length of your muscle so that's why you
have mitochondria throughout the length
and you also need transcription through
the length so you have multiple nuclei
as well
so these two processes
lipids store energy what do mitochondria
do so there's a process known as
thermogenesis thermo means heat
genesis means generation thermogenesis is
generation of heat
remember that bathtub with
in and out that's the equation that
everybody's focused on
so how much energy do you consume how
much energy you burn
but in every thermodynamic system
there's
three parts to the equation there's
energy in
energy out and energy lost
any machine has loss
of energy how do you lose energy you
emanate heat
so heat is energy loss so
this is where thermogenesis
comes in thermogenesis
is actually a regulatory process that
modulates the third component of the
thermodynamic equation
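Written as a balance with the third term made explicit, the bathtub picture reads as follows; the symbols are introduced here for illustration and are not from the conversation. Thermogenesis acts on the heat term, so for the same intake and the same activity, more dissipation as heat leaves less energy to be stored.

```latex
\Delta E_{\text{stored}} = E_{\text{in}} - E_{\text{out}} - E_{\text{heat}}
```

Here $E_{\text{in}}$ is energy consumed, $E_{\text{out}}$ is energy burned doing work, and $E_{\text{heat}}$ is energy lost as heat.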
you can basically control thermogenesis
explicitly
you can turn on and turn off
thermogenesis and that's where the
mitochondria come in exactly
so irx3 and irx5 turn out to be the
master regulators
of a process of thermogenesis versus
lipogenesis generation of fat so irx3
and irx5 in most people
burn heat burn burn calories as heat so
when you eat too much
just burn it burn it off in your in your
fat cells so if that bathtub
has basically a sort of dissipation
knob that most people are able to turn
on
i am unable to turn that on because i am
a homozygous carrier
for the mutation that changes a t into a
c
in the rs1421085 allele a locus
a snip i have the risk allele twice from
my mom and from my dad
so i'm unable to thermogenize
i'm unable to turn on thermogenesis
through irx3 and irx5
because the regulator that normally
binds here arid5b can no longer bind
because it's an at-rich interaction
domain and as soon as i change the t
into a c it can no longer bind because
it's no longer at rich
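A crude way to see why a single T-to-C change can matter for a protein that recognizes AT-rich DNA is to score a site before and after the change. This sketch uses AT content and the longest uninterrupted A/T stretch as stand-ins for binding, which is an assumption; real analyses score sites against a position weight matrix measured for the regulator, and the sequence below is invented rather than the actual sequence around rs1421085.

```python
def at_fraction(seq):
    """Fraction of bases in `seq` that are A or T."""
    seq = seq.upper()
    return sum(base in "AT" for base in seq) / len(seq)

def longest_at_run(seq):
    """Length of the longest uninterrupted stretch of A/T bases."""
    best = run = 0
    for base in seq.upper():
        run = run + 1 if base in "AT" else 0
        best = max(best, run)
    return best

def substitute(seq, position, new_base):
    """Return `seq` with the base at 0-based `position` replaced by `new_base`."""
    return seq[:position] + new_base + seq[position + 1:]

site = "AATTAATA"                     # invented site, NOT the real rs1421085 context
risk_site = substitute(site, 3, "C")  # one T -> C change
for name, s in (("non-risk", site), ("risk", risk_site)):
    print(name, s, at_fraction(s), longest_at_run(s))
```

One letter drops the AT content only from 1.0 to 0.875, but it cuts the uninterrupted AT stretch in half, from 8 to 4 bases.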
but doesn't that mean that you're able
to use the energy more efficiently
you're not generating heat or is it that
means that i can eat less
and get around just fine yes yeah so
that's a feature actually it's a feature
in a food scarce environment
yeah but if we're all starving i'm doing
great if we all have access to massive
amounts of food
i'm i'm obese basically that's taken us
through the entire process of then
understanding why mitochondria and
the lipids are both
you know distinct or somehow different sides of
the same coin
and you basically choose to store energy
or you can choose to burn energy
and then all of that is involved in the
puzzle of obesity
and that's what's fascinating right here
we are in 2007
discovering the strongest genetic
association with obesity
and knowing nothing about how it works
for almost 10 years
for 10 years everybody focused on this
fto gene
and they were like oh it must have to do
something with you know
rna modification and it's like no it has
nothing to do with the function of fto
it has everything to do with all of this
other process
and suddenly the moment you solve that
puzzle which is a multi-year effort by
the way and tremendous effort by melina
and many many others
so this tremendous effort basically led
us to recognize
this circuitry you went from having some
89 common variants associated in that
region of the dna
sitting on top of this gene to knowing
the whole circuitry
when you know the circuitry you can now
go crazy you can now start
intervening at every level you can start
intervening at the arid
5b level you can start intervening with
crispr cas9 at the single
snip level you can start intervening at
irx3 and irx5
directly there you can start intervening
at the thermogenesis level because you
know the pathway
you can start intervening at the
differentiation level
where these the decision to make
either white fat or beige fat the energy
burning beige fat
is made developmentally in the first
three days of differentiation of your
adipocytes so
as they're differentiating you basically
can choose to make fat burning machines
or fat storing machines and sort of
that's how you populate your your fat
you basically can now go in
pharmaceutically and do all of that
and in our paper we actually did all of
that
we went in and manipulated every single
aspect at the nucleotide level
we used crispr cas9 genome editing to
basically take
primary adipocytes from risk and
non-risk individuals
and show that by editing that one
nucleotide out of 3.2 billion
nucleotides in the human genome
you could then flip between an obese
phenotype and a lean phenotype like a
switch
you can basically take adipocytes that
are non-thermogenizing
and just flip them to thermogenizing
cells by changing one nucleotide
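The chain of steps walked through above can be collected into one record per locus. A minimal sketch, in which the class and field names are hypothetical and the values are the ones given in the conversation for the FTO locus; it is bookkeeping, not an analysis method.

```python
from dataclasses import dataclass, fields

@dataclass
class LocusDissection:
    """Hypothetical record of the steps used to dissect one disease locus."""
    genetic_association: str  # step 0: the starting genetic signal
    tissue: str               # step 1: tissue of action from global enrichment
    causal_variant: str       # step 2: causal nucleotide within the co-inherited block
    motif: str                # step 3: regulatory motif disrupted by the variant
    upstream_regulator: str   # step 4: regulator that can no longer bind
    target_genes: tuple       # step 5: genes controlled by the enhancer
    process: str              # step 6: what the target genes do

fto = LocusDissection(
    genetic_association="89 common variants near FTO associated with obesity",
    tissue="mesenchymal stem cells (progenitors of white and beige fat)",
    causal_variant="rs1421085, T -> C",
    motif="AT-rich motif",
    upstream_regulator="ARID5B",
    target_genes=("IRX3", "IRX5"),
    process="thermogenesis versus lipid storage",
)

for f in fields(fto):
    print(f"{f.name}: {getattr(fto, f.name)}")
```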
it's mind-boggling it's so inspiring
that this puzzle could be solved in this
way and it feels
within reach to then be able to
crack the problem with some of these
diseases what are
so 2007 you mentioned 2000 what are the
technologies
the tools that came along that made this
possible like what
what are you excited about maybe if we
just look at the buffet of things that
you've kind of mentioned is there is
this what's involved
what should we be excited about what are
you excited about
i love that question because there's so
much ahead of us there's so
so much um there's
uh so so basically solving that one
locus
required massive amounts of knowledge
that we have been building across the
years
through the epigenome through the
comparative genomics to find out the
causal variant and the
control the controller regulatory motif
through the conserved circuitry
it required knowing this regulatory
genomic wiring
it required hi-c of these sort of
topologically associated domains to
basically find this long-range
interaction
it required eqtls of this sort of
genetic perturbation of these
intermediate
gene phenotypes it required all of the
arsenal of tools that i've been
describing
was put together for one locus and this
was a massive team effort
huge you know investment in time
energy money effort intellectual you
know everything
you're referring to i'm sorry this one
basically yeah this one piece
this one single paper at least one
single locus i like to say that this is
a paper about one nucleotide in the
human genome about one bit of
information
c versus t in the human genome that's
one bit of information and we have 3.2
billion
nucleotides to go through so how do you
do that systematically
i am so excited about the next phase of
research
because the technologies that my group
and many other groups have developed
allow us to now do this systematically
not just one
locus at a time but thousands of loci
at a time so let me describe some of
these technologies
the first one is automation and robotics
so basically you know we talked about
how you can take
all of these molecules and see which of
these molecules are targeting each of
these genes and what do they do
so you can basically now screen through
millions of molecules
through thousands and thousands and
thousands of plates each of which has
thousands and thousands and thousands of
molecules
every single time testing
all of these genes and asking
which of these molecules perturb these
genes so that's technology number one
automation and robotics technology
number two
is parallel readouts so instead of
perturbing one locus
and then asking if i use crispr-cas9
on this enhancer
to basically use dcas9 to turn on or
turn off the enhancer
or if i use crispr-cas9 on the snp
to basically change
that one snp at a time then what
happens but we have
120 000 disease-associated snps that we
want to test
and we don't want to spend 120 000
years doing it
so what do we do we've basically
developed this technology
for massively parallel reporter
assays mpra in collaboration with
tarjei mikkelsen mary flanders i mean
jay shendure's group has done a lot of
that so there are
a lot of groups that basically
have developed technologies
for testing 10 000 genetic variants at a
time
how do you do that you know we
talked about
microarray technology the ability to
synthesize
these huge microarrays that allow you to
do all kinds of things like measure gene
expression by hybridization
or measure the genotype of a person by
looking at hybridization with one
version with a t versus the other
version with a c and then sort of
figuring out that i am
a risk carrier for obesity based on
this differential hybridization in my
genome
that says oh you seem to only have this
allele or you seem to have that allele
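As a rough illustration of the differential-hybridization readout described above, the toy caller below compares the signal on two allele-specific probes at one position. It is a sketch under stated assumptions, not how any particular array platform works: it assumes a single intensity per allele, and the low/high cutoffs are invented for illustration (production genotyping typically clusters many samples rather than applying fixed cutoffs).

```python
def call_genotype(t_intensity: float, c_intensity: float,
                  low: float = 0.2, high: float = 0.8) -> str:
    """Call a genotype from the signal on a "t" probe and a "c" probe.

    Assumption: the share of total signal from the "c" probe sits near 0 for
    t/t, near 0.5 for t/c and near 1 for c/c. The cutoffs are illustrative.
    """
    total = t_intensity + c_intensity
    if total <= 0:
        return "no call"  # nothing hybridized at this spot
    c_fraction = c_intensity / total
    if c_fraction < low:
        return "t/t"  # only the t version hybridized
    if c_fraction > high:
        return "c/c"  # only the c version hybridized
    return "t/c"      # both versions hybridized: heterozygous


print(call_genotype(950.0, 40.0))
print(call_genotype(500.0, 480.0))
print(call_genotype(30.0, 900.0))
```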
microarrays can also be used to
systematically synthesize
small fragments of dna so you can
basically synthesize these 150
nucleotide long fragments
across 450 000 spots at a time
you can now take the result of that
synthesis
which basically works through all of
these sort of layers of adding one
nucleotide at a time
you can basically just type it into your
computer and order it
and you can basically order 10
000 or 100 000 of these small
dna segments at a time and that's where
awesome molecular biology comes in
you can basically take all these
segments and give them a common start and
end sequence sort of like adapters just like
pieces of a puzzle
you can make the same end piece and the
same start piece
for all of them and you can now use
plasmids which are these extra
chromosomal
small dna circular segments
that are basically inhabiting all our
genomes we basically have
you know plasmids floating around i mean
bacteria use plasmids
for transferring dna and that's where
they put a lot of antibiotic resistance
genes so they can easily transfer them
from one bacterium to the other
so if one bacterium evolves a gene to be
resistant to
a particular antibiotic it basically
says to all its friends hey
here's that sort of dna piece we can now
co-opt
these plasmids into human cells you can
basically
make a human cell culture and add
plasmids
to that human cell culture that contain
the things that you want to test you now
have this library of 450 000 elements
you can insert them each into the common
plasmid yeah
and then test them in millions of cells
in parallel
and the common plasmid is all the same
before you add it exactly the rest of
the plasmid is the same
so it's called an episomal reporter
assay
episome means not inside the genome it's
sort of outside
the chromosomes so it's an episomal
assay
that allows you to have a variable
region where you basically test
10 000 different enhancers and you have
a common region which basically has the
same reporter gene
you can do some very cool
molecular biology you can basically take
the 450 000
elements that you've generated and
you have a piece of the puzzle here and a
piece of the puzzle here which are
identical so they're
compatible with that plasmid you can
chop them up in the middle
to separate a barcode reporter from the
enhancer
and in the middle put the same gene
again using the same pieces of the
puzzle
you now can have a barcode readout
of what is the impact of 10 000
different versions of an enhancer
on gene expression so we're not doing
one experiment we're doing 10 000
experiments
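The barcode readout can be turned into a per-element activity score by comparing how often each element's barcode appears in the RNA versus in the input plasmid DNA. The sketch below assumes barcode counts have already been tallied per element; a log2 RNA/DNA ratio with a pseudocount is a common convention for reporter assays, but the function names, the pseudocount and the counts are illustrative, not taken from the group's pipeline.

```python
from __future__ import annotations

import math


def normalize(counts: dict[str, int]) -> dict[str, float]:
    """Scale raw barcode counts to counts per million so libraries of different depth compare."""
    total = sum(counts.values())
    return {name: 1e6 * c / total for name, c in counts.items()}


def mpra_activity(rna_counts: dict[str, int], dna_counts: dict[str, int],
                  pseudocount: float = 1.0) -> dict[str, float]:
    """Log2 ratio of RNA barcode share to plasmid DNA barcode share per element.

    Above 0: the element's share of RNA barcodes exceeds its share of plasmid
    barcodes, i.e. it drove relatively more transcription than the average
    element in the pool. Below 0: relatively less.
    """
    rna = normalize(rna_counts)
    dna = normalize(dna_counts)
    return {name: math.log2((rna.get(name, 0.0) + pseudocount) / (dna[name] + pseudocount))
            for name in dna}


# Made-up counts: both elements are equally represented in the plasmid pool,
# but enh1's barcodes are three times more common than enh2's in the RNA.
activity = mpra_activity({"enh1": 300, "enh2": 100}, {"enh1": 100, "enh2": 100})
print({name: round(score, 2) for name, score in activity.items()})
```

Comparing the risk and non-risk versions of the same locus then reduces to the difference between their two scores.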
and those ten thousand can be five
thousand
different loci and each of them in
two versions
risk or non-risk
so you can now test
tons of these little hypotheses
exactly
and then you can do ten thousand and we
can test ten thousand hypotheses at once
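The library arithmetic (5,000 loci, each in a risk and a non-risk version, gives 10,000 oligos) can be sketched as below. Everything here is hypothetical: the adapter sequences, the 15-nt adapter length and the choice to center the variant in a 120-nt genomic window are assumptions made only so that each oligo comes out at the 150-nucleotide length mentioned earlier.

```python
from __future__ import annotations

import random
from dataclasses import dataclass

# Hypothetical shared "puzzle pieces" on every oligo; real assays use their own adapters.
ADAPTER_5 = "ACTGGCCGCTTGACG"  # 15 nt, illustrative only
ADAPTER_3 = "CACTGCGGCTCCTGC"  # 15 nt, illustrative only
OLIGO_LEN = 150
INSERT_LEN = OLIGO_LEN - len(ADAPTER_5) - len(ADAPTER_3)  # 120 nt of genomic sequence


@dataclass
class Locus:
    name: str
    sequence: str  # genomic context around the variant
    offset: int    # index of the variant within `sequence`
    risk: str      # risk allele, e.g. "C"
    nonrisk: str   # non-risk allele, e.g. "T"


def design_pair(locus: Locus) -> list[tuple[str, str]]:
    """Two oligos for one locus that differ only at the variant position."""
    start = locus.offset - INSERT_LEN // 2
    if start < 0 or start + INSERT_LEN > len(locus.sequence):
        raise ValueError(f"{locus.name}: not enough flanking sequence")
    window = locus.sequence[start:start + INSERT_LEN]
    center = locus.offset - start  # variant position inside the window
    pair = []
    for label, allele in (("risk", locus.risk), ("nonrisk", locus.nonrisk)):
        insert = window[:center] + allele + window[center + 1:]
        pair.append((f"{locus.name}_{label}", ADAPTER_5 + insert + ADAPTER_3))
    return pair


def design_library(loci: list[Locus]) -> list[tuple[str, str]]:
    """Flatten the pairs into one synthesis order: two oligos per locus."""
    return [oligo for locus in loci for oligo in design_pair(locus)]


# Usage with random stand-in sequences for 5,000 hypothetical loci.
random.seed(0)
loci = [Locus(f"locus{i}", "".join(random.choices("ACGT", k=200)), 100, "C", "T")
        for i in range(5000)]
library = design_library(loci)
print(len(library), len(library[0][1]))
```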
how hard is it to generate those
ten thousand is it trivial or is it
biology
no no generating the ten thousand is
trivial because it's all done by
technology you basically have
these arrays
that add one nucleotide at a time
at every spot
oh so it's printing it so
you're able to control it
yeah
is it super
costly
is it ten thousand bucks so this isn't
millions
a thousand bucks for ten thousand
experiments sounds about right
i mean that's super exciting
because you don't have to do one thing
at a time
yeah
you can now use that technology these
massively parallel reporter assays to
test
10 000 locations at a time we've
made multiple modifications of that
technology one was
sharpr-mpra which is about
you know basically getting
a higher resolution view by tiling
these elements
so you can see where along the
region of control they are acting
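The tiling idea can be illustrated with a simplified stand-in: cut a region into overlapping tiles, measure one activity per tile, and score each nucleotide by the average activity of the tiles covering it. This is only a sketch for intuition, assuming one measured activity per tile; it is not the inference procedure used in the published tiling assays, and the region length, tile length and step are arbitrary.

```python
from __future__ import annotations


def make_tiles(region_len: int, tile_len: int, step: int) -> list[tuple[int, int]]:
    """Half-open [start, end) windows of length tile_len, shifted by step."""
    return [(s, s + tile_len) for s in range(0, region_len - tile_len + 1, step)]


def per_base_score(region_len: int, tiles: list[tuple[int, int]],
                   activity: list[float]) -> list[float]:
    """Average the activity of every tile that covers each position."""
    total = [0.0] * region_len
    cover = [0] * region_len
    for (start, end), act in zip(tiles, activity):
        for pos in range(start, end):
            total[pos] += act
            cover[pos] += 1
    return [t / c if c else 0.0 for t, c in zip(total, cover)]


# Toy example: a 300-nt region where only tiles containing position 150 are active.
tiles = make_tiles(300, tile_len=100, step=10)
activity = [1.0 if start <= 150 < end else 0.0 for start, end in tiles]
scores = per_base_score(300, tiles, activity)
best = max(range(300), key=scores.__getitem__)
print(best, scores[best])
```

In this toy run the top score is shared by positions 150 through 159, so the driver is pinned down only to within one step (10 nt); smaller steps between tiles buy finer resolution at the cost of more tiles to synthesize and test.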
and we made another modification called
hidra for
high-resolution dissection of regulatory
activity which
basically allows you to
test seven million of these at a time by
sort of cutting them directly from the
dna
so instead of synthesizing which
basically has the limit of 450 000 that
you can synthesize at a time
we basically said hey if we want to test
all accessible regions of the genome
let's just do an experiment that cuts
accessible regions
let's take those accessible regions give
them all the same end
pieces of the puzzle and then use
those to create
a much much larger
array of things that you can test and
then by tiling all of these regions you can
then pinpoint what are the driver
nucleotides
what are the elements how are they
acting across seven million experiments
at a time so basically
this is all the same family of
technology where you're basically using
these
parallel readouts of the barcodes
and then you know to do this we used a
technology called starr-seq for
self-transcribing active regulatory region
sequencing a
technology developed by alex stark
my former postdoc who's now a pi over
in vienna so we basically coupled
starr-seq the self-transcribing
reporters where the enhancer
can be part of the gene itself so
instead of having a separate barcode
that enhancer basically acts to turn on
the gene and it's transcribed
as part of the gene
so you don't have to
have the two separate parts
exactly so
you can just read them
so there's constant improvement in
this whole process
yes
by the way generating all these options
is it basically brute force
how much human intuition is there
oh gosh
of course it's human intuition
and human creativity and incorporating
all of the
input data sets because again the
genome is enormous
3.2 billion you don't want to test that
instead you basically use all of these
tools that i've
talked about already you generate your
top favorite
10 000 hypotheses and then you go and
test all ten thousand and then from
what comes out you can then go to the
next step
so that's technology number two so
technology number one is robotics and
automation where you have thousands of
wells and you constantly test them
the second technology is instead of
having wells you have these massively
parallel readouts
in sort of these pooled assays the third
technology
is coupling crispr perturbations
with these single-cell rna
readouts so let me make another
parenthesis here
to describe now single-cell rna
sequencing
okay so what does single-cell rna
sequencing mean so
rna sequencing is what has been
traditionally used well traditionally
in the last 20 years
ever since the advent of next generation
sequencing so basically before that
rna expression profiling was based on
these microarrays
the next technology after that was based
on sequencing so you chop up your rna
and you just sequence small molecules
just like you
would sequence the genome basically
reverse transcribe the small rnas
into dna and you sequence that dna
in order to get the number of sequencing
reads
corresponding to the expression level of
every gene in the genome
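The step from sequencing reads to expression levels amounts to counting. The sketch below assumes each read has already been aligned and assigned to a single gene, and reports counts per million (CPM) so that samples sequenced to different depths can be compared; the gene names echo the conversation but the read counts are made up. CPM ignores gene length, which matters when comparing different genes within one sample.

```python
from __future__ import annotations

from collections import Counter


def expression_from_reads(read_genes: list[str]) -> dict[str, float]:
    """Turn per-read gene assignments into counts per million (CPM).

    Assumption: every read in the list has already been aligned and assigned
    to exactly one gene; unassigned reads are simply left out.
    """
    counts = Counter(read_genes)
    total = sum(counts.values())
    return {gene: 1e6 * n / total for gene, n in counts.items()}


reads = ["IRX3"] * 30 + ["IRX5"] * 10 + ["ACTB"] * 960  # 1,000 made-up reads
cpm = expression_from_reads(reads)
print(cpm["IRX3"], cpm["IRX5"], cpm["ACTB"])
```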
you now have rna sequencing how do you
go to single-cell rna sequencing
that technology also went through stages
of evolution
the first was microfluidics you
basically had
these wells or even chambers you
basically had these ways of isolating
individual cells
putting them into a well for every one
of these cells
so you have 384-well plates and you
do 384
parallel reactions to measure the
expression of 384 cells
that sounds amazing and it was amazing
but we want to do
a million cells how do you go from
these wells to a million cells you
can't
so the next technology
after that
is instead of using a well for every
reaction you now
use a droplet for every reaction
so you use microdroplets as reaction
chambers
to basically amplify rna
so here's the idea you basically have
microfluidics where you basically have
every single cell
coming down one tube in your
microfluidics and you have little
bubbles getting created the other way
with specific primers that mark every
cell with its own barcode
you basically couple the two and you end
up with little bubbles that have
a cell and tons of markers for that cell
you now mark up all of the rna for that
one cell with the same exact barcode
and you then lyse all of the droplets
and you sequence the heck out of that
and you have for every rna molecule
a unique identifier that tells you what
cell it was in
that is such good
engineering
microfluidics and
using some kind of primer to put
a label on the thing
i mean you're making it sound
easy
it's beautiful right
challenging but it's gorgeous yeah
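The droplet bookkeeping can be sketched as a grouping step. This assumes each read has already been parsed into a cell barcode, a per-molecule tag (the unique molecular identifier, or UMI, that many droplet protocols add) and a gene assignment; the barcodes and genes below are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def count_matrix(reads: list[tuple[str, str, str]]) -> dict[str, dict[str, int]]:
    """Build per-cell gene counts from (cell_barcode, umi, gene) triples.

    Reads sharing a cell barcode came from the same droplet, hence the same cell.
    Reads that also share a UMI are PCR copies of one molecule, so each
    (cell, umi, gene) combination is counted once.
    """
    molecules = set(reads)  # collapse PCR duplicates
    matrix = defaultdict(lambda: defaultdict(int))
    for cell, _umi, gene in molecules:
        matrix[cell][gene] += 1
    return {cell: dict(genes) for cell, genes in matrix.items()}


reads = [
    ("AAAC", "u1", "GAPDH"), ("AAAC", "u1", "GAPDH"),  # two copies of one molecule
    ("AAAC", "u2", "GAPDH"),
    ("TTTG", "u1", "MBP"),
]
print(count_matrix(reads))
```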
so there's the next generation
engineering
yeah so that's the second
generation
the next generation is forget the
microfluidics altogether just use big
bottles
how can you possibly do that with big
bottles so here's the idea
you dissociate all of your cells or all
of your nuclei
from complex cells like brain cells that
you know are very long and sticky so you
can't do that
so you know if you have blood cells or
if you have you know neuronal nuclei or
brain nuclei
you can basically dissociate let's say a
million cells
you now want to add a unique barcode
in each one of a million
cells
using only big bottles
i can't possibly do that
sounds crazy
but here's the idea you use a hundred of
these bottles
you randomly shuffle all your million
cells
and you throw them into the hundred
bottles completely at random
you add one barcode out of 100 to every
one of those cells
you now take them all out you
shuffle them again
and you throw them again into the same
hundred bottles
but now in a different randomization
and you add a second barcode so every
cell now has
two barcodes you take them out again
you shuffle them and you throw them back
in and add a
third barcode randomly from
the same hundred barcodes
you've now labeled every cell
probabilistically
based on the unique path that it took of
which of a hundred bottles it went to the
first time
which of 100 bottles the second time and
which of 100 bottles the third time
a hundred times 100 times 100 is a
million unique barcode combinations
across all of these cells
without ever using microfluidics
very clever that's beautiful
right
from a computer science perspective that's very clever
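The arithmetic behind "probabilistically" is worth spelling out: 100 × 100 × 100 gives a million possible paths, but cells pick paths at random, so some cells end up sharing one. The sketch below models each round as an independent, uniformly random choice of bottle (an assumption; it ignores how evenly cells are actually split) and also computes the chance that a given cell's path is not repeated, (1 - 1/M)^(n - 1) for M combinations and n cells. The bottle count and number of rounds come from the description above; the function names are illustrative.

```python
from __future__ import annotations

import random
from collections import Counter


def split_pool_barcodes(n_cells: int, n_bottles: int = 100, rounds: int = 3,
                        seed: int = 0) -> list[tuple[int, ...]]:
    """Each round drops every cell into a random bottle; the sequence of bottles is its barcode."""
    rng = random.Random(seed)
    return [tuple(rng.randrange(n_bottles) for _ in range(rounds))
            for _ in range(n_cells)]


def fraction_unique(barcodes: list[tuple[int, ...]]) -> float:
    """Share of cells whose barcode no other cell received."""
    counts = Counter(barcodes)
    return sum(1 for b in barcodes if counts[b] == 1) / len(barcodes)


def expected_fraction_unique(n_cells: int, n_bottles: int = 100, rounds: int = 3) -> float:
    """Probability that none of the other n_cells - 1 cells repeats a given cell's path."""
    combos = n_bottles ** rounds
    return (1 - 1 / combos) ** (n_cells - 1)


print(100 ** 3)  # number of possible bottle paths
for n in (10_000, 100_000, 1_000_000):
    print(n, round(expected_fraction_unique(n), 3))

simulated = fraction_unique(split_pool_barcodes(10_000))
print(round(simulated, 3))
```

With a million cells and a million combinations only about 37% of cells get a path of their own, while at ten thousand cells about 99% do. That is why, in practice, one wants many more barcode combinations than cells, either by adding rounds or by labeling fewer cells.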
so you now have the single-cell
sequencing technology you can use the
wells you can use the
bubbles or you can use the bottles
bubbles still sound pretty
damn cool
because bubbles are awesome and
that's basically the main technology
that we're using
okay so the bubbles are the
main technology
so there are kits now that companies
sell
to basically carry out single-cell
rna sequencing where you can
basically for two thousand dollars
get ten thousand cells
from one sample
and for every one of those cells you
basically have the transcription
of thousands of genes and
of course the data for any one
cell is noisy but being computer
scientists
we can aggregate the data from all of
those cells together
across thousands of individuals together
to basically make very robust inferences
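Aggregating noisy cells is often done by summing counts within each individual and cell type, an approach commonly called pseudobulk. The sketch below illustrates that general idea, not necessarily the group's own method; the dictionary keys, donor name and counts are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def pseudobulk(cells: list[dict]) -> dict[tuple[str, str], dict[str, int]]:
    """Sum sparse per-cell counts within each (individual, cell type) group.

    Each cell is a dict with hypothetical keys "individual", "cell_type" and
    "counts" (gene -> count). Summing many noisy cells yields one much more
    stable profile per group, which can then be compared across individuals.
    """
    groups = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["individual"], cell["cell_type"])
        for gene, n in cell["counts"].items():
            groups[key][gene] += n
    return {key: dict(genes) for key, genes in groups.items()}


cells = [
    {"individual": "donor1", "cell_type": "microglia", "counts": {"CX3CR1": 1}},
    {"individual": "donor1", "cell_type": "microglia", "counts": {"CX3CR1": 0, "P2RY12": 2}},
    {"individual": "donor1", "cell_type": "astrocyte", "counts": {"GFAP": 3}},
]
print(pseudobulk(cells))
```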
okay so the third technology is
basically single cell
rna sequencing that allows you to now
start asking
not just what is the brain expression
level difference of that genetic variant
but what is the expression difference of
that one genetic variant
across every single subtype of brain
cell
how is the variance changing
with a bulk brain sample you can
only ask about the mean
what is the average expression if i
instead have
3 000 cells that are neurons
i can ask not just what is the neuronal
expression
i can say for layer 5 excitatory neurons
of which i have i don't know 300 cells
what is the variance
that this genetic variant has so
suddenly
it's amazingly more powerful i can
basically start asking
about this middle layer of gene
expression at unprecedented levels
and when you look at the average it
washes out some potentially
important signal that corresponds to
ultimately the disease completely
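A toy calculation shows how averaging can wash out a cell-type-specific effect. All numbers are invented for illustration: assume a variant doubles a gene's expression, but only in layer 5 excitatory neurons, and that those neurons make up 5% of the cells in the sample.

```python
from __future__ import annotations


def bulk_mean(groups: dict[str, tuple[float, float]]) -> float:
    """Tissue-level mean expression from (fraction_of_cells, mean_expression) pairs."""
    return sum(frac * mean for frac, mean in groups.values())


# Hypothetical tissue composition and expression levels (all invented).
non_carrier = {"L5_excitatory": (0.05, 10.0), "other_cells": (0.95, 10.0)}
carrier = {"L5_excitatory": (0.05, 20.0), "other_cells": (0.95, 10.0)}

cell_type_fold = carrier["L5_excitatory"][1] / non_carrier["L5_excitatory"][1]
bulk_fold = bulk_mean(carrier) / bulk_mean(non_carrier)
print(cell_type_fold, round(bulk_fold, 2))
```

The doubling inside the one cell type shows up as only a 5% change in the bulk average, which is easy to lose in noise.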
yeah so i can do that at the rna
level
but i can also do that at the dna level
for the epigenome
so remember how before i was telling you
about all these technologies we're using
to probe the epigenome
one of them is dna accessibility so what
we're doing in my lab is that from the
same
dissociation of say a brain sample where
you now have all these tens of thousands
of cells floating around
you basically take half of them to do
rna profiling
and the other half to do epigenome
profiling both at the single cell level
so that allows you to now figure out
what are
the millions of dna enhancers
that are accessible in every one of tens
of thousands of cells
and computationally we can now take the
rna and the dna
readout and group them together to
basically figure out
how is every enhancer
related to every gene and remember this
sort of enhancer-gene linking that we
were doing across 833 samples
833 is awesome don't get me wrong but
10 million is way more awesome so we can
now look at correlated activity
across 2.3 million enhancers and 20 000
genes
in each of millions of cells to
basically start piecing together the
regulatory circuitry
of every single type of neuron every
single type of astrocyte
oligodendrocyte and
microglial cell inside the brains of
1500 individuals that we've sampled
across multiple different brain regions
across both dna and rna
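Linking enhancers to genes from paired accessibility and expression readouts can be sketched as a correlation scan across cells. This is a toy version under stated assumptions: each enhancer and each gene is a vector with one value per cell in the same cell order, and the 0.5 cutoff is arbitrary. At the scale described, real analyses would typically also need to handle sparsity, restrict to nearby enhancer-gene pairs and correct for multiple testing.

```python
from __future__ import annotations

import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def link_enhancers(accessibility: dict[str, list[float]],
                   expression: dict[str, list[float]],
                   min_r: float = 0.5) -> list[tuple[str, str, float]]:
    """Candidate enhancer-gene links whose activity is correlated across cells."""
    links = []
    for enh, acc in accessibility.items():
        for gene, expr in expression.items():
            r = pearson(acc, expr)
            if r >= min_r:
                links.append((enh, gene, round(r, 3)))
    return sorted(links, key=lambda link: -link[2])


# Six hypothetical cells: enhA is open where GENE1 is on; GENE2 is unrelated.
acc = {"enhA": [0, 1, 1, 0, 1, 0], "enhB": [1, 0, 0, 1, 0, 1]}
expr = {"GENE1": [0.1, 2.0, 1.8, 0.2, 2.2, 0.0],
        "GENE2": [1.0, 0.9, 1.1, 1.1, 1.0, 0.9]}
print(link_enhancers(acc, expr))
```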
so that's the data set that my team
generated last year alone
so in one year we basically generated 10
million cells
from human brain across a dozen
different disorders
across schizophrenia alzheimer's
frontotemporal dementia lewy body dementia
als huntington's disease
post-traumatic stress disorder autism
bipolar disorder
healthy aging etc
so it's possible that
even just within that
data set lie a lot of keys
to understanding these diseases
that could directly lead
to treatment
correct correct so basically we are now
motivating yeah so our computational
team is in heaven right now and we're
looking for people i mean if
you have super how much does this
decision
so this is a very interesting kind of
side question
how much of this is biology how much of
this is computation
so you have the computational biology
group but
how comfortable
should you be with biology
to be able to solve some of these
problems
of the several hats you wear
fundamentally are you thinking like a
computer scientist here
you have to this is the only way
as i said we are the descendants of the
first digital computer we're trying to
understand the digital computer
we're trying to understand
the circuitry the logic
of this digital core computer
and all of these analog layers
surrounding it so
the case that i've been
making is that you cannot think one gene
at a time
traditional biology is dead there's
no way you can solve disease with
traditional biology
you need it as a component once you've
figured out irx3 and irx5
you now can then say hey have you guys
worked on those genes with your single
gene approach
we'd love to know everything you know
and if you haven't we now know how
important these genes are
let's now launch a single gene program
to dissect them and understand them
but you cannot use that as a way to
dissect disease you have to think
genomically
you have to think from the global
perspective and you have to build these
circuits
systematically so we need large numbers
of computer scientists who are
interested and willing
to dive into this data
fully in
and sort of extract meaning we need
computer science people who can
understand sort of machine learning and
inference and sort of you know decompose
these matrices
come up with super smart ways of sort of
dissecting them
but we also need bilingual computer
scientists who understand biology
who are able to design the next
generation of experiments
because many of these experiments no one
in their right mind would design
without thinking of the analytical
approach that you would use to
deconvolve the data afterwards
right because it's massive amounts of
ridiculously noisy data
and if you don't have the computational
pipeline
in your head before you even design the
experiment you would never design the
experiment that way
that's brilliant so in designing the
experiment you have to see the entirety
of the computational pipeline that
drives the design
that even drives the necessity for
that design
basically you know if you didn't have a
computer scientist way of thinking
you would never design these hugely
combinatorial
massively parallel experiments
so that's why you need interdisciplinary
teams
and i want to sort of
clarify what we mean by
computational biology group
the focus is not on computational the
focus is on the biology
so we are a biology group
what type of
biology
computational biology
yeah the type of
biology
that uses the whole genome that's the
type of biology that designs
genomic experiments that can
only be interpreted in the context of
the whole genome
right so it's philosophically
looking at biology as a computer
correct correct
which in the context of the history of
biology is a big transformation
yeah yeah you can think of the name as
what do we do
only computation that's not true but how
do we study it
only computationally that is true so all
of this single-cell sequencing
can now be coupled with the technologies
that we talked about earlier for
perturbation
so here's a crazy thing instead of using
these wells
and these robotic systems for doing one
drug at a time
or for perturbing one gene at a time in
thousands of wells
you can now do this using a pool of
cells
and single-cell rna sequencing where
you basically can take
these perturbations using crispr
and instead of using a single guide rna
you can use a library of guide rnas
generated exactly the same way using
this array technology
so you synthesize a thousand different
guide rnas
you now take each of these guide rnas
and you insert them in a pool of cells
where every cell gets one perturbation
and you use crispr editing
either crispr-cas9
to edit the genome with these thousand
perturbations
or crispr activation or crispr
repression
and you now can have a single-cell
readout
where every single cell has received one
of these modifications and you can now
in massively parallel ways
couple the perturbation and
the readout in a single experiment
how are you tracking which perturbations
each cell received
so there are ways of doing that
but basically one way is to make that
perturbation an expressible
vector so that part of your rna reading
is actually
that perturbation itself so you can
basically put it in an
expressible part so you can self-drive
it
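Once the guide itself is expressed, each cell's perturbation can be read straight out of its RNA. The sketch below assumes guide transcripts have been counted per cell (as UMIs) and that a non-targeting control guide is included in the pool; the dominance thresholds, guide names and expression values are hypothetical, and comparing means is the simplest possible readout, not a full statistical test.

```python
from __future__ import annotations

from statistics import mean


def assign_guides(guide_umis: dict[str, dict[str, int]],
                  min_umis: int = 3, min_share: float = 0.8) -> dict[str, str]:
    """Assign each cell the guide whose transcript dominates its guide reads.

    guide_umis maps cell barcode -> {guide name: UMI count}. Cells with too few
    guide reads, or with no clearly dominant guide, are left unassigned.
    The thresholds are illustrative assumptions.
    """
    assigned = {}
    for cell, counts in guide_umis.items():
        total = sum(counts.values())
        if total < min_umis:
            continue
        guide, top = max(counts.items(), key=lambda kv: kv[1])
        if top / total >= min_share:
            assigned[cell] = guide
    return assigned


def perturbation_effect(assigned: dict[str, str], expression: dict[str, float],
                        guide: str, control: str = "non-targeting") -> float:
    """Mean target-gene expression in cells with `guide` minus control cells."""
    hit = [expression[c] for c, g in assigned.items() if g == guide]
    ctl = [expression[c] for c, g in assigned.items() if g == control]
    return mean(hit) - mean(ctl)


guide_umis = {
    "cell1": {"sg_GENE1": 9, "non-targeting": 1},
    "cell2": {"sg_GENE1": 12},
    "cell3": {"non-targeting": 7},
    "cell4": {"non-targeting": 5},
    "cell5": {"sg_GENE1": 1},  # too few guide reads: left unassigned
}
gene1_expression = {"cell1": 0.5, "cell2": 0.7, "cell3": 3.1, "cell4": 2.9, "cell5": 1.0}
assigned = assign_guides(guide_umis)
print(assigned)
print(perturbation_effect(assigned, gene1_expression, "sg_GENE1"))
```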
so the the point that i want to get
across is that the sky is the limit
you basically have these tools these
building blocks of molecular biology
you have these massive data sets of
computational biology
you have this huge ability to sort of
use
machine learning and statistical methods
and you know linear algebra to sort of
reduce the dimensionality of all these
massive data sets
and then you end up with a series of
actionable targets that you can then
couple with pharma and just go after
systematically
so the ability to sort of bring genetics
to the epigenomics to the
transcriptomics to the cellular readouts
using these sort of high-throughput
perturbation technologies that i'm talking
about
and ultimately to the organismal through
the
electronic health record endophenotypes
and ultimately the disease
battery of assays at the cognitive level
at the physiological level
and you know every other level
there is no better or more exciting
field in my view
to be a computer scientist in or to be
a scientist in period
basically this confluence of
technologies of computation
of data of insight and of tools for
manipulation
is unprecedented in human history and i
think this is what's shaping
the next century to really be a
transformative century for our species
and for our planet
so you think the 21st century will be
remembered for
the big leaps in biology and
understanding and
alleviation of disease
if you look at
the path
between discovery and therapeutics it's
been on the order of 50 years
it's been shortened to 40 30 20 and now
it's on the order of 10 years
and the huge number of technologies that
are
going on right now for discovery
will result undoubtedly in the most
dramatic manipulation of human biology
that we've ever seen
in the history of humanity in the next
few years
do you think we might be able
to cure some of the diseases we started
this conversation with
absolutely absolutely it's only a
matter of time basically the complexity
is enormous and i don't want to
underestimate the complexity
but the number of insights is
unprecedented
and the ability to manipulate is
unprecedented and the ability to deliver
these small molecules and other
non-traditional medicine perturbations
there's a new generation
of perturbations that you can use at the
dna level at the rna level
at the microrna level the
genomic level
there's a battery of new
generations of perturbations
if you couple that with cell type
identifiers
that can basically sense when you are in
the right cell based on the specific
combination and then
turn on that intervention for that cell
you can now think of combinatorial
interventions
where you can basically sort of feed a
synthetic biology construct to someone
that will basically do different things
in different cells
so basically for cancer this is one of
the therapeutics that our collaborator
ron weiss is using
to basically start sort of engineering
these circuits that will use microrna
sensors of the environment to sort of
know if you're in a tumor cell
or if you're in an immune cell or if
you're in stromal cells and so forth
and basically turn on particular
interventions there you can sort of
create constructs that are tuned to only
the liver cells or only the heart cells
or only the
brain cells and
then
have these new generations of
therapeutics coupled with this immense
amount
of knowledge on the sort of which
targets to choose and what biological
processes to measure
and how to intervene my view is that
disease
is going to be fundamentally altered and
alleviated
as we go forward
next time we talk we'll talk about the
philosophical implications and the
meaning of life but let's stick to
biology
for just a little longer we did pretty
good today we still stuck to the science
what are you excited about in terms of
the future of this field the
technologies
in your own group in your own mind
you're leading the world at mit in the
science
and the engineering of this work so
what are you excited about here
i could
not be more excited
we are one of many many teams who are
working on this
in my team the most exciting parts are
manifold so basically we've now
assembled this battery of technologies
we've assembled these massive massive
data sets and now we're really sort of
in the stage of our team's path
of generating disease insights so we are
simultaneously working on a paper on
schizophrenia right now
that is basically using the single-cell
profiling technologies using these
editing and manipulation technologies
to basically show how the
master regulators underlying changes in
the brain
that are found in
schizophrenia are in fact
affecting excitatory neurons and
inhibitory neurons in pathways
that are active both in synaptic pruning
but also in early development we've
basically found a set of four regulators
that are connecting these two processes
that were previously separate
in schizophrenia
giving a more unified view across those
two sides
the second one is in the
area of metabolism we basically now have
a beautiful collaboration with the
goodyear lab that's basically looking at
multi-tissue perturbations
in six or seven different tissues across
the body
in the context of exercise and in the
context of nutritional
interventions using both mouse and human
where we can basically see what are the
cell-to-cell communications
that are changing across them
and what we're finding is this
immense role of both immune cells as
well as
adipocyte stem cells in sort of
reshaping that circuitry of all of these
different tissues
and that's sort of pointing to a new path
for therapeutic interventions there
in alzheimer's it's this huge focus on
microglia
and now we're discovering different
classes of microglial cells
that are basically either synaptic or
um immune and these are
playing vastly different roles in
alzheimer's versus in schizophrenia
and what we're finding is this immense
complexity
as you go further and further down of
how in fact there are 10 different types
of microglia each with
their own sort of expression programs we
used to think of them as oh yeah they're
microglia
but in fact now we're realizing just
even in that sort of
least abundant of cell types there's
this incredible diversity there
the differences between brain regions are
another
major major insight again you
know one would think that
oh astrocytes are astrocytes no matter
where they are but no
there are incredible region-specific
differences
in the expression patterns of all of the
major brain cell types
across different brain regions so
basically there are the neocortical
regions that are sort of the recent
innovation
that makes us so different from all
other species there are the sort of
reptilian brain regions
that are extremely distinct
there's the cerebellum
each of those basically is
associated in a different way
with disease and what we're doing now is
looking into
pseudotemporal models for how disease
progresses
across different regions of the brain if
you look at alzheimer's it basically
starts in this
small region called the entorhinal
cortex and then it spreads
through the brain
through the hippocampus and
ultimately affecting the
neocortex and with every brain region
that it hits
it basically has a different impact on
the
cognitive and you know memory aspects
orientation
short-term memory long-term memory etc
which is
you know dramatically affecting the
cognitive path that the individuals go
through
so what we're doing now is creating
these computational models for ordering
the cells and the regions and the
individuals
according to their ability to predict
alzheimer's disease
so we can have a cell level predictor
of pathology that allows us to now
create
a temporal time course that tells us
when every gene turns on along this
pathology progression and then trace
that across
regions and pathological measures that
are region-specific
but also cognitive measures and so
forth so
that allows us to now sort of for the
first time look at can we actually do
early intervention for alzheimer's
where we know that the disease starts
manifesting 10 years before
you actually have your first cognitive
loss
can we start seeing that path to build
new diagnostics new prognostics new
biomarkers
for this sort of early intervention in
alzheimer's
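The ordering idea can be sketched in a few lines: rank cells by a predicted pathology score, then ask where along that ranking each gene switches on. This is purely illustrative; it assumes a per-cell pathology score from some upstream predictor (not shown), and the rolling-window onset rule, threshold, cell names and expression values are stand-ins rather than the models described here.

```python
from __future__ import annotations


def pseudotime_order(pathology_score: dict[str, float]) -> list[str]:
    """Order cells from least to most pathological by a predicted score."""
    return sorted(pathology_score, key=pathology_score.get)


def onset(order: list[str], expression: dict[str, float],
          threshold: float, window: int = 3) -> int | None:
    """First position along the ordering where a rolling mean reaches threshold."""
    values = [expression[c] for c in order]
    for i in range(len(values) - window + 1):
        if sum(values[i:i + window]) / window >= threshold:
            return i
    return None  # the gene never turns on along this trajectory


# Eight hypothetical cells with made-up pathology scores.
scores = {f"c{i}": s for i, s in enumerate([0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.6])}
order = pseudotime_order(scores)
gene_a = {c: 1.0 if scores[c] >= 0.3 else 0.0 for c in scores}  # switches on early
gene_b = {c: 1.0 if scores[c] >= 0.7 else 0.0 for c in scores}  # switches on late
print(order)
print(onset(order, gene_a, 0.5), onset(order, gene_b, 0.5))
```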
the other aspect that we're looking at
is mosaicism we talked about the common
variants and the rare variants
but in addition to those rare variants
as your
initial cell the zygote
divides and divides and divides
with every cell division there are
additional mutations that are happening
so what you end up with is your brain
being a mosaic
of multiple different types of genetic
underpinnings some cells
contain a mutation that other cells don't
have
so every human has the
common variants that all of us carry to
some degree
the rare variants that your immediate
tree
of the human species carries and then
there are the somatic variants which are the
tree
that happened after the zygote that sort
of forms your own body
so these somatic alterations are
something that has been previously
inaccessible
to study in human postmortem samples
but right now with the advent of single
cell rna sequencing
in this particular case we're using the
well-based sequencing which is much more
expensive but gives you a lot richer
information
about each of those transcripts so we're
using now that richer information
to infer mutations that have happened
in each of the thousands of genes that
sort of are active
in these cells and then understand
how the genome relates
to the function this genotype
phenotype relationship that we usually
build in gwas genome-wide
association studies between
genetic variation and disease
we're now building that at the cell
level where for every cell we can relate
the unique specific genome of that cell
with the expression patterns of that
cell
and the predicted function using these
predictive models that i mentioned
before on this regulation for cognition
for pathology
in alzheimer's at the cell level and
what we're finding is that
the genes that are altered and the
genetic regions that are altered
in common variants versus rare variant
versus somatic variant
are actually very different from each
other the somatic variants are pointing
to
neuronal energetics and oligodendrocyte
functions
that are not visible in the genetic
lesions that you find for the common
variants
probably because they have too strong of
an effect that evolution is just not
tolerating them
on the common side of the allele
frequency spectrum so the somatic one
that's the variation that happens after
the zygote after
correct you individual i mean it's a
dumb question but
there's mutation and variation i
guess that happens there
and you're saying that through
this if we focus in on
individual cells we're able to detect a
story that's interesting there
and that might be a very unique kind of
important variability that arises for
you said neuronal or something
energetics energetics cool
terms so your
i mean the metabolism of humans is
dramatically altered from that of
nearby species you know we talked about
that last time that basically we are
able to consume meat
that is incredibly energy rich and that
allows us to sort of have functions
that are you know meeting this humongous
brain that we have
it's basically on one hand every one of
our brain cells is much more energy
efficient than our neighbors
than our relatives number two we have
way more of these cells
and number three we have you know
this new diet that allows us to now feed
all these needs
that basically creates a massive amount
of damage
oxidative damage from this huge super
powered factory of ideas and thoughts
that we carry in our skull and
that factory has energetic needs
and there's a lot of sort of biological
processes underlying that
that we are finding are altered in the
context of alzheimer's disease
that's fascinating so you have to
consider all of these systems
if you want to understand even something
like diseases that you would
maybe traditionally associate with just
the particular cells of the brain
yeah the immune system
the metabolic system the metabolic
system and these are all the things that
make us uniquely human so our immune
system
is dramatically different from that of
our neighbors our societies are
so much more clustered the history of
infections that have plagued
the human population is you know
dramatically different from every other
species
the you know the way that our
society and our population has sort of
exploded
has basically put unique pressures on
our immune system and our immune system
has both coped with that density and
also been shaped by
as i mentioned the you know vast amount
of death that has happened in the black
plague and other sort of selective
events
in human history famines ice ages and so
forth
so that's number one on the sort
of immune side then on the metabolic side
you know again we are able to sort of
run marathons
you know you know i don't know if you
remember the sort of human versus horse
experiment
where the horse actually tires out
faster than the human and the human
actually wins
so on the metabolic side we're
dramatically different on the immune
side we're dramatically different on the
brain side
again you know no need to sort of you
know it's a no-brainer how
our brain is like just enormously more
capable
and then on the side of
cancer so basically the cancers that
humans
are having the exposure the
environmental exposures is again
dramatically different
and the lifespan the expansion of human
lifespan
is unseen in any other species in
you know recent evolutionary history
and that now leads to a lot of new
disorders
that are starting to you know manifest
late in life so uh you know alzheimer's
is one example where basically
you know these vast energetic needs over
a lifetime
of thinking can basically lead to all of
this debris
and eventually saturate the system and
lead to
you know alzheimer's in late life
but there's just such a
dramatic
uh set of frontiers when it comes to
aging research
that you know will so what i often like
to say is that
if you want to engineer a car
to go from 70 miles an hour to 120 miles
an hour that's fine
you can basically you know fix a few
components if you wanted to now go at
400 miles an hour
you have to completely redesign the
entire car
because the system has just not evolved
to go that far basically our human body
has only evolved to live to i don't know
maybe we can get to 150 with minor
changes
but if you know as we start pushing
these frontiers for not just
living but well living the eu zin
that we talked about last time so
to basically push eu zin
into the 80s and 90s and 100s and you
know much further than that
we will face new challenges
that have you know never been faced
before in terms of cancer the number of
divisions in terms of alzheimer's and
brain related disorders
in terms of metabolic disorders in terms
of regeneration
there's just so many different frontiers
ahead of us so
i am thrilled about where we're heading
so basically i see this confluence
in my lab and many other labs of ai
of you know sort of you know the next
frontier of ai for drug design so
basically these sort of
graph neural networks on specific
chemical
uh designs that allow you to
create new generations of therapeutics
these
molecular biology tricks for intervening
at the system at every level
this personalized medicine prediction
diagnosis and prognosis using the
electronic health records
and using these polygenic risk scores
weighted
by the burden the number of mutations
that are accumulating across common rare
and somatic variants
the burden converging across all of
these different
molecular pathways the delivery
of specific drugs and specific
interventions into specific cell types
and again you've talked with bob langer
about this there's you know many giants
in that field
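The polygenic risk score mentioned here is conventionally an additive sum: each variant's per-allele effect size times the number of risk alleles (0, 1 or 2) a person carries. How exactly to weight that by a mutational burden across common, rare and somatic variants is not spelled out in the conversation, so the combination step below is a hypothetical illustration only. The function names, the pathway and gene names, and the `alpha` weight are all made up for the example.

```python
from collections import Counter


def polygenic_risk_score(dosages: dict[str, int], effect_sizes: dict[str, float]) -> float:
    """Standard additive PRS: sum of (per-allele effect size * allele dosage 0/1/2).
    Variants without a known effect size are skipped."""
    return sum(effect_sizes[v] * d for v, d in dosages.items() if v in effect_sizes)


def pathway_burden(mutated_genes: set[str], pathways: dict[str, set[str]]) -> Counter:
    """Count how many genes in each pathway carry a mutation of any kind
    (common, rare and somatic hits are pooled together in this sketch)."""
    return Counter({name: len(genes & mutated_genes) for name, genes in pathways.items()})


def burden_weighted_score(prs: float, burden: Counter, alpha: float = 0.1) -> float:
    """Hypothetical combination: scale the PRS up by the total pathway burden.
    `alpha` is an illustrative tuning knob, not a published parameter."""
    return prs * (1.0 + alpha * sum(burden.values()))


if __name__ == "__main__":
    prs = polygenic_risk_score({"rs1": 2, "rs2": 1, "rs3": 0},
                               {"rs1": 0.10, "rs2": -0.05, "rs3": 0.20})
    burden = pathway_burden({"GENE_A", "GENE_C", "GENE_D"},
                            {"energetics": {"GENE_A", "GENE_B"}, "myelination": {"GENE_C"}})
    print(round(prs, 4), dict(burden), round(burden_weighted_score(prs, burden), 4))
```

Keeping the additive score and the burden count as separate functions makes it easy to swap in a different (and better justified) weighting scheme without touching either input.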
and then the last concept is not
intervening at the single gene level
i want you to sort of conceptualize the
concept of an
on-target side effect
what is an on-target side effect an
off-target side effect is when you
design a molecule to target
one gene and instead it targets another
gene and you have side effects because
of that
an on-target side effect is when your
molecule does exactly what you're
expecting
but that gene is pleiotropic pleio means
many
tropos means ways many ways it acts in
many ways
it's a multifunctional gene so you find
that this gene plays a role in this but
as we talked about
the wiring of genes to phenotypes is
extremely dense and extremely complex
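The on-target versus off-target distinction can be made concrete with a toy gene-to-phenotype map: on-target side effects are the other phenotypes of the very gene the drug was designed to hit (pleiotropy), while off-target side effects come from genes it was never meant to hit. The function name, gene names and phenotype labels below are hypothetical; this is a sketch of the concept, not a pharmacology tool.

```python
def side_effects(intended_target: str,
                 genes_hit: set[str],
                 gene_to_phenotypes: dict[str, set[str]],
                 desired_phenotype: str) -> tuple[set[str], set[str]]:
    """Split a drug's unwanted effects into on-target and off-target ones.

    on-target:  the drug hits exactly the gene it was designed for, but that gene
                is pleiotropic, so its *other* phenotypes change too.
    off-target: phenotypes of genes the drug hits that it was never meant to hit.
    """
    on_target: set[str] = set()
    if intended_target in genes_hit:
        on_target = gene_to_phenotypes.get(intended_target, set()) - {desired_phenotype}
    off_target: set[str] = set()
    for gene in genes_hit - {intended_target}:
        off_target |= gene_to_phenotypes.get(gene, set())
    return on_target, off_target


if __name__ == "__main__":
    gmap = {"GENE_X": {"desired", "side_1"}, "GENE_Y": {"side_2"}}
    print(side_effects("GENE_X", {"GENE_X", "GENE_Y"}, gmap, "desired"))
```

Note that a perfectly selective drug (`genes_hit == {intended_target}`) still returns a non-empty on-target set whenever the target gene is pleiotropic, which is the point being made about moving from single-gene to network-level intervention.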
so the next stage of intervention will
be
intervening not at the gene level but at
the network level
intervening at the set of pathways and
the set of genes with
multi-input perturbations to the system
multi-input modulations
pharmaceutical or other interventional
that basically allow you to now work at
the
sort of full level of understanding not
just in your brain but across your body
not just in one gene but across the set
of pathways and so forth
for every one of these disorders so i
think that we're finally at the level of
systems medicine of basically instead of
sort of medicine being at the single
gene level medicine being at the systems
level where it can be personalized
based on a specific set of genetic
markers and genetic perturbations that
you are
either born with or that you have
developed during your lifetime your
unique set of exposures
your unique set of biomarkers and you
know your unique
set of you know current set of
conditions through your
ehr and other ways
and the precision component
of intervening extremely precisely in
the specific pathways
and specific combinations of genes that
should be modulated to sort of bring you
from the disease state
to the physiologically normal state or
even to a physiologically improved state
through this combination of intervention
so that's in my view the field
where basically computer science comes
together
with you know artificial intelligence
statistics all of these other tools
molecular biology technologies and
biotechnology and pharmaceutical
technologies that are sort of
revolutionizing the way we intervene
and of course this massive amount of
molecular biology and data gathering and
generation perturbation
in massively parallel ways so there's no
better way there's no better
you know time there's no better place
to be sort of you know looking at this
whole confluence
of ideas and i'm just so thrilled to
be
a small part of this amazing enormous
ecosystem
it's exciting to imagine what the humans
of 100 200 years from now
what their life experience is like
because
these ideas seem to have potential to
transform the quality of life
that when they look back at us
they probably wonder how we put up
with all the suffering
in the world manolis it's a huge honor
thank you for spending this early sunday
morning with me
i deeply appreciate it see you next time
sounds like a plan thank you
thanks for listening to this
conversation with manolis kellis and
thank you to our sponsors
semrush which is an seo optimization
tool
pessimist archive which is one of my
favorite history podcasts
eight sleep which is a self-cooling
mattress with smart
sensors and an app and finally better
help
which is an online therapy service
please check out the sponsors in the
description to get a discount
and to support this podcast if you enjoy
this thing
subscribe on youtube review it with five
stars on apple podcast
follow on spotify support it on patreon
or connect with me on twitter
at lex fridman and now let me leave you
with some words
from haruki murakami human beings are
ultimately nothing but carriers
passageways for genes they ride us
into the ground like race horses from
generation to generation genes don't
think about what constitutes good
or evil they don't care whether we're
happy
or unhappy we're just means to an end
for them
the only thing they think about is what
is most efficient
for them thank you for listening and
hope to see you
next time