Transcript
1Yp67Sywlcw • FTI ITB Morning Lectures - Introduction to Bioinformatics
good morning ladies and gentlemen
this morning
we are glad to have dr
himanshu raje or dr raje from nicholls
state university
usa and he will
share his experience he will let us know
more about
the interesting topic that is
bioinformatics
professor raje you know that we are
most of us are engineers here
with a little bit background on biology
so bioinformatics is a
growing topic and we should
know about that
but we have very little experience with
that
so we will be glad if you can share
with us
and tell us more about what is
bioinformatics
okay the time is yours thank you
thank you so much for such a nice
introduction let me share my screen with
you all
real quick
okay great so like they said i'm dr
himanshu raje
i'm an assistant professor at nicholls state
university
and thank you for having me to share
some of my knowledge of bioinformatics
with you all
this lecture i have arranged it in such
a way that it's going to be a little
interactive okay so it's not going to be
just
me talking all the time i would much
appreciate
your interaction your responses we are
going to do
several activities on computer of course
um but your participation and your
outcomes of those activities would be
much appreciated they will essentially
help me
um judge whether you're understanding or
not and feel free to ask questions
as we move on through this presentation
i really like
to answer questions during the lecture
as well okay so if you do have questions
don't hesitate i'll give you time to
ask questions
so if you do have questions don't
hesitate to put in the text chat window
of zoom and we'll take it from there
also i have a google drive folder
and i'm going to share that link with
you all so the activities that i told
you about
the activities that we are going to do
throughout this lecture
um we will have some files that i have
uploaded onto that folder and we will
use those
so as and when time comes i'll tell you
what
and when we are using that folder on
google drive and i'll share that link
with you all
in the text chat of the zoom
okay bioinformatics is a
fairly new science so put it this way
it evolved after the invention of
computers
much after in fact the invention of
computers because it involves computers
okay so biology has been advancing
for centuries i would say
but computers are themselves a fairly new
invention
a couple of decades ago and that is when
the thought
started in people that can we use
computer in
other sciences like chemistry like
biology like
physics can we get help from this
tremendous technology of computers
to do certain tasks and the answer to
that was
yes and people actually started doing
that
but when i was in my bachelor's degree
this
word bioinformatics was completely new
i didn't even know about it it barely
existed
um and of course my bachelor's degree is
from india so i was completely unaware
of bioinformatics at that time
but when i was in my master's degree and
i had pretty much decided to go for
biology at that time that is when we
were actually
hearing or reading in news articles
about bioinformatics but still it was
not taught
at colleges at that time as well in fact
as you guys can probably understand that
misconceptions circulate when the topic
is like
very new or fresh and there were
misconceptions going on
in the community i still remember myself
reading an article
in the newspaper that of course it was a
question mark
future of biology might be at stake
because now computers will take over
everything
maybe computers can do experiments and
what would biologists do
and while reading that news article of
course i was just a master's degree
student at that time i was like
oh boy i decided to take biology as my
career
and is this field in trouble and at that
same time
i remember having discussion with my
parents you know we all have that kind
of a phase at some point of time in
our lives what to do with our career so
um we me and my parents we came across
this
one week of a workshop on bioinformatics
in a city in india called chennai
it's really good for education
so my parents told me why don't you fly
over there and why don't you see what
this bioinformatics is about
and see if your career is at stake
see what's the future of biology what
does it look like and i did fly
to chennai for that one week workshop of
bioinformatics
that trip to chennai was kind of
memorable
for several things first thing is it was
my first flight trip
second thing is i got introduced to
bioinformatics that was the first and
foremost important thing
and i made really good friends over
there and the
last but not the least in fact the most
important thing that i learned
from that one week workshop on
bioinformatics at chennai
is that biology
is not in trouble biology still holds
strong
we need biological experiments we need
to be in
lab but we also need
help from this new technology that's
coming up computers
and maybe those computers can actually
help us steer ourselves
in a correct way while doing our
experiments so that was my take home
message
from that one week workshop that i
attended on bioinformatics and
i was kind of intrigued by this science
so although i did not decide to do my
career in bioinformatics i stuck with
molecular cell biology
i always try to keep myself updated with
what's going on
in this science so like i said
we cannot proceed in bioinformatics
without biology
so let's have a little bit of background
on biology okay just what's required
so let me just start off with
important biomolecules there are several
molecules in our cells not just our
cells we have bacterial cells fungal
cells
these cells behave as they behave
and these cells interact using molecules
so it's pretty much
like i always tell my students these
molecules are non-living
but their interaction with each other
makes cells
that are living how does that happen
we really don't have full answer yet and
that's why still
research is going on so let's have a
quick introduction of some of these
important biomolecules that we are going
to deal with in bioinformatics
let's start with dna let's start with a
very
familiar biomolecule deoxyribonucleic
acid
the genetic material for most of the
cells
you name it prokaryote eukaryote
some viruses are different though they
are exceptions
however viruses are not cells so let's
keep viruses aside for a while
and let's just focus on dna the
structure of dna
is double-stranded i'm sure you have
seen a picture
or similar pictures like this um
on several occasions so double stranded
helical the strands are anti-parallel
dna is a long chain okay
people
the monomer of dna the single unit
is a nucleotide and there are four of
them in the dna
adenine thymine guanine and cytosine
they are represented by the single
letter abbreviations
essentially coming from the first letter
in their name
so a for adenine t for thymine etc
the chemistry the rule of chemistry here
is
adenine a on one strand of dna pairs
with
thymine t on the other strand of dna
and guanine g on one strand pairs with
cytosine c on the other strand of dna so
keep this in mind
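The pairing rule just described can be sketched in a few lines of Python. This is only an illustration: the function names and the example sequence are made up for this note and do not come from the lecture.

```python
# Pairing rule from the lecture: A pairs with T, G pairs with C.
PAIRING = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Return the partner base for every position, in the same left-to-right order."""
    return "".join(PAIRING[base] for base in strand.upper())


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own direction.

    The two strands are antiparallel, so the partner strand reads
    in the opposite direction to the strand we started with.
    """
    return complement(strand)[::-1]


example = "ATGCGT"  # hypothetical sequence, for illustration only
print(complement(example))          # TACGCA
print(reverse_complement(example))  # ACGCAT
```
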
some of you might already know this if
you
know this fine if you don't know well
just have a quick introduction about
this but
notice here there are a couple of points
that we should note from this slide
the structure of dna that double helical
antiparallel strands is pretty much the
same
in any organism you talk about okay
so that's one major biomolecule that we
are going to look at
okay so we have a sequence of dna
on one strand of course it's a stretch
of nucleotides and of course the other
strand also has the same thing
the complementary sequence that's what
we call
the second biomolecule that we are going
to talk about
is rna ribonucleic acid
okay notice the very first thing on this
slide is
rather than dna just being a
common structure double helical
structure
rna has three types messenger rna
ribosomal rna and transfer rna
okay so there are three types of rna
another thing
that you should keep in mind is
whichever
rna we are talking about the rna is
single
stranded okay it is not double stranded
like dna so no matter what rna you talk
about
messenger rna single stranded ribosomal
rna
and trna are also single stranded
molecules
which means nothing prevents them
to fold onto themselves and they can
form structures like this
what's shown in this picture of course
not just this they can even form
several other kind of structures okay so
keep that in mind
rna comes in diverse forms because
each and every molecule is single
stranded
now again just to have a quick
introduction
every protein making
gene
will have its own messenger rna
produced when cell
expresses that gene so messenger rna is
the molecule
that's going to be produced from every
gene that can make protein
okay so when you think about messenger
rna
put it this way ribosomes are going to
read it in triplets
and they are going to call corresponding
trnas with
amino acids and they are going to make proteins
so messenger rna
we call it messenger because it carries
message from genes
the other two rna molecules ribosomal
rna and
transfer rna they never make proteins
from themselves they just help with
protein formation
for example ribosomal rna
just goes and becomes part of ribosome
okay so along with some other proteins
it just sits in cytoplasm and that's
what ribosome is
it helps in formation of proteins it
helps to read
these messenger rnas and form proteins
but ribosomal
rna molecules are never going to form
proteins
from themselves ribosomes are not going
to read these
same thing applies to trna these are
again helper molecules they help with
protein synthesis
okay so trna will never be
formed into protein of itself just helps
so the genes
that code for ribosomal rna
the genes that make transfer rna they
never make proteins
they get expressed but they just stop
at rna formation and that rna actually
does
perform some action in the cell so
that's
rna of course it is also made up of
nucleotides so the monomer of rna are
also nucleotides
notice that there is no thymine instead
we have
uracil in rna molecules
and of course if you are a biologist you
might know
that trna has some other uncommon
nucleotides in it
but that's not really the part of our
lecture here
my attention is on uracil because that
is the unique nucleotide
in rna and that replaces thymine but
keep in mind
that rna is single stranded that should
be the take-home message and it comes in
three types
it can fold onto itself to assume
several different structures
so let's collect this information
collect these points with
us make a note of them and move on
the third but the most diverse
biomolecule is proteins
proteins there are several proteins in
our cell
okay because every protein coding gene
will have its own messenger rna and
there are several of protein making
genes and those mrnas will be read by
ribosomes of course
ribosomes contain ribosomal rna and
transfer rnas are going to come into
picture with
loaded amino acid and we are going to
have proteins
so nonetheless the monomer here
is amino acids one amino acid is going
to join to each
other amino acid with peptide bond
and form a chain of amino acid
that is basically protein however
just a simple chain of amino acid
is a primary structure of protein okay
these amino acids have their single
letter abbreviations just like
nucleotides do
and we are going to see i'm going to
point out to some of these
um single letter abbreviations to you
later on when i show you certain
bioinformatic things
but primary structure is a simple string
of amino acids if i keep
writing single letter abbreviations of
amino acids one after the other
that's a simple primary structure that
is not sufficient people for protein to
work in the cell
i'm sure you guys know that each
molecule
some of you are chemistry majors some of
you are biology majors so you know
that each molecule has its own
three-dimensional
shape and when it assumes that shape
when it forms um when it takes that
shape in the cell
that is when it can perform certain
actions
because that is when it can find its
binding
partners in the cell each protein is
looking
for something to bind to something it
could be an
ion it could be another protein it could
be
maybe sugar something and that structure
of the binding partner of protein should
perfectly fit
into the three-dimensional structure of
protein and that is why
no protein stops at primary structure
there are secondary structures like
alpha helices
beta sheets and even further
those secondary structures are folded in
the cell
to form a tertiary structure for every
single protein
okay so every single protein in the cell
will have its own three-dimensional
structure
now some proteins don't stop even here
some proteins need to attach themselves
to another protein cell needs to couple
a couple of proteins together
and that's when they can act so they act
together they act in a group
bunch of proteins bound to each other
and so
some proteins not all few proteins have
something called as
quaternary structure so not all proteins
have this some proteins do
but the proteins that do have quaternary
structure it is basically
proteins different proteins attached to
each other and performing
a biological function so again for us
the take home message here is the
monomer of protein is amino acid
it's an amino acid sequence
three-dimensional structure of protein
comes into picture
it is critically important to know okay
so keep these things in mind and let's
move ahead
although i have shown you a eukaryotic
cell
you can see this this is a nucleus of
the cell
even prokaryotic cells have the same
process going on from dna this is where
we essentially start this is where the
genes are
this is the genetic material of every
cell
this house has all the genes now when
cell
any cell prokaryote or eukaryote
us plants animals bacteria whatever
when the cell decides to activate
a gene it will form an rna molecule
from that gene okay and the process of
going from dna to rna is called as
transcription so when a gene is
activated
that gene is going to be transcribed now
i just told you a few seconds ago
that some genes go all the way down to
proteins so their
rna molecules are messenger rnas and
they will
further be translated with the help of
ribosomes and a protein will form
from them some genes can do that but
what if we are talking about a gene
that just makes ribosomal rna or a
transfer rna
that kind of rna will never ever form
its own protein
but still when a ribosomal rna gene is
activated
we will have transcription of that gene
and we will form
ribosomal rna okay but those genes will
stop here
so as a whole class we can settle on the
thought
that when a gene is activated it is at
least
getting transcribed now if we are
talking about
messenger rnas they will also get
translated
with the help of ribosomes and form
proteins
keep in mind that these three are
extremely diverse molecules and we are
talking about
a stretch of nucleotides double stranded
anti-parallel helical molecule here
with the process of transcription it is
forming a single stranded molecule
of rna okay and i'm going into
technicalities here
okay so keep that in mind rna are single
stranded
formed from a double-stranded molecule
still the monomer is nucleotide but
thymine is replaced by uracil and here
if messenger rnas are translated to
proteins
then this is a whole different
biomolecule in itself the monomer is
amino acids
so cell is doing an incredible thing
here
cell is creating three different kinds
of course this
is what cell has already but cell is
essentially creating
two completely diverse molecules and
there are several proteins so we have
tremendous amount of diversity
here and this is this whole process
together is called a central dogma
of molecular biology okay because this
holds true for prokaryotes as well as
eukaryotes
so we are going to stick to the central
dogma
and we are going to appreciate this
diversity
in biomolecules that we see and we are
going to try and see how that fits
within the information that we can
collect from these biomolecules
i want your attention for now on this
process
transcription how come
from a double stranded dna molecule we
have a single stranded rna
the process happens kind of like this
here we have
double stranded dna okay this is where
the gene so
in this picture a gene is shown to you
right here
just a cartoon of a gene two strands of
dna
since rna is single stranded
when cell decides to activate mark my
words okay letter by letter
when cell decides to activate this gene
the two strands of dna are going to
separate
and cell is going to recruit an enzyme
to read
keep your focus on my mouse pointer to
read
just one strand of dna and form
an rna molecule so only
one strand of dna is going to be used
to form an rna molecule makes perfect
sense to us
because rna is single stranded cell is
never going to use both of these dna
strands and form
a double-stranded rna that's not how it
happens rna is
rarely double-stranded if it is
double-stranded then it is the
single-stranded rna folded onto itself
that's it otherwise rna is
single-stranded so only one strand of
dna is used
the strand of dna that cell is going to
use to make rna is called as template
strand
keep that name in your mind somewhere we
are going to
come to this name at least couple of
times today
and the other strand of dna is called as
a
coding strand or the sense strand this
strand of dna
is not used to form rna okay
maybe this figure would do
better justice to the point that i'm
trying to make
so this strand of dna it's shown in red
color to you there are no real colors in
dna this is just for our understanding
but this strand of dna has this sequence
let's say for example
it's not being used to make rna by cell
in fact this bottom strand of dna that's
being used that's a template strand it
acts as a template for rna formation
so as you can see the sequence of rna
is complementary
to the template strand okay if we have t
in the template strand cell will add
a in the rna and of course if there is
adenine in the template strand
rna doesn't have t but it has u instead
we learned that few seconds ago
so cell will put u but the point to
note here
is the sequence of rna is going to be
complementary
these nucleotides pair to each other to
the template strand of dna
and if you go back a second and look at
this strand of dna the other coding
strand of dna
that strand of dna was also
complementary
to this strand of dna because usually
these two strands of dna bind to each
other
now we have rna which is complementary
to this strand
this strand of dna is also complementary
to the template strand
so the sequence look at the sequence of
rna
the sequence of rna perfectly matches
with the sequence
of coding strand of dna except that the t's
are replaced by u's
okay so keep that in mind the sequence
of rna
is the same exact sequence just because
both of these strands
are complementary to template strand
template strand of dna
the other strand of dna is acting as a
template to form rna
and that is why most of the databases
in bioinformatics they will provide you
with this sequence
you will see the coding strand sequence
okay
the sequence that is nearly in
fact almost exactly the same as the rna sequence
so if you are looking at a gene sequence
in the database and if you wonder
hey what would be the rna sequence here
all you have to do
is just replace those t's by u's and
that's your rna sequence
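A small Python sketch of that shortcut, under the assumption that the coding strand is written left to right and the template strand is written aligned underneath it, position by position, as in the figure. The sequences and function names are invented for illustration.

```python
# Pairing used while building RNA from the template strand:
# RNA has U where DNA would have T.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}


def rna_from_template(template: str) -> str:
    """Pair every template base with its RNA partner, position by position."""
    return "".join(DNA_TO_RNA[base] for base in template.upper())


def rna_from_coding(coding: str) -> str:
    """Shortcut from the lecture: take the coding strand and replace T with U."""
    return coding.upper().replace("T", "U")


coding_strand = "ATGGCCTAA"    # hypothetical coding (sense) strand
template_strand = "TACCGGATT"  # its partner, written aligned under it

print(rna_from_template(template_strand))  # AUGGCCUAA
print(rna_from_coding(coding_strand))      # AUGGCCUAA
```

Both routes give the same RNA, which is why a database only needs to show the coding strand.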
and that is why databases biological
databases
give you the coding strand sequence okay
so keep that in mind
and again some of these things might
sound
um like you know um foreign to you right
now
but when we actually look at those
biological databases trust me it will
all make sense
as long as you're trying to keep up with
the pace
so coding strand is the sequence that we
see
all right when
biological inventions were taking place
when scientists were discovering
how are these genes expressed and a lot
of expression data
um was essentially piling up in
scientific community
when human genome project was going on
people had questions in their mind
humans have lots of genes human cells
have tremendous amount of genes in them
what is their sequence what is the dna
sequence of each gene
and there was a worldwide collaborative
project
to sequence the entire human genome
it generated tremendous amount of data
now where to keep that data
we needed some help to preserve that
data
we cannot just preserve that data on
paper if we do that it will just remain
in one lab
or maybe at one place we wanted
scientific community wanted
access of that data to worldwide
it wanted outreach people in the world
everybody should have access to that
data and so where to store
that data that is when people looked
into some
other sciences like computer science can
we get help from computers
maybe to store this data is computer
science advanced enough
now luckily fortunately even computers
were evolving around the same time and
the answer came to be
yes yes we can get help from computers
and store this data furthermore
not only store it we can even
try and analyze this data to draw some
meaningful conclusions
we all do experiments people in lab at
some point of time
even otherwise our life is full of
experiments essentially
no matter what science you talk
about even in other subjects people do
some types of experiments we we know two
things
by doing an experiment no matter whether
it's chemistry whether it's physics
whether it's biology
experiment takes time
and sometimes the reagents that we use
for these experiments are costly
they take money now if
we and typically we don't know the
outcome of experiment we are doing
research
when we start off with an experiment we
don't know what's going to be an outcome
we can kind of
predict with our hypotheses but
we don't even know whether we are going
to be heading into right direction or
not typically
and that is where decades ago
people were trying to get any help
possible from computer science
can we at least virtually predict the
outcome of an experiment
can we at least know if we are heading
into right direction
in order for us to save time and money
there is no point in spending five years
doing an experiment
only to realize that i was chasing
shadows
if i can get periodic help from
computers computer will not be doing any
experiment for me
however i am going to just check with
computer maybe plan out my experiment
in computer we call it an in silico
experiment the experiments that we do
with animals are in vivo
but experiments that we do in a test tube
are in vitro and the experiments that we
do with computers are in silico
because they have silicon chips so these
are three different words
that we need to kind of keep somewhere
in our mind
but can we do some of those in silico
experiments
and periodically judge maybe on a
monthly basis maybe on bi-monthly basis
just to see
if our experiments are going in the
right direction or not if we can do that
we can modify our hypothesis and always
steer
ourselves in the right direction and that's
what i'm going to focus
my lecture on today okay i'm going to
introduce you
again this is just introductory
bioinformatics so i'm just going to
introduce you
to some pre-existing tools in
bioinformatics how can we use those
in our day-to-day experiments day-to-day
biological experiments some of those
tools you can even use in chemistry
or you can even use in bio process
so it's going to be interesting some of
you
might already be familiar with some of
those tools so if you are
that's fantastic you will be able to do
those activities
very quickly if you are not um you will
learn those so that's going to be
knowledge to you all
okay so stay tuned some interesting
stuff is going to come to you
the point here on this slide that i want
to make before i leave this slide
is there is tremendous amount of data
in biology that is being generated okay
and we are actually going to talk about
what kind of data that we are talking
about
well first thing is right in front of us
the nucleic acid sequence
so let's see what type of data we can
gather
in biology shall we and this is where
informatics comes into picture
wherever we have data we have
information
and in this it is the information in
context of biology
and this combination of
biology and
information technology i.t or computers
is essentially what bioinformatics is
all about
and this led to starting of a whole new
field nowadays people do careers in
bioinformatics there are
majors named as bioinformatics in
colleges
because this has tremendous potential
keep in mind though
we can never ever do any earth shaking
discovery
with bioinformatics i mean just in
bioinformatics
we need to do biological experiments in
order to invent new things
we can use bioinformatics we can get
help from computers
only to assist us with our biological
experiments so that is one thing
to bear in our mind real well and now
it's time since we are now introduced to
informatics
it's time to look into what kind of data
we can collect in biology
and we are actually going to make this
slide together okay
as you can see this slide is almost
blank and it has those three
familiar biomolecules with us
so i'm going to um stop my slideshow for
a while
and i'm back to that text box that i
have there
with biomolecule dna what kind of data
can we get
and we are going to finish this
we're going to complete this slide
together
so again i i would much appreciate your
input
as i complete this slide i'm going to
use dna
as my biomolecule okay i'm going to
complete dna
but you guys are going to help me with
rna and proteins
so let's start with dna of course we can
have
nucleotide sequence
that is a data
so nucleotide
sequence could be a data for dna
the structure of dna is pretty much the
same
in any organism we talk about so i
wouldn't put structure there is
barely any diversity there so i wouldn't
put it
as a diversified data for dna okay
uh however nucleotide sequence
definitely yes
how about this there are four different
types of nucleotides in dna a t
g and c sometimes it's important to know
how many a's how many adenines are there
in dna
how many thymines cytosines or guanines
so
percentage of each nucleotide
that could be some meaningful
information
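As a rough sketch, the percentage of each nucleotide can be computed with Python's standard `collections.Counter`. The function name and the example sequence are assumptions made for this illustration.

```python
from collections import Counter


def nucleotide_percentages(sequence: str) -> dict:
    """Return the percentage of A, T, G and C in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence is empty")
    counts = Counter(seq)
    # Counter returns 0 for bases that never appear in the sequence.
    return {base: 100.0 * counts[base] / len(seq) for base in "ATGC"}


example = "AATTGGGC"  # hypothetical sequence, for illustration only
print(nucleotide_percentages(example))
# {'A': 25.0, 'T': 25.0, 'G': 37.5, 'C': 12.5}
```
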
the other meaningful information here
would be
if i have one dna molecule sequence
how similar that is with the other dna
molecule sequence for example
let's talk about us let's talk about
humans we have
several genes in our body i can take a
common example hemoglobin
it's the protein that carries oxygen in
us
of course it's a protein which is coming
from its own gene
so gene of hemoglobin there
there are several globins in our cells
but that gene
how similar is that gene in its
nucleotide sequence with
mouse hemoglobin if you have that kind
of a question
you need to first obtain human
hemoglobin gene sequence compare it
of course obtain mouse hemoglobin gene
sequence and compare
both of them with each other there is a
scientific word for it
there is a bioinformatic word for it you
got to align
those sequences with each other so
sequence
alignment that could be
a form of data for dna okay
if you can think of something else feel
free to put in the text chat window
okay as we speak so sequence alignment
or percent homology these are some words
that we should keep in mind
between
several
dna molecules
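Once two sequences have been aligned, their similarity is often reported as percent identity. The sketch below assumes the alignment has already been done by some tool, so both strings have the same length and gaps are written as "-". The sequences are toy examples, not real hemoglobin genes, and the function name is made up.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns where both sequences have the same base."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = list(zip(aligned_a.upper(), aligned_b.upper()))
    # Ignore columns where both sequences have a gap.
    columns = [(x, y) for x, y in pairs if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("nothing to compare")
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


seq_one = "ATGGCC-TAA"  # toy aligned sequence with one gap
seq_two = "ATGACCGTAA"  # toy aligned sequence
print(percent_identity(seq_one, seq_two))  # 80.0
```

Real alignment programs also decide where to place the gaps; this sketch only scores an alignment that is already given.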
okay can somebody tell me what kind of
data can we have
for rna we can always go back to dna if
something strikes to us
rna unlike dna
has several types we just learned about
that
so if you can put your thoughts in the
text chat window of zoom
i would much appreciate that
what kind of data can we have for rna of
course nucleotide sequence
any other thoughts
types of rna i love that
yes types of rna for sure
so let's put that right here there are
three types of rna
if i show you just a nucleotide sequence
of rna
i'm not telling you much here you might
ask me is this mrna
is this rrna or trna so type
of rna fantastic
any other thing that you can think of
rna is single stranded so i told you
some peculiarities about rna
rna structure fantastic we are all
learning together
structure
of
rna in parentheses i'm
going to write
folding pattern
transcriptomics yes
so we can have a set of rna molecules in
a cell
all of those rna molecules have
definitely come
from expression transcription of certain
genes
so if we have a question hey here is a
cell
how many different rna molecules are
there in the cell and what is their
sequence
that is transcriptome just like genome
genome is a set of genes in our cells
transcriptome is a set of rna
in our cells so set of
rna in a cell
let's stick to simple english in
parentheses
transcriptome fantastic people
this is going well any other things you
can think of
for rna
translation start point and end points
that holds true for mrna for sure
yes messenger rna the rna molecule that
forms proteins
it has to have some starting point for
protein synthesis and some ending point
which means it has to have a start codon
somewhere okay and it has to have a stop
codon
which tells ribosomes where to start
making protein and where to stop making
that protein
so that whole sequence of rna the whole
sequence of
messenger rna now i'm fine tuning my
words
i'm building upon this answer from start
to stop codon
is called an orf
open reading frame the whole sequence of
messenger rna from start to stop codon
it is imperative to predict open reading
frames for messenger rnas fantastic yes
what else
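A minimal sketch of open reading frame prediction on a messenger RNA sequence: look for the start codon AUG, then read in triplets until the first stop codon (UAA, UAG or UGA). Real ORF finders check more than this. The function name and the example sequence are assumptions for illustration.

```python
START_CODON = "AUG"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def find_first_orf(mrna: str):
    """Return the first start-to-stop stretch read in triplets, or None if there is none."""
    seq = mrna.upper()
    start = seq.find(START_CODON)
    while start != -1:
        # Read codon by codon from this start position.
        for i in range(start, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return seq[start:i + 3]
        # No stop codon in this frame, so try the next AUG.
        start = seq.find(START_CODON, start + 1)
    return None


example = "GGAUGGCCUUUUAAGC"  # hypothetical mRNA, for illustration only
print(find_first_orf(example))  # AUGGCCUUUUAA
```
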
which mrna is expressed in certain
conditions
yes conditional expression
that kind of goes with transcriptomics
but yes conditional expression of
rna is kind of important some
genes are expressed only under
stressed conditions
so what are those genes it's important
to know that
okay so definitely nucleotide sequence
definitely the type
definitely the structure of rna
definitely
the um conditional expression the
transcriptome
that is good any other thing that you
can think of
for rna otherwise we will go to proteins
we can always go back
what about proteins people
what type of data can we have here
protein sequence and structure yes so
let's say
amino acid
sequence shall we that's the primary
structure of every protein simple amino
acid sequence by
their single letter abbreviation
structure
three-dimensional structure of protein
is critically important for its function
and so it is almost very critical
tremendously important to be able to
predict
if i just give you a simple amino acid
sequence of a protein
my question is will you be able to at
least predict
solving a full three-dimensional
structure that takes time
it takes involved and costly
techniques such as x-ray crystallography
or 3-d cryo-electron microscopy etc
before going to that can you at least
predict
has any other organism been shown to
have a similar protein
of the 3d structure so yes we we can
definitely look at the 3d structure
of the protein what else
what else can we look into protein
aha i like that protein function
thank you people i told you proteins
are the most diverse biomolecules in the
cell
and they come with variety of functions
functions are their own
so what is the function of the protein
of our interest
if we have just the sequence of the
protein
if we can predict the three-dimensional
structure of the protein
or just the sequence matches let's say
that we are looking into a human protein
human hemoglobin let's say for example
hemoglobin carries oxygen
let's say that we are looking into human
hemoglobin protein sequence just the
amino acid sequence
if we can somehow
match that sequence with all of the
plant proteins that are known
and if we do see some similarity with
that maybe that protein in plants can also
carry oxygen
maybe just because maybe this is just
prediction and that's what
bioinformatics helps us with
it helps us
to generate meaningful predictions
and we can test those predictions
further more with real experiments
so yes the function of protein
essentially
any other
factor that might matter into biological
data for protein how about this
um conditional again
um
formation of protein
or synthesis
how about that just like rna some
proteins are made
only under certain conditions like most
of the antibodies are made
only when we have infection okay there
are some antibodies that are made even
without
but there are some proteins that are
made only under stress conditions
there are some proteins that are always
being made
homology fantastic yes
amino acid sequence
homology how
similar one protein is to the other
protein
and that kind of thing can
be in rna as well
wherever you have some kind of sequence
we can definitely
have sequence homology sequence
alignment
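As a minimal sketch of what "how similar one sequence is to another" can mean in numbers, the snippet below computes percent identity between two sequences that are already aligned (same length, with "-" marking gaps). The function name and the two peptide fragments are made up for illustration; real alignment tools also have to find the alignment first.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns in which both sequences carry the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both rows are gaps; they carry no information.
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    # A gap opposite a residue counts as a mismatch.
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


# Two made-up, already aligned peptide fragments (illustrative only).
print(percent_identity("MVLSPADKTN", "MVLSGEDKSN"))  # 70.0
```

The same function works for dna or rna, since it only compares characters column by column.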
any other thing you can think of this is
this is going tremendously well people
thank you
thank you for your feedback any other
things
aha protein interactions
that relates to what i just told you few
minutes ago
protein wants to find binding partners
so what other molecules does it bind to
how does it interact in the cell is it
no that brings to another point
cellular
location of
the protein that's also important
some proteins are membrane proteins some
proteins
just remain within the cell some
proteins are secreted out of the cell
okay so where does this protein go that
is also
important any other
thing that you can think of
how about
rna splicing yes
fantastic rna
splicing
in parentheses we can write
alternative splicing
in eukaryotes
i deliberately avoided this because not
everyone knows this is kind of a
complicated topic alternative splicing
but i'm glad somebody mentioned this
fantastic
any other things in protein in fact i
would like to add something in dna now
how about
name of
the gene if we are really looking into
a gene sequence because there could be
several other
dna several other stretches of
nucleotides that may or may not be a
gene
if we are looking into the gene then
it's good to have the name of that gene
its location
remember dna is the genetic material
chromosomal location
where exactly on chromosome that gene is
present
are there any diseases associated with
that gene
what if that gene sequence might have
some mutations in some people
if they do have mutations then what
diseases could they have
so diseases
or disorders
associated with
specific genes
you know what people every single point
that we are putting on this slide
there is a database out there for that
there are databases out there
that correlate the name of the gene with
diseases
there are databases out there that have
um
homology between um several genes
between different organisms we are going
to touch upon some of those
there are databases out there to tell
you protein three-dimensional structure
there are databases out there to analyze
the whole transcriptome
no matter what organism you talk about
so things have advanced
quite far we are going to just touch
upon those databases and just some of
the widely used databases that's the
whole point of today's lecture
okay so this is going well let's move
on let's keep this slide this is a slide
in progress always
okay so now i think it's time for me to
introduce you
to two common biological databases
that house gene sequences
one of them was originated in america
the other one tells us essentially the
same thing but it's originated in europe
okay
so american one is ncbi genbank
national center for biotechnology
information and the european one
is embl bank these two
databases house gene sequences
of course there are protein sequences
that are housed
by these two databases they essentially
are
different versions american and european
version of the same data so there is
redundancy
there is correlation between these two
databases
and there is interrelations in fact
for today we will stick with the
american version of the database just
because it is more user friendly
however we are going to use some cool
tools
from this embl bank okay and
there are many more databases that i
have not even listed on this slide
there are some specific databases what
if
genbank houses the gene sequences from
all organisms that are sequenced like
humans
mice fruit flies worms
plants but what if we just want to look
into fruit fly sequences then there is
a database for that flybase what if we
want to look at just
plant gene sequences then there are some
small databases just for that so there
are some specific ones
but as of now we are going to stick to
genbank and again i'm going to just
stop slideshow for a while and i'm going
to share my browser screen with you
i'm going to show you one
critical thing to do how to search for a
gene sequence
in a genbank okay so let me stop sharing
my screen
and let me be back with my web browser
here we go
what i'm going to do is in the search
window
i'm just going to type ncbi
genbank
that is genbank notice
that there are several sister databases
in genbank
there are nucleotide sequence databases
in which case you will select a
nucleotide
there are genome databases there are
gene expression omnibus geo database
let's start off with gene
this also houses some free textbooks by
the way people there are book databases
as well in there
let's start with gene and you can pretty
much type your favorite organism and the
name of your favorite gene in here
my favorite gene in humans
is actin so let's search for human
actb beta actin
just an example later on we will search
for some other genes as well i'm just
going to show you
how to search for a gene and this is
where you will get a gene card for that
gene
beta actin make sure that we are looking
into human gene
click on that that will take you
to an ncbi page for that
gene the name of the gene right up front
there is some summary you can get some
meaningful information about this gene
okay um you can also get some
information about the expression pattern
of this gene
in which human tissues is this gene
expressed well it tells you ubiquitous
expression
several tissues first of all it is a
protein coding gene
keep going down i'm just going to
quickly scroll if you are a biologist
you will also actually appreciate
this little
interactive browser it tells you a
cartoon of the structure of that gene
and tells you some meaningful
information about how many exons
how many protein making sequences are
there how many introns are there
and um where do they start where do they
end
so as you hover your mouse pointer on
that
it tells you that information keep going
down
if the experiment is done by some people
some scientists out there in the lab
this graph will pop
up and this is the expression data
i like actin gene because it's expressed
in
every single human tissue in every
single pretty much human cell
and that's what this graph tells you
okay you can also change the type of
experiment
over here from the drop down menu and
see several other types of graphs
but again that's reserved for some
advanced things let's move on what other
things
does this webpage show you of course
several
references the people names of the
people that work
on um this data associated conditions
are there any mutations associated with
this gene
if so what kind of diseases or disorders
or syndromes
can humans get you can have that
information right there
okay so without any without going
anywhere on to google
just in this database itself you can do
this for any gene
and there are several other things down
there
what mutations you can have what kind of
interactions does this protein do
to which other proteins it can interact
maybe it can interact with some viral
proteins
so you will find all kinds of
interactions and the associated
research studies listed right here on
this web page
the most important thing that i want to
point out is
what if i want to know this gene
sequence in that case you have to go up
right here where you see this
interactive browser
if you are a biologist again feel free
to look around
but click on this link genbank
that will take you to the sequence of
that gene
and that is the sequence okay we are
getting to that
page it doesn't tell you that it's human
actin
it just tells you that human actin gene
is on human chromosome
7 and of course the earlier page also
had that information
this number though is a unique database
id
for human actin gene so if you are a
researcher working on this gene you
better note down this id
so that you can refer to this same gene
sequence in future you can almost just
put this number
in the search window and you will be
coming directly to this page
this also tells you how long is the
nucleotide sequence
so about um 3454 base pairs
so about 3400 base pairs it's a linear
dna
keep scrolling down it tells you the
names of people who submitted this
sequence
make a note of this section features
it tells you that
of course it's genomic dna all the way
starting from the first nucleotide to
the last nucleotide
it also tells you that it's a gene the
name of that gene is actb
and all of that sequence starting from
first to the last nucleotide
is the same gene it also tells you
the mrna sequence for that gene okay
it asks you to join several nucleotides
to make an mrna
now you might be thinking oh do i have
to manually join these nucleotides no no
no
look look under the subheading
under this mrna transcript id just click
on that
and it will take you to just
the mrna sequence for that gene
there you go mrna
if you scroll down that is just the mrna
sequence of course replace
t's by u's okay
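The two steps just described, joining the listed nucleotide ranges and then replacing t's by u's, can be sketched in a few lines. This is only an illustration: the genomic string and the exon coordinates below are made up, and the ranges are assumed to be 1-based and inclusive, the way a genbank join(...) feature lists them.

```python
def join_segments(genomic: str, segments: list[tuple[int, int]]) -> str:
    """Concatenate 1-based, inclusive (start, end) ranges of a genomic sequence."""
    return "".join(genomic[start - 1:end] for start, end in segments)


def to_rna(dna: str) -> str:
    """Write a dna sequence in the rna alphabet (T becomes U)."""
    return dna.upper().replace("T", "U")


genomic = "ATGGCCGTAAGTTTCAGGATTGA"   # made-up 23-nucleotide "gene"
exons = [(1, 6), (17, 23)]            # hypothetical exon coordinates

spliced = join_segments(genomic, exons)
print(spliced)          # ATGGCCGGATTGA
print(to_rna(spliced))  # AUGGCCGGAUUGA
```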
it also gives you the coding
dna sequence just the exons
of the gene and what would the protein
sequence look like
so this is the protein sequence these
are single letter abbreviations
of the amino acids so every single
information that you need to know
is right there on this page if you want
to know just the coding sequence
separately just click on this external
link
ccds and that will take you to just
this sequence it asks you to manually
join these nucleotides you don't have to
just click on that link and then we have
a full gene sequence starting from first
nucleotide
to the very last nucleotide right here
now notice that this sequence has
numbers
okay nucleotide number of positions what
if you want to work
with this sequence what if you want to
do some analysis with the sequence and
you want to get rid of these numbers
you can do a simple trick just copy this
whole thing
copy and go to this website
this is a fantastic sequence
manipulation
suite online software developed by
university of alberta in canada
it has several free tools
for us to play around okay we are just
going to look at
some of these for example filter dna
whatever other
non-dna characters you might have what
if somebody gives you a word file of dna
sequence
with some other characters you don't
want those characters because those
characters will throw off any
bioinformatics software
in that case just run your sequence
through this
i mean it gives you an example sequence
just clear that off
and paste our sequence in here
and hit submit what it gives you
is the sequence without numbers and now
you can play around with this sequence
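For anyone who would rather do this clean-up in a script, here is a rough sketch of the same idea. It is not the code behind the online tool; it simply assumes that everything other than the letters a, c, g, t and n should be dropped, and it converts the result to upper case.

```python
import re


def filter_dna(raw: str) -> str:
    """Keep only A, C, G, T and N; drop digits, spaces, line breaks and anything else."""
    return re.sub(r"[^ACGTN]", "", raw.upper())


# A made-up fragment laid out the way genbank prints sequences,
# with position numbers and blocks of ten nucleotides.
genbank_style = """
        1 gaattcatgc ccgtaagctt
       21 ggatccaatg
"""

print(filter_dna(genbank_style))  # GAATTCATGCCCGTAAGCTTGGATCCAATG
```

One caveat with this simple approach: ordinary words pasted along with the sequence also contain the letters a, c, g, t and n, so only the sequence block itself should be passed in.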
okay it is also in some kind of a format
which i'll tell you what that format is
i think i'm not really sharing this
screen with you
so let me go
back and share my whole
desktop
i can see your screen before oh you
could
you could see the output i saw your cut
and paste
oh okay um now this is the output here
we go
so now we don't have numbers we have
just the sequence
okay filter dna sequence it gives a name
to it
and i want you guys to notice this sign
the greater than sign that starts off
with the sequence that greater than sign
signifies something
it's a format of a sequence that this
software online software converts our
sequence to it's called as fasta format
and that's what we are going to come to
now so let me unshare my screen
and let me put the powerpoint back up
since we had some introduction about
genes right here
the fasta format
a lot of bioinformatics software doesn't
accept
just the dna sequence it has to be
in this fasta format
we call it fasta and what it means is
whatever sequence you are looking into
put this greater than sign
in front of it and that helps computer
to know
that this is where computer
should start reading the sequence and
now you can list
multiple sequences one after the other
as long as you start every sequence with
a greater than sign
you can even you are allowed to put
certain name for that sequence so
greater than sign
whatever unique name you want to put for
this sequence you can even type
in simple english like human beta actin
something like that
and um you know mouse beta actin
something like that
and don't be under impression that you
have to have a limited number of
nucleotides now
you can go on to like thousands of
nucleotides here
and then put another greater than sign
and put your second sequence
put another greater than sign put your
third sequence
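The layout just described (a greater than sign, a name, then the sequence, repeated for each entry) is simple enough to read with a short script. The sketch below is one possible way to do it; the function name is arbitrary and the two sequences are placeholders, not real actin data.

```python
def parse_fasta(text: str) -> dict[str, str]:
    """Map each '>' header line to the sequence that follows it."""
    records: dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A greater than sign starts a new record; the rest of the line is its name.
            name = line[1:].strip()
            records[name] = ""
        elif name is not None:
            # Sequence lines may be wrapped, so keep appending to the current record.
            records[name] += line
    return records


example = """>human beta actin
ATGGATGATGATATCGCC
GCGCTCGTCGTC
>mouse beta actin
ATGGATGACGATATCGCT
"""

for name, sequence in parse_fasta(example).items():
    print(name, len(sequence))
```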
so that is the fasta format
of a sequence now how can we get that
well fortunately it is easy
for us to get any sequence on genbank
in fasta format genbank has made that
real easy um let me go
back to the browser that we were working
with
and that will be more clear to you i'll
show you right there how you can go to
fasta format of any sequence
from genbank
here we go we are back to that beta
actin gene sequence of humans
on chromosome 7. there is a link here
for every single genbank entry fasta
just click on that
and you will get the whole sequence into
fasta format
there you go you see that familiar
greater than sign you see some name so
all you can do is just copy paste
this sequence into any kind of
bioinformatics software
that you want to use it with you can do
one more thing
if you want a standalone file standalone
fasta file from this sequence all you
got to do is just click send to
file and select format fasta and say
create
file this will actually download
a fasta file of that sequence on your
computer
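The same download can also be scripted. The lecture uses the send to menu on the web page; the sketch below instead assumes ncbi's e-utilities efetch service, which returns a nucleotide record as fasta text when given an accession number. The accession used here (NC_045512.2, the sars-cov-2 reference genome) is only an example; substitute the id shown on the genbank page you are looking at. The download function is defined but not called, so nothing is fetched until you call it yourself.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession: str) -> str:
    """Build an efetch url that asks for one nucleotide record as plain fasta text."""
    params = {"db": "nuccore", "id": accession, "rettype": "fasta", "retmode": "text"}
    return EFETCH + "?" + urlencode(params)


def download_fasta(accession: str, path: str) -> None:
    """Save the fasta record for `accession` to a local file (needs internet access)."""
    with urlopen(fasta_url(accession)) as response, open(path, "wb") as out:
        out.write(response.read())


print(fasta_url("NC_045512.2"))
```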
if you have some software installed on
your computer you can use this
feature okay um
there are several other tools that you
can use in fact you can
sign into ncbi using one of your google
account and that can
save you can have your favorite searches
saved you can play around with this ncbi
it's all free
and in open domain if you want to search
something within this sequence
there is a feature in ncbi to do that
all you have to do is go here find
within this sequence
okay so the most important thing is now
you know
how to obtain fasta sequences
of any dna sequence from genbank
we are going to use that skill and i'm
going to share my screen back with you
and give you the very first assignment
okay and that's going to be kind of
interesting so let's go back to my
powerpoint we are going to play around a
little
now with these dna sequences it's time
for that
here we have i have assembled it's not
me i have just put together
these several coronavirus whole genome
sequences
here we have the sars-cov-2 right at
the bottom
okay that's causing covid we
are living with pandemic these days
a similar one to that is sars-cov of
course there is
one middle eastern variant and there are
several other coronaviruses
and these links will take you to their
full
viral genome sequences on genbank
your job is to obtain them in fasta
format
and then go to this link to align them
with each other and we are going to
actually do a multiple sequence
alignment the software will do it for us
to see which of these viral sequences
are similar to each other
and which of them differ from each other
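To get a feel for what "similar to each other" means before running the web server, here is a toy comparison. It is not the multiple sequence alignment the server performs; it swaps in a much simpler idea, counting how many short words (k-mers) two sequences share, and the three sequences are made up for illustration.

```python
from itertools import combinations


def kmers(seq: str, k: int = 3) -> set[str]:
    """All distinct substrings of length k in the sequence."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: str, b: str, k: int = 3) -> float:
    """Shared k-mers divided by all k-mers seen in either sequence (0 to 1)."""
    ka, kb = kmers(a, k), kmers(b, k)
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


# Made-up toy sequences standing in for viral genomes.
toy = {
    "virus_a": "ATGGCGTACGTTAGC",
    "virus_b": "ATGGCGTACGATAGC",
    "virus_c": "TTTCCAGGAAACCCT",
}

scores = {pair: jaccard(toy[pair[0]], toy[pair[1]]) for pair in combinations(toy, 2)}
closest = max(scores, key=scores.get)
print(closest)  # ('virus_a', 'virus_b')
```

The pair with the highest score would sit next to each other in a tree, and the sequence that shares nothing with them would branch off on its own.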
so let me go to this link real quick
and show you how it kind of looks like
um now again i might have
lost that shared screen but i will
share my entire desktop with you all
so you can see everything in there okay
here we have
the clustal omega web server i'll just
align sample sequences for you
we are aligning dna we just load example
sequences in there notice that they are
in fasta format so when you
copy and paste your whole coronavirus
genomes make sure you paste them with the
fasta format next to each other
okay other than that leave the default
parameters
just like that and just hit submit
and let the server do its job it's going
to take some time to run
and it gives you the alignment when it
gives you the alignment the first thing
you should do is go to this guide
tree and that tells you which two
sequences are highly similar to each
other
and the third one is a little distant so
people start doing this
start gathering those sequences from my
powerpoint slide i'm actually going to
share
um the google drive folder link with you
now it's time to do that
and i'll give you about 5-10 minutes for
this activity and we will further move
on to the next one okay
so let me stop sharing
and go back to my powerpoint
so that you can see
there you go if you can click on these
links you can get those sequences from
right here
otherwise i'm gonna show you
and i'm gonna share that link of google
drive folder with you the same text
you will find as activity one but start
doing this
and if you can get to that cladogram the
guide tree click on guide tree you
can get that
then screenshot it and maybe post it in
the same folder i have given you
edit access for that i'm posting that
link
um in a few seconds
okay here we go
here is the link
to the google drive folder
so this is your time start working on
that alignment
thank you
that
if you guys can get to um
the guide tree cladogram post it either
in zoom text chat
or post it into our google drive folder
right there
as long as we have a couple of responses
we can
safely move on to next activity but i'm
going to give you
a few minutes to do this and then we'll
move on
so we have some questions
dna is a sequence of nucleotides but
percentages of nucleotides
in dna could be a form of data
and same thing holds true for rna as
well
so percent a percent u in rna or
percent
in rna we need to know that kind of a
thing
for certain experiments that we do in
biology again that's beyond this just
introductory lecture
but if you are a biologist or if you are
studying um
advanced biology you will come across
percent
at or percent gc of a dna
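Percent gc and percent at are straightforward to compute once the sequence is a plain string. A minimal sketch, using a made-up eight-nucleotide sequence:

```python
def percent_gc(seq: str) -> float:
    """Share of G and C among all characters of the sequence, in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def percent_at(seq: str) -> float:
    """Share of A and T among all characters of the sequence, in percent."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * (seq.count("A") + seq.count("T")) / len(seq)


print(percent_gc("ATGCGCAT"))  # 50.0
print(percent_at("ATGCGCAT"))  # 50.0
```

For rna the same counting applies with u in place of t.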
somebody said that protein modification
yes
that should be included into protein
data some proteins are
post-translationally modified they
receive phosphate groups
or methyl groups or acetyl groups that
needs to be added into
protein modification good job
fantastic if someone can get cladogram
done
either post it in this text chat window
or
um upload it to that google drive folder
that web server might take some time to
run
because these are huge sequences
so you can just get the job running and
we can move
on after a few minutes with the power
point
and as soon as the
job runs
we can have the outcome of this activity
wow
this same slide is available
as activity one document in our google
drive folder
so um you can get the same links
from that activity one google doc in our
folder
thank you mister
we will give you a minute to get this
job
in clustal omega started and then we will
move on
i have to explain you a couple of other
activities
get the job moving submit the job and
then the web server does the magic
let clustal omega web server do its job
and once the results come we can have a
look at them
you can just screenshot the cladogram
and upload it
maybe put your name as file name that
would be perfect
okay hopefully people have gotten these
links
if you have not then the same document
is on google doc
so for now let's move on a little
keep this job going if you have
submitted the job to clustal omega web
server
let it run it will run on its own and
whenever it's done you will see the
results right in front of you
on that web page this is how the outcome
should kind of
look like okay so
these are the accession numbers of all
of those five
different coronaviruses and here we have
sars-cov and sars-cov-2
now just by those names they should be
similar our common sense can tell us
that but
cladogram wise the nucleotide the genome
sequence wise also
they are the most similar okay then
comes the middle eastern version
and these two other human coronaviruses
are a little distant
so that's how this cladogram is generated
by clustal omega
you yourself will generate a similar
cladogram
okay in a few minutes it
pretty much matches with
biological observation
so that's that's our expected outcome
for this activity
the other thing that we can play around
is this
ncbi blast in previous activity
we aligned individual sequences with
each other
in this using this basic local alignment
search tool
you can have a sequence and align it to
the whole
genome of certain organism okay so if i
have a short nucleotide sequence and i
want to know
do humans have this sequence well just
put it into blast
and blast it against human genome so
let me again uh demonstrate it to you
real quick
by sharing my screen entire screen
rather
so i'll be back in a second with my
whole screen share
and we are going to look at ncbi blast
here we go
blast
basic local alignment search tool go to
nucleotide blast
and you can pretty much put any dna
sequence right here
okay that is the sequence keep this the
same
if you want to search it against
certain organism you can type the name
of that organism right here
for example homo sapiens
there you have it if you want to just
search it
in all of the genomes it houses the
genomes of
every single organism for which the
genomes are sequenced
then just leave it blank and just hit
blast
and you will see the outcome within a
few seconds the job
keeps running within a few seconds it
will be blasted and if there is some
significant similarity
it will tell you exactly what it
matches with so that random sequence
that i
um showed you matches with several of
these things it tells you the name
of either protein or organism and
um the percent homology like hundred
percent
match or fifty percent whatever that is
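The "local alignment" in the tool's name means finding the best matching stretch between two sequences rather than forcing them to match end to end. blast itself uses fast heuristics; the sketch below swaps in the classic exact method for the same idea (smith-waterman), and the scoring values are arbitrary choices for illustration, not blast's settings.

```python
def local_alignment_score(query: str, subject: str,
                          match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best Smith-Waterman local alignment score between two sequences."""
    prev = [0] * (len(subject) + 1)   # previous row of the scoring matrix
    best = 0
    for q in query:
        curr = [0]
        for j, s in enumerate(subject, start=1):
            diag = prev[j - 1] + (match if q == s else mismatch)
            # Scores never go below zero, which lets an alignment start anywhere.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# The short query occurs exactly inside the longer sequence: 7 matches x 2 points.
print(local_alignment_score("GATTACA", "CCCGATTACACCC"))  # 14
```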
so we are going to do
another activity you have five
different patient sequences right here
in your google drive folder so if you go
to google drive
within our folder there is
a file oops i don't see it here
um patient sequences
i can upload it
there you go get this file
this file has um five
dna sequences from patients so now
that you know if a patient is if someone
is infected
with some kind of a pathogen either
bacterial or viral
there are several methods to diagnose
that infection
however if you
are a molecular biologist you can take
the tissue
infected tissue sample and just isolate
dna from it and sequence it
okay so by sequencing you will have
human dna of course
but if that patient is infected by some
other pathogen
you will have that pathogen's genome as
well
so my question to you is take these
sequences individually and blast them
don't put any um don't blast them
against human or any
specific organism just put them in the
query window
of blast right here ncbi blast
so let's go back
to blast nucleotide blast so just copy
and paste you don't even have to copy
that greater than sign just copy and
paste
those sequences one by one though one
patient at a time
make sure you don't put any organism and
just hit blast
if all of the results show you just
homo sapiens then that person is not
infected
one of those patients is infected by
something if somebody can make out
what patient number that is and what is
that pathogen that that person is
infected with
and put that in the text chat window
that would be fantastic
okay so i'm going back to um my
powerpoint
for a while but you have all of those
five patient sequences so start working
on that
on your web browsers one by one put the
patient sequence in here
and click blast you can have multiple
windows going on depending on your
internet speed
of blast and maybe blast all five of
those simultaneously
and see what are the hits if all of the
hits are from homo sapiens
then that person just has you know we
just sequenced
the person's own dna
human dna
but if some other hits pop up
like for example some bacterial name
then
we would like to know what that name is
or if it is some viral name
i would much appreciate if you can put
that name in the text chat window
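The decision rule for this activity can be written down in a few lines. This is only a sketch: it assumes you have already copied the organism name of each blast hit into a list by hand, and the patient labels and hit names below are invented, not the real answer to the exercise.

```python
def non_human_hits(hit_organisms: list[str]) -> list[str]:
    """Organism names among the hits that are not Homo sapiens, without repeats."""
    seen: list[str] = []
    for name in hit_organisms:
        if name.strip().lower() != "homo sapiens" and name not in seen:
            seen.append(name)
    return seen


# Invented hit lists for two made-up patients.
patients = {
    "patient a": ["Homo sapiens", "Homo sapiens", "Homo sapiens"],
    "patient b": ["Homo sapiens", "Example virus X", "Example virus X"],
}

for label, hits in patients.items():
    others = non_human_hits(hits)
    print(label, "possibly infected by " + ", ".join(others) if others else "human dna only")
```

In real blast output a human sequence can also hit closely related species, so a non-human name is a lead to look at, not a diagnosis on its own.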
okay so we will see which patient is
infected and infected by what
and i'll give you
maybe three or so minutes
let's see if somebody can come up
with the answer
patient three is infected by sars-cov-2
fantastic job yes so see
a simple blast search can tell you
several things
fantastic job people thumbs up so within
no time
you guys can tell this answer by using
open source
publicly available tools okay so that is
the skill
that we should learn from these
bioinformatics softwares
we can have real quick answers nothing
or checking discovery
but these in silico experiments the
things that we do on computer
can really help us in biology so in
reality you would actually have to
sample the tissue of the patient
isolate the dna sequence the dna once
you have the sequence with you
this is exactly the thing that you will
do okay so that's the power of
bioinformatics fantastic
people who are still working on that
keep working
we will move on on the powerpoint you
will have these sequences with you trust
me
and later on we will have time for
questions we have couple of
more interesting activities for you guys
to do so let's pick up
our powerpoint right from here and
let's move a little ahead
you just did that another topic now this
might be interesting to some of you
genetic engineering can we
put a human gene and make
tons and tons of a protein from that
gene
by using bacteria
synthetically i mean we if technically
if we want to
maybe make a growth hormone for humans
we cannot isolate it from
humans that has several ethical and
other issues to
deal with however we can take the gene
from
humans and we can put it into bacteria
and let bacteria grow and those bacteria
are very fast to grow
within 24 hours or so you will have
millions of bacterial cells and they
would have produced
tons of the proteins that you want
protein molecules that you want from
that gene
that technology is recombinant dna
technology
of course all of this has to happen in
lab but can we plan it out
on computer the answer to that is yes
and this is how you do that
in order to know how this works we need
to have a couple of background of what
bacteria have
they are simple cells they don't have
nucleus their dna is just in their
cytoplasm
but some bacterial cells have these
special dna molecules called as plasmids
these are circular dna molecules if a
bacteria has
these then typically these plasmids give
them antibiotic resistance so they have
those antibiotic resistant genes on them
okay
other thing that you need to know is now
this this cartoon shows you
this is just one specific
plasmid
puc university of california 19 this is
ampicillin resistance gene
of course it's dna so it will have its
own dna sequence
and of course it has several other
things but an important thing for
us to know is there are these
enzymes that some bacteria make
restriction enzymes
these are sequence specific dna cutters
if you apply a specific enzyme
onto any dna molecule it will search the
whole dna
for that specific nucleotide sequence
that it can cut
and it will make a cut it's almost like
molecular scissors
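The "search the whole dna for one specific sequence" step is easy to mimic on a computer. The sketch below looks for a recognition sequence in a made-up circular molecule; the ecori, bamhi and hindiii recognition sequences are the standard ones, while the function name and the toy plasmid are assumptions for illustration. It reports where each site starts, not the exact bond the enzyme cuts.

```python
def site_positions(dna: str, site: str, circular: bool = True) -> list[int]:
    """0-based start positions of every occurrence of `site` in `dna`."""
    dna, site = dna.upper(), site.upper()
    # For a circular molecule, append the first len(site) - 1 bases so that
    # a site spanning the start/end junction is found as well.
    search = dna + dna[:len(site) - 1] if circular else dna
    return [i for i in range(len(search) - len(site) + 1)
            if search[i:i + len(site)] == site]


toy_plasmid = "TTCAAGGATCCAAGCTTGAA"   # made-up 20-nucleotide circle

print(site_positions(toy_plasmid, "GAATTC"))                  # ecori   -> [17]
print(site_positions(toy_plasmid, "GGATCC"))                  # bamhi   -> [5]
print(site_positions(toy_plasmid, "AAGCTT"))                  # hindiii -> [11]
print(site_positions(toy_plasmid, "GAATTC", circular=False))  # []
```

The ecori site here wraps around the end of the string, which is why it is only found when the molecule is treated as circular. These three recognition sequences read the same on both strands, so searching one strand is enough for them.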
we can use those enzymes these are
isolated from some bacteria it's almost
bacterial defense mechanism
any foreign dna like viral dna enters
into bacteria bacteria can chop that dna
off
of course bacteria protects their dna by
some other mechanism
but humans use these restriction enzymes
for gene cloning
and this is how you do it you take a
plasmid
and you generate something called as
restriction map of that plasmid
that map will actually tell you which
enzymes can cut where
on this plasmid now your job is to
take this plasmid and find out
one enzyme mark my words okay only one
enzyme that can cut only once
because if i cut this plasmid it's a
circular dna
if i cut it three times i'm going to get
multiple pieces out of that plasmid
that's not what i want
i want to either cut it just once or
maybe cut it twice to put
take a fragment out and that's what i
have shown in this gel picture
we actually verify in the lab whether
this has really worked or not
this is the gel picture from my own
research lab where we cut this plasmid
we run it out on the gel
okay so this is just the plasmid not
cut with any restriction enzyme
if i cut this plasmid with two
restriction enzymes
that will that will generate two
fragments one of this fragment
that's cut by two enzymes will be out
that's what's shown here
and the rest is the remaining plasmid
you can actually see that on the gel
but for now for this lecture we can just
focus on one enzyme
let's at least design our experimental
strategy to cut this plasmid with one
enzyme and that will generate these kind
of sticky ends
then what you can do is let's say human
insulin gene this is how
industry prepares insulin these days and
i'm sure in dr bhupathy's lecture last
time you talked about insulin gene
if i want to start a company to make
lots and lots of insulin and sell it to
people
i can take human insulin gene
and clone it into a plasmid i will have
to just plan out
just figure out which enzyme which
restriction enzyme
to cut the plasmid with maybe just one
enzyme that will give a single cut and
just open up this plasmid
then what i have to do is make sure
that that same restriction enzyme does
not cut
does not cut anywhere in the insulin
gene
i will deliberately introduce
by a technique called pcr
um the same restriction enzyme sites at
the end of that gene but it should not
cut anywhere in between
if it does it will cut my gene into two
pieces
i don't want that okay so two things we
need to take care of
that restriction enzyme that we choose
should cut our plasmid just once
and it should not cut the gene that we
want to insert into our plasmid
if you can take care of those two things
chances are good that you will be
successful in your cloning
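Those two rules (cut the plasmid exactly once, do not cut the gene at all) can be checked in a short script. This is a sketch under stated assumptions: the three recognition sequences are the standard ones for ecori, bamhi and hindiii, but the plasmid and gene strings are made up, and the check does not look at whether the single cut lands inside the antibiotic resistance gene.

```python
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def count_sites(dna: str, site: str, circular: bool) -> int:
    """Number of times `site` occurs in `dna` (wrapping around if circular)."""
    dna, site = dna.upper(), site.upper()
    search = dna + dna[:len(site) - 1] if circular else dna
    return sum(1 for i in range(len(search) - len(site) + 1)
               if search[i:i + len(site)] == site)


def usable_enzymes(plasmid: str, gene: str, sites: dict[str, str] = SITES) -> list[str]:
    """Enzymes that cut the circular plasmid exactly once and the linear gene never."""
    return [name for name, site in sites.items()
            if count_sites(plasmid, site, circular=True) == 1
            and count_sites(gene, site, circular=False) == 0]


toy_plasmid = "ACGAATTCTTGGATCCAAAGCTTACGGATCCTT"   # made-up circular plasmid
toy_gene = "ATGAAGCTTCCGTAA"                        # made-up gene to insert

print(usable_enzymes(toy_plasmid, toy_gene))  # ['EcoRI']
```

In this toy case bamhi is rejected because it cuts the plasmid twice, and hindiii is rejected because it would cut the gene.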
and then what we do is this
we take a plasmid the plasmid of our
interest
we digest it with one enzyme let's say
ecori isolated from e coli
cut the plasmid open introduce ecori
sites
into uh at the end of this insulin gene
by pcr this is beyond our lecture right
now how to introduce these sites we know
how to
but what we could do is make sure
that ecori since we use ecori to cut
the plasmid
we should make sure that ecori does
not cut anywhere in human insulin gene
okay if it does we cannot use that
enzyme we have to use some other enzyme
that cuts plasmid just once
once we have that synthetic recombinant
plasmid ready
we can just deliberately put it into
bacterial cells and allow them to grow
they will grow within 24 hours and you
will have lots of insulin protein human
insulin protein made in bacterial cells
then all you have to do is apply your
chemistry techniques purify that insulin
sell it and become a millionaire so
genetic engineering okay
let's have a try of this i'm going to
point out
this database to you addgene it's a
huge
plasmid repository and we are going to
use this plasmid
puc19 okay so i'm going to point out
this link to you it will take you
directly to that plasmid sequence
your job is to tell me which enzyme you
will use
to cut this plasmid in order to insert
human insulin gene into it just the name
of that enzyme
so what you will have to do is go to
this
website addgene look into the puc19
sequence
look into all of the restriction enzymes
the name of the enzymes that cut
this plasmid just once then
you will have to obtain human insulin
gene sequence from genbank
in the same way i told you before okay
and generate a restriction map of human
insulin gene i'll quickly show you how
to do that
we are going to use the neb cutter
software to do that again it's online
okay so follow up on this
link i'm opening it in my web browser
and i will be sharing my entire screen
again so you can see what i
see this is the puc19 sequence it lists
the entire sequence of the plasmid in
fasta format
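For reference, a fasta record is a header line starting with ">" followed by one or more lines of sequence. A minimal Python sketch of reading such text is below; the function name `parse_fasta` and the example record are hypothetical and made up for illustration.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts; drop the leading ">".
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines are joined later, so line wrapping does not matter.
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Made-up record for illustration (not a real plasmid sequence).
example = """>toy_plasmid hypothetical example record
GAATTCGGATCC
AAGCTTATGC
"""

records = parse_fasta(example)
print(records)
```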
now if you go over here say analyze
sequence
this might take a second to load it makes
for you the restriction map of this
plasmid so these are several restriction
enzymes
if you just hover your mouse pointer on
those it even tells you
where and which sequence they cut so
figure out um
which enzymes can cut just once in this
plasmid
of course don't disrupt the antibiotic
resistance gene we want to keep that
intact
um you can go to enzymes right here and
it tells you
how many enzymes can give just one cut
now let me introduce you
how will you oh of course i'm i told you
how to search for human gene sequence
but how will you make a restriction map
of human insulin gene
let's say that you obtain the gene
sequence from genbank
well for that you have to go to
neb cutter i'll post the link in zoom
text chat
window this is where you need to go to
copy paste human insulin gene sequence
right here for now i'm just going to put
a random sequence okay
human insulin gene should be linear so
just click submit and it will generate a
restriction map
for you it will also list single cutters
for a long gene it will list single
cutters it will list zero cutters
so what you need to make sure is the
enzyme that you
picked to cut puc19 does not
cut human insulin gene so it should be
in the zero cutters
that's what you need to make sure okay
i'm posting this
neb cutter link in
the zoom text chat window so you can use
this for human insulin gene
and tell me all you got to tell me
is which enzyme you are going to use
there you go
so somebody is asking how do we deal
with
some post translational modifications
that are
needed for certain human or any other
eukaryotic protein so that is where
the buffers come into picture remember
bacteria will make that protein
in them but companies are going to
isolate that protein and store it
in a buffer that has specific conditions
it essentially mimics the cytoplasmic
environment
of human cells and then they are going
to sell it so that buffer takes care
of proper three-dimensional shape and
they will even include
some kind of cofactors into that buffer
that protein needs to bind to
in order to be active okay so that
that's a really good question
companies do take care of that if really
they are dealing with a protein that has
some special requirements
keep in mind in this activity all you
got to tell
in the text chat window is the name of
that enzyme
which cuts just once in puc19 plasmid
but does not cut in human
insulin gene
that's all we are looking for very
simplistic planning out as of now
just to begin with you can have a very
complicated cloning strategy mapped out
when we really want to do that
experiment you can have a directional
cloning
if you want to insert human insulin gene
in certain direction you can do that
with two enzymes
but to learn let's start simple
and there is no one specific correct
answer there could be
several single cutters in puc19
that don't cut human insulin gene
so as long as we have a couple of
responses in the text chat window
we are almost there um towards the end
of our lecture
we have one little interesting activity
remaining
to do
and that has to deal with proteins so we
played around with some dna
sequences
we also played around with some rna
viral genomes
whole genome sequence alignments we
actually figured out which patient is
infected
with a pathogen and now we have to
do something with protein sequences in
our last activity and we do have time
for that
so maybe wait a minute or two
and see what responses we get from here
sure if it's a single cutter
if it cuts puc19 just once
and if acc1 does not cut
entire human insulin gene anywhere
you can use that like i said there are
several correct answers for
this as long as you are doing it right
you are on the right track to map out
this cloning strategy
again it's not going to be a directional
cloning because we will be using just
one restriction enzyme
and you can introduce acc1 if you
want to use that
you can deliberately introduce acc1
site at the end of human insulin gene
and amplify the whole gene by pcr
that's a different deal but first thing
is to make sure that acc1 does not
cut
anywhere in between human insulin gene
if that's the case we are good to go
okay
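The idea of adding a restriction site to the ends of a gene by pcr can be sketched as primer design: each primer carries the site on its 5' end, followed by the bases that anneal to the gene. The Python below is a rough sketch under assumptions, not a primer design tool: the gene is a made-up toy sequence, `primers_with_site` is a hypothetical helper, eco r1 (GAATTC) is used only as an illustrative palindromic site, and the 18-base annealing length and the two extra "GC" bases in front of the site are assumed defaults.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primers_with_site(gene, site, anneal_len=18, clamp="GC"):
    """Build forward and reverse primers that add `site` to both gene ends.

    Refuses to proceed if the site already occurs inside the gene, because
    the enzyme would then cut the gene into pieces. Assumes a palindromic
    site, so the same string is used on both primers.
    """
    gene = gene.upper()
    site = site.upper()
    if site in gene:
        raise ValueError("site cuts inside the gene; choose another enzyme")
    forward = clamp + site + gene[:anneal_len]
    reverse = clamp + site + reverse_complement(gene[-anneal_len:])
    return forward, reverse


# Toy gene for illustration only (assumption, not the real insulin gene).
toy_gene = "ATGGCTCTGTGGATGAGACTTCTGCCATTGCTCGCACTGTAA"

forward, reverse = primers_with_site(toy_gene, "GAATTC")
print(forward)
print(reverse)
```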
so fantastic at least we have one
response
that is fabulous so
now it's time for me to introduce that
last activity
to you all about proteins um again i'm
going to go
back to my powerpoint for a while
3d structure of each protein is
important
okay so if we are presented with
a random protein sequence just a primary
structure of protein okay
single letter abbreviations of amino
acids
all we know is that this is the amino
acid sequence of the protein
what structure could it possibly have
we can solve the real structure by using
complicated experiments but that takes
time
and money but can we at least predict
the structure
of the protein can we even predict the
name of the protein that this structure
could be very close to
well yes these days we can do that
servers web servers like this can help
us do that so all you have to do
is just copy paste this protein sequence
into this hhpred
web server so just follow
this link i'm going to keep this slide
on
you will see the search window just copy
paste this protein sequence in there
the job is going to take a few minutes
to run
okay but um we will have last 10 minutes
for you to answer
questions and we still have about seven
minutes to end this lecture
so if somebody can tell me what this
protein is and how
its structure could look like a picture
of that structure or how
would it look like maybe put that into
our google drive folder that would be
great
and you have the remaining time to maybe
catch up
on some previous activities or
tell me the outcome of this activity
again
this is just i'm just giving you a
random protein sequence and i am asking
you to model
its 3d structure we don't have to have
any software installed
all we have to do is just use this open
source web server
so nifty tools out there people when
when i used to think about
bioinformatics
i used to have that one apprehension one
fear
that i might need to do some coding well
coding is another aspect in
bioinformatics i would say that it's
little advanced aspect of bioinformatics
if you can code then you can actually
get
some simpler things done on your
computer itself
you don't really need web servers but
what if
you want to get some advanced thing done
for example protein 3d structure
modeling
self coding a program to do that kind of
a thing
is complicated it does require some
advanced knowledge of computer
programming
not everyone has that but they can still
do
the bioinformatic tasks using these
publicly available tools
and that's what bulk of bioinformatics
is about
so i'm waiting for the outcome of this
i'm going to keep this slide on
shared with you and i'm going to have
one eye
on the text chat window
yes it is aquaporin it is the protein
that can transport water molecules
inside
human cells and it's human aquaporin
okay so try to try to click on those
links that come to you as um these
results and those links will take you
to p d b
protein data bank and that's where you
will be able to actually see the
structure of
aquaporin sometimes it's a tetramer of
circular ring-like proteins it's almost
like a membrane channel
this is a membrane protein sits on human
cell membranes
almost all of the cells have it okay and
this is just to transport water
molecules through
human cell membrane otherwise our
membrane
repels water because i'm sure some of
the biology majors you have learned this
membrane is hydrophobic at least from
the inner side of it due to that
phospholipid bilayer
so the question in front of scientific
community was well how does
water go through well proteins like
aquaporin
there are several of them they help
water come
into human cells okay so great job
people
if you still have to catch up on some
activities you can do so
otherwise keep working on this activity
it takes time
depending on the internet it takes time
for the servers to run
sometimes however the good thing about
web servers
is um that once you submit the job to
them
it is their computer that runs it
doesn't really take
the memory of our computer and so they
are fantastic
to get some high-end jobs done in
bioinformatics
so hopefully this lecture gave you
some introduction about fasta format
of sequences we played around
with homology alignment of some dna
sequences
we actually aligned five whole genomes
of
um human coronaviruses we actually
figured out the infected patient
okay just a mock activity just for fun
we actually modeled a three-dimensional
structure of a protein all on the
internet
so all of these fun activities i hope it
gave a primer
on bioinformatics to you all furthermore
if you do have questions we will have
time
to take questions at the end of my
lecture and we are almost there
so thank you so much for having me and
sharing some basics
of bioinformatics with you here is my
contact information though
if you do have questions down the road
don't hesitate to contact me
the best way to reach me is by my email
okay so thank you and this is when i
will take
any questions you may have
okay thank you very much dr rajei and
now we have about 12 minutes to
9 so maybe we can have two or three
question please if you have question you
can raise your hand or
write it in the chat
[Music]
could you please take a look at the
uh the youtube as well maybe they have
question in the youtube
for no uh
where is the chat
you maybe open the mic if you like to
is anybody willing to
raise any question to dr rajei
i really like that people ask questions
as they were doing the activity
yeah i think that some of the
maybe people are still trying to do the
activity because
it takes time yeah
because uh he he did in the chat i just
think there's a number of
and then salon children it might be you
like to
raise a question
or we should address the question in the
chat that has been raised and
previously and you already uh
answered briefly about uh on this
question professor
dr
the question is there any question
no is there any race no
yes sir this uh just i i i just
have a a question to just say
so if we just uh a sequence
uh any any any sequences so then
there will be appear or in the in in
the
the software or or not so i just i just
put
the descriptions any sequences
like the food so what will happen
so uh any sequence as in dna sequence
and what software would you like to put
yeah like the last one so the the the
last one
so this is the last sequence yeah okay
so
uh that's a protein sequence and i named
it as unknown protein
just because of you know allowing
students to find out what protein that
is
but if you can put that same sequence
into the now that's a good question
without even predicting the 3d structure
if you want to know what protein that is
what you could do is um i'm going to
share my screen for a while again here
and we are going to go to
the web browser
what you could do is this
go to ncbi blast
and notice that although we use
nucleotide blast there are several
other types of basic local alignment
search tools
you can go to this blast
and put that copy and paste that protein
sequence right here and hit
search and it will tell you the name of
the protein
so if we are just looking into the name
of the protein that unknown protein you
can get it from right here but if you
are looking into
a possible structure then of course
blast will be of no use to
you
you will have to use those structure
prediction
algorithms okay
thank you hey maybe one question from me
uh dr raj if we
are uh elucidating the the
uh not the structure the sequence of a
protein for example
and we we publish in uh we publish that
that sequence will it be automatically
uh published in a database or do the
person who do the research has to
publish
that uh sequence in database themselves
good question so if you um have
let's say identified a new protein
sequence or even gene sequence yes
you will have to separately go to ncbi
genbank it's a free account you can
create that account
and submit the sequence with your name
and then you can quote
the paper the published paper that um
you know essentially lists
the several experiments that you did to
come up to that sequence but you will
have to
actually submit it yourself to the
database that you want to put into
okay so so uh in other words if we
browse to the database
there might be some information that are
available
in the literature in the in the paper
or journals or everything that has not
been uh
in the database yet yes yes yes
for sure yes so now now
talking about human genes since we have
had
a whole human genome project already
done
most of those genes are sequenced for
humans
but there are several other organisms
their genomes are not sequenced so not
everything is known what databases show
us
is what's known and what's submitted to
them now that brings me to another point
could those sequences have errors
what if i'm sequencing a gene and
by doing the experiment i have a couple
of nucleotides error
in that maybe the real sequence is
different but in my experiment i had
some error
i went i rushed myself and published
that sequence on genbank
then the whole world is essentially
seeing an incorrect sequence
what we are seeing is what people put in
there
so we have to be very careful about what
sequence we are actually using where it
is coming from
and that's why i told the whole audience
spend some time into the features
section see who has put that sequence
how many people have used it
what is the source of that sequence does
the length of that sequence even
correspond to the gene
oh okay so we have to be very critical
about what are in the determination
very careful yes exactly because this is
something that people put and
as we know being chemists or biologists
that experimental errors are always
possible
and if there is an experimental error it
reflects in the sequence
in fact um i can actually show you we
have one more question
i can show you where we can find some
information
so again let me um thanks for that
question it's a really nice question
go back to ncbi
genbank and
here we go i can just pull up any random
nucleotide sequence any sequence
human actb
and i'm just going to click on a random
link
and show you what i mean by that
so this is the genbank accession number
for this sequence
but after the decimal point the number
that you see
is the number of times that sequence has
been revised
so this sequence has been just submitted
once and that's it
sometimes that number could be 14 in
that case
people revise the sequence people some
some other people maybe
they review the errors they correct the
errors and submit the new sequence
so you will see for every submission
this number will keep increasing so
that's the version number of that
sequence
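The accession and the version number after the decimal point can be separated with a few lines of Python. This is a minimal sketch: the identifiers below are made up for illustration and are not real genbank records, and the helper names `split_accession` and `latest_version` are hypothetical.

```python
def split_accession(identifier):
    """Split 'ACCESSION.VERSION' into (accession, version as int or None)."""
    accession, _, version = identifier.partition(".")
    return accession, int(version) if version else None


def latest_version(identifiers):
    """For each accession, keep the identifier with the highest version."""
    best = {}
    for identifier in identifiers:
        accession, version = split_accession(identifier)
        version = version or 0  # treat a missing version as the oldest
        if accession not in best or version > best[accession][0]:
            best[accession] = (version, identifier)
    return {acc: ident for acc, (_, ident) in best.items()}


# Hypothetical identifiers for illustration only.
ids = ["XX000001.1", "XX000001.3", "XX000001.2", "YY123456.14"]

print(split_accession("YY123456.14"))
print(latest_version(ids))
```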
okay so we should use the the
the highest number actually yeah the
latest one
if it's available yeah things are some
i think it's a question from the chat
book
yes it is known that polycistronic mrnas
are common in prokaryotes
yeah it is known then how do we know
which sections correspond to which
expression
in a single strand now that's a
fantastic question
so in polycistronic messenger rnas how
do you know
where the translation starts and where
translation ends
well those start and stop codons those
are universal
okay so even bacteria are going to have
start and stop codons
in their messenger rnas so that i'm
going to put something in the text chat
window o
r f prediction
look into that there are several tools
to predict open reading frames
for each messenger rna although we did
not do this activity due to time
limitations in this one little lecture
you can actually just google the same
thing orf open reading frame prediction
put that polycistronic mrna sequence in
there and it will give you three
if it really codes for three proteins it
will give you three open reading frames
and you can straight away know that well
these are the proteins
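What an open reading frame prediction tool does at its simplest can be sketched in Python: scan each of the three forward reading frames for a start codon and read until the first in-frame stop codon. This is only a minimal sketch under assumptions, not how any particular online tool works: the sequence is a made-up toy message with three short reading frames, `find_orfs` is a hypothetical helper, only ATG is treated as a start codon, and ribosome binding sites and the reverse strand are ignored.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=3):
    """Return (start, end, orf) tuples for ATG...stop ORFs, forward strand only.

    Positions are 0-based, end is exclusive, and the stop codon is included.
    """
    dna = sequence.upper().replace("U", "T")  # accept mRNA or DNA letters
    orfs = []
    for frame in range(3):
        pos = frame
        while pos + 3 <= len(dna):
            if dna[pos:pos + 3] != "ATG":
                pos += 3
                continue
            # Walk codon by codon until a stop codon or the sequence end.
            end = pos + 3
            while end + 3 <= len(dna) and dna[end:end + 3] not in STOP_CODONS:
                end += 3
            if end + 3 > len(dna):
                break  # no in-frame stop: nothing more to find in this frame
            if (end + 3 - pos) // 3 >= min_codons:
                orfs.append((pos, end + 3, dna[pos:end + 3]))
            pos = end + 3
    return sorted(orfs)


# Toy polycistronic message for illustration only (assumption, not real data).
toy_mrna = "CCATGAAATTTTAAGGAGGATGCCCGGGTGACATGGGTTCATAGAA"

for start, end, orf in find_orfs(toy_mrna):
    print(start, end, orf)
```

For this toy message the sketch reports three reading frames, which matches the idea that a message coding for three proteins gives three open reading frames.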
so thanks for that question another
question this is the
application of bioinformatics is still
limited
seeing how much potential it possesses that
you have displayed
how do you suggest it can be applied to
industrial application and which
industry section will be um i would say
biotech industry
would most benefit even chemical
industry would most benefit so that
genetic
engineering section that we uh were
looking into
you can take pretty much a gene from you
can actually take antibiotic gene
from um from um
certain fungi clone it into a plasmid
and make antibiotics
you could do that in some eukaryotic
cells that's possible synthetic
version of any kind of a protein
could be done with bioinformatics at
least planned out experiment planning
could be done with bioinformatic tools
it can be used in chemistry as well
there are some cheminformatic tools
molecular interactions three-dimensional
can this molecule fit into this molecule
that brings me to a concept of drug
design
if i'm generating a drug molecule will
it even act in human body
what are the potential interacting
partners you can predict this
all on computer is it possible to search
for silencing genes such as rnai
yes there are separate databases for rna
interference so yes it is possible
although we did not go through that yes
it is possible
okay we have time limitation yeah that's
this is quite nice and
peaceful i think we have to close the
[Music]
tools that we available play with
further yes i
i tell it again play further because
it's very interesting to play with
them and you're also introducing us to a
different kind of libraries i said the
database is
library for us to
to go further with what is in the cell
and what feature can we do with that and
to conclude with i would like to share
screen
okay so this is actually a part
of a monday morning lectures on
bioprocess engineering and we are on the
second
class next week we will have another
class on fundamental of anaerobic
digestion process and this is uh
actually in conjunction with
80 years of chemical engineering
education in indonesia
so you are all welcome to join the next
class
next week on fundamental of anaerobic
digestion process
in the same time monday morning 7
to 9. so be prepared
get up early and listen to this
interesting talk and next week we will
have professor rama rajvupathi and it
will be hosted
by professor kendra satyadi so i would
like once again thank you all
for listening and joining this lecture
today maybe
pagunter would like to take the picture
of all of us
okay please turn on your camera
then
yeah i will i will stop my share screen