- Our story (and commentary)
- Overview of genome sequencing
- Tools of genome sequencing
Our Story
Fragile X Syndrome is the single greatest genetic cause
of mental retardation in humans. You are a molecular neurologist hoping
to understand the nature of the disease in order to effect its ultimate
cure. The protein responsible for Fragile X Syndrome has long been known.
FMRP is a protein that affects the translation of hundreds of mRNAs. You'd
like to know if there is one mRNA or perhaps a small subset of them that
is responsible for the disease symptoms.
One approach is to systematically disrupt the genes
encoding the affected mRNAs. Such experiments with humans are currently
frowned upon, so you have turned to mice. Unfortunately, FMRP-deficient
mice turn out not to be the ideal system in which to study the basis of
mental retardation. Mice are only subtly affected by the loss of the protein,
and experiments are not easy to perform. Things are not going well.
Then one morning, you wake up and think: Flies. Yes…
flies. Fruit flies. People have been using the fruit fly Drosophila as
a model system for a hundred years. It is easy to do genetic manipulations,
and a great deal is known about the behavior and development of Drosophila.
True, it might be difficult to detect mental retardation in a fruit fly,
but that question shows a certain want of feeling.
You decide to go for it.
You obtain the FMRP sequence, use it to scan the Drosophila
genome for a gene that encodes a similar protein, find it, clone it, mutate
it, put the modified gene back into Drosophila, gain deep insight into
the causes of mental retardation, and book a flight to Stockholm to pick
up your Nobel Prize.
Commentary
The key point in this little tale is contained in this
fragment "… scan the Drosophila genome for a gene that encodes a similar
protein…". Having in hand the genome of Drosophila and hundreds
of other organisms has made possible lines of inquiry that were unthinkable
just a few years ago. Our goal today is to understand how genomic sequences
are obtained and how they may be put to good use. First we'll search the
Drosophila genome (as did the hero of our tale) for a gene encoding an FMRP-like protein.
Then we'll examine the process of sequencing the Drosophila genome.
Understanding how the genome sequence was deduced may illuminate both the
power and the limitations of the resource we have at our disposal.
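To make "scan the genome for a gene that encodes a similar protein" a little more concrete, here is a minimal sketch of the underlying idea: score a query sequence against each entry in a database by local alignment and report the best hit. It is not how BLAST or any real search tool is implemented. It uses the classic Smith-Waterman recurrence with a simple match/mismatch/gap score (real protein searches use substitution matrices), and the query and database sequences are invented toy strings, not FMRP or any Drosophila protein.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score between strings a and b.

    The scoring values are illustrative assumptions, not standard parameters.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def best_hit(query, database):
    """Return (name, score) for the database entry most similar to query."""
    scored = ((name, local_alignment_score(query, seq))
              for name, seq in database.items())
    return max(scored, key=lambda hit: hit[1])


# Toy, made-up sequences (hypothetical; not real proteins).
query = "MKVLAGQWERT"
database = {
    "toy_gene_A": "PPPPGGGGSSSS",
    "toy_gene_B": "AAMKVLSGQWERTCC",
    "toy_gene_C": "TTTTMKVTTTT",
}

hit = best_hit(query, database)
print(hit)
```

Here toy_gene_B wins because it contains a stretch that matches the query at all but one position. A search against a real genome follows the same logic, only with far better scoring and far cleverer shortcuts.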
Overview of genome sequencing
Note that the path leading to Stockholm described in this story
relied on the existence of Drosophila genes and
proteins in an accessible database. Before 2000, no database contained
entries for more than a small fraction of genes and proteins from Drosophila.
Before 1995, no database contained entries for more than a small fraction
of genes from any organism. GenBank and similar databases are now
so rich a source of information because of
the thousands of genome sequencing projects that have sprung up since 1995.
One can break up a genome project in many ways. Here's one:
- Obtain the raw sequence of a genome
- Identify genes within the genome
- Deduce the function of the proteins encoded by the genes
In this module, we focus on the first problem, getting the raw sequence
and figuring out how much of it we really have. To do this, we'll consider as an
example the elucidation of the Drosophila genome, as described in:
Myers EW et al (2000). A whole-genome assembly of Drosophila. Science 287:2196-2204.
You'll eventually want to digest much of this article. For now, I just want to make sure that you can obtain it.
If you need help getting the article, consult How to Find Articles. If you're having a problem getting this article, solve it! Now! You won't be able to get anywhere in this course if you can't find articles.
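Returning to the question of how much of the sequence we really have: a rough feel for it comes from a standard idealized model (Lander-Waterman) in which reads land at uniformly random positions. If N reads of length L are drawn from a genome of length G, the average depth is c = N*L/G and the expected fraction of bases covered by at least one read is about 1 - e^(-c). The sketch below simply evaluates that formula. The genome size, read length, and read counts are illustrative assumptions, not figures taken from the Myers et al. paper.

```python
import math


def coverage_depth(num_reads, read_length, genome_length):
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_length / genome_length


def expected_fraction_covered(depth):
    """Expected fraction of bases covered at least once, assuming reads
    start at uniformly random positions (idealized model)."""
    return 1.0 - math.exp(-depth)


# Hypothetical numbers chosen only for illustration.
genome_length = 1_000_000   # bases
read_length = 500           # bases per read

for num_reads in (2_000, 10_000, 20_000):
    c = coverage_depth(num_reads, read_length, genome_length)
    covered = expected_fraction_covered(c)
    print(f"{num_reads:>6} reads  depth {c:4.1f}x  "
          f"expected fraction covered {covered:.5f}")
```

The point to take away is that coverage improves quickly at first and then ever more slowly: under this model, no finite amount of random sequencing guarantees that every base has been seen.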
Tools of genome sequencing
The main task for today is to understand some of the techniques used in the paper. I know you are capable of finding background on the web, but I've saved you some trouble by gathering together some useful links (I won't always be so helpful). Use them or anything else you like to get the basic idea.
What is shotgun sequencing?
What is dideoxy sequencing?
What are BAC libraries? What are P1 inserts?
- Monaco AP and Larin Z (1994). YACs, BACs, PACs, and MACs: Artificial chromosomes as research tools. Trends in Biotechnology 12:280-286.
- Shizuya H et al (1992). Cloning and stable maintenance of 300-kilobase-pair fragments of human DNA in Escherichia coli using an F-factor-based vector. Proceedings of the National Academy of Sciences USA 89:8794-8797.
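As a companion to the question "What is shotgun sequencing?", here is a toy sketch of the core idea: the genome is read as many short, overlapping fragments, and the full sequence is reconstructed by finding where the fragments overlap. The greedy merging strategy, the function names, and the 25-base "genome" are all assumptions made for illustration. This is not the assembler described in the paper, and it ignores the things that make real assembly hard, such as repeats and sequencing errors.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(reads)
    while True:
        # Drop exact duplicates, then contigs contained inside another contig.
        contigs = list(dict.fromkeys(contigs))
        contigs = [c for c in contigs
                   if not any(c != d and c in d for d in contigs)]
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


# Toy "genome" (invented) cut into overlapping 10-base reads.
genome = "ATGGCGTACGTTAGCCTAGGATCCA"
reads = [genome[start:start + 10] for start in range(0, 16, 5)]

contigs = greedy_assemble(reads)
print(reads)
print(contigs)
```

With these reads the four fragments merge back into the original 25-base sequence. If the fragments did not overlap, or if the sequence contained long repeats, the result would be several separate contigs or a wrong reconstruction, which is part of why the techniques in the questions above matter.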