HAM Surname DNA Project
Research through Genetics
Instructions for creating TMRCA Phylogenetic charts from DNA information
I have to thank L. David Roper for posting his instructions, and Dean McGee as well, for making this easy for us!
September, 2005
by Dave Hamm, Novi, MI
TOOLS:
Whit's Y-Haplogroup Predictor by Whit Athey
Y-DNA Comparison Utility by Dean McGee (for the classical view).
Y-DNA Comparison Utility by Dean McGee for the extended number of markers (i.e., 67) from FTDNA.
PHYLIP software to generate phylogenetic tree data.
- Instructions for applying PHYLIP to DNA data, by L. David Roper from the ROPER DNA Project
MEGA software for generating Phylogenetic and Network Tree graphs from the PHYLIP data.
Instructions:
This is my version of L. David Roper's set of instructions. I have adapted them for TMRCA and stepped through the process in a bit more detail.
For more detailed instructions, see the documentation that comes with each individual package.
- Preparation:
1) Bookmark the Y-Haplogroup Predictor by Whit Athey and the Y-DNA Comparison Utility by Dean McGee (both listed above under "TOOLS").
2) Download and install the PHYLIP and MEGA software tools listed above.
3) Bookmark the instructions given above from the ROPER DNA Project.
4) Make sure that your data is in a format acceptable to Dean McGee's Y-DNA Comparison Utility. You might as well make it acceptable to the PHYLIP software too, which will be used later.
If you can cut-and-paste directly from the FTDNA web page into Dean McGee's Y-DNA Comparison Utility (and it works), then you can skip this step.
Many folks can just cut-and-paste the data from the FTDNA page, but others also have data from other vendors or from the National Genographic Project. If your data is not already formatted for you, you MUST arrange it in a text file with the following format:
23651 13 23 17 9 12 12 11 13 13 14 11 29 16 8 9 11 11 24 15 21 29 11 14 14 15
23921 13 25 14 11 11 13 12 12 12 13 14 30 18 9 10 11 11 25 15 18 30 15 16 16 17 11 12 19 23 17 16 17 17 37 37 12 12
where the first field (as in "23651") is the participant User ID number. This could instead be a name and haplotype group, for example. But please remember that later, when we use the PHYLIP program, it expects the first field not to exceed 10 characters. So be sure not to exceed 10 characters in that first field, before the DYS data begins on the line.
You will notice that the above data has 25 marker results for User ID "23651", whereas User ID "23921" has a 37 marker test. So here, be sure to wrap your line if it exceeds the width of the text editor window.
I usually call this raw data file something like SURNAME_raw_data.txt
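Before pasting the raw data anywhere, it can help to check it mechanically. The following is a minimal Python sketch, not part of any of the tools above. It assumes one participant per line, whitespace-separated fields, and an ID with no spaces in it; the function name parse_haplotypes is hypothetical.

```python
def parse_haplotypes(text):
    """Parse raw data lines: an ID field followed by marker values.

    Assumption: one participant per line, fields separated by whitespace.
    Returns a dict mapping each ID to its list of marker values.
    """
    records = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        user_id, values = fields[0], fields[1:]
        if len(user_id) > 10:
            # PHYLIP expects the first field not to exceed 10 characters.
            raise ValueError(
                f"line {line_no}: ID {user_id!r} exceeds 10 characters"
            )
        records[user_id] = values
    return records


sample = (
    "23651 13 23 17 9 12 12 11 13 13 14 11 29 16 8 9 11 11 24 15 21 "
    "29 11 14 14 15\n"
    "23921 13 25 14 11 11 13 12 12 12 13 14 30 18 9 10 11 11 25 15 18 "
    "30 15 16 16 17 11 12 19 23 17 16 17 17 37 37 12 12\n"
)
records = parse_haplotypes(sample)
for user_id, values in records.items():
    print(user_id, len(values), "markers")
```

Run on the two sample participants, this reports 25 markers for 23651 and 37 markers for 23921, matching the description above.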
- Procedure:
5) Cut-and-paste the data into Dean McGee's Y-DNA Comparison Utility.
- Visit Dean McGee's Y-DNA Utility on the internet.
- select:
Y-DNA Comparison Utility, FTDNA Mode for Y-Search mode
Y-DNA Comparison Utility, FTDNA Mode for FTDNA mode
If your limit is 67 FTDNA markers, then under the "Generate Tables" box, type in "67" for the "Max alleles."
If you need to cover special cases (such as DYS464e), then you will need to modify the "Marker exists" boxes and include some character in your data (such as a minus sign, "-") where DYS464e does not exist. So, in some cases, you may need to pay attention to what is selected for the marker list here as well as what is in your data.
- select, in the LEFT and MIDDLE columns:
- For Genetic Distance, use "Infinite allele mutation model"
- de-select "modal haplotype"
- Probability: 95 %
- select "Show Mutation Rates"
- under TMRCA, select "Generate PHYLIP data"
- mutation rate: FTDNA at 0.004
- For "Units" select "Years" and type in "25" years/generation
- click on "execute"
If there are any errors (in the "Debug" window), then check your data very closely, or go back to step 4). If you have problems, Dean McGee's instructions can be found at the bottom of the Utility.
If you have a large file, you may observe that the first line of Dean McGee's Y-DNA Utility gives a "Status" box indicating the number of lines that are being analyzed. (That line count will automatically be plugged in as the count of the total number of lines in the next step.) For very large files, you may want to de-select things that you do not need, such as "Generate Fluxus Data" and/or "Show Mutation Rates."
If you are prompted with a popup about system resources, click on "cancel" to enable the program to keep on running.
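To give a feel for the "Infinite allele mutation model" selected in step 5: as that model is commonly described, every marker on which two participants differ counts as one mutation, however many repeats apart the values are. The Python sketch below illustrates only that counting rule. It is not Dean McGee's code, the function name and the toy values are hypothetical, and it does not model how the Utility treats missing or multi-copy markers.

```python
def infinite_allele_distance(a, b):
    """Count the markers on which two haplotypes differ.

    Each differing marker adds 1, regardless of the size of the difference.
    Assumption: only the markers both participants tested are compared,
    taken in the same order.
    """
    return sum(1 for x, y in zip(a, b) if x != y)


# Toy haplotypes (hypothetical values, three markers each).
first = [13, 23, 14]
second = [13, 25, 14]   # second marker differs by two repeats
third = [12, 25, 15]    # differs from `first` on all three markers

print(infinite_allele_distance(first, second))
print(infinite_allele_distance(first, third))
```

The two-repeat difference between the first pair still contributes a single step, which is the point of the model.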
6) When you have a successful run from Dean McGee's Y-DNA Utility,
- scroll down to the very bottom of the page to the section titled "Time to Most Recent Common Ancestor (Years)"
- Select all of the data in the "PHYLIP compatible TMRCA table."
- select "copy" from the "Edit" menu item.
- paste this into a new text file called infile_SURNAME_95.txt, using notepad.
- Then, save and exit notepad.
- Convert the data to phylogenetic tree format:
7) Copy the "infile_SURNAME_95.txt" to the PHYLIP executable area.
DISCUSSION:
The "distance.html" documentation that accompanies the PHYLIP package explains:
FITCH and the Neighbor-Joining option of NEIGHBOR fit a tree which has the branch lengths unconstrained. KITSCH and the UPGMA option of NEIGHBOR, by contrast, assume that an "evolutionary clock" is valid, according to which the true branch lengths from the root of the tree to each tip are the same: the expected amount of evolution in any lineage is proportional to elapsed time.
So, basically, for TMRCA, we would want to use either the:
- "neighbor" program with the "UPGMA" option, or
- "kitsch" program with the default "Fitch-Margoliash" option.
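For the curious, the clock assumption is easiest to see in UPGMA itself. Below is a minimal Python sketch of the textbook UPGMA procedure: repeatedly join the two closest clusters, place the join at half their distance, and average distances by cluster size. It is an illustration only and not PHYLIP's implementation; tie-breaking and output formatting in "neighbor" may differ, and the function name and the three-tip matrix are hypothetical.

```python
from itertools import combinations


def upgma(names, matrix):
    """Build a clock-like tree from a full symmetric distance matrix.

    names: list of tip labels; matrix: list of lists of distances.
    Returns the tree as a Newick string with branch lengths.
    """
    # Active clusters: id -> (newick_text, tip_count, height_above_tips)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    dist = {frozenset((i, j)): float(matrix[i][j])
            for i, j in combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        # Join the two closest active clusters.
        i, j = min(combinations(sorted(clusters), 2),
                   key=lambda pair: dist[frozenset(pair)])
        text_i, n_i, h_i = clusters.pop(i)
        text_j, n_j, h_j = clusters.pop(j)
        # Clock assumption: the join sits equally deep on both sides.
        height = dist[frozenset((i, j))] / 2.0
        text = f"({text_i}:{height - h_i:g},{text_j}:{height - h_j:g})"
        # Distance from the merged cluster is the size-weighted average.
        for k in clusters:
            dist[frozenset((next_id, k))] = (
                n_i * dist[frozenset((i, k))] + n_j * dist[frozenset((j, k))]
            ) / (n_i + n_j)
        clusters[next_id] = (text, n_i + n_j, height)
        next_id += 1
    text, _, _ = next(iter(clusters.values()))
    return text + ";"


# Hypothetical distances: A and B are close, C is farther from both.
tree = upgma(["A", "B", "C"], [[0, 2, 6], [2, 0, 6], [6, 6, 0]])
print(tree)
```

In the resulting tree every tip ends up the same total distance from the root, which is exactly the "evolutionary clock" property the PHYLIP documentation describes.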
PROCEDURE:
- Run the "kitsch" program within the PHYLIP package.
You could alternatively run the "neighbor" program, but L. David Roper indicates that "kitsch" is more accurate.
For the "neighbor" program, L. David Roper suggests that you use the UPGMA option (Unweighted Pair Group Method with Arithmetic Mean).
For the "kitsch" program, take the "Fitch-Margoliash" method.
- When prompted for an input file name, indicate the input file to be: infile_SURNAME_95.txt
- When prompted for an output file name,
- type "f" to write to a new File
- When prompted for a new file name, indicate: outfile_SURNAME_95.txt
- For the "kitsch" program, take the "D" option for the "Fitch-Margoliash" Method
For the "neighbor" program, select the "UPGMA" Method.
- Select option "L" for "Lower triangular data matrix"
- Select option "J" for "Randomize input order of species"
- when prompted for a "Random number seed (must be odd)," enter: 9
- when prompted for "Number of times to jumble" enter:
for small sets of data (around 20), use: 99
for large sets of data, use: 11
The number of jumbles affects how long large sets of data take to execute. A lower number for large data sets means less computer time is required.
For example, 99 jumbles on 150 lines of data could take over 6 hours to execute on a 3 GHz computer, while 11 jumbles on 150 lines of data could take over 1 hour on the same computer.
A larger number of jumbles for smaller sets of data will increase the accuracy.
- Type "y" to start the calculations.
- when prompted for an "outtree" file name,
- type "f" to write to a new file
- enter: outtree_SURNAME_95.txt
NOTE: If you get an error similar to:
diagonal element of row 3 of distance matrix is not zero.
Is it a distance matrix?
then you probably either forgot to remove the "modal" column from the input data, or forgot to enter the number of lines in the data.
Exit the kitsch program and go back to Step 6).
NOTE: If you get an error similar to:
end-of-line or end-of-file in middle of species name for species xxx
then your line count is probably wrong.
Exit the kitsch program and go back to Step 6).
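Both errors tend to come down to the line count at the top of the input file or to an over-long name field, so a quick check before starting kitsch can save a run. The Python sketch below is a hypothetical helper, not part of PHYLIP. It assumes the first non-blank line starts with the number of rows, that each following non-blank line is one complete row (no rows wrapped onto a second line), and that names contain no spaces.

```python
def check_phylip_infile(text):
    """Return a list of problems found in a PHYLIP distance-matrix input.

    Assumptions: the first non-blank line starts with the row count, each
    later non-blank line is one row, and names contain no spaces.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return ["file is empty"]
    try:
        declared = int(lines[0].split()[0])
    except ValueError:
        return ["first line does not start with the number of rows"]
    rows = lines[1:]
    problems = []
    if declared != len(rows):
        problems.append(f"header says {declared} rows, found {len(rows)}")
    for row in rows:
        name = row.split()[0]
        if len(name) > 10:
            problems.append(f"name {name!r} is longer than 10 characters")
    return problems


# Hypothetical three-row lower-triangular input.
good = "    3\nAlpha\nBeta       2\nGamma      6 6\n"
bad = "    2\nAlpha\nBeta       2\nGamma      6 6\n"
print(check_phylip_infile(good))
print(check_phylip_infile(bad))
```

An empty list means none of these particular problems were found; it does not guarantee that kitsch will accept the file.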
If the data has not been sorted into groups, this can take a while to run for large sets of data.
- View the "tree" formatted output:
8) Rename the output file "outtree_SURNAME_95.txt" to "outtree_SURNAME_95.tre"
9) If you have MEGA installed, then you can simply double click on the file name. Otherwise, start MEGA and select your *.tre file.
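If you repeat this for many surnames, the rename in step 8 can be scripted. Here is a minimal Python sketch using the standard pathlib module; the function name rename_to_tre is hypothetical, and the demo runs in a temporary directory with a made-up one-line tree so that no real files are touched.

```python
import tempfile
from pathlib import Path


def rename_to_tre(path):
    """Rename an outtree text file so that it carries the .tre extension."""
    source = Path(path)
    target = source.with_suffix(".tre")  # swaps only the final extension
    source.rename(target)
    return target


# Demo in a throwaway directory (hypothetical file contents).
with tempfile.TemporaryDirectory() as tmp:
    outtree = Path(tmp) / "outtree_SURNAME_95.txt"
    outtree.write_text("(A:1,B:1);\n")
    renamed = rename_to_tre(outtree)
    demo_name = renamed.name
    demo_exists = renamed.exists()

print(demo_name)
```

On Windows the rename fails if a file with the target name already exists, so remove or rename an older .tre file first.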
10) Cut-and-paste from the MEGA software package by selecting
- "Image" from the main menu items
- select "Copy to clipboard"
Paste this into your favorite paint package and save it in JPEG format.
- Dave Hamm, September, 2005 (updated October, 2006)
References:
Calculating Time to Most Recent Common Ancestor (TMRCA) by Dr. Bruce Walsh, University of Arizona, FTDNA's Advisory Board
Time to Most Recent Common Ancestor (PDF file) by Dr. Bruce Walsh, University of Arizona. 2001. Published by the Genetics Society of America. The math behind the theory, for the more curious.
Haplogroups of the World - Doug McDonald's Map (PDF file) of the distribution of Y-DNA and mtDNA
Instructions for applying PHYLIP to DNA data, by L. David Roper from the ROPER DNA Project