software for generating Phylogenetic and Network Tree graphs from the
PHYLIP data. Instructions:
This is my
version of L. David Roper's set of instructions. I have
adapted them for TMRCA and broken them into smaller steps. For more
detailed instructions, see the documentation that comes with each
1) Bookmark the Y-Haplogroup Predictor by Whit Athey and the Y-DNA
utility by Dean McGee (both listed above as the "tools").
2) Download and install the PHYLIP and MEGA software tools
listed above.
3) Bookmark the Instructions given above from the ROPER DNA
4) Make sure that your data is in a format acceptable to Dean
McGee's Y-DNA Comparison Utility.
You will also want to make it acceptable to the PHYLIP
software, which will be explained later.
If you can cut-and-paste directly
from the FTDNA web page into Dean McGee's Y-DNA Comparison Utility (and
it works), then you can skip this step.
Many folks can just cut-and-paste the
data from the FTDNA page, but others also have data from other vendors
or from the National Genographic Project. For those who do not
have their data already formatted for them, you MUST arrange it in a
text file with the format following the FTDNA notation:
where the first field (as in "43270 WmNC") is the participant
User ID number and an abbreviated description. This could be a name and haplotype group, for example.
But, please remember that later, when we use the PHYLIP program, it
expects the first field not to exceed 10 characters. So, be sure not to
exceed 10 characters in that first field before the DYS data begins for
The newer version of McGee's Utility will only accept the FTDNA format
with the "minus" ("-") notation for multiple alleles. This is a new
format from FTDNA; McGee's older version had no "-" separator.
You can place this format into a notepad file, and later cut-and-paste into McGee's Utility.
Two simple things to remember for the cut-and-paste:
1) You must save off to notepad and exit before opening again for cut-and-paste.
2) McGee's Utility expects each line to start with a kit number or ID.
So, after cut-and-paste, check that each line in McGee's Utility begins
with a kit ID.
- Some of your data may have a mixture of 37-marker
results, 67-marker results, and 111-marker results. So here, be sure to wrap your line if it exceeds the width of
the text editor window.
a) The first field should contain 10 characters, and can be padded with
spaces. If you are dealing with data from FTDNA, remove the column for
the SNP haplotype group.
b) McGee's new Y-DNA 111
marker version of the Utility expects to see the "new" data format from
FTDNA. That format now places the palindromic markers (YCAIIa/b, CDYa/b,
DYS464a/b/c/d, etc.) into one column, with the values separated by the
minus sign ("-"). This means the new data format from FTDNA will
contain minus signs instead of the old space (" ") character between
some of the data. For the data from FTDNA that is separated by the minus
sign, McGee's Utility no longer accepts the space (or tab) character.
So, be sure to include the minus sign characters if you are using the
new Y-DNA 111 marker mode.
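The two formatting rules above (a first field of exactly 10 characters, and minus signs between the values of a multi-copy marker) can be sketched in a few lines of Python. This is only an illustration and not part of McGee's Utility or PHYLIP. The function names, the sample values, and the column position of the multi-copy marker are all made up for the example; check the real column positions against your own data.

```python
def pad_label(label: str, width: int = 10) -> str:
    """Return the ID field padded with spaces to exactly `width` characters.

    Raises ValueError instead of silently truncating, so that two
    participants never end up with the same shortened label.
    """
    label = label.strip()
    if len(label) > width:
        raise ValueError(f"label {label!r} is longer than {width} characters")
    return label.ljust(width)


def join_multicopy(values: list[str], groups: list[tuple[int, int]]) -> list[str]:
    """Join the values of each multi-copy marker into one "-" separated column.

    `values` is the old-style list in which every copy has its own column.
    `groups` holds hypothetical (start, count) pairs, 0-based, naming where
    each multi-copy marker begins and how many copies it has.
    """
    starts = dict(groups)
    out = []
    i = 0
    while i < len(values):
        count = starts.get(i, 1)  # ordinary markers occupy one column
        out.append("-".join(values[i:i + count]))
        i += count
    return out


# Made-up example: the values at positions 4 and 5 belong to one marker.
old_style = ["13", "24", "14", "11", "11", "14", "12", "12"]
new_style = join_multicopy(old_style, [(4, 2)])
line = pad_label("43270 WmNC") + " " + " ".join(new_style)
```

Here `line` starts with the 10-character ID field, followed by the marker values with "11-14" in a single column.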
Cut-and-paste this text data into a text document ("filename.txt") for later use. I suggest calling this raw data file something like
- Visit Dean McGee's Y-DNA Utility on the internet.
If you need to cover special cases (such as DYS464e), then you will
need to modify the "Marker exists" boxes, and include some character in
your data (such as minus "-" ) where DYS464e does not exist. So,
in some cases, you may need to pay attention to what is selected for
the marker list here as well as what is in your data. In general, his utility will accept a "minus" for data that does not exist.
For Genetic Distance, use "Hybrid" mutation
- de-select "modal
- Probability: 95 %
- under TMRCA, select "Generate PHYLIP data"
- mutation rate: FTDNA
- For "Units" select "Years" and type
in "25" years/generation
Cut-and-paste the data into Dean McGee's Y-DNA
If you have pasted correctly, each line should start on the left side of the input box with the kit number or name.
- click on "execute"
If there are any errors (in the "Debug" window),
then check your data very closely, or go back to step 4).
If you have problems, his instructions can be found at the bottom of
If you are interested in experimenting with
individual marker mutation rates, the new 111 marker version lets you
"Execute Setup" for that purpose. (A normal "Execute" will generate the
default setup for you, as the "Show Setup" button is enabled by default.)
If you have a large file, you
may observe that the first line of Dean McGee's Y-DNA Utility will give
a "Status" box, and will indicate the number of lines that are being
analyzed. (That line count will automatically be plugged in as
the total number of lines in the next step.) For very large
files, you may want to de-select things that you do not need, such as
"Generate Fluxus Data" and/or "Show Mutation Rates."
If you are
prompted with a popup about system resources, click on "continue" to
enable the program to keep on running.
When you have a successful run from Dean McGee's Y-DNA Utility,
- scroll down to the very bottom of the page to the section
titled "Time to Most Recent Common Ancestor (Years)"
- Select all of the data in the "PHYLIP
compatible TMRCA table."
- select "copy" from the "Edit" menu item.
- paste this into a new text file called
- Then, save and exit notepad.
Convert the data to phylogenetic tree format:
Copy the "infile_SURNAME_95.txt" to your PHYLIP executable area. (You should have the PHYLIP package saved to a folder on your computer disk.)
The "distance.html" documentation that accompanies the PHYLIP package
FITCH and the Neighbor-Joining option of NEIGHBOR fit a tree which has
branch lengths unconstrained.
KITSCH and the UPGMA option of NEIGHBOR, by
contrast, assume that an "evolutionary clock" is valid,
according to which the
true branch lengths from the root of the tree to each tip are the same:
the expected amount of evolution in any lineage is proportional to elapsed time.
So, basically, for TMRCA, we would want to use either the:
- "neighbor" program with the "UPGMA" option, or
- "kitsch" program with the default "Fitch-Margoliash" option.
(I use the "kitsch" program.)
- Run the "kitsch" program
within the PHYLIP package.
You could have alternatively run the "neighbor" program, but L. David
Roper indicates that "kitsch" is more accurate.
For the "neighbor" program, L. David Roper suggests that you use the
"UPGMA" option (Unweighted Pair Group Method with Arithmetic Mean).
For the "kitsch" program, take the "Fitch-Margoliash" method.
- When prompted for an input file name, indicate the input file
to be: infile_SURNAME_95.txt
- When prompted for an output file name,
- type "f" to write to a new File
- When prompted for a new file name,
- For the "kitsch" program, take the "D" option for the "Fitch-Margoliash" Method
For the "neighbor" program, select the "UPGMA" Method.
- Select option "L" for "Lower triangular data matrix"
- Select option "J" for "Randomize input order of
- when prompted for a "Random number seed (must be odd),"
- when prompted for "Number of times to jumble" enter:
small sets of data (around 20), use:
large sets of data, use:
The number of jumbles will affect the length of time it takes to
execute large sets of data.
A lower number for large data sets will mean less computer time.
For example, 99 jumbles on 150 lines of data could take over 6 hours to
execute on a 3 GHz computer, while 11
jumbles on 150 lines of data could take over 1 hour to execute on a 3 GHz
computer.
A larger number of jumbles for smaller sets of data will increase the
- Type "y" to start the calculations.
- when prompted for an "outtree" file name,
- type "f" to write to a new file
If you get an error similar to:
ERROR: diagonal element of row 3 of distance matrix is not zero.
Is it a distance matrix?
then you probably either forgot to remove the "modal" column from the
or forgot to enter the number of lines in the data.
Exit the kitsch program and go back to Step (6).
If you get an error similar to:
ERROR: end-of-line or end-of-file in middle of species name for species xxx
then your line count is probably wrong.
Exit the kitsch program and go back to Step (6).
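Since both errors above can come from a wrong line count, it may save a failed run to compare the count at the top of the input file with the number of data lines before starting kitsch. The following Python sketch is not part of PHYLIP. It assumes the count sits alone on the first non-blank line and that every participant occupies exactly one line, which may not hold if your rows are wrapped. The function name and the sample data are hypothetical.

```python
def check_line_count(text: str) -> tuple[int, int]:
    """Return (declared, actual) line counts for a PHYLIP-style distance file.

    Assumes the first non-blank line starts with the declared count and
    that each following non-blank line is one participant's row.
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("file is empty")
    declared = int(lines[0].split()[0])
    actual = len(lines) - 1  # everything after the count line
    return declared, actual


# Made-up lower triangular sample with three participants.
sample = "3\nKit1      \nKit2       150\nKit3       300 225\n"
declared, actual = check_line_count(sample)
if declared != actual:
    print(f"count line says {declared}, but found {actual} data lines")
```

To use it on a real file, read the file's contents into a string and pass that string to `check_line_count`.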
If the data has not been sorted into groups, this can take a while to
run for large sets of data.
- View the "tree" formatted output:
8) Rename the output file "outtree_SURNAME_95.txt" to
9) If you have MEGA installed, then you can simply double-click
on the file name.
Otherwise, start MEGA and select your *.tre file.
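If MEGA refuses to open the file, a quick sanity check of the tree text can help narrow down the problem. PHYLIP writes its tree file in Newick (nested parentheses) format, so the Python sketch below only checks that the parentheses balance and that the text ends with a semicolon. It is a rough check under those assumptions, not a full Newick parser: it ignores quoted labels, and the function name and sample tree are made up for illustration.

```python
def looks_like_newick(text: str) -> bool:
    """Rough check: balanced parentheses and a closing semicolon."""
    text = text.strip()
    if not text.endswith(";"):
        return False
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ")" appeared before its matching "("
                return False
    return depth == 0


# Made-up three-participant tree with branch lengths.
sample_tree = "(Kit1:100.0,(Kit2:50.0,Kit3:50.0):50.0);"
ok = looks_like_newick(sample_tree)
```

A result of `False` suggests the file was cut off or edited by accident; a result of `True` does not guarantee that MEGA will accept it.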
In order to copy your phylogenetic tree from MEGA into a paint package:
Cut-and-paste from the MEGA software package by selecting:
- "Image" from the main menu items
- select "Copy to clipboard"
Paste this into your favorite paint package and save it in JPEG
format. I typically use mspaint using the window tool.