HAM Surname DNA Project
Research through Genetics
Instructions for using the PHYLIP "CONTRAST" program with FTDNA information
December, 2005
by Dave Hamm, Novi, MI
TOOLS:
PHYLIP software to generate phylogenetic tree data.
This is my version of Felsenstein's original set of instructions that are
included with the PHYLIP package. I have adapted them for FTDNA data and
stepped through them a bit.
For more detailed instructions, see the documentation that comes with the
PHYLIP package: Joseph Felsenstein's documentation for "discrete characters"
and for the "contrast" program. (© Copyright 1986-2004 by the University
of Washington. Written by Joseph Felsenstein.)
Instructions:
PHYLIP "CONTRAST" INSTRUCTIONS
Instructions for using the "CONTRAST" program of PHYLIP with FTDNA data
(for use with FTDNA data conversion)
by Dave Hamm odoniv@earthlink.net
December, 2005
CONTRAST examines correlation of traits as they evolve along a given
phylogeny.
Contrast is © Copyright 1986-2004 by the University of Washington. Written by
Joseph Felsenstein.
For more information, see the "contrast" documentation included within the
PHYLIP package.
CONTRAST -- Computes contrasts for comparative method
This program
implements the contrasts calculation described in Felsenstein's 1985
paper on the comparative method (Felsenstein, 1985d). It reads in a
data set of the standard quantitative characters sort, and also a tree
from the treefile. It then forms the contrasts between species that,
according to that tree, are statistically independent. This is done for
each character. The contrasts are all standardized by branch lengths
(actually, square roots of branch lengths).
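As a rough illustration of the calculation described above, here is a minimal Python sketch that computes one standardized contrast for a pair of sister tips, and one for a tip against an internal node. The function names are my own and are not part of PHYLIP. The weighted-average and branch-lengthening step for the internal node follows the published description of the method (Felsenstein, 1985d), not the program's source code. The values and branch lengths are taken from the HAM example files used in these instructions.

```python
import math

def standardized_contrast(x_i, x_j, v_i, v_j):
    """Difference of two values divided by the square root of the summed branch lengths."""
    return (x_i - x_j) / math.sqrt(v_i + v_j)

def merge_sister_nodes(x_i, x_j, v_i, v_j, v_parent):
    """Collapse two sister nodes into their ancestor.

    Returns the ancestor's estimated value (a branch-length-weighted average)
    and the lengthened branch below the ancestor.
    """
    value = (x_i / v_i + x_j / v_j) / (1.0 / v_i + 1.0 / v_j)
    branch = v_parent + (v_i * v_j) / (v_i + v_j)
    return value, branch

# Character 1 for two sister tips in the example tree:
# N14608_WmV (6550, branch 387.5) and 44176_WmVA (6225, branch 387.5).
tip_contrast = standardized_contrast(6550.0, 6225.0, 387.5, 387.5)

# Character 1 for the node joining 42370_WmNC (500) and 40777_WmVA (0),
# each on a branch of 250.0, with 217.64849 below the node,
# contrasted with PQ4ZU_Arth (775, branch 467.64849).
node_value, node_branch = merge_sister_nodes(500.0, 0.0, 250.0, 250.0, 217.64849)
node_contrast = standardized_contrast(node_value, 775.0, node_branch, 467.64849)

print(tip_contrast)   # the example output lists 11.67434 for this contrast
print(node_contrast)  # the example output lists -18.44324 for this contrast
```

The sign of a contrast depends only on which member of the pair is subtracted from the other, so only its magnitude carries information.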
The method is
explained in the 1985 paper. It assumes a Brownian motion model. This
model was introduced by Edwards and Cavalli-Sforza (1964;
Cavalli-Sforza and Edwards, 1967) as an approximation to the evolution
of gene frequencies. Felsenstein has discussed (Felsenstein, 1973b,
1981c, 1985d, 1988b) the difficulties inherent in using it as a model
for the evolution of quantitative characters. Chief among these is that
the characters do not necessarily evolve independently or at equal
rates. This program allows one to evaluate this, if there is
independent information on the phylogeny. You can compute the variance
of the contrasts for each character, as a measure of the variance
accumulating per unit branch length. You can also test covariances of
characters.
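To illustrate the variance calculation mentioned above, here is a minimal Python sketch. It assumes the variance is the mean of the squared contrasts (taken about an expectation of zero, not about the sample mean); that assumption reproduces the first diagonal entry of the covariance matrix in the example output of these instructions. The function name is hypothetical.

```python
def contrast_variance(contrasts):
    """Mean squared contrast; the expectation is taken to be zero."""
    return sum(c * c for c in contrasts) / len(contrasts)

# Character 1 contrasts: first column of the "Contrasts" table
# in the example output (8 contrasts for 9 tips).
character_1 = [11.67434, -42.98927, 4.70721, 22.36068,
               -18.44324, -29.06284, 68.52555, 46.20695]

variance_1 = contrast_variance(character_1)
print(variance_1)  # the example covariance matrix reports 1315.2699
```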
If you have the "gnumeric" spreadsheet program, it can graph the results or
provide similar functionality.
"Contrast" takes two different types of data, depending upon the analysis.
First, we will run through the continuous character data type.
By default, "contrast" will compute the phenotypes of individual specimens.
REQUIREMENTS:
1) An input file. The program will accept the distance information used to
create the "tree" file.
Use continuous quantitative characters (not gene frequencies).
For example: infile_contrast_HAM.txt
The first line of the data needs to indicate:
a) number of taxa
b) columns of data after the name
Then follows the standard PHYLIP format for data:
The first 10 characters are taken as the name, and thereafter the
values of the individual characters are read free-format, preceded and
separated by blanks.
2) An input tree. A "tree" format file, which includes positive
branch lengths,
such as: outtree_contrast_HAM.tre
(This is what you should get from "kitsch" by default.)
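The input file layout in requirement 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name is hypothetical, the three rows are a cut-down subset of the HAM data, and names are padded or truncated to exactly 10 characters as the format description above requires.

```python
def format_contrast_infile(rows):
    """Build PHYLIP-style text: a count line, then one line per sample.

    rows is a list of (name, values) pairs; every sample must have the
    same number of values.
    """
    n_values = len(rows[0][1])
    lines = [f"{len(rows)} {n_values}"]
    for name, values in rows:
        if len(values) != n_values:
            raise ValueError(f"{name}: expected {n_values} values, got {len(values)}")
        # The name field is exactly 10 characters: truncate or pad with blanks.
        name_field = name[:10].ljust(10)
        lines.append(name_field + " " + " ".join(str(v) for v in values))
    return "\n".join(lines) + "\n"

# Hypothetical three-sample subset of the HAM data, for illustration only.
sample_rows = [
    ("40777_WmVA", [0, 775, 500]),
    ("PQ4ZU_Arth", [775, 0, 1800]),
    ("42370_WmNC", [500, 1800, 0]),
]
infile_text = format_contrast_infile(sample_rows)
print(infile_text)
```

Writing `infile_text` to a file such as infile_contrast_HAM.txt would give a file in the layout described above; whether the subset is meaningful for analysis is a separate question.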
PROCEDURE:
1) run:
contrast.exe
2) specify input file:
infile_contrast_HAM.txt
where the data has the format:
---------------------------------------------------------
9 9
40777_WmVA     0   775   500  1475  6225  6550  5575  6550  5000
PQ4ZU_Arth   775     0  1800  1300  6550  6550  6550  6550  6550
42370_WmNC   500  1800     0  1725  5575  5325  5575  6550  5000
PV4HM_Vale  1475  1300  1725     0  7050  8200  7050  8200  6225
44176_WmVA  6225  6550  5575  7050     0   775  4525  8200  5000
N14608_WmV  6550  6550  5325  8200   775     0  8200  8200 10850
43250_Rich  5575  6550  5575  7050  4525  8200     0  2350  1975
N13303_HAM  6550  6550  6550  8200  8200  8200  2350     0  1300
41641_JoVA  5000  6550  5000  6225  5000 10850  1975  1300     0
---------------------------------------------------------
The first line includes two numbers:
<number of samples> <columns of data after the name>
3) specify the input tree file for contrast:
- Take option "f" to provide a file name:
outtree_contrast_HAM.tre
This could be an output tree file from the "kitsch" program.
The main requirements for the "tree" format are that it:
a) must contain branch lengths
b) must have positive branch lengths
which is what you should get from "kitsch" by default.
example (outtree_contrast_HAM.tre):
---------------------------------------------------------
((N14608_WmV:387.50000,44176_WmVA:387.50000):2741.79286,(((41641_JoVA:650.00000,N13303_HAM:650.00000):415.11408,43250_Rich:1065.11408):1960.37829,(((42370_WmNC:250.00000,40777_WmVA:250.00000):217.64849,PQ4ZU_Arth:467.64849):262.81169,
PV4HM_Vale:730.46018):2295.03218):103.80049);
---------------------------------------------------------
4) specify the output file name:
- Take option "f" to provide a file name:
f
Please enter a new file name> outfile_contrast_HAM.txt
5) Specify that the contrasts should be printed in the output by selecting
the "C" option:
c
Type "y" to execute:
y
6) The program should execute.
You should use WordPad to view the output file (if you took Terminal Type of
IBM PC).
Notepad does not work well for viewing.
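Before running step 3, it can help to confirm that the tree file meets the two requirements listed there (branch lengths present, all positive). The following Python sketch is one way to do that. The function name and the regular expression are my own assumptions: the pattern only recognises plain decimal numbers after a colon, which matches the example tree but not necessarily every tree file.

```python
import re

# Matches the plain decimal number that follows each ':' in a Newick string.
BRANCH_LENGTH = re.compile(r":\s*(-?\d+(?:\.\d+)?)")

def find_branch_lengths(newick_text):
    """Return (all_lengths, bad_lengths), where bad lengths are <= 0."""
    lengths = [float(text) for text in BRANCH_LENGTH.findall(newick_text)]
    bad = [value for value in lengths if value <= 0.0]
    return lengths, bad

# The example tree from outtree_contrast_HAM.tre.
tree = (
    "((N14608_WmV:387.50000,44176_WmVA:387.50000):2741.79286,"
    "(((41641_JoVA:650.00000,N13303_HAM:650.00000):415.11408,"
    "43250_Rich:1065.11408):1960.37829,"
    "(((42370_WmNC:250.00000,40777_WmVA:250.00000):217.64849,"
    "PQ4ZU_Arth:467.64849):262.81169,"
    "PV4HM_Vale:730.46018):2295.03218):103.80049);"
)

lengths, bad = find_branch_lengths(tree)
if not lengths:
    print("No branch lengths found")
elif bad:
    print("Non-positive branch lengths:", bad)
else:
    print(len(lengths), "branch lengths, all positive")
```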
OUTPUT:
The output should provide:
- IF the "c" option is taken (to print output), you should also get a section
called "Contrasts" as in:
Contrasts (columns are different characters)
--------- -------- --- --------- -----------
- The following is what is output by default:
Covariance matrix
---------- ------
Regressions (columns on rows)
----------- -------- -- -----
Correlations
------------
Example Output:
Continuous character contrasts analysis, version 3.64
9 Populations, 9 Characters
Name        Phenotypes
----        ----------
40777_WmVA     0.00000   775.00000   500.00000  1475.00000  6225.00000  6550.00000  5575.00000  6550.00000  5000.00000
PQ4ZU_Arth   775.00000     0.00000  1800.00000  1300.00000  6550.00000  6550.00000  6550.00000  6550.00000  6550.00000
42370_WmNC   500.00000  1800.00000     0.00000  1725.00000  5575.00000  5325.00000  5575.00000  6550.00000  5000.00000
PV4HM_Vale  1475.00000  1300.00000  1725.00000     0.00000  7050.00000  8200.00000  7050.00000  8200.00000  6225.00000
44176_WmVA  6225.00000  6550.00000  5575.00000  7050.00000     0.00000   775.00000  4525.00000  8200.00000  5000.00000
N14608_WmV  6550.00000  6550.00000  5325.00000  8200.00000   775.00000     0.00000  8200.00000  8200.00000 10850.00000
43250_Rich  5575.00000  6550.00000  5575.00000  7050.00000  4525.00000  8200.00000     0.00000  2350.00000  1975.00000
N13303_HAM  6550.00000  6550.00000  6550.00000  8200.00000  8200.00000  8200.00000  2350.00000     0.00000  1300.00000
41641_JoVA  5000.00000  6550.00000  5000.00000  6225.00000  5000.00000 10850.00000  1975.00000  1300.00000     0.00000
Contrasts (columns are different characters)
--------- -------- --- --------- -----------
   11.67434    0.00000   -8.98027   41.30922   27.83882  -27.83882  132.00990    0.00000  210.13820
  -42.98927    0.00000  -42.98927  -54.77664  -88.75203   73.49778  -10.40063   36.05551  -36.05551
    4.70721    0.00000    4.70721    3.82461   48.83735   31.18529   50.89675  -40.01132  -31.18529
   22.36068   45.83939  -22.36068   11.18034  -29.06888  -54.78367    0.00000    0.00000    0.00000
  -18.44324   45.22985  -54.45147   10.53899  -22.83449  -21.51711  -34.25173    0.00000  -54.45147
  -29.06284  -16.13800  -23.74748   42.68580  -25.35800  -58.05337  -30.79298  -47.81052  -16.50346
   68.52555   79.27878   63.38587   88.50629  -10.83208   28.50569  -72.62570  -82.81628  -66.38914
   46.20695   41.17827   29.20949   53.29359  -87.57067 -116.54045   40.01598   61.71943   68.39870
Covariance matrix
---------- ------
  1315.2699   999.3969  1081.6324  1274.7001    11.2178  -738.7986    78.3464  -396.4997   493.9340
   999.3969  1548.5249   390.4195  1188.9403  -802.6044  -635.8298  -645.2690  -406.5636  -580.4011
  1081.6324   390.4195  1356.3435   916.3782   380.7922   -73.0986  -167.1339  -506.1940    82.8429
  1274.7001  1188.9403   916.3782  2181.6597  -134.4274 -1507.7424    30.8699 -1026.1713   878.4253
    11.2178  -802.6044   380.7922  -134.4274  2604.0510   959.7037   741.1462 -1056.1759   489.7839
  -738.7986  -635.8298   -73.0986 -1507.7424   959.7037  3547.2659  -882.6605  -671.9685 -2150.8108
    78.3464  -645.2690  -167.1339    30.8699   741.1462  -882.6605  3640.3032   943.1421  4557.4924
  -396.4997  -406.5636  -506.1940 -1026.1713 -1056.1759  -671.9685   943.1421  1981.8220  1307.0543
   493.9340  -580.4011    82.8429   878.4253   489.7839 -2150.8108  4557.4924  1307.0543  7344.2268
Regressions (columns on rows)
----------- -------- -- -----
   1.0000   0.7598   0.8224   0.9692   0.0085  -0.5617   0.0596  -0.3015   0.3755
   0.6454   1.0000   0.2521   0.7678  -0.5183  -0.4106  -0.4167  -0.2625  -0.3748
   0.7975   0.2878   1.0000   0.6756   0.2807  -0.0539  -0.1232  -0.3732   0.0611
   0.5843   0.5450   0.4200   1.0000  -0.0616  -0.6911   0.0141  -0.4704   0.4026
   0.0043  -0.3082   0.1462  -0.0516   1.0000   0.3685   0.2846  -0.4056   0.1881
  -0.2083  -0.1792  -0.0206  -0.4250   0.2705   1.0000  -0.2488  -0.1894  -0.6063
   0.0215  -0.1773  -0.0459   0.0085   0.2036  -0.2425   1.0000   0.2591   1.2520
  -0.2001  -0.2051  -0.2554  -0.5178  -0.5329  -0.3391   0.4759   1.0000   0.6595
   0.0673  -0.0790   0.0113   0.1196   0.0667  -0.2929   0.6206   0.1780   1.0000
Correlations
------------
   1.0000   0.7003   0.8098   0.7525   0.0061  -0.3420   0.0358  -0.2456   0.1589
   0.7003   1.0000   0.2694   0.6469  -0.3997  -0.2713  -0.2718  -0.2321  -0.1721
   0.8098   0.2694   1.0000   0.5327   0.2026  -0.0333  -0.0752  -0.3087   0.0262
   0.7525   0.6469   0.5327   1.0000  -0.0564  -0.5420   0.0110  -0.4935   0.2195
   0.0061  -0.3997   0.2026  -0.0564   1.0000   0.3158   0.2407  -0.4649   0.1120
  -0.3420  -0.2713  -0.0333  -0.5420   0.3158   1.0000  -0.2456  -0.2534  -0.4214
   0.0358  -0.2718  -0.0752   0.0110   0.2407  -0.2456   1.0000   0.3511   0.8814
  -0.2456  -0.2321  -0.3087  -0.4935  -0.4649  -0.2534   0.3511   1.0000   0.3426
   0.1589  -0.1721   0.0262   0.2195   0.1120  -0.4214   0.8814   0.3426   1.0000
--------------------------------------------------------------------------------------------------------
From Felsenstein's documentation:
The statistics that are printed out include the covariances between all pairs
of characters, the regressions of each character on each other (column j
is regressed on row i), and the correlations between all pairs of
characters. In assessing degrees of freedom it is important to realize
that each contrast was taken to have expectation zero, which is known
because each contrast could as easily have been computed xi-xj instead
of xj-xi. Thus there is no loss of a degree of freedom for estimation
of a mean. The degrees of freedom is thus the same as the number of
contrasts, namely one less than the number of species (tips). If you
feed these contrasts into a multivariate statistics program make sure
that it knows that each variable has expectation exactly zero.
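The relationship between the three printed tables can be sketched in Python as follows. This is an illustration, not the program's own code: the function name is hypothetical, and the formulas (mean cross-product with no mean subtracted, regression as covariance over the row character's variance, correlation as covariance over the product of standard deviations) are assumptions that happen to reproduce the example output for characters 1 and 2.

```python
import math

def zero_mean_cov(xs, ys):
    """Mean cross-product of two contrast columns; no mean is subtracted."""
    return sum(x * y for x, y in zip(xs, ys)) / len(xs)

# Columns 1 and 2 of the "Contrasts" table in the example output.
char_1 = [11.67434, -42.98927, 4.70721, 22.36068,
          -18.44324, -29.06284, 68.52555, 46.20695]
char_2 = [0.00000, 0.00000, 0.00000, 45.83939,
          45.22985, -16.13800, 79.27878, 41.17827]

var_1 = zero_mean_cov(char_1, char_1)
var_2 = zero_mean_cov(char_2, char_2)
cov_12 = zero_mean_cov(char_1, char_2)

# "Column j is regressed on row i": divide by the variance of the row character.
regression_row1_col2 = cov_12 / var_1
regression_row2_col1 = cov_12 / var_2
correlation_12 = cov_12 / math.sqrt(var_1 * var_2)

# One contrast per internal node: 9 tips give 8 contrasts and 8 degrees of freedom.
degrees_of_freedom = len(char_1)

print(cov_12)                # example output lists 999.3969
print(regression_row1_col2)  # example output lists 0.7598
print(regression_row2_col1)  # example output lists 0.6454
print(correlation_12)        # example output lists 0.7003
```

A general-purpose statistics routine that subtracts the sample mean before computing covariances would give different numbers, which is the point of the warning above.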
Other References:
Instructions for applying PHYLIP to DNA data, by L. David Roper from the ROPER DNA Project