HAM Surname DNA Project

Research through Genetics



Instructions for using the PHYLIP "CONTRAST" program with FTDNA data

December, 2005
by Dave Hamm, Novi, MI


TOOLS:

     PHYLIP: software to generate phylogenetic tree data.
  

This is my version of Felsenstein's original instructions that are included with the PHYLIP package.
I have adapted them for FTDNA data and added step-by-step detail.

For more detailed instructions, see the documentation that comes with the PHYLIP package, Joseph Felsenstein's documentation for "discrete characters" and for the "contrast" program. (© Copyright 1986-2004 by the University of Washington. Written by Joseph Felsenstein). 

Instructions:

 PHYLIP "CONTRAST" INSTRUCTIONS

  Instructions for using the "CONTRAST" program of PHYLIP with FTDNA data.

(for use with FTDNA data conversion)
 
by Dave Hamm     odoniv@earthlink.net
December, 2005
 
CONTRAST examines correlation of traits as they evolve along a given phylogeny.


Contrast is © Copyright 1986-2004 by the University of Washington. Written by Joseph Felsenstein.
For more information, see the "contrast" documentation included within the PHYLIP package.

CONTRAST -- Computes contrasts for comparative method

This program implements the contrasts calculation described in Felsenstein's 1985 paper on the comparative method (Felsenstein, 1985d). It reads in a data set of standard quantitative (continuous) characters, and also a tree from the tree file. It then forms the contrasts between species that, according to that tree, are statistically independent. This is done for each character. The contrasts are all standardized by branch lengths (actually, square roots of branch lengths).
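The contrasts calculation just described can be sketched in Python. This is a minimal illustration of the pruning procedure from Felsenstein (1985d), not PHYLIP's own code, and it assumes a fully bifurcating tree; the nested-tuple tree layout and the names independent_contrasts and contrasts_for are hypothetical. At each internal node it divides the difference between the two child values by the square root of the sum of their branch lengths, replaces the pair by a weighted average, and lengthens the parent branch by v_i*v_j/(v_i + v_j).

```python
from math import sqrt

# Hypothetical tree layout (not PHYLIP's internal one):
#   tip:           ("taxon_name", branch_length)
#   internal node: ((first_child, second_child), branch_length)
# The root's own branch length is never used; pass 0.0.


def independent_contrasts(node, values, out):
    """Post-order pass over a bifurcating tree.

    Appends one standardized contrast per internal node to `out` and
    returns (estimated value at this node, adjusted branch length).
    """
    first, length = node
    if isinstance(first, str):
        # Tip: observed character value and its original branch length.
        return values[first], length
    left, right = first
    x_i, v_i = independent_contrasts(left, values, out)
    x_j, v_j = independent_contrasts(right, values, out)
    # Difference standardized by the square root of the summed branch lengths.
    out.append((x_i - x_j) / sqrt(v_i + v_j))
    # Value at this node: average of the two children weighted by 1/v.
    x_k = (v_j * x_i + v_i * x_j) / (v_i + v_j)
    # Lengthen this node's branch to allow for the uncertainty in x_k.
    return x_k, length + v_i * v_j / (v_i + v_j)


def contrasts_for(tree, values):
    """Return the contrasts (n - 1 of them for n tips) for one character."""
    out = []
    independent_contrasts(tree, values, out)
    return out


# Usage on a three-tip tree ((A:1,B:1):1,C:2);
ab = ((("A", 1.0), ("B", 1.0)), 1.0)
tiny = ((ab, ("C", 2.0)), 0.0)
print(contrasts_for(tiny, {"A": 3.0, "B": 1.0, "C": 0.0}))  # approx [1.414, 1.069]
```

Taking the first child minus the second, in the order the tree file lists them, appears to match the sign and order of the "Contrasts" output shown later on this page. For example, the first contrast for character 1 pairs N14608_WmV (6550) with 44176_WmVA (6225): (6550 - 6225)/sqrt(387.5 + 387.5) = 11.67434.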

The method is explained in the 1985 paper. It assumes a Brownian motion model. This model was introduced by Edwards and Cavalli-Sforza (1964; Cavalli-Sforza and Edwards, 1967) as an approximation to the evolution of gene frequencies. Felsenstein has discussed (Felsenstein, 1973b, 1981c, 1985d, 1988b) the difficulties inherent in using it as a model for the evolution of quantitative characters. Chief among these is that the characters do not necessarily evolve independently or at equal rates. This program allows one to evaluate this, if there is independent information on the phylogeny. You can compute the variance of the contrasts for each character, as a measure of the variance accumulating per unit branch length. You can also test covariances of characters.

If you have the "gnumeric" spreadsheet program (or a similar spreadsheet), you can use it to graph the results.

"Contrast" takes two different types of data, depending upon the analysis.

First, we will run through the continuous character data type.

By default, "contrast" reads one set of phenotype values for each taxon (individual specimen).

REQUIREMENTS:

  1) An input file. Here the program is given the same distance information that was used to create the "tree" file,
      entered as continuous quantitative characters (not gene frequencies).

       for example:      infile_contrast_HAM.txt

      The first line of the file must give:

         a) the number of taxa
         b) the number of data columns (characters) after each name

    The remaining lines follow the standard PHYLIP data format:
    the first 10 characters are taken as the name, and the values of the individual characters follow in free format, preceded and separated by blanks.

  2)  An input tree: a tree-format file that includes positive branch lengths.
     
       such as:   outtree_contrast_HAM.tre

       (This is what you should get from "kitsch" by default.)
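When converting FTDNA results, a short script can write the input file in this layout. This is a sketch under stated assumptions: format_contrast_infile is a hypothetical helper, names longer than 10 characters are simply truncated (so two long names could collide), and values are separated by single blanks on the assumption that PHYLIP's free-format reading accepts any amount of blank space. The sample rows are the distances among the first three taxa of the example data.

```python
def format_contrast_infile(names, rows):
    """Return the text of a contrast input file (hypothetical helper).

    Line 1: number of taxa, then number of values per taxon.
    Other lines: the name in a 10-character field, then the values
    separated by blanks.
    """
    if len(names) != len(rows):
        raise ValueError("need exactly one row of values per name")
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("every row must have the same number of values")
    lines = [f"{len(names)} {ncols}"]
    for name, row in zip(names, rows):
        # PHYLIP takes the first 10 characters as the name: pad or truncate.
        field = name[:10].ljust(10)
        lines.append(field + " " + " ".join(str(v) for v in row))
    return "\n".join(lines) + "\n"


# Usage: distances among the first three taxa of the example data.
names = ["40777_WmVA", "PQ4ZU_Arth", "42370_WmNC"]
rows = [[0, 775, 500], [775, 0, 1800], [500, 1800, 0]]
print(format_contrast_infile(names, rows), end="")
```

To produce the real input file, write the returned text to infile_contrast_HAM.txt instead of printing it.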

PROCEDURE:


1) run:

   contrast.exe

2) specify input file:

   infile_contrast_HAM.txt

    where the data has the format:
---------------------------------------------------------
9     9
40777_WmVA 0 775 500 1475 6225 6550 5575 6550 5000
PQ4ZU_Arth 775 0 1800 1300 6550 6550 6550 6550 6550
42370_WmNC 500 1800 0 1725 5575 5325 5575 6550 5000
PV4HM_Vale 1475 1300 1725 0 7050 8200 7050 8200 6225
44176_WmVA 6225 6550 5575 7050 0 775 4525 8200 5000
N14608_WmV 6550 6550 5325 8200 775 0 8200 8200 10850
43250_Rich 5575 6550 5575 7050 4525 8200 0 2350 1975
N13303_HAM 6550 6550 6550 8200 8200 8200 2350 0 1300
41641_JoVA 5000 6550 5000 6225 5000 10850 1975 1300 0
---------------------------------------------------------

The first line includes two numbers,

   <number of samples>    <columns of data after the name>

3)  specify the input tree file for contrast:
   - Take option "f" to provide a file name:

   outtree_contrast_HAM.tre

   This could be an output tree file from the "kitsch" program.
   The main requirements for the tree file are that it:

    a) contains branch lengths
    b) has only positive branch lengths

   which is what you should get from "kitsch" by default.

example (outtree_contrast_HAM.tre):
---------------------------------------------------------
((N14608_WmV:387.50000,44176_WmVA:387.50000):2741.79286,(((41641_JoVA:650.00000,N13303_HAM:650.00000):415.11408,43250_Rich:1065.11408):1960.37829,(((42370_WmNC:250.00000,40777_WmVA:250.00000):217.64849,PQ4ZU_Arth:467.64849):262.81169,
PV4HM_Vale:730.46018):2295.03218):103.80049);
---------------------------------------------------------
4) specify the output file name:
   - Take option "f" to provide a file name:

   f

   Please enter a new file name>   outfile_contrast_HAM.txt

5)  Specify that the contrasts should be printed in the output by selecting the "C" option:

   c

   Type "y" to execute:

   y


6) The program should execute.

   Use WordPad to view the output file (if you chose the IBM PC terminal type).
   Notepad does not display it well.
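Before step 3, the two tree-file requirements (every branch has a length, and every length is positive) can be checked with a short script. This sketch assumes a plain Newick tree like outtree_contrast_HAM.tre, with no quoted names containing ":", ",", or parentheses; newick_branch_lengths is a hypothetical helper. It counts branches using the fact that, in a rooted Newick tree, each comma and each opening parenthesis adds exactly one branch.

```python
import re


def newick_branch_lengths(newick):
    """Return the branch lengths in a Newick tree string (hypothetical helper).

    Raises ValueError if any branch has no length or a length <= 0,
    the two requirements listed for the contrast tree file.
    """
    text = "".join(newick.split())  # the tree may be split across lines
    lengths = [float(x) for x in re.findall(r":([^,();]+)", text)]
    # In a rooted Newick tree every comma and every "(" adds one branch.
    expected = text.count(",") + text.count("(")
    if len(lengths) < expected:
        raise ValueError(f"{expected - len(lengths)} branch(es) have no length")
    bad = [x for x in lengths if x <= 0]
    if bad:
        raise ValueError(f"non-positive branch length(s): {bad}")
    return lengths


# Usage: the example tree (outtree_contrast_HAM.tre).
tree = (
    "((N14608_WmV:387.50000,44176_WmVA:387.50000):2741.79286,"
    "(((41641_JoVA:650.00000,N13303_HAM:650.00000):415.11408,"
    "43250_Rich:1065.11408):1960.37829,(((42370_WmNC:250.00000,"
    "40777_WmVA:250.00000):217.64849,PQ4ZU_Arth:467.64849):262.81169,"
    "PV4HM_Vale:730.46018):2295.03218):103.80049);"
)
lengths = newick_branch_lengths(tree)
print(len(lengths), "branches, shortest", min(lengths))  # 16 branches, shortest 103.80049
```

To check your own file, pass in its text, for example newick_branch_lengths(open("outtree_contrast_HAM.tre").read()).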

  OUTPUT:

The output should provide:

 - If the "c" option is taken (to print the contrasts), the output also includes a "Contrasts" section, as in:

Contrasts (columns are different characters)
--------- -------- --- --------- -----------

- The following sections are printed by default:

Covariance matrix
---------- ------

Regressions (columns on rows)
----------- -------- -- -----

Correlations
------------

Example Output:


Continuous character contrasts analysis, version 3.64

   9 Populations,    9 Characters

Name                       Phenotypes
----                       ----------

40777_WmVA   0.00000 775.00000 500.00000 1475.00000 6225.00000 6550.00000
           5575.00000 6550.00000 5000.00000
PQ4ZU_Arth 775.00000   0.00000 1800.00000 1300.00000 6550.00000 6550.00000
           6550.00000 6550.00000 6550.00000
42370_WmNC 500.00000 1800.00000   0.00000 1725.00000 5575.00000 5325.00000
           5575.00000 6550.00000 5000.00000
PV4HM_Vale 1475.00000 1300.00000 1725.00000   0.00000 7050.00000 8200.00000
           7050.00000 8200.00000 6225.00000
44176_WmVA 6225.00000 6550.00000 5575.00000 7050.00000   0.00000 775.00000
           4525.00000 8200.00000 5000.00000
N14608_WmV 6550.00000 6550.00000 5325.00000 8200.00000 775.00000   0.00000
           8200.00000 8200.00000 10850.00000
43250_Rich 5575.00000 6550.00000 5575.00000 7050.00000 4525.00000 8200.00000
             0.00000 2350.00000 1975.00000
N13303_HAM 6550.00000 6550.00000 6550.00000 8200.00000 8200.00000 8200.00000
           2350.00000   0.00000 1300.00000
41641_JoVA 5000.00000 6550.00000 5000.00000 6225.00000 5000.00000 10850.00000
           1975.00000 1300.00000   0.00000


Contrasts (columns are different characters)
--------- -------- --- --------- -----------

  11.67434   0.00000  -8.98027  41.30922  27.83882 -27.83882 132.00990   0.00000 210.13820
 -42.98927   0.00000 -42.98927 -54.77664 -88.75203  73.49778 -10.40063  36.05551 -36.05551
   4.70721   0.00000   4.70721   3.82461  48.83735  31.18529  50.89675 -40.01132 -31.18529
  22.36068  45.83939 -22.36068  11.18034 -29.06888 -54.78367   0.00000   0.00000   0.00000
 -18.44324  45.22985 -54.45147  10.53899 -22.83449 -21.51711 -34.25173   0.00000 -54.45147
 -29.06284 -16.13800 -23.74748  42.68580 -25.35800 -58.05337 -30.79298 -47.81052 -16.50346
  68.52555  79.27878  63.38587  88.50629 -10.83208  28.50569 -72.62570 -82.81628 -66.38914
  46.20695  41.17827  29.20949  53.29359 -87.57067 -116.54045  40.01598  61.71943  68.39870

Covariance matrix
---------- ------

 1315.2699  999.3969 1081.6324 1274.7001   11.2178 -738.7986   78.3464 -396.4997  493.9340
  999.3969 1548.5249  390.4195 1188.9403 -802.6044 -635.8298 -645.2690 -406.5636 -580.4011
 1081.6324  390.4195 1356.3435  916.3782  380.7922  -73.0986 -167.1339 -506.1940   82.8429
 1274.7001 1188.9403  916.3782 2181.6597 -134.4274 -1507.7424   30.8699 -1026.1713  878.4253
   11.2178 -802.6044  380.7922 -134.4274 2604.0510  959.7037  741.1462 -1056.1759  489.7839
 -738.7986 -635.8298  -73.0986 -1507.7424  959.7037 3547.2659 -882.6605 -671.9685 -2150.8108
   78.3464 -645.2690 -167.1339   30.8699  741.1462 -882.6605 3640.3032  943.1421 4557.4924
 -396.4997 -406.5636 -506.1940 -1026.1713 -1056.1759 -671.9685  943.1421 1981.8220 1307.0543
  493.9340 -580.4011   82.8429  878.4253  489.7839 -2150.8108 4557.4924 1307.0543 7344.2268

Regressions (columns on rows)
----------- -------- -- -----

    1.0000    0.7598    0.8224    0.9692    0.0085   -0.5617    0.0596   -0.3015    0.3755
    0.6454    1.0000    0.2521    0.7678   -0.5183   -0.4106   -0.4167   -0.2625   -0.3748
    0.7975    0.2878    1.0000    0.6756    0.2807   -0.0539   -0.1232   -0.3732    0.0611
    0.5843    0.5450    0.4200    1.0000   -0.0616   -0.6911    0.0141   -0.4704    0.4026
    0.0043   -0.3082    0.1462   -0.0516    1.0000    0.3685    0.2846   -0.4056    0.1881
   -0.2083   -0.1792   -0.0206   -0.4250    0.2705    1.0000   -0.2488   -0.1894   -0.6063
    0.0215   -0.1773   -0.0459    0.0085    0.2036   -0.2425    1.0000    0.2591    1.2520
   -0.2001   -0.2051   -0.2554   -0.5178   -0.5329   -0.3391    0.4759    1.0000    0.6595
    0.0673   -0.0790    0.0113    0.1196    0.0667   -0.2929    0.6206    0.1780    1.0000

Correlations
------------

    1.0000    0.7003    0.8098    0.7525    0.0061   -0.3420    0.0358   -0.2456    0.1589
    0.7003    1.0000    0.2694    0.6469   -0.3997   -0.2713   -0.2718   -0.2321   -0.1721
    0.8098    0.2694    1.0000    0.5327    0.2026   -0.0333   -0.0752   -0.3087    0.0262
    0.7525    0.6469    0.5327    1.0000   -0.0564   -0.5420    0.0110   -0.4935    0.2195
    0.0061   -0.3997    0.2026   -0.0564    1.0000    0.3158    0.2407   -0.4649    0.1120
   -0.3420   -0.2713   -0.0333   -0.5420    0.3158    1.0000   -0.2456   -0.2534   -0.4214
    0.0358   -0.2718   -0.0752    0.0110    0.2407   -0.2456    1.0000    0.3511    0.8814
   -0.2456   -0.2321   -0.3087   -0.4935   -0.4649   -0.2534    0.3511    1.0000    0.3426
    0.1589   -0.1721    0.0262    0.2195    0.1120   -0.4214    0.8814    0.3426    1.0000
--------------------------------------------------------------------------------------------------------
From Felsenstein's documentation:

The statistics that are printed out include the covariances between all pairs of characters, the regressions of each character on each other (column j is regressed on row i), and the correlations between all pairs of characters. In assessing degrees of freedom it is important to realize that each contrast was taken to have expectation zero, which is known because each contrast could as easily have been computed xi-xj instead of xj-xi. Thus there is no loss of a degree of freedom for estimation of a mean. The degrees of freedom is thus the same as the number of contrasts, namely one less than the number of species (tips). If you feed these contrasts into a multivariate statistics program, make sure that it knows that each variable has expectation exactly zero.
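The statistics described in this passage can be recomputed from the printed contrasts, for example to check a multivariate statistics program. In this sketch (contrast_statistics is a hypothetical helper) each contrast has expectation exactly zero, so no mean is subtracted and sums are divided by the number of contrasts; the regression of column j on row i is cov(i, j)/var(i). Using the first two columns of the example "Contrasts" output, it gives values agreeing closely with the printed covariance (1315.2699, 999.3969), regression (0.7598) and correlation (0.7003) entries.

```python
from math import sqrt


def contrast_statistics(contrasts):
    """Covariance, regression and correlation matrices from contrasts.

    `contrasts` has one row per contrast and one column per character
    (the layout of the "Contrasts" section). Each contrast is taken to
    have expectation exactly zero, so no mean is subtracted and the
    divisor is the number of contrasts. A character whose contrasts are
    all zero has zero variance and will cause a ZeroDivisionError.
    """
    n = len(contrasts)  # also the degrees of freedom
    k = len(contrasts[0])
    cov = [[sum(row[i] * row[j] for row in contrasts) / n for j in range(k)]
           for i in range(k)]
    # Column j regressed on row i: slope = cov(i, j) / var(i).
    reg = [[cov[i][j] / cov[i][i] for j in range(k)] for i in range(k)]
    corr = [[cov[i][j] / sqrt(cov[i][i] * cov[j][j]) for j in range(k)]
            for i in range(k)]
    return cov, reg, corr


# Usage: first two columns of the example "Contrasts" output.
example = [
    [11.67434, 0.0], [-42.98927, 0.0], [4.70721, 0.0],
    [22.36068, 45.83939], [-18.44324, 45.22985], [-29.06284, -16.13800],
    [68.52555, 79.27878], [46.20695, 41.17827],
]
cov, reg, corr = contrast_statistics(example)
print(round(cov[0][0], 1), round(cov[0][1], 1), round(corr[0][1], 4))  # 1315.3 999.4 0.7003
```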



Other References: 


   Instructions for applying PHYLIP to DNA data, by L. David Roper, from the ROPER DNA Project

