HAM Surname DNA Project
Research through Genetics

HAM DNA Group #2
Spectral Reconstruction
(from about 1740 to about 1850)
Phylogenetic Chart
April, 2006
This is my attempt to reconstruct the ancestral lines in Group02 of the HAM DNA Project. I call it a "Spectral" reconstruction for lack of a better term: I am reconstructing by adding hypothetical ancestral lines where no data currently exist for them. For the most part, the hypothetical information is based upon running the current data through the LAMARC* software, which performs a Metropolis-Hastings Markov chain Monte Carlo (MCMC) analysis within a Bayesian framework. That output was then analyzed for ancestral nodes. At the time of this writing, there is no scientific method accepted by geneticists for reconstructing ancestral nodes for use in genealogy.
The goal is to provide a reasonable estimate of how and when the ancestral lines came to be, so that genealogists can better decide where to concentrate their research. If DNA analysis yields a better determination of dates and time spans, the data will have delivered more than was originally expected.
What follows are HAM Surname DNA Project phylogenetic charts, generated from the DNA results for the HAM DNA Project. Unless otherwise indicated, all charts are based upon TMRCA calculations, which combine Genetic Distance and Mutation Rate to estimate the Time to Most Recent Common Ancestor (TMRCA).
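To show how genetic distance and mutation rate combine into a TMRCA figure, here is a minimal sketch of the simplest expected-value estimate. It is not the calculation behind these charts: the text does not give its mutation rate or generation length, so `mu_per_marker = 0.002` and `years_per_generation = 25` are placeholder assumptions, and the function name is hypothetical. With those assumptions it does not reproduce the 162.5-year figure quoted for 41641 and 46118.

```python
def tmrca_point_estimate(mismatches: int, markers: int,
                         mu_per_marker: float = 0.002,
                         years_per_generation: float = 25.0) -> float:
    """Rough TMRCA point estimate in years (hypothetical helper).

    Assumes both lines mutate independently after the common ancestor,
    so after t generations the expected number of differences is
    2 * t * markers * mu_per_marker. Solving for t gives the estimate.
    """
    if markers <= 0 or mu_per_marker <= 0:
        raise ValueError("markers and mu_per_marker must be positive")
    generations = mismatches / (2 * markers * mu_per_marker)
    return generations * years_per_generation


# One mismatch in 37 markers under the placeholder rate and generation length.
print(round(tmrca_point_estimate(1, 37), 1))  # 168.9
```

A point estimate like this returns 0 years for an exact match, which is one reason probability-based figures (such as the 95% values quoted below) are usually preferred.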
First, let me provide a brief overview of the methodology behind this study.
For this hypothetical reconstruction, three participants in HAM DNA
Project Group #2 were chosen, mainly because they are the only ones
in Group02 with 37 marker results to date.
The marker repeat data were translated into "ATGC" format with FT2DNA, converted into LAMARC format, and then run through LAMARC*, which performs a Metropolis-Hastings Markov chain Monte Carlo analysis within a Bayesian framework (the final number of samples was set to 30,000).
Overall, LAMARC's Most Probable Estimate (MPE) of Theta for the three individuals was 0.0000306.
The DYS markers that the LAMARC Bayesian analysis determined to be most significant were:
DYS390 with a Theta (MPE) of 0.119850
DYS426 with a Theta (MPE) of 0.00012 * (not yet shown to mutate for Group 2)
DYS449 with a Theta (MPE) of 0.128910
DYS464d with a Theta (MPE) of 0.000109 * (not yet shown to mutate for Group 2)
GATA-H4 with a Theta (MPE) of 0.056399
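The list above is easier to scan when sorted. This small sketch holds the five MPE values exactly as reported and orders them from highest to lowest Theta, flagging the two markers the text marks as not yet observed to mutate in Group 2. The dictionary and function names are illustrative only; the sketch does not re-derive anything from LAMARC.

```python
# MPE Theta values from the LAMARC Bayesian run, as listed in the text.
THETA_MPE = {
    "DYS390": 0.119850,
    "DYS426": 0.00012,
    "DYS449": 0.128910,
    "DYS464d": 0.000109,
    "GATA-H4": 0.056399,
}
# Markers the text flags as "not yet shown to mutate for Group 2".
NOT_YET_MUTATED = {"DYS426", "DYS464d"}


def rank_by_theta(theta: dict) -> list:
    """Return (marker, theta) pairs sorted from highest to lowest Theta."""
    return sorted(theta.items(), key=lambda item: item[1], reverse=True)


for marker, value in rank_by_theta(THETA_MPE):
    note = "  (not yet observed to mutate in Group 2)" if marker in NOT_YET_MUTATED else ""
    print(f"{marker:8s} {value:.6f}{note}")
```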
Each of the three participants' data was then modified by plus or minus one repeat at DYS426 and DYS464d in order to produce hypothetical results from future DNA tests.
The resulting hypothetical marker modifications were recorded, labeled MACK01-MACK18, and run through Dean McGee's Y-DNA Comparison Utility.
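The step of producing hypothetical future results by nudging markers up or down by one repeat can be sketched as below. This is an illustration under stated assumptions, not a reconstruction of the MACK01-MACK18 set: the text does not say how its 18 variants were combined, and this hypothetical helper only changes one marker at a time, giving four variants per participant for two markers. The repeat values in the example are made up.

```python
from typing import Dict, List

Profile = Dict[str, int]  # marker name -> repeat count


def one_step_variants(profile: Profile, markers: List[str]) -> List[Profile]:
    """Return copies of `profile` with exactly one marker moved by -1 or +1.

    Single-step changes only; the original profile is left untouched.
    """
    variants = []
    for marker in markers:
        for delta in (-1, +1):
            variant = dict(profile)  # copy so the input is not modified
            variant[marker] = profile[marker] + delta
            variants.append(variant)
    return variants


# Hypothetical repeat values, not real kit data.
kit = {"DYS426": 12, "DYS464d": 17}
for i, v in enumerate(one_step_variants(kit, ["DYS426", "DYS464d"]), start=1):
    print(f"VAR{i:02d}", v)
```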
After conversion to ATGC format and a run through LAMARC, calculations were performed with Dean McGee's Y-DNA Comparison Utility; the resulting output can be found here.
The output was then converted to graphic format with the KITSCH program within the PHYLIP package. The branch length view was produced with the MEGA software. Instructions for the graphing are given in the HAM Country Tools area.
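KITSCH builds a tree from a distance matrix under a molecular-clock assumption, which is why the charts can be read in years. As a rough illustration of clock-like tree building, the sketch below swaps in UPGMA, a simpler clustering method; it is not KITSCH's Fitch-Margoliash least-squares algorithm, and its branch lengths can differ from KITSCH output. The taxa and distances in the example are made-up placeholders, not HAM kit data.

```python
from itertools import combinations
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def _key(a: str, b: str) -> Pair:
    """Store each unordered pair as a sorted tuple."""
    return (a, b) if a <= b else (b, a)


def upgma(names: List[str], dist: Dict[Pair, float]) -> str:
    """Build an ultrametric (clock-like) tree with UPGMA and return Newick.

    `dist` maps each sorted pair of names to a distance. A merged node sits
    at half the distance between the two clusters it joins.
    """
    # cluster label -> (newick text, number of leaves, node height)
    clusters = {n: (n, 1, 0.0) for n in names}
    d = {_key(a, b): dist[_key(a, b)] for a, b in combinations(names, 2)}
    while len(clusters) > 1:
        a, b = min(d, key=lambda p: (d[p], p))  # closest pair, ties broken by name
        (na, sa, ha), (nb, sb, hb) = clusters.pop(a), clusters.pop(b)
        height = d[(a, b)] / 2
        merged = f"({na}:{height - ha:g},{nb}:{height - hb:g})"
        new = f"{a}+{b}"
        # UPGMA update: size-weighted average of distances to the old clusters.
        for c in clusters:
            d[_key(new, c)] = (sa * d[_key(a, c)] + sb * d[_key(b, c)]) / (sa + sb)
        d = {p: v for p, v in d.items() if a not in p and b not in p}
        clusters[new] = (merged, sa + sb, height)
    newick, _, _ = next(iter(clusters.values()))
    return newick + ";"


# Placeholder distances for three hypothetical taxa.
example = {("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0}
print(upgma(["A", "B", "C"], example))  # ((A:1,B:1):2,C:3);
```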
Reconstruction of ancestral nodes was originally inspired by the work of Charles Kerchner.
HAM DNA Group 2 time-based phylogenetic tree:

Only participants in HAM DNA Group #2 with 37 marker results were included in this study. Currently (April, 2006) that is three individuals, represented by kits 41641, 46118, and 48988.
Participant 41641 has ancestor Joseph HAM from Monroe County, VA. Participants 41641 and 46118 share a common ancestor within the last 325 years (at 95% probability), with only one mismatch within 37 markers. Participant 46118 descends from Levi HAM, who was born in South Carolina. Participants 46118 and 48988 share a most recent common ancestor within the last 400 years (TMRCA at 95% probability). New participant 48988 descends from Obed Jones HAM of South Carolina.
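"One mismatch within 37 markers" is the raw input to the TMRCA figures. The sketch below compares two kits over their shared markers and reports both the number of mismatched markers and the total step difference, since genetic distance is counted either way depending on the tool and the text does not say which convention its charts use. Multi-copy markers such as DYS464 often have special counting rules that this hypothetical helper ignores, and the repeat values are made up.

```python
from typing import Dict, Tuple


def compare_kits(kit_a: Dict[str, int], kit_b: Dict[str, int]) -> Tuple[int, int]:
    """Return (mismatched markers, total step difference) over shared markers."""
    shared = kit_a.keys() & kit_b.keys()
    mismatches = sum(1 for m in shared if kit_a[m] != kit_b[m])
    steps = sum(abs(kit_a[m] - kit_b[m]) for m in shared)
    return mismatches, steps


# Hypothetical repeat values for three markers, not real kit data.
kit_1 = {"DYS390": 24, "DYS449": 29, "GATA-H4": 11}
kit_2 = {"DYS390": 24, "DYS449": 30, "GATA-H4": 11}
print(compare_kits(kit_1, kit_2))  # (1, 1)
```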
From previous observations of other lines that I have charted, I would have expected 46118 and 41641 to have a common ancestor from about 1844, since on my normal phylogenetic charts they have a branch-length (median) tMRCA value of 162.5 years (2006 - 162 = 1844).**
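The charts quote both a median and a 95% probability figure. One common simplified way to obtain both, which is not necessarily what the charts used, treats the mismatch count k as Poisson with mean 2nμt over n markers and t generations, puts a flat prior on t, and reads quantiles off the resulting Gamma(k + 1, 2nμ) posterior. The sketch assumes every mutation shows up as a mismatch (no back-mutations) and uses placeholder values of μ = 0.002 per marker per generation and 25 years per generation, so its numbers should not be compared directly with the 162.5-, 325-, or 400-year figures quoted here. All function names are hypothetical.

```python
import math
from typing import Dict


def posterior_cdf(t: float, k: int, rate: float) -> float:
    """P(TMRCA <= t generations) for a Gamma(k + 1, rate) posterior."""
    x = rate * t
    head = sum(x ** j / math.factorial(j) for j in range(k + 1))
    return 1.0 - math.exp(-x) * head


def posterior_quantile(p: float, k: int, rate: float) -> float:
    """Generations t with P(TMRCA <= t) = p, found by bisection."""
    lo, hi = 0.0, 1.0
    while posterior_cdf(hi, k, rate) < p:  # widen until p is bracketed
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if posterior_cdf(mid, k, rate) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tmrca_years(k: int, markers: int, mu: float = 0.002,
                years_per_generation: float = 25.0) -> Dict[str, float]:
    """Median and 95% upper bound of the TMRCA, in years."""
    rate = 2 * markers * mu  # expected mismatches per generation, both lines
    return {
        "median": posterior_quantile(0.5, k, rate) * years_per_generation,
        "p95": posterior_quantile(0.95, k, rate) * years_per_generation,
    }


print(tmrca_years(1, 37))
```

Under this model the median is generally not half of the 95% bound, so the midpoint pattern noted in the footnote below is an empirical observation about the charts rather than something this sketch would predict.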
On the chart above, which uses simulated data, the branch lengths still show 162.5 years, but the path to the TMRCA has changed significantly. This difference suggests that future DNA data may add roughly 50 more years to the branch lengths, following the paths inferred from the Most Probable Estimates in the LAMARC output.
That is, the chart above shows my best estimate to date of how these three individuals might share an MRCA on a phylogenetic chart of the DNA data. Whether the DNA evidence will develop into what is shown above remains to be seen. It will be interesting to find out whether we can obtain more detail from the DNA results than is currently understood.
What is obvious in the above graph is that Levi (kit #46118) does not have a clear path to the other kits as represented here. His path to the TMRCA does not make a whole lot of sense, at least back to his grandfather. As shown here, he has three separate branch lengths of 12.38, 13.77, and 3.58 years. It is not clear to me what this short section is trying to suggest, but we can take it as a clue for reconstructive purposes.
Since LAMARC gave different Theta values for the Maximum Likelihood Estimate (MLE) run than for the Bayesian run, it occurs to me that DYS456 may play a part in the reconstruction of the ancestral line for #46118. The MLE run from LAMARC gave:
DYS456 with a Theta (MLE) of 0.000509 (not yet shown to have mutated within Group02)
That is, the line of 46118 may well have mutated at DYS456. Thinking that through, it appears logical to me that the MACK values above most likely to need modification for DYS456 are MACK05, MACK06, and MACK14.
Reconstructing the mutation of DYS456 for MACK05, MACK06, and MACK14 and graphing it out produces the following chart:

I think it very likely that 48988, 46118, and 41641 share a common immigrant ancestor. The DNA evidence tells me that this group offers a very interesting area to focus upon. My reconstruction is not exactly scientifically rigorous, but it does demonstrate two things: a) more DNA evidence should help clear up details regarding relationships, and b) better reconstruction tools should improve our estimates of ancestral nodes.
CONCLUSIONS:
Group 2 needs more DNA participants if they want more information. Lacking that, I am trying to see if I can figure out what the DNA tells us anyway. Using the best of the latest genetic software that I can find so far, my estimated conclusions are:
1) Group 2 needs more DNA participants if they want more information. Participants carefully chosen from lines that branched off in the mid-1700s and around 1800 would help the DNA efforts.
2) The DNA tells me that members #41641 (Thomas) and #46118 (Bill) should connect in about 1807, plus or minus a generation.
3) The DNA tells me that those two guys should connect to #48988 (John Jeffrey) in about 1742, plus or minus a generation.
And I probably should add a fourth: I could be terribly wrong. Nobody has published a scientifically accepted method of reconstructing ancestral nodes yet. If I am wrong about how to do this, then I will just have to try to do better next time.
Of course, this implies that I am interpreting the LAMARC results as predicting that DYS426, DYS456, and DYS464d will show mutations in future DNA sampling for Group02. That assumption could be wrong given the small size of Group02. Another explanation would be that LAMARC is delivering erroneous information because of the lack of data.
What else could be expected from the reconstruction? Knowing an approximate date when ancestors might connect would give genealogists an idea of when, and perhaps even where, to do the normal genealogy research for that connection.
My disclaimer would be that these DNA estimates are no substitute for good old-fashioned, solid genealogy research, or for more DNA evidence. I am simply trying to get more information out of the DNA data. My thought is that there is more information in those DNA numbers, and I am trying to figure out what that information might tell us.
Genealogical information about Group 2 can be found here.
The original 37 marker data for Group 2 can be found here.
* LAMARC is not intended for small populations.
** [It is fairly common to see the tMRCA fall at the midpoint of the 95% probability figure, especially when 37 markers are returned. The observed error on those charts is usually plus or minus 1 generation. For participants 46118 and 48988, we have a branch-length (median) tMRCA value of 219.5 years and separate known early ancestors in South Carolina in about 1800.]