Global25 workshop 4: a neighbor joining tree of ancient and present-day West Eurasian...

Phylogenetic trees are easy to produce, but there are countless ways to run them, and, depending on the input data you’re using, some methods are a lot more effective than others. In this tutorial I’m going to demonstrate one method that has worked well for me when looking at the fine-scale genetic relationships between ancient and present-day human populations with my Global25 data.
To get started download this datasheet, plug it into the PAST program, which is freely available here, then select all of the columns by clicking on the empty tab above the labels, and choose Multivariate > Clustering > Neighbor joining. Here’s a screen cap of me doing just that…

Then, from the tabs on the right, choose Chord as the similarity index and ZAF_2100BP (an ancient forager from Southern Africa and the most distinct unit in the datasheet) as the root. PAST offers a very wide range of similarity indices and they generally produce similar results, but, in my experience, Chord creates some of the most visually pleasing outcomes when dealing with fine-scale genetic substructure.
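For anyone curious about what the Chord index measures, here’s a minimal Python sketch. It assumes the usual definition of chord distance, that is, the Euclidean distance between two vectors after each has been scaled to unit length; I haven’t checked this against PAST’s own code, so treat it as an illustration rather than a reproduction of what PAST does internally. The function name is my own.

```python
import math


def chord_distance(x, y):
    """Chord distance: Euclidean distance between unit-normalized vectors.

    Equivalent to sqrt(2 - 2 * cos(angle between x and y)).
    Assumed definition; not taken from PAST's source code.
    """
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = dot / (norm_x * norm_y)
    # Clamp to [-1, 1] so rounding error can't push the sqrt argument below 0.
    cosine = max(-1.0, min(1.0, cosine))
    return math.sqrt(2.0 - 2.0 * cosine)
```

Because each vector is normalized first, the result depends only on the direction of the coordinates, not their overall magnitude: two samples whose coordinates are exact multiples of each other come out at a distance of zero.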

This is the tree you should see after exporting the image via the graph settings tab in PAST, and, if you like, rotating it 90 degrees with the image editing software of your choice. Note the fairly substantial differences between the populations from Northwestern Europe, which are often difficult to tease apart in such analyses.

If you have your own Global25 coordinates you can add them to my PAST-compatible datasheet to see where you cluster in this tree. And, of course, you can design your own PAST-compatible datasheets and trees with any combination of populations and/or individuals from the Global25 text files at the links below. It’s easy: just copy and paste the coordinates of your choice into an empty text file, open it with PAST and then save it with the dat extension to create a new PAST datasheet. But make sure never to mix scaled and non-scaled coordinates in the same datasheet.
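If you’d rather not copy and paste by hand, the row selection can be scripted. The sketch below is only a rough illustration and rests on a couple of assumptions: that the Global25 text files are comma-separated with the sample or population label in the first column followed by 25 coordinates, and that you pick rows by label prefix. The function name and the prefix approach are hypothetical conveniences, not part of any Global25 tooling.

```python
import csv
import io

N_DIMS = 25  # assumed: one label column followed by 25 coordinates


def select_rows(csv_text, wanted_prefixes):
    """Return the rows whose label starts with any of the wanted prefixes.

    csv_text is the content of ONE Global25 text file (scaled or
    non-scaled, never a mixture of the two).
    """
    kept = []
    for row in csv.reader(io.StringIO(csv_text)):
        if not row:
            continue  # skip blank lines
        label, values = row[0], row[1:]
        if not any(label.startswith(p) for p in wanted_prefixes):
            continue  # also skips a header row, if the file has one
        if len(values) != N_DIMS:
            raise ValueError(f"{label}: expected {N_DIMS} values, got {len(values)}")
        kept.append([label] + [float(v) for v in values])
    return kept
```

The selected rows can then be written to a plain text file and opened in PAST as described above. Running the function separately on the scaled and non-scaled files, and never merging the outputs, is one simple way to avoid mixing the two kinds of coordinates.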

Global 25 datasheet (scaled)
Global 25 pop averages (scaled)
Global 25 datasheet
Global 25 pop averages

An important point to keep in mind when running these sorts of analyses is that PAST and other such programs need enough genetic differentiation to latch onto in order to produce meaningful results. Thus, even when studying the relationships between very closely related populations, it’s useful to include not just a root population or individual, but also some closely and distantly related groups to help the algorithm flesh out the key genetic substructure.
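To see why the choice of reference groups matters, it helps to look at how neighbor joining picks which pair to merge. Below is a bare-bones sketch of the standard algorithm in Python. It’s my own illustrative version, not PAST’s implementation: it returns only the unrooted branching order as nested tuples, and it skips branch lengths and rooting.

```python
def neighbor_joining(labels, dist):
    """Build an unrooted tree topology (nested tuples) by neighbor joining.

    labels: list of taxon names; dist: symmetric distance matrix
    (list of lists) in the same order as labels.
    """
    nodes = list(labels)
    # Distances keyed by node, so merged nodes (tuples) can be added later.
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    while len(nodes) > 2:
        n = len(nodes)
        # Net divergence: each node's summed distance to every other node.
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        best = None
        for i in range(n):
            for j in range(i + 1, n):
                a, b = nodes[i], nodes[j]
                # Q criterion: raw distance corrected by net divergence.
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        merged = (a, b)
        d[merged] = {}
        for c in nodes:
            if c != a and c != b:
                # Distance from the new internal node to each remaining node.
                dc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
                d[merged][c] = dc
                d[c][merged] = dc
        nodes = [c for c in nodes if c != a and c != b] + [merged]
    return (nodes[0], nodes[1])
```

The key line is the Q criterion: the pair that gets merged depends not only on the distance between the two units, but also on each unit’s summed distance to everything else in the datasheet. That is why adding or removing near and far groups can change how closely related populations are arranged.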
To be honest, I don’t really know whether using the Chord index and rooting the tree with an ancient Southern African is the best way to run a neighbor joining tree analysis of ancient and present-day West Eurasian genetic variation. What do you think? Feel free to let me know in the comments.
