Algebraic statistics in phylogenetic reconstruction: incorporating invariable sites
Abstract
In modern phylogenetics, the traditional approach to comparing DNA sequences assumes that nucleotide substitutions follow a Markov model and that all sites of a DNA sequence evolve independently and in the same way. However, recent studies have shown that some positions in DNA sequences remain invariant. This thesis investigates the case in which certain regions of the DNA sequences do not change throughout the evolutionary process. The approach, inspired by the work of Allman and Rhodes [1], considers a model in which a proportion of the sites cannot vary while the remaining sites are variable. This model is referred to as the general Markov model plus invariable sites (GM+I). In [1], the authors obtain formulae to recover the so-called invariable parameters of the GM+I model. We implement a computational method based on these formulae in Python and evaluate its performance on simulated data. The performance of the method is analyzed in different situations, and potential improvements and variations based on our findings are suggested.
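To make the two-class structure of the GM+I model concrete, the following is a minimal Python sketch of how site patterns could be simulated under such a model. It is an illustration only, not the implementation evaluated in the thesis. The function name `simulate_gm_plus_i`, its parameters, the star-shaped tree (one root connected directly to every leaf) and the example numbers are all assumptions made for this sketch.

```python
import random

STATES = "ACGT"


def simulate_gm_plus_i(n_sites, p_inv, root_dist, inv_dist, leaf_matrices, seed=0):
    """Simulate site patterns under a toy GM+I model on a star tree.

    Hypothetical helper for illustration only.
    n_sites       -- number of alignment columns to generate
    p_inv         -- proportion of invariable sites, in [0, 1]
    root_dist     -- root distribution of the variable class (4 weights)
    inv_dist      -- state distribution of the invariable class (4 weights)
    leaf_matrices -- one 4x4 row-stochastic matrix per leaf (root -> leaf)
    Returns (patterns, flags): one string per site, and whether that site
    was drawn from the invariable class.
    """
    rng = random.Random(seed)
    patterns, flags = [], []
    for _ in range(n_sites):
        is_invariable = rng.random() < p_inv
        if is_invariable:
            # Invariable class: the same state is observed at every leaf.
            state = rng.choices(range(4), weights=inv_dist)[0]
            column = [state] * len(leaf_matrices)
        else:
            # Variable class: draw a root state, then evolve it independently
            # along each edge using that edge's transition matrix.
            root = rng.choices(range(4), weights=root_dist)[0]
            column = [rng.choices(range(4), weights=m[root])[0]
                      for m in leaf_matrices]
        patterns.append("".join(STATES[i] for i in column))
        flags.append(is_invariable)
    return patterns, flags


# Example usage with made-up parameters (assumptions, not values from the thesis).
uniform = [0.25] * 4
edge = [[0.7 if i == j else 0.1 for j in range(4)] for i in range(4)]
patterns, flags = simulate_gm_plus_i(1000, 0.3, uniform, uniform, [edge] * 3)
```

Note that a site from the variable class can still show the same nucleotide at every leaf by chance, so the proportion of invariable sites cannot be read off directly from the constant columns of an alignment. This is why dedicated recovery formulae, such as those obtained in [1], are needed.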



