EMVC-2: an efficient single-nucleotide variant caller based on expectation maximization

dc.contributor.author: Dufort Álvarez, Guillermo
dc.contributor.author: Xargay Ferrer, Martí
dc.contributor.author: Pagès Zamora, Alba Maria
dc.contributor.author: Ochoa Álvarez, Idoia
dc.contributor.group: Universitat Politècnica de Catalunya. SPCOM - Processament del Senyal i Comunicacions
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
dc.date.accessioned: 2025-03-20T12:58:00Z
dc.date.available: 2025-03-20T12:58:00Z
dc.date.issued: 2024-03-04
dc.description.abstract: Motivation: Single-nucleotide variants (SNVs) are the most common type of genetic variation in the human genome. Accurate and efficient detection of SNVs from next-generation sequencing (NGS) data is essential for various applications in genomics and personalized medicine. However, SNV calling methods usually suffer from high computational complexity and limited accuracy. In this context, there is a need for new methods that overcome these limitations and provide fast, reliable results. Results: We present EMVC-2, a novel method for SNV calling from NGS data. EMVC-2 uses a multi-class ensemble classification approach based on the expectation–maximization algorithm that infers at each locus the most likely genotype from multiple labels provided by different learners. The inferred variants are then validated by a decision tree that filters out unlikely ones. We evaluate EMVC-2 on several publicly available real human NGS datasets for which the set of SNVs is available, and demonstrate that it outperforms state-of-the-art variant callers in terms of accuracy and speed, on average. Availability and implementation: EMVC-2 is coded in C and Python, and is freely available for download at https://github.com/guilledufort/EMVC-2. EMVC-2 is also available in Bioconda.
dc.description.peerreviewed: Peer Reviewed
dc.description.sponsorship: This work was partially supported by Universidad de la República; Ramon y Cajal [contract number RYC2019-028578-I]; and Gipuzkoa Fellows [grant numbers 2022-FELL-000003–01, PID2021-126718OA-I00, PID2019-104958RBC41] funded by MCIN/AEI/10.13039/501100011033.
dc.description.version: Postprint (published version)
dc.identifier.citation: Dufort Álvarez, G. [et al.]. EMVC-2: an efficient single-nucleotide variant caller based on expectation maximization. "Bioinformatics", 4 March 2024, vol. 40, no. 3, article btad681.
dc.identifier.doi: 10.1093/bioinformatics/btad681
dc.identifier.issn: 1367-4803
dc.identifier.uri: https://hdl.handle.net/2117/426774
dc.language.iso: eng
dc.relation.publisherversion: https://academic.oup.com/bioinformatics/article/40/3/btad681/7420212
dc.rights.access: Open Access
dc.rights.licensename: Attribution 4.0 International
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: UPC subject areas::Computer science::Computer applications::Bioinformatics
dc.subject.other: Biology computing
dc.subject.other: Computational complexity
dc.subject.other: Decision trees
dc.subject.other: Genetics
dc.subject.other: Genomics
dc.subject.other: Pattern classification
dc.title: EMVC-2: an efficient single-nucleotide variant caller based on expectation maximization
dc.type: Article
dspace.entity.type: Publication
local.citation.author: Dufort Álvarez, G.; Xargay-Ferrer, M.; Pagès-Zamora, A.; Ochoa, I.
local.citation.number: 3, article btad681
local.citation.publicationName: Bioinformatics
local.citation.volume: 40
local.identifier.drac: 40841433

Files

Original bundle

Name: 2024EMVC2 [extended]
Size: 530.12 KB
Format: Adobe Portable Document Format

Name: btad681.pdf
Size: 251.05 KB
Format: Adobe Portable Document Format