
dc.contributor.author: Ramon Gurrea, Elies
dc.contributor.author: Belanche Muñoz, Luis Antonio
dc.contributor.author: Pérez Enciso, Miguel
dc.contributor.other: Universitat Politècnica de Catalunya. Departament de Ciències de la Computació
dc.identifier.citation: Ramón, E.; Belanche, L.; Pérez, M. HIV drug resistance prediction with weighted categorical kernel functions. "BMC bioinformatics", 30 July 2019, vol. 20, article 410, p. 1-13.
dc.description.abstract: Background: Antiretroviral drugs are a very effective therapy against HIV infection. However, the high mutation rate of HIV permits the emergence of variants that can be resistant to the drug treatment. Predicting drug resistance to previously unobserved variants is therefore very important for an optimum medical treatment. In this paper, we propose the use of weighted categorical kernel functions to predict drug resistance from virus sequence data. These kernel functions are very simple to implement and are able to take into account HIV data particularities, such as allele mixtures, and to weigh the different importance of each protein residue, as it is known that not all positions contribute equally to the resistance. Results: We analyzed 21 drugs of four classes: protease inhibitors (PI), integrase inhibitors (INI), nucleoside reverse transcriptase inhibitors (NRTI) and non-nucleoside reverse transcriptase inhibitors (NNRTI). We compared two categorical kernel functions, Overlap and Jaccard, against two well-known noncategorical kernel functions (Linear and RBF) and Random Forest (RF). Weighted versions of these kernels were also considered, where the weights were obtained from the RF decrease in node impurity. The Jaccard kernel was the best method, either in its weighted or unweighted form, for 20 out of the 21 drugs. Conclusions: Results show that kernels that take into account both the categorical nature of the data and the presence of mixtures consistently result in the best prediction model. The advantage of including weights depended on the protein targeted by the drug. In the case of reverse transcriptase, weights based on the relative importance of each position clearly increased the prediction performance, while the improvement in the protease was much smaller. This seems to be related to the distribution of weights, as measured by the Gini index.
All methods described, together with documentation and examples, are freely available at
dc.format.extent: 13 p.
dc.rights: Attribution 3.0 Spain
dc.subject: Àrees temàtiques de la UPC::Informàtica::Intel·ligència artificial::Aprenentatge automàtic
dc.subject.lcsh: HIV infections
dc.subject.lcsh: Machine learning
dc.subject.lcsh: Drug resistance
dc.subject.lcsh: Antiretroviral agents
dc.subject.other: Drug resistance prediction
dc.subject.other: Categorical kernel
dc.subject.other: Weighted kernel
dc.subject.other: Support vector machine
dc.subject.other: Random Forest
dc.subject.other: Kernel PCA
dc.title: HIV drug resistance prediction with weighted categorical kernel functions
dc.subject.lemac: Infeccions per VIH
dc.subject.lemac: Aprenentatge automàtic
dc.subject.lemac: Resistència als medicaments
dc.contributor.group: Universitat Politècnica de Catalunya. SOCO - Soft Computing
dc.description.peerreviewed: Peer Reviewed
dc.rights.access: Open Access
dc.description.version: Postprint (published version)
local.citation.author: Ramón, E.; Belanche, Ll.; Pérez, M.
local.citation.publicationName: BMC bioinformatics
local.citation.number: article 410
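
Illustrative note (not part of the record): the abstract describes weighted Overlap and Jaccard kernels over categorical sequence data with allele mixtures. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each residue position is represented as a non-empty set of observed alleles (so a mixture is a set with more than one element), that per-position similarities are combined as a weighted average, and that the weights are supplied by the caller; in the paper they come from the Random Forest decrease in node impurity, whereas the values below are hypothetical. The function names and the example sequences are invented for illustration.

```python
from typing import FrozenSet, List, Optional, Sequence

# Assumption: one residue position = the set of alleles observed there.
Position = FrozenSet[str]


def _normalise(weights: Optional[Sequence[float]], d: int) -> List[float]:
    """Return non-negative weights that sum to 1 (uniform if none are given)."""
    if weights is None:
        return [1.0 / d] * d
    if len(weights) != d:
        raise ValueError("need exactly one weight per position")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    total = float(sum(weights))
    return [w / total for w in weights]


def _check(x: Sequence[Position], y: Sequence[Position]) -> None:
    """Validate that both sequences are aligned and have no empty positions."""
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    if any(len(a) == 0 or len(b) == 0 for a, b in zip(x, y)):
        raise ValueError("every position needs at least one allele")


def overlap_kernel(x, y, weights=None) -> float:
    """Weighted share of positions whose allele sets are identical."""
    _check(x, y)
    w = _normalise(weights, len(x))
    return sum(wi for wi, a, b in zip(w, x, y) if a == b)


def jaccard_kernel(x, y, weights=None) -> float:
    """Weighted mean of per-position Jaccard indices |a & b| / |a | b|."""
    _check(x, y)
    w = _normalise(weights, len(x))
    return sum(wi * len(a & b) / len(a | b) for wi, a, b in zip(w, x, y))


# Hypothetical three-residue sequences; the second position of x is a mixture.
x = [frozenset("L"), frozenset({"M", "V"}), frozenset("K")]
y = [frozenset("L"), frozenset("M"), frozenset("R")]
hypothetical_weights = [1.0, 2.0, 1.0]  # stand-in for RF-derived importances

print(overlap_kernel(x, y), jaccard_kernel(x, y))
print(overlap_kernel(x, y, hypothetical_weights),
      jaccard_kernel(x, y, hypothetical_weights))
```

In this sketch the Overlap kernel gives no credit to the mixture position because the two sets differ, while the Jaccard kernel gives it partial credit for the shared allele, which is the behaviour the abstract attributes to kernels that account for mixtures.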


Except where otherwise noted, content on this work is licensed under a Creative Commons license: Attribution 3.0 Spain