Multiple Hybridization Events Punctuate the Evolutionary Trajectory of Malassezia furfur

ABSTRACT Malassezia species are important fungal skin commensals and are part of the normal microbiota of humans and other animals. However, under certain circumstances these fungi can also display a pathogenic behavior. For example, Malassezia furfur is a common commensal of human skin and yet is often responsible for skin disorders but also systemic infections. Comparative genomics analysis of M. furfur revealed that some isolates have a hybrid origin, similar to several other recently described hybrid fungal pathogens. Because hybrid species exhibit genomic plasticity that can impact phenotypes, we sought to elucidate the genomic evolution and phenotypic characteristics of M. furfur hybrids in comparison to their parental lineages. To this end, we performed a comparative genomics analysis between hybrid strains and their presumptive parental lineages and assessed phenotypic characteristics. Our results provide evidence that at least two distinct hybridization events occurred between the same parental lineages and that the parental strains may have originally been hybrids themselves. Analysis of the mating-type locus reveals that M. furfur has a pseudobipolar mating system and provides evidence that after sexual liaisons of mating compatible cells, hybridization involved cell-cell fusion leading to a diploid/aneuploid state. This study provides new insights into the evolutionary trajectory of M. furfur and contributes with valuable genomic resources for future pathogenicity studies.

ABSTRACT Malassezia species are important fungal skin commensals and are part of the normal microbiota of humans and other animals. However, under certain circumstances these fungi can also display a pathogenic behavior. For example, Malassezia furfur is a common commensal of human skin and yet is often responsible for skin disorders but also systemic infections. Comparative genomics analysis of M. furfur revealed that some isolates have a hybrid origin, similar to several other recently described hybrid fungal pathogens. Because hybrid species exhibit genomic plasticity that can impact phenotypes, we sought to elucidate the genomic evolution and phenotypic characteristics of M. furfur hybrids in comparison to their parental lineages. To this end, we performed a comparative genomics analysis between hybrid strains and their presumptive parental lineages and assessed phenotypic characteristics. Our results provide evidence that at least two distinct hybridization events occurred between the same parental lineages and that the parental strains may have originally been hybrids themselves. Analysis of the mating-type locus reveals that M. furfur has a pseudobipolar mating system and provides evidence that after sexual liaisons of mating compatible cells, hybridization involved cell-cell fusion leading to a diploid/aneuploid state. This study provides new insights into the evolutionary trajectory of M. furfur and contributes with valuable genomic resources for future pathogenicity studies. IMPORTANCE Malassezia furfur is a common commensal member of human/animal microbiota that is also associated with several pathogenic states. Recent studies report involvement of Malassezia species in Crohn's disease, a type of inflammatory bowel disease, pancreatic cancer progression, and exacerbation of cystic fibrosis. A recent genomics analysis of M. furfur revealed the existence of hybrid isolates and identified their putative parental lineages. In this study, we explored the genomic and phenotypic features of these hybrids in comparison to their putative parental lineages. Our results revealed the existence of a pseudobipolar mating system in this species and showed evidence for the occurrence of multiple hybridization events in the evolutionary trajectory of M. furfur. These findings significantly advance our understanding of the evolution of this commensal microbe and are relevant for future studies exploring the role of hybridization in the adaptation to new niches or environments, including the emergence of pathogenicity.
suggested the existence of M. furfur hybrid strains (16,31). In this study, we further explored the genomic patterns of these hybrids and of 17 additional strains presently classified as M. furfur (Table 1) to assess how widespread hybridization is within this species. Following the original observations of a hybrid genotype (31), we performed AFLP with two different adaptor and primer set combinations for 22 M. furfur strains, each combination representing different polymorphisms in restriction sites broadly spread over the genomes, giving an indication of genetic relatedness and shared or discriminating fragment sizes (see Materials and Methods for more details). These AFLP analyses confirmed that the banding patterns of three previously identified hybrids (CBS1878, CBS4172, and CBS7019) are indeed a combination of those of the proposed parental lineages (Fig. 1) (16). Nevertheless, the hybrid strain CBS7019 did not cluster with the other two, suggesting the existence of two putative hybrid clades (Fig. 1). In total, the AFLP clustering analysis identified four lineages: P1 (parental 1), which corresponds to the parental CBS7982 lineage with five additional strains; P2, which corresponds to the parental CBS14141 lineage and four additional strains; H1 (hybrid 1), which corresponds to CBS1878, CBS4172, and five additional hybrid strains; and finally, H2, which harbors CBS7019 and three additional hybrid strains (Fig. 1).
Genome analyses confirm two independent hybridization events and their parental lineages. To confirm the existence of two independent hybridization events, 13 strains (Table 1) comprising representative isolates of all four AFLP-determined lineages were further compared in detail at the genomic level. To this end, we performed the genome assembly of the representative isolate of each of the parental lineages to serve as references for comparison with hybrid genome sequences (CBS9595 representing P1: 8.2 Mb, 8 scaffolds; and CBS14141 representing P2: 8.3 Mb, 9 scaffolds; see Table 2 and Materials and Methods for more details [see also Text S1]). The sequence alignment of these genome assemblies has shown that P1 and P2 lineages have an overall sequence similarity of 89%. Therefore, a read mapping approach where all the reads are aligned on a single haplotype would not be possible, and hybrid sequencing reads were mapped to a combined reference, including the P1 and the P2 (CBS14141: 8.3 Mb, 9 scaffolds) genome assemblies (see Materials and Methods for more details) to identify the source and sequence of the different subgenomes of the hybrids ( Fig. 2A).
AFLP-determined H1 and H2 lineages on average presented six single nucleotide polymorphisms (SNPs)/kb (all homozygous given the alignment to a combined reference; see also Table S1), with the majority of them (.85%) corresponding to P2 for all hybrid strains, suggesting that CBS9595 is possibly closer to the actual parent 1 than CBS14141 is to the actual parent 2. Noteworthy, these SNPs were not homogeneously distributed along the genome but rather formed blocks of variants (see Fig. S1a and Materials and Methods for more details on block definition). Therefore, based on the assumption that strains originating from the same hybridization event would share the same blocks of variants, we utilized these as signatures of the hybrids' evolutionary past and compared them between the different hybrids to confirm the number of hybridization events. Jaccard metrics revealed a high block overlap between the strains identified by AFLP as H1 (.84.6%) and the strains identified by AFLP as H2 (.91.8%), confirming the results of the AFLP analysis. It is important to highlight that among the H1 strains, CBS4172 presented the lowest similarity with its peers. While this strain revealed 84.6 and 84.9% block overlap with CBS1878 and CBS9365, respectively, the latter two strains had an overlap of 99%. The block overlap in strains of hybrid lineage H2 was more homogeneous. The overlap of the high polymorphic regions between strains of H1 and H2 varied between 20.8 and 23.1%, suggesting independent origins. Together, these results confirm that the hybrid lineages resulted from two independent hybridization events between P1 and P2. These findings suggest an apparent propensity of P1 and P2 to hybridize and for their hybrids to survive.
Genomic divergence uncovers the hybrid origin of P1 and P2. Considering the possible ancestry of the highly polymorphic blocks in the two hybrid lineages, we sought to determine their presence in the parental lineages. To this end, the same  Hybridization in Malassezia furfur mBio methodology was followed as for the hybrids (see Materials and Methods), and we verified that, indeed, the strains of both P1 and P2 that were not used for genome assembly also harbor highly polymorphic genomic regions. A comparison between these blocks and the ones observed in the hybrid strains showed that the highest overlaps between H1 strains and P1 and P2 were 14.6 and 21%, respectively. Similarly, The horizontal scale represents the similarity percentage. Pink shading highlights restriction fragments shared between parental 1 lineage and hybrids; blue shading highlights shared restriction fragments between parental lineage 2 and hybrids. Both versions resulted from using different primer/adaptor pairs, reflecting different polymorphisms in the genomic DNA, thus resulting in some clustering variation for some strains. CBS6093 belongs to the H2 lineage based on dendrogram B, but clusters outside any of the other lineages in dendrogram A, suggesting genomic deviation from other H2 strains, a finding also supported by mating type, b-glucosidase activity, and MALDI-TOF data. CBS7982 clusters together with other P1 strains as expected in dendrogram A, but clusters close to H2 strains in dendrogram B. Interestingly, CBS7982 was found to contain a mitochondrial sequence different from other P1 strains (CBS9595, CD866, and CBS9574; see Fig. S1B).
comparing the regions of high variability in the hybrid lineage H2 with what was detected in P1 and P2 revealed that the highest overlaps were 64 and 13%, respectively. This is in line with our previous observations in the hybrid genomes and suggests that none of the strains sequenced for the P1 or P2 lineages represents the direct parental strain of the hybrids, but rather their close relatives. The origin of these blocks of high genomic variability was unknown but, as reported by Mixão and Gabaldón (37) for C. albicans, two possible models could explain their existence: (i) continuous admixture between different strains, and (ii) a hybrid ancestor that experienced genomic recombination. A key aspect to distinguish these models is the divergence between the reference genome and the observed sequence, which we expect to vary between the different blocks in the first model and to be similar among them in the second one (37). Therefore, we next assessed the level of divergence between these blocks and the reference genome in both lineages. For P2 strains, we identified a clear single peak of sequence divergence in both CBS8735 and PM315 (Fig. 2B), suggesting that the genomic material of all the blocks was acquired at a single time point, which in turns supports the existence of a hybrid ancestor. Our estimations point to a current haplotype divergence between the strains of parental lineage P2 of 4 to 4.5% (see Table S1). To determine whether any sequenced strain could be the alternative parent of P2 lineage, we performed a BLASTn search in the NCBI genome database. The only hit obtained was the CBS14141 (the same strain we use as the P2 reference) genome with 95.2% sequence similarity, a value consistent with our estimations. This means that the alternative parent of the P2 lineage has not been sequenced thus far.
For the P1 lineage, this analysis was complicated by the limited number of polymorphic blocks (ranging from a minimum of 78 to a maximum of 463 blocks), and their short size (average, 245 bp), because a variation in a single polymorphism can have a large impact on the estimation of the sequence divergence. Indeed, despite the observed overlap between the peaks of sequence divergence in the different P1 strains, we also detected multiple sequence divergence peaks (Fig. 2C). Therefore, we could not exclude any of the above-mentioned models (i.e., a continuous admixture between different strains or a hybrid ancestor that experienced genomic recombination) and properly estimate the sequence divergence. Even so, we performed a BLASTn search in the NCBI genome database using the longest blocks (.500 bp), and we verified that their best hits always presented a sequence similarity of ;93% with the CBS14141 genome (P2 lineage), suggesting that P1 also has a hybrid origin.
Mitochondrial genome analyses provide further proof for hybrid origin of parental lineages; all hybrids inherited mitochondria from P2. We next focused on comparing the mitochondrial genome sequences of the different M. furfur lineages. Genomic reads of each parental strain were mapped against mitochondrial genomes of the respective strains. As reference mitochondria for the P2 lineages, we selected the CBS14141 mitochondrial sequence available in NCBI (accession number KY911086 .1). As expected, all P2 strains shared the same mitochondrial genome. In the P1 lineage, the publicly available mitochondrial genome of CBS7982 (NCBI accession number KY911085.1) was used as a reference for mapping sequencing reads from P1 strains.
Mapping results revealed the presence of two different mitochondrial sequences in the P1 lineage with the mitochondrial sequence of CBS7982 differing from that of Hybrid genomes were sequenced originating sequencing reads from P1 (blue rectangles) and P2 (pink rectangles). These reads were simultaneously aligned to the combined reference of P1 and P2. Light blue and light pink correspond to the alleles present in this reference. Dark blue and dark pink correspond to alleles which are aligned in P1 or P2, respectively, but present lower sequence identity forming blocks of genomic variability. Differences in the patterns of genomic variability were used to determine the different hybrid lineages. Estimated sequence divergence between the two alleles (i.e., between dark blue and light blue or between dark pink and light pink) in terms of SNPs/bp in the blocks of genomic variability were used to determine the origin of such blocks: hybridization or admixture between different strains. (B) Sequence divergence in the blocks of genomic variability of P2 lineages show a single density peak, suggesting a hybrid origin. (C) Sequence divergence in the blocks of genomic variability of P1 lineages show multiple density peaks, with a single peak shared by all strains, not allowing the exclusion of any of the above-mentioned scenarios.
CBS9595, CD866, and CBS9574 (see Fig. S1b). To compare both P1 mitochondrial types, a draft mitochondrial genome assembly with 49 kb was generated for CBS9595 (see Materials and Methods). A BLASTn search with this mitochondrial genome against the NCBI nr database revealed that its sequence is equally distant to that of the mitochondrial genomes of CBS7982 (P1 lineage) and CBS14141 (P2 lineage), with a similarity of 93.45%. These results are in line with our observations for the polymorphic regions of the nuclear genome, further providing support that P1 results from the cross of two diverged lineages, i.e., hybridization, and CBS7982 harbors the mitochondrial genome of the alternative parent. A recent study assessing three mitochondrial genomic loci of 43 M. furfur strains found two rather divergent mitochondrial clades for samples belonging to the P1 lineage (45), supporting our results and suggesting a wider presence of two different mitochondria among P1 strains. When reads of the strains belonging to both hybrid lineages were mapped against the mitochondrial reference genomes, it was observed that all hybrid strains inherited the mitochondria from the P2 lineage. Together, these analyses suggest that both parental lineages are not genetically "pure" but are genomic mosaics likely resulting from a hybridization event between two unknown lineages that are approximately 7% divergent in their mitochondrial genome in the case of P1 (divergence in the nuclear genome could not be confidently estimated) and 4% divergent in their nuclear genome in the case of P2 (Fig. 3). The observed small differences in the genetic mosaics detected in the nuclear genomes in each of the parental lineages suggest that strains in each lineage (P1 and P2) diverged before some of the recombination events occurred, a scenario which implies a Hybridization in Malassezia furfur mBio nonhaploid state of the strains at the time of their respective divergence. Nevertheless, all analyses performed in this study support a haploid state for all of them (see Fig. S2 and Fig. S3A), indicating independent ploidy reduction. M. furfur hybrid lineages are mostly diploid with some LOH. Karyotype data from this study and previous publications (46, 47) demonstrate significant chromosomal variation in M. furfur with hybrid strains containing additional chromosomal bands compared to strains from their hypothesized haploid parental lineages (see Fig. S2a). Moreover, fluorescence-activated cell sorting (FACS) ploidy analysis for H1 strains displayed a DNA content between 1n and 2n, suggesting aneuploidy, whereas the DNA-content of both analyzed H2 strains is consistent with assignment as diploids (see Fig. S2b). Based on whole-genome sequencing data analysis, the analyzed strains for both hybrid lineages generally seem of diploid nature (see Fig. S4), with a few exceptions, such as a putative triplication of a chromosome from P2 in the hybrid strain CBS9365 (H1; see Fig. S4).
Considering the nonhaploid state of these hybrid strains, we decided to look for LOH events, an important feature to restore genomic stability through the erasure of one of the haplotypes of the recombining region (34). In the case of the M. furfur hybrids analyzed in this study, we estimated that LOH covers approximately 20% of the genome (see Table S1). This is a low value compared to other fungal hybrids, such as Candida hybrids, where LOH has been estimated to cover at least 50% of the genome (37)(38)(39)(40)44). Considering that sequence divergence between the parental lineages is higher in M. furfur (;10%) than in Candida hybrids (;4%) (37)(38)(39)(40)44), we hypothesize that the lower occurrence of LOH in M. furfur may be related to a lower number of potential recombining sites. Indeed, a recent study on C. neoformans Â C. gattii hybrids, which present 7% sequence divergence, has shown that they experience fewer recombination events (41).
The direction of the LOH event, i.e., which allele is retained, varies from hybrid to hybrid, or even from niche to niche according to the most advantageous phenotypes (48). While in some hybrids there seems to be a tendency to retain the allele of a given parental lineage, in others this appears to be a random process (34,40,44,49,50). In M. furfur H1 and H2 hybrid lineages we found that .60% of the genome covered by LOH corresponds to the allele of P2, except for the strains CBS6001 and CBS7019, where this value is 45% (see Table S1). Although this result may suggest a slight tendency to retain the allele of P2 parental, we consider it to be not sufficiently clear or strong, as in other hybrids where .80% of the genome retains the allele of the same parent (49,50). Indeed, if instead of focusing on the percentage of the genome covered by LOH, we analyze the number of LOH events that favored each of the parental alleles, we see that in three out of the six hybrid strains ;56% of the events tended to P2, while in the other three this number is reduced to 43%, suggesting high stochasticity in the process.
M. furfur possesses the genetic machinery of a pseudobipolar mating system. To understand the origin of the hybridization events leading to the H1 and H2 lineages, analysis of the mating-type genes of M. furfur was carried out (see Text S1B for a detailed description of the results). The MAT genes of M. furfur haploid strains were identified based on similarity searches with the MAT genes of two opposite matingtypes of M. sympodialis (strain ATCC 42132, MAT a1b1; strain ATCC 44340, MAT a2b2). Two MAT a and two MAT b loci were identified in the haploid strains of M. furfur. The MAT a and b loci of the M. furfur haploid strains analyzed are ;590 kb apart (see Table S2A), suggesting a pseudobipolar configuration. A representative comparison of the MAT loci of CBS14139 (MAT a1b1) and CBS7982 (MAT a2b2), as model strains, is shown in Fig. 4. The M. furfur MAT structure reflects that of the MAT locus of M. yamatoensis (16,20) and differs from the M. globosa and M. sympodialis MAT loci, for which MAT a and MAT b are ;167 and ;140 kb apart, respectively (20,22). This is consistent with a whole-genome based clustering that groups M. furfur and M. yamatoensis in a separate phylogenetic cluster within the Malassezia genus (16). In all M. furfur haploid strains analyzed, the two MAT a locus genes are divergently oriented, corroborating previous findings for other Malassezia species (16,20,22). Moreover, a comparison between the MAT a1 and a2 loci revealed that the Mfa and Pra genes have an opposite arrangement albeit being located in highly syntenic flanking regions (Fig. 4). In Basidiomycetes, the tetrapolar mating system is thought to be ancestral, and transition from a tetrapolar to bipolar system may be linked to the evolution of pathogenicity (51,52). For example, in the genus Cryptococcus, pathogenic species have a bipolar mating system, whereas closely related nonpathogenic species such as Cryptococcus amylolentus have a tetrapolar mating system (53). Based on findings for the red yeast Sporobolomyces salmonicolor (cited as Sporidiobolus salmonicolor), the pseudobipolar mating system was proposed to be a gradual stage in the transition from a tetrapolar to bipolar system (52).
Hybrid genome searches with the MAT genes of haploid M. furfur identified two MAT a loci and two additional MAT b loci in the M. furfur hybrid strains CBS1878 (H1) and CBS7019 (H2). The sexual identity of the b loci among strains from all lineages was assigned following sequence comparison and phylogenetic analysis of the predicted proteins, resulting in MAT b1, b2, b3, and b4 alleles, with MAT b3 and MAT b4 being present only in the hybrids, and closely related to MAT b1 and MAT b2, respectively (Fig. 5C). In particular, based on genome data (both genome assembly and read mapping data), strain CBS1878 was designated as MAT a2a2b1b4, and strain CBS7019 was designated as MAT a1a2b3b4. Interestingly, CBS1878 did not contain MAT a1, but the a2 locus was present in two copies. A closer inspection of the read alignment on the MAT locus with IGV (54) revealed that part of the Illumina paired-end reads flanking this duplication in a2 had their mate aligning in the edges of the region corresponding to the a1 allele (region present in the reference P1, but without read coverage in H1 strains), suggesting the occurrence of a LOH from an ancestral a1a2 state to a derived a2a2 configuration. Based on read mapping results, a similar occurrence seems to have happened for H1 strains CBS 4172 and CBS9365 (see Fig. S1c). A molecular assay for M. furfur mating type identifies mating-compatible strains in the parental lineages. Following these initial findings for the M. furfur mating-type regions, a PCR assay was developed with primers that specifically amplify MAT a1, MAT a2, MAT b1-b3, and MAT b2-b4. PCR results for the MAT a1 and a2, and sequencing analysis of MAT b1-b3 and b2-b4, indicated additional mating-type configurations ( Fig. 5A to C; see also Table S2B). Both P1 and P2 parental lineages contain strains with either MAT a1 or MAT a2, while all P1 strains are of the MAT b2 type and all P2 strains are of the MAT b1 type (Fig. 5A to C; see also Table S2B). The presence of a1b1, a2b1, a1b2 and a2b2 MAT combinations supports the hypothesis that recombination occurs within this region, corroborating findings on the pseudo-bipolar MAT structure of M. sympodialis and in contrast with the lack of recombination reported for Ustilago hordei, which is bipolar (20). Our findings also highlight incompatibility at the B loci within each parental lineage, despite compatibility being present at the MAT a locus, which might explain why sexual reproduction could not be observed under laboratory conditions. Attempts to cross compatible MAT a and MAT b strains belonging to the P1 and P2 parental lineages were also carried out, but also in this case sexual reproduction could not be observed. It seems likely that two individual mating events between representatives of both parental lineages originally resulted in hybrid strains that contained the combination MAT a1a2b1b2, yet our data showed that various alterations of the mating loci occurred in the hybrids. All H1-strains lost the MAT a1 copy and may have duplicated the MAT a2 allele (only confirmed for CBS1878, CBS4172, and CBS9365). Strains belonging to the H2 hybrid lineage still have both parental MAT a copies, with the exception of CBS6093, that seems to have lost the parental MAT a2 copy. In addition, two unique MAT b arrangements were observed in the hybrids: MAT b4 in both hybrid lineages, with similarity to MAT b2 of the P1 strains; and MAT b3 which is only present in H2 strains, and is a phylogenetic sister of MAT b1 of the P2 strains ( Fig. 5; see also Table S2B). Strains of hybrid lineage H1 retained the MAT b1 copy. Considering that recombination in the MAT locus has previously been observed in other hybrid lineages (40,55,56), and is associated with a possible restoration of hybrid fertility (55, 56), we hypothesize that strains in lineages H1 and H2 may have undergone genomic changes leading to reestablishment of a viable sexual state.
Targeted sequencing of five nuclear loci and genomic data suggest P1 and P2 lineages may be two separate species. The species M. furfur is represented by two neotype cultures, namely, CBS1878 and CBS7019, corresponding to the respective names Malassezia furfur and Pityrosporum ovale (synonym of M. furfur), but these belong to the hybrid lineages H1 and H2, respectively. Many species are described based on a limited number of nuclear DNA loci, ribosomal loci ITS and LSU being two of the most frequently used taxonomic markers in fungi, but these loci may not reflect the genetic heterogeneity of a yeast strain sufficiently as is also shown in this study for the ITS of ribosomal DNA and three protein coding genes (b-tubulin, chitin synthase [CHS2], and translation elongation factor 1-a [EF1-a]) (see Fig. S3A1 to 5). Phylogenetic analysis of published sequencing data for the intergenic transcribed spacer (IGS) of the rDNA resulted in separate clusters for each of the studied lineages, making it a potential diagnostic tool for identifying hybrids belonging to H1 or H2 among genetically uncharacterized M. furfur isolates (see Fig. S3A2). As chromatograms of hybrid strains for the protein coding genes possessed multiple sites with two different nucleotide peaks representing both parental backgrounds, they were phased into two sequences representing their respective parental copies (see Fig. S3A3 to 5). Based on ITS data, both hybrid lineages H1 and H2 cannot be distinguished from the parental lineage P2, and yet for the most part they contain genetic material from both P1 and P2. Type strains or neotype strains are reference strains, often used as representatives for a species in specific functional assays, but as in this case they represent hybrid lineages with a combined genomic content of two haploid parental lineages, they may not be the best choice to serve as reference strains and should probably be disqualified to serve as neotypes. At present, all strains considered in this study are classified as M. furfur. However, based on sequence divergence between the P1 and P2 lineages for the five analyzed nuclear genomic loci it is likely that P1 and P2 may represent two closely related species. For example, ITS sequence similarity between P1 strain CBS9595 and the P2 strain CBS14141 is 98%, but their similarity for protein coding genes is much lower: 97.3% for EF1-a, 96.4% for b-tubulin, and 87.6% for CHS2, respectively. According to a study that assessed more than 9,000 yeast isolates to establish species and genus thresholds for ribosomal ITS and LSU regions, a species threshold of 98.41% was presented for ITS (57), which supports the assignment of these lineages into two species. Furthermore, as above-mentioned, based on whole-genome sequence alignments for CBS9595 (P1) and CBS14141 (P2), an overall sequence similarity of 89% was estimated, further supporting that these strains represent two species.
Hybrid lineages H1 and H2 show different morphologies but limited differentiation based on traditional physiological properties. At the phenotypic level, cells belonging to hybrid lineages differ in size and shape from parental lineage cells ( Fig. 6; see also Table S3), with hybrid cells being thinner and more elongated. However, growth profiling experiments traditionally used for Malassezia species identification, seem not to differ significantly between hybrid and parental lineages (see Table S4). One noteworthy observation is that all P2 strains were positive for b-glucosidase activity and strains from the other lineages were all negative, except for one P1 strain (CBS9589) and for one H2-strain, CBS6093. The latter strain also seems somewhat different from other H2 strains based on the mating type, AFLP pattern, and matrixassisted laser desorption ionization-time of flight (MALDI-TOF) mass spectrum (see Fig. S3B) and would be an interesting candidate for further genomic exploration. Variable results for Cremophor EL utilization and b-glucosidase activity have been described previously for M. furfur isolates (12,15,(23)(24)(25)(26) and may be an expression of the heterogeneity of the species. Analysis of mass spectra generated with a Bruker MALDI Biotyper also illustrated the differences between P1 and P2 lineages and formed separate dendrogram clusters for H1, H2, P1, and P2, with a few exceptions (see Fig. S3B). The putative H2 strain CBS6093 clustered basal to clusters for P2 and H2, supporting above-mentioned deviating findings for that strain. In addition, the MALDI-TOF mass spectrum of H1 strain CBS4172 clusters with the P2 strains. This divergent character is in line with a deviating PFGE chromosomal banding pattern and this strain has lost MAT a1, and only has one copy of MAT a2. These findings need to be repeated but confirm the current high heterogeneity present in the species M. furfur that might be better interpreted as a species complex.
Based on AFLP, mating type analysis, and Sanger sequencing data, the H1 hybrid  Tables S3 and  S4 for cell size measurements and traditional physiological data, respectively. Scale bar, 5 mm. Note that the visible horizontal line in the SEM photo for CBS7019 is the result of a technical artifact, possibly due to thickness variation in the gold coating layer.
Hybridization in Malassezia furfur mBio lineage consists of seven strains, five originating from animal skin and two from diseased human skin. Hybrid lineage H2 consists of four strains, all isolated from human diseased skin, with the exception of CBS6093 for which the origin is unknown. Of note, as has also previously been mentioned (30,58), deep-seated isolates (e.g., blood, urine, and rectal swabs) seem to almost exclusively belong to the genotype representing the P2 lineage. Such an observation could, at first sight, be at odds with the previously proposed hypothesis that hybridization plays a role in the emergence of pathogenicity (34). However, it is important to note that such a hypothesis does not imply that hybridization is the only mechanism leading to pathogenicity; in the particular case of M. furfur P2 lineage, our results show that it was likely to result from a hybridization event as well. Leong and colleagues explored antifungal susceptibility patterns of 26 M. furfur strains, including some strains also considered in this study (59). Four hybrid strains (H1, CBS1878 and CBS7019; H2, CBS6000 and CBS600) showed reduced susceptibility to certain azoles. This feature was not exclusive for strains from hybrid lineages H1 and H2 but rather seemed linked to disease state backgrounds of strains, and reduced azole susceptibility also included disease isolates belonging to parental lineage P2 (59). Whether hybridization may have facilitated genetic changes driving this reduced azole susceptibility in any of the lineages H1, H2, or P2, or whether this was the result from mere exposure to these drugs, remains to be elucidated. Considering that five of the seven H1 strains are derived from animals, we hypothesize that the hybridization event for H1 may have facilitated a host-shift event between humans and animals. This hypothesis should be tested in future analyses, including a larger sampling of strains. In summary, this study identified two individual hybrid lineages H1 and H2, with P1 and P2 representing their parental lineages, although not the exact parental strains. We propose that the diploid hybrid lineages H1 and H2 are the result of two separate mating events between mating-compatible strains from the P1 and P2 groups. Interestingly, our genome analysis shows that both haploid parental lineages were themselves the result of prior hybridization events but became haploid before some of their members mated again to form hybrid lineages H1 and H2. The various hybridization events and subsequent further evolutionary changes have contributed to additional genetic diversification in M. furfur; but also, at the phenotypic level, the hybrid strains differ from their parental lineages. The implications of possible hybridizationdriven changes in pathogenicity, or adaptation to new environments, will drive further analysis and in-depth examination comparing pathogenicity factors, lipid production and utilization, and salient physiological and phenotypic facets.

MATERIALS AND METHODS
DNA extraction. To extract DNA for whole-genome sequencing, cells were grown on modified Dixon agar (60) for 48 h at 30°C and harvested into 50-mL tubes. Yeast cells were lysed, using the Qiagen genomic DNA purification procedure for yeast samples (Qiagen, Hilden, Germany), with minor modifications. Lyticase incubation was performed for 2 h at 30°C, and RNase/proteinase incubation was performed for 2 h at 55°C. Genomic DNA was purified using Genomic-tip 100/G prep columns, according to the manufacturer's handbook. For AFLP, MLST, and the mating type assay, gDNA extraction was performed according to a CTAB (cetyltrimethylammonium bromide) method described previously (61) with the modification that DNA was purified in two steps: first, with phenol-chloroform, and second, with chloroform only.
Whole-genome sequencing. Thirteen samples were selected for whole-genome sequencing data analysis (Table 1). For three of them (CBS7982, CBS14141, and CBS4172), we retrieved Illumina data from SRA (Table 1) (16). For the remaining ones, whole-genome sequencing was performed at the Genomics Unit from the Centre for Genomic Regulation (group 1: CBS8735, PM315, CBS1878, and CBS7019) and at the Genome Institute Singapore (group 2: CBS9595, CD866, CBS9574, CBS6000, and CBS6001). Except when specified, the protocol was similar in both groups of strains. Libraries were prepared using an NEBNext Ultra DNA Library Prep kit for Illumina (New England BioLabs, USA) according to the manufacturer's instructions. All reagents subsequently mentioned are from the NEBNext Ultra DNA Library Prep kit for Illumina, if not specified otherwise. First, 1 mg of gDNA was fragmented by ultrasonic acoustic energy in Covaris to sizes of ;600 bp in group 1 and ;300-400 bp in group 2. After shearing, the ends of the DNA fragments were blunted with the End Prep Enzyme Mix, and then NEBNext Adaptors for Illumina were ligated using a Blunt/ TA Ligase Master Mix. The adaptor-ligated DNA was cleaned up using a MinElute PCR purification kit (Qiagen, Germany) and a further size selection step was performed using an agarose gel. Size-selected DNA was then purified using the Qiagen gel extraction kit with MinElute columns (Qiagen), and library amplification was performed by PCR with the NEBNext Q5 Hot Start 2Â PCR Master Mix and index primers (12 to 15 cycles in group 1, 6 cycles in group 2). A purification step was done using AMPure XP Beads (Agencourt, USA). The final library was analyzed using Agilent DNA 1000 chip (Agilent) to estimate the quantity and check size distribution, and it was then quantified by qPCR using the KAPA library quantification kit (Kapa Biosystems, USA) prior to amplification with Illumina's cBot. Libraries were loaded and sequenced 2 Â 125 on Illumina's HiSeq2500 for group 1 and 2 Â 101 on Illumina's HiSeq2000 for group 2. Base-calling was performed using Illumina pipeline software. Deconvolution was performed using CASAVA software (Illumina, USA).
Four samples (CBS7982, CBS9595, CBS14141, and CBS1878) were additionally sequenced with PacBio long-read sequencing strategy. In addition, previously generated PacBio sequencing data were used for mating type analysis only-for five samples (CBS14139, CBS8735, PM315, CBS4172, CBS9369, and CBS7019). All samples were sequenced on the PacBio RSII platform (Pacific Biosciences, USA), except for CBS9595, libraries were prepared in 2015 with a DNA Template Prep kit 3.0, and polymerase/template complexes were subsequently formed using polymerase binding kit P6 v2 and then sequenced with sequencing reagent kit 4.0. Sample CBS9595 was sequenced more recently, with the following specifications: the library was prepared using the SMRTbell Template Prep kit 1.0, polymerase/template complexes were generated with DNA/polymerase binding kit P6 v2, and the sample was sequenced using DNA sequencing reagent kit 4.0 v2 with a 360-min runtime per SMRTcell.
Read mapping and variant calling. All paired-end Illumina libraries were inspected with FastQC v0.11.5 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and trimmed and filtered with Trimmomatic v0.36 (63). Read mapping was performed with sppIDer pipeline (78) using a combined reference, including the genome assembly of CBS9595 (as representative of P1 lineage; see Text S1 for more details) and CBS14141 (as representative of P2 lineage; see Text S1 for more details). To guarantee the proper correspondence between the scaffolds of both parentals, we aligned both genomes with the nucmer tool of MUMmer v3 (79). Average genome coverage for each sample was estimated with SAMtools v1.9 (80). Variant calling was performed with HaploTypo v1.0.1 (81), selecting Freebayes v1.3.2 (82) as a variant caller and using the default settings for the remaining parameters. Read alignment was inspected with the Integrative Genomics Viewer (IGV) (54).
Definition of blocks of high genomic variability. To determine for each of the haplotypes of the hybrid and the parental genomes the regions with high variability compared to the reference, we used the methodology developed and tested by (40) for LOH blocks definition. Briefly, we used bedtools merge (83) with a distance of 100 bp to merge the homozygous SNPs of each sample, and we set a minimum polymorphic region size of 100 bp. These blocks were compared among the different strains using bedtools jaccard (83). The sequence divergence between the reference alleles and the allele of the polymorphic regions was calculated by dividing the number of SNPs overlapping such regions by the total number of base pairs covered by them. Of note, these regions did not represent LOH regions in the hybrids, since they only reflect the differences with the reference. Nevertheless, they were used to infer the patterns of the respective parental lineages.
LOH blocks definition in the hybrid strains. Since read mapping of the hybrid strains was performed simultaneously in both parental lineages, it was not possible to define LOH blocks based on the distribution of heterozygous variants, as usually performed (37,40,43,44). Instead, an alternative approach where regions deleted in one parental and duplicated in the alternative one were used as an indicator of recombination, i.e., LOH. Therefore, LOH block definition in hybrid strains relied on read depth of coverage. Briefly, bedtools genomecov (83) was used to determine the number of reads covering each position. Positions covered by 0 reads were considered deleted, while those covered by 150% of the average genome coverage were considered duplicated. Similarly to a procedure developed previously (40), we determined a minimum block size of 100 bp. Bedtools intersect was used to determine the intersection of the deleted regions of P1 and the duplicated regions of P2 and vice versa. Only duplicated regions in one parent that intersect a deleted region in the alternative one were considered as LOH. An enrichment analysis of the genes overlapping LOH blocks was performed with FatiGO (84).
Identification of M. furfur mating type region. P/R (MAT A) and HD (MAT B) loci of M. sympodialis strains ATCC 42132 (MAT a1b1) and ATCC 44340 (MAT a2b2) (20) were used as query for tBLASTn analysis on the PacBio genomic assemblies of M. furfur haploid strains. The designation of M. furfur MAT A loci was assigned following that of the closest M. sympodialis orthologs based on the E value of the tBLASTn outcome, whereas that of MAT B loci was assigned according to phylogenetic clustering of the predicted concatenated HD proteins (see below). In all cases, the MAT genes identified in M. furfur strains were confirmed by reciprocal BLASTx on GenBank. The nomenclature of the M. furfur MAT genes follows that of the closely related Ustilaginomycotina: mfa is the pheromone-encoding gene, pra is the pheromone receptor, bE and bW are the HD transcription factors, followed by a number to distinguish from different alleles (18).
Open reading frames of the MAT genes were predicted by comparison with their respective orthologs through BLASTx on GenBank, and using RNA-seq data available for M. furfur CBS14141 (BioProject PRJNA741845 [59]). DNA sequences of the MAT genes were aligned with MUSCLE (86), and their phylogenetic reconstruction was performed with MEGA7 (87) using the maximum-likelihood method (Tamura threeparameter model with gamma distribution) and 100 bootstrap replications. Similarly, translated HD proteins were predicted with ExPASy translate tool (88), concatenated bE-bW sequences were aligned with MUSCLE (86), and the respective maximum-likelihood tree (Jones-Tailor-Thornton, uniform rates) with 100 bootstrap replications was obtained with MEGA7 (87).
The identified genes of the MAT a1, a2, b1, and b2 loci of the M. furfur haploid strains were used as queries for BLASTn and tBLASTx analyses to identify the MAT regions in the M. furfur hybrid strains CBS1878, CBS4172, and CBS7019. The MAT a1 and MAT a2 designation followed that of the haploid strains used as input. For the identified MAT B loci, the DNA sequences of bE and bW genes and their predicted encoded proteins were aligned with MUSCLE (86) and then subjected to phylogenetic analysis with MEGA7 (87), as described above (data not shown).
Characterization of M. furfur mating type. Primers for the amplification of MAT a1, a2, b1, and b2 alleles were designed on the basis of an alignment of all MAT loci of the available M. furfur parental and hybrid strains derived from their genome sequences. The primers for MAT A loci specifically amplified the a1 (JOHE44273, 59-TTGGCAGAGTTGACAGGCT-39; JOHE44272, 59-AACCATCCATGCTGACATTT-39) or a2 allele (JOHE44274, 59-GAGCCACAAGATAATGTCAA-39; JOHE44275, 59-AGACTTCCTGAACAGTGTCC-39), while the primers for the MAT B loci amplified b1 and b3 (JOHE44491, 59-TTCGGTTGACGGTCCCTCGGC-39; JOHE44492, 59-ACCGCGACTGCGCATCCGCG-39) or b2 and b4 (JOHE44494, 59-TTCGCCAAATGTGTT CGGCC-39; JOHE44496, 59-CAGCAACACCCGCCTCGCTT-39). For the amplification of the MAT A loci, ExTaq (TaKaRa) was used following the manufacturer's instructions with the following PCR conditions: initial denaturation 2 min at 94°C, followed by 33 cycles of 30 s at 94°C, 30 s of annealing at 58°C, 1 min 15-s extension at 72°C, and a final extension of 5 min at 72°C. For the amplification of the MAT b1-b3, LATaq (Taqara) supplemented with GC Buffer II was used according to the manufacturer's instructions, and the PCR conditions were as follows: initial denaturation for 2 min at 94°C, followed by 33 cycles of 30 s at 94°C, 30 s of annealing at 60°C, a 2-min extension at 72°C, and a final extension of 5 min at 72°C. The MAT b2-b4 alleles were amplified using the same conditions as reported for MAT b1-b3, except for the use of LATaq supplemented with GC Buffer I. MAT B loci are high in G1C (;65%), and the use of specific Taq polymerase was important for their successful amplification.
The amplified MAT b1-b3 and MAT b2-b4 alleles were then sequenced using the primers used for amplification, and sequences were aligned using MEGAX with MUSCLE and then subjected to phylogenetic analysis using the maximum-likelihood method (Tamura-Nei model) and 500 bootstrap replications (86,89).
Scanning electron microscopy. Hybrid and parent strains were cultivated on modified Dixon (mDixon) medium for 72 h, and a loop of cells was suspended in water. The cells were briefly vortexed to dislodge from each other. Droplets of 1, 2, and 3 mL were gently placed on mDixon agar and dried for 1 h in a laminar flow cabinet to fix the cells onto the agar. After preexamination under a stereomicroscope, small 4 Â 4 mm selections with both individual cells and cells grouped together were cut out using a surgical blade (no. 11; Swann-Morton, Sheffield, UK) and glued on a copper sample cup with a small droplet of frozen-tissue medium (KP-Cryoblock; Klinipath, Duiven, The Netherlands) and subsequently snap-frozen in nitrogen slush, and transferred into an Oxford CT1500 Cryostation connected to a JEOL 5600LV scanning electron microscope (JEOL, Tokyo, Japan). Samples were sputter coated (3 Â 1 min) using a gold target in the cryostation. Electron micrographs were taken at an acceleration voltage of 5 kV.
Data availability. Sequencing data, genome assemblies and annotations are available at NCBI database under the BioProject accessions PRJNA732434 and PRJNA779728.