Comprehensive Characterization of HIV-1 Molecular Epidemiology and Demographic History in the Brazilian Region Most Heavily Affected by AIDS

ABSTRACT The high incidence of AIDS cases and the dominance of HIV-1 subtype C infections are two features that distinguish the HIV-1 epidemic in the two southernmost Brazilian states (Rio Grande do Sul [RS] and Santa Catarina [SC]) from the epidemic in other parts of the country. Nevertheless, previous studies on HIV molecular epidemiology were conducted mainly in capital cities, and a more comprehensive understanding of factors driving this unique epidemic in Brazil is necessary. Blood samples were collected from individuals in 13 municipalities in the Brazilian southern region. HIV-1 env and pol genes were submitted to phylogenetic analyses for assignment of subtype, and viral population phylodynamics were reconstructed by applying Skygrid and logistic coalescent models in a Bayesian analysis. A high prevalence of subtype C was observed in all sampled locations; however, an increased frequency of recombinant strains was found in RS, with evidence for new circulating forms (CRFs). In the SC state, subtype B and C epidemics were associated with distinct exposure groups. Although logistic models estimated similar growth rates for HIV-1 subtype C (HIV-1C) and HIV-1B, a Skygrid plot reveals that the former epidemic has been expanding for a longer time. Our results highlight a consistent expansion of HIV-1C in south Brazil, and we also discuss how heterosexual and men who have sex with men (MSM) transmission chains might have impacted the current prevalence of HIV-1 subtypes in this region. IMPORTANCE The AIDS epidemic in south Brazil is expanding rapidly, but the circumstances driving this condition are not well known. A high prevalence of HIV-1 subtype C was reported in the capital cities of this region, in contrast to the subtype B dominance in the rest of the country. This study sought to comparatively investigate the HIV-1 subtype B and C epidemics by sampling individuals from several cities in the two states with the highest AIDS incidences in Brazil. Our analyses showed distinct epidemic growth curves for the two epidemics, and we also found evidence suggesting that separate transmission chains may be impacting the viral phylodynamics and the emergence of new recombinant forms.

H igh-level molecular diversity is one of the main features of HIV-1. Due to the very high replication rate and the lack of a proofreading activity of the viral reverse transcriptase (RT) enzyme, HIV-1 group M, the pandemic strain, has evolved into a diverse number of variants (1). Among 9 HIV-1 group M subtypes, HIV-1 subtype C (HIV-1C) is responsible for Ͼ50% of infections worldwide, occurring mostly in southern and east African countries and India. HIV-1B, in turn, is widely dispersed throughout the world, being the dominant strain in Europe and in the Americas, but with a worldwide prevalence of only 10% (1,2). The high level of variability of HIV-1 is also a consequence of the genetic recombination process, which can occur in unique recombinant forms (URFs), when fewer than three individuals are infected, or in circulating forms (CRFs), when three individuals not epidemiologically related are infected with the same mosaic strain (3). Currently, more than 70 CRFs have been described, encompassing around 20% of HIV-1 infections around the world (see http://www.hiv.lanl.gov/ [accessed June 2016]).
The HIV-1 molecular epidemic in Brazil is driven mainly by subtype B, with a minor circulation of subtype F1, BF1 recombi-nant forms, and subtype C (4). However, in the two southernmost states, Rio Grande do Sul (RS) and Santa Catarina (SC), a high prevalence of HIV-1C is observed (40 to 80%), followed by BC mosaic forms (20 to 30%, including CRF31_BC) and subtype B (15 to 30%) (5). The HIV-1C prevalence decreases in the northward direction, representing up to 30% of infections in Paraná state and between 5 and 20% of infections in other nearby states (5). Besides the high prevalence of HIV-1C, RS and SC states registered the highest AIDS incidence rates among the Brazilian states in the last 10 years, notifying around 35 new cases per 100,000 habitants per year (6). The AIDS incidence was also observed to have increased more than two times in countryside cities in RS and SC over 10 years (7). The same process has also been observed for other Brazilian states but at a much lower rate.
A decade of epidemiological studies described the HIV-1C epidemic in south Brazil, investigating its origin and highlighting its dispersion throughout the country and continent (8)(9)(10)(11)(12)(13). However, those studies included samples mainly from individuals in the capital cities, and very little molecular information is available for countryside locations in the southern region. In the face of this interesting epidemiological scenario, where a high AIDS incidence is observed, along with unique molecular features, a better characterization of the HIV-1 epidemic in a broad sampling area is necessary. The present study aims to describe the HIV-1 molecular epidemiology in countryside cities of RS and SC states. To do so, we obtained samples from individuals in several cities, including border towns, in an effort to construct a comprehensive data bank with viral sequences and patient information. In addition, we analyzed the phylodynamics of HIV-1C and -B using demographic reconstruction of multiple genetic loci to describe the historical epidemic growth trends of these two dominant strains in the region.

MATERIALS AND METHODS
Patients and samples. HIV-1-seropositive patients were invited to enroll in this study throughout 13 cities of the Brazilian states of SC and RS ( Fig. 1). The inclusion criteria were diagnosis of HIV infection and no previous exposure to antiretroviral (ARV) treatment. The Universidade Federal de Santa Catarina (UFSC) and the Fundação Estadual de Pesquisa e Produção em Saúde (FEPPS-RS) ethics committees assessed and approved this study. Informed consent was obtained from all enrolled volunteers.
Blood samples were collected onto FTA cards (Whatman Inc.) and dried at room temperature. Three to five 0.5-mm dried blood spot (DBS) punches were washed and dried according to the manufacturer's instructions. Provirus DNA of HIV-1 was amplified by nested PCR directly from the washed punches, and primers and reaction conditions were as follows (14). The HIV-1 envelope (env) (HXB2 bp 6846 to 7360) and polymerase (pol) (HXB2 bp 2274 to 3545) genes were the genome target regions. Purified PCR products were sequenced by using a BigDye Terminator v3.1 cycle sequencing system (Applied Biosystems) and read by using an ABI 3100 genetic analyzer (Applied Biosystems). The generated chromatograms were then assembled into contigs and visually inspected by using SeqMan software in the Lasergene package (DNASTAR).
Subtype classification. Sequences generated here were submitted to the REGA, RIP, and SCUEAL online subtyping tools (15)(16)(17). The subtype assigned by these automated tools was confirmed by phylogenetic tree reconstruction. Subtype reference sequences were downloaded from the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov/) and aligned with the new sequences by using the Muscle algorithm (18). Maximum likelihood (ML) tree reconstruction was performed by using PhyML with the GTRϩIϩ⌫4 (general time reversible [GTR] substitution model assuming an estimated proportion of invariant sites [I] and four gamma-distributed rate categories [⌫4]) nucleotide substitution model, as implemented in the SeaView program (19). Aiming to identify the recombination breakpoints, env and pol sequences identified as recombinant sequences by using the online tools and in the ML trees were submitted to bootscanning analysis using Simplot 3.5.1 software (20). Bayesian phylodynamics analyses. In order to better understand the expansion of the HIV-1 epidemic in south Brazil, time-scaled phylogenetic trees were reconstructed for subtype B and C sequences by using BEAST/BEAGLE software (21). To improve the accuracy of demographic history reconstruction, only samples classified as belonging to subtype B or C for both the env and pol genes were analyzed. Additional sequences from SC and RS states, which met the same criteria as those for the sequences generated here, were downloaded from the Los Alamos HIV Database and aligned with the countryside sequence data set, compounding a data set sampled from 2008 to 2014. Trees were reconstructed by using an uncorrelated log-normal relaxed molecular clock and the GTRϩIϩ⌫4 nucleotide substitution model. Due to the lack of temporal signal in the data sets analyzed here, as determined by using Path-O-Gen software (http://tree.bio.ed.ac.uk/software/pathogen/), normal distributed priors were applied to the tree root height for the Brazilian HIV-1C (mean of 41 and standard deviation of 4.1 for Pol and mean of 43 and standard deviation of 4.1 for Env) and HIV-1B (mean of 47 and standard deviation of 8.7 for Pol and mean of 50 and standard deviation of 4.1 for Env) clades, as previously estimated (11,22).
The nonparametric Bayesian Skygrid coalescent model was used to estimate HIV effective population size (N e ) trajectories (23). This model bases estimations on data from multiple genetic loci to improve the accuracy of the analyses. To estimate the epidemic growth rates (r) of HIV-1B and HIV-1C, trees were also reconstructed under different parametric demographic models. Marginal likelihood estimation (MLE) was applied to test for the best parametric model in a Bayesian framework (24,25). As distinct genetic loci from an organism are supposed to share the same demographic dynamics without following the same evolutionary history, HIV-1 env and pol sequences were analyzed in BEAST as separated partitions with unlinked substitution and clock models but sharing the same coalescent prior.
Monte Carlo Markov chains (MCMCs) were run sufficiently long enough to ensure that the effective sample sizes (ESSs) for all the parameters were above 200. For the HIV-1C data set, five independent runs of 100 million steps were performed and later combined by using LogCombiner (http: //beast.bio.ed.ac.uk/LogCombiner). For HIV-1B, a single run of 200 million steps was performed. Tracer software (http://tree.bio.ed.ac.uk /software/tracer/) was used to diagnose MCMCs, adjust the initial burnin, and perform demographic reconstructions.
Transmission dynamics among exposure groups. The dynamic of viral transmissions among the main HIV-1 exposure groups was also investigated by using the same data set as the one used in the phylodynamics analysis. To do so, ancestral trait reconstruction was performed by using a discrete model of transition with the Bayesian stochastic search variable selection (BSSVS) procedure and asymmetric transition rates in an analysis where traits were the exposure group of each taxon (26). This analysis was complemented with Markov jump estimation of the number of trait transitions throughout evolutionary history (27). RStudio (http://www .rstudio.org/) was used to summarize the posterior densities of the highly supported transitions from the BEAST log files.
Statistical analysis. Patients were invited to answer a questionnaire at the time of enrollment. Information about gender, age, exposure group, ethnicity, education level, and marital status were included in the questionnaire, while clinical data were retrieved from patient medical records. A multivariable logistic regression model was applied to determine associations between variables and HIV-1 subtype. To deal with the perfect separation of some variables, the logistic regression was conducted in a Bayesian framework (28). All analyses were executed with RStudio, and statistical significance was achieved when P values were Յ0.05. Data relative to new cases of AIDS annually reported in Brazil were retrieved from the DATASUS system, a data bank hosted by the Brazilian Ministry of Health (http://datasus.saude.gov.br/). Accession numbers. The sequences generated in the present study were deposited in GenBank under accession numbers KR065788 to KR066336 and KP224476 to KP224501.

RESULTS
A total of 359 samples were collected between October 2009 and February 2014: 163 (45%) samples were from RS state, and 196 (55%) samples were from SC state. HIV-1 sequences from the pol region were obtained for 253 (71%) samples and from the env region for 284 (79%) samples, totaling 317 (88%) patients with molecular data for pol, env, or both. Among 317 patients with subtype information, 45% were women, and 55% were men. The mean date of diagnosis was May 2010, and the mean patient age was 40 years. The most frequently related HIV-1 exposure group was heterosexual individuals (HET) (71%), followed by men who have sex with men (MSM) (20%). Seventy-two percent of the enrolled patients were self-defined white, while 17% defined themselves as African descendants or Amerindians.
Recombinant strains were observed in 21% (66/317) of the patients studied here, as 77% of them (51/66) were infected by unique forms (URFs). Among the URF_BC samples (n ϭ 33), 30% were intergenic recombinants whose recombination breakpoints were not able to be observed in the pol and env fragments sequenced here. For URF_BF1 (n ϭ 14), intergenic mosaic strains represented 57% of the infections. Some of the URFs identified here were closely related in phylogenetic trees and also showed similar recombination breakpoints in the bootscanning analysis. Two sequences from Chapecó city were intergenic URF_BF1 sequences. They were very closely related in the trees, but direct transmission is unlikely, as the samples were obtained from two women reporting heterosexual exposure to HIV. Among the intragenic recombinant mosaic forms, all identified in pol sequences, three distinct groups of URF_BC strains were identified with similar breakpoints and clustered together in trees (Fig. 2). These groups comprised the LAJ03 and LAJ29 sequences from Lajeado city, named BC_cluster1; URU09 and URU11 from Uruguaiana city, named BC_cluster2; and CRA96, URU05, and URU18 from Cruz Alta and Uruguaiana cities, named BC_cluster3. Heterosexual men and women composed each of these clusters.
A CRF31_BC-related recombination pattern was observed for BC_cluster1 and BC_cluster2 sequences (Fig. 2). The sequences in these clusters share the first recombination breakpoint, between positions 2965 and 2984 relative to the HXB2 strain, with CRF31_BC. However, they have a smaller subtype B insertion in a pol subtype C backbone than in CRF31_BC. In BC_cluster2, it was still possible to detect another recombination breakpoint, although caution is necessary when interpreting data from bootscanning analysis for the extremities of the sequences. BC_cluster3, in turn, shows a distinct mosaic display with a subtype B insertion in positions close to positions 2850 and 3060 (HXB2). It is also possible to observe an additional recombination breakpoint close to the beginning of the analyzed fragment.  A statistically significant association was observed among HIV-1 subtypes (constrained to comparisons between HIV-1B and HIV-1C) and epidemiological data ( Table 1). In SC, the multivariable analysis showed that HIV-1C infections were associated with the heterosexual exposure group and negatively associated with the white ethnic group ( Table 1). The odds of being infected with HIV-1B, in turn, were higher among MSM and lower for the African descendant/Amerindian ethnic group. Exposure group and ethnicity were not related to each other in this location (data not shown). In RS state, the odds of being infected with HIV-1C were significantly lower for individuals with a higher level of education (Ͼ12 years) and individuals living with a partner but not legally married. Also, note the borderline significance (P ϭ 0.07) of the negative association between subtype C and male gender in RS state.
Differences between HIV-1B and HIV-1C regarding the epidemic expansion rate were further investigated. Skygrid coalescent analysis reconstructed very similar growth curves for both subtypes, from their independent introduction until the early1990s (Fig. 3a). From this point forward, the N e value for HIV-1B decreased until around the mid-2000s, when it started to increase again. By its turn, the N e value for HIV-1C continued to increase in until the mid-to late 2000s, when the epidemic growth rate seemed to have stabilized. Considering that AIDS symptoms arise typically 6 to 10 years after infection (29), the numbers of new cases (incidences) of AIDS annually reported by the Brazilian Ministry of Health in RS and SC, stratified by MSM and HET, follow similar epidemic growth trends for HIV-1B and -C, respectively (Fig. 3b). The incidence of AIDS among MSM annually increased until the mid-1990s, matching the period when the growth rate of the HIV-1B epidemic decreased. The more recent increase of HIV-1B N e also seems to be reflected in the MSM epidemic after the mid-2000s. The incidence of AIDS among HET, on the other hand, seems to be correlated with the HIV-1C epidemic; however, after the stabilization of new cases in HET from the 2000s, the N e value for HIV-1C continued to increase, possibly reflecting expansion throughout the MSM group, whose incidence did not follow the shrinkage of HIV-1B N e since the 1990s. To investigate this hypothesis, an analysis to count the number of transitions among HIV-1 exposure groups was performed. While the number of transitions between HET and MSM groups has remained constant through time for HIV-1B (data not shown), an increase in HET-to-MSM transitions was observed for HIV-1C, reaching the densest period in the late 1990s (Fig. 3a). These findings may illustrate the period of the most effective introduction of HIV-1C in the MSM group, which might have al-lowed the HIV-1C epidemic to keep expanding in the following years. MLE indicated the logistic growth model as the best coalescent parametric model. This analysis estimated very similar logistic growth rates for HIV-1C (0.53 year Ϫ1 ; 95% high posterior density [HPD], 0.43 to 0.63) and HIV-1B (0.49 year Ϫ1 ; 95% HPD, 0.36 to 0.63), with large overlapping HPD intervals (Fig. 3c).

DISCUSSION
The study presented here describes HIV-1 molecular epidemiology in a large area, comprising 13 municipalities from the two southernmost Brazilian states. In terms of molecular diversity, our results corroborate previous studies that also reported a large dominance of subtype C (ϳ65%) in SC state (30)(31)(32)(33) and a more diverse epidemic in RS state, mainly because of the high prevalence of BC mosaic forms in the latter (ϳ25%) (34)(35)(36). In both states, a low prevalence of subtype F1 and BF1 recombinants (ϳ3%) was observed, while this frequency can reach up to 20% in northern regions of Brazil (37,38) and 40% in Argentina (39). It is interesting to note that the locations with higher subtype F1 or BF1 prevalences were cities close to the border with Argentina (i.e., Uruguaiana and Chapecó) and the port city of Itajaí, where a high international population flux may foster contact with new HIV-1 strains (Fig. 1).
The CRF31_BC strain was almost exclusively found in RS state, more precisely in cities near Porto Alegre (i.e., Lajeado and Caxias do Sul) (Fig. 1), where the CRF31_BC frequency is around 22% and may reach 35% in the metropolitan region (34,40). Despite the more pronounced occurrence in Porto Alegre and nearby cities, CRF31_BC is not limited to this region, as it was also observed in more distant locations in the present study. Among the BC mosaic forms, some displayed a CRF31_BC-related recombination pattern, which likely originated from a second recombination event within the CRF31_BC pol fragment, as previously reported for other samples from Porto Alegre (41). In our sampling, this pattern was observed in two clusters of sequences (Fig. 2) and in some other isolated sequences classified here as URF_BC (data not shown). Although very similar, only a thorough analysis of the recombination region can confirm if these sequences are indeed second-generation recombinants (SGRs) derived from CRF31_BC.
Three possible new CRFs were identified here: BC_cluster1 and BC_cluster2, which might be SGRs from CRF31_BC, and BC_cluster3, with a different recombination pattern. Further studies aiming to sequence the full-length genomes of these strains are needed to confirm this hypothesis, but our findings point to a greater diversity of circulating BC mosaic strains than previously reported (41). Eight distinct BC CRFs have been described worldwide so far, with six of them in the Chinese province of Yunnan, a region near India, where subtypes C and B cocirculate (42). The high prevalence of AIDS in injection drug users (IDU) makes Yunnan province a hot spot for HIV recombination. The very low prevalence of AIDS cases among IDU in Brazil (6) could explain the low level of diversity of BC CRFs observed in the southern region. Nevertheless, due to the coprevalence of HIV-1C and -B, it is likely that CRFs other than CRF31_BC are circulating in the states of SC and RS.
Indeed, separate transmission chains for subtypes B and C might be acting on the compartmentalization of the epidemic in south Brazil. In SC, we observed an association between the het- erosexual group and HIV-1C, and the same association was previously reported in the state capital city of Florianópolis (30). The lower prevalence of BC mosaic strains in SC might be a consequence of the segregation of HIV-1C and -B in separate transmission chains, a scenario that was not observed in our sampling from RS state, where a higher number of mosaic strains was observed. Moreover, it was reported that IDU could have had an initial role in the dispersion of subtype C in RS, promoting admixture with the subtype B epidemic in the early stages of the epidemic (43). A second demographic factor that may be shaping viral transmissions in south Brazil is ethnicity. In SC state, all patients who self-declared as African descendants or Amerindians were infected with HIV-1C. Although this demographic feature does not represent an HIV exposure group, factors related to ethnicity or socioeconomic group might be segregating HIV-1 subtypes in separate epidemics. Future studies better designed to capture representative samples from different socioeconomic groups may be more effective in clarifying this issue.
The phylodynamic analysis performed here reconstructed distinct N e trajectories for HIV-1C and HIV-1B since the early 1990s (Fig. 3a). Before then, the rates of both epidemics increased very similarly, as also captured by the logistic parametric model (Fig.  3b). However, from the early 1990s, HIV-1C experienced a continued expansion for Ͼ15 years, with a decrease of the growth rate only in the mid-2000s. The similar curve of AIDS incidence in HET and the association of HIV-1C infections with this group indicate that the HIV-1C epidemic in Brazil might have been propelled by heterosexual transmissions. Although our sampling data did not find a segregated epidemic in RS, other studies with samples from the early 2000s did (35,36,44), which supports that in RS, HIV-1C also disseminated mainly among HET in the past. In fact, in Porto Alegre (capital city of RS), a study reported that the prevalence of HIV-1C significantly increased in HET but also subtly increased in MSM in the period from 1998 to 2008, while at the same time, the prevalence of HIV-1B significantly decreased in both groups (43). Our phylodynamic analysis also captured a shrinkage in HIV-1B N e , which could reflect competition with HIV-1C spreading throughout the MSM group after a period of intense transitions from HET to MSM (likely fostered by bisexual activity) in the late 1990s (Fig. 3a). In our sampling, representative mostly of the early 2010s, 64% of MSM were infected with HIV-1C, reinforcing that this subtype was also effective in dispersing in this group.
Although our findings highlight the success of subtype C in the HIV-1 epidemic in south Brazil, this does not mean that HIV-1C is spreading faster than subtype B. The logistic growth rates estimated here for HIV-1C (0.53 year Ϫ1 ; 95% HPD, 0.43 to 0.63) and HIV-1B (0.49 year Ϫ1 ; 95% HPD, 0.36 to 0.63) were similar, corroborating data reported previously by Bello et al., who also did not find a significant difference between rates of epidemic expansion of HIV-1C and -B (22,45). Rather, it is conceivable that an efficient introduction of HIV-1C in the HET group allowed progressive spread through a large and expanding network of host individuals, more recently also effectively spreading among MSM. This scenario might have fostered HIV-1C to continuously disseminate in south Brazil and, more recently, also to other parts of the country (5). As recently reported, RS was the likely point of onset of the HIV-1C epidemic in Brazil and has spread northward from that state (46). Because progressive intermixing of partially separated epidemics seems to be a natural trend (47), the segrega-tion of HIV-1C in the HET group can no longer be observed in RS. In time, it should also weaken the association among subtypes and exposure categories in other locations.
It is also interesting to note that HET transmission might be driving HIV-1C epidemic expansion in other parts of the country. In the southern states, where HIV-1C prevails, numbers of cases of AIDS in HET are 4 to 6 times higher than in MSM (6). In turn, in southeastern states, where the frequency of HIV-1C is Ͻ10% (5), the HET/MSM ratio is 1.5 to 3.5. Finally, in the central western region, where recent studies reported a frequency of HIV-1C of up to 20%, the HET/MSM ratio of AIDS cases is 4 to 8 (5, 6). These epidemiological data, taken together with our results, support HET transmission as a factor impacting HIV-1C diffusion in Brazil, responsible for successful introduction and establishment in the southern region, the recent spread to the central western states, and imposing constraints to viral dispersion toward the southeast.
In summary, our study depicts the molecular epidemic in several cities in SC and RS states. This is the first time that such a broad sampling area was investigated in south Brazil. Although the sample size from some of the cities is small (e.g., Santa Maria and Lages), we were able to capture the diversity of HIV-1 in the region. Here, we describe the epidemic growth trends of HIV-1C and -B, showing how MSM and HET transmission chains might have impacted the phylodynamics of the viral populations. Our analyses, distinct from previous studies, provided more precise estimations, mainly because our demographic reconstruction was informed by two loci analyzed as separate partitions but linked by the coalescent prior. This approach highlights the importance of sequencing more than one gene when it comes to phylodynamic analysis applied to HIV. Due to the small sample size, estimations for HIV-1B showed larger HPD intervals than those for HIV-1C (Fig. 3a). Because of this, many of the N e trajectories are overlapping, even though a distinct trend has been observed since the 1990s. In addition, we also described potential new circulating BC mosaic strains whose recombination breakpoints can be derived from CRF31_BC. The region studied here is the region most heavily affected by AIDS in Brazil, and our findings can be valuable for the development of new public health policies that aim to slow HIV dissemination in south Brazil and beyond.