To our knowledge, this study is the first reported mGWAS investigating the genetics of microbes and CRS phenotypes. We found considerable variation within our S. aureus dataset, with > 90,000 SNPs identified from 23 STs and almost 6000 genes comprising the pan-genome. Despite this variation, few significant associations were observed between the genetic variation and the CRS phenotypes tested. One indel in the core genome of the isolates reached the corrected significance threshold, while two highly variable genes reached corrected significance in the accessory genome. Assessment of the pan-genome content for correlation with disease presentation showed two superantigen-like toxin genes (SSL genes) that reached significance thresholds. SSL5 was more prevalent in the CRSsNP cohort, while SSL14 was more prevalent in the CRSwNP cohort. The SSL genes have been given little attention in studies of S. aureus virulence, with a limited number of reports in the general literature and no reports in rhinology research. These proteins are thought to impede neutrophil migration and complement stimulation, as opposed to the T-cell activation that is commonly associated with traditional superantigens [35, 36]. Without functional verification, however, the role of SSLs in the pathogenesis of CRS remains unknown.
Testing for the genetic basis of disease was pioneered in human genetics using genetic linkage methods. This approach was only appropriate in specific monogenic disorders (such as cystic fibrosis) where single gene mutations caused the relevant disorder, with disease-causing alleles tracked through generations due to recombination events secondary to sexual reproduction. In complex trait genetics, where various genes and environmental factors contribute to a disease phenotype, it was not possible to use traditional approaches relying on the principles of Mendelian inheritance. Instead, there has been a revolution in the study of complex trait genetics in the form of GWAS, where thousands of SNPs can be tested simultaneously for association with disease. In human genetics, this approach has led to the discovery of loci associated with various human diseases that have subsequently been targeted by novel medical therapies. As GWAS rely on large-scale comparisons (as opposed to linkage-based analysis), it is possible to translate this work into the field of microbial genomics.
The use of microbial GWAS (mGWAS) in microbial genomics is relatively new, with the first successful mGWAS published in 2013, and other studies published since then [41, 42]. This approach has also recently been used to study S. aureus, with Read and colleagues identifying 121 genetic loci significantly associated with changes in the toxicity of individual MRSA isolates of the same sequence type (ST239). In the field of rhinology, however, all existing studies investigating the genetics of S. aureus and CRS phenotypes have used a “bottom-up” approach, surveying known virulence or resistance genes across CRSwNP, CRSsNP and control groups [44, 45]. Thunberg et al. used a DNA microarray to detect 170 distinct virulence and antibiotic resistance genes in 18 CRS patients and 16 controls, finding no significant difference in gene prevalence. Heymans et al. used a similar approach but with PCR to screen for 22 exotoxin genes (enterotoxins and exfoliative toxins A and B) in 55 CRSwNP, 16 CRSsNP and 22 control patients; they also found no significant difference in prevalence. Investigating genetic association across only a small gene panel risks missing truly significant associations elsewhere in the genome, and it can lead to erroneous conclusions due to linkage disequilibrium, particularly for organisms exhibiting relatively low recombination rates like S. aureus. In addition, gene presence/absence is only one form of genetic variation, with SNPs and indels being potentially important in a given phenotype. The use of mGWAS is therefore the only feasible approach to determining whether there is a precise genetic basis in the S. aureus genome that accounts for different CRS phenotypes.
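The gene-panel comparisons described above reduce, for each gene, to a 2×2 table of gene presence/absence against phenotype group. A minimal sketch of one common way to test such a table, a two-sided Fisher's exact test, is given below. The choice of test is an assumption made for illustration (it is not stated to be the method used in the cited studies), and the counts and the function name `fisher_exact_two_sided` are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: phenotype groups (e.g. CRSwNP, CRSsNP).
    Columns: gene present, gene absent.
    The p-value is the sum of the probabilities of all tables with the
    same margins that are no more likely than the observed table.
    """
    row1 = a + b
    row2 = c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x gene-positive isolates in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7)
    )


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not data from any study):
    # 12 of 30 isolates in one group carry the gene, 5 of 28 in the other.
    p = fisher_exact_two_sided(12, 18, 5, 23)
    print(f"two-sided p = {p:.4f}")
```

A test of this kind considers one gene at a time, so it says nothing about variants outside the panel or about other variants in linkage disequilibrium with the tested gene.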
Given the low level of correlation found, it is unlikely that our study has identified S. aureus point mutations or indels that cause polyp or non-polyp disease. Read and colleagues recently identified 121 significant loci associated with toxicity in an mGWAS of 90 methicillin-resistant S. aureus (MRSA) isolates. The authors selected 13 of the significant polymorphisms that were thought to affect toxicity, finding that only four of these regions affected T-cell survival in vitro when verified using transposon mutagenesis. Considering that our study has only uncovered four regions of interest (two indels and two SSL genes), it is highly likely that these associations are false positives arising from the large number of variants tested. Future work with an increased sample size may be able to detect weaker associations that could not be elucidated in our study.
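The concern about false positives follows from simple arithmetic on the number of tests. The sketch below illustrates it under stated assumptions: a nominal per-test threshold of 0.05, independent tests, and 90,000 tests (chosen to echo the number of SNPs reported above; the number of tests actually performed is not given here). Variants in S. aureus are often in strong linkage disequilibrium, so the independence assumption is a simplification, and the function names are hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate <= alpha."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of tests passing an uncorrected threshold when no true effect exists."""
    return alpha * n_tests


def family_wise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    alpha = 0.05
    n_tests = 90_000  # illustrative assumption, see lead-in

    print("expected false positives (uncorrected):",
          expected_false_positives(alpha, n_tests))
    print("Bonferroni per-test threshold:",
          bonferroni_threshold(alpha, n_tests))
    print("family-wise error rate (uncorrected):",
          family_wise_error_rate(alpha, n_tests))
```

Under these assumptions, an uncorrected analysis would be expected to yield thousands of spurious hits, which is why a handful of associations that only just pass a corrected threshold warrant functional verification before being interpreted.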
We further targeted known virulence and resistance genes in silico, to establish the prevalence of previously described genes and to compare this cohort with previous studies of S. aureus in CRS. Consistent with the previous literature, there were no significant differences in the prevalence of the virulence genes tested across CRSwNP and CRSsNP isolates [44, 45]. There was a high prevalence of some membrane-damaging toxins (Hemolysins a, b, d and HlgA, HlgB and Hlg), while the leukocidins were far less prevalent, with the LukS/F genes (Panton-Valentine Leukocidin (PVL)) not identified in any isolates. The protease genes were seen in varying prevalence in the isolates studied. Fourteen of the known superantigen (SAG) toxin genes (Sea/b/c/g/h/I/k/l/m/o/p/q/u/TSST) were identified in the isolates with varying prevalence. SEG, SEM and SEO (of the ECG cluster) were the most commonly identified. The SAGs have been the most studied virulence genes in the CRS literature. We found a similar prevalence of the most common SAGs reported in these studies, confirming that prevalence varies between the SAGs and that the ECG cluster is the most ubiquitous [44–46]. In relation to cell wall associated proteins, the ICA locus genes (ICA A, B, C, D, R), involved in biofilm formation, were observed with a very high prevalence, with over 90% of isolates carrying these genes. The fibronectin binding proteins FnbA/FnbB, thought to be involved in cell wall internalization, were less prevalent in all groups (from 16 to 28%). In relation to antibiotic resistance genes, screening results were as expected, with methicillin and macrolide resistance genes rare and other antibiotic resistance genes far more common (e.g. the aminoglycoside, phenicol, tetracycline, fluoroquinolone, β-lactamase and fosfomycin resistance genes).
There were a number of limitations inherent in this study. Firstly, a single colony of S. aureus was selected for sequencing from each patient, raising the possibility that this may not be the disease-associated S. aureus strain but rather a non-invasive commensal strain. We did not sequence S. aureus isolates from control patients in the current study, as our hypothesis concerned whether an association exists between S. aureus virulence and CRS phenotypes. We acknowledge that the number of samples used in this study was modest and that a greater number of isolates would be required to identify a smaller effect size. Further mGWAS using all CRS strains (CRSwNP and CRSsNP) as a disease group and non-CRS strains as a control group may lead to the identification of genetic loci in S. aureus that contribute to the pathogenesis of CRS, and should be the focus of future work. We have not verified the significant associations identified in this study using functional characterisation, so inferring gene expression remains speculative. In light of our findings, it is likely that the spectrum of disease in CRS is more related to host genetics and environmental factors than to a response to specific S. aureus virulence mechanisms.