Center for Advanced Genomic Technology Papers
Recent Submissions
Item VisANT 3.0: New Modules for Pathway Visualization, Editing, Prediction and Construction (Oxford University Press, 2007-07)
Hu, Zhenjun; Ng, David M.; Yamada, Takuji; Chen, Chunnuan; Kawashima, Shuichi; Mellor, Joe; Linghu, Bolan; Kanehisa, Minoru; Stuart, Joshua M.; DeLisi, Charles
With the integration of the KEGG and Predictome databases as well as two search engines for coexpressed genes/proteins using data sets obtained from the Stanford Microarray Database (SMD) and Gene Expression Omnibus (GEO) database, VisANT 3.0 supports exploratory pathway analysis, which includes multi-scale visualization of multiple pathways, editing and annotating pathways using a KEGG compatible visual notation and visualization of expression data in the context of pathways. Expression levels are represented either by color intensity or by nodes with an embedded expression profile. Multiple experiments can be navigated or animated. Known KEGG pathways can be enriched by querying either coexpressed components of known pathway members or proteins with known physical interactions. Predicted pathways for genes/proteins with unknown functions can be inferred from coexpression or physical interaction data. Pathways produced in VisANT can be saved in a computer-readable XML format (VisML), as graphic images or as high-resolution Scalable Vector Graphics (SVG). Pathways in VisML format can be securely shared within an interested group or published online using a simple Web link. VisANT is freely available at http://visant.bu.edu.

Item High-Precision High-Coverage Functional Inference from Integrated Data Sources (BioMed Central, 2008-2-25)
Linghu, Bolan; Snitkin, Evan S.; Holloway, Dustin T.; Gustafson, Adam M.; Xia, Yu; DeLisi, Charles
BACKGROUND. Information obtained from diverse data sources can be combined in a principled manner using various machine learning methods to increase the reliability and range of knowledge about protein function.
The result is a weighted functional linkage network (FLN) in which linked neighbors share at least one function with high probability. Precision is, however, low. Aiming to provide precise functional annotation for as many proteins as possible, we explore and propose a two-step framework for functional annotation: (1) construction of a high-coverage and reliable FLN via machine learning techniques, and (2) development of a decision rule for the constructed FLN to optimize functional annotation. RESULTS. We first apply this framework to Saccharomyces cerevisiae. In the first step, we demonstrate that four commonly used machine learning methods, Linear SVM, Linear Discriminant Analysis, Naïve Bayes, and Neural Network, all combine heterogeneous data to produce reliable and high-coverage FLNs, in which the linkage weight estimates the functional coupling of linked proteins more accurately than individual data sources alone. In the second step, empirical tuning of an adjustable decision rule on the constructed FLN reveals that basing annotation on maximum edge weight results in the most precise annotation at high coverages. In particular, at low coverage all rules evaluated perform comparably. At coverage above approximately 50%, however, they diverge rapidly. At full coverage, the maximum weight decision rule still has a precision of approximately 70%, whereas for other methods, precision ranges from a high of slightly more than 30% down to 3%. In addition, a scoring scheme to estimate the precision of individual predictions is also provided. Finally, tests of the robustness of the framework indicate that it can be successfully applied to less studied organisms. CONCLUSION.
We provide a general two-step function-annotation framework, and show that high coverage, high precision annotations can be achieved by constructing a high-coverage and reliable FLN via data integration followed by applying a maximum weight decision rule.

Item A Predictive Phosphorylation Signature of Lung Cancer (Public Library of Science, 2009-11-25)
Wu, Chang-Jiun; Cai, Tianxi; Rikova, Klarisa; Merberg, David; Kasif, Simon; Steffen, Martin
BACKGROUND. Aberrant activation of signaling pathways drives many of the fundamental biological processes that accompany tumor initiation and progression. Inappropriate phosphorylation of intermediates in these signaling pathways is a frequently observed molecular lesion that accompanies the undesirable activation or repression of pro- and anti-oncogenic pathways. Therefore, methods which directly query signaling pathway activation via phosphorylation assays in individual cancer biopsies are expected to provide important insights into the molecular "logic" that distinguishes cancer and normal tissue on one hand, and enables personalized intervention strategies on the other. RESULTS. We first document the largest available set of tyrosine phosphorylation sites that are, individually, differentially phosphorylated in lung cancer, thus providing an immediate set of drug targets. Next, we develop a novel computational methodology to identify pathways whose phosphorylation activity is strongly correlated with the lung cancer phenotype. Finally, we demonstrate the feasibility of classifying lung cancers based on multi-variate phosphorylation signatures. CONCLUSIONS. Highly predictive and biologically transparent phosphorylation signatures of lung cancer provide evidence for the existence of a robust set of phosphorylation mechanisms (captured by the signatures) that are present in the majority of lung cancers and that reliably distinguish each lung cancer from normal.
This approach should improve our understanding of cancer and help guide its treatment, since the phosphorylation signatures highlight proteins and pathways whose phosphorylation should be inhibited in order to prevent unregulated proliferation.

Item Biological Process Linkage Networks (Public Library of Science, 2009-4-23)
Dotan-Cohen, Dikla; Letovsky, Stan; Melkman, Avraham A.; Kasif, Simon
BACKGROUND. The traditional approach to studying complex biological networks is based on the identification of interactions between internal components of signaling or metabolic pathways. By comparison, little is known about interactions between higher order biological systems, such as biological pathways and processes. We propose a methodology for gleaning patterns of interactions between biological processes by analyzing protein-protein interactions, transcriptional co-expression and genetic interactions. At the heart of the methodology are the concept of Linked Processes and the resultant network of biological processes, the Process Linkage Network (PLN). RESULTS. We construct, catalogue, and analyze different types of PLNs derived from different data sources and different species. When applied to the Gene Ontology, many of the resulting links connect processes that are distant from each other in the hierarchy, even though the connection makes eminent sense biologically. Some others, however, carry an element of surprise and may reflect mechanisms that are unique to the organism under investigation. In this respect our method complements the link structure between processes inherent in the Gene Ontology, which by its very nature is species-independent. As a practical application of the linkage of processes, we demonstrate that it can be used effectively in protein function prediction, with the power to increase both the coverage and the accuracy of predictions when carefully integrated into prediction methods. CONCLUSIONS.
Our approach constitutes a promising new direction towards understanding the higher levels of organization of the cell as a system, which should help current efforts to re-engineer ontologies and improve our ability to predict which proteins are involved in specific biological processes.

Item Network-Based Analysis of Affected Biological Processes in Type 2 Diabetes Models (Public Library of Science, 2007-6-15)
Liu, Manway; Liberzon, Arthur; Kong, Sek Won; Lai, Weil R.; Park, Peter J.; Kohane, Isaac S.; Kasif, Simon
Type 2 diabetes mellitus is a complex disorder associated with multiple genetic, epigenetic, developmental, and environmental factors. Animal models of type 2 diabetes differ based on diet, drug treatment, and gene knockouts, and yet all display the clinical hallmarks of hyperglycemia and insulin resistance in peripheral tissue. The recent advances in gene-expression microarray technologies present an unprecedented opportunity to study type 2 diabetes mellitus at a genome-wide scale and across different models. To date, a key challenge has been to identify the biological processes or signaling pathways that play significant roles in the disorder. Here, using a network-based analysis methodology, we identified two sets of genes, associated with insulin signaling and a network of nuclear receptors, which are recurrent in a statistically significant number of diabetes and insulin resistance models and transcriptionally altered across diverse tissue types. We additionally identified a network of protein–protein interactions between members from the two gene sets that may facilitate signaling between them. Taken together, the results illustrate the benefits of integrating high-throughput microarray studies, together with protein–protein interaction networks, in elucidating the underlying biological processes associated with a complex disorder.
Author Summary. Type 2 diabetes mellitus currently affects millions of people.
It is clinically characterized by insulin resistance in addition to an impaired glucose response and associated with numerous complications including heart disease, stroke, neuropathy, and kidney failure, among others. Accurate identification of the underlying molecular mechanisms of the disease or its complications is an important research problem that could lead to novel diagnostics and therapy. The main challenge stems from the fact that insulin resistance is a complex disorder and affects a multitude of biological processes, metabolic networks, and signaling pathways. In this report, the authors develop a network-based methodology that appears to be more sensitive than previous approaches in detecting deregulated molecular processes in a disease state. The methodology revealed that both insulin signaling and nuclear receptor networks are consistently and differentially expressed in many models of insulin resistance. The positive results suggest such network-based diagnostic technologies hold promise as potentially useful clinical and research tools in the future.

Item Functional Characterization of the YmcB and YqeV tRNA Methylthiotransferases of Bacillus Subtilis (2010-5-14)
Anton, Brian P.; Russell, Susan P.; Vertrees, Jason; Kasif, Simon; Raleigh, Elisabeth A.; Limbach, Patrick A.; Roberts, Richard J.
Methylthiotransferases (MTTases) are a closely related family of proteins that perform both radical-S-adenosylmethionine (SAM) mediated sulfur insertion and SAM-dependent methylation to modify nucleic acid or protein targets with a methyl thioether group (–SCH3). Members of two of the four known subgroups of MTTases have been characterized, typified by MiaB, which modifies N6-isopentenyladenosine (i6A) to 2-methylthio-N6-isopentenyladenosine (ms2i6A) in tRNA, and RimO, which modifies a specific aspartate residue in ribosomal protein S12.
In this work, we have characterized the two MTTases encoded by Bacillus subtilis 168 and find that, consistent with bioinformatic predictions, ymcB is required for ms2i6A formation (MiaB activity), and yqeV is required for modification of N6-threonylcarbamoyladenosine (t6A) to 2-methylthio-N6-threonylcarbamoyladenosine (ms2t6A) in tRNA. The enzyme responsible for the latter activity belongs to a third MTTase subgroup, no member of which has previously been characterized. We performed domain-swapping experiments between YmcB and YqeV to narrow down the protein domain(s) responsible for distinguishing i6A from t6A and found that the C-terminal TRAM domain, putatively involved with RNA binding, is likely not involved with this discrimination. Finally, we performed a computational analysis to identify candidate residues outside the TRAM domain that may be involved with substrate recognition. These residues represent interesting targets for further analysis.

Item VisANT 3.5: Multi-Scale Network Visualization, Analysis and Inference Based on the Gene Ontology (2009-7-1)
Hu, Zhenjun; Hung, Jui-Hung; Wang, Yan; Chang, Yi-Chien; Huang, Chia-Ling; Huyck, Matt; DeLisi, Charles
Despite its wide usage in biological databases and applications, the role of the gene ontology (GO) in network analysis is usually limited to functional annotation of genes or gene sets, with auxiliary information on correlations ignored. Here, we report on new capabilities of VisANT (an integrative software platform for the visualization, mining, analysis and modeling of biological networks) that extend the application of GO in network visualization, analysis and inference. The new VisANT functions can be classified into three categories. (i) Visualization: a new tree-based browser allows visualization of GO hierarchies. GO terms can be easily dropped into the network to group genes annotated under the term, thereby integrating the hierarchical ontology with the network.
This facilitates multi-scale visualization and analysis. (ii) Flexible annotation schema: in addition to conventional methods for annotating network nodes with the most specific functional descriptions available, VisANT also provides functions to annotate genes at any customized level of abstraction. (iii) Finding over-represented GO terms and expression-enriched GO modules: two new algorithms have been implemented as VisANT plugins. One detects over-represented GO annotations in any given sub-network and the other finds the GO categories that are enriched in a specified phenotype or perturbed dataset. Both algorithms take account of network topology (i.e. correlations between genes based on various sources of evidence). VisANT is freely available at http://visant.bu.edu.

Item MuPlex: Multi-Objective Multiplex PCR Assay Design (Oxford University Press, 2005-06-27)
Rachlin, John; Ding, Chunming; Cantor, Charles R.; Kasif, Simon
We have developed a web-enabled system called MuPlex that aids researchers in the design of multiplex PCR assays. Multiplex PCR is a key technology for an endless list of applications, including detecting infectious microorganisms, whole-genome sequencing and closure, forensic analysis and enabling flexible yet low-cost genotyping. However, the design of multiplex PCR assays is computationally challenging because it involves tradeoffs among competing objectives, and extensive computational analysis is required in order to screen out primer-pair cross interactions. With MuPlex, users specify a set of DNA sequences along with primer selection criteria, interaction parameters and the target multiplexing level. MuPlex then produces a set of multiplex PCR assays designed to cover as many of the input sequences as possible. MuPlex provides multiple solution alternatives that reveal tradeoffs among competing objectives.
MuPlex is uniquely designed for large-scale multiplex PCR assay design in an automated high-throughput environment, where high coverage of potentially thousands of single nucleotide polymorphisms is required. The server is available at http://genomics14.bu.edu:8080/MuPlex/MuPlex.html.

Item Biological Context Networks: A Mosaic View of the Interactome (2006-11-28)
Rachlin, John; Cohen, Dikla Dotan; Cantor, Charles R.; Kasif, Simon
Network models are a fundamental tool for the visualization and analysis of molecular interactions occurring in biological systems. While broadly illuminating the molecular machinery of the cell, graphical representations of protein interaction networks mask complex patterns of interaction that depend on temporal, spatial, or condition-specific contexts. In this paper, we introduce a novel graph construct called a biological context network that explicitly captures these changing patterns of interaction from one biological context to another. We consider known gene ontology biological process and cellular component annotations as a proxy for context, and show that aggregating small process-specific protein interaction sub-networks leads to the emergence of observed scale-free properties. The biological context model also provides the basis for characterizing proteins in terms of several context-specific measures, including 'interactive promiscuity,' which identifies proteins whose interacting partners vary from one context to another.
We show that such context-sensitive measures are significantly better predictors of knockout lethality than node degree, reaching better than 70% accuracy among the top scoring proteins.

Item Integration of Heterogeneous Expression Data Sets Extends the Role of the Retinol Pathway in Diabetes and Insulin Resistance (Oxford University Press, 2009-9-28)
Park, Peter J.; Kong, Sek Won; Tebaldi, Toma; Lai, Weil R.; Kasif, Simon; Kohane, Isaac S.
Motivation: Type 2 diabetes is a chronic metabolic disease that involves both environmental and genetic factors. To understand the genetics of type 2 diabetes and insulin resistance, the DIabetes Genome Anatomy Project (DGAP) was launched to profile gene expression in a variety of related animal models and human subjects. We asked whether these heterogeneous models can be integrated to provide consistent and robust insights into the biology of insulin resistance. Results: We perform integrative analysis of the 16 DGAP data sets that span multiple tissues, conditions, array types, laboratories, species, genetic backgrounds and study designs. For each data set, we identify differentially expressed genes compared with control. Then, for the combined data, we rank genes according to the frequency with which they were found to be statistically significant across data sets. This analysis reveals RetSat as a widely shared component of mechanisms involved in insulin resistance and sensitivity and adds to the growing importance of the retinol pathway in diabetes, adipogenesis and insulin resistance. Top candidates obtained from our analysis have been confirmed in recent laboratory studies.
Contact: Isaac_kohane@harvard.edu

Item Seeing the Forest for the Trees: Using the Gene Ontology to Restructure Hierarchical Clustering (Oxford University Press, 2009-6-3)
Dotan-Cohen, Dikla; Kasif, Simon; Melkman, Avraham A.
Motivation: There is a growing interest in improving the cluster analysis of expression data by incorporating into it prior knowledge, such as the Gene Ontology (GO) annotations of genes, in order to improve the biological relevance of the clusters that are subjected to subsequent scrutiny. The structure of the GO is another source of background knowledge that can be exploited through the use of semantic similarity. Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein–protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein–protein interaction data.
Contact: dotna@cs.bgu.ac.il
Supplementary information: Supplementary data are available at Bioinformatics online.

Item Phylogenetic Detection of Conserved Gene Clusters in Microbial Genomes (BioMed Central, 2005-10-3)
Zheng, Yu; Anton, Brian P.; Roberts, Richard J.; Kasif, Simon
BACKGROUND. Microbial genomes contain an abundance of genes with conserved proximity forming clusters on the chromosome.
However, the conservation can be a result of many factors such as vertical inheritance, or functional selection. Thus, identification of conserved gene clusters that are under functional selection provides an effective channel for gene annotation, microarray screening, and pathway reconstruction. The problem of devising a robust method to identify these conserved gene clusters and to evaluate the significance of the conservation in multiple genomes has a number of implications for comparative, evolutionary and functional genomics as well as synthetic biology. RESULTS. In this paper we describe a new method for detecting conserved gene clusters that incorporates the information captured by a genome phylogenetic tree. We show that our method can overcome the common problem of overestimation of significance due to the bias in the genome database and thereby achieve better accuracy when detecting functionally connected gene clusters. Our results can be accessed at the GeneChords database. CONCLUSION. The methodology described in this paper gives a scalable framework for discovering conserved gene clusters in microbial genomes. It serves as a platform for many other functional genomic analyses in microorganisms, such as operon prediction, regulatory site prediction, functional annotation of genes, and the study of the evolutionary origin and development of gene clusters.

Item Portraits of Breast Cancer Progression (BioMed Central, 2007-8-6)
Dalgin, Gul S.; Alexe, Gabriela; Scanfeld, Daniel; Tamayo, Pablo; Mesirov, Jill P.; Ganesan, Shridar; DeLisi, Charles; Bhanot, Gyan
BACKGROUND. Clustering analysis of microarray data is often criticized for giving ambiguous results because of sensitivity to data perturbation or clustering techniques used. In this paper, we describe a new method based on principal component analysis and ensemble consensus clustering that avoids these problems. RESULTS.
We illustrate the method on a public microarray dataset from 36 breast cancer patients of whom 31 were diagnosed with at least two of three pathological stages of disease (atypical ductal hyperplasia (ADH), ductal carcinoma in situ (DCIS) and invasive ductal carcinoma (IDC)). Our method identifies an optimum set of genes and divides the samples into stable clusters which correlate with clinical classification into Luminal, Basal-like and Her2+ subtypes. Our analysis reveals a hierarchical portrait of breast cancer progression and identifies genes and pathways for each stage, grade and subtype. An intriguing observation is that the disease phenotype is distinguishable in ADH and progresses along distinct pathways for each subtype. The genetic signature for disease heterogeneity across subtypes is greater than the heterogeneity of progression from DCIS to IDC within a subtype, suggesting that the disease subtypes have distinct progression pathways. Our method identifies six disease-subtype clusters and one normal cluster. The first split separates the normal samples from the cancer samples. Next, the cancer cluster splits into low grade (pathological grades 1 and 2) and high grade (pathological grades 2 and 3) while the normal cluster is unchanged. Further, the low grade cluster splits into two subclusters and the high grade cluster into four. The final six disease clusters are mapped into one Luminal A, three Luminal B, one Basal-like and one Her2+. CONCLUSION. We confirm that the cancer phenotype can be identified at an early stage because the genes altered in this stage progressively alter further as the disease progresses through DCIS into IDC. We identify six subtypes of disease which have distinct genetic signatures and remain separated in the clustering hierarchy.
Our findings suggest that the heterogeneity of disease across subtypes is higher than the heterogeneity of the disease progression within a subtype, indicating that the subtypes are in fact distinct diseases.

Item Accelerated Postnatal Growth Increases Lipogenic Gene Expression and Adipocyte Size in Low–Birth Weight Mice (American Diabetes Association, 2009-2-10)
Isganaitis, Elvira; Jimenez-Chillaron, Jose; Woo, Melissa; Chow, Alice; DeCoste, Jennifer; Vokes, Martha; Liu, Manway; Kasif, Simon; Zavacki, Ann-Marie; Leshan, Rebecca L.; Myers, Martin G.; Patti, Mary-Elizabeth
OBJECTIVE: To characterize the hormonal milieu and adipose gene expression in response to catch-up growth (CUG), a growth pattern associated with obesity and diabetes risk, in a mouse model of low birth weight (LBW). RESEARCH DESIGN AND METHODS: ICR mice were food restricted by 50% from gestational days 12.5–18.5, reducing offspring birth weight by 25%. During the suckling period, dams were either fed ad libitum, permitting CUG in offspring, or food restricted, preventing CUG. Offspring were killed at age 3 weeks, and gonadal fat was removed for RNA extraction, array analysis, RT-PCR, and evaluation of cell size and number. Serum insulin, thyroxine (T4), corticosterone, and adipokines were measured. RESULTS: At age 3 weeks, LBW mice with CUG (designated U-C) had body weight comparable with controls (designated C-C); weight was reduced by 49% in LBW mice without CUG (designated U-U). Adiposity was altered by postnatal nutrition, with gonadal fat increased by 50% in U-C and decreased by 58% in U-U mice (P < 0.05 vs. C-C mice). Adipose expression of the lipogenic genes Fasn, AccI, Lpin1, and Srebf1 was significantly increased in U-C compared with both C-C and U-U mice (P < 0.05). Mitochondrial DNA copy number was reduced by >50% in U-C versus U-U mice (P = 0.014). Although cell numbers did not differ, mean adipocyte diameter was increased in U-C and reduced in U-U mice (P < 0.01).
CONCLUSIONS: CUG results in increased adipose tissue lipogenic gene expression and adipocyte diameter but not increased cellularity, suggesting that catch-up fat is primarily associated with lipogenesis rather than adipogenesis in this murine model.

Item Genes Involved in Complex Adaptive Processes Tend to Have Highly Conserved Upstream Regions in Mammalian Genomes (BioMed Central, 2005-11-27)
Lee, Soohyun; Kohane, Isaac; Kasif, Simon
BACKGROUND: Recent advances in genome sequencing suggest a remarkable conservation in gene content of mammalian organisms. The similarity in gene repertoire present in different organisms has increased interest in studying regulatory mechanisms of gene expression aimed at elucidating the differences in phenotypes. In particular, a proximal promoter region contains a large number of regulatory elements that control the expression of its downstream gene. Although many studies have focused on identification of these elements, a broader picture of the complexity of transcriptional regulation of different biological processes has not been addressed in mammals. The regulatory complexity may strongly correlate with gene function, as different evolutionary forces must act on the regulatory systems under different biological conditions. We investigate this hypothesis by comparing the conservation of promoters upstream of genes classified in different functional categories. RESULTS: By conducting a rank correlation analysis between functional annotation and upstream sequence alignment scores obtained by human-mouse and human-dog comparison, we found a significantly greater conservation of the upstream sequence of genes involved in development, cell communication, neural functions and signaling processes than of those involved in more basic processes shared with unicellular organisms such as metabolism and ribosomal function. This observation persists after controlling for G+C content.
Considering conservation as a functional signature, we hypothesize a higher density of cis-regulatory elements upstream of genes participating in complex and adaptive processes. CONCLUSION: We identified a class of functions that are associated with either high or low promoter conservation in mammals. We detected a significant tendency for complex and adaptive processes to be associated with higher promoter conservation, despite the fact that they have emerged relatively recently during evolution. We described and contrasted several hypotheses that provide a deeper insight into how transcriptional complexity might have emerged during evolution.

Item Computational Tradeoffs in Multiplex PCR Assay Design for SNP Genotyping (BioMed Central, 2005-7-25)
Rachlin, John; Ding, Chunming; Cantor, Charles R.; Kasif, Simon
BACKGROUND: Multiplex PCR is a key technology for detecting infectious microorganisms, whole-genome sequencing, forensic analysis, and for enabling flexible yet low-cost genotyping. However, the design of multiplex PCR assays requires the consideration of multiple competing objectives and physical constraints, and extensive computational analysis must be performed in order to identify the possible formation of primer-dimers that can negatively impact product yield. RESULTS: This paper examines the computational design limits of multiplex PCR in the context of SNP genotyping and examines tradeoffs associated with several key design factors including multiplexing level (the number of primer pairs per tube), coverage (the percentage of SNPs whose associated primers are actually assigned to one of several available tubes), and tube-size uniformity. We also examine how design performance depends on the total number of available SNPs from which to choose and on primer stringency criteria.
We show that finding high-multiplexing/high-coverage designs is subject to a computational phase transition, becoming dramatically more difficult when the probability of primer pair interaction exceeds a critical threshold. The precise location of this critical transition point depends on the number of available SNPs and the level of multiplexing required. We also demonstrate how coverage performance is impacted by the number of available SNPs, primer selection criteria, and target multiplexing levels. CONCLUSION: The presence of a phase transition suggests limits to scaling multiplex PCR performance for high-throughput genomics applications. Achieving broad SNP coverage rapidly transitions from being very easy to very hard as the target multiplexing level (number of primer pairs per tube) increases. The onset of a phase transition can be "delayed" by having a larger pool of SNPs, or by loosening primer selection constraints so as to increase the number of candidate primer pairs per SNP, though the latter may produce other adverse effects. The resulting design performance tradeoffs define a benchmark that can serve as the basis for comparing competing multiplex PCR design optimization algorithms and can also provide general rules-of-thumb to experimentalists seeking to understand the performance limits of standard multiplex PCR.
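
The terms used in this last abstract (multiplexing level, coverage, probability of primer pair interaction) can be made concrete with a toy simulation. The sketch below is not the authors' algorithm: it assumes a single candidate primer pair per SNP, draws pairwise interactions at random with probability `p`, and fills tubes with a simple first-fit greedy rule. It ignores tube-size uniformity, and all function names are hypothetical.

```python
import random
from itertools import combinations


def random_interactions(n_snps, p, seed=0):
    """Draw the set of interacting SNP pairs, each with probability p (toy model)."""
    rng = random.Random(seed)
    return {
        frozenset(pair)
        for pair in combinations(range(n_snps), 2)
        if rng.random() < p
    }


def greedy_assign(n_snps, conflicts, n_tubes, level):
    """First-fit assignment: put each SNP in the first tube that has room
    (fewer than `level` primer pairs) and no interacting SNP already in it.
    SNPs that fit nowhere are left unassigned."""
    tubes = [[] for _ in range(n_tubes)]
    for snp in range(n_snps):
        for tube in tubes:
            has_room = len(tube) < level
            compatible = all(
                frozenset((snp, other)) not in conflicts for other in tube
            )
            if has_room and compatible:
                tube.append(snp)
                break
    return tubes


def coverage(tubes, n_snps):
    """Fraction of SNPs whose primers were assigned to some tube."""
    return sum(len(tube) for tube in tubes) / n_snps


if __name__ == "__main__":
    # Sweep the interaction probability for 20 SNPs, 4 tubes, level 5.
    for p in (0.0, 0.1, 0.3, 0.6, 1.0):
        conflicts = random_interactions(20, p)
        tubes = greedy_assign(20, conflicts, n_tubes=4, level=5)
        print(p, coverage(tubes, 20))
```

With no interactions (`p = 0`) every SNP is placed and coverage is 1.0; when every pair interacts (`p = 1`) each tube can hold only one SNP, so four tubes cover 4 of 20 SNPs. Intermediate values depend on the random draw, and a greedy rule like this one only gives a lower bound on the coverage an optimizing design method could reach.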