Wittkopp, P. J. & Kalay, G. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nat. Rev. Genet. 13, 59–69 (2011).
Google Scholar
Hill, M. S., Vande Zande, P. & Wittkopp, P. J. Molecular and evolutionary processes generating variation in gene expression. Nat. Rev. Genet. 22, 203–215 (2021).
Google Scholar
Fuqua, T. et al. Dense and pleiotropic regulatory information in a developmental enhancer. Nature 587, 235–239 (2020).
Google Scholar
de Visser, J. A. G. M. & Krug, J. Empirical fitness landscapes and the predictability of evolution. Nat. Rev. Genet. 15, 480–490 (2014).
Google Scholar
Kondrashov, D. A. & Kondrashov, F. A. Topological features of rugged fitness landscapes in sequence space. Trends Genet. 31, 24–33 (2015).
Google Scholar
de Visser, J. A. G. M., Elena, S. F., Fragata, I. & Matuszewski, S. The utility of fitness landscapes and big data for predicting evolution. Heredity 121, 401–405 (2018).
Google Scholar
Weirauch, M. T. & Hughes, T. R. Conserved expression without conserved regulatory sequence: the more things change, the more they stay the same. Trends Genet. 26, 66–74 (2010).
Google Scholar
Orr, H. A. The genetic theory of adaptation: a brief history. Nat. Rev. Genet. 6, 119–127 (2005).
Google Scholar
Weinreich, D. M., Lan, Y., Wylie, C. S. & Heckendorn, R. B. Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. 23, 700–707 (2013).
Google Scholar
Venkataram, S. et al. Development of a comprehensive genotype-to-fitness map of adaptation-driving mutations in yeast. Cell 166, 1585–1596 (2016).
Google Scholar
Keren, L. et al. Massively parallel interrogation of the effects of gene expression levels on fitness. Cell 166, 1282–1294 (2016).
Google Scholar
Sarkisyan, K. S. et al. Local fitness landscape of the green fluorescent protein. Nature 533, 397–401 (2016).
Google Scholar
Ogden, P. J., Kelsic, E. D., Sinai, S. & Church, G. M. Comprehensive AAV capsid fitness landscape reveals a viral gene and enables machine-guided design. Science 366, 1139–1143 (2019).
Google Scholar
Pitt, J. N. & Ferré-D’Amaré, A. R. Rapid construction of empirical RNA fitness landscapes. Science 330, 376–379 (2010).
Google Scholar
Shultzaberger, R. K., Malashock, D. S., Kirsch, J. F. & Eisen, M. B. The fitness landscapes of cis-acting binding sites in different promoter and environmental contexts. PLoS Genet. 6, e1001042 (2010).
Google Scholar
Mustonen, V., Kinney, J., Callan, C. G. & Lässig, M. Energy-dependent fitness: a quantitative model for the evolution of yeast transcription factor binding sites. Proc. Natl Acad. Sci. USA 105, 12376–12381 (2008).
Google Scholar
Hartl, D. L. What can we learn from fitness landscapes? Curr. Opin. Microbiol. 0, 51–57 (2014).
Google Scholar
Otwinowski, J. & Nemenman, I. Genotype to phenotype mapping and the fitness landscape of the E. coli lac promoter. PLoS ONE 8, e61570 (2013).
Google Scholar
Sinai, S. & Kelsic, E. D. A primer on model-guided exploration of fitness landscapes for biological sequence design. Preprint at https://arxiv.org/abs/2010.10614 (2020).
Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning-based sequence model. Nat. Methods 12, 931–934 (2015).
Google Scholar
Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021).
Google Scholar
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. Proc. 34th International Conference on Machine Learning 3145–3153 (2017).
Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021).
Google Scholar
Fragata, I., Blanckaert, A., Louro, M. A. D., Liberles, D. A. & Bank, C. Evolution in the light of fitness landscape theory. Trends Ecol. Evol. 34, 69–82 (2019).
Google Scholar
Payne, J. L. & Wagner, A. The causes of evolvability and their evolution. Nat. Rev. Genet. 20, 24–38 (2019).
Google Scholar
de Boer, C. G. et al. Deciphering eukaryotic gene-regulatory logic with 100 million random promoters. Nat. Biotechnol. 38, 56–65 (2020).
Google Scholar
Crocker, J. et al. Low affinity binding site clusters confer hox specificity and regulatory robustness. Cell 160, 191–203 (2015).
Google Scholar
Habib, N., Wapinski, I., Margalit, H., Regev, A. & Friedman, N. A functional selection model explains evolutionary robustness despite plasticity in regulatory networks. Mol. Syst. Biol. 8, 619 (2012).
Google Scholar
Gillespie, J. H. Molecular evolution over the mutational landscape. Evolution 38, 1116–1129 (1984).
Google Scholar
Jerison, E. R. & Desai, M. M. Genomic investigations of evolutionary dynamics and epistasis in microbial evolution experiments. Curr. Opin. Genet. Dev. 35, 33–39 (2015).
Google Scholar
Sæther, B.-E. & Engen, S. The concept of fitness in fluctuating environments. Trends Ecol. Evol. 30, 273–281 (2015).
Google Scholar
Vaswani, A. et al. in Advances in Neural Information Processing Systems 30 (eds. Guyon, I. et al.) 5998–6008 (Curran Associates, 2017).
Weirauch, M. T. et al. Evaluation of methods for modeling transcription factor sequence specificity. Nat. Biotechnol. 31, 126–134 (2013).
Google Scholar
Yang, N. & Bielawski, N. Statistical methods for detecting molecular adaptation. Trends Ecol. Evol. 15, 496–503 (2000).
Google Scholar
Moses, A. M. Statistical tests for natural selection on regulatory regions based on the strength of transcription factor binding sites. BMC Evol. Biol. 9, 286 (2009).
Google Scholar
Rifkin, S. A., Houle, D., Kim, J. & White, K. P. A mutation accumulation assay reveals a broad capacity for rapid evolution of gene expression. Nature 438, 220–223 (2005).
Google Scholar
Peter, J. et al. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature 556, 339–344 (2018).
Google Scholar
Erb, I. & van Nimwegen, E. Transcription factor binding site positioning in yeast: proximal promoter motifs characterize TATA-less promoters. PLoS One 6, e24279 (2011).
Google Scholar
Gilad, Y., Oshlack, A. & Rifkin, S. A. Natural selection on gene expression. Trends Genet. 22, 456–461 (2006).
Google Scholar
Alhusaini, N. & Coller, J. The deadenylase components Not2p, Not3p, and Not5p promote mRNA decapping. RNA 22, 709–721 (2016).
Google Scholar
Yang, J.-R., Maclean, C. J., Park, C., Zhao, H. & Zhang, J. Intra and interspecific variations of gene expression levels in yeast are largely neutral: (Nei Lecture, SMBE 2016, Gold Coast). Mol. Biol. Evol. 34, 2125–2139 (2017).
Google Scholar
Chen, J. et al. A quantitative framework for characterizing the evolutionary history of mammalian gene expression. Genome Res. 29, 53–63 (2019).
Google Scholar
Payne, J. L. & Wagner, A. Mechanisms of mutational robustness in transcriptional regulation. Front. Genet. 6, 322 (2015).
Google Scholar
Shoval, O. et al. Evolutionary trade-offs, Pareto optimality, and the geometry of phenotype space. Science 336, 1157–1160 (2012).
Google Scholar
van Dijk, D. et al. Finding archetypal spaces using neural networks. IEEE International Conference on Big Data 2634-2643 (2019).
He, X., Duque, T. S. P. C. & Sinha, S. Evolutionary origins of transcription factor binding site clusters. Mol. Biol. Evol. 29, 1059–1070 (2012).
Google Scholar
Cliften, P. F. et al. Surveying Saccharomyces genomes to identify functional elements by comparative DNA sequence analysis. Genome Res. 11, 1175–1186 (2001).
Google Scholar
Heinz, S., Romanoski, C. E., Benner, C. & Glass, C. K. The selection and function of cell type-specific enhancers. Nat. Rev. Mol. Cell Biol. 16, 144–154 (2015).
Google Scholar
Lehner, B. Selection to minimise noise in living systems and its implications for the evolution of gene expression. Mol. Syst. Biol. 4, 170 (2008).
Google Scholar
Metzger, B. P. H., Yuan, D. C., Gruber, J. D., Duveau, F. & Wittkopp, P. J. Selection on noise constrains variation in a eukaryotic promoter. Nature 521, 344–347 (2015).
Google Scholar
Kosuri, S. et al. Composability of regulatory sequences controlling transcription and translation in Escherichia coli. Proc. Natl Acad. Sci. USA. 110, 14024–14029 (2013).
Google Scholar
Shalem, O. et al. Systematic dissection of the sequence determinants of gene 3’ end mediated expression control. PLoS Genet. 11, e1005147 (2015).
Google Scholar
Kinney, J. B., Murugan, A., Callan, C. G. Jr & Cox, E. C. Using deep sequencing to characterize the biophysical mechanism of a transcriptional regulatory sequence. Proc. Natl Acad. Sci. USA. 107, 9158–9163 (2010).
Google Scholar
Sharon, E. et al. Inferring gene regulatory logic from high-throughput measurements of thousands of systematically designed promoters. Nat. Biotechnol. 30, 521–530 (2012).
Google Scholar
Melnikov, A. et al. Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay. Nat. Biotechnol. 30, 271–277 (2012).
Google Scholar
Kwasnieski, J. C., Mogno, I., Myers, C. A., Corbo, J. C. & Cohen, B. A. Complex effects of nucleotide variants in a mammalian cis-regulatory element. Proc. Natl Acad. Sci. USA 109, 19498–19503 (2012).
Google Scholar
Kircher, M. et al. Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nat. Commun. 10, 3583 (2019).
Google Scholar
Townsley, K. G., Brennand, K. J. & Huckins, L. M. Massively parallel techniques for cataloguing the regulome of the human brain. Nat. Neurosci. 23, 1509–1521 (2020).
Google Scholar
Renganaath, K. et al. Systematic identification of cis-regulatory variants that cause gene expression differences in a yeast cross. eLife 9, e62669 (2020).
Google Scholar
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Google Scholar
Alipanahi, B., Delong, A., Weirauch, M. T. & Frey, B. J. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015).
Google Scholar
Travers, C. et al. Opportunities and obstacles for deep learning in biology and medicine. J. R. Soc. Interface 15, 20170387 (2018).
Avsec, Ž. et al. The Kipoi repository accelerates community exchange and reuse of predictive models for genomics. Nat. Biotechnol. 37, 592–600 (2019).
Google Scholar
Quang, D. & Xie, X. FactorNet: a deep learning framework for predicting cell type specific transcription factor binding from nucleotide-resolution sequential data. Methods 166, 40–47 (2019).
Google Scholar
Zhou H. et al. Towards a better understanding of reverse-complement equivariance for deep learning models in genomics. Proc. 16th Machine Learning in Computational Biology meeting 165, 1–33 (2022).
Morrow, A. et al. Convolutional kitchen sinks for transcription factor binding site prediction. Preprint at https://arxiv.org/abs/1706.00125 (2017).
Kelley, D. R., Snoek, J. & Rinn, J. L. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Res. 26, 990–999 (2016).
Google Scholar
Koo, P. K., Majdandzic, A., Ploenzke, M., Anand, P. & Paul, S. B. Global importance analysis: an interpretability method to quantify importance of genomic features in deep neural networks. PLoS Comput. Biol. 17, e1008925 (2021).
Google Scholar
Quang, D. & Xie, X. DanQ: a hybrid convolutional and recurrent deep neural network for quantifying the function of DNA sequences. Nucleic Acids Res. 44, e107 (2016).
Google Scholar
Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. International Conference on Learning Representations (Poster) (2015).
Abadi, M. et al. TensorFlow: large-scale machine learning on heterogenous systems. Software available from https://www.tensorflow.org/ (2015).
Jouppi, N. P. et al. In-datacenter performance analysis of a tensor processing unit. In Proc. 44th Annual International Symposium on Computer Architecture 1–12 (2017).
Li, J., Pu, Y., Tang, J., Zou, Q. & Guo, F. DeepATT: a hybrid category attention neural network for identifying functional effects of DNA sequences. Brief. Bioinform. 22, bbaa159 (2020).
Ullah, F. & Ben-Hur, A. A self-attention model for inferring cooperativity between regulatory features. Nucleic Acids Res. 49, e77 (2021).
Google Scholar
Clauwaert, J., Menschaert, G. & Waegeman, W. Explainability in transformer models for functional genomics. Brief. Bioinform. 22, bbab060 (2021).
Google Scholar
Hinton, G. & Tieleman, T. Lecture 6.5—RmsProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 26–31 (2012).
Sinai, S. et al. AdaLead: a simple and robust adaptive greedy search algorithm for sequence design. Preprint at https://arxiv.org/abs/2010.02141 (2020).
Linder, J., Bogard, N., Rosenberg, A. B. & Seelig, G. A generative neural network for maximizing fitness and diversity of synthetic DNA and protein sequences. Cell Syst. 11, 49–62 (2020).
Google Scholar
Brookes, D., Park, H. & Listgarten, J. Conditioning by adaptive sampling for robust design. Proc. Mach. Learn. Res. 97, 773–782 (2019).
Killoran, N., Lee, L. J., Delong, A., Duvenaud, D. & Frey, B. J. Generating and designing DNA with deep generative models. Neurips Computational Biology Workshop (2017).
Fortin, F.-A., Rainville, F.-M. D., Gardner, M.-A., Parizeau, M. & Gagné, C. DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012).
Google Scholar
Jaeger, S. A. et al. Conservation and regulatory associations of a wide affinity range of mouse transcription factor binding sites. Genomics 95, 185–195 (2010).
Google Scholar
Tanay, A. Extensive low-affinity transcriptional interactions in the yeast genome. Genome Res. 16, 962–972 (2006).
Google Scholar
Sniegowski, P. D. & Gerrish, P. J. Beneficial mutations and the dynamics of adaptation in asexual populations. Phil. Trans. R. Soc. B 365, 1255–1263 (2010).
Google Scholar
Szendro, I. G., Franke, J., de Visser, J. A. & Krug, J. Predictability of evolution depends nonmonotonically on population size. Proc. Natl Acad. Sci. USA 110, 571–576 (2013).
Google Scholar
Orr, H. A. The population genetics of adaptation: the adaptation of DNA Sequences. Evolution 56, 1317–1330 (2002).
Google Scholar
Bailey, T. L. DREME: motif discovery in transcription factor ChIP-seq data. Bioinformatics 27, 1653–1659 (2011).
Google Scholar
de Boer, C. G. & Hughes, T. R. YeTFaSCo: a database of evaluated yeast transcription factor sequence specificities. Nucleic Acids Res. 40, D169–D179 (2012).
Google Scholar
Kent, W. J. BLAT—the BLAST-Like Alignment Tool. Genome Res. 12, 656–664 (2002).
Google Scholar
Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding yeast. Nucleic Acids Res. 40, D700–D705 (2012).
Google Scholar
Smith, J. D., McManus, K. F. & Fraser, H. B. A novel test for selection on cis-regulatory elements reveals positive and negative selection acting on mammalian transcriptional enhancers. Mol. Biol. Evol. 30, 2509–2518 (2013).
Google Scholar
Liu, J. & Robinson-Rechavi, M. Robust inference of positive selection on regulatory sequences in the human brain. Sci. Adv. 6, eabc9863 (2020).
Google Scholar
Rice, D. P. & Townsend, J. P. A test for selection employing quantitative trait locus and mutation accumulation data. Genetics 190, 1533–1545 (2012).
Google Scholar
Denver, D. R., Morris, K., Lynch, M. & Thomas, W. K. High mutation rate and predominance of insertions in the Caenorhabditis elegans nuclear genome. Nature 430, 679–682 (2004).
Google Scholar
Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).
Google Scholar
Thompson, D. A. et al. Evolutionary principles of modular gene regulation in yeasts. eLife 2, e00603 (2013).
Google Scholar
Yassour, M. et al. Strand-specific RNA sequencing reveals extensive regulated long antisense transcripts that are conserved across yeast species. Genome Biol. 11, R87 (2010).
Google Scholar
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Google Scholar
Wu, T. D. & Watanabe, C. K. GMAP: a genomic mapping and alignment program for mRNA and EST sequences. Bioinformatics 21, 1859–1875 (2005).
Google Scholar
Wapinski, I., Pfeffer, A., Friedman, N. & Regev, A. Natural history and evolutionary principles of gene duplication in fungi. Nature 449, 54–61 (2007).
Google Scholar
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12, 323 (2011).
Google Scholar
Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res. 48, D682–D688 (2020).
Google Scholar
DiCarlo, J. E. et al. Genome engineering in Saccharomyces cerevisiae using CRISPR–Cas systems. Nucleic Acids Res. 41, 4336–4343 (2013).
Google Scholar
Fleiss, A. et al. Reshuffling yeast chromosomes with CRISPR/Cas9. PLoS Genet. 15, e1008332 (2019).
Google Scholar
Horwitz, A. A. et al. Efficient multiplexed integration of synergistic alleles and metabolic pathways in yeasts via CRISPR–Cas. Cell Syst. 1, 88–96 (2015).
Google Scholar
Livak, K. J. & Schmittgen, T. D. Analysis of relative gene expression data using real-time quantitative PCR and the 2−ΔΔCT method. Methods 25, 402–408 (2001).
Google Scholar
Vandesompele, J. et al. Accurate normalization of real-time quantitative RT–PCR data by geometric averaging of multiple internal control genes. Genome Biol. 3, research0034.1 (2002).
Teste, M.-A., Duquenne, M., François, J. M. & Parrou, J.-L. Validation of reference genes for quantitative expression analysis by real-time RT–PCR in Saccharomyces cerevisiae. BMC Mol. Biol. 10, 99 (2009).
Google Scholar
Mardones, W. et al. Rapid selection response to ethanol in Saccharomyces eubayanus emulates the domestication process under brewing conditions. Microb. Biotechnol. https://doi.org/10.1111/1751-7915.13803 (2021).
Ibstedt, S. et al. Concerted evolution of life stage performances signals recent selection on yeast nitrogen use. Mol. Biol. Evol. 32, 153–161 (2015).
Google Scholar
Rich, M. S. et al. Comprehensive analysis of the SUL1 promoter of Saccharomyces cerevisiae. Genetics 203, 191–202 (2016).
Google Scholar
Rest, J. S. et al. Nonlinear fitness consequences of variation in expression level of a eukaryotic gene. Mol. Biol. Evol. 30, 448–456 (2013).
Google Scholar
Bergen, A. C., Olsen, G. M. & Fay, J. C. Divergent MLS1 promoters lie on a fitness plateau for gene expression. Mol. Biol. Evol. 33, 1270–1279 (2016).
Google Scholar
Alstott, J., Bullmore, E. & Plenz, D. Powerlaw: a Python package for analysis of heavy-tailed distributions. PLoS One 9, e85777 (2014).
Google Scholar