Software - SFselect

SFselect is a method for distinguishing genomic regions evolving under positive selection from those evolving neutrally.

Selective sweeps leave a detectable signature on the site frequency spectrum. The specific signature, however, depends (among other things) on the time since the sweep began (t) and on the strength of selection (s). In addition, the demographic history of a population also affects the site frequency spectrum.

We consider a "hard sweep" model of natural selection, where a single (novel) beneficial allele sweeps through the population. Our method is based on a form of supervised learning (Support Vector Machines), where the features for learning and classification are given by the scaled Site Frequency Spectrum in a region.
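As a rough illustration of the feature vector described above, the sketch below computes a scaled site frequency spectrum from a small 0/1 haplotype matrix and scores it with a linear decision function. It is not the SFselect implementation. It assumes that "scaled" means multiplying the count of sites in frequency class i by i, and that the vector is then normalized to sum to 1. The function names `scaled_sfs` and `linear_score`, and the weights passed to `linear_score`, are hypothetical; a trained Support Vector Machine would supply its own weights and bias.

```python
from typing import List, Sequence


def scaled_sfs(haplotypes: Sequence[Sequence[int]]) -> List[float]:
    """Return a normalized scaled SFS for a 0/1 haplotype matrix.

    Rows are sampled chromosomes, columns are sites, and 1 marks the
    derived allele. The scaling (i * xi_i) and the normalization are
    assumptions made for this sketch.
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    num_sites = len(haplotypes[0])

    # xi[i] = number of sites whose derived allele is carried by exactly i chromosomes
    xi = [0] * (n + 1)
    for site in range(num_sites):
        derived_count = sum(row[site] for row in haplotypes)
        xi[derived_count] += 1

    # Keep only segregating classes 1..n-1 and scale class i by i.
    scaled = [i * xi[i] for i in range(1, n)]
    total = sum(scaled)
    if total == 0:
        return [0.0] * (n - 1)
    return [value / total for value in scaled]


def linear_score(features: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Linear decision value w . x + b, the form a linear SVM produces once trained."""
    return sum(w * x for w, x in zip(weights, features)) + bias


# Toy region: 4 chromosomes, 5 sites (the fourth site is fixed and is ignored).
region = [
    [1, 0, 0, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 1, 0],
    [0, 0, 0, 1, 0],
]
features = scaled_sfs(region)
```

In this toy region three sites are singletons and one is a doubleton, so the scaled counts are 3, 2 and 0 before normalization. The sign of `linear_score` would then be read as the class label, with the weights taken from whichever trained model is in use.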

SFselect can be used in different ways:

  • Classify polymorphism data using a pre-trained general model that is robust to many different combinations of (s,t) values, while maintaining high power. Barring reliable knowledge of the selective sweep, we recommend using this option.
  • Classify polymorphism data using a pre-trained specific model that is most powerful for a given combination of (s,t) values. These specific models do not generalize as well to other (s,t) values. This option may be used given reliable knowledge of the selective sweep.

Citation: Ronen, Roy, Nitin Udpa, Eran Halperin, and Vineet Bafna. "Learning Natural Selection from the Site Frequency Spectrum." Genetics (2013).


If you have questions or need further support, please contact us.
