Getting Started - HyPhy - Hypothesis Testing using Phylogenies

Using HyPhy#

There are four ways to use HyPhy:

Run HyPhy on our accompanying Datamonkey webserver#

This option is the easiest, supports most popular analyses, and does not require use of the command line. Access Datamonkey here, or see this development version of Datamonkey for newer methods and a dramatically better user experience.

Run HyPhy from the command line#

This option is the most flexible: it gives you access to all available analyses and pipelines and lets you customize your own HyPhy analyses. Follow these instructions for download and installation.

Run HyPhy custom analyses without the command line#

Run a legacy graphical user interface version of HyPhy (no longer developed, but still supporting many popular analyses) on Mac OS X or Windows. Follow these instructions for download and installation.

Use HyPhy for software/pipeline development#

Compile HyPhy as a library that can be accessed via Python, R, or other language bindings. Follow these instructions for download and installation.

Typical uses of HyPhy#

HyPhy ships with a library of standard analyses that implement ~100 different methods from start to finish. HyPhy is most commonly used for characterizing the evolutionary process, in particular:

Detecting signatures of selection
Estimating evolutionary rates
Comparing different evolutionary models
Fitting custom models to sequence alignments

Characterizing selective pressures#

HyPhy provides a suite of diverse phylogenetic methodologies for testing specific hypotheses about selection in protein-coding and/or amino-acid multiple sequence alignments. Which method you select will depend on your specific question. Below we recommend several methods for different purposes, linked to more in-depth descriptions. Tutorials for using these methods are also available here.

Note that you may find it useful to pre-process your dataset before selection analysis, specifically by screening for recombination breakpoints with our GARD (Genetic Algorithm for Recombination Detection) method.
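As a quick orientation, the question-to-method recommendations in the subsections below can be condensed into a small lookup table. The following Python sketch is only an illustration: the key names and the `recommend` helper are hypothetical and not part of HyPhy, and only the method names are taken from this page.

```python
# Hypothetical lookup table condensing the recommendations on this page.
# The keys are made-up labels for illustration; only the method names
# (FEL, SLAC, FUBAR, MEME, aBSREL, BUSTED, RELAX) come from HyPhy.
RECOMMENDED_METHODS = {
    "pervasive_selection_at_sites": ["FEL", "SLAC", "FUBAR"],
    "episodic_selection_at_sites": ["MEME"],
    "episodic_selection_at_branches": ["aBSREL"],
    "gene_wide_selection_on_branches": ["BUSTED"],
    "relaxed_or_intensified_selection": ["RELAX"],
}


def recommend(question):
    """Return a copy of the method list suggested for a question label."""
    try:
        return list(RECOMMENDED_METHODS[question])
    except KeyError:
        known = ", ".join(sorted(RECOMMENDED_METHODS))
        raise ValueError(
            f"Unknown question {question!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    # Print every question label with its suggested methods.
    for question in RECOMMENDED_METHODS:
        print(f"{question}: {', '.join(recommend(question))}")
```

The table only records which methods are mentioned for each question. Choosing among FEL, SLAC, and FUBAR still depends on dataset size and sequence divergence, as described below.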

Are individual sites subject to pervasive (across the whole phylogeny) positive or purifying selection?#

FEL (Fixed Effects Likelihood) is suitable for small-to-medium sized datasets.
SLAC (Single-Likelihood Ancestor Counting) is an approximate method with accuracy similar to FEL, but suitable for larger datasets. However, SLAC is not suitable for highly diverged sequences.
FUBAR (Fast, Unconstrained Bayesian AppRoximation) is suitable for medium-to-large datasets and is expected to have more power than FEL for detecting pervasive selection at sites. FUBAR is the preferred approach for inferring pervasive selection.

Are individual sites subject to episodic (at a subset of branches) positive or purifying selection?#

MEME (Mixed Effects Model of Evolution) tests for episodic selection at individual sites. Note that MEME does not accept a priori branch specifications (this feature is being introduced with v2.3-dev and later). MEME is the preferred approach for detecting positive selection at individual sites.

Are individual branches subject to episodic (at a subset of sites) positive or purifying selection?#

aBSREL (adaptive Branch-Site Random Effects Likelihood) is an improved version of the common "branch-site" class of models. aBSREL can either test branches specified a priori or test each lineage for selection in an exploratory fashion. Note that the exploratory approach will sacrifice power. aBSREL is the preferred approach for detecting positive selection at individual branches.

Has a gene experienced positive selection at any site on a particular branch or set of branches?#

BUSTED (Branch-Site Unrestricted Statistical Test for Episodic Diversification) will test for gene-wide selection at pre-specified lineages. This method is particularly useful for relatively small datasets (fewer than 10 taxa) where other methods may not have sufficient power to detect selection. This method is not suitable for identifying specific sites subject to positive selection.

Has gene-wide selection pressure been relaxed or intensified along a certain subset of branches?#

RELAX tests for a relaxation (e.g. where purifying selection has become less stringent) or an intensification (e.g. where purifying selection has become stronger) of selection pressures along a specified set of "test" branches. This method is not suitable for detecting positive selection.
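To make the idea of relaxation versus intensification concrete, here is a minimal Python sketch. It assumes the formulation from the published description of RELAX (not stated on this page), in which the ω (dN/dS) values on the test branches are the reference values raised to an exponent k: k < 1 pulls ω toward 1 (relaxation) and k > 1 pushes ω away from 1 (intensification). The function names and the example ω values are hypothetical and chosen purely for illustration; this is not HyPhy code.

```python
def scale_omegas(reference_omegas, k):
    """Raise each reference omega to the power k (assumed RELAX-style scaling)."""
    return [omega ** k for omega in reference_omegas]


def describe(k):
    """Label the direction of change implied by the exponent k."""
    if k < 1:
        return "relaxation"
    if k > 1:
        return "intensification"
    return "no change"


if __name__ == "__main__":
    # Illustrative omega classes: purifying (<1), neutral (=1), positive (>1).
    reference = [0.1, 1.0, 4.0]
    for k in (0.5, 1.0, 2.0):
        scaled = [round(w, 3) for w in scale_omegas(reference, k)]
        print(f"k={k}: {describe(k)} -> {scaled}")
```

Under this assumption, both purifying and positive classes move toward neutrality when k < 1, which is consistent with the note above that RELAX is not a test for positive selection.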