RegScan is a command line tool for performing fast association analysis between allele frequencies and continuous traits. It uses linear regression to estimate marker effects on continuous traits.
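The per-marker computation behind this kind of scan can be illustrated with a minimal sketch. It assumes the trait is regressed on the expected allele dosage, derived from the genotype-probability triplets of a .gen file, using ordinary least squares. This is not RegScan's code, and the function names `dosage_from_gen` and `regress` are hypothetical.

```python
import math

def dosage_from_gen(probs):
    """Convert per-sample genotype-probability triplets (P(AA), P(AB), P(BB)),
    laid out as in the genotype columns of an Oxford .gen line, into expected
    counts (dosages) of allele B."""
    return [probs[i + 1] + 2.0 * probs[i + 2] for i in range(0, len(probs), 3)]

def regress(dosage, trait):
    """Ordinary least squares fit of trait = a + b * dosage.

    Returns (slope b, standard error of b, R^2, t value); the t value has
    n - 2 degrees of freedom. Assumes no missing values and non-constant inputs.
    """
    n = len(dosage)
    mean_x = sum(dosage) / n
    mean_y = sum(trait) / n
    sxx = sum((x - mean_x) ** 2 for x in dosage)
    syy = sum((y - mean_y) ** 2 for y in trait)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosage, trait))
    slope = sxy / sxx
    rss = syy - slope * sxy              # residual sum of squares
    se = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    r2 = sxy * sxy / (sxx * syy)
    return slope, se, r2, slope / se

# Five samples with certain genotypes AA, AB, BB, AB, AA.
dosage = dosage_from_gen([1, 0, 0,  0, 1, 0,  0, 0, 1,  0, 1, 0,  1, 0, 0])
print(dosage)  # [0.0, 1.0, 2.0, 1.0, 0.0]
slope, se, r2, t = regress(dosage, [1.0, 2.1, 2.9, 2.0, 1.2])  # slope is about 0.91
```

A p value can then be obtained from the Student t distribution with n − 2 degrees of freedom, for example with `2 * scipy.stats.t.sf(abs(t), n - 2)`.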
The main features of RegScan are:
- Speed. RegScan is currently about an order of magnitude faster than leading GWAS tools that report the p value, effect size and standard error (such as SNPTEST or QuickTest) when analyzing one trait, and hundreds of times faster when analyzing a large number of traits with restrictive filters. It achieves this speed through an efficient implementation and by performing only the statistical tests that are strictly necessary.
- Handling of combinatorial traits. RegScan can automatically create and analyze combinatorial traits such as trait ratios, products, sums, and differences.
- Analysis of any number of traits. RegScan automatically analyzes any number of traits without the user having to specify which traits to consider. This saves time at runtime and also makes input data preparation easier.
- Runtime filtering. To save computational time and reduce output size, the user can set restrictive filters at runtime (filters can also be applied after runtime). Hits are filtered by a) the slope (effect size), b) the standard error of the slope, c) R2, t value or p value, and d) the minor allele count (MAC).
- Introduction of the Reliability Score (RS). RegScan introduces the Reliability Score (RS), a simple metric that helps isolate the potentially most biologically interesting associations involving combinatorial traits.
- Additional functions. RegScan comes with several supporting functions required for data preparation and conversion as well as for filtering and analyzing the results.
- Optional summary file. All analysis results are written to one file. On request, RegScan also produces a summary file that lists, for each marker, the best association with a trait according to the statistical parameter (p value) and according to the effect size. This lets the user quickly isolate the most interesting findings in the output data.
- Availability. RegScan is an open source project; the code can be compiled for all major computational platforms and both the 32- and 64-bit architectures.
- User support. RegScan comes with user instructions and test datasets for practicing and better understanding the functions. The authors also provide technical support and take requests for future updates.
- File formats. RegScan uses the following genotype file formats as input: gen, gen.gz, bgen. Bgen support was incorporated in version 0.2 thanks to help from Dr. Gavin Band (Gavin Band & Jonathan Marchini, the BGEN format, http://www.well.ox.ac.uk/~gav/bgen_format/).
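The idea of combinatorial traits can be illustrated with a short sketch. RegScan's actual set of derived traits and their naming are described in the user manual; the sketch below only assumes that traits are held as lists with `None` for missing values, and the function names are hypothetical. Under this particular scheme, n input traits yield 5·n(n−1)/2 derived traits (100 traits give 24,750), which shows why speed and runtime filtering matter.

```python
from itertools import combinations, permutations

def _pairwise(xs, ys, op):
    """Apply op element-wise; a missing value (None) in either trait stays missing."""
    return [None if x is None or y is None else op(x, y) for x, y in zip(xs, ys)]

def _ratio(x, y):
    """x / y, treated as missing when the denominator is zero."""
    return None if y == 0 else x / y

def combinatorial_traits(traits):
    """Derive sums, products, differences and ratios for all pairs of traits.

    traits: dict of trait name -> list of per-sample values (None = missing).
    """
    derived = {}
    names = sorted(traits)
    for a, b in combinations(names, 2):
        xs, ys = traits[a], traits[b]
        derived[f"{a}+{b}"] = _pairwise(xs, ys, lambda x, y: x + y)
        derived[f"{a}*{b}"] = _pairwise(xs, ys, lambda x, y: x * y)
        # a - b is -(b - a): only the sign of the slope changes, so one direction suffices.
        derived[f"{a}-{b}"] = _pairwise(xs, ys, lambda x, y: x - y)
    for a, b in permutations(names, 2):
        # a / b and b / a are not linearly related, so both directions are kept.
        derived[f"{a}/{b}"] = _pairwise(traits[a], traits[b], _ratio)
    return derived

traits = {"t1": [1.0, 2.0, None], "t2": [2.0, 0.0, 3.0]}
derived = combinatorial_traits(traits)
print(len(derived))      # 5
print(derived["t1/t2"])  # [0.5, None, None]
```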
RegScan's main goal is to achieve maximal computational speed in order to be applicable for the initial testing of very large data sets.
The article describing RegScan was published in Briefings in Bioinformatics:
T. Haller, M. Kals, T. Esko, R. Mägi, K. Fischer. RegScan: a GWAS tool for quick estimation of allele effects on continuous traits and their combinations. Briefings in Bioinformatics. 2015 Jan;16(1):39-44. doi: 10.1093/bib/bbt066. Epub 2013 Sep 5.
Please choose your system:
- 64-bit Scientific Linux (Red Hat family): download static compilation
On request, the RegScan executable is also available for these systems:
- Debian Linux (Debian, Ubuntu, Mint)
- Mac OS X (Snow Leopard 10.6.8)
- Windows (XP, Vista, 7, 8)
- Source code: download code for compiling yourself
- User manual: download technical details, user guide, examples
Detailed instructions can be downloaded here. Illustrative instructions are found below.
(Note: reading the detailed instructions is essential to make full use of RegScan!)
RegScan includes functions for linear regression analysis and for preparing its input files, as well as functions for post-runtime analysis.
Regression analysis is run as follows:
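./REGSCAN -M gwas -gfile <genotype file> -pfile <phenotype file> -missing <missing value code> -slope <value> -statistic <statistic> -statlimit <value> -maclimit <value> -selimit <value> -out <output file> -summary <yes|no> -buffer <value>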
-gfile (required) = genotype file (gen, gen.gz or bgen format)
-pfile (required) = phenotype file in RegScan format (easily derived from .sample format by RegScan)
-missing = missing phenotype data identifier
-slope = effect size lower limit for screening
-statistic = main statistic used for screening; options: R2, T value, P value
-statlimit = screening limit for the statistical analysis (upper limit for P value, lower limit for R2 and T value)
-maclimit = minimum allowed minor allele count (MAC) for screening (details in user guide)
-selimit = standard error of slope (SE) limit for screening
-out = output file name
-summary = additional summary file; options: yes, no
-buffer = memory allocation for maximal computational speed
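The screening parameters listed above can be pictured as a single pass/fail test per association. The sketch below follows the stated directions of `-statlimit` (upper limit for the p value, lower limit for R2 and the t value) and `-maclimit` (minimum MAC). Several details are assumptions rather than documented RegScan behaviour: the slope limit is applied to |slope|, the SE limit is an upper bound, the t limit is applied to |t|, and all limits are inclusive. The names `Association`, `minor_allele_count` and `passes_filters` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Association:
    """Statistics for one marker-trait regression (hypothetical container)."""
    slope: float
    se: float
    r2: float
    t: float
    p: float
    mac: float  # minor allele count of the marker

def minor_allele_count(dosages):
    """Expected minor allele count from per-sample allele dosages in [0, 2]."""
    total = sum(dosages)
    return min(total, 2 * len(dosages) - total)

def passes_filters(a, slope_min, statistic, stat_limit, mac_min, se_max):
    """Return True if an association survives all screening limits.

    Assumed (not confirmed by RegScan documentation): |slope| is compared,
    the SE limit is an upper bound, |t| is compared, limits are inclusive.
    """
    if abs(a.slope) < slope_min:
        return False
    if a.se > se_max:
        return False
    if a.mac < mac_min:
        return False
    if statistic == "p":   # upper limit for the p value
        return a.p <= stat_limit
    if statistic == "r2":  # lower limit for R2
        return a.r2 >= stat_limit
    if statistic == "t":   # lower limit for the t value
        return abs(a.t) >= stat_limit
    raise ValueError(f"unknown statistic: {statistic}")

hit = Association(slope=0.05, se=0.01, r2=0.02, t=5.0, p=1e-9, mac=40)
print(passes_filters(hit, slope_min=0.01, statistic="p",
                     stat_limit=5e-8, mac_min=5, se_max=1))  # True
print(minor_allele_count([2, 2, 2, 1]))  # 1
```

Checking the cheap filters (slope, SE, MAC) before the statistic is a natural ordering, but whether RegScan orders its checks this way is not stated here.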
./REGSCAN -M gwas -gfile TEST.gen -pfile TEST.regscan -missing na -slope 0.01 -statistic p -statlimit 5e-8 -maclimit 5 -selimit 1 -out results.txt -summary no -buffer 500
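Setting `-summary yes` instead of `-summary no` in the command above requests the summary file that lists, for each marker, the best association by p value and by effect size. How such a summary could be assembled from the full results is sketched below, assuming "best" means the smallest p value and the largest absolute slope. The function name `summarize`, the tuple layout and the marker IDs are hypothetical, not RegScan's output format.

```python
def summarize(associations):
    """Pick, for each marker, the trait with the smallest p value and the
    trait with the largest absolute slope.

    associations: iterable of (marker, trait, slope, p) tuples.
    Returns {marker: (best_trait_by_p, best_trait_by_effect)}.
    """
    best_p = {}    # marker -> (trait, p)
    best_eff = {}  # marker -> (trait, slope)
    for marker, trait, slope, p in associations:
        if marker not in best_p or p < best_p[marker][1]:
            best_p[marker] = (trait, p)
        if marker not in best_eff or abs(slope) > abs(best_eff[marker][1]):
            best_eff[marker] = (trait, slope)
    return {m: (best_p[m][0], best_eff[m][0]) for m in best_p}

rows = [
    ("rs1", "t1", 0.20, 1e-10),
    ("rs1", "t2", -0.50, 1e-8),
    ("rs2", "t1", 0.10, 1e-9),
]
print(summarize(rows))  # {'rs1': ('t1', 't2'), 'rs2': ('t1', 't1')}
```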
Please contact us if you have any questions or suggestions:
toomas.haller [ät] ut.ee
tom [ät] toomashaller.com