Genome-wide association studies have been very successful in identifying common alleles with moderate genetic effects. However, a large portion of the genetic effect has remained unidentified, and alternative approaches are needed to gain information about uncommon and rare variants as well. In genome-wide association analysis, the power to detect true genotype-phenotype associations decreases as the minor allele frequency decreases. Therefore, methods that combine the information from several markers within a genomic region should be preferred over single-marker tests.
GRANVIL (Gene- or Region-based ANalysis of Variants of Intermediate and Low frequency) is an implementation of the method described by Morris and Zeggini [1] for rare-variant analysis of binary or quantitative phenotypes. The method is based on the accumulation of minor alleles at rare or uncommon markers discovered through dense genotyping or resequencing. Association analyses are performed on genes or other regions pre-defined by the analyst.
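The idea of accumulating minor alleles can be pictured as a per-individual count over the rare markers of one region. The snippet below is a minimal sketch, not GRANVIL's implementation: it assumes the rare markers have already been selected, that each genotype is given as a minor-allele count (0, 1, 2, or an expected dosage in between), and it ignores missing data. It uses a plain unweighted sum; GRANVIL's exact score may be scaled or defined differently. The function name `rare_allele_burden` is hypothetical.

```python
def rare_allele_burden(dosages):
    """Sum minor-allele counts over the rare markers of one region.

    dosages[m][i] is the number of minor alleles (0, 1, 2 or an expected
    value in between) carried by individual i at rare marker m.
    Returns one accumulated count per individual.
    """
    if not dosages:
        return []
    n_samples = len(dosages[0])
    burden = [0.0] * n_samples
    for marker in dosages:
        if len(marker) != n_samples:
            raise ValueError("all markers must cover the same samples")
        for i, count in enumerate(marker):
            burden[i] += count
    return burden


# Three rare markers typed in four individuals (hypothetical data).
region = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 2],
]
print(rare_allele_burden(region))  # [0.0, 2.0, 1.0, 2.0]
```

In a burden-type analysis, this per-individual count is the quantity that is then tested for association with the phenotype by regression.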
[1] Morris, A. P.; Zeggini, E. (2010). An evaluation of statistical approaches to rare variant analysis in genetic association studies. Genetic Epidemiology 34:188-193.
[2] Magi, R.; Kumar, A.; Morris, A. P. (2011). Assessing the impact of missing genotype data in rare variant association analysis. BMC Proceedings.
Copy the GRANVILv*.zip file to your computer and unzip it:
To compile GRANVIL, use the following command in the folder where the files have been unpacked:
The program can then be run by typing:
To run GRANVIL, you need input files in SNPTEST v2 format and a GENELIST file. The SNPTEST file formats are described in the SNPTEST documentation. For case-control analysis, cases and controls should be in a single GEN file and a single sample file, with the phenotype coded 0 = control, 1 = case.
1 rs1 11 A T 1 0 0 1 0 0 1 0 0
1 rs2 210 A T 0 1 0 1 0 0 1 0 0
1 rs3 300 A T 1 0 0 1 0 0 1 0 0
1 rs4 4637 A T 1 0 0 1 0 0 1 0 0
1 rs5 5555 A T 1 0 0 1 0 0 1 0 0
(The genotype file can be gzipped if it has a *.gz extension.)
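A minimal reader for the GEN layout shown above might look as follows. It assumes whitespace-separated columns (chromosome, marker name, position, allele A, allele B) followed by three genotype probabilities per sample in the order P(AA), P(AB), P(BB), as in the SNPTEST v2 format, and it opens the file with gzip when the name ends in .gz. The names `GenMarker`, `parse_gen_line` and `read_gen` are hypothetical and not part of GRANVIL.

```python
import gzip
from dataclasses import dataclass


@dataclass
class GenMarker:
    chrom: str
    marker: str
    position: int
    allele_a: str
    allele_b: str
    probs: list  # one (P(AA), P(AB), P(BB)) tuple per sample


def parse_gen_line(line):
    """Split one GEN row into marker information and per-sample probability triplets."""
    fields = line.split()
    values = [float(x) for x in fields[5:]]
    if len(values) % 3 != 0:
        raise ValueError(f"{fields[1]}: number of probabilities is not a multiple of 3")
    triplets = [tuple(values[k:k + 3]) for k in range(0, len(values), 3)]
    return GenMarker(fields[0], fields[1], int(fields[2]), fields[3], fields[4], triplets)


def read_gen(path):
    """Yield GenMarker records; files whose name ends in .gz are read with gzip."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_gen_line(line)


row = parse_gen_line("1 rs2 210 A T 0 1 0 1 0 0 1 0 0")
print(row.marker, row.position, row.probs)
# rs2 210 [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
```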
Sample_id Subject_id Missing Gender Phenotype PhenotypeQ
0 0 0 D B P
1 1 0 1 1 4.1
2 2 0 1 1 4.2
3 3 0 1 0 4.3
This file contains one covariate (Gender) and two phenotypes: the first is a case-control phenotype (analysed by logistic regression) and the second is a continuous phenotype (analysed by linear regression). In the current version, only continuous covariates are supported. Discrete covariates can be used if they have two classes (e.g. males = 1, females = 0). To adjust for centre effects or another multi-category covariate, create N-1 dummy variables coded 0 or 1 for N centres and declare these variables as continuous (type code 3 in the second row of the sample file).
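The N-1 dummy coding described above can be generated with a few lines of code. This is a sketch: the column prefix `centre_`, the function name `dummy_code`, and the choice of the alphabetically first centre as the reference category are illustrative assumptions. Each resulting column would then be added to the sample file and declared as continuous in its second row.

```python
def dummy_code(labels, reference=None, prefix="centre_"):
    """Turn one N-category label per sample into N-1 columns of 0/1 indicators."""
    if not labels:
        return [], []
    levels = sorted(set(labels))
    if reference is None:
        reference = levels[0]  # reference category gets all zeros
    if reference not in levels:
        raise ValueError(f"reference category {reference!r} not found")
    kept = [level for level in levels if level != reference]
    names = [f"{prefix}{level}" for level in kept]
    rows = [[1 if label == level else 0 for level in kept] for label in labels]
    return names, rows


names, rows = dummy_code(["A", "B", "C", "A"])
print(names)  # ['centre_B', 'centre_C']
print(rows)   # [[0, 0], [1, 0], [0, 1], [0, 0]]
```

Using N-1 rather than N indicators avoids a column that is perfectly collinear with the intercept.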
A1 1 11 111
A5 1 2500 27300
A13 1 14 3000
A15 1 9 24780
The genelist file contains four columns:
1. gene ID
2. chromosome
3. start position in bp
4. end position in bp
Make sure that the positions in the genelist file are from the same dbSNP version as those in the GEN file.
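Assigning markers to genelist regions amounts to an interval lookup on chromosome and position. The sketch below assumes inclusive start and end positions and that the `-f/--flanking` size (in kb) extends each region by the same amount on both sides; GRANVIL's exact boundary handling may differ. The names `Region`, `parse_genelist` and `markers_in_region` are hypothetical. The example reuses the genelist and GEN rows shown above.

```python
from dataclasses import dataclass


@dataclass
class Region:
    gene: str
    chrom: str
    start: int
    end: int


def parse_genelist(lines):
    """Read 'gene chromosome start end' rows, skipping blank or short lines."""
    regions = []
    for line in lines:
        fields = line.split()
        if len(fields) >= 4:
            regions.append(Region(fields[0], fields[1], int(fields[2]), int(fields[3])))
    return regions


def markers_in_region(region, markers, flank_kb=0.0):
    """Names of markers (chrom, name, position) inside the region +/- flank_kb kilobases."""
    flank = int(round(flank_kb * 1000))
    lo, hi = region.start - flank, region.end + flank
    return [name for chrom, name, pos in markers
            if chrom == region.chrom and lo <= pos <= hi]


genes = parse_genelist(["A1 1 11 111", "A5 1 2500 27300", "A13 1 14 3000", "A15 1 9 24780"])
gen_markers = [("1", "rs1", 11), ("1", "rs2", 210), ("1", "rs3", 300),
               ("1", "rs4", 4637), ("1", "rs5", 5555)]
for g in genes:
    print(g.gene, markers_in_region(g, gen_markers))
# A1 ['rs1']
# A5 ['rs4', 'rs5']
# A13 ['rs2', 'rs3']
# A15 ['rs1', 'rs2', 'rs3', 'rs4', 'rs5']
print(markers_in_region(genes[0], gen_markers, flank_kb=1))  # ['rs1', 'rs2', 'rs3']
```

Regions may overlap (A15 contains all other genes here), so one marker can contribute to several gene-based tests.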
Command line options:
./GRANVIL [-f ] -g -s -x [-o ] -m [--print_gene ] ... [--sex ] -p [--debug] [--cov_all] [--cond_marker ] ... [--cond_gene ] ... [--cov_name ] ... [-r ] [--call_thresh ] [--imp_thresh ] [--missing_code ] [--chr ] [--extract_markers ] [--extract_samples ] [--exclude_markers ] [--exclude_samples ] [--] [--version] [-h]
-f, --flanking This specifies the flanking region size in kb (default 0 kb).
-g, --gen (required) This specifies the genotype file. It can be gzipped, but must then have a *.gz extension.
-s, --sample (required) This specifies the sample file.
-x, --genmap (required) This specifies the gene map (genelist) file.
-o, --out This specifies the root name of the output files.
-m, --method (required) This option controls how genotype uncertainty is taken into account: (a) threshold - only genotypes with probability >= 0.95 are analysed; (b) expected - a genotype dosage is constructed from the sum of the probabilities of the genotypes containing one or two copies of the minor allele.
-p, --pheno (required) This specifies the phenotype to test.
REMOVED FROM GRANVILv2.1: --lowmem In case of limited memory, the lowmem option can be used. If the genelist and genotype files are sorted in the same order, the sorted option can be used (faster); if the markers are in random order, the unsorted option must be used.
--debug Debug mode enabled
--cov_all All covariates are used in analysis
--cov_name (accepted multiple times) Name of a covariate to use (for several covariates, repeat the option, e.g. --cov_name SEX --cov_name AGE).
-r, --rare_thresh Minor allele frequency cutoff for defining rare variants (default 0.05).
--call_thresh Call-rate threshold for the best guess genotypes (default 0.9)
--imp_thresh Imputation score threshold for including markers (default 0.4).
--missing_code This specifies the coding for missing data (default NA)
--chr Forces all markers in the genotype file to be from this chromosome; the first column of the genotype file is ignored.
--extract_samples This specifies the file with samples to extract.
--exclude_samples This specifies file with excluded samples
--extract_markers This specifies the file with markers to extract. The marker extraction list must be in LINKAGE map format (4 columns: chromosome, marker name, genetic distance, position). Markers are matched by chromosome and position rather than by name, to avoid problems with differing names of 1000 Genomes imputed markers.
--exclude_markers This specifies the file with markers to exclude. The marker exclusion list must be in LINKAGE map format (4 columns: chromosome, marker name, genetic distance, position). Markers are matched by chromosome and position rather than by name, to avoid problems with differing names of 1000 Genomes imputed markers.
--print_gene (accepted multiple times) Name of a gene model to be printed (for several genes, repeat the option). Option available in GRANVILv2.1.
--sex This specifies the gender column name in the sample file for sex-stratified analysis (men=0, women=1). Option available in GRANVILv2.1.
--cond_marker (accepted multiple times) Name of a marker to use as a covariate in conditional analysis (for several markers, repeat the option). Option available in GRANVILv2.1.
--cond_gene (accepted multiple times) Name of a gene to use as a covariate in conditional analysis (for several genes, repeat the option). Option available in GRANVILv2.1.
--, --ignore_rest Ignores the rest of the labeled arguments following this flag.
--version Displays version information and exits.
-h, --help Displays usage information and exits.
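The two `-m` settings can be illustrated with a short sketch. It assumes GEN probability triplets (P(AA), P(AB), P(BB)) per sample, decides which allele is minor from the expected allele counts, builds the expected dosage as P(one minor copy) + 2 * P(two minor copies), and for the threshold method keeps the most probable genotype only when its probability is at least 0.95. The call-rate helper and its comparison with `--call_thresh` are assumptions about how that option is applied, not GRANVIL's code; all function names are hypothetical.

```python
def minor_is_b(probs):
    """True if allele B (fifth GEN column) is the minor allele, judged from expected counts."""
    dose_b = sum(p_ab + 2.0 * p_bb for _, p_ab, p_bb in probs)
    total_alleles = sum(2.0 * (p_aa + p_ab + p_bb) for p_aa, p_ab, p_bb in probs)
    return total_alleles == 0 or dose_b / total_alleles <= 0.5


def expected_dosages(probs):
    """'expected' method: P(one minor copy) + 2 * P(two minor copies) per sample."""
    b_minor = minor_is_b(probs)
    return [p_ab + 2.0 * (p_bb if b_minor else p_aa) for p_aa, p_ab, p_bb in probs]


def threshold_calls(probs, min_prob=0.95):
    """'threshold' method: minor-allele count of the best genotype, or None if below min_prob."""
    b_minor = minor_is_b(probs)
    calls = []
    for triplet in probs:
        best = max(range(3), key=lambda g: triplet[g])  # 0 = AA, 1 = AB, 2 = BB
        if triplet[best] < min_prob:
            calls.append(None)
        else:
            calls.append(best if b_minor else 2 - best)
    return calls


def call_rate(calls):
    """Fraction of samples with a called genotype."""
    return sum(c is not None for c in calls) / len(calls) if calls else 0.0


probs = [(0.98, 0.02, 0.0), (0.10, 0.85, 0.05), (0.0, 0.03, 0.97), (1.0, 0.0, 0.0)]
calls = threshold_calls(probs)
print(calls)             # [0, None, 2, 0]
print(call_rate(calls))  # 0.75, below the default --call_thresh of 0.9
print([round(d, 2) for d in expected_dosages(probs)])  # [0.02, 0.95, 1.97, 0.0]
```

The threshold method discards the uncertain second sample, while the expected method keeps it with a fractional dosage.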
QUANTITATIVE TEST: ./GRANVIL -g data.impute.txt -s data.sample.txt --pheno PhenotypeQ --genmap ucsc_genes_b37.txt -m expected
CASE-CONTROL TEST: ./GRANVIL -g data.impute.txt -s data.sample.txt --pheno Phenotype --genmap ucsc_genes_b37.txt -m expected
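As a rough illustration of what a quantitative test of this kind estimates, the sketch below regresses a quantitative phenotype on a per-individual rare-allele count by ordinary least squares and reports beta, se, z and a two-sided p-value, mirroring the names of the output columns. It is a simplification under stated assumptions: no covariates or conditional markers, a normal approximation for the p-value, and only the linear case (the case-control test uses logistic regression, which is not shown). The function name `burden_linear_test` is hypothetical.

```python
import math


def burden_linear_test(burden, phenotype):
    """Ordinary least squares of phenotype on rare-allele burden (no covariates).

    Returns (beta, se, z, p); p is two-sided from a standard normal approximation.
    """
    n = len(burden)
    if n != len(phenotype) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(burden) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in burden)
    if sxx == 0:
        raise ValueError("burden is constant; the effect is not estimable")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(burden, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(burden, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance / Sxx
    if se == 0:
        raise ValueError("perfect fit; standard error is zero")
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return beta, se, z, p


beta, se, z, p = burden_linear_test([0, 1, 2, 3, 4], [1.0, 2.1, 2.9, 4.2, 4.8])
print(f"beta={beta:.3f} se={se:.4f} z={z:.2f} p={p:.2e}")
```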
Output file format
The results file contains the following columns:
1. gene - gene ID
2. marker_count - number of rare markers in gene region
3. sample_count - number of samples in analysis
4. rare_variant_sum - number of rare alleles carried by the individuals in the analysis
5. total_maf - sum of the MAFs of all markers used in the gene region
6. average_maf - average MAF of the markers used in the gene region (total_maf / marker_count)
7. beta - effect size
8. se - standard error of the effect size
9. z - z-statistic
10. p - p-value
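A results file with these columns can be post-processed with a short script. This sketch assumes whitespace-separated rows with one header line, uses the column names listed above, and maps a literal "NA" to NaN; the header, the "NA" code and the example rows are assumptions for illustration. The cutoff 0.05 / 20000 is a commonly used Bonferroni threshold for roughly 20,000 genes, not a GRANVIL default. Function names are hypothetical.

```python
import math

COLUMNS = ["gene", "marker_count", "sample_count", "rare_variant_sum",
           "total_maf", "average_maf", "beta", "se", "z", "p"]


def to_number(value):
    """Convert a field to float, mapping the assumed missing code 'NA' to NaN."""
    return math.nan if value == "NA" else float(value)


def read_results(lines, has_header=True):
    """Parse whitespace-separated result rows into dicts keyed by column name."""
    rows = []
    for index, line in enumerate(lines):
        fields = line.split()
        if not fields or (has_header and index == 0):
            continue
        if len(fields) != len(COLUMNS):
            raise ValueError(f"line {index + 1}: expected {len(COLUMNS)} columns")
        row = {"gene": fields[0]}
        row.update({name: to_number(v) for name, v in zip(COLUMNS[1:], fields[1:])})
        rows.append(row)
    return rows


def significant(rows, alpha=0.05 / 20000):
    """Genes with p < alpha, smallest p first (NaN p-values never pass)."""
    return sorted((r for r in rows if r["p"] < alpha), key=lambda r: r["p"])


# Hypothetical output rows, for illustration only.
example = [
    "gene marker_count sample_count rare_variant_sum total_maf average_maf beta se z p",
    "A1 2 1000 35 0.02 0.01 0.80 0.15 5.33 1.0e-7",
    "A5 3 1000 12 0.006 0.002 -0.10 0.30 -0.33 0.74",
    "A13 0 1000 0 0 NA NA NA NA NA",
]
rows = read_results(example)
print([r["gene"] for r in significant(rows)])  # ['A1']
for r in rows:
    if r["marker_count"] > 0:  # average_maf should equal total_maf / marker_count
        assert math.isclose(r["average_maf"], r["total_maf"] / r["marker_count"])
```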