Parameters

This section explains in more detail how and when to use each keyword in the config file. For readability, the minimum required parameters are listed for each step. If more than one step will be run (more than one of the five pipeline logic keywords is set to true), the minimum required parameters are the union of the minimum required parameters for every keyword set to true.
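The rule for combining steps can be sketched in Python as a set union. This is only an illustration: the per-step sets below are partial examples read off the Relevance column of the full table (parameters without a default), not the authoritative minimum lists, and the names `STEP_PARAMS`, `ALWAYS_REQUIRED`, `required_parameters`, and `missing_parameters` are hypothetical. The config is assumed to have already been loaded into a plain dictionary.

```python
# Illustrative subsets only, taken from the Relevance column of the table;
# they are NOT the pipeline's official minimum-parameter lists.
ALWAYS_REQUIRED = {"OutDir", "OutPrefix"}
STEP_PARAMS = {
    "GenerateGRM": {"Plink"},
    "GenerateNull": {"PhenoFile", "Plink", "Trait", "Pheno", "Covars", "SampleID"},
    "GenerateAssociations": {"ImputeDir"},
    "GenerateResults": {"Trait", "Pheno", "Covars", "InfoFile"},
}


def required_parameters(flags):
    """Return the union of required parameters for every step set to true."""
    needed = set(ALWAYS_REQUIRED)
    for step, enabled in flags.items():
        if enabled:
            # A parameter shared by two steps (e.g. Plink) is counted once.
            needed |= STEP_PARAMS.get(step, set())
    return needed


def missing_parameters(config):
    """List required parameters that are absent or empty in a config dict."""
    flags = {step: bool(config.get(step, False)) for step in STEP_PARAMS}
    return sorted(p for p in required_parameters(flags) if not config.get(p))
```

Calling `missing_parameters` on a loaded config before launching a run gives a quick list of keywords that still need values for the steps that are switched on.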

Minimum Required Parameters for Each Step

Full List of User Input Data Parameters

Parameter	Type	Default	Description	Relevance
ChromosomeLengthFile	string		This is a tab-delimited text file that contains the same names as the chromosomes in your dataset followed by the chromosome length. This file can generally be pulled down from the NCBI website under the build you are using (hg19 or hg38). This is used for chunking the imputed data.	if `skipChunking:false`
Build	option: hg19 or hg38	hg38	Determines whether the software looks for a “chr” prefix before the chromosome name (hg38) or not (hg19).
Chromosomes	string	1-22	Tells the software which chromosomes to use for association analysis. It must be a range. To run the analysis on a single chromosome, make the start and end the same value; for example, running chromosome 2 only is written 2-2.	if `GenerateAssociations:true`
ImputeSuffix	string		The full suffix the software uses to determine which files in the directory are imputation files. Important: the prefix is assumed to be either a chromosome number (hg19) or the string chr followed by the chromosome number (hg38).	if `GenerateAssociations:true` and `skipChunking:false`
ImputeDir	string		Full path to directory where imputed results are located	if `GenerateAssociations:true`
OutDir	string		Full path to directory where final results should be transferred	all
OutPrefix	string		String (no whitespace or special characters) to prefix to the generated output files	all
PhenoFile	string		Full path to a tab-delimited phenotype file containing sample IDs, phenotypes, and covariates, with whatever header names you choose. NO WHITESPACE in header names.	if `GenerateNull:true`
Plink	string		Full path, including the plink file prefix (without the .bed, .bim, or .fam suffix), to an LD-pruned dataset used to generate the GRM and from which random markers are selected to estimate the variance ratio	if `GenerateGRM:true` or `GenerateNull:true`
Trait	option: binary or quantitative		Based upon your association phenotype. If binary, all values will be 0/1/NA; if quantitative, all phenotype traits to be analyzed will be continuous or have numeric quantitative meaning.	if `GenerateNull:true` or `GenerateResults:true`
Pheno	string		The exact name (case-sensitive) of the phenotype to be analyzed in your PhenoFile. Must be present in PhenoFile.	if `GenerateNull:true` or `GenerateResults:true`
InvNorm	boolean	FALSE	Whether to apply an inverse normalization to the phenotype being analyzed. For binary traits, set this to FALSE; for quantitative traits, set this to TRUE.	if `GenerateNull:true`
Covars	comma-separated list		A comma-separated list (no whitespaces) of all the covariate names to regress out in the model. These variables need to be in your PhenoFile.	if `GenerateNull:true` or `GenerateResults:true`
SampleID	string		A string (no whitespace) contained in the header of your PhenoFile. This is the name of the sample ID column; the sample IDs must be identical across the PhenoFile, imputation files, and plink files.	if `GenerateNull:true`
Nthreads	int		Strongly recommended to leave this blank! If left blank, the pipeline auto-detects available resources and scales steps automatically on the back end. Specifying a value tells the program to use at most Nthreads threads for parallelization and concurrency.	all
SparseKin	boolean	TRUE	If set to true, takes advantage of the sparsity of the GRM, otherwise will not use the sparsity to make assessments	if `GenerateNull:true`
Markers	int	30	The number of random markers selected from the LD-pruned plink file to estimate the variance ratio component in the null model. Warning: run time increases linearly with this number.	if `GenerateNull:true`
Rel	float	0.0625	A float between 0.0-1.0. This is the threshold in kinship estimate to consider someone related. Anything below this value will be treated as an unrelated individual in the pairwise comparison and calculation for the sparse GRM.	if `GenerateGRM:true`
Loco	boolean	TRUE	Leave-One-Chromosome-Out method. Warning: setting this to true increases the time complexity of the algorithm.	if `GenerateNull:true` and `GenerateAssociations:true`
CovTransform	boolean	TRUE	Recommended to set to true. It is a QR decomposition that aids in the convergence of the null model.	if `GenerateNull:true`
VcfField	option: DS or GT	DS	Determines which metric the association is based on. DS = dosages and GT = genotypes. If you have genotypes only, i.e. chip data without dosage calculations, DS cannot be used!	if `GenerateAssociations:true`
MAF	float	0.05	Float between 0.0-0.50 that specifies the cutoff to be considered a common snp or a rare snp. For example, keeping this to the default of 0.05 will assume common snps are defined as those with a minor allele frequency >5% and that rare snps are defined as those with a minor allele frequency ≤ 5%. THIS IS NOT A FILTER!	if `GenerateResults:true`
MAC	int	10	A filter applied to the cleaned association results to remove snps that have low minor allele counts. Default recommendation is to set this to 10.	if `GenerateResults:true`
IsDropMissingDosages	boolean	FALSE		if `GenerateAssociations:true`
InfoFile	string		Path to the info file. This file contains SNP information pertaining to chromosome, position, genotype/imputation status, and R2 and ER2 values. For formatting of this file please refer to <—————–→	if `GenerateResults:true`
SaveChunks	boolean	TRUE	Specifies whether to save the chunked files and the queue list for future use.	if `GenerateAssociations:true` and `skipChunking:false`
GrmMAF	float	0.01	The minor allele frequency threshold for a SNP to be included in the GRM calculation based on the LD-pruned plink file. For example, if set to 0.01, any SNP with a MAF > 0.01 will be used to calculate relatedness in the GRM.	if `GenerateGRM:true`
ChunkVariants	int	1000000	The window of base pairs to chunk imputation files. It is recommended to keep this at the default of 1000000.	if `GenerateNull:true` and `SkipChunking:false`
SaveAsTar	boolean	FALSE		all
ImputationFileList	string		Ends in _chunkedImputationQueue.txt	if `GenerateAssociations:true` and `skipChunking:true`
SparseGRM	string		Ends in .sparseGRM.mtx	if `GenerateGRM:false` and `GenerateNull:true`
SampleIDFile	string		Ends in sparseGRM.mtx.sampleIDs.txt	if `GenerateGRM:false` and `GenerateNull:true`
NullModelFile	string		Ends in .rda	if `GenerateNull:false` and `GenerateAssociations:true`
VarianceRatioFile	string		Ends in .varianceRatio.txt	if `GenerateNull:false` and `GenerateAssociations:true`
AssociationFile	string		Ends in _SNPassociationAnalysis.txt	If `GenerateAssociations:false` and `GenerateResults:true`
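The following Python sketch shows how the Chromosomes, Build, ImputeSuffix, and ChunkVariants descriptions above might fit together. It is an assumption-laden illustration, not the pipeline's own code: the function names are hypothetical, the suffix `.dose.vcf.gz` in the usage note is a made-up example, and the way the pipeline actually lays out its chunk windows is not documented here.

```python
def parse_chromosomes(spec):
    """Expand a Chromosomes range such as "1-22" or "2-2" into a list."""
    start, end = (int(part) for part in spec.split("-"))
    if start > end:
        raise ValueError(f"invalid chromosome range: {spec}")
    return list(range(start, end + 1))


def imputation_filename(chrom, build, suffix):
    """Build the expected file name: "chr" + number for hg38, number for hg19."""
    prefix = f"chr{chrom}" if build == "hg38" else str(chrom)
    return prefix + suffix


def chunk_windows(chrom_length, window=1_000_000):
    """Split 1..chrom_length into base-pair windows of at most `window` bp.

    Assumed layout: 1-based, inclusive, non-overlapping windows.
    """
    return [
        (start, min(start + window - 1, chrom_length))
        for start in range(1, chrom_length + 1, window)
    ]
```

For example, `parse_chromosomes("2-2")` yields a single chromosome, `imputation_filename(2, "hg38", ".dose.vcf.gz")` gives `chr2.dose.vcf.gz`, and `chunk_windows` applied to a length read from the ChromosomeLengthFile returns the windows that would be queued for that chromosome under these assumptions.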
