Title: | Detect Runs of Homozygosity and Runs of Heterozygosity in Diploid Genomes |
---|---|
Description: | Detection of runs of homozygosity and of heterozygosity in diploid genomes using two methods: sliding windows (Purcell et al (2007) <doi:10.1086/519795>) and consecutive runs (Marras et al (2015) <doi:10.1111/age.12259>). |
Authors: | Filippo Biscarini [aut, cre], Paolo Cozzi [aut], Giustino Gaspa [aut], Gabriele Marras [aut] |
Maintainer: | Filippo Biscarini <[email protected]> |
License: | GPL-3 |
Version: | 0.9.6.9000 |
Built: | 2025-03-03 05:19:32 UTC |
Source: | https://github.com/bioinformatics-ptp/detectruns |
This is the main detectRUNS function to scan the genome for runs (of homozygosity or heterozygosity) using the consecutive method (Marras et al. 2015, Animal Genetics 46(2):110-121). All parameters to detect runs (e.g. minimum n. of SNP, max n. of missing genotypes, max n. of opposite genotypes etc.) are specified here. Input data are in the ped/map Plink format (https://www.cog-genomics.org/plink/1.9/input#ped)
consecutiveRUNS.run( genotypeFile, mapFile, ROHet = FALSE, maxOppRun = 0, maxMissRun = 0, minSNP = 15, minLengthBps = 1000, maxGap = 10^6 )
consecutiveRUNS.run( genotypeFile, mapFile, ROHet = FALSE, maxOppRun = 0, maxMissRun = 0, minSNP = 15, minLengthBps = 1000, maxGap = 10^6 )
genotypeFile |
genotype (.ped) file path |
mapFile |
map file (.map) file path |
ROHet |
should we look for ROHet or ROHom? (default = FALSE) |
maxOppRun |
max n. of opposite genotype SNPs in the run (default = 0) |
maxMissRun |
max n. of missing SNPs in the run (default = 0) |
minSNP |
minimum n. of SNP in a RUN (default = 15) |
minLengthBps |
minimum length of run in bps (defaults to 1000 bps = 1 kbps) |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run (defaults to 10^6) |
This function scans the genome (diploid) for runs using the consecutive method. This is a wrapper function for many component functions that handle the input data (ped/map files), performs internal conversions, accepts parameters specifications, selects the statistical method to detect runs (sliding windows, consecutive loci) and whether runs of homozygosity (RoHom) or of heterozygosity (RoHet) are looked for.
In the ped file, the groups samples belong to can be specified (first column). This is important if comparisons between human ethnic groups or between animal breeds or plant varieties or biological populations are to be performed. Also, if cases and controls are to be compared, this is the place where this information needs to be specified.
This function returns a data frame with all runs detected in the dataset. This data frame can then be written out to a csv file. The data frame is, in turn, the input for other functions of the detectRUNS package that create plots and produce statistics of the results (see plot and statistic functions in this manual, and/or refer to the vignette of detectRUNS).
A dataframe with RUNs of Homozygosity or Heterozygosity in the analysed dataset. The returned dataframe contains the following seven columns: "group", "id", "chrom", "nSNP", "from", "to", "lengthBps" (group: population, breed, case/control etc.; id: individual identifier; chrom: chromosome on which the run is located; nSNP: number of SNPs in the run; from: starting position of the run, in bps; to: end position of the run, in bps; lengthBps: size of the run)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs with consecutive run approach ## Not run: # skipping runs calculation runs <- consecutiveRUNS.run(genotypeFile, mapFile, minSNP = 15, ROHet = FALSE, maxOppRun = 0, maxMissRun = 0, maxGap=10^6, minLengthBps = 100000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.consecutive.csv", package="detectRUNS") colClasses <- c(rep("character", 3), rep("numeric", 4) ) runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE, colClasses = colClasses)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs with consecutive run approach ## Not run: # skipping runs calculation runs <- consecutiveRUNS.run(genotypeFile, mapFile, minSNP = 15, ROHet = FALSE, maxOppRun = 0, maxMissRun = 0, maxGap=10^6, minLengthBps = 100000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.consecutive.csv", package="detectRUNS") colClasses <- c(rep("character", 3), rep("numeric", 4) ) runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE, colClasses = colClasses)
This is a core function. It implements the consecutive method for detection of runs in diploid genomes (see Marras et al. 2015)
consecutiveRunsCpp( indGeno, individual, mapFile, ROHet = TRUE, minSNP = 3L, maxOppositeGenotype = 1L, maxMiss = 1L, minLengthBps = 1000L, maxGap = 1000000L )
consecutiveRunsCpp( indGeno, individual, mapFile, ROHet = TRUE, minSNP = 3L, maxOppositeGenotype = 1L, maxMiss = 1L, minLengthBps = 1000L, maxGap = 1000000L )
indGeno |
vector of 0/1/NAs of individual genotypes (0: homozygote; 1: heterozygote) |
individual |
list of group (breed, population, case/control etc.) and ID of individual sample |
mapFile |
Plink map file (for SNP position) |
ROHet |
shall we detect ROHet or ROHom? |
minSNP |
minimum number of SNP in a run |
maxOppositeGenotype |
max n. of homozygous/heterozygous SNP |
maxMiss |
max. n. of missing SNP |
minLengthBps |
min length of a run in bps |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run |
The consecutive method detect runs by consecutively scanning SNP loci along the genome. No sliding windows are used. Checks on minimum n. of SNP, max n. of opposite and missing genotypes, max gap between adjacent loci and minimum length of the run are implemented (as in the sliding window method). Both runs of homozygosity (RoHom) and of heterozygosity (RoHet) can be search for (option ROHet: TRUE/FALSE)
A data frame of runs per individual sample
Function to create a dataframe of RUNS per individual animal Requires a map file (other filename to read or R object) Parameters on maximum number of missing and opposite genotypes in the run (not the window) are implemented here
createRUNdf( snpRun, mapFile, minSNP = 3, minLengthBps = 1000, minDensity = 1/10, oppositeAndMissingSNP, maxOppRun = NULL, maxMissRun = NULL )
createRUNdf( snpRun, mapFile, minSNP = 3, minLengthBps = 1000, minDensity = 1/10, oppositeAndMissingSNP, maxOppRun = NULL, maxMissRun = NULL )
snpRun |
vector of TRUE/FALSE (is the SNP in a RUN?) |
mapFile |
Plink-like map file (data.frame) |
minSNP |
minimum n. of SNP to call a RUN |
minLengthBps |
minimum length of run in bps (defaults to 1000 bps = 1 kbps) |
minDensity |
minimum n. of SNP per kbps (defaults to 0.1 = 1 SNP every 10 kbps) |
oppositeAndMissingSNP |
indexed array of missing and opposite genotypes (SNP order in the genome is the index) |
maxOppRun |
max n. of opposite genotype SNPs in the run (not in the window!) |
maxMissRun |
max n. of missing SNPs in the run (not in the window!) |
a data.frame with RUNS per animal
This is an helper function, this will be called by another function
findOppositeAndMissing(data, ROHet = TRUE)
findOppositeAndMissing(data, ROHet = TRUE)
data |
vector of 0/1/2 genotypes |
ROHet |
TRUE in ROHet evaluation, FALSE for ROHom |
character array; names will be index in which opposite and missing snps are found in data array
This function calculates the individual inbreeding coefficients based on runs of homozygosity (ROH), either per-chromosome (chromosome-wide) or based on the entire genome (genome-wide). See details of calculations below
Froh_inbreeding(runs, mapFile, genome_wide = TRUE)
Froh_inbreeding(runs, mapFile, genome_wide = TRUE)
runs |
R object (dataframe) with results on runs |
mapFile |
Plink map file (to retrieve SNP position) |
genome_wide |
vector of TRUE/FALSE (genome-wide or chromosome-wide; defaults to TRUE/genome-wide) |
Froh is calculated as:
Depending on whether genome-wide or chromosome-wide calculations are required, the terms in the numerator and denominator will refer to the entire genome or will be restricted to specific chromosomes.
A data frame with the inbreeding coefficients of each individual sample
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') Froh_inbreeding(runs = runsDF, mapFile = mapFile) Froh_inbreeding(runs = runsDF, mapFile = mapFile, genome_wide=FALSE)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') Froh_inbreeding(runs = runsDF, mapFile = mapFile) Froh_inbreeding(runs = runsDF, mapFile = mapFile, genome_wide=FALSE)
This function calculates the individual inbreeding coefficients based on runs of
homozygosity (ROH) using only ROH of specific size classes.
The parameter class
specify the size interval to split up calculations.
For example, if class = 2
Froh based on ROH 0-2, 2-4, 4-8, 80-16, >16 Mbps long
will be calculated.
Froh_inbreedingClass(runs, mapFile, Class = 2)
Froh_inbreedingClass(runs, mapFile, Class = 2)
runs |
R object (dataframe) with ROH results |
mapFile |
Plink map file (for SNP position) |
Class |
base ROH-length interval (in Mbps) (default: 0-2, 2-4, 4-8, 8-16, >16) |
A data frame with individual inbreeding coefficients based on ROH-length of specific size. The sum of ROH-length of specific size in each individual is reported alongside
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') Froh_inbreedingClass(runs = runsDF, mapFile = mapFile, Class = 2)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') Froh_inbreedingClass(runs = runsDF, mapFile = mapFile, Class = 2)
This is a utility function, that convert 0/1/2 genotypes (AA/AB/BB) into 0/1 (either homozygous/heterozygous)
genoConvertCpp(genotype)
genoConvertCpp(genotype)
genotype |
vector of 0/1/2 genotypes |
converted vector of genotypes (0/1)
This is a core function within the sliding-window workflow. Parameters on how to consider a window heterozygous are here (maxHom, maxMiss)
heteroZygotTest(x, gaps, maxHom, maxMiss, maxGap, i, windowSize)
heteroZygotTest(x, gaps, maxHom, maxMiss, maxGap, i, windowSize)
x |
vector of 0/1 genotypes (from genoConvert()) |
gaps |
vector of differences between consecutive positions (gaps) in bps |
maxHom |
max n. of homozygous SNP in a heterozygous window |
maxMiss |
max n. of missing in a window |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run |
i |
index along the genome (genome-vector for each individual) |
windowSize |
size of window (n. of SNP) |
a list: i) TRUE/FALSE (whether a window is heterozygous or NOT); ii) indexes of "opposite and missing" genotype
This is a core function. Parameters on how to consider a window heterozygous are here (maxHom, maxMiss)
heteroZygotTestCpp(x, gaps, maxHom, maxMiss, maxGap)
heteroZygotTestCpp(x, gaps, maxHom, maxMiss, maxGap)
x |
vector of 0/1 genotypes (from genoConvert()) |
gaps |
vector of differences between consecutive positions (gaps) in bps |
maxHom |
max n. of homozygous SNP in a heterozygous window |
maxMiss |
max n. of missing in a window |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run |
TRUE/FALSE (whether a window is heterozygous or NOT)
This is a core function. Parameters on how to consider a window homozygous are here (maxHet, maxMiss)
homoZygotTest(x, gaps, maxHet, maxMiss, maxGap, i, windowSize)
homoZygotTest(x, gaps, maxHet, maxMiss, maxGap, i, windowSize)
x |
vector of 0/1 genotypes (from genoConvert()) |
gaps |
vector of differences between consecutive positions (gaps) in bps |
maxHet |
max n. of heterozygous SNP in a homozygous window |
maxMiss |
max n. of missing in a window |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run |
i |
index along the genome (genome-vector for each individual) |
windowSize |
size of window (n. of SNP) |
a list: i) TRUE/FALSE (whether a window is heterozygous or NOT); ii) indexes of "opposite and missing" genotype
This is a core function. Parameters on how to consider a window homozygous are here (maxHet, maxMiss)
homoZygotTestCpp(x, gaps, maxHet, maxMiss, maxGap)
homoZygotTestCpp(x, gaps, maxHet, maxMiss, maxGap)
x |
vector of 0/1 genotypes (from genoConvert()) |
gaps |
vector of differences between consecutive positions (gaps) in bps |
maxHet |
max n. of heterozygous SNP in a homozygous window |
maxMiss |
max n. of missing in a window |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run |
TRUE/FALSE (whether a window is homozygous or NOT)
This is a utility function, that convert ped genotypes (AA/AB/BB) into 0/1 (either homozygous/heterozygous)
pedConvertCpp(genotype)
pedConvertCpp(genotype)
genotype |
vector of pair of genotypes (01, AA, AG) |
converted vector of genotypes (0/1)
This function the distribution of runs per group. The average run length per size-class,
the average run length per chromosome (and group), the percent distribution of runs
per size-class and group, and the proportion of runs per chromosome are plotted.
With style="All"
all three plots are produced.
plot_DistributionRuns( runs, mapFile, groupSplit = TRUE, style = c("MeanClass", "MeanChr", "RunsPCT", "RunsPCT_Chr", "All"), savePlots = FALSE, outputName = NULL, plotTitle = NULL, Class = 2 )
plot_DistributionRuns( runs, mapFile, groupSplit = TRUE, style = c("MeanClass", "MeanChr", "RunsPCT", "RunsPCT_Chr", "All"), savePlots = FALSE, outputName = NULL, plotTitle = NULL, Class = 2 )
runs |
R object (dataframe) with results on detected runs |
mapFile |
Plink map file (for SNP position) |
groupSplit |
plots split by group (defaults to TRUE) |
style |
type of plot: MeanClass, MeanChr, RunsPCT, RunsPCT_Chr, All (all plots) |
savePlots |
should plots be saved out to files or plotted in the graphical terminal (default)? |
outputName |
title prefix (the base name of graph, if savePlots is TRUE)#' |
plotTitle |
title in plot (default NULL) |
Class |
group of length (in Mbps) by class (default: 0-2, 2-4, 4-8, 8-16, >16) |
plot Distribution Runs
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')
The function plots the distribution of inbreeding/consanguinity coefficients
per chromosome and/or group. Three types of plots can be produces: barplots, boxplots,
violin plots. With style="All"
all three plots are produced.
plot_InbreedingChr( runs, mapFile, groupSplit = TRUE, style = c("ChrBarPlot", "ChrBoxPlot", "FrohBoxPlot", "All"), outputName = NULL, plotTitle = NULL, savePlots = FALSE )
plot_InbreedingChr( runs, mapFile, groupSplit = TRUE, style = c("ChrBarPlot", "ChrBoxPlot", "FrohBoxPlot", "All"), outputName = NULL, plotTitle = NULL, savePlots = FALSE )
runs |
R object (dataframe) with results on detected runs |
mapFile |
Plink map file (for SNP position) |
groupSplit |
plots split by group (defaults to TRUE) |
style |
type of plot: ChrBarPlot, ChrBoxPlot, FrohBoxPlot, All (all plots) |
outputName |
title prefix (the base name of graph, if savePlots is TRUE) |
plotTitle |
title in plot (default NULL) |
savePlots |
should plots be saved out to files or plotted in the graphical terminal (default)? |
plots of the distribution of inbreeding by chromosome and group
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')
Function to plot the proportion of times/percentage each SNP in inside a run (population-specific signals) against SNP position in all chromosomes together Proportions on the y-axis, bps on the x-axis for all analysed chromosomes This is similar to the familiar GWAS Manhattan plot
plot_manhattanRuns( runs, genotypeFile, mapFile, pct_threshold = 0.33, x_font_size = 10, savePlots = FALSE, file_type = "pdf", outputName = NULL, plotTitle = NULL, plot_w = 8, plot_h = 6 )
plot_manhattanRuns( runs, genotypeFile, mapFile, pct_threshold = 0.33, x_font_size = 10, savePlots = FALSE, file_type = "pdf", outputName = NULL, plotTitle = NULL, plot_w = 8, plot_h = 6 )
runs |
a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length) |
genotypeFile |
genotype (.ped) file path |
mapFile |
map file (.map) file path |
pct_threshold |
reference line for significant regions (e.g. 0.5 –> 50% SNPs in runs; default is 0.33) |
x_font_size |
font size for x axis values (chromosome numbers: default = 10) |
savePlots |
should plots be saved out in files (default) or plotted in the graphical terminal? |
file_type |
type of plot file to ba saved (if savePlots is TRUE; default is pdf) |
outputName |
title prefix (the base name of graph, if savePlots is TRUE) |
plotTitle |
title in plot (default) |
plot_w |
plot width (if savePlots = TRUE; default is 8) |
plot_h |
plot height (if savePlots = TRUE; default is 6) |
Manhattan plots of proportion of times SNPs are inside runs, per population (pdf files)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_manhattanRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, savePlots = FALSE, plotTitle = "ROHom")
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_manhattanRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, savePlots = FALSE, plotTitle = "ROHom")
Function to plot the sum of run lengths (or the average run length) per individual against the average number of runs per individual. Points can be differentially coloured by group/population. This plot can be useful to identify patterns in the distribution of runs in different groups (e.g. few long runs vs many short runs)
plot_PatternRuns( runs, mapFile, method = c("sum", "mean"), outputName = NULL, savePlots = FALSE, plotTitle = NULL )
plot_PatternRuns( runs, mapFile, method = c("sum", "mean"), outputName = NULL, savePlots = FALSE, plotTitle = NULL )
runs |
a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length) |
mapFile |
map file (.map) file path |
method |
"sum" or "mean" of run lengths per individual sample |
outputName |
title prefix (the base name of graph, if savePlots is TRUE)#' |
savePlots |
should plots be saved out to files or plotted in the graphical terminal (default)? |
plotTitle |
title in plot (default NULL) |
plot of number of runs vs run-length sum/mean per individual sample
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'sum') plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'mean')
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'sum') plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'mean')
Function to plot runs per individual (see Williams et al. 2016, Animal Genetics, for an example with animal data) Individual IDs on the y-axis, bps on the x-axis (position along the chromosome)
plot_Runs( runs, suppressInds = FALSE, savePlots = FALSE, separatePlots = FALSE, outputName = NULL )
plot_Runs( runs, suppressInds = FALSE, savePlots = FALSE, separatePlots = FALSE, outputName = NULL )
runs |
a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length) |
suppressInds |
shall we suppress individual IDs on the y-axis? (defaults to FALSE) |
savePlots |
should plots be saved out to files (one pdf file for all chromosomes) or plotted in the graphical terminal (default)? |
separatePlots |
should plots for each chromosome be saved out to separate files? |
outputName |
title prefix (the base name of graph, if savePlots is TRUE) |
plot of runs by chromosome
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_Runs(runs = runsDF, suppressInds = FALSE, savePlots = FALSE, outputName = "ROHom")
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_Runs(runs = runsDF, suppressInds = FALSE, savePlots = FALSE, outputName = "ROHom")
Function to plot the number of times/percentage each SNP is inside a run (population-specific signals) against the SNP positions in the genome. Proportions on the y-axis, bps on the x-axis
plot_SnpsInRuns( runs, genotypeFile, mapFile, savePlots = FALSE, separatePlots = FALSE, outputName = NULL )
plot_SnpsInRuns( runs, genotypeFile, mapFile, savePlots = FALSE, separatePlots = FALSE, outputName = NULL )
runs |
a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length) |
genotypeFile |
genotype (.ped) file path |
mapFile |
map file (.map) file path |
savePlots |
should plots be saved out in files (default) or plotted in the graphical terminal? |
separatePlots |
should plots for each chromosome be saved out to separate files? |
outputName |
title prefix (the base name of graph, if savePlots is TRUE) |
plot number of times a SNP is in a run by chromosome and population (pdf files)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity # skipping runs calculation ## Not run: runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_SnpsInRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, savePlots = FALSE, outputName = "ROHom")
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity # skipping runs calculation ## Not run: runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_SnpsInRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, savePlots = FALSE, outputName = "ROHom")
Function to plot stacked runs along the chromosome (signaling presence of large numbers of runs in specific regions of a chromosome) Counts on the y-axis, bps on the x-axis (position along the chromosome)
plot_StackedRuns( runs, savePlots = FALSE, separatePlots = FALSE, outputName = NULL )
plot_StackedRuns( runs, savePlots = FALSE, separatePlots = FALSE, outputName = NULL )
runs |
a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length) |
savePlots |
should plots be saved out in files (default) or plotted in the graphical terminal? |
separatePlots |
should plots for chromosomes be saved out to separate files? |
outputName |
title prefix (the base name of graph, if savePlots is TRUE) |
plot of stacked runs by population and by chromosome (pdf files)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_StackedRuns(runs = runsDF, savePlots = FALSE, outputName = "ROHom")
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') # plot runs per animal (interactive) plot_StackedRuns(runs = runsDF, savePlots = FALSE, outputName = "ROHom")
Function to produce violin plots of the distribution of runs lengths per group The sum of run lengths, or its average, per individual sample is used to characterize the distribution of runs
plot_ViolinRuns( runs, method = c("sum", "mean"), outputName = NULL, plotTitle = NULL, savePlots = FALSE )
plot_ViolinRuns( runs, method = c("sum", "mean"), outputName = NULL, plotTitle = NULL, savePlots = FALSE )
runs |
a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length) |
method |
"sum" or "mean" of run lengths per individual samples |
outputName |
title prefix (the base name of graph, if savePlots is TRUE) |
plotTitle |
title in plot (default NULL) |
savePlots |
should plots be saved out to files or plotted in the graphical terminal (default)? |
Violin plot of the distribution of runs-lengths (sum or mean)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_ViolinRuns(runs = runsDF, method = "sum" , savePlots = FALSE) plot_ViolinRuns(runs = runsDF, method = "mean" , savePlots = FALSE)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') plot_ViolinRuns(runs = runsDF, method = "sum" , savePlots = FALSE) plot_ViolinRuns(runs = runsDF, method = "mean" , savePlots = FALSE)
Function to read in, from external files, the output of software for ROH:
detectRUNS
: output saved out to a file (e.g. write.table)
Plink
: output from the --homozyg
option (.hom
files)
BCFtools
: output from the roh
option
readExternalRuns( inputFile = NULL, program = c("plink", "BCFtools", "detectRUNS") )
readExternalRuns( inputFile = NULL, program = c("plink", "BCFtools", "detectRUNS") )
inputFile |
name of (path to) external file |
program |
source program that produced the ROH file (one of |
dataframe in the correct format to be used with plots and statistics functions from detectRUNS
# getting map and ped paths ## Not run: genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxMissRun = 1, maxMissWindow = 1, minLengthBps = 100000, minDensity = 1/10000) write.table(x= runs,file = 'Kijas2016_Sheep_subset.sliding.csv', quote=F, row.names = F) ## End(Not run) runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package = "detectRUNS") newData=readExternalRuns(runsFile, program = 'detectRUNS')
# getting map and ped paths ## Not run: genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxMissRun = 1, maxMissWindow = 1, minLengthBps = 100000, minDensity = 1/10000) write.table(x= runs,file = 'Kijas2016_Sheep_subset.sliding.csv', quote=F, row.names = F) ## End(Not run) runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package = "detectRUNS") newData=readExternalRuns(runsFile, program = 'detectRUNS')
This is a core function. Read PED file and returns a data.frame with the first two columns
readPOPCpp(genotypeFile)
readPOPCpp(genotypeFile)
genotypeFile |
genotype (.ped) file location |
a dataframe of POP, ID
The data frame will be reordered according to chromosome: from 1 to n, then X, Y, XY, MT The data frame needs to have a column with name "CHROMOSOME"
reorderDF(dfx)
reorderDF(dfx)
dfx |
data frame to be reordered (with column "CHROMOSOME") |
Reorder results based on chromosome
A reordered data frame by chromosome
This is one of the main function of detectRUNS and is used to detect runs (of homozygosity or heterozygosity) in the genome (diploid) with the sliding-window method. All parameters to detect runs (e.g. minimum n. of SNP, max n. of missing genotypes, max n. of opposite genotypes etc.) are specified here. Input data are in the ped/map Plink format (https://www.cog-genomics.org/plink/1.9/input#ped)
slidingRUNS.run( genotypeFile, mapFile, windowSize = 15, threshold = 0.05, minSNP = 3, ROHet = FALSE, maxOppWindow = 1, maxMissWindow = 1, maxGap = 10^6, minLengthBps = 1000, minDensity = 1/1000, maxOppRun = NULL, maxMissRun = NULL )
slidingRUNS.run( genotypeFile, mapFile, windowSize = 15, threshold = 0.05, minSNP = 3, ROHet = FALSE, maxOppWindow = 1, maxMissWindow = 1, maxGap = 10^6, minLengthBps = 1000, minDensity = 1/1000, maxOppRun = NULL, maxMissRun = NULL )
genotypeFile |
genotype (.ped) file path |
mapFile |
map file (.map) file path |
windowSize |
the size of sliding window (number of SNP loci) (default = 15) |
threshold |
the threshold of overlapping windows of the same state (homozygous/heterozygous) to call a SNP in a RUN (default = 0.05) |
minSNP |
minimum n. of SNP in a RUN (default = 3) |
ROHet |
should we look for ROHet or ROHom? (default = FALSE) |
maxOppWindow |
max n. of homozygous/heterozygous SNP in the sliding window (default = 1) |
maxMissWindow |
max. n. of missing SNP in the sliding window (default = 1) |
maxGap |
max distance between consecutive SNP to be still considered a potential run (default = 10^6 bps) |
minLengthBps |
minimum length of run in bps (defaults to 1000 bps = 1 kbps) |
minDensity |
minimum n. of SNP per kbps (defaults to 0.1 = 1 SNP every 10 kbps) |
maxOppRun |
max n. of opposite genotype SNPs in the run (optional) |
maxMissRun |
max n. of missing SNPs in the run (optional) |
This function scans the genome (diploid) for runs using the sliding-window method. This is a wrapper function for many component functions that handle the input data (ped/map files), perform internal conversions, accept parameters specifications, select whether runs of homozygosity (RoHom) or of heterozygosity (RoHet) are looked for.
In the ped file, the groups samples belong to can be specified (first column). This is important if comparisons between human ethnic groups or between animal breeds or plant varieties or biological populations are to be performed. Also, if cases and controls are to be compared, this is the place where this information needs to be specified.
This function returns a data frame with all runs detected in the dataset. This data frame can then be written out to a csv file. The data frame is, in turn, the input for other functions of the detectRUNS package that create plots and produce statistics from the results (see plots and statistics functions in this manual, and/or refer to the detectRUNS vignette).
A dataframe with RUNs of Homozygosity or Heterozygosity in the analysed dataset. The returned dataframe contains the following seven columns: "group", "id", "chrom", "nSNP", "from", "to", "lengthBps" (group: population, breed, case/control etc.; id: individual identifier; chrom: chromosome on which the run is located; nSNP: number of SNPs in the run; from: starting position of the run, in bps; to: end position of the run, in bps; lengthBps: size of the run)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs with sliding window approach ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppWindow = 1, maxMissWindow = 1, maxGap=10^6, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") colClasses <- c(rep("character", 3), rep("numeric", 4) ) runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE, colClasses = colClasses)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs with sliding window approach ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppWindow = 1, maxMissWindow = 1, maxGap=10^6, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") colClasses <- c(rep("character", 3), rep("numeric", 4) ) runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE, colClasses = colClasses)
This is a core function. The functions to detect RUNS are slid over the genome
slidingWindow( data, gaps, windowSize, step, maxGap, ROHet = TRUE, maxOppositeGenotype = 1, maxMiss = 1 )
slidingWindow( data, gaps, windowSize, step, maxGap, ROHet = TRUE, maxOppositeGenotype = 1, maxMiss = 1 )
data |
vector of 0/1/2 genotypes |
gaps |
vector of differences between consecutive positions (gaps) in bps |
windowSize |
size of window (n. of SNP) |
step |
by which (how many SNP) is the window slid |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run |
ROHet |
shall we detect ROHet or ROHom? |
maxOppositeGenotype |
max n. of homozygous/heterozygous SNP |
maxMiss |
max. n. of missing SNP |
vector of TRUE/FALSE (whether a window is homozygous or NOT)
This is a core function. The functions to detect RUNS are slid over the genome
slidingWindowCpp( data, gaps, windowSize, step, maxGap, ROHet = TRUE, maxOppositeGenotype = 1L, maxMiss = 1L )
slidingWindowCpp( data, gaps, windowSize, step, maxGap, ROHet = TRUE, maxOppositeGenotype = 1L, maxMiss = 1L )
data |
vector of 0/1/2 genotypes |
gaps |
vector of differences between consecutive positions (gaps) in bps |
windowSize |
size of window (n. of SNP) |
step |
by which (how many SNP) is the window slid |
maxGap |
max distance between consecutive SNP in a window to be still considered a potential run |
ROHet |
shall we detect ROHet or ROHom? |
maxOppositeGenotype |
max n. of homozygous/heterozygous SNP |
maxMiss |
max. n. of missing SNP |
vector of TRUE/FALSE (whether a window is homozygous or NOT)
This is a core function. The function to determine whether a SNP is or not in a RUN. The ratio between homozygous/heterozygous windows and total n. of windows is computed here
snpInRun(RunVector, windowSize, threshold)
snpInRun(RunVector, windowSize, threshold)
RunVector |
vector of TRUE/FALSE (is a window homozygous/heterozygous?) |
windowSize |
size of window (n. of SNP) |
threshold |
threshold to call a SNP in a RUN |
vector of TRUE/FALSE (whether a SNP is in a RUN or NOT)
This is a core function. The function to determine whether a SNP is or not in a RUN. The ratio between homozygous/heterozygous windows and total n. of windows is computed here
snpInRunCpp(RunVector, windowSize, threshold)
snpInRunCpp(RunVector, windowSize, threshold)
RunVector |
vector of TRUE/FALSE (is a window homozygous/heterozygous?) |
windowSize |
size of window (n. of SNP) |
threshold |
threshold to call a SNP in a RUN |
vector of TRUE/FALSE (whether a SNP is in a RUN or NOT)
Function to count number of times a SNP is in a RUN
snpInsideRunsCpp(runsChrom, mapChrom, genotypeFile)
snpInsideRunsCpp(runsChrom, mapChrom, genotypeFile)
runsChrom |
R object (dataframe) with results per chromosome |
mapChrom |
R map object with SNP per chromosome |
genotypeFile |
genotype (.ped) file location |
dataframe with counts per SNP in runs (per population)
This function processes the results from slidingRUNS.run
and
consecutiveRUNS.run
and produces a number of interesting descriptives
statistics on results.
summaryRuns(runs, mapFile, genotypeFile, Class = 2, snpInRuns = FALSE)
summaryRuns(runs, mapFile, genotypeFile, Class = 2, snpInRuns = FALSE)
runs |
R object (dataframe) with results on detected runs |
mapFile |
Plink map file (for SNP position) |
genotypeFile |
Plink ped file (for SNP position) |
Class |
group of length (in Mbps) by class (default: 0-2, 2-4, 4-8, 8-16, >16) |
snpInRuns |
TRUE/FALSE (default): should the function |
summaryRuns
calculates: i) the number of runs per chromosome and group/population;
ii) the percent distribution of runs per chromosome and group; iii) the number of
runs per size-class and group; iv) the percent distribution of runs per size-class
and group; v) the mean length of runs per chromosome and group; vi) the mean
length of runs per size-class and group; vii) individual inbreeding coefficient
estimated from ROH; viii) individual inbreeding coefficient estimated from ROH
per chromosome; ix) individual inbreeding coefficient estimated from ROH per
size-class
A list of dataframes containing the most relevant descriptives statistics on detected runs. The list conveniently contains 9 dataframes that can be used for further processing and visualization, or can be written out to text files
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') summaryRuns(runs = runsDF, mapFile = mapFile, genotypeFile = genotypeFile, Class = 2, snpInRuns = FALSE)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS') summaryRuns(runs = runsDF, mapFile = mapFile, genotypeFile = genotypeFile, Class = 2, snpInRuns = FALSE)
This function takes in input either the run results or the output from
the function snpInsideRuns
(proportion of times a SNP is inside a run)
in the population/group, and returns a subset of the runs most commonly
found in the group/population. The parameter threshold
controls the definition
of most common (e.g. in at least 50%, 70% etc. of the sampled individuals)
tableRuns( runs = NULL, SnpInRuns = NULL, genotypeFile, mapFile, threshold = 0.5 )
tableRuns( runs = NULL, SnpInRuns = NULL, genotypeFile, mapFile, threshold = 0.5 )
runs |
R object (dataframe) with results on detected runs |
SnpInRuns |
dataframe with the proportion of times each SNP falls inside a
run in the population (output from |
genotypeFile |
Plink ped file (for SNP position) |
mapFile |
Plink map file (for SNP position) |
threshold |
value from 0 to 1 (default 0.7) that controls the desired proportion of individuals carrying that run (e.g. 70%) |
A dataframe with the most common runs detected in the sampled individuals (the group/population, start and end position of the run, chromosome and number of SNP included in the run are reported in the output dataframe)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF = readExternalRuns(inputFile = runsFile, program = 'detectRUNS') tableRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, threshold = 0.5)
# getting map and ped paths genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS") mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS") # calculating runs of Homozygosity ## Not run: # skipping runs calculation runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1, minSNP = 15, ROHet = FALSE, maxOppositeGenotype = 1, maxMiss = 1, minLengthBps = 100000, minDensity = 1/10000) ## End(Not run) # loading pre-calculated data runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS") runsDF = readExternalRuns(inputFile = runsFile, program = 'detectRUNS') tableRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, threshold = 0.5)
Function to write out RUNS per individual animal
writeRUN(ind, dRUN, ROHet = TRUE, group, outputName)
writeRUN(ind, dRUN, ROHet = TRUE, group, outputName)
ind |
ID of animals |
dRUN |
data.frame with RUNS per animal |
ROHet |
shall we detect ROHet or ROHom? |
group |
group (factor): population, breed, ethnicity, case/control etc. |
outputName |
output filename |
TRUE/FALSE if RUNS are written out or not