Package 'detectRUNS' reference manual

Title:	Detect Runs of Homozygosity and Runs of Heterozygosity in Diploid Genomes
Description:	Detection of runs of homozygosity and of heterozygosity in diploid genomes using two methods: sliding windows (Purcell et al (2007) <doi:10.1086/519795>) and consecutive runs (Marras et al (2015) <doi:10.1111/age.12259>).
Authors:	Filippo Biscarini [aut, cre], Paolo Cozzi [aut], Giustino Gaspa [aut], Gabriele Marras [aut]
Maintainer:	Filippo Biscarini <[email protected]>
License:	GPL-3
Version:	0.9.6.9000
Built:	2025-04-02 05:07:28 UTC
Source:	https://github.com/bioinformatics-ptp/detectruns

Main function to detect genomic RUNS (ROHom/ROHet) using the consecutive method

Description

This is the main detectRUNS function to scan the genome for runs (of homozygosity or heterozygosity) using the consecutive method (Marras et al. 2015, Animal Genetics 46(2):110-121). All parameters to detect runs (e.g. minimum n. of SNP, max n. of missing genotypes, max n. of opposite genotypes etc.) are specified here. Input data are in the ped/map Plink format (https://www.cog-genomics.org/plink/1.9/input#ped)

Usage

consecutiveRUNS.run(
  genotypeFile,
  mapFile,
  ROHet = FALSE,
  maxOppRun = 0,
  maxMissRun = 0,
  minSNP = 15,
  minLengthBps = 1000,
  maxGap = 10^6
)
consecutiveRUNS.run(
  genotypeFile,
  mapFile,
  ROHet = FALSE,
  maxOppRun = 0,
  maxMissRun = 0,
  minSNP = 15,
  minLengthBps = 1000,
  maxGap = 10^6
)

Arguments

`genotypeFile`	genotype (.ped) file path
`mapFile`	map file (.map) file path
`ROHet`	should we look for ROHet or ROHom? (default = FALSE)
`maxOppRun`	max n. of opposite genotype SNPs in the run (default = 0)
`maxMissRun`	max n. of missing SNPs in the run (default = 0)
`minSNP`	minimum n. of SNP in a RUN (default = 15)
`minLengthBps`	minimum length of run in bps (defaults to 1000 bps = 1 kbps)
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run (defaults to 10^6)

Details

This function scans the genome (diploid) for runs using the consecutive method. This is a wrapper function for many component functions that handle the input data (ped/map files), performs internal conversions, accepts parameters specifications, selects the statistical method to detect runs (sliding windows, consecutive loci) and whether runs of homozygosity (RoHom) or of heterozygosity (RoHet) are looked for.

In the ped file, the groups samples belong to can be specified (first column). This is important if comparisons between human ethnic groups or between animal breeds or plant varieties or biological populations are to be performed. Also, if cases and controls are to be compared, this is the place where this information needs to be specified.

This function returns a data frame with all runs detected in the dataset. This data frame can then be written out to a csv file. The data frame is, in turn, the input for other functions of the detectRUNS package that create plots and produce statistics of the results (see plot and statistic functions in this manual, and/or refer to the vignette of detectRUNS).

Value

A dataframe with RUNs of Homozygosity or Heterozygosity in the analysed dataset. The returned dataframe contains the following seven columns: "group", "id", "chrom", "nSNP", "from", "to", "lengthBps" (group: population, breed, case/control etc.; id: individual identifier; chrom: chromosome on which the run is located; nSNP: number of SNPs in the run; from: starting position of the run, in bps; to: end position of the run, in bps; lengthBps: size of the run)

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")
# calculating runs with consecutive run approach
## Not run: 
# skipping runs calculation
runs <- consecutiveRUNS.run(genotypeFile, mapFile, minSNP = 15, ROHet = FALSE,
maxOppRun = 0, maxMissRun = 0, maxGap=10^6,
minLengthBps = 100000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.consecutive.csv", package="detectRUNS")
colClasses <- c(rep("character", 3), rep("numeric", 4)  )
runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE,
colClasses = colClasses)

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")
# calculating runs with consecutive run approach
## Not run: 
# skipping runs calculation
runs <- consecutiveRUNS.run(genotypeFile, mapFile, minSNP = 15, ROHet = FALSE,
maxOppRun = 0, maxMissRun = 0, maxGap=10^6,
minLengthBps = 100000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.consecutive.csv", package="detectRUNS")
colClasses <- c(rep("character", 3), rep("numeric", 4)  )
runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE,
colClasses = colClasses)

Function to detect consecutive runs in a vector (individual's genotypes)

Description

This is a core function. It implements the consecutive method for detection of runs in diploid genomes (see Marras et al. 2015)

Usage

consecutiveRunsCpp(
  indGeno,
  individual,
  mapFile,
  ROHet = TRUE,
  minSNP = 3L,
  maxOppositeGenotype = 1L,
  maxMiss = 1L,
  minLengthBps = 1000L,
  maxGap = 1000000L
)
consecutiveRunsCpp(
  indGeno,
  individual,
  mapFile,
  ROHet = TRUE,
  minSNP = 3L,
  maxOppositeGenotype = 1L,
  maxMiss = 1L,
  minLengthBps = 1000L,
  maxGap = 1000000L
)

Arguments

`indGeno`	vector of 0/1/NAs of individual genotypes (0: homozygote; 1: heterozygote)
`individual`	list of group (breed, population, case/control etc.) and ID of individual sample
`mapFile`	Plink map file (for SNP position)
`ROHet`	shall we detect ROHet or ROHom?
`minSNP`	minimum number of SNP in a run
`maxOppositeGenotype`	max n. of homozygous/heterozygous SNP
`maxMiss`	max. n. of missing SNP
`minLengthBps`	min length of a run in bps
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run

Details

The consecutive method detect runs by consecutively scanning SNP loci along the genome. No sliding windows are used. Checks on minimum n. of SNP, max n. of opposite and missing genotypes, max gap between adjacent loci and minimum length of the run are implemented (as in the sliding window method). Both runs of homozygosity (RoHom) and of heterozygosity (RoHet) can be search for (option ROHet: TRUE/FALSE)

Value

A data frame of runs per individual sample

Function to create a dataframe of RUNS per individual animal Requires a map file (other filename to read or R object) Parameters on maximum number of missing and opposite genotypes in the run (not the window) are implemented here

Description

Function to create a dataframe of RUNS per individual animal Requires a map file (other filename to read or R object) Parameters on maximum number of missing and opposite genotypes in the run (not the window) are implemented here

Usage

createRUNdf(
  snpRun,
  mapFile,
  minSNP = 3,
  minLengthBps = 1000,
  minDensity = 1/10,
  oppositeAndMissingSNP,
  maxOppRun = NULL,
  maxMissRun = NULL
)
createRUNdf(
  snpRun,
  mapFile,
  minSNP = 3,
  minLengthBps = 1000,
  minDensity = 1/10,
  oppositeAndMissingSNP,
  maxOppRun = NULL,
  maxMissRun = NULL
)

Arguments

`snpRun`	vector of TRUE/FALSE (is the SNP in a RUN?)
`mapFile`	Plink-like map file (data.frame)
`minSNP`	minimum n. of SNP to call a RUN
`minLengthBps`	minimum length of run in bps (defaults to 1000 bps = 1 kbps)
`minDensity`	minimum n. of SNP per kbps (defaults to 0.1 = 1 SNP every 10 kbps)
`oppositeAndMissingSNP`	indexed array of missing and opposite genotypes (SNP order in the genome is the index)
`maxOppRun`	max n. of opposite genotype SNPs in the run (not in the window!)
`maxMissRun`	max n. of missing SNPs in the run (not in the window!)

Value

a data.frame with RUNS per animal

Function to calculate oppositeAndMissingGenotypes array

Description

This is an helper function, this will be called by another function

Usage

findOppositeAndMissing(data, ROHet = TRUE)
findOppositeAndMissing(data, ROHet = TRUE)

Arguments

`data`	vector of 0/1/2 genotypes
`ROHet`	TRUE in ROHet evaluation, FALSE for ROHom

Value

character array; names will be index in which opposite and missing snps are found in data array

Function to calculated Froh genome-wide or chromosome-wide

Description

This function calculates the individual inbreeding coefficients based on runs of homozygosity (ROH), either per-chromosome (chromosome-wide) or based on the entire genome (genome-wide). See details of calculations below

Usage

Froh_inbreeding(runs, mapFile, genome_wide = TRUE)
Froh_inbreeding(runs, mapFile, genome_wide = TRUE)

Arguments

`runs`	R object (dataframe) with results on runs
`mapFile`	Plink map file (to retrieve SNP position)
`genome_wide`	vector of TRUE/FALSE (genome-wide or chromosome-wide; defaults to TRUE/genome-wide)

Details

Froh is calculated as:

$F_{ROH} = \frac{\sum ROH_{length}}{Length_{genome}}$

Depending on whether genome-wide or chromosome-wide calculations are required, the terms in the numerator and denominator will refer to the entire genome or will be restricted to specific chromosomes.

Value

A data frame with the inbreeding coefficients of each individual sample

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

Froh_inbreeding(runs = runsDF, mapFile = mapFile)
Froh_inbreeding(runs = runsDF, mapFile = mapFile, genome_wide=FALSE)

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

Froh_inbreeding(runs = runsDF, mapFile = mapFile)
Froh_inbreeding(runs = runsDF, mapFile = mapFile, genome_wide=FALSE)

Function to calculated Froh using a ROH-class

Description

This function calculates the individual inbreeding coefficients based on runs of homozygosity (ROH) using only ROH of specific size classes. The parameter class specify the size interval to split up calculations. For example, if class = 2 Froh based on ROH 0-2, 2-4, 4-8, 80-16, >16 Mbps long will be calculated.

Usage

Froh_inbreedingClass(runs, mapFile, Class = 2)
Froh_inbreedingClass(runs, mapFile, Class = 2)

Arguments

`runs`	R object (dataframe) with ROH results
`mapFile`	Plink map file (for SNP position)
`Class`	base ROH-length interval (in Mbps) (default: 0-2, 2-4, 4-8, 8-16, >16)

Value

A data frame with individual inbreeding coefficients based on ROH-length of specific size. The sum of ROH-length of specific size in each individual is reported alongside

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

Froh_inbreedingClass(runs = runsDF, mapFile = mapFile, Class = 2)

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

Froh_inbreedingClass(runs = runsDF, mapFile = mapFile, Class = 2)

Convert 0/1/2 genotypes to 0/1

Description

This is a utility function, that convert 0/1/2 genotypes (AA/AB/BB) into 0/1 (either homozygous/heterozygous)

Usage

genoConvertCpp(genotype)
genoConvertCpp(genotype)

Arguments

genotype

vector of 0/1/2 genotypes

Value

converted vector of genotypes (0/1)

Function to check whether a window is (loosely) heterozygous or not

Description

This is a core function within the sliding-window workflow. Parameters on how to consider a window heterozygous are here (maxHom, maxMiss)

Usage

heteroZygotTest(x, gaps, maxHom, maxMiss, maxGap, i, windowSize)
heteroZygotTest(x, gaps, maxHom, maxMiss, maxGap, i, windowSize)

Arguments

`x`	vector of 0/1 genotypes (from genoConvert())
`gaps`	vector of differences between consecutive positions (gaps) in bps
`maxHom`	max n. of homozygous SNP in a heterozygous window
`maxMiss`	max n. of missing in a window
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run
`i`	index along the genome (genome-vector for each individual)
`windowSize`	size of window (n. of SNP)

Value

a list: i) TRUE/FALSE (whether a window is heterozygous or NOT); ii) indexes of "opposite and missing" genotype

Function to check whether a window is (loosely) heterozygous or not

Description

This is a core function. Parameters on how to consider a window heterozygous are here (maxHom, maxMiss)

Usage

heteroZygotTestCpp(x, gaps, maxHom, maxMiss, maxGap)
heteroZygotTestCpp(x, gaps, maxHom, maxMiss, maxGap)

Arguments

`x`	vector of 0/1 genotypes (from genoConvert())
`gaps`	vector of differences between consecutive positions (gaps) in bps
`maxHom`	max n. of homozygous SNP in a heterozygous window
`maxMiss`	max n. of missing in a window
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run

Value

TRUE/FALSE (whether a window is heterozygous or NOT)

Function to check whether a window is (loosely) homozygous or not

Description

This is a core function. Parameters on how to consider a window homozygous are here (maxHet, maxMiss)

Usage

homoZygotTest(x, gaps, maxHet, maxMiss, maxGap, i, windowSize)
homoZygotTest(x, gaps, maxHet, maxMiss, maxGap, i, windowSize)

Arguments

`x`	vector of 0/1 genotypes (from genoConvert())
`gaps`	vector of differences between consecutive positions (gaps) in bps
`maxHet`	max n. of heterozygous SNP in a homozygous window
`maxMiss`	max n. of missing in a window
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run
`i`	index along the genome (genome-vector for each individual)
`windowSize`	size of window (n. of SNP)

Value

a list: i) TRUE/FALSE (whether a window is heterozygous or NOT); ii) indexes of "opposite and missing" genotype

Function to check whether a window is (loosely) homozygous or not

Description

This is a core function. Parameters on how to consider a window homozygous are here (maxHet, maxMiss)

Usage

homoZygotTestCpp(x, gaps, maxHet, maxMiss, maxGap)
homoZygotTestCpp(x, gaps, maxHet, maxMiss, maxGap)

Arguments

`x`	vector of 0/1 genotypes (from genoConvert())
`gaps`	vector of differences between consecutive positions (gaps) in bps
`maxHet`	max n. of heterozygous SNP in a homozygous window
`maxMiss`	max n. of missing in a window
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run

Value

TRUE/FALSE (whether a window is homozygous or NOT)

Convert ped genotypes to 0/1

Description

This is a utility function, that convert ped genotypes (AA/AB/BB) into 0/1 (either homozygous/heterozygous)

Usage

pedConvertCpp(genotype)
pedConvertCpp(genotype)

Arguments

genotype

vector of pair of genotypes (01, AA, AG)

Value

converted vector of genotypes (0/1)

Plot Distribution of runs

Description

This function the distribution of runs per group. The average run length per size-class, the average run length per chromosome (and group), the percent distribution of runs per size-class and group, and the proportion of runs per chromosome are plotted. With style="All" all three plots are produced.

Usage

plot_DistributionRuns(
  runs,
  mapFile,
  groupSplit = TRUE,
  style = c("MeanClass", "MeanChr", "RunsPCT", "RunsPCT_Chr", "All"),
  savePlots = FALSE,
  outputName = NULL,
  plotTitle = NULL,
  Class = 2
)
plot_DistributionRuns(
  runs,
  mapFile,
  groupSplit = TRUE,
  style = c("MeanClass", "MeanChr", "RunsPCT", "RunsPCT_Chr", "All"),
  savePlots = FALSE,
  outputName = NULL,
  plotTitle = NULL,
  Class = 2
)

Arguments

`runs`	R object (dataframe) with results on detected runs
`mapFile`	Plink map file (for SNP position)
`groupSplit`	plots split by group (defaults to TRUE)
`style`	type of plot: MeanClass, MeanChr, RunsPCT, RunsPCT_Chr, All (all plots)
`savePlots`	should plots be saved out to files or plotted in the graphical terminal (default)?
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)#'
`plotTitle`	title in plot (default NULL)
`Class`	group of length (in Mbps) by class (default: 0-2, 2-4, 4-8, 8-16, >16)

Value

plot Distribution Runs

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')

Plot Froh-based inbreeding coefficients by group

Description

The function plots the distribution of inbreeding/consanguinity coefficients per chromosome and/or group. Three types of plots can be produces: barplots, boxplots, violin plots. With style="All" all three plots are produced.

Usage

plot_InbreedingChr(
  runs,
  mapFile,
  groupSplit = TRUE,
  style = c("ChrBarPlot", "ChrBoxPlot", "FrohBoxPlot", "All"),
  outputName = NULL,
  plotTitle = NULL,
  savePlots = FALSE
)
plot_InbreedingChr(
  runs,
  mapFile,
  groupSplit = TRUE,
  style = c("ChrBarPlot", "ChrBoxPlot", "FrohBoxPlot", "All"),
  outputName = NULL,
  plotTitle = NULL,
  savePlots = FALSE
)

Arguments

`runs`	R object (dataframe) with results on detected runs
`mapFile`	Plink map file (for SNP position)
`groupSplit`	plots split by group (defaults to TRUE)
`style`	type of plot: ChrBarPlot, ChrBoxPlot, FrohBoxPlot, All (all plots)
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)
`plotTitle`	title in plot (default NULL)
`savePlots`	should plots be saved out to files or plotted in the graphical terminal (default)?

Value

plots of the distribution of inbreeding by chromosome and group

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_InbreedingChr(runs = runsDF, mapFile = mapFile, style='All')

Plot the proportion of times SNPs are inside runs - MANHATTAN PLOT

Description

Function to plot the proportion of times/percentage each SNP in inside a run (population-specific signals) against SNP position in all chromosomes together Proportions on the y-axis, bps on the x-axis for all analysed chromosomes This is similar to the familiar GWAS Manhattan plot

Usage

plot_manhattanRuns(
  runs,
  genotypeFile,
  mapFile,
  pct_threshold = 0.33,
  x_font_size = 10,
  savePlots = FALSE,
  file_type = "pdf",
  outputName = NULL,
  plotTitle = NULL,
  plot_w = 8,
  plot_h = 6
)
plot_manhattanRuns(
  runs,
  genotypeFile,
  mapFile,
  pct_threshold = 0.33,
  x_font_size = 10,
  savePlots = FALSE,
  file_type = "pdf",
  outputName = NULL,
  plotTitle = NULL,
  plot_w = 8,
  plot_h = 6
)

Arguments

`runs`	a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length)
`genotypeFile`	genotype (.ped) file path
`mapFile`	map file (.map) file path
`pct_threshold`	reference line for significant regions (e.g. 0.5 –> 50% SNPs in runs; default is 0.33)
`x_font_size`	font size for x axis values (chromosome numbers: default = 10)
`savePlots`	should plots be saved out in files (default) or plotted in the graphical terminal?
`file_type`	type of plot file to ba saved (if savePlots is TRUE; default is pdf)
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)
`plotTitle`	title in plot (default)
`plot_w`	plot width (if savePlots = TRUE; default is 8)
`plot_h`	plot height (if savePlots = TRUE; default is 6)

Value

Manhattan plots of proportion of times SNPs are inside runs, per population (pdf files)

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_manhattanRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile,
savePlots = FALSE, plotTitle = "ROHom")

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_manhattanRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile,
savePlots = FALSE, plotTitle = "ROHom")

Plot sum of run-lengths (or average run-lengths) against the number of runs per individual

Description

Function to plot the sum of run lengths (or the average run length) per individual against the average number of runs per individual. Points can be differentially coloured by group/population. This plot can be useful to identify patterns in the distribution of runs in different groups (e.g. few long runs vs many short runs)

Usage

plot_PatternRuns(
  runs,
  mapFile,
  method = c("sum", "mean"),
  outputName = NULL,
  savePlots = FALSE,
  plotTitle = NULL
)
plot_PatternRuns(
  runs,
  mapFile,
  method = c("sum", "mean"),
  outputName = NULL,
  savePlots = FALSE,
  plotTitle = NULL
)

Arguments

`runs`	a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length)
`mapFile`	map file (.map) file path
`method`	"sum" or "mean" of run lengths per individual sample
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)#'
`savePlots`	should plots be saved out to files or plotted in the graphical terminal (default)?
`plotTitle`	title in plot (default NULL)

Value

plot of number of runs vs run-length sum/mean per individual sample

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'sum')
plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'mean')

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'sum')
plot_PatternRuns(runs = runsDF, mapFile = mapFile, method = 'mean')

Function to plot runs per individual

Description

Function to plot runs per individual (see Williams et al. 2016, Animal Genetics, for an example with animal data) Individual IDs on the y-axis, bps on the x-axis (position along the chromosome)

Usage

plot_Runs(
  runs,
  suppressInds = FALSE,
  savePlots = FALSE,
  separatePlots = FALSE,
  outputName = NULL
)
plot_Runs(
  runs,
  suppressInds = FALSE,
  savePlots = FALSE,
  separatePlots = FALSE,
  outputName = NULL
)

Arguments

`runs`	a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length)
`suppressInds`	shall we suppress individual IDs on the y-axis? (defaults to FALSE)
`savePlots`	should plots be saved out to files (one pdf file for all chromosomes) or plotted in the graphical terminal (default)?
`separatePlots`	should plots for each chromosome be saved out to separate files?
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)

Value

plot of runs by chromosome

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_Runs(runs = runsDF, suppressInds = FALSE, savePlots = FALSE, outputName = "ROHom")

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_Runs(runs = runsDF, suppressInds = FALSE, savePlots = FALSE, outputName = "ROHom")

Plot the number of times each SNP falls inside runs

Description

Function to plot the number of times/percentage each SNP is inside a run (population-specific signals) against the SNP positions in the genome. Proportions on the y-axis, bps on the x-axis

Usage

plot_SnpsInRuns(
  runs,
  genotypeFile,
  mapFile,
  savePlots = FALSE,
  separatePlots = FALSE,
  outputName = NULL
)
plot_SnpsInRuns(
  runs,
  genotypeFile,
  mapFile,
  savePlots = FALSE,
  separatePlots = FALSE,
  outputName = NULL
)

Arguments

`runs`	a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length)
`genotypeFile`	genotype (.ped) file path
`mapFile`	map file (.map) file path
`savePlots`	should plots be saved out in files (default) or plotted in the graphical terminal?
`separatePlots`	should plots for each chromosome be saved out to separate files?
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)

Value

plot number of times a SNP is in a run by chromosome and population (pdf files)

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
# skipping runs calculation
## Not run: 
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_SnpsInRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile,
savePlots = FALSE, outputName = "ROHom")

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
# skipping runs calculation
## Not run: 
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_SnpsInRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile,
savePlots = FALSE, outputName = "ROHom")

Plot stacked runs

Description

Function to plot stacked runs along the chromosome (signaling presence of large numbers of runs in specific regions of a chromosome) Counts on the y-axis, bps on the x-axis (position along the chromosome)

Usage

plot_StackedRuns(
  runs,
  savePlots = FALSE,
  separatePlots = FALSE,
  outputName = NULL
)
plot_StackedRuns(
  runs,
  savePlots = FALSE,
  separatePlots = FALSE,
  outputName = NULL
)

Arguments

`runs`	a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length)
`savePlots`	should plots be saved out in files (default) or plotted in the graphical terminal?
`separatePlots`	should plots for chromosomes be saved out to separate files?
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)

Value

plot of stacked runs by population and by chromosome (pdf files)

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_StackedRuns(runs = runsDF, savePlots = FALSE, outputName = "ROHom")

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

# plot runs per animal (interactive)
plot_StackedRuns(runs = runsDF, savePlots = FALSE, outputName = "ROHom")

Violin plot of run length per individual (either sum or mean)

Description

Function to produce violin plots of the distribution of runs lengths per group The sum of run lengths, or its average, per individual sample is used to characterize the distribution of runs

Usage

plot_ViolinRuns(
  runs,
  method = c("sum", "mean"),
  outputName = NULL,
  plotTitle = NULL,
  savePlots = FALSE
)
plot_ViolinRuns(
  runs,
  method = c("sum", "mean"),
  outputName = NULL,
  plotTitle = NULL,
  savePlots = FALSE
)

Arguments

`runs`	a data.frame with runs per individual (group, id, chrom, nSNP, start, end, length)
`method`	"sum" or "mean" of run lengths per individual samples
`outputName`	title prefix (the base name of graph, if savePlots is TRUE)
`plotTitle`	title in plot (default NULL)
`savePlots`	should plots be saved out to files or plotted in the graphical terminal (default)?

Value

Violin plot of the distribution of runs-lengths (sum or mean)

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_ViolinRuns(runs = runsDF, method = "sum" , savePlots = FALSE)
plot_ViolinRuns(runs = runsDF, method = "mean" , savePlots = FALSE)

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

plot_ViolinRuns(runs = runsDF, method = "sum" , savePlots = FALSE)
plot_ViolinRuns(runs = runsDF, method = "mean" , savePlots = FALSE)

Read runs from external files

Description

Function to read in, from external files, the output of software for ROH:

detectRUNS: output saved out to a file (e.g. write.table)
Plink: output from the --homozyg option (.hom files)
BCFtools: output from the roh option

Usage

readExternalRuns(
  inputFile = NULL,
  program = c("plink", "BCFtools", "detectRUNS")
)
readExternalRuns(
  inputFile = NULL,
  program = c("plink", "BCFtools", "detectRUNS")
)

Arguments

`inputFile`	name of (path to) external file
`program`	source program that produced the ROH file (one of `detectRUNS`, `Plink`, `BCFtools`)

Value

dataframe in the correct format to be used with plots and statistics functions from detectRUNS

Examples

# getting map and ped paths
## Not run: 
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxMissRun = 1, maxMissWindow = 1,  minLengthBps = 100000,  minDensity = 1/10000)

write.table(x= runs,file = 'Kijas2016_Sheep_subset.sliding.csv', quote=F, row.names = F)

## End(Not run)
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package = "detectRUNS")
newData=readExternalRuns(runsFile, program = 'detectRUNS')

# getting map and ped paths
## Not run: 
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxMissRun = 1, maxMissWindow = 1,  minLengthBps = 100000,  minDensity = 1/10000)

write.table(x= runs,file = 'Kijas2016_Sheep_subset.sliding.csv', quote=F, row.names = F)

## End(Not run)
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package = "detectRUNS")
newData=readExternalRuns(runsFile, program = 'detectRUNS')

Function to return a dataframe of population (POP, ID)

Description

This is a core function. Read PED file and returns a data.frame with the first two columns

Usage

readPOPCpp(genotypeFile)
readPOPCpp(genotypeFile)

Arguments

genotypeFile

genotype (.ped) file location

Value

a dataframe of POP, ID

Function to reorder data frames by CHROMOSOME

Description

The data frame will be reordered according to chromosome: from 1 to n, then X, Y, XY, MT The data frame needs to have a column with name "CHROMOSOME"

Usage

reorderDF(dfx)
reorderDF(dfx)

Arguments

dfx

data frame to be reordered (with column "CHROMOSOME")

Details

Reorder results based on chromosome

Value

A reordered data frame by chromosome

Main function to detect RUNS (ROHom/ROHet) using sliding windows (a la Plink)

Description

This is one of the main function of detectRUNS and is used to detect runs (of homozygosity or heterozygosity) in the genome (diploid) with the sliding-window method. All parameters to detect runs (e.g. minimum n. of SNP, max n. of missing genotypes, max n. of opposite genotypes etc.) are specified here. Input data are in the ped/map Plink format (https://www.cog-genomics.org/plink/1.9/input#ped)

Usage

slidingRUNS.run(
  genotypeFile,
  mapFile,
  windowSize = 15,
  threshold = 0.05,
  minSNP = 3,
  ROHet = FALSE,
  maxOppWindow = 1,
  maxMissWindow = 1,
  maxGap = 10^6,
  minLengthBps = 1000,
  minDensity = 1/1000,
  maxOppRun = NULL,
  maxMissRun = NULL
)
slidingRUNS.run(
  genotypeFile,
  mapFile,
  windowSize = 15,
  threshold = 0.05,
  minSNP = 3,
  ROHet = FALSE,
  maxOppWindow = 1,
  maxMissWindow = 1,
  maxGap = 10^6,
  minLengthBps = 1000,
  minDensity = 1/1000,
  maxOppRun = NULL,
  maxMissRun = NULL
)

Arguments

`genotypeFile`	genotype (.ped) file path
`mapFile`	map file (.map) file path
`windowSize`	the size of sliding window (number of SNP loci) (default = 15)
`threshold`	the threshold of overlapping windows of the same state (homozygous/heterozygous) to call a SNP in a RUN (default = 0.05)
`minSNP`	minimum n. of SNP in a RUN (default = 3)
`ROHet`	should we look for ROHet or ROHom? (default = FALSE)
`maxOppWindow`	max n. of homozygous/heterozygous SNP in the sliding window (default = 1)
`maxMissWindow`	max. n. of missing SNP in the sliding window (default = 1)
`maxGap`	max distance between consecutive SNP to be still considered a potential run (default = 10^6 bps)
`minLengthBps`	minimum length of run in bps (defaults to 1000 bps = 1 kbps)
`minDensity`	minimum n. of SNP per kbps (defaults to 0.1 = 1 SNP every 10 kbps)
`maxOppRun`	max n. of opposite genotype SNPs in the run (optional)
`maxMissRun`	max n. of missing SNPs in the run (optional)

Details

This function scans the genome (diploid) for runs using the sliding-window method. This is a wrapper function for many component functions that handle the input data (ped/map files), perform internal conversions, accept parameters specifications, select whether runs of homozygosity (RoHom) or of heterozygosity (RoHet) are looked for.

This function returns a data frame with all runs detected in the dataset. This data frame can then be written out to a csv file. The data frame is, in turn, the input for other functions of the detectRUNS package that create plots and produce statistics from the results (see plots and statistics functions in this manual, and/or refer to the detectRUNS vignette).

Value

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")
# calculating runs with sliding window approach
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,
minSNP = 15, ROHet = FALSE,  maxOppWindow = 1, maxMissWindow = 1, maxGap=10^6,
minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
colClasses <- c(rep("character", 3), rep("numeric", 4)  )
runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE,
colClasses = colClasses)

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")
# calculating runs with sliding window approach
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,
minSNP = 15, ROHet = FALSE,  maxOppWindow = 1, maxMissWindow = 1, maxGap=10^6,
minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
colClasses <- c(rep("character", 3), rep("numeric", 4)  )
runs <- read.csv2(runsFile, header = TRUE, stringsAsFactors = FALSE,
colClasses = colClasses)

Function to slide a window over a vector (individual's genotypes)

Description

This is a core function. The functions to detect RUNS are slid over the genome

Usage

slidingWindow(
  data,
  gaps,
  windowSize,
  step,
  maxGap,
  ROHet = TRUE,
  maxOppositeGenotype = 1,
  maxMiss = 1
)
slidingWindow(
  data,
  gaps,
  windowSize,
  step,
  maxGap,
  ROHet = TRUE,
  maxOppositeGenotype = 1,
  maxMiss = 1
)

Arguments

`data`	vector of 0/1/2 genotypes
`gaps`	vector of differences between consecutive positions (gaps) in bps
`windowSize`	size of window (n. of SNP)
`step`	by which (how many SNP) is the window slid
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run
`ROHet`	shall we detect ROHet or ROHom?
`maxOppositeGenotype`	max n. of homozygous/heterozygous SNP
`maxMiss`	max. n. of missing SNP

Value

vector of TRUE/FALSE (whether a window is homozygous or NOT)

Function to slide a window over a vector (individual's genotypes)

Description

This is a core function. The functions to detect RUNS are slid over the genome

Usage

slidingWindowCpp(
  data,
  gaps,
  windowSize,
  step,
  maxGap,
  ROHet = TRUE,
  maxOppositeGenotype = 1L,
  maxMiss = 1L
)
slidingWindowCpp(
  data,
  gaps,
  windowSize,
  step,
  maxGap,
  ROHet = TRUE,
  maxOppositeGenotype = 1L,
  maxMiss = 1L
)

Arguments

`data`	vector of 0/1/2 genotypes
`gaps`	vector of differences between consecutive positions (gaps) in bps
`windowSize`	size of window (n. of SNP)
`step`	by which (how many SNP) is the window slid
`maxGap`	max distance between consecutive SNP in a window to be still considered a potential run
`ROHet`	shall we detect ROHet or ROHom?
`maxOppositeGenotype`	max n. of homozygous/heterozygous SNP
`maxMiss`	max. n. of missing SNP

Value

vector of TRUE/FALSE (whether a window is homozygous or NOT)

Function to return a vector of T/F for whether a SNP is or not in a RUN

Description

This is a core function. The function to determine whether a SNP is or not in a RUN. The ratio between homozygous/heterozygous windows and total n. of windows is computed here

Usage

snpInRun(RunVector, windowSize, threshold)
snpInRun(RunVector, windowSize, threshold)

Arguments

`RunVector`	vector of TRUE/FALSE (is a window homozygous/heterozygous?)
`windowSize`	size of window (n. of SNP)
`threshold`	threshold to call a SNP in a RUN

Value

vector of TRUE/FALSE (whether a SNP is in a RUN or NOT)

Function to return a vector of T/F for whether a SNP is or not in a RUN

Description

This is a core function. The function to determine whether a SNP is or not in a RUN. The ratio between homozygous/heterozygous windows and total n. of windows is computed here

Usage

snpInRunCpp(RunVector, windowSize, threshold)
snpInRunCpp(RunVector, windowSize, threshold)

Arguments

`RunVector`	vector of TRUE/FALSE (is a window homozygous/heterozygous?)
`windowSize`	size of window (n. of SNP)
`threshold`	threshold to call a SNP in a RUN

Value

vector of TRUE/FALSE (whether a SNP is in a RUN or NOT)

Function to count number of times a SNP is in a RUN

Description

Function to count number of times a SNP is in a RUN

Usage

snpInsideRunsCpp(runsChrom, mapChrom, genotypeFile)
snpInsideRunsCpp(runsChrom, mapChrom, genotypeFile)

Arguments

`runsChrom`	R object (dataframe) with results per chromosome
`mapChrom`	R map object with SNP per chromosome
`genotypeFile`	genotype (.ped) file location

Value

dataframe with counts per SNP in runs (per population)

Summary statistics on detected runs

Description

This function processes the results from slidingRUNS.run and consecutiveRUNS.run and produces a number of interesting descriptives statistics on results.

Usage

summaryRuns(runs, mapFile, genotypeFile, Class = 2, snpInRuns = FALSE)
summaryRuns(runs, mapFile, genotypeFile, Class = 2, snpInRuns = FALSE)

Arguments

`runs`	R object (dataframe) with results on detected runs
`mapFile`	Plink map file (for SNP position)
`genotypeFile`	Plink ped file (for SNP position)
`Class`	group of length (in Mbps) by class (default: 0-2, 2-4, 4-8, 8-16, >16)
`snpInRuns`	TRUE/FALSE (default): should the function `snpInsideRuns` be called to compute the proportion of times each SNP falls inside a run in the group/population?

Details

summaryRuns calculates: i) the number of runs per chromosome and group/population; ii) the percent distribution of runs per chromosome and group; iii) the number of runs per size-class and group; iv) the percent distribution of runs per size-class and group; v) the mean length of runs per chromosome and group; vi) the mean length of runs per size-class and group; vii) individual inbreeding coefficient estimated from ROH; viii) individual inbreeding coefficient estimated from ROH per chromosome; ix) individual inbreeding coefficient estimated from ROH per size-class

Value

A list of dataframes containing the most relevant descriptives statistics on detected runs. The list conveniently contains 9 dataframes that can be used for further processing and visualization, or can be written out to text files

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

summaryRuns(runs = runsDF, mapFile = mapFile, genotypeFile = genotypeFile, Class = 2,
snpInRuns = FALSE)

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF <- readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

summaryRuns(runs = runsDF, mapFile = mapFile, genotypeFile = genotypeFile, Class = 2,
snpInRuns = FALSE)

Function to retrieve most common runs in the population

Description

This function takes in input either the run results or the output from the function snpInsideRuns (proportion of times a SNP is inside a run) in the population/group, and returns a subset of the runs most commonly found in the group/population. The parameter threshold controls the definition of most common (e.g. in at least 50%, 70% etc. of the sampled individuals)

Usage

tableRuns(
  runs = NULL,
  SnpInRuns = NULL,
  genotypeFile,
  mapFile,
  threshold = 0.5
)
tableRuns(
  runs = NULL,
  SnpInRuns = NULL,
  genotypeFile,
  mapFile,
  threshold = 0.5
)

Arguments

`runs`	R object (dataframe) with results on detected runs
`SnpInRuns`	dataframe with the proportion of times each SNP falls inside a run in the population (output from `snpInsideRuns`)
`genotypeFile`	Plink ped file (for SNP position)
`mapFile`	Plink map file (for SNP position)
`threshold`	value from 0 to 1 (default 0.7) that controls the desired proportion of individuals carrying that run (e.g. 70%)

Value

A dataframe with the most common runs detected in the sampled individuals (the group/population, start and end position of the run, chromosome and number of SNP included in the run are reported in the output dataframe)

Examples

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF = readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

tableRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, threshold = 0.5)

# getting map and ped paths
genotypeFile <- system.file("extdata", "Kijas2016_Sheep_subset.ped", package = "detectRUNS")
mapFile <- system.file("extdata", "Kijas2016_Sheep_subset.map", package = "detectRUNS")

# calculating runs of Homozygosity
## Not run: 
# skipping runs calculation
runs <- slidingRUNS.run(genotypeFile, mapFile, windowSize = 15, threshold = 0.1,  minSNP = 15,
ROHet = FALSE,  maxOppositeGenotype = 1, maxMiss = 1,  minLengthBps = 100000,  minDensity = 1/10000)

## End(Not run)
# loading pre-calculated data
runsFile <- system.file("extdata", "Kijas2016_Sheep_subset.sliding.csv", package="detectRUNS")
runsDF = readExternalRuns(inputFile = runsFile, program = 'detectRUNS')

tableRuns(runs = runsDF, genotypeFile = genotypeFile, mapFile = mapFile, threshold = 0.5)

Function to write out RUNS per individual animal

Description

Function to write out RUNS per individual animal

Usage

writeRUN(ind, dRUN, ROHet = TRUE, group, outputName)
writeRUN(ind, dRUN, ROHet = TRUE, group, outputName)

Arguments

`ind`	ID of animals
`dRUN`	data.frame with RUNS per animal
`ROHet`	shall we detect ROHet or ROHom?
`group`	group (factor): population, breed, ethnicity, case/control etc.
`outputName`	output filename

Value

TRUE/FALSE if RUNS are written out or not

Package 'detectRUNS'

Help Index

Main function to detect genomic RUNS (ROHom/ROHet) using the consecutive method

Description

Usage

Arguments

Details

Value

Examples

Function to detect consecutive runs in a vector (individual's genotypes)

Description

Usage

Arguments

Details

Value

Function to create a dataframe of RUNS per individual animal Requires a map file (other filename to read or R object) Parameters on maximum number of missing and opposite genotypes in the run (not the window) are implemented here

Description

Usage

Arguments

Value

Function to calculate oppositeAndMissingGenotypes array

Description

Usage

Arguments

Value

Function to calculated Froh genome-wide or chromosome-wide

Description

Usage

Arguments

Details

Value

Examples

Function to calculated Froh using a ROH-class

Description

Usage

Arguments

Value

Examples

Convert 0/1/2 genotypes to 0/1

Description

Usage

Arguments

Value

Function to check whether a window is (loosely) heterozygous or not

Description

Usage

Arguments

Value

Function to check whether a window is (loosely) heterozygous or not

Description

Usage

Arguments

Value

Function to check whether a window is (loosely) homozygous or not

Description

Usage

Arguments

Value

Function to check whether a window is (loosely) homozygous or not

Description

Usage

Arguments

Value

Convert ped genotypes to 0/1

Description

Usage

Arguments

Value

Plot Distribution of runs

Description

Usage

Arguments

Value

Examples

Plot Froh-based inbreeding coefficients by group

Description

Usage

Arguments

Value

Examples