NGSRelate Steps

Step 2: Extract the frequency column for ngsRelate

Next we extract the frequency column from the allele frequency file (.mafs.gz) and remove the header.
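As a minimal sketch of this step, the shell function below pulls the frequency column out of a gzipped .mafs file and drops the header, leaving one frequency per line. It looks the column up by name instead of by position, because the position can change with the options used to produce the file. The function name `extract_freq`, the default column name `knownEM`, and the file names in the usage comment are assumptions for illustration, so check the header of your own file first.

```shell
# Sketch: print one allele frequency per line, with no header.
# Assumes a tab-separated .mafs.gz file whose first line is a header.
# Usage (hypothetical file names):
#   extract_freq angsdput.mafs.gz knownEM > freq
extract_freq() {
  mafs_file="$1"
  freq_col="${2:-knownEM}"   # assumed column name; adjust to match your header
  gzip -dc "$mafs_file" | awk -F'\t' -v col="$freq_col" '
    NR == 1 {
      # Find the index of the requested column in the header line.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      next   # skip the header itself
    }
    { print $idx }
  '
}
```

Selecting by header name costs a few extra lines compared with a fixed `cut -f`, but it fails loudly if the expected column is missing, so the wrong column is never written silently.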

Get the ngsRelate output

Now that ngsRelate has finished writing the .res file, let’s grab it and take a look.

Get the .res

Use scp or sftp to copy your .res file to your local machine.

Read in .res

Load the data in R for plotting. ngsRelate uses allele frequencies estimated from the samples to estimate identity-by-descent (IBD) probabilities, which are then used to infer the relationships between pairs of individuals. The output includes a set of Cotterman coefficients of relatedness (k0, k1, k2) for each pair.
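For orientation, the coefficients can be collapsed into a single coefficient of relatedness \(r\), and each common relationship has a characteristic expected pattern. The values below are standard expectations for a non-inbred population, not ngsRelate output. Real estimates will scatter around them.

\[
r = \frac{k_1}{2} + k_2
\]

\[
\begin{array}{lcccc}
\text{Relationship} & k_0 & k_1 & k_2 & r \\
\hline
\text{Unrelated} & 1 & 0 & 0 & 0 \\
\text{Half siblings} & 0.5 & 0.5 & 0 & 0.25 \\
\text{Full siblings} & 0.25 & 0.5 & 0.25 & 0.5 \\
\text{Parent-offspring} & 0 & 1 & 0 & 0.5 \\
\text{Identical twins or duplicate sample} & 0 & 0 & 1 & 1
\end{array}
\]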

Column definitions

  • k0, k1 and k2 are the maximum likelihood (ML) estimates of the relatedness coefficients. Each \(k\) is the estimated proportion of loci at which the two individuals share 0, 1 or 2 alleles identical by descent (IBD). Two alleles are IBD when one is a physical copy of the other, or both are physical copies of the same ancestral allele. This differs from identity by state (IBS), where the two alleles are simply the same allelic type regardless of ancestry.
  • loglh is the log-likelihood of the ML estimate.
  • nIter is the number of iterations the maximization algorithm took to find the ML estimate.
  • coverage is the fraction of non-missing sites, i.e. the fraction of sites where data was available for both individuals and where the minor allele frequency (MAF) was above the threshold (the default is 0.05, but the user may specify a different threshold).

Note that in some cases nIter is -1. This indicates that values on the boundary of the parameter space had a higher likelihood than the values reached by the EM algorithm. ML methods sometimes have trouble finding the ML estimate when it lies on the boundary of the parameter space, so ngsRelate tests the boundary values explicitly and outputs them when they have the highest likelihood.
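To see how many pairs ended up on the boundary, a quick check before plotting can help. The sketch below prints the header and every row whose nIter is -1. It assumes the .res file is whitespace- or tab-separated with a header row containing a column named `nIter`; the function name `boundary_pairs` and the file name in the usage comment are hypothetical.

```shell
# Sketch: list the pairs whose estimate was taken from the boundary (nIter == -1).
# Assumes the first line of the .res file is a header with a column named nIter.
# Usage (hypothetical file name):
#   boundary_pairs mydata.res
boundary_pairs() {
  awk -v col="nIter" '
    NR == 1 {
      # Find the index of the nIter column in the header line.
      for (i = 1; i <= NF; i++) if ($i == col) idx = i
      if (!idx) { print "column not found: " col > "/dev/stderr"; exit 1 }
      print   # keep the header so the output stays readable
      next
    }
    $idx == -1 { print }
  ' "$1"
}
```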

Visualization

Heat Map

Let’s take a look at an interactive heatmap restricted to RABO vs. RABO comparisons: