MEGASAT: automated inference of microsatellite genotypes from sequence data

Zhan, Luyao, Paterson, Ian G, Fraser, Bonnie A, Watson, Beth, Bradbury, Ian R, Ravindran, Praveen Nadukkalam, Reznick, David, Beiko, Robert G and Bentzen, Paul (2016) MEGASAT: automated inference of microsatellite genotypes from sequence data. Molecular Ecology Resources. ISSN 1755-098X

Abstract

MEGASAT is software that enables genotyping of microsatellite loci using next-generation sequencing data. Microsatellites are amplified in large multiplexes and then sequenced as pooled amplicons. MEGASAT reads sequence files and automatically scores microsatellite genotypes. It uses fuzzy matching to tolerate sequencing errors and applies decision rules to account for amplification artefacts, including nontarget amplification products, replication slippage during PCR (amplification stutter) and differential amplification of alleles. An important feature of MEGASAT is the generation of histograms of the length–frequency distributions of amplification products for each locus and each individual. These histograms, analogous to the electropherograms traditionally used to score microsatellite genotypes, enable rapid evaluation and editing of automatically scored genotypes. MEGASAT is written in Perl, runs on Windows, Mac OS X and Linux systems, and includes a simple graphical user interface. We demonstrate MEGASAT using data from the guppy, Poecilia reticulata, genotyping 1024 guppies at 43 microsatellites per run on an Illumina MiSeq sequencer. We evaluated the accuracy of automatically called genotypes using two methods, based on pedigree and repeat genotyping data, and obtained mean genotyping error rate estimates of 0.021 and 0.012. In both estimates, three loci accounted for a disproportionate fraction of genotyping errors; conversely, 26 loci were scored with at most one detected error (error rate ≤0.007). Our results show that with appropriate selection of loci, automated genotyping of microsatellite loci can be achieved with very high throughput, low genotyping error and very low genotyping costs.
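The abstract names three ingredients (fuzzy matching of reads to loci, a per-locus length–frequency histogram, and decision rules for stutter and unequal allele amplification) without giving MEGASAT's actual rules. The Python sketch below is a hypothetical illustration of how those ingredients fit together; it is not MEGASAT's Perl code. The function names (fuzzy_prefix_match, length_histogram, call_genotype), the use of a Hamming distance on the primer, and every threshold (max_mismatches, min_depth, stutter_ratio, min_allele_ratio) are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def fuzzy_prefix_match(read: str, primer: str, max_mismatches: int) -> bool:
    """Return True if `read` begins with `primer`, tolerating up to
    `max_mismatches` substitutions (an assumed Hamming-distance fuzzy match)."""
    if len(read) < len(primer):
        return False
    mismatches = sum(1 for r, p in zip(read, primer) if r != p)
    return mismatches <= max_mismatches


def length_histogram(reads: Iterable[str], primer: str,
                     max_mismatches: int = 2) -> Counter:
    """Length-frequency distribution of the reads assigned to one locus.
    Reads whose 5' end does not fuzzily match the primer are ignored,
    a crude stand-in for filtering nontarget products."""
    hist: Counter = Counter()
    for read in reads:
        if fuzzy_prefix_match(read, primer, max_mismatches):
            hist[len(read)] += 1
    return hist


def call_genotype(hist: Counter, repeat_len: int = 2, min_depth: int = 10,
                  stutter_ratio: float = 0.5,
                  min_allele_ratio: float = 0.25) -> Optional[Tuple[int, int]]:
    """Call a diploid genotype (two allele lengths) from a length histogram.

    Hypothetical decision rules:
      * fewer than `min_depth` reads -> no call (None);
      * the most frequent length is the first allele;
      * a peak exactly one repeat unit shorter than that allele, with fewer
        than `stutter_ratio` times its reads, is treated as stutter;
      * the strongest remaining peak with at least `min_allele_ratio` times
        the first allele's reads is the second allele (tolerating some
        differential amplification); otherwise the call is homozygous.
    """
    if not hist or sum(hist.values()) < min_depth:
        return None
    # Sort by descending read count; break ties by the shorter length.
    peaks = sorted(hist.items(), key=lambda kv: (-kv[1], kv[0]))
    allele1, depth1 = peaks[0]
    for length, depth in peaks[1:]:
        if depth < min_allele_ratio * depth1:
            break  # every remaining peak is weaker still
        is_stutter = (length == allele1 - repeat_len
                      and depth < stutter_ratio * depth1)
        if not is_stutter:
            return (min(allele1, length), max(allele1, length))
    return (allele1, allele1)
```

For example, a histogram with 40 reads at 31 bp, 30 at 37 bp and 12 at 29 bp is called (31, 37), with the 29 bp peak discounted as stutter of the 31 bp allele. The sketch treats read length as a proxy for allele identity and returns only counts; MEGASAT additionally plots these histograms so that automatic calls can be reviewed and edited by eye.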

Item Type: Article
Keywords: animal mating/breeding systems, bioinformatics/phyloinformatics, captive populations, conservation genetics, landscape genetics, population genetics – empirical
Schools and Departments: School of Life Sciences > Evolution, Behaviour and Environment
Subjects: Q Science > QH Natural history > QH0301 Biology > QH0359 Evolution
Depositing User: Bonnie Fraser
Date Deposited: 14 Sep 2016 14:53
Last Modified: 25 Jul 2017 17:24
URI: http://sro.sussex.ac.uk/id/eprint/63342