Motivation: Next-generation sequencing studies in clinical genetics are often limited by the scarcity of human genotype data, which stems from ethical, regulatory, and economic barriers. The shortfall is sharpest in consanguineous populations, which are common in South Asia and the Middle East, where family-based designs need large pedigrees that are rarely sequenced in full. Existing simulators do not combine pedigree-aware propagation, realistic population stratification, and clinical export formats in one tool. Results: We present GenoSim, an R package for forward-time simulation of diploid SNP genotypes. It runs in two modes: a population mode implementing inbreeding-adjusted Hardy-Weinberg sampling, Wright-Fisher drift, directional selection, recurrent mutation, and Haldane recombination across multiple generations; and a pedigree-constrained mode that ingests real family VCFs and a pedigree, reconstructs phase where the pedigree makes it identifiable, propagates genotypes through the observed family structure, and appends synthetic generations. Version 1.1.1 adds population stratification through the Balding-Nichols model parameterised by gnomAD v3.1 fixation indices (F_ST) for eight ancestry groups (AFR, AMR, EAS, EUR, FIN, MID, SAS, ASJ), empirical allele-frequency loading from external reference panels, and admixed-cohort simulation. Analysis functions cover Hardy-Weinberg testing, linkage disequilibrium, runs of homozygosity, principal component analysis, founder-referenced and between-generation F-statistics, and Nei gene diversity. Availability and implementation: GenoSim is available as an R package at https://github.com/malikbak/GenoSim under the MIT licence. It requires R [≥] 4.0.0 and depends only on base R packages (stats, utils, graphics, grDevices, tools).
Bakar, A., Gul, R., Haq, W. u., Afghani, T.
Advertisement
Stats
- Recommendations n/a n/a positive of 0 vote(s)
- Views 6
- Comments 0
