Demographic Inference with B-map
Demographic inference tools are confounded by the linked effects of selection; see Johri et al. (2021) for biases in MSMC-like and SFS-based approaches, and Marsh et al. (2024) for biases in ARG-based approaches.
To best avoid biases, it is essential to only use the most neutrally-evolving sites for demographic inference , i.e. the sites least affected by sweeps and BGS (highest B).
Using --Bmap
(see B-map Utilities for VCF), we can use a B-map from --genome
to filter a VCF or CSV to keep only the most neutrally evolving sites.
Plotting the B distribution
The following command will find the B of each position in the VCF using the B-map, print a brief summary to standard out and save a simple plot (B_distribution.png) of the results.
Bvalcalc --Bmap your_B_map.csv \
--positions your.vcf \
--plot_distribution
--out variants_B.csv
Now we can open up B_distribution.png which will help decide on a minimum B cut-off for our demographic inference analysis (e.g. B >= 0.9).
In addition we have variants_B.csv which lists the B for each position in case you’d like to run your own stats or do your own plotting of the results.
Saving positions with high B
Next, we can pick a cut-off and filter our list of positions to keep only sites with B >= 0.9 and save the output to filtered_positions.csv.
Bvalcalc --Bmap your_B_map.csv \
--positions your.vcf \
--out_minimum 0.9
--out filtered_positions.txt
--bcftools_format
Adding the --bcftools_format
option removes the B value column and reformats the output so it’s easier to filter with bcftools.
Filtering a VCF
Using the newly saved filtered_positions.csv and our original VCF we can filter using bcftools:
bcftools view \
-R variants_B_above_0.9.txt \
your.vcf -Ov -o filtered.vcf
Now the filtered.vcf only contains sites with B >= 0.9 and is ready for more accurate demographic inference with your tool of choice!