Quickstart Tutorial
This tutorial (5-10 minutes) walks through the core functionality of calculating B with Bvalcalc!
Installation
Bvalcalc is a python program called from the command line (terminal, shell, zsh, bash etc.)
You can install Bvalcalc on the command line via pip:
pip install Bvalcalc
Specifying popgen parameters
To calculate B, we need evolutionary information about the species of the population in a parameters file.
Copy population genetic parameters from one of the templates, e.g. Drosophila.
Bvalcalc --generate_params drosophila
Open ./DrosophilaParams.py
in your text editor of choice. This file contains example popgen parameters that allow us to accurately calculate B.
When calculating B for a new species you’ll need to check the literature and use informed values for your population, see Tailoring Parameters
Calculating a B value
Let’s start by calculating B at a site 1kb from the edge of a single conserved element 5kb in length.
Bvalcalc \
--site \
--pop_params ./DrosophilaParams.py \
--gene_size 5000 \
--distance 1000
The B-value for this site should be printed in the console as just above 0.99, which indicates diversity is expecteed to be reduced 1% due to BGS.
B recovery from one element
Now, let’s calculate the recovery of B as a function of distance from the 5kb element so we can plot it (gene_B.png
).
Bvalcalc \
--gene \
--gene_size 5000 \
--pop_params ./DrosophilaParams.py \
--plot_output gene_B.png
Have a look at the plot and the results printed to the console, you’ll notice B decays with distance from the selected element. It’s still a modest reduction (<3.1%), but remember, across a genome ALL selected elements will contribute to B at any given site.
B in a genomic region
Alright, now let’s look at part of a chromosome.
We can use a BED file (or GFF/CSV) that specifies which genomic ranges are conserved to calculate B for a region in the genome.
Let’s calculate B for a 1 Mb region in the middle of chromosome 2R [9500000-10500000] and save it to 1Mb_B.png
.
Bvalcalc \
--region 2R:9500000-10500000 \
--pop_params ./DrosophilaParams.py \
--bedgff_path examples/dmel6_2R_genes.csv \
--plot_output 1Mb_B.png
Have a look at the plots: the blue sections of the graph indicate neutral regions and black indicates conserved elements. That’s all that’s necessary for many analyses, especially if you’re only interested in B values for a specific region of the genome, or are testing against simulated results.
Calculating a B-map
If you wanted to generate a complete B-map for all sites across all chromosomes you would use the following command, though note it’s a lot more data to crunch and maps are already available for Drosophila so no need to run it!
Bvalcalc \
--genome \
--pop_params ./DrosophilaParams.py \
--bedgff_path examples/dmel6_2R_genes.csv \
--out Dmel_Bmap.csv \
--out_binsize 1000
If you had run that command, you’d get a B-map!
B-maps are useful to identify highly conserved regions of the genome, as a null-model for inference, e.g. SweepFinder2 with B-map, or to select the most neutrally-evolving sites for e.g. demographic inference, see Demographic Inference with B-map.