Constructing a BIGSI

1. Extract k-mers from your data

You can use any tool you like to extract unique k-mers from your raw data. We recommend mccortex because its error-cleaning methods let you extract error-cleaned k-mers. However, you can also use a k-mer counter such as Jellyfish, or a custom script.

mccortex/bin/mccortex31 build -k 31 -s test1 -1 example-data/kmers.txt example-data/test1.ctx
mccortex/bin/mccortex31 build -k 31 -s test2 -1 example-data/kmers.txt example-data/test2.ctx
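
If you go the custom-script route, the core of k-mer extraction can be sketched as below. This is a minimal illustration, not what mccortex does: it performs no error cleaning, and the function names and the choice to store canonical k-mers (the smaller of a k-mer and its reverse complement) are assumptions of the sketch.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def unique_kmers(seq: str, k: int = 31) -> set:
    """Collect the distinct canonical k-mers of one sequence."""
    seq = seq.upper()
    kmers = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) - set("ACGT"):
            continue  # skip windows containing N or other ambiguity codes
        # Canonical form (an assumption of this sketch): keep the
        # lexicographically smaller of the k-mer and its reverse complement,
        # so both strands map to the same entry.
        kmers.add(min(kmer, reverse_complement(kmer)))
    return kmers


if __name__ == "__main__":
    # Toy input with k=4 so the result is easy to inspect by hand.
    print(sorted(unique_kmers("ACGTACGT", k=4)))
```

For real read sets you would still want an error-cleaning step, since sequencing errors otherwise inflate the number of distinct k-mers.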

2. Create the BIGSI config files


Choosing BIGSI parameters

Before writing a config, decide on the parameters k (k-mer size), m (Bloom filter size in bits) and h (number of hash functions).
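
As a rough aid when weighing m against h, the textbook Bloom filter approximation for the per-k-mer false positive rate is (1 - e^(-h*n/m))^h, where n is the number of distinct k-mers in a sample. The sketch below only evaluates that formula; the function name and the example value of n are hypothetical, and it is not a BIGSI command or API.

```python
import math


def bloom_false_positive_rate(n_kmers: int, m: int, h: int) -> float:
    """Approximate false positive rate of one Bloom filter.

    n_kmers: distinct k-mers inserted, m: filter size in bits,
    h: number of hash functions. Standard approximation; it assumes
    independent, uniformly distributed hash functions.
    """
    return (1.0 - math.exp(-h * n_kmers / m)) ** h


if __name__ == "__main__":
    # m and h are taken from the example configs; n_kmers is a made-up
    # sample size used purely for illustration.
    for n_kmers in (1_000_000, 5_000_000, 10_000_000):
        rate = bloom_false_positive_rate(n_kmers, m=28_000_000, h=1)
        print(f"n={n_kmers:>10,}  per-k-mer FPR ~ {rate:.4f}")
```

A query usually consists of many k-mers that must all be present, so the chance of a whole query matching by accident is much lower than the per-k-mer rate.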

Below are three example configs to get you started with your preferred key-value store: berkeleyDB, rocksDB or redis.

## Example config using berkeleyDB
h: 1
k: 31
m: 28000000
storage-engine: berkeleydb
storage-config:
  filename: test-berkeleydb
  flag: "c" ## Change to 'r' for read-only access

## Example config using redis
h: 1
k: 31
m: 28000000
storage-engine: redis
storage-config:
  host: localhost
  port: 6379

## Example config using rocksdb
h: 1
k: 31
m: 28000000
nproc: 4
storage-engine: rocksdb
storage-config:
  filename: test-rocksdb
  options:
    create_if_missing: true
    max_open_files: 5000
  read_only: false ## Change to true for read-only access

3. Construct the bloom filters

export BIGSI_CONFIG=example-data/configs/rocks.yaml ## set the config path, or use --config

bigsi bloom example-data/test1.ctx example-data/test1.bloom
bigsi bloom example-data/test2.ctx example-data/test2.bloom
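
Conceptually, each of these commands turns one sample's k-mers into a bit array of length m with h bits set per k-mer. The sketch below shows that idea using only the Python standard library. The seeded BLAKE2b hash is a stand-in chosen for this illustration and the function names are hypothetical, so neither the bit positions nor the file format will match what `bigsi bloom` writes.

```python
import hashlib


def bit_positions(kmer: str, m: int, h: int) -> list:
    """Map a k-mer to h positions in [0, m) using seeded BLAKE2b hashes."""
    positions = []
    for seed in range(h):
        digest = hashlib.blake2b(
            kmer.encode(),
            digest_size=8,
            salt=seed.to_bytes(8, "little"),  # a different salt per hash function
        ).digest()
        positions.append(int.from_bytes(digest, "little") % m)
    return positions


def build_bloom_filter(kmers, m: int, h: int) -> bytearray:
    """Set h bits per k-mer in a packed bit array of m bits."""
    bits = bytearray((m + 7) // 8)
    for kmer in kmers:
        for pos in bit_positions(kmer, m, h):
            bits[pos // 8] |= 1 << (pos % 8)
    return bits


def might_contain(bits: bytearray, kmer: str, m: int, h: int) -> bool:
    """True if every bit for the k-mer is set (false positives are possible)."""
    return all(
        bits[pos // 8] & (1 << (pos % 8)) for pos in bit_positions(kmer, m, h)
    )


if __name__ == "__main__":
    bloom = build_bloom_filter(["ACGT", "CGTA"], m=1024, h=3)
    print(might_contain(bloom, "ACGT", m=1024, h=3))
```

A k-mer that was inserted always tests positive. One that was never inserted usually tests negative, but may occasionally test positive, which is why m and h matter.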

4. Insert the bloom filters into the index

bigsi build example-data/test1.bloom example-data/test2.bloom -s s1 -s s2
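
To make the role of the index concrete, here is a toy in-memory sketch of the bit-sliced layout: one row per bit position, one column per sample. A lookup hashes each k-mer of the query, fetches the corresponding rows and ANDs them together. All names are hypothetical, the SHA-256-based hash is a stand-in, and k-mers are not canonicalised here. The real index lives in the configured key-value store rather than a Python list.

```python
import hashlib


def positions(kmer: str, m: int, h: int) -> list:
    """h pseudo-random positions in [0, m) for one k-mer (stand-in hash)."""
    return [
        int.from_bytes(
            hashlib.sha256(f"{seed}:{kmer}".encode()).digest()[:8], "little"
        ) % m
        for seed in range(h)
    ]


def build_index(sample_kmers: dict, m: int, h: int):
    """Return (names, rows); bit j of rows[i] is set if sample j has bit i set."""
    names = list(sample_kmers)
    rows = [0] * m  # each row is an integer used as a bitmask over samples
    for column, name in enumerate(names):
        for kmer in sample_kmers[name]:
            for pos in positions(kmer, m, h):
                rows[pos] |= 1 << column
    return names, rows


def search(names, rows, query: str, k: int, m: int, h: int) -> list:
    """Samples whose filters contain every k-mer of the query.

    Results may include false positives, as with any Bloom filter.
    """
    if len(query) < k:
        return []  # no complete k-mer to look up
    hits = (1 << len(names)) - 1  # start with every sample as a candidate
    for i in range(len(query) - k + 1):
        for pos in positions(query[i:i + k], m, h):
            hits &= rows[pos]  # a sample survives only if this bit is set
    return [name for column, name in enumerate(names) if (hits >> column) & 1]


if __name__ == "__main__":
    samples = {"s1": {"ACGT", "CGTA"}, "s2": {"TTTT"}}
    names, rows = build_index(samples, m=4096, h=2)
    print(search(names, rows, "ACGTA", k=4, m=4096, h=2))
```

Because each lookup touches only a few rows per query k-mer, the cost of a search depends mainly on the query length and on h, while the number of samples only determines how wide each row is.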

5. Query the index

bigsi search CGGCGAGGAAGCGTTAAATCTCTTTCTGACG

If you've installed with docker, run the same steps through the container:


docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi bloom --config /data/configs/berkeleydb.yaml /data/test1.ctx /data/test1.bloom
docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi bloom --config /data/configs/berkeleydb.yaml /data/test2.ctx /data/test2.bloom

docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi build --config /data/configs/berkeleydb.yaml /data/test1.bloom /data/test2.bloom -s s1 -s s2
docker run -v $PWD/example-data:/data phelimb/bigsi:63768c2 bigsi search --config /data/configs/berkeleydb.yaml CGGCGAGGAAGCGTTAAATCTCTTTCTGACG