
Bio::DB::Big - XS bindings to libBigWig for accessing the UCSC/Kent big formats



use Bio::DB::Big;
use Bio::DB::Big::AutoSQL;

# Setup CURL buffers
Bio::DB::Big->init();

my $bw = Bio::DB::Big->open('path/to/');
# Generic: get the type
if($bw->is_big_wig()) {
  print "We have a bigwig file\n";
}
# Generic: get headers
my $header = $bw->header();
printf("Working with %d zoom levels\n", $header->{nLevels});
# Generic: get chromosomes (comes back as a hash {chrom => length})
my $chroms = $bw->chroms();

# Get stats, values and intervals
if($bw->has_chrom('chr1')) {
  my $bins = 10;

  # Uses the zoom levels and returns an array of 10 bins over chromosome positions 1-100
  my $stats = $bw->get_stats('chr1', 0, 100, $bins, 'mean');
  foreach my $s (@{$stats}) {
    printf("%f\n", $s);
  }

  # Go directly to the raw level and calculate on that, but ask for the maximum value per bin this time
  my $full_stats = $bw->get_stats('chr1', 0, 100, $bins, 'max', 1);

  # Get a value for each base over chromosome positions 1-100. Values can be undef if not set
  my $values = $bw->get_values('chr1', 0, 100);

  # Get the real intervals where a value was assigned
  my $intervals = $bw->get_intervals('chr1', 0, 100);
  foreach my $i (@{$intervals}) {
    printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value});
  }

  # Or iterate, which allows you to move through a file without loading everything into memory
  my $blocks_per_iter = 10;
  my $iter = $bw->get_intervals_iterator('chr1', 0, 100, $blocks_per_iter);
  while(my $intervals = $iter->next()) {
    foreach my $i (@{$intervals}) {
      printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value});
    }
  }
}

my $bb = Bio::DB::Big->open('');
if($bb->is_big_bed()) {
  my $with_string = 1;
  # Optionally skip retrieving the "string" if you don't need it, potentially saving memory
  my $entries = $bb->get_entries('chr21', 9000000, 10000000, $with_string);
  foreach my $e (@{$entries}) {
    printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
  }

  # Or you can use an iterator
  my $blocks_per_iter = 10;
  my $iter = $bb->get_entries_iterator('chr21', 0, $bb->chrom_length('chr21'), $with_string, $blocks_per_iter);
  while(my $entries = $iter->next()) {
    foreach my $e (@{$entries}) {
      printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
    }
  }

  # Finally you can request AutoSQL and parse it if available
  if($bb->get_autosql()) {
    my $autosql = $bb->get_autosql();
    my $as = Bio::DB::Big::AutoSQL->new($autosql);
    if($as->has_field('name')) {
      printf("%s: The field 'name' is in position %d\n", $as->name(), $as->get_field('name')->position());
    }
    # Or just get all fields as an arrayref
    my $fields = $as->fields();
  }
}

This library provides access to the BigWig and BigBed file formats designed by UCSC. However, rather than using the UCSC/Kent libraries, it uses libBigWig, which provides an implementation that avoids exiting the process when errors happen. libBigWig provides access to BigWig summaries, values and intervals, alongside access to BigBed entries.
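That error behaviour matters in practice: failures can be trapped from Perl instead of killing the process. A minimal sketch, assuming open() croaks on failure (the exact failure mode is this module's to define; the placeholder path is from the synopsis above):

```perl
use strict;
use warnings;
use Bio::DB::Big;

# Assumption: open() throws a catchable Perl exception when the file
# cannot be read, rather than exit()-ing as the kent libraries would.
my $big = eval { Bio::DB::Big->open('path/to/') };
if(!$big) {
  warn "Could not open file: $@";
}
```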


Just a few comments on your XS code:

h = (HV *)sv_2mortal((SV *)newHV());

To fix the refcount problem, I would use the T_HVREF_REFCOUNT_FIXED typemap entry, rather than the sv_2mortal black magic. See perlxstypemap.

Same for the arrays.
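In your typemap file the entries would look something like this (a sketch; both REFCOUNT_FIXED maps ship in the core typemap, see perlxstypemap):

```
HV *	T_HVREF_REFCOUNT_FIXED
AV *	T_AVREF_REFCOUNT_FIXED
```

With those OUTPUT maps the generated glue takes ownership of the reference count from newHV()/newAV(), so the XSUB can return the freshly created hash or array directly.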

hv_store(h, "version", 7, newSVuv(big->hdr->version), 0);

I would use the more maintainable hv_stores(h, "version", newSVuv(...)).
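Side by side, the two forms (a fragment, using the same variables as your snippet):

```c
/* hv_store() takes an explicit key length and a precomputed hash
   (0 means "compute it"); the length goes stale if the key changes: */
hv_store(h, "version", 7, newSVuv(big->hdr->version), 0);

/* hv_stores() takes a string literal and derives the length at compile time: */
hv_stores(h, "version", newSVuv(big->hdr->version));
```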
