PrePAN

Sign in to PrePAN

Bio::DB::Big XS bindings to libBigWig for access the UCSC/kent big formats

Good

Synopsis

use Bio::DB::Big;
use Bio::DB::Big::AutoSQL;

# Setup CURL buffers
Bio::DB::Big->init();

my $bw = Bio::DB::Big->open('path/to/file.bw');
# Generic: get the type
if($bw->is_big_wig()) {
  print "We have a bigwig file\n";
}

# Generic: Get headers
my $header = $bw->header();
printf("Working with %d zoom levels", $header->{nLevels});
# Generic: Get chromosomes (comes back as a hash {chrom => length})
my $chroms = $bw->chroms();

#Get stats, values and intervals
if($bw->has_chrom('chr1')) {
  my $bins = 10;

  # uses the zoom levels and returns an array of 10 bins over chromsome positions 1-100
  my $stats = $bw->get_stats('chr1', 0, 100, $bins, 'mean');
  foreach my $s (@{$stats}) {
    printf("%f\n", $s);
  }

  # Go directly to the raw level and calc on that but ask for maximum value per bin this time
  my $full_stats = $bw->get_stats('chr1', 0, 100, $bins, 'max', 1);

  # Get a value for each base over chromsome positions 1 - 100. Values can be undef if not set
  my $values = $bw->get_values('chr1', 0, 100);

  # Get the real intervals where a value was assigned
  my $intervals = $bw->get_intervals('chr1', 0, 100);
  foreach my $i (@{$intervals}) {
    printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value})
  }

  # Or iterate which allows you to move through a file without loading everything into memory
  my $blocks_per_iter = 10;
  my $iter = $bw->get_intervals_iterator('chr1', 0, 100, $blocks_per_iter);
  while(my $intervals = $iter->next()) {
    foreach my $i (@{$intervals}) {
      printf("%d - %d: %f\n", $i->{start}, $i->{end}, $i->{value})
    }
  }
}

my $bb = Bio::DB::Big->open('http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb');
if($bb->is_big_bed) {
  my $with_string = 1;
  # Optionally you do not retrieve the "string" if you don't want to potenitally saving memory
  my $entries = $bb->get_entries('chr21', 9000000, 10000000, $with_string);
  foreach my $e (@{$entries}) {
    printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
  }

  # Or you can use an iterator
  my $blocks_per_iter = 10;
  my $iter = $bb->get_entries_iterator('chr21', 0, $bb->chrom_length('chr21'), $with_string, $blocks_per_iter);
  while(my $entries = $iter->next()) {
    foreach my $e (@{$entries}) {
      printf("%d - %d: %s\n", $e->{start}, $e->{end}, $e->{string});
    }
  }

  # Finally you can request AutoSQL and parse if available
  if($bb->get_autosql()) {
    my $autosql = $bb->get_autosql();
    my $as = Bio::DB::Big::AutoSQL->new($autosql);
    if($as->has_field('name')) {
      printf("%s: The field 'name' is in position %d\n", $as->name(), $as->get_field('name')->position());
    }
    # Or just get all fields as an arrayref
    my $fields = $as->fields();
  }
}

Description

This library provides access to the BigWig and BigBed file formats designed by UCSC. However rather than use kent libraries this uses libBigWig from https://github.com/dpryan79/libBigWig as it provides an implementation that avoids exiting when errors happen. libBigWig provides access to BigWig summaries, values and intervals alongside providing access to BigBed entries.

Comments

Just a few comments on your XS code:

h = (HV *)sv_2mortal((SV *)newHV());

To fix the refcount problem, I would use the T_HVREF_REFCOUNT_FIXED typemap entry, rather than the sv_2mortal black magic. See perlxstypemap.

Same for the arrays.

hv_store(h, "version", 7, newSVuv(big->hdr->version), 0);

I would use the more maintainable hv_stores(h, "version", newSVuv(...)).

Please sign up to post a review.