PrePAN provides a place to discuss your modules.


Requests for Reviews Feed

Code::Search Search code for strings

Search source code for strings.

A trigram index over a medium-sized collection (40,000 or so) of source code files.

I have already developed the code, but it is not in a state fit to release; it is all in C except for a few scripts.
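A trigram index makes substring search fast by mapping every 3-character substring to the files that contain it; a query is answered by intersecting the posting lists of the query's trigrams and then verifying the candidates with a real scan. A minimal sketch of the extraction step in plain Perl (illustrative only, not the proposed module's code):

```perl
use strict;
use warnings;

# Extract the set of unique, overlapping 3-character substrings
# (trigrams) from a string. A trigram index maps each trigram to
# the list of files containing it.
sub trigrams {
    my ($text) = @_;
    my %seen;
    for my $i (0 .. length($text) - 3) {
        $seen{ substr($text, $i, 3) } = 1;
    }
    return sort keys %seen;
}

my @grams = trigrams('printf');
# @grams is ('int', 'ntf', 'pri', 'rin')
```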

benkasminbullock@github 4 comments

Hash::Wrap (formerly Return::Object) create lightweight on-the-fly objects from hashes


This module is now known as Hash::Wrap

The text below represents the original module, Return::Object. Please see the github repo for the current documentation.


This module provides routines which encapsulate a hash as an object. The object provides methods for keys in the hash; attempting to access a non-existent key via a method will cause an exception.

The impetus for this was to encapsulate data returned from a subroutine or method (hence the name). Returning a bare hash can lead to bugs if there are typos in hash key names when accessing the hash.

It is not necessary for the hash to be fully populated when the object is created. The underlying hash may be manipulated directly, and changes will be reflected in the object's methods. To prevent this, consider using the lock routines in Hash::Util on the object after creation.

Only hash keys which are legal method names will be accessible via object methods.
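The mechanism can be sketched in a few lines of plain Perl using AUTOLOAD (an illustration of the idea only, not the module's actual implementation; the package and method names here are made up):

```perl
use strict;
use warnings;

package My::HashObject;
use Carp ();

# Bless the hashref directly, so changes to the underlying hash
# remain visible through the object's methods.
sub wrap {
    my ($class, $hash) = @_;
    return bless $hash, $class;
}

our $AUTOLOAD;
sub AUTOLOAD {
    my ($self) = @_;
    ( my $key = $AUTOLOAD ) =~ s/.*:://;
    return if $key eq 'DESTROY';
    # A typo in a key name raises an exception instead of
    # silently returning undef, as a bare hash access would.
    Carp::croak("no such key: $key") unless exists $self->{$key};
    return $self->{$key};
}

package main;

my $obj = My::HashObject->wrap({ name => 'deep field', count => 3 });
print $obj->name, "\n";    # prints "deep field"
```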

Object construction and constructor customization

By default Return::Object exports a return_object constructor which, given a hashref, blesses it directly into the Return::Object::Class class.

The constructor may be customized to change which class the object is instantiated from, and how it is constructed from the data. Return::Object uses Exporter::Tiny to perform the customization. For example,

use Return::Object
  return_object => { -as => 'return_cloned_object',
                     -clone => 1 };

will create a version of return_object which clones the passed hash and is imported as return_cloned_object. To import it under the original name, return_object, leave out the -as option.

The following options are available to customize the constructor.

  • -as => subroutine name

    This is optional, and imports the customized version of return_object with the given name.

  • -class => class name

    The object will be blessed into the specified class. If the class should be created on the fly, specify the -create option. See "Object Classes" for what is expected of the object classes. This defaults to Return::Object::Class.

  • -create => boolean

    If true, and -class is specified, a class with the given name will be created.

  • -copy => boolean

    If true, the object will store the data in a shallow copy of the hash. By default, the object uses the hash directly.

  • -clone => boolean

    If true, the object will store the data in a deep copy of the hash, made with "dclone" in Storable. By default, the object uses the hash directly.

Object Classes

An object class has the following properties:

  • The class must be a subclass of Return::Object::Base.
  • The class typically does not provide any methods, as they would mask a hash key of the same name.
  • The class need not have a constructor. If it does, it is passed a hashref which it should bless as the actual object. For example:

    package My::Result;
    use parent 'Return::Object::Base';
    sub new {
      my ( $class, $hash ) = @_;
      return bless $hash, $class;
    }

    This precludes having a hash key named new.

Return::Object::Base provides an empty DESTROY method, a can method, and an AUTOLOAD method. They will mask hash keys with the same names.


You can make new bug reports, and view existing ones, through the web interface at


Please see those modules/websites for more information related to this module.


Diab Jerius


This software is Copyright (c) 2017 by Smithsonian Astrophysical Observatory.

This is free software, licensed under:

The GNU General Public License, Version 3, June 2007

djerius@github 3 comments

Types::PDL PDL types using Type::Tiny


This module provides Type::Tiny compatible types for PDL.



Allows an object blessed into the class PDL, e.g.

validate( [pdl], Piddle );

Piddle accepts the parameters described under "Parameters" below.



Some types take optional parameters which add additional constraints on the object. For example, to indicate that only empty piddles are accepted:

validate( [pdl], Piddle[ empty => 1 ] );

The available parameters are:

  • empty

    This accepts a boolean value; if true the piddle must be empty (i.e. the isempty method returns true), if false, it must not be empty.

  • null

    This accepts a boolean value; if true the piddle must be a null piddle, if false, it must not be null.

  • ndims

    This specifies a fixed number of dimensions which the piddle must have. Don't mix this with ndims_min or ndims_max.

  • ndims_min

    The minimum number of dimensions the piddle may have. Don't specify this with ndims.

  • ndims_max

    The maximum number of dimensions the piddle may have. Don't specify this with ndims.

djerius@github 0 comments

Type::TinyX::Facets Easily create a facet parameterized Type::Tiny type


Type::TinyX::Facets makes it easy to create parameterized types with facets.

Type::Tiny allows definition of types which can accept parameters:

use Types::Standard -types;

my $t1 = Array[Int];
my $t2 = Tuple[Int, HashRef];

This defines $t1 as an array of integers, and $t2 as a tuple of two elements, an integer and a hash.

Parameters are passed as a list to the parameterized constraint generation machinery, and there is great freedom in how they may be interpreted.

This module makes it easy to create a parameterized type which takes name-value pairs, or facets. (The terminology is taken from Types::XSD::Lite, to which this module owes its existence.)


djerius@github 0 comments

Sql::Textify Run a SQL query and get the result in text format (markdown, html, csv)

This module executes SQL queries and produces a text output (markdown, html, csv, ...). Connection details, username and password can be specified in a C-style comment inside the SQL query or using the constructor:

my $t = Sql::Textify->new(
    conn => 'dbi:SQLite:dbname=test.sqlite3',
    username => 'myusername',
    password => 'mypassword',
    format => 'markdown',
    layout => 'table',
);
my $text = $t->textify('select * from gardens');

I couldn't find any module that converts data to markdown, so I have written one. I use a script based on this code every day, because I want to run SQL queries against various DBMSs at the command line (or from Sublime Text), and I like to get the results in markdown or other text formats.

fthiella@github 1 comment

Vlc::Engine vlc engine

Vlc::Engine is a Perl module that allows users to use libvlc from Perl.

jamesroseaxl@twitter 0 comments

Data::Enumerable::Lazy A lazy enumerator/generator optimised for non-flat collections

This library is yet another implementation of the lazy generator + enumerable pattern for Perl 5. First of all, it is not a parallel calculation framework; for parallelism problems, Generator::Object would be the way to go.

This library provides building blocks for lazy data manipulation. One of the key features of this library is that it abstracts the end users away from a multi-level nested sub-collection enumeration whereas a classical enumerator normally operates within a flat collection context. Think of it this way: this library provides a built-in functionality to resolve enumerable steps in micro-batches, and these micro-batches might be enumerables as well and so on and so forth. But it definitely does not force the micro-batched way.

A quick example: let's say there is a task: read multiple plain text files word-by-word. This task contains several nesting enumerable loops:

-> file
  -> line
    -> word (we assume there is no word wrap, but it is not a problem for our model)

Implemented in imperative style, the program would look like:

foreach my $file (@files) {
  foreach my $line ($file->read_lines) {
    foreach my $word ($line->split_words) {
      # do something
    }
  }
}

Let's say there is one more level of complexity: multi-partition setup:

foreach my $partition (@partitions) {
  foreach my $file ($partition->ls_files) {
    # ... and so on
  }
}

We have to implement the loops over and over again, exposing the internal knowledge about the partitions, files, lines etc. But we can do it better. Let's examine a lazy enumerable approach:

my $word_enum = Data::Enumerable::Lazy->from_list(@partitions)
  ->continue({ on_next => sub { my ($self, $partition) = @_; $self->yield($partition->ls_files) } })
  ->continue({ on_next => sub { my ($self, $file)      = @_; $self->yield($file->read_lines   ) } })
  ->continue({ on_next => sub { my ($self, $line)      = @_; $self->yield($line->split_words  ) } });

while ($word_enum->has_next) {
  my $word = $word_enum->next;
  # do something
}

The benefit is that the end user might have zero knowledge about partitions, files, lines, etc. Adding a new nesting loop for multi-computer fetch? Easy.

The end user is focused on consuming the words, not intermediate stages. A wrapper library might hide the implementation details and return the enumerable back to the end user.

Enumerables are single-pass calculation units. What this means: an enumerable is stateful; once it has reached the end of the sequence, it will not rewind to the beginning.

Enumerables have an internal buffer: another enumerable which preserves the pre-fetched collection. The diagram below illustrates the buffering algorithm.

[enumerable.has_next] -> [_buffer.has_next] -> yes -> return true
                                            -> no  -> result = [enumerable.on_has_next] -> return result

[enumerable.next] -> [_buffer.has_next] -> yes -> return [_buffer.next]
                                        -> no  -> result = [enumerable.on_next] -> [enumerable.set_buffer(result)] -> return [_buffer.next]
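The buffering algorithm above can be sketched with plain Perl data structures (a simplified illustration, not the module's code; the helper names are made up):

```perl
use strict;
use warnings;

# A simplified enumerable: a hash with a buffer (array ref) plus
# user callbacks. has_next() consults the buffer first and falls
# back to on_has_next(); next() refills the buffer via on_next().
sub enum_has_next {
    my ($e) = @_;
    return 1 if @{ $e->{buffer} };    # buffered elements left
    return $e->{on_has_next}->();     # ask the generator
}

sub enum_next {
    my ($e) = @_;
    unless (@{ $e->{buffer} }) {
        # Refill the buffer with the next micro-batch.
        push @{ $e->{buffer} }, $e->{on_next}->();
    }
    return shift @{ $e->{buffer} };
}

# A generator that serves the numbers 1..3 as one micro-batch.
my @batch = ([1, 2, 3]);
my $e = {
    buffer      => [],
    on_has_next => sub { scalar @batch },
    on_next     => sub { @{ shift @batch } },
};

my @out;
push @out, enum_next($e) while enum_has_next($e);
# @out is (1, 2, 3)
```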


A basic range

This example implements a range generator from $from until $to. In order to generate this range we define 2 callbacks: on_has_next() and on_next(). The first one is the point of truth as to whether the sequence has any more non-visited elements, and the second one returns the next element in the sequence and advances the internal sequence iterator.

sub basic_range {
  my ($from, $to) = @_;
  my $current = $from;
  Data::Enumerable::Lazy->new({
    on_has_next => sub {
      return $current <= $to;
    },
    on_next => sub {
      my ($self) = @_;
      return $self->yield($current++);
    },
  });
}

on_has_next() makes sure the current value does not exceed $to value, and on_next() yields the next value of the sequence. Note the yield method. An enumerable developer is expected to use this method in order to return the next step value. This method does some internal bookkeeping and smart caching.


# We initialize a new range generator from 0 to 10 inclusive.
my $range = basic_range(0, 10);
# We check if the sequence has elements in its tail.
while ($range->has_next) {
  # In this very line the state of $range is being changed
  say $range->next;
}
is $range->has_next, 0, '$range has been iterated completely';
is $range->next, undef, 'A fully iterated sequence returns undef on next()';

Prime numbers

The prime numbers form an infinite sequence of natural numbers. This example implements a very basic prime number generator.

my $prime_num_stream = Data::Enumerable::Lazy->new({
  # This is an infinite sequence
  on_has_next => sub { 1 },
  on_next => sub {
    my $self = shift;
    # We save the result of the previous step
    my $next = $self->{_prev_} // 1;
    LOOKUP: while (1) {
      $next++;
      # Check all numbers from 2 to sqrt(N)
      foreach (2 .. floor(sqrt($next))) {
        ($next % $_ == 0) and next LOOKUP;
      }
      last LOOKUP;
    }
    # Save the result in order to use it in the next step
    $self->{_prev_} = $next;
    # Return the result
    $self->yield($next);
  },
});
What's remarkable about this specific example is that one cannot simply call to_list() to get all elements of the sequence: the enumerable will throw an exception, claiming it is an infinite sequence. Therefore, we should use next() to get elements one by one, or use another handy method, take(), which returns the first N results.

Nested enumerables

In this example we will output the numbers of a 10x10 multiplication table. What's interesting in this example is that there are 2 sequences: primary and secondary. The primary on_next() returns a secondary sequence, which generates the result of multiplying 2 numbers.

# A new stream based on a range from 1 to 10
my $mult_table = Data::Enumerable::Lazy->from_list(1 .. 10)->continue({
  on_next => sub {
    my ($self, $i) = @_;
    # The primary stream returns another sequence, based on a range
    $self->yield(
      Data::Enumerable::Lazy->from_list(1 .. 10)->continue({
        on_next => sub {
          # $_[0] is the substream self
          # $_[1] is the next substream sequence element
          $_[0]->yield( $_[1] * $i )
        },
      })
    );
  },
});

Another feature which is demonstrated here is the batched result generation. Let's iterate the sequence step by step and see what happens inside.

$mult_table->has_next;         # returns true based on the primary range, _buffer is
                               # empty
$mult_table->next;             # returns 1, the secondary sequence is now stored as
                               # the primary enumerable buffer and 1 is being served
                               # from this buffer
$mult_table->has_next;         # returns true, resolved by the state of the buffer
$mult_table->next;             # returns 2, moves buffer iterator forward, the
                               # primary sequence on_next() is _not_ being called
                               # this time
$mult_table->next for (3..10); # The last iteration completes the buffer
                               # iteration cycle
$mult_table->has_next;         # returns true, but now it calls the primary
                               # on_has_next()
$mult_table->next;             # returns 2 as the first element in the next
                               # secondary sequence (which is 1 again) multiplied by
                               # the 2nd element of the primary sequence (which is 2)
$mult_table->to_list;          # Generates the tail of the sequence:
                               # [4, 6, ..., 80, 90, 100]
$mult_table->has_next;         # returns false as the buffer is empty now and the
                               # primary sequence on_has_next() says there is nothing
                               # more to iterate over.


on_next($self, $element) :: CodeRef -> Data::Enumerable::Lazy | Any

on_next is a code ref, a callback which is being called every time the generator is in demand for a new bit of data. Enumerable buffers up the result of the previous calculation and if there are no more elements left in the buffer, on_next() would be called.

$element is defined when the current collection is a continuation of another enumerable. I.e.:

my $enum = Data::Enumerable::Lazy->from_list(1, 2, 3);
my $enum2 = $enum->continue({
    on_next => sub {
        my ($self, $i) = @_;
        $self->yield($i * $i)
    },
});
$enum2->to_list; # generates 1, 4, 9

In this case $i would be defined and it comes from the original enumerable.

The function is supposed to return an enumerable; in this case it would be kept as the buffer object. If this function returns any other value, it would be wrapped in a Data::Enumerable::Lazy->singular(). There is a way to prevent an enumerable from wrapping your return value in an enum and keep it in a raw state instead: provide _no_wrap=1.

on_has_next($self) :: CodeRef -> Bool

on_has_next is a code ref, a callback to be called whenever the enumerable is about to resolve has_next() method call. Similar to on_next() call, this one is also triggered whenever an enumerable runs out of buffered elements. The function should return boolean.

A method that returns 1 all the time is the way to initialise an infinite enumerable (see infinity()). If it returns 0 no matter what, it would be an empty enumerable (see empty()). Normally you stay somewhere in the middle and implement some state check logic in there.

on_reset($self) :: CodeRef -> void

This is a callback to be called in order to reset the state of the enumerable. This callback should be defined in the same scope as the enumerable itself. The library provides nothing magical but a callback and a handle to call it, so the state cleanup is completely on the developer's side.

is_finite :: Bool

A boolean flag indicating whether an enumerable is finite or not. By default, enumerables are treated as infinite, which means some functions will throw an exception, like: to_list() or resolve().

Make sure not to mark an infinite enumerable as finite and then call methods defined only for finite enumerables; doing so will create an infinite loop on the resolution.



Function next() is the primary interface for accessing elements of an enumerable. It will do some internal checks and, if there are no elements to be served from an intermediate buffer, it will resolve the next step by calling the on_next() callback. Enumerables are composable: one enumerable might be based on another enumerable. E.g.: a sequence of natural-number squares is based on the sequence of the natural numbers themselves. In other words, a sequence is defined as a tuple of another sequence and a function which is lazily applied to every element of that sequence.

next() accepts 0 or more arguments, which would be passed to on_next() callback.

next() is expected to do the heavy-lifting job in opposite to has_next(), which is supposed to be cheap and fast. This statement flips upside down whenever grep() is applied to a stream. See grep() for more details.


has_next() is the primary entry point for getting information about the state of an enumerable. If the method returns false, there are no more elements to be consumed; i.e. the sequence has been iterated completely. Normally this means the end of an iteration cycle.

Enumerables use internal buffers in order to support batched on_next() resolutions. If there are some elements left in the buffer, on_next() won't call on_has_next() callback immediately. If the buffer has been iterated completely, on_has_next() would be called.

on_has_next() should be fast at resolving the state of an enumerable, as it is used for condition state checks.


This method is a generic entry point for an enumerable reset. In fact, it is basically a wrapper around the user-defined on_reset().


This function transforms a lazy enumerable to a list. Only finite enumerables can be transformed to a list, so the method checks if an enumerable is created with is_finite=1 flag. An exception would be thrown otherwise.


Creates a new enumerable by applying a user-defined function to the original enumerable. Works the same way as Perl's map function, but lazily.

reduce($acc, $callback)

Resolves the enumerable and returns the resulting state of the accumulator $acc provided as the 1st argument. $callback should always return the new state of $acc.

reduce() is defined for finite enumerables only.

grep($callback, $max_lookahead)

grep() is a function which returns a new enumerable by applying a user-defined filter function.

grep() might be applied to both finite and infinite enumerables. In the case of an infinite enumerable, there is an additional argument specifying the max number of lookahead steps. If an element satisfying the condition cannot be found within max_lookahead steps, the enumerable is considered completely iterated and has_next() will return false.

grep() returns a new enumerable with quite special properties: has_next() performs a lookahead, calling the original enumerable's next() method in order to find an element for which the user-defined function returns true; next(), on the other hand, returns the value that was pre-fetched by has_next().
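The lookahead behaviour can be sketched with plain closures (a simplified illustration; the real module's internals differ):

```perl
use strict;
use warnings;

# A filtered generator: has_next() performs the lookahead, pulling
# from the source until the predicate matches or the lookahead
# budget is exhausted; next() serves the prefetched element.
sub grep_enum {
    my ($source_next, $pred, $max_lookahead) = @_;
    my $prefetched;
    my $has_next = sub {
        return 1 if defined $prefetched;
        for (1 .. $max_lookahead) {
            my $v = $source_next->();
            if ($pred->($v)) { $prefetched = $v; return 1 }
        }
        return 0;    # lookahead budget exhausted
    };
    my $next = sub {
        my $v = $prefetched;
        undef $prefetched;
        return $v;
    };
    return ($has_next, $next);
}

# An infinite source of natural numbers, filtered to multiples of 10.
my $n = 0;
my ($has_next, $next) =
    grep_enum(sub { ++$n }, sub { $_[0] % 10 == 0 }, 100);
my @out;
push @out, $next->() while @out < 3 && $has_next->();
# @out is (10, 20, 30)
```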


Resolves an enumerable completely. Applicable for finite enumerables only. The method returns nothing.


Resolves the first $N_elements and returns the resulting list. If there are fewer than N elements in the enumerable, the entire enumerable is returned as a list.


This function takes elements until it meets the first one that does not satisfy the conditional callback. The callback receives only 1 argument: an element. It should return true if the element should be taken. Once it returns false, the stream is over.

continue($ext = { on_next => sub {}, ... })

Creates a new enumerable by extending the existing one. on_next is the only mandatory argument. on_has_next might be overridden if some custom logic comes into play.

is_finite is inherited from the parent enumerable by default. All additional attributes would be transparently passed to the constructor.


This method is supposed to be called from on_next callback only. This is the only valid way for an Enumerable to return the next result. Effectively, it ensures the returned result conforms to the required interface and is wrapped in a lazy wrapper if needed.



Returns an empty enumerable. Effectively it means an equivalent of an empty array. has_next() will return false and next() will return undef. Useful whenever an on_next() step wants to return an empty result set.


Returns an enumerable with a single element $val. Actively used as an internal data container.


Returns a new enumerable instantiated from a list. The easiest way to initialise an enumerable. In fact, all elements are already resolved so this method sets is_finite=1 by default.


Creates an infinite enumerable by cycling the original list. E.g. if the original list is [1, 2, 3], cycle() will generate an infinite sequence like: 1, 2, 3, 1, 2, 3, 1, ...
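The behaviour can be sketched with a plain closure (illustration only, not the module's code):

```perl
use strict;
use warnings;

# An infinite cycling generator over a fixed list: each call to
# the returned closure serves the next element, wrapping around.
sub make_cycle {
    my @list = @_;
    my $i = 0;
    return sub {
        my $val = $list[ $i % @list ];
        $i++;
        return $val;
    };
}

my $next = make_cycle(1, 2, 3);
my @first7 = map { $next->() } 1 .. 7;
# @first7 is (1, 2, 3, 1, 2, 3, 1)
```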


Returns a new infinite enumerable. has_next() always returns true whereas next() returns undef all the time. Useful as an extension basis for infinite sequences.

merge($stream1 [, $stream2 [, $stream3 [, ...]]])

This function merges one or more streams together by fanning out next() method calls among the non-empty streams. Returns a new enumerable instance, which:

  • Has next elements as long as at least one of the streams does.

  • Returns the next element by picking it one-by-one from the streams.

  • Is finite if and only if all the streams are finite.

If one of the streams is over, it is taken into account and next() will continue choosing from the non-empty ones.


Oleg S


icanhazbroccoli@github 0 comments

USB::TMC Perl interface to USBTMC Test&Measurement backend

Based on USB::LibUSB.

Does not yet support the additional usb488_subclass.

amba@github 0 comments

Parallel::Regex::PCRE Apply regexes to buffer via pthread pool

The proposed module would provide a convenient interface to a pool of pthreads for parallelized regular expression matching.

Using the class:

After initialization, every time the pool was given an input string via "match", each of the workers in the pool would try to apply disjoint subsets of regular expressions to the string. The caller would block until all of the regular expressions had been applied, and receive a count of total matches as a return value.

Any matches could then be accessed through an iterator method "next_match", which would return a numeric regex id (corresponding to the position of the regex in "regex_list") and any text captured by it.

Use cases

The suitability of this module to a given application depends entirely on whether the overhead of setting up the pool and traversing matches exceeds the performance benefit of applying regular expressions to the input in parallel.

The first expected use-case for this module is an email spam filter, which might apply thousands or tens of thousands of regular expressions to each email document, few of which are expected to match at all.

This use-case, where the number of match() calls is huge, each input is large, the number of regular expressions is large, and the number of matches is small, is on the optimal end of the cost/benefit spectrum. Small numbers of regular expressions being applied to small strings would be on the other end, and I expect would be better served just using ordinary perl regular expressions serially. I look forward to testing these expectations against real data.

Thoughts on implementation:

The guts of the module would exist mostly as XS. The initial implementation would use PCRE simply to keep C development time short. Other implementations with the same interface might be provided as Parallel::Regex::* if PCRE proves unsatisfactory.

new() would initialize:

  • A pool of pthread worker threads,

  • A set of compiled PCRE regular expressions,

  • A mapping of those workers to those compiled regexes,

  • A pipe,

  • Work mutexes on which worker threads are blocked/unblocked,

  • An input pointer,

  • An array of output pointers, one per worker,

  • An active worker counter,

  • A shared state mutex for protecting the worker counter and output counters


The match method would simply:

  • Reset the iterator state,

  • Set the active worker counter to the pool size,

  • Update the input pointer to point at the buffer,

  • Unlock the worker mutexes,

  • Read a match count from the pipe (blocking until available),

  • Return match count to caller


The worker threads would remain blocked on their work mutex until woken by the match method, then:

  • Apply each of their regexes to the input buffer (accessed via input pointer),

  • Write match data to their own local memory buffer (resizing the buffer and copying forward as needed) as matches are found,

  • When done with all regexes, acquire the shared state mutex,

  • Update its output pointer to point at its local memory buffer,

  • Update the total hit count,

  • Decrement the active worker counter,

  • Release the shared state mutex,

  • If active worker counter is zero (it is last thread to finish), write total hit count to pipe,

  • Lock its own worker mutex to put itself to sleep until the next time match method is called.


The iterator is straightforward, needing only to maintain two integers of state:

  • An index into the output pointer array,

  • An index into the next record in the corresponding worker's local output buffer.


The workers' local output buffer would contain a series of variable-length records, starting with the regex id (or 0xFFFF for the sentinel record) and a count of matches, and, for each match, a count of groups (which can be zero) followed by length-prefixed group text strings.
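Such a record layout might be sketched with Perl's pack/unpack (simplified to a single match per record; the 16-bit field widths and the helper names are assumptions for illustration, since the real format would live in C):

```perl
use strict;
use warnings;

# One output record: a 16-bit little-endian regex id, a 16-bit
# group count, then length-prefixed group strings (16-bit length
# followed by the bytes of each captured group).
sub pack_record {
    my ($regex_id, @groups) = @_;
    my $buf = pack('v v', $regex_id, scalar @groups);
    $buf .= pack('v/a*', $_) for @groups;
    return $buf;
}

sub unpack_record {
    my ($buf) = @_;
    my ($regex_id, $n) = unpack('v v', $buf);
    my $off = 4;    # past the two 16-bit header fields
    my @groups;
    for (1 .. $n) {
        my ($len) = unpack("\@$off v", $buf);
        push @groups, unpack("\@" . ($off + 2) . " a$len", $buf);
        $off += 2 + $len;
    }
    return ($regex_id, @groups);
}

my $rec = pack_record(7, 'foo', 'quux');
my ($id, @g) = unpack_record($rec);
# $id is 7, @g is ('foo', 'quux')
```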

Anticipated follow-up work:

I'm not going to spend any time on premature optimization until I must, but I expect some regexes to take more time to run than others, and unless this is reflected in the distribution of regexes to worker threads, there will be one worker thread (the slowest) limiting the performance of the entire system. Since the caller does not unblock until after all of the threads are done, the slowest thread determines time blocked.

A couple of solutions occur to me:

  • I could push the burden of figuring it out onto the user, and have them provide weights with their regular expressions. That would be simplest for me, but more complex for the user.

  • Alternatively I could provide a way to switch the pool between "training" and "working" modes, having them measure the time each regex takes to complete during "training" mode and rewriting the regex -> worker mapping appropriately on the transition to "working" mode. Thus the user could train the pool with a few rounds of input data (at the cost of some performance), let it reshuffle the mapping, and then process the rest of the data at full speed. That would be simplest for the user, but more complex for me.

The distinction might not matter that much, as I expect to be the only user, at least for a while.


Also, I expect to write a Parallel::Regex::Reference module which is actually a serial implementation written in pure-perl. Its purpose would be twofold:

  • To compare against the performance of Parallel::Regex::PCRE, so that use of the module can be justified (or not!),

  • To provide a fallback solution, so that if anyone encounters problems with Parallel::Regex::PCRE they can simply switch to the other module without otherwise changing their code.
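A pure-Perl serial matching core along these lines might look roughly like this (illustrative only; the function name and the shape of the returned match records are assumptions, not the proposed module's interface):

```perl
use strict;
use warnings;

# Apply every regex to the input serially, recording the regex id
# and any captured groups for each match. This mirrors what the
# parallel version would do, without threads.
sub match_all {
    my ($regex_list, $input) = @_;
    my @matches;
    for my $id (0 .. $#$regex_list) {
        # In list context a match returns its captured groups
        # (or (1) for a groupless regex); an empty list means
        # this regex did not match.
        my @captures = ($input =~ $regex_list->[$id])
            or next;
        push @matches, [ $id, @captures ];
    }
    return @matches;
}

my @regexes = ( qr/(\d+)/, qr/nomatch/, qr/(\w+)\@(\w+)/ );
my @matches = match_all(\@regexes, 'order 42 from alice@example');
# Two regexes match: id 0 captures '42',
# id 2 captures 'alice' and 'example'
```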

ttkciar@github 1 comment

Web::HackerNews Scrape the HTML of Hackernews

Given a Hacker News page, scrape the HTML to extract the contents. For example, get the title and the "hide" URL, etc., so that one can automatically match the titles against a regular expression then "hide" stories about Elon Musk, James Damore, react.js, Google memos, or other tedious things and people.

This is an HTML scraper and not related to WebService::HackerNews by Neil Bowers. Note that Hacker News uses tables and "center" tags for layout, with no particular logical subdivision.

benkasminbullock@github 2 comments