PrePAN

Sign in to PrePAN

PrePAN provides a place
to discuss your modules.

CLOSE

Requests for Reviews Feed

Sort::Naturally::ICU Perl extension for human-friendly ("natural") sort order, which using ICU library for locale aware sorting.

DESCRIPTION

See: http://search.cpan.org/~serval/Sort-Naturally-XS-0.7.8/lib/Sort/Naturally/XS.pm#DESCRIPTION

LOCALE AWARE SORTING

The following example demonstrates default way to locale aware sorting:

use POSIX;

my @list = ('a.'.'c', 'A'..'B');

setlocale(POSIX::LC_ALL, 'en_US.utf8');
my @result = sort @list;
# @result contains  a, A, b, B, c, C

setlocale(POSIX::LC_ALL, 'en_CA.utf8');
@result = sort @list;
# @result contains  A, a, B, b, C, c

The problem is that not all Unix-like OSs completely support POSIX, in fact only Linux fully POSIX compatible. Therefore you can't use above approach in Mac OS or FreeBSD. This module is designed to solve this issue.

To be able to sort a list with an arbitrary locale at any platform it's necessary to use the sorted function with a locale keyword argument. locale should be LDML locale identifier:

use Sort::Naturally::ICU qw/sorted/;

my $list = ['a.'.'c', 'A'..'B'];

my $result_us = sorted($list, locale => 'en-US-u-va-posix');
# $result_us contains A, B, C, a, b, c

my $result_ca = sorted($list, locale => 'en-CA-u-va-posix');
# $result_ca contains a, A, b, B, c, C

the_serval@twitter 0 comments

Plack::App::URLMux PSGI/Plack component for dispatching multiple applications based on URL path and patterns

The module is designed for multiplexing applications by URL path and patterns. This module is based on Plack::App::URLMap, but it optimized to dispatching a lot of URLs without performance regression. Additionally this module provide the ability to assign routing rules using URL patterns with named parameters. The format of parameter consist of ':' followed by parameter name and quantifier which is the same as for regexp expressions.

The idea was in that URL in RFC1738 for HTTP defined as hierarchical resource locator with '/' as delimeter and the package splits url-path on this parts, searches the application by them in mapped search tree and fills the matched named patters. This module defines value of a named parameter as one o more parts of the url-path (hsegment in RFC terms) in array context and returns list of matched parameters in PSGI/Plack environment 'plack.urlmux,params.url' as array of name and value pairs.

The namespace for Plack additional modules can be used for components as Plack::App and for middleware as Plack::Middleware, so I have chosen the Plack::App::URLMux name. I send a email to author of Plack modules about naming the module and did not receive feedback, so I decide to write about the package here.

I don`t like the idea to implement a URL multiplexer as regexp expression URL parser as in (Python dispatcher)[https://docs.djangoproject.com/en/1.11/topics/http/urls/#example] for example, due its performance issues for a lot of URL rules. This module solves the performance problem by simplify the regexp expressions to named patterns and operates more larger constructions - url-path segments.

Any suggestions, comments and criticism are welcome.

AlexBurnes@github 0 comments

Convert::Maker Make a conversion generator for fixed keys

A conversion generator which outputs C code. The C code output converts hash keys to values in the style of Data::Munge's list2re or lex. Construct automata to translate a fixed list of inputs into a certain set of outcomes.

The projected use of this is in creating very fast converters for translating symbol tables into other symbols. For example, to convert "ASCII IPA" into Unicode symbols very rapidly, to convert Chinese characters into Pinyin, etc.

benkasminbullock@github 0 comments

Code::Search Search code for strings

Search source code for strings.

Trigram index of medium number (40,000 or so) source code files.

Already have developed code but it is not in a state fit to release, and it is all in C except for a few scripts.

benkasminbullock@github 4 comments

Hash::Wrap (formerly Return::Object) create lightweight on-the-fly objects from hashes

UPDATE

This module is now known as Hash::Wrap

The text below represents the original module, Return::Object. Please see the github repo for the current documentation.

DESCRIPTION

This module provides routines which encapsulate a hash as an object. The object provides methods for keys in the hash; attempting to access a non-existent key via a method will cause an exception.

The impetus for this was to encapsulate data returned from a subroutine or method (hence the name). Returning a bare hash can lead to bugs if there are typos in hash key names when accessing the hash.

It is not necessary for the hash to be fully populated when the object is created. The underlying hash may be manipulated directly, and changes will be reflected in the object's methods. To prevent this, consider using the lock routines in Hash::Util on the object after creation.

Only hash keys which are legal method names will be accessible via object methods.

Object construction and constructor customization

By default Object::Return exports a return_object constructor which, given a hashref, blesses it directly into the Return::Object::Class class.

The constructor may be customized to change which class the object is instantiated from, and how it is constructed from the data. Return::Object uses Exporter::Tiny to perform the customization. For example,

use Return::Object
  return_object => { -as => 'return_cloned_object',
                     -clone => 1 };

will create a version of return_object which clones the passed hash and is imported as return_cloned_object. To import it under the original name, return_object, leave out the -as option.

The following options are available to customize the constructor.

  • -as => subroutine name

    This is optional, and imports the customized version of return_object with the given name.

  • -class => class name

    The object will be blessed into the specified class. If the class should be created on the fly, specify the -create option. See "Object Classes" for what is expected of the object classes. This defaults to Object::Return::Class.

  • -create => boolean

    If true, and -class is specified, a class with the given name will be created.

  • -copy => boolean

    If true, the object will store the data in a shallow copy of the hash. By default, the object uses the hash directly.

  • -clone => boolean

    If true, the object will store the data in a deep copy of the hash, made with "dclone" in Storeable. By default, the object uses the hash directly.

Object Classes

An object class has the following properties:

  • The class must be a subclass of Return::Object::Base.
  • The class typically does not provide any methods, as they would mask a hash key of the same name.
  • The class need not have a constructor. If it does, it is passed a hashref which it should bless as the actual object. For example:

    package My::Result;
    use parent 'Return::Object::Base';
    
    
    sub new {
      my  ( $class, $hash ) = @_;
      return bless $hash, $class;
    }
    

    This excludes having a hash key named new.

Return::Object::Base provides an empty DESTROY method, a can method, and an AUTOLOAD method. They will mask hash keys with the same names.

BUGS AND LIMITATIONS

You can make new bug reports, and view existing ones, through the web interface at https://rt.cpan.org/Public/Dist/Display.html?Name=Return-Object.

SEE ALSO

Please see those modules/websites for more information related to this module.

AUTHOR

Diab Jerius

COPYRIGHT AND LICENSE

This software is Copyright (c) 2017 by Smithsonian Astrophysical Observatory.

This is free software, licensed under:

The GNU General Public License, Version 3, June 2007

djerius@github 3 comments

Types::PDL PDL types using Type::Tiny

DESCRIPTION

This module provides Type::Tiny compatible types for PDL.

Types

Piddle

Allows an object blessed into the class PDL, e.g.

validate( [pdl], Piddle );

Piddle accepts the following parameters ("Parameters"):

empty
ndims
ndims_min
ndims_max

Parameters

Some types take optional parameters which add additional constraints on the object. For example, to indicate that only empty piddles are accepted:

validate( [pdl], Piddle[ empty => 1 ] );

The available parameters are:

  • empty

    This accepts a boolean value; if true the piddle must be empty (i.e. the isempty method returns true), if false, it must not be empty.

  • null

    This accepts a boolean value; if true the piddle must be a null piddle, if false, it must not be null.

  • ndims

    This specifies a fixed number of dimensions which the piddle must have. Don't mix use this with ndims_min or ndims_max.

  • ndims_min

    The minimum number of dimensions the piddle may have. Don't specify this with ndims.

  • ndims_max

    The maximum number of dimensions the piddle may have. Don't specify this with ndims.

djerius@github 0 comments

Type::TinyX::Facets Easily create a facet parameterized Type::Tiny type

DESCRIPTION

Type::TinyX::Facets make it easy to create parameterized types with facets.

Type::Tiny allows definition of types which can accept parameters:

use Types::Standard -types;

my $t1 = Array[Int];
my $t2 = Tuple[Int, HashRef];

This defines $t1 as an array of integers. and $t2 as a tuple of two elements, an integer and a hash.

Parameters are passed as a list to the parameterized constraint generation machinery, and there is great freedom in how they may be interpreted.

This module makes it easy to create a parameterized type which takes name - value pairs or,facets. (The terminology is taken from Types::XSD::Lite, upon which this module owes its existence.)

THANKS

djerius@github 0 comments

Sql::Textify Run a SQL query and get the result in text format (markdown, html, csv)

This module executes SQL queries and produces a text output (markdown, html, csv, ...). Connection details, username and password can be specified in a C-style comment inside the SQL query or using the constructor:

my $t = new Sql::Textify(
    conn => 'dbi:SQLite:dbname=test.sqlite3',
    username => 'myusername',
    password => 'mypassword',
    format => 'markdown',
    layout => 'table'
);
my $text = $t->textify('select * from gardens');

I couldn't find any module that converts data to markdown so I have written one. I am using a script that makes use of this code everyday because I want do run SQL queries on various dbms at the command line (or using sublime text) and I like to get the result in markdown format or in other text formats.

fthiella@github 1 comment

Vlc::Engine vlc engine

Vlc::Engine is a perl module that allow users to use libvlc via perl.

jamesroseaxl@twitter 0 comments

Data::Enumerable::Lazy A lazy enumerator/generator optimised for non-flat collections

This library is yet another implementation of lazy generator + enumerable pattern for Perl5. First of all, it is not a parallel calculation framework. For the parallelism problems, Generator::Object would be the way to go.

This library provides building blocks for lazy data manipulation. One of the key features of this library is that it abstracts the end users away from a multi-level nested sub-collection enumeration whereas a classical enumerator normally operates within a flat collection context. Think of it this way: this library provides a built-in functionality to resolve enumerable steps in micro-batches, and these micro-batches might be enumerables as well and so on and so forth. But it definitely does not force the micro-batched way.

A quick example: let's say there is a task: read multiple plain text files word-by-word. This task contains several nesting enumerable loops:

-> file
  -> line
    -> word (we assume there is no word wrap, but it is not a problem for our model)

Implemented in imperative style, the program would look like:

foreach my $file (@files) {
  foreach my $line ($file->read_lines) {
    foreach my $word ($line->split_words) {
      # do something
    }
  }
}

Let's say there is one more level of complexity: multi-partition setup:

foreach my $partition (@partitions) {
  foreach my $file ($partition->ls_files) {
    ...
  }
}

We have to implement the loops over and over again, exposing the internal knowledge about the partitions, files, lines etc. But we can do it better. Let's examine a lazy enumerable approach:

my $word_enum = Data::Enumerable::Lazy::from_list(@partitions)
  -> continue({ on_has_next => sub { my ($self, $partition) = @_; $self->yield($partition->ls_files) } })
  -> continue({ on_has_next => sub { my ($self, $file)      = @_; $self->yield($file->read_lines   ) } })
  -> continue({ on_has_next => sub { my ($self, $line)      = @_; $self->yield($line->split_words  ) } });

while ($word_enum->has_next) {
  my $word = $word_enum->next;
  # do something
}

The benefit is that the end user might have zero knowledge about partitions, files, lines, etc. Adding a new nesting loop for multi-computer fetch? Easy.

The end user is focused on consuming the words, not intermediate stages. A wrapper library might hide the implementation details and return the enumerable back to the end user.

Enumerables are single-pass calculation units. What it means: an enumerable is stateful, once it reached the end of the sequence, it will not rewind to the beginning.

Enumerables have internal buffer: another enumerable which preserves the pre-fetched collection. The fig. above illustrates the buffering algorithm.

[enumerable.has_next] -> [_buffer.has_next] -> yes -> return true
                                              -> no -> result = [enumerable.on_has_next] -> return result

[enumerable.next] -> [_buffer.has_next] -> yes -> return [_buffer.next]
                                          -> no -> result = [enumerable.next] -> [enumerable.set_buffer(result)] -> return _buffer.next

EXAMPLES

A basic range

This example implements a range generator from $from until $to. In order to generate this range we define 2 callbacks: on_has_next() and on_next(). The first one is used as point of truth whether the sequence has any more non-visited elements, and the 2nd one is to return the next element in the sequence and the one that changes the state of the internal sequence iterator.

sub basic_range {
  my ($from, $to) = @_;
  my $current = $from;
  Data::Enumerable::Lazy->new({
    on_has_next => sub {
      return $current  sub {
      my ($self) = @_;
      return $self->yield($current++);
    },
  });
}

on_has_next() makes sure the current value does not exceed $to value, and on_next() yields the next value of the sequence. Note the yield method. An enumerable developer is expected to use this method in order to return the next step value. This method does some internal bookkeeping and smart caching.

Usage:

# We initialize a new range generator from 0 to 10 including.
my $range = basic_range(0, 10);
# We check if the sequence has elements in it's tail.
while ($range->has_next) {
  # In this very line the state of $range is being changed
  say $range->next;
}
is $range->has_next, 0, '$range has been iterated completely'
is $range->next, undef, 'A fully iterated sequence returns undef on next()'

Prime numbers

Prime numbers is an infinite sequence of natural numbers. This example implements a very basic prime number generator.

my $prime_num_stream = Data::Enumerable::Lazy->new({
  # This is an infinite sequence
  on_has_next => sub { 1 },
  on_next => sub {
    my $self = shift;
    # We save the result of the previous step
    my $next = $self->{_prev_} // 1;
    LOOKUP: while (1) {
      $next++;
      # Check all numbers from 2 to sqrt(N)
      foreach (2..floor(sqrt($next))) {
        ($next % $_ == 0) and next LOOKUP;
      }
      last LOOKUP;
    }
    # Save the result in order to use it in the next step
    $self->{_prev_} = $next;
    # Return the result
    $self->yield($next);
  },
});

What's remarkable regarding this specific example is that one can not simply call to_list() in order to get all elements of the sequence. The enumerable will throw an exception claiming it's an infinitive sequence. Therefore, we should use next() in order to get elements one by one or use another handy method take() which returns first N results.

Nested enumerables

In this example we will output a numbers of a multiplication table 10x10. What's interesting in this example is that there are 2 sequences: primary and secondary. Primary on_next() returns secondary sequence, which generates the result of multiplication of 2 numbers.

# A new stream based on a range from 1 to 10
my $mult_table = Data::Enumerable::Lazy->from_list(1..10)->continue({
  on_has_next => sub {
    my ($self, $i) = @_;
    # The primary stream returns another sequence, based on range
    $self->yield(Data::Enumerable::Lazy->from_list(1..10)->continue({
      on_next => sub {
        # $_[0] is a substream self
        # $_[1] is a next substream sequence element
        $_[0]->yield( $_[1] * $i )
      },
    }));
  },
});

Another feature which is demonstrated here is the batched result generation. Let's iterate the sequence step by step and see what happens inside.

$mult_table->has_next;         # returns true based on the primary range, _buffer is
                               # empty
$mult_table->next;             # returns 1, the secondary sequence is now stored as
                               # the primary enumerable buffer and 1 is being served
                               # from this buffer
$mult_table->has_next;         # returns true, resolved by the state of the buffer
$mult_table->next;             # returns 2, moves buffer iterator forward, the
                               # primary sequence on_next() is _not_ being called
                               # this time
$mult_table->next for (3..10); # The last iteration completes the buffer
                               # iteration cycle
$mult_table->has_next;         # returns true, but now it calls the primary
                               # on_has_next()
$mult_table->next;             # returns 2 as the first element in the next
                               # secondary sequence (which is 1 again) multiplied by
                               # the 2nd element of the primary sequence (which is 2)
$mult_table->to_list;          # Generates the tail of the sesquence:
                               # [4, 6, ..., 80, 90, 100]
$mult_table->has_next;         # returns false as the buffer is empty now and the
                               # primary sequence on_has_next() says there is nothing
                               # more to iterate over.

OPTIONS

on_next($self, $element) :: CodeRef -> Data::Enumerable::Lazy | Any

on_next is a code ref, a callback which is being called every time the generator is in demand for a new bit of data. Enumerable buffers up the result of the previous calculation and if there are no more elements left in the buffer, on_next() would be called.

$element is defined when the current collection is a contuniation of another enumerable. I.e.:

my $enum = Data::Enumerable::Lazy->from_list(1, 2, 3);
my $enum2 = $enum->continue({
    on_next => sub {
        my ($self, $i) = @_;
        $self->yield($i * $i)
    }
});
$enum2->to_list; # generates 1, 4, 9

In this case $i would be defined and it comes from the original enumerable.

The function is supposed to return an enumerable, in this case it would be kept as the buffer object. If this function method returns any other value, it would be wrapped in a Data::Enumerable::Lazy->singular(). There is a way to prevent an enumerable from wrapping your return value in an enum and keeping it in a raw state by providing _no_wrap=1.

on_has_next($self) :: CodeRef -> Bool

on_has_next is a code ref, a callback to be called whenever the enumerable is about to resolve has_next() method call. Similar to on_next() call, this one is also triggered whenever an enumerable runs out of buffered elements. The function should return boolean.

A method that returns 1 all the time is the way to initialise an infinite enumerable (see infinity()). If it returns 0 no matter what, it would be an empty enumerable (see empty()). Normally you stay somewhere in the middle and implement some state check logic in there.

on_reset($self) :: CodeRef -> void

This is a callback to be called in order to reset the state of the enumerable. This callback should be defined in the same scope as the enumerable itself. The library provides nothing magical but a callback and a handle to call it, so the state cleanup is completely on the developer's side.

is_finite :: Bool

A boolean flag indicating whether an enumerable is finite or not. By default, enumerables are treated as infinite, which means some functions will throw an exception, like: to_list() or resolve().

Make sure to not mark an enumerable as finite and to call finite-size defined methods, in this case it will create an infinite loop on the resolution.

INSTANCE METHODS

next()

Function next() is the primary interface for accessing elements of an enumerable. It will do some internal checks and if there is no elements to be served from an intermediate buffer, it will resolve the next step by calling on_next() callback. Enumerables are composable: one enumerable might be based on another enumeration. E.g.: a sequence of natural number squares is based on the sequence of natural numbers themselves. In other words, a sequence is defined as a tuple of another sequence and a function which would be lazily applied to every element of this sequence.

next() accepts 0 or more arguments, which would be passed to on_next() callback.

next() is expected to do the heavy-lifting job in opposite to has_next(), which is supposed to be cheap and fast. This statement flips upside down whenever grep() is applied to a stream. See grep() for more details.

has_next()

has_next() is the primary entry point to get an information about the state of an enumerable. If the method returned false, there are no more elements to be consumed. I.e. the sequence has been iterated completely. Normally it means the end of an iteration cycle.

Enumerables use internal buffers in order to support batched on_next() resolutions. If there are some elements left in the buffer, on_next() won't call on_has_next() callback immediately. If the buffer has been iterated completely, on_has_next() would be called.

on_next() should be fast on resolving the state of an enumerable as it's going to be used for a condition state check.

reset()

This method is a generic entry point for a enum reset. In fact, it is basically a wrapper around user-defined on_reset().

to_list()

This function transforms a lazy enumerable to a list. Only finite enumerables can be transformed to a list, so the method checks if an enumerable is created with is_finite=1 flag. An exception would be thrown otherwise.

map($callback)

Creates a new enumerable by applying a user-defined function to the original enumerable. Works the same way as perl map {} function but it's lazy.

reduce($acc, $callback)

Resolves the enumerable and returns the resulting state of the accumulator $acc provided as the 1st argument. $callback should always return the new state of $acc.

reduce() is defined for finite enumerables only.

grep($callback, $max_lookahead)

grep() is a function which returns a new enumerable by applying a user-defined filter function.

grep() might be applied to both finite and infinite enumerables. In case of an infinitive enumerable there is an additional argument specifying max number of lookahead steps. If an element satisfying the condition could not be found in max_lookahead steps, an enumerable is considered to be completely iterated and has_next() will return false.

grep() returns a new enumerable with quite special properties: has_next() will perform a look ahead and call the original enumerable next() method in order to find an element for which the user-defined function will return true. next(), on the other side, returns the value that was pre-fetched by has_next().

resolve()

Resolves an enumerable completely. Applicable for finite enumerables only. The method returns nothing.

take($N_elements)

Resolves first $N_elements and returns the resulting list. If there are fewer than N elements in the enumerable, the entire enumerable would be returned as a list.

take_while($callback)

This function takes elements until it meets the first one that does not satisfy the conditional callback. The callback receives only 1 argument: an element. It should return true if the element should be taken. Once it returned false, the stream is over.

continue($ext = %{ on_next => sub {}, ... })

Creates a new enumerable by extending the existing one. on_next is the only mandatory argument. on_has_next might be overridden if some custom logic comes into play.

is_finite is inherited from the parent enumerable by default. All additional attributes would be transparently passed to the constructor.

yield($result)

This method is supposed to be called from on_next callback only. This is the only valid way for an Enumerable to return the next result. Effectively, it ensures the returned result conforms to the required interface and is wrapped in a lazy wrapper if needed.

CLASS METHODS

empty()

Returns an empty enumerable. Effectively it means an equivalent of an empty array. has_next() will return false and next() will return undef. Useful whenever a on_next() step wants to return an empty result set.

singular($val)

Returns an enumerable with a single element $val. Actively used as an internal data container.

from_list(@list)

Returns a new enumerable instantiated from a list. The easiest way to initialise an enumerable. In fact, all elements are already resolved so this method sets is_finite=1 by default.

cycle()

Creates an infinitive enumerable by cycling the original list. E.g. if the original list is [1, 2, 3], cycle() will generate an infinitive sequence like: 1, 2, 3, 1, 2, 3, 1, ...

infinity()

Returns a new infinite enumerable. has_next() always returns true whereas next() returns undef all the time. Useful as an extension basis for infinite sequences.

merge($tream1 [, $tream2 [, $tream3 [, ...]]])

This function merges one or more streams together by fan-outing next() method call among the non-empty streams. Returns a new enumerable instance, which: * Has next elements as far as at least one of the streams does. * Returns next element py picking it one-by-one from the streams. * Is finite if and only if all the streams are finite. If one of the streams is over, it would be taken into account and next() will continue choosing from non-empty ones.

AUTHOR

Oleg S

SEE ALSO

icanhazbroccoli@github 0 comments