PrePAN

Requests for Reviews Feed

CrawlerCommons::RobotRulesParser Perl implementation of Google crawler-commons RobotRulesParser

This module is a fairly close reproduction of the Crawler-Commons SimpleRobotRulesParser http://crawler-commons.github.io/crawlercommons/0.7/crawlercommons/robots/SimpleRobotRulesParser.html

From BaseRobotsParser javadoc:

Parse the robots.txt file in content, and return rules appropriate
for processing paths by userAgent. Note that multiple agent names
may be provided as comma-separated values; the order of these shouldn't
matter, as the file is parsed in order, and each agent name found in the
file will be compared to every agent name found in robotNames.
Also note that names are lower-cased before comparison, and that any
robot name you pass shouldn't contain commas or spaces; if the name has
spaces, it will be split into multiple names, each of which will be
compared against agent names in the robots.txt file. An agent name is
considered a match if it's a prefix match on the provided robot name. For
example, if you pass in "Mozilla Crawlerbot-super 1.0", this would match
"crawlerbot" as the agent name, because of splitting on spaces,
lower-casing, and the prefix match rule.
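The matching rule quoted above can be sketched in a few lines of Perl. This is only an illustration of the split, lower-case and prefix-match steps; the function name agent_matches is hypothetical and is not part of the module's API, and wildcard agents such as "*" are deliberately not handled.

```perl
use strict;
use warnings;

# Hypothetical helper illustrating the quoted rule; not the module's API.
sub agent_matches {
    my ($robot_names, $agent_name) = @_;
    my $agent = lc $agent_name;
    return 0 unless length $agent;

    # Split the supplied names on commas and whitespace, lower-casing them first.
    my @names = grep { length } split /[,\s]+/, lc $robot_names;

    # Match when the robots.txt agent name is a prefix of any supplied name.
    return scalar grep { index($_, $agent) == 0 } @names;
}

print agent_matches('Mozilla Crawlerbot-super 1.0', 'crawlerbot') ? "match\n" : "no match\n";
```

With the javadoc's own example, "crawlerbot" matches because it is a prefix of the lower-cased token "crawlerbot-super".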

The method failedFetch is not implemented.

akrobinson74@github 1 comment

Unoconv Use LibreOffice to convert file formats

Note: using my name in the module name was a stop-gap while I find a better name. Suggestions???

   Curley::Unoconv is a Perl extension that allows you to use LibreOffice (or OpenOffice.org) to convert a spreadsheet from any format LibreOffice can read to any format it can write. You can then do further processing with, e.g., Text::CSV_XS. The function only does the conversion if necessary, which is determined by comparing the two files' mtimes.

   It uses Dag Wieers' unoconv, available on most Linux distributions. http://dag.wieers.com/home-made/unoconv/
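   The mtime comparison described above could look roughly like the following sketch. The helper name needs_conversion is hypothetical, and treating a missing target as "conversion needed" is an assumption; the module's actual code may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not Curley::Unoconv's actual code.
# Returns 1 when the target is missing or older than the source, else 0.
sub needs_conversion {
    my ($source, $target) = @_;

    my $source_mtime = (stat $source)[9];
    die "cannot stat source '$source': $!" unless defined $source_mtime;

    my $target_mtime = (stat $target)[9];
    return 1 unless defined $target_mtime;    # assumption: no target yet means convert

    return $source_mtime > $target_mtime ? 1 : 0;
}
```

   A caller would run the conversion only when this returns true, which keeps repeated runs cheap when the source has not changed.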

   Note that the conversion can fail (and your program will die). A few notes:

   ·   The version number unoconv expects must agree with the version of LibreOffice installed. Install unoconv from the same repository you got LibreOffice from. E.g. if you install LibreOffice from Debian backports, then install unoconv from Debian backports.

   ·   Unoconv may not like it if the running instance of LibreOffice (if any) has a copy of the source file loaded. Save and close it.

   ·   If LibreOffice does not support the given format the whole kazoo will die messily.

EXPORT

   unoconv ()

SEE ALSO

   http://dag.wieers.com/home-made/unoconv/

   man 1 unoconv

DEFICIENCIES

   ·   There is not a lot of error checking. For example, we do not check to see if we can write to the target file.

   ·   There is no provision for passing other parameters to the unoconv program. Maybe make that an optional parameter?

   ·   How can we gracefully tell if LibreOffice supports the file format the caller wants?

   ·   We have no control over how well our back end converts. Other programs and Perl modules may do a better job for given pairs of source and destination formats.

   ·   OpenOffice::UNO might provide a more direct and flexible interface, and eliminate the need for Python.

charlescurley@github 1 comment

Set::SegmentTree Immutable segment trees in perl

wat? Segment Tree

A Segment tree is an immutable tree structure used to efficiently resolve a value to the set of segments which encompass it.

Why?

You have a large set of value intervals (like time segments!) and need to match them against a single value (like a time) efficiently.

This solution is suitable for problems where the set of intervals is known in advance of the queries, and the tree needs to be loaded and queried efficiently many orders of magnitude more often than the set of intervals is updated.

Data structure:

A segment is like this: [ Segment Label, Start Value , End Value ]

Start Value and End Value must be numeric.

Start Value must be less than End Value.

Each Segment Label must occur exactly once.

The speed of Set::SegmentTree depends on not being concerned with additional segment-relevant data, so it is expected one would use the label as an index into whatever persistence retains additional information about the segment.

Use walkthrough

my @segments = (['A',1,5],['B',2,3],['C',3,8],['D',10,15]);

This defines four intervals, some of which overlap and some of which don't:

- A - 1 to 5
- B - 2 to 3
- C - 3 to 8
- D - 10 to 15

Building the tree

my $tree = Set::SegmentTree::Builder->new(@segments)->build;

and doing a find within it would make these tests pass:

is_deeply [$tree->find(0)], [];
is_deeply [$tree->find(1)], [qw/A/];
is_deeply [$tree->find(2)], [qw/A B/];
is_deeply [$tree->find(3)], [qw/A B C/];
is_deeply [$tree->find(4)], [qw/A C/];
is_deeply [$tree->find(6)], [qw/C/];
is_deeply [$tree->find(9)], [];
is_deeply [$tree->find(12)], [qw/D/];

And although this structure is relatively expensive to build, it can be saved efficiently,

my $builder = Set::SegmentTree::Builder->new(@segments);
$builder->to_file('filename');

and then loaded and queried extremely quickly, making this pass in only milliseconds.

my $tree = Set::SegmentTree->from_file('filename');
is_deeply [$tree->find(3)], [qw/A B C/];

This structure is useful in the use case where...

1) value segment intersection is important
2) performance of loading and lookup is critical, but building is not

The Segment Tree data structure allows you to resolve any single value to the list of segments which encompass it in O(log(n)+nk).
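To make the "build once, query many times" idea concrete, here is a minimal pure-Perl sketch. It is not Set::SegmentTree's implementation: it is a simplified flat stand-in that precomputes, for every endpoint and every gap between endpoints, the labels covering it, and then answers a query with a binary search. The names build_index and find_labels are hypothetical, and treating both endpoints as inclusive is an assumption read off the tests above.

```perl
use strict;
use warnings;

# Hypothetical helpers, NOT Set::SegmentTree's code.
# Build step: expensive, done once.
sub build_index {
    my @segments = @_;
    my %seen;
    my @points = sort { $a <=> $b }
                 grep { !$seen{$_}++ } map { @{$_}[1, 2] } @segments;
    my %pos;
    @pos{@points} = (0 .. $#points);

    # Slot 2*i is the endpoint $points[$i]; slot 2*i+1 is the open gap after it.
    my @slots = map { [] } 0 .. 2 * $#points;
    for my $seg (sort { $a->[0] cmp $b->[0] } @segments) {
        my ($label, $start, $end) = @$seg;
        push @{ $slots[$_] }, $label for 2 * $pos{$start} .. 2 * $pos{$end};
    }
    return { points => \@points, slots => \@slots };
}

# Query step: binary search for the slot, then return its labels.
sub find_labels {
    my ($index, $value) = @_;
    my $points = $index->{points};
    return () if !@$points || $value < $points->[0] || $value > $points->[-1];

    # Find the last endpoint that is <= $value.
    my ($lo, $hi) = (0, $#$points);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi + 1) / 2);
        if ($points->[$mid] <= $value) { $lo = $mid }
        else                           { $hi = $mid - 1 }
    }
    my $slot = $points->[$lo] == $value ? 2 * $lo : 2 * $lo + 1;
    return @{ $index->{slots}[$slot] };
}

my $index = build_index(['A', 1, 5], ['B', 2, 3], ['C', 3, 8], ['D', 10, 15]);
print join(' ', find_labels($index, 3)), "\n";    # prints "A B C"
```

This trades memory for simplicity (each slot stores its full label list), which a real segment tree avoids by sharing labels across tree nodes.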

DavidIAm@github 5 comments

Net::ZooIt High level recipes for Apache ZooKeeper

DESCRIPTION

Net::ZooIt provides high level recipes for working with ZooKeeper in Perl, like locks, leader election or queues.

Net::ZooKeeper Handles

Net::ZooIt methods always take a Net::ZooKeeper handle object as a parameter and delegate the creation of that handle to the user. Rationale: enterprises often have customised ways to create those handles, and Net::ZooIt aims to be instantly usable without such customisation.

Automatic Cleanup

Net::ZooIt constructors return Net::ZooIt objects, which automatically clean up their znodes when they go out of scope at the end of the enclosing block. If you want to clean up earlier, call

  $zooit_obj->DESTROY;

Implication: if you call Net::ZooIt constructors in void context, the created object goes out of scope immediately, and your znodes are deleted. Net::ZooIt logs a ZOOIT_ERR message in this case.
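The scope-based cleanup described above relies on Perl's ordinary DESTROY mechanism. The sketch below demonstrates that mechanism with a hypothetical Demo::Guard class; it is not Net::ZooIt code and makes no claim about how Net::ZooIt implements its destructors.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a resource object; not part of Net::ZooIt.
package Demo::Guard;

sub new {
    my ($class, $name, $log) = @_;
    push @$log, "create $name";
    return bless { name => $name, log => $log }, $class;
}

sub DESTROY {
    my ($self) = @_;
    # Make repeated calls harmless (an explicit DESTROY, then Perl's own).
    return if $self->{cleaned}++;
    push @{ $self->{log} }, "cleanup $self->{name}";
}

package main;

my @log;
{
    my $guard = Demo::Guard->new('lock', \@log);
    push @log, 'working';
}                                   # $guard goes out of scope: cleanup runs here
Demo::Guard->new('void', \@log);    # void context: created and cleaned up at once
push @log, 'done';
print "$_\n" for @log;
```

The log shows "cleanup lock" right after "working", and "cleanup void" immediately after "create void", which is the void-context pitfall the text warns about.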

Error Handling

Net::ZooIt constructors return nothing in case of errors during creation.

Once you hold a lock or other resource, you're not notified of connection loss errors. If you need to take special action, check your Net::ZooKeeper handle.

If you give up Net::ZooIt resources during connection loss, your znodes cannot be cleaned up immediately. They enter a garbage collection queue, and Net::ZooIt cleans them up once the connection is resumed.

Logging

Net::ZooIt logs to STDERR. Log messages are prefixed with Zulu military time, the PID and the level of the current message: ZOOIT_DIE, ZOOIT_ERR, ZOOIT_WARN, ZOOIT_INFO or ZOOIT_DEBUG.

If Net::ZooIt throws an exception, it prints a ZOOIT_DIE level message before dying. This allows seeing the original error message even if an eval {} block swallows it.

To capture Net::ZooIt log messages to a file instead of STDERR, redirect STDERR to a new file handle in the normal Perl manner:

open(OLDERR, '>&', fileno(STDERR)) or die("unable to dup STDERR: $!");
open(STDERR, '>', $log_file) or die("unable to redirect STDERR: $!");

subogero@github 3 comments

List::Flat Another module to flatten an arrayref or list that may contain arrayrefs

I was unhappy with the several modules on CPAN I could find that do the relatively simple task of flattening a deep structure of array references into a single flat list, so I wrote another one.

Or rather, another two: one that handles circular references (flat) and one that doesn't (flatx). I suspect flatx is a terrible name but I couldn't think of one that wasn't something like "flat_unsafe" and that didn't encourage its use indiscriminately.
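For readers unfamiliar with the problem, here is a minimal sketch of flattening nested arrayrefs while guarding against circular references. The name flatten_safe is hypothetical, and silently skipping an arrayref that is already being expanded is an assumption made for this sketch; it is not necessarily how List::Flat's flat behaves.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

# Hypothetical helpers, not List::Flat's code.
sub flatten_safe { return _flatten({}, @_) }

sub _flatten {
    my ($active, @items) = @_;    # $active: refaddrs on the current expansion path
    my @out;
    for my $item (@items) {
        if (ref $item eq 'ARRAY') {
            my $addr = refaddr $item;
            next if $active->{$addr};    # assumption: skip a circular reference
            $active->{$addr} = 1;
            push @out, _flatten($active, @$item);
            delete $active->{$addr};
        }
        else {
            push @out, $item;
        }
    }
    return @out;
}

print join(' ', flatten_safe(1, [2, [3, [4]]], 5)), "\n";    # prints "1 2 3 4 5"
```

Tracking only the arrayrefs on the current path means the same arrayref may legitimately appear twice as a sibling and is expanded both times; only true cycles are cut.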

Thoughts welcome. Thanks.

aaronpriven@github 5 comments

English::Control Like English.pm, but with ${^OFS} instead of $OFS

So I had this idea that English.pm would be better if, instead of storing its variables in each package (potentially clobbering other variables if somebody has forgotten that, for example, $LIST_SEPARATOR or $WARNING or $NR are special), it stored them as control-character variables, like ${^LIST_SEPARATOR} or ${^NR}. These are normally reserved to perl and are forced to be in package "main", so they only need to be set up once (not imported for each module).

Anyway, so I wrote this module to make that happen. (I say "wrote", but mostly what I did, other than typing {^ and } a lot, was delete a bunch of stuff.)

Thoughts?

aaronpriven@github 4 comments

Carton::Include A module to automatically include the nearest local/ dir

When I want to execute a script that uses a cpanfile & Carton, I have to type "carton exec ./script.pl" (in order to include the nearest local/ directory in @INC). By creating Carton::Include and using this module inside my scripts, I can have them automatically search for the nearest local/ directory up the tree and "use lib" that, so I can then execute my script by typing only ./script.pl
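The upward search could be sketched as follows. The function name nearest_local_lib is hypothetical and this is not Carton::Include's code; the sketch assumes the directory to add is local/lib/perl5, which is where Carton installs modules by default.

```perl
use strict;
use warnings;
use Cwd qw(abs_path);
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper, not Carton::Include's code.
# Walk up from $start and return the first local/lib/perl5 directory found.
sub nearest_local_lib {
    my ($start) = @_;
    my $dir = abs_path($start);
    while (defined $dir) {
        my $candidate = File::Spec->catdir($dir, 'local', 'lib', 'perl5');
        return $candidate if -d $candidate;
        my $parent = dirname($dir);
        last if $parent eq $dir;    # reached the filesystem root
        $dir = $parent;
    }
    return;
}

# Possible use from a script (run-time equivalent of "use lib"):
#   use FindBin;
#   my $lib = nearest_local_lib($FindBin::Bin);
#   if (defined $lib) { require lib; lib->import($lib) }
```

Because "use lib" takes effect at compile time, a real module would likely do this work inside its import method or a BEGIN block so that later "use" statements see the new path.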

akarelas@github 0 comments

Catalyst::Authentication::Credential::JWT authentication to a Catalyst app via JSON Web Token

This authentication credential checker tries to read a JSON Web Token (JWT) from the current request, verifies its signature and looks up the user in the configured authentication store.

It provides support for authentication/authorization via JWT to Catalyst.

gerhardj@github 0 comments

Verilog::VCD::Writer creates VCD waveform files

VCD (Value Change Dump) is the default way of recording waveform information for an HDL (Verilog/VHDL/SystemC/SystemVerilog) simulation.

This module provides an implementation of a VCD Writer.

The module originally started as a quick and dirty perl script to convert the CSV file generated by a logic analyzer into a VCD file.

For the release on CPAN the original code has been heavily modified and cleaned up so as to meet other waveform generation needs.

jahagirdar@github 3 comments

Confluence::REST Thin wrapper around Confluence's REST API

A very thin wrapper around the Confluence REST API. Lets you pass in CQL (Confluence Query Language).

This code is basically JIRA::REST with some tweaks to get it to work with the Confluence REST API.

I'm already using this for some stuff at work, so I can vouch for the basic functionality.

rmloveland@github 0 comments