PrePAN

Requests for Reviews Feed

Web::HackerNews Scrape the HTML of Hacker News

Given a Hacker News page, scrape the HTML to extract the contents. For example, get the title and the "hide" URL, etc., so that one can automatically match the titles against a regular expression then "hide" stories about Elon Musk, James Damore, react.js, Google memos, or other tedious things and people.

This is an HTML scraper and is not related to WebService::HackerNews by Neil Bowers. Note that Hacker News uses tables and "center" tags for layout, with no particular logical subdivision.

benkasminbullock@github 1 comment

XS::Check Check XS for errors, something like Perl Critic for XS

Something like Perl Critic for XS.

benkasminbullock@github 2 comments

LibUSB Perl interface to the libusb-1.0 API

This module provides a Perl interface to the libusb-1.0 API. It provides access to most basic libusb functionality including read-out of device descriptors and synchronous device I/O. So far functionality is tested on Linux, but the objective is to provide the full portability of libusb-1.0 (Windows, BSDs, OSX,...).

The module has a two-tier design:

  • LibUSB::XS

Raw XS interface that stays as close as possible to the libusb API. Not intended to be used directly.

  • LibUSB

Based on LibUSB::XS, adds convenient error handling and additional high-level functionality (e.g. device discovery with vid, pid and serial number). Easy to build more functionality without knowing about XS.

amba@github 4 comments

Crypt::File::Valet Convenient encrypted I/O

I am the author of File::Valet (https://metacpan.org/pod/File::Valet) and have need for a module with a similarly convenient way to perform I/O on encrypted files, so I'm writing one. The synopsis shows what I'd like it to look like, but it's not set in stone. The method and function names are chosen to be similar to those of File::Valet, but with "x" used instead of "f" to denote "encrypted" vs "file".

The module would use a caller-provided digest instance, hashed password string, and salt string to encrypt/decrypt the contents of files via a CTR cipher, with random padding before and after the file content, and a convenient way to harden predictable plaintext (via mix method).

When I came to PrePAN, the question I had in mind was "should this be named File::Crypt::Valet or Crypt::File::Valet?" but general comments and suggestions about the proposed module would be welcome as well.

ttkciar@github 2 comments

Bifcode Bifcode serialization format

STATUS

This module and related encoding format are still under development. Do not use it anywhere near production. Input is welcome.

DESCRIPTION

Bifcode implements the bifcode serialisation format, a mixed binary/text encoding with support for the following data types:

  • Primitive:
    • Undefined(null)
    • Booleans(true/false)
    • Integer numbers
    • Floating point numbers
    • UTF8 strings
    • Binary strings
  • Structured:
    • Arrays(lists)
    • Hashes(dictionaries)

The encoding is simple to construct and relatively easy to parse. There is no need to escape special characters in strings. It is not considered human readable, but as it is mostly text it can usually be visually debugged.

Bifcode can only be constructed canonically; i.e. there is only one possible encoding per data structure. This property makes it suitable for comparing structures (using cryptographic hashes) across networks.

In terms of size the encoding is similar to minified JSON. In terms of speed this module compares well with other pure Perl encoding modules with the same features.

MOTIVATION & GOALS

Bifcode was created for a project because none of the currently available serialization formats (Bencode, JSON, MsgPack, Sereal, YAML, etc) met the requirements of:

  • Support for undef
  • Support for UTF8 strings
  • Support for binary data
  • Trivial to construct on the fly from within SQLite triggers
  • Universally-recognized canonical form for hashing

There are no lofty goals or intentions to promote this outside of my specific case. Use it or not, as you please, based on your own requirements. Constructive discussion is welcome.

SPECIFICATION

The encoding is defined as follows:

BIFCODE_UNDEF

A null or undefined value corresponds to '~'.

BIFCODE_TRUE and BIFCODE_FALSE

Boolean values are represented by '1' and '0'.

BIFCODE_UTF8

A UTF8 string is 'U' followed by the octet length of the UTF-8 encoded string as a base ten number, followed by a colon and the encoded string. For example "\x{df}" corresponds to "U2:\x{c3}\x{9f}".

BIFCODE_BYTES

Opaque data is 'B' followed by the octet length of the data as a base ten number followed by a colon and then the data itself. For example a three-byte blob 'xyz' corresponds to 'B3:xyz'.

BIFCODE_INTEGER

Integers are represented by an 'I' followed by the number in base 10 followed by a ','. For example 'I3,' corresponds to 3 and 'I-3,' corresponds to -3. Integers have no size limitation. 'I-0,' is invalid. All encodings with a leading zero, such as 'I03,', are invalid, other than 'I0,', which of course corresponds to 0.

BIFCODE_FLOAT

Floats are represented by an 'F' followed by a decimal number in base 10 followed by an 'e' followed by an exponent followed by a ','. For example 'F3.0e-1,' corresponds to 0.3 and 'F-0.1e0,' corresponds to -0.1. Floats have no size limitation. 'F-0.0,' is invalid. All encodings with an extraneous leading zero, such as 'F03.0e0,', are invalid.

BIFCODE_LIST

Lists are encoded as a '[' followed by their elements (also bifcode encoded) followed by a ']'. For example '[U4:spamU4:eggs]' corresponds to ['spam', 'eggs'].

BIFCODE_DICT

Dictionaries are encoded as a '{' followed by a list of alternating keys and their corresponding values followed by a '}'. For example, '{U3:cowU3:mooU4:spamU4:eggs}' corresponds to {'cow': 'moo', 'spam': 'eggs'} and '{U4:spam[U1:aU1:b]}' corresponds to {'spam': ['a', 'b']}. Keys must be BIFCODE_UTF8 or BIFCODE_BYTES and appear in sorted order (sorted as raw strings, not alphanumerics).
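The rules above can be illustrated with a minimal encoder sketch. It is not the module's implementation: sketch_encode is a hypothetical name, booleans and floats are left out, and treating a SCALAR reference as BIFCODE_BYTES is borrowed from the interface description further down.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper, not part of Bifcode: encodes undef, integers,
# strings, byte blobs (SCALAR refs), array refs and hash refs.
sub sketch_encode {
    my ($data) = @_;
    return '~' unless defined $data;

    if ( ref $data eq 'ARRAY' ) {
        return '[' . join( '', map { sketch_encode($_) } @$data ) . ']';
    }
    if ( ref $data eq 'HASH' ) {
        # Map each key's UTF-8 octets back to the original key, then emit
        # the pairs sorted by raw octets.
        my %key_for = map { my $k = $_; ( encode( 'UTF-8', $k ) => $k ) } keys %$data;
        my $body = '';
        for my $octets ( sort keys %key_for ) {
            $body .= 'U' . length($octets) . ':' . $octets;
            $body .= sketch_encode( $data->{ $key_for{$octets} } );
        }
        return '{' . $body . '}';
    }
    if ( ref $data eq 'SCALAR' ) {
        return 'B' . length($$data) . ':' . $$data;
    }
    die "unhandled data type\n" if ref $data;

    # Canonical integers only: no leading zeros, no '-0'.
    return "I$data," if $data =~ /\A(?:0|-?[1-9][0-9]*)\z/;

    my $octets = encode( 'UTF-8', $data );
    return 'U' . length($octets) . ':' . $octets;
}

print sketch_encode( [ 'spam', 'eggs' ] ), "\n";                       # [U4:spamU4:eggs]
print sketch_encode( { spam => [ 'a', 'b' ], cow => 'moo' } ), "\n";   # {U3:cowU3:mooU4:spam[U1:aU1:b]}
print sketch_encode( [ 3, -3, undef ] ), "\n";                         # [I3,I-3,~]
```

Because the keys are always emitted in sorted order, two equal hashes produce the same byte string, which is the property the description relies on for hashing.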

INTERFACE

encode_bifcode( $datastructure )

Takes a single argument which may be a scalar, or may be a reference to either a scalar, an array or a hash. Arrays and hashes may in turn contain values of these same types. Returns a byte string.

The mapping from Perl to bifcode is as follows:

  • 'undef' maps directly to BIFCODE_UNDEF.
  • The global package variables $Bifcode::TRUE and $Bifcode::FALSE encode to BIFCODE_TRUE and BIFCODE_FALSE.
  • Plain scalars that look like canonically represented integers will be serialised as BIFCODE_INTEGER. Otherwise they are treated as BIFCODE_UTF8.
  • SCALAR references become BIFCODE_BYTES.
  • ARRAY references become BIFCODE_LIST.
  • HASH references become BIFCODE_DICT.

You can force scalars to be encoded a particular way by passing a reference to them blessed as Bifcode::BYTES, Bifcode::INTEGER or Bifcode::UTF8. The force_bifcode function below can help with creating such references.

This subroutine croaks on unhandled data types.

decode_bifcode( $string [, $max_depth ] )

Takes a byte string and returns the corresponding deserialised data structure.

If you pass an integer as the second argument, it will croak when attempting to parse dictionaries nested deeper than this level, to prevent DoS attacks using maliciously crafted input.

bifcode types are mapped back to Perl in the reverse way to the encode_bifcode function, with the exception that any scalars which were "forced" to a particular type (using blessed references) will decode as unblessed scalars.

Croaks on malformed data.

force_bifcode( $scalar, $type )

Returns a reference to $scalar blessed as Bifcode::$TYPE. The value of $type is not checked, but the encode_bifcode function will only accept the resulting reference where $type is one of 'bytes', 'integer', or 'utf8'.

DIAGNOSTICS

  • trailing garbage at %s

    Your data does not end after the first encode_bifcode-serialised item.

    You may also get this error if a malformed item follows.

  • garbage at %s

    Your data is malformed.

  • unexpected end of data at %s

    Your data is truncated.

  • unexpected end of string data starting at %s

    Your data includes a string declared to be longer than the available data.

  • malformed string length at %s

    Your data contained a string with negative length or a length with leading zeroes.

  • malformed integer data at %s

    Your data contained something that was supposed to be an integer but didn't make sense.

  • dict key not in sort order at %s

    Your data violates the encode_bifcode format constraint that dict keys must appear in lexical sort order.

  • duplicate dict key at %s

    Your data violates the encode_bifcode format constraint that all dict keys must be unique.

  • dict key is not a string at %s

    Your data violates the encode_bifcode format constraint that all dict keys be strings.

  • dict key is missing value at %s

    Your data contains a dictionary with an odd number of elements.

  • nesting depth exceeded at %s

    Your data contains dicts or lists that are nested deeper than the $max_depth passed to decode_bifcode().

  • unhandled data type

    You are trying to serialise a data structure that consists of data types other than

    • scalars
    • references to arrays
    • references to hashes
    • references to scalars

    The format does not support this.

BUGS AND LIMITATIONS

Strings and numbers are practically indistinguishable in Perl, so encode_bifcode() has to resort to a heuristic to decide how to serialise a scalar. This cannot be fixed.

AUTHOR

Mark Lawrence, heavily based on Bencode by Aristotle Pagaltzis

COPYRIGHT AND LICENSE

This software is copyright (c):

  • 2015 by Aristotle Pagaltzis
  • 2017 by Mark Lawrence.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

mlawren@github 3 comments

WebService::Dataworld PROPOSAL: A wrapper around the data.world APIs

data.world has JSON APIs for both query and maintenance of large public and non-public datasets. There are some fiddly bits to the API; first thing I noticed is that querying and maintenance happen at different URLs. Additionally, some of the returns are not yet consistent; sometimes you'll get a message and other data, sometimes just a message; there's no "truthy on success" code that would be useful to "try" blocks, etc. I propose a simple wrapper around the complexities, an API object that just takes care of those details for the user, validates inputs, and deals with returns in a way to make them more consistent.

Thoughts?

GeekRuthie@twitter 5 comments

CrawlerCommons::RobotRulesParser Perl implementation of Google crawler-commons RobotRulesParser

This module is a fairly close reproduction of the Crawler-Commons SimpleRobotRulesParser http://crawler-commons.github.io/crawlercommons/0.7/crawlercommons/robots/SimpleRobotRulesParser.html

From BaseRobotsParser javadoc:

Parse the robots.txt file in content, and return rules appropriate
for processing paths by userAgent. Note that multiple agent names
may be provided as comma-separated values; the order of these shouldn't
matter, as the file is parsed in order, and each agent name found in the
file will be compared to every agent name found in robotNames.
Also note that names are lower-cased before comparison, and that any
robot name you pass shouldn't contain commas or spaces; if the name has
spaces, it will be split into multiple names, each of which will be
compared against agent names in the robots.txt file. An agent name is
considered a match if it's a prefix match on the provided robot name. For
example, if you pass in "Mozilla Crawlerbot-super 1.0", this would match
"crawlerbot" as the agent name, because of splitting on spaces,
lower-casing, and the prefix match rule.
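The matching rule quoted above can be sketched in a few lines. This is only an illustration of the described behaviour (split on spaces and commas, lower-case, prefix match), not code from the module; agent_matches is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical helper: does an agent name from robots.txt match any of the
# robot names supplied by the caller?
sub agent_matches {
    my ( $robot_names, $agent_name ) = @_;
    my $agent = lc $agent_name;
    return 0 unless length $agent;

    # Robot names are lower-cased and split on commas and whitespace.
    for my $name ( split /[\s,]+/, lc $robot_names ) {
        next unless length $name;

        # The agent name matches if it is a prefix of the robot name.
        return 1 if index( $name, $agent ) == 0;
    }
    return 0;
}

print agent_matches( 'Mozilla Crawlerbot-super 1.0', 'crawlerbot' ) ? "match\n" : "no match\n";   # match
print agent_matches( 'Mozilla Crawlerbot-super 1.0', 'googlebot' )  ? "match\n" : "no match\n";   # no match
```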

The method failedFetch is not implemented.

akrobinson74@github 1 comment

Unoconv Use LibreOffice to convert file formats

Note: using my name in the module name was a stop-gap while I find a better name. Suggestions???

   Curley::Unoconv is a Perl extension that allows you to use LibreOffice (or OpenOffice.org) to convert from any spreadsheet format LibreOffice can read to any format LibreOffice can write. You can then do further processing with, e.g., Text::CSV_XS. The function only does the conversion if necessary, which is determined by comparing the two files' mtimes.

   It uses Dag Wieers' unoconv, available on most Linux distributions. http://dag.wieers.com/home-made/unoconv/

   Note that the conversion can fail (and your program will die). A few notes:

   ·   The version number unoconv expects must agree with the version of LibreOffice installed. Install unoconv from the same repository you got LibreOffice from. E.g. if you install LibreOffice from debian backports, then install unoconv from debian backports.

   ·   Unoconv may not like it if the running instance of LibreOffice (if any) has a copy of the source file loaded. Save and close it.

   ·   If LibreOffice does not support the given format the whole kazoo will die messily.
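The "only convert if necessary" check from the description could look roughly like the sketch below. It assumes that "necessary" means the target is missing or older than the source; needs_conversion and the file names are hypothetical and not taken from the module.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: true when $target is missing or older than $source.
sub needs_conversion {
    my ( $source, $target ) = @_;
    return 1 unless -e $target;
    my $source_mtime = ( stat($source) )[9];
    my $target_mtime = ( stat($target) )[9];
    return $source_mtime > $target_mtime ? 1 : 0;
}

# Demonstration with two scratch files in a temporary directory.
my $dir = tempdir( CLEANUP => 1 );
my ( $source, $target ) = ( "$dir/in.ods", "$dir/out.csv" );
for my $file ( $source, $target ) {
    open my $fh, '>', $file or die "cannot create $file: $!";
    close $fh;
}

my $now = time;
utime $now, $now,      $source;
utime $now, $now - 60, $target;    # target is a minute older than the source
print needs_conversion( $source, $target ) ? "convert\n" : "up to date\n";    # convert
```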

EXPORT

   unoconv ()

SEE ALSO

   http://dag.wieers.com/home-made/unoconv/

   man 1 unoconv

DEFICIENCIES

   ·   There is not a lot of error checking. For example, we do not check to see if we can write to the target file.

   ·   There is no provision for passing other parameters to the unoconv program. Maybe make that an optional parameter?

   ·   How can we gracefully tell if LibreOffice supports the file format the caller wants?

   ·   We have no control over how well our back end converts. Other programs and Perl modules may do a better job for given pairs of source and destination formats.

   ·   OpenOffice::UNO might provide a more direct and flexible interface, and eliminate the need for python.

charlescurley@github 1 comment

Set::SegmentTree Immutable segment trees in perl

wat? Segment Tree

A Segment tree is an immutable tree structure used to efficiently resolve a value to the set of segments which encompass it.

Why?

You have a large set of value intervals (like time segments!) and need to match them against a single value (like a time) efficiently.

This solution is suitable for problems where the set of intervals is known in advance of the queries, and the tree needs to be loaded and queried efficiently many orders of magnitude more often than the set of intervals is updated.

Data structure:

A segment is like this: [ Segment Label, Start Value , End Value ]

Start Value and End Value must be numeric.

Start Value must be less than End Value.

Segment Label must occur exactly once.

The speed of Set::SegmentTree depends on not being concerned with additional segment relevant data, so it is expected one would use the label as an index into whatever persistence retains additional information about the segment.

Use walkthrough

my @segments = (['A',1,5],['B',2,3],['C',3,8],['D',10,15]);

This defines four intervals, some of which overlap and some of which do not:

  • A: 1 to 5
  • B: 2 to 3
  • C: 3 to 8
  • D: 10 to 15

Building a tree from these segments

my $tree = Set::SegmentTree::Builder->new(@segments)->build;

and doing a find within the resulting tree would make these tests pass:

is_deeply [$tree->find(0)], [];
is_deeply [$tree->find(1)], [qw/A/];
is_deeply [$tree->find(2)], [qw/A B/];
is_deeply [$tree->find(3)], [qw/A B C/];
is_deeply [$tree->find(4)], [qw/A C/];
is_deeply [$tree->find(6)], [qw/C/];
is_deeply [$tree->find(9)], [];
is_deeply [$tree->find(12)], [qw/D/];

And although this structure is relatively expensive to build, it can be saved efficiently,

my $builder = Set::SegmentTree::Builder->new(@segments);
$builder->to_file('filename');

and then loaded and queried extremely quickly, making this pass in only milliseconds.

my $tree = Set::SegmentTree->from_file('filename');
is_deeply [$tree->find(3)], [qw/A B C/];

This structure is useful in the use case where...

1) value segment intersection is important
2) performance of loading and lookup is critical, but building is not

The Segment Tree data structure allows you to resolve any single value to the list of segments which encompass it in O(log(n)+nk).
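To make the lookup behaviour concrete without the module installed, here is a small stand-in. It is deliberately not a segment tree and not the module's implementation: it precomputes a flat table over the endpoints and the gaps between them, then answers queries with a binary search. build_lookup and find_labels are hypothetical names, and closed intervals are assumed because the tests above expect find(3) to return B.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the builder: for every endpoint and every gap
# between neighbouring endpoints, precompute the labels that cover it.
sub build_lookup {
    my @segments = @_;
    my %seen;
    my @points = sort { $a <=> $b } grep { !$seen{$_}++ } map { @$_[ 1, 2 ] } @segments;
    my ( @at_point, @in_gap );
    for my $i ( 0 .. $#points ) {
        my $p = $points[$i];
        $at_point[$i] = [ map { $_->[0] } grep { $_->[1] <= $p && $p <= $_->[2] } @segments ];
        next if $i == $#points;
        my $q = $points[ $i + 1 ];
        $in_gap[$i] = [ map { $_->[0] } grep { $_->[1] <= $p && $q <= $_->[2] } @segments ];
    }
    return { points => \@points, at_point => \@at_point, in_gap => \@in_gap };
}

# Binary search for the last endpoint <= $value, then read the table.
sub find_labels {
    my ( $lookup, $value ) = @_;
    my $points = $lookup->{points};
    my ( $lo, $hi, $i ) = ( 0, $#$points, -1 );
    while ( $lo <= $hi ) {
        my $mid = int( ( $lo + $hi ) / 2 );
        if ( $points->[$mid] <= $value ) { $i = $mid; $lo = $mid + 1 }
        else                             { $hi = $mid - 1 }
    }
    return () if $i < 0;                                          # left of every segment
    return @{ $lookup->{at_point}[$i] } if $points->[$i] == $value;
    return () if $i == $#$points;                                 # right of every segment
    return @{ $lookup->{in_gap}[$i] };
}

my $lookup = build_lookup( [ 'A', 1, 5 ], [ 'B', 2, 3 ], [ 'C', 3, 8 ], [ 'D', 10, 15 ] );
print join( ' ', find_labels( $lookup, 3 ) ), "\n";    # A B C
print join( ' ', find_labels( $lookup, 4 ) ), "\n";    # A C
```

The build step here is quadratic in the number of segments, which mirrors the trade-off described above: an expensive build in exchange for cheap lookups.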

DavidIAm@github 5 comments

Net::ZooIt High level recipes for Apache ZooKeeper

DESCRIPTION

Net::ZooIt provides high level recipes for working with ZooKeeper in Perl, like locks, leader election or queues.

Net::ZooKeeper Handles

Net::ZooIt methods always take a Net::ZooKeeper handle object as a parameter and delegate its creation to the user. Rationale: enterprises often have customised ways to create those handles, and Net::ZooIt aims to be instantly usable without such customisation.

Automatic Cleanup

Net::ZooIt constructors return Net::ZooIt objects, which automatically clean up their znodes when they go out of scope at the end of the enclosing block. If you want to clean up earlier, call

  $zooit_obj->DESTROY;

Implication: if you call Net::ZooIt constructors in void context, the created object goes out of scope immediately, and your znodes are deleted. Net::ZooIt logs a ZOOIT_ERR message in this case.
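The scope-based cleanup and the void-context pitfall are plain Perl object lifetime behaviour, and can be reproduced with a small stand-alone class. ScopeGuard below is a hypothetical illustration, not Net::ZooIt code, and it needs no ZooKeeper connection.

```perl
use strict;
use warnings;

package ScopeGuard;    # hypothetical class, for illustration only

sub new {
    my ( $class, $cleanup ) = @_;

    # In void context nobody holds the object, so it is destroyed at once.
    warn "ScopeGuard created in void context, cleaning up immediately\n"
        unless defined wantarray;
    return bless { cleanup => $cleanup }, $class;
}

sub DESTROY {
    my ($self) = @_;

    # delete() makes an explicit early ->DESTROY call safe: the callback
    # runs only once even though Perl calls DESTROY again at scope exit.
    my $cleanup = delete $self->{cleanup};
    $cleanup->() if $cleanup;
}

package main;

my @log;
{
    my $guard = ScopeGuard->new( sub { push @log, 'cleaned up' } );
    push @log, 'working';
}    # $guard goes out of scope here, so the callback runs
print join( ', ', @log ), "\n";    # working, cleaned up
```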

Error Handling

Net::ZooIt constructors return nothing in case of errors during creation.

Once you hold a lock or other resource, you're not notified of connection loss errors. If you need to take special action, check your Net::ZooKeeper handle.

If you give up Net::ZooIt resources during connection loss, your znodes cannot be cleaned up immediately. They enter a garbage collection queue, and Net::ZooIt cleans them up once the connection is resumed.

Logging

Net::ZooIt logs to STDERR. Log messages are prefixed with Zulu military time, PID and the level of the current message: ZOOIT_DIE, ZOOIT_ERR, ZOOIT_WARN, ZOOIT_INFO or ZOOIT_DEBUG.

If Net::ZooIt throws an exception, it prints a ZOOIT_DIE level message before dying. This allows seeing the original error message even if an eval {} block swallows it.

To capture Net::ZooIt log messages to a file instead of STDERR, redirect STDERR to a new file handle in the normal Perl manner:

open(OLDERR, '>&', fileno(STDERR)) or die("unable to dup STDERR: $!");
open(STDERR, '>', $log_file) or die("unable to redirect STDERR: $!");

subogero@github 3 comments