PrePAN

perltab - command line utility for using perl code to manipulate data tables

Synopsis

% perltab -e 'script to run on tabular data'  tabularInputdata.tsv

Description

perltab can be thought of as an extension of perl's autosplit mode for handling tabular data. I developed it while working with a bioinformatics dataset that had quite a few features (columns), many of which had missing values; this made the data inconvenient to handle directly with something like perl's autosplit mode. With autosplit mode it is easy to output the nth column:

% perl -F'\t' -anE 'say $F[1]' heightWeight.tsv

perltab makes this slightly easier:

% perltab -e 'say $F[1]' heightWeight.tsv

But it also allows columns to be referred to by name (and the names may be abbreviated):

% perltab -e 'say F(hei)' heightWeight.tsv

This is convenient, but perltab is particularly helpful when numerical computation and missing values come into play.

For example the minimum value of a column can be output in this way:

% perltab -d 'bemin $m, F(hei)' -z 'say $m'

or, as long as the column labels do not look like numbers, this will also work:

% perltab -e 'bemin $m, F(hei)' -z 'say $m'

Doing that on the command line without perltab takes a LOT of typing, mostly because non-numerical values must be silently skipped, but also because $m needs to be initialized properly (for example, starting from zero gives the wrong minimum whenever all the values are positive). perltab defines several reasonably mnemonic functions to handle issues like that transparently.

I believe perltab can be a valuable contribution to CPAN. However, it would be an unusual one in that it is designed to be used as a stand-alone command line tool rather than as a library. In fact, it is implemented as a plain program rather than a module. Another difference from most CPAN modules is that its bilingual help documentation is written in a simple ad-hoc markup language rather than POD. I started with POD, but it did not fit my needs.

By uploading perltab to PrePAN I hope to get constructive advice on how to move this somewhat atypical contribution into CPAN.

The perltab documentation (% perltab -h) has close to 50 examples of using perltab, and all of them are covered by the regression test suite.

Comments

This isn't that unusual at all. In fact, CPAN has a whole namespace devoted to command-line applications: App. So, to upload your perltab command to CPAN, it would be best to call it App::perltab. You'd need a small stub "module" to provide some introduction, a bit of docs, and a link to the main documentation in the `perltab` command. There are lots of App:: distributions on CPAN you can use for inspiration.
Thank you for the quick and apt advice! I will take a look at some of the App:: distributions.
Follow up. I added a lib/App/perltab.pm file with a synopsis and summary in pod format.
Another question I have is about regression testing. perltab comes with a few dozen regression tests but I have not tested them on various platforms and I expect there may be problems on some platforms. Is this likely to cause a problem in terms of registering perltab as part of CPAN?
Just upload it, and fix the problems later.
Thank you for the encouragement! Is it considered okay to upload to CPAN while still under status 'in review' in PrePAN?
There is no rule about that at all, you can upload whatever you want to CPAN. It's a bit like zombo.com in that regard. Prepan.org is not part of any official review process for CPAN. There is no review process.
