PrePAN

Shell::Command::Argv Convert argument strings to arrays and vice-versa

NAME

Shell::Command::Argv - convert argument strings to argument arrays and vice-versa.

VERSION

1.0.0

SYNOPSIS

use Shell::Command::Argv qw(
   string_to_argv
   argv_to_string
   quote_args
);

my @args = string_to_argv($arg_string);
system 'echo', @args;

my $arg_string = argv_to_string(@args);
system "echo $arg_string > /tmp/foo.txt";

my @quoted = quote_args(@args);
system "echo @quoted > /tmp/foo.txt";

DESCRIPTION

A package for converting argument strings to argument arrays and vice-versa.

The rules for parsing, escaping, and quoting are designed to be compatible with the way Perl's system() and exec() calls work when called with a single string argument. This is effectively the same as /bin/sh -c, but with some exceptions.

You can take arguments directly from @ARGV and create an argument string from them that, when passed to another command using system() or exec(), reproduces exactly the same elements in the called command's @ARGV.

Or you can take a string of arguments from a file, parse into an array, combine with @ARGV and then call system() or exec() in list form.

Note that by "arguments", we mean arguments passed on the command line that end up in @ARGV. This does NOT include other valid shell command syntax such as wildcards, pipes, and I/O redirection. The module forbids any unquoted syntax that the shell would change or expand if it were passed in a shell command, for example ~ and *. These characters can of course be quoted or escaped just like in the shell. Only things that can make it to your application's @ARGV are valid arguments as far as this module is concerned.
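As a small illustration of what "arguments that end up in @ARGV" means, the sketch below hands a few shell-sensitive strings to a child Perl process in list form, so no shell is involved. It uses only core Perl. The child one-liner and the variable names are illustrative assumptions and are not part of this module.

```perl
use strict;
use warnings;

# Arguments that /bin/sh would alter if they were interpolated unquoted.
my @args = ('a b', '*', '$HOME', '~');

# List-form pipe open runs the child directly, without /bin/sh, so the
# child's @ARGV receives every element unchanged.
open my $fh, '-|', $^X, '-e', 'print join "\n", @ARGV', '--', @args
    or die "cannot run $^X: $!";
chomp(my @seen = <$fh>);
close $fh or die "child exited with status $?";

# Show what the child actually received, one argument per line.
print "$_\n" for @seen;
```

Interpolating the same strings unquoted into a single command string would let /bin/sh split 'a b' and expand *, $HOME, and ~ before the child ever saw them.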

The idea is to provide a way for tools to pass argument strings around in shell syntax while being able to convert to an array of arguments as needed without allowing the shell to do any expansions or substitutions. This makes for more stable, less dynamic code. This stability leads to the assertion that the following test should always be true:

is_deeply [string_to_argv( argv_to_string( @ARGV ) )], \@ARGV;

In other words, calling string_to_argv() on the result of calling argv_to_string() on a list of strings will always produce the original list.

The following test should mostly be true:

is argv_to_string( string_to_argv( $arg_string ) ), $arg_string;

The reason it is not always true is that a single argument array element can be produced by more than one quoting/escaping method. Also, certain inputs to string_to_argv() may throw an exception.

SUBROUTINES/METHODS

Nothing is exported by default. The following subroutines are available for export.

string_to_argv

my @args = string_to_argv($arg_string);

Splits a whitespace-delimited argument string into an array of arguments, just as /bin/sh does, or as system() does when it passes arguments to an executable in string form.

The effect is similar to the following inefficient code:

my @args = split "\n", `perl -e 'print "$_\n" for @ARGV' -- $arg_string`, -1;
pop @args;

Like the above code (and the underlying use of /bin/sh), string_to_argv() respects both single and double quoted strings as well as unquoted words. Note that quoted strings will be returned without the surrounding quotes. For example:

my @args = string_to_argv(q("a" 'b' c)); # returns: qw(a b c)

Whitespace is ignored in $arg_string except within quoted strings or if escaped.

my @args = string_to_argv(q(  "a a"  'b b' c     c)); # returns: ('a a', 'b b', 'c', 'c')

Literal quotes and other shell sensitive characters must be embedded in strings and/or escaped with a backslash as required by /bin/sh.

Unlike the code above and /bin/sh, some characters that might normally be interpreted by the shell are not allowed. $arg_string should only contain arguments and cannot contain shell wildcards, backticks, pipes, or I/O redirection. For the same reason, double quoted strings cannot contain $ or ` characters, because the shell would normally expand those before @ARGV is formed.

When presented with invalid input, string_to_argv() will throw an exception with the text "Invalid argument syntax" in the message and try to point to the offending syntax.
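The rules described in this section can be sketched as a small tokenizer. This is a hypothetical, simplified illustration and not the module's implementation: the name sketch_string_to_argv, the offset wording in the error message, and the choice to reject every $ and ` inside double quotes (even when backslash-escaped) are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical, simplified tokenizer for the rules described in this section.
# It is an illustration only, not the module's implementation.
sub sketch_string_to_argv {
    my ($str) = @_;
    my (@argv, $word);                 # $word stays undef until a word begins
    pos($str) = 0;
    while (pos($str) < length($str)) {
        my $start = pos($str);
        if ($str =~ /\G\s+/gc) {                       # unquoted whitespace ends a word
            push @argv, $word if defined $word;
            undef $word;
        }
        elsif ($str =~ /\G'([^']*)'/gc) {              # single quotes: all literal
            $word .= $1;
        }
        elsif ($str =~ /\G"((?:[^"\\]|\\.)*)"/gcs) {   # double quotes
            my $body = $1;
            die "Invalid argument syntax at offset $start\n"
                if $body =~ /[\$`]/;                   # assumption: no $ or ` at all
            $body =~ s/\\(["\\])/$1/g;                 # only \" and \\ are unescaped
            $word .= $body;
        }
        elsif ($str =~ /\G\\(.)/gcs) {                 # backslash escapes one character
            $word .= $1;
        }
        elsif ($str =~ /\G([^\s'"\\\$`|#&<>\[\]{}()?*]+)/gc) {
            my $chunk = $1;                            # run of plain unquoted characters
            die "Invalid argument syntax at offset $start\n"
                if !defined $word && $chunk =~ /\A~/;  # leading ~ would be expanded
            $word .= $chunk;
        }
        else {                                         # forbidden or unterminated syntax
            die "Invalid argument syntax at offset $start\n";
        }
    }
    push @argv, $word if defined $word;
    return @argv;
}

# Quoted words, an escaped space, and a trailing backslash in single quotes.
my @args = sketch_string_to_argv(q(  "a a"  'b b' c\ d  'e\'));
print join('|', @args), "\n";

# An unquoted pipe is rejected with an exception rather than passed through.
eval { sketch_string_to_argv('echo hi | wc'); 1 }
    or print "rejected: $@";
```

Callers that read argument strings from files or other untrusted sources would typically wrap the call in eval, as in the last statement, and report the message to the user.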

You may be familiar with Text::ParseWords and wondering why we should not just use Text::ParseWords::shellwords($arg_string). string_to_argv($arg_string) behaves similarly to Text::ParseWords::shellwords($arg_string) but differs in the following ways:

  • string_to_argv only takes a single string as input as opposed to a list.

    This is nothing major but worth noting. Simply call string_to_argv() in a loop or block:

    my @args = map { string_to_argv($_) } @Arg_strings;
    
  • string_to_argv does not allow the ` (backtick) character in double quoted strings.

    We do not want to allow dynamic shell code to exist in argument strings that may get expanded later by the shell. While it may have been possible to allow a single backtick character (system() calls using /bin/sh will allow it) it was considered not worth the effort at the time this subroutine was last rewritten.

  • string_to_argv does not allow the $ (dollar sign) character in double quoted strings.

    This is to prevent shell or environment variables from expanding when the arguments are passed in a shell command. This is another case where it may have been possible to allow $ in some cases but it was considered not worth the effort at the time this subroutine was last rewritten.

  • string_to_argv does not allow the following characters in unquoted strings unless escaped: | # & and the angle brackets > and <

    These characters will either end an argument list or otherwise affect the shell command, but will never make it to the @ARGV of another executable.

  • string_to_argv does not allow the following characters in unquoted strings unless escaped: [ ] { } ( ) ? * ` $

    These characters can result in dynamic arguments that get expanded by the shell and therefore need to be quoted or escaped.

  • string_to_argv throws exceptions on bad inputs as opposed to returning undef or empty lists.

  • string_to_argv will preserve \ (backslash) in double quoted strings unless it escapes a " (double quote) or \ (backslash) character.

  • string_to_argv does not allow escaping a ' (single quote) with a \ (backslash) in single quoted strings.

    This mimics the behavior of system() calls using /bin/sh.

  • string_to_argv will preserve \ (backslash) in single quoted strings.

  • string_to_argv will allow a trailing \ (backslash) right before the closing quote in single quoted strings.

    This mimics what Perl's system() does when using /bin/sh to run shell commands.

  • string_to_argv will reject words with a leading ~ (tilde) unless quoted or escaped.

    The shell will try to expand it to the home directory of the current user.
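Several of the differences listed above can be observed directly with the core Text::ParseWords module. The sketch below only exercises shellwords(); the input strings are made up for illustration, and string_to_argv() itself is not called.

```perl
use strict;
use warnings;
use Text::ParseWords qw(shellwords);

# Shell metacharacters pass straight through shellwords() as ordinary words.
my @meta = shellwords(q(a | b > out *.txt $HOME));
print scalar(@meta), " words: @meta\n";

# Inside double quotes, shellwords() removes the backslash in front of any
# character, whereas /bin/sh keeps it here and yields a\b.
my ($dq) = shellwords(q("a\b"));
print "$dq\n";

# An unterminated quote yields an empty list instead of an exception.
my @bad = shellwords(q("unterminated));
print "got ", scalar(@bad), " words\n";
```

Each of these inputs is one that the list above says string_to_argv() either rejects or parses differently.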

string_to_argv behaves very similarly to the g_shell_parse_argv function from the Glib Shell Related Utilities.

However, string_to_argv is still more restrictive in what is considered valid input, and is pure Perl.

See the Glib Shell Related Utilities section for more information.

argv_to_string

my $arg_string = argv_to_string(@args);
system "echo $arg_string > /tmp/foo.txt";

When forming argument strings from a list of arguments (e.g. @args), it is sometimes necessary to quote the argument and/or escape certain characters or you may not end up reproducing the argument list properly in string form. The simplest example is when an argument has whitespace. If you do not quote the argument, it will "flatten" into two arguments in the string.

argv_to_string converts a list of command line arguments to an argument string, quoting the arguments only as needed using the quote_args function, and inserting a single space between the quoted arguments. See quote_args for details on quoting behavior.

quote_args

my @quoted = quote_args(@args);
system "echo @quoted > /tmp/foo.txt";

Protect arguments from the shell by quoting and escaping as needed, but only if needed. This is the workhorse behind the argv_to_string() subroutine.
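One way to quote "only if needed" is sketched below. It is a hypothetical illustration, not the module's implementation: the name sketch_quote_arg, the set of characters treated as safe, and the preference for single quotes are all assumptions, and the real quote_args may choose different quoting for the same input.

```perl
use strict;
use warnings;

# Hypothetical sketch of "quote only if needed"; not the module's implementation.
sub sketch_quote_arg {
    my ($arg) = @_;
    # Words made only of these characters mean the same with or without quotes.
    return $arg if $arg =~ m{\A[A-Za-z0-9_.,:=+/-]+\z};
    # Otherwise wrap in single quotes. A literal ' has to close the quote,
    # appear backslash-escaped, and then reopen the quote.
    (my $quoted = $arg) =~ s/'/'\\''/g;
    return "'$quoted'";
}

my @args = ('abc', 'a b', "it's", '', '*.txt');
my $arg_string = join ' ', map { sketch_quote_arg($_) } @args;
print "$arg_string\n";   # prints: abc 'a b' 'it'\''s' '' '*.txt'
```

Note that the empty string must still be emitted as '' so that it survives as an argument of its own, and that ~ is deliberately left out of the safe set so a leading tilde is always quoted.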

See also the g_shell_quote function in the Glib Shell Related Utilities section.

Glib Shell Related Utilities

A very good C implementation of the same idea can be found here:

https://developer.gnome.org/glib/stable/glib-Shell-related-Utilities.html

The relevant functions were easily wrapped using Inline::C for comparison during development.

The g_shell_parse_argv and g_shell_quote functions are almost identical to string_to_argv and quote_args, but neither was quite the right fit: g_shell_quote is overly paranoid and quotes when it is not needed, and g_shell_parse_argv is not restrictive enough, allowing things like | $ and & to exist without being quoted or escaped.

BUGS AND LIMITATIONS

This code is for Linux shell commands only. No attempt has been made to handle Windows argument quoting or parsing.

SEE ALSO

  • Text::ParseWords
  • https://developer.gnome.org/glib/stable/glib-Shell-related-Utilities.html

AUTHOR

Hank Sola
