PrePAN

Sort::Naturally::XS Perl extension for human-friendly ("natural") sort order

Synopsis

# Usage

use Sort::Naturally::XS;

my @mixed_list = qw/test21 test20 test10 test11 test2 test1/;

my @result = nsort(@mixed_list); # @result is: test1 test2 test10 test11 test20 test21

@result = sort ncmp @mixed_list; # same result, using ncmp as the comparison routine of the built-in sort

@result = sort { ncmp($a, $b) } @mixed_list; # same, but with the arguments passed explicitly

my $result = Sort::Naturally::XS::sorted(\@mixed_list, locale => 'ru_RU.utf8'); # sort using a custom locale


# Benchmark
require Benchmark;
require Sort::Naturally::XS;
require Sort::Naturally;

my @list = (
    'H4', 'T25', 'H5', 'T27', 'H8', 'T30', 'HEX', 'T35', 'M10', 'T4', 'M12', 'T40', 'M13', 'T45', 'M14',
    'T47', 'M16', 'T5', 'M4', 'T50', 'M5', 'T55', 'M6', 'T6', 'M7', 'T60', 'M8', 'T7', 'M9', 'T70', 'Ph0',
    'T8', 'Ph1', 'T9', 'Ph2', 'TT10', 'Ph3', 'TT15', 'Ph4', 'TT20', 'Pz0', 'TT25', 'Pz1', 'TT27', 'Pz2',
    'TT30', 'Pz3', 'TT40', 'Pz4', 'TT45', 'R10', 'TT50', 'R12', 'TT55', 'R13', 'TT6', 'R14', 'TT60', 'R5',
    'TT7', 'R6', 'TT70', 'R7', 'TT8', 'R8', 'TT9', 'S', 'TX', 'Sl', 'XZN', 'T10', 'T15', 'T20'
);

Benchmark::cmpthese(-3, {
    my => sub { Sort::Naturally::XS::nsort(@list) },
    other => sub { Sort::Naturally::nsort(@list) },
});

#          Rate other    my
# other   561/s    --  -97%
# my    20693/s 3588%    --

Benchmark::cmpthese(-10, {
    std   => sub { sort @list },
    other => sub { sort {Sort::Naturally::ncmp($a, $b)} @list },
    my    => sub { sort {Sort::Naturally::XS::ncmp($a, $b)} @list },
});

#            Rate other   std    my
# other 7977106/s    --   -3%   -5%
# std   8232321/s    3%    --   -2%
# my    8426303/s    6%    2%    --
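A caveat when reading the second benchmark: Perl's sort does no work unless it is evaluated in list context, and, as far as I know, Benchmark invokes each code reference in void context. A sub whose last statement is a bare sort (such as sub { sort @list }) may therefore return without sorting anything, while assigning the result (for example sub { my @r = sort @list }) forces list context. The sketch below uses a hypothetical helper, sort_counting, that counts comparator calls in both contexts:

```perl
use strict;
use warnings;

my $comparisons = 0;
my @list = (3, 1, 2);

# Hypothetical helper: returns whatever sort returns, so the sort
# inherits the caller's context.
sub sort_counting {
    return sort { $comparisons++; $a <=> $b } @list;
}

sort_counting();                  # void context: sort returns immediately
my $after_void = $comparisons;
print "void context: $after_void comparisons\n";   # void context: 0 comparisons

my @sorted = sort_counting();     # list context: the list is actually sorted
print "list context: $comparisons comparisons, result @sorted\n";
```

Only the list-context call runs the comparator.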

Description

Natural sort order is an ordering of mixed strings (strings made of letters and digits) in alphabetical order, except that runs of digits are ordered as numbers.

For example, a standard, machine-oriented alphabetical sort of the list:

test21 test20 test10 test11 test2 test1

produces:

test1 test10 test11 test2 test20 test21

This is not human-friendly, because test10 and test11 come before test2. Natural sort order gives the following:

test1 test2 test10 test11 test20 test21
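The rule described above can be sketched in plain Perl. The comparator below, natural_cmp, is a hypothetical illustration and not the code used by Sort::Naturally::XS: it splits each string into runs of digits and non-digits, compares two digit runs by numeric value (done on the digit strings themselves, so long runs do not lose precision) and compares everything else with cmp.

```perl
use strict;
use warnings;

# Hypothetical natural-order comparator (illustration only, not the
# Sort::Naturally::XS implementation).
sub natural_cmp {
    my ($x, $y) = @_;

    # Split each string into alternating runs of digits and non-digits.
    my @xs = $x =~ /(\d+|\D+)/g;
    my @ys = $y =~ /(\d+|\D+)/g;

    while (@xs && @ys) {
        my ($p, $q) = (shift @xs, shift @ys);
        my $r;
        if ($p =~ /^\d/ && $q =~ /^\d/) {
            # Both chunks are numbers. Strip leading zeros; then a shorter
            # digit string is the smaller number, and equal lengths compare
            # as strings. No conversion to floating point is needed.
            (my $pn = $p) =~ s/^0+(?=\d)//;
            (my $qn = $q) =~ s/^0+(?=\d)//;
            $r = length($pn) <=> length($qn)
              || $pn cmp $qn
              || length($p) <=> length($q);   # "1" before "01"
        }
        else {
            # At least one chunk is text: ordinary string comparison.
            $r = $p cmp $q;
        }
        return $r if $r;
    }

    # All shared chunks are equal: the string with fewer chunks sorts first.
    return @xs <=> @ys;
}

my @mixed_list = qw/test21 test20 test10 test11 test2 test1/;
my @result = sort { natural_cmp($a, $b) } @mixed_list;
print "@result\n";   # test1 test2 test10 test11 test20 test21
```

With this sketch foo13 also sorts before foobar, because the text chunk "foo" is compared with "foobar" and, being a prefix of it, sorts first.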

Advantages

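  • Written in C and XS, so it is fast: in the nsort benchmark in the Synopsis it runs at about 37 times the rate of Sort::Naturally
  • Supports the existing Sort::Naturally module API
  • Fixes some cases where Sort::Naturally deviates from normal sort behavior, such as "foobar" sorting before "foo13"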
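Here "normal sort behavior" is Perl's default string ordering, which compares strings character by character. A quick check of how the built-in sort orders the two strings from the last point:

```perl
use strict;
use warnings;

# Without "use locale", sort compares strings by code point. The first
# difference is the fourth character: '1' (0x31) is less than 'b' (0x62),
# so "foo13" comes first.
my @plain = sort qw(foobar foo13);
print "@plain\n";   # foo13 foobar
```

A natural sort that stays close to this behavior should likewise place foo13 before foobar.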

Benchmark

See the Synopsis section.

Comments

It would be great if this addressed https://rt.cpan.org/Public/Bug/Display.html?id=107953
In the near future I will look into adding support for an arbitrary locale. Thanks for the idea.
Added a "sorted" subroutine that supports an arbitrary locale.
