Sign in to PrePAN

List::Unique::DeterministicOrder Store and access a list of keys using a deterministic order based on the sequence of insertions and deletions



This module provides a structure to store a list
of keys, without duplicates, and be able to access
them by either key name or index.

    use List::Unique::DeterministicOrder;
    my $foo = List::Unique::DeterministicOrder->new(
        data => [qw /foo bar quux fetangle/]
    print $foo->keys;
    #  foo bar quux fetangle
    $foo->delete ('bar')
    print $foo->keys;
    #  foo fetangle quux 
    print $foo->get_key_at_pos(2);
    #  quux
    print $foo->get_key_at_pos(20);
    #  undef
    $foo->push ('bardungle')
    print $foo->keys;
    #  foo fetangle quux bardungle

    #  keys are stored only once,
    #  just like with a normal hash
    $foo->push ('fetangle')
    print $foo->keys;
    #  foo fetangle quux bardungle
    print $foo->exists ('gelert');
    #  false
    print $foo->pop;
    #  bardungle


Discussion section from the POD is below.

Any suggestions for a better name are appreciated.


The algorithm used is from

The algorithm used inserts keys at the end, but swaps keys around on deletion. Hence it is deterministic and repeatable, but only if the sequence of insertions and deletions is replicated exactly.

So why would one use this in the first place? The motivating use-case was a randomisation process where keys would be selected from a pool of keys, and sometimes inserted. e.g. the process might select and remove the 10th key, then the 257th, then insert a new key, followed by more selections and removals. The randomisations needed to produce the same results same for the same given PRNG sequence for reproducibility purposes.

Using a hash to store the data provides rapid access, but getting the nth key requires the key list be generated each time, and Perl's hashes do not provide their keys in a deterministic order across all versions and platforms.
Binary searches over sorted lists proved very effective for a while, but bottlenecks started to manifest when the data sets became much larger and the number of lists became both abundant and lengthy.

Since the order itself does not matter, only the ability to replicate it, this module was written.

One could also use Hash::Ordered, but it has the overhead of storing values, which are not needed here. I also wrote this module before I benchmarked against Hash::Ordered. Regardless, this module is faster for the example use-case described above - see the benchmarking results in (which is part of this distribution). That said, some of the implementation details have been adapted/borrowed from Hash::Ordered.


Absent any comments, the module has been uploaded to CPAN.

Please sign up to post a review.