This module provides a structure to store a list of keys, without duplicates, and be able to access them by either key name or index. ``` use List::Unique::DeterministicOrder; my $foo = List::Unique::DeterministicOrder->new( data => [qw /foo bar quux fetangle/] ); print $foo->keys; # foo bar quux fetangle $foo->delete ('bar') print $foo->keys; # foo fetangle quux print $foo->get_key_at_pos(2); # quux print $foo->get_key_at_pos(20); # undef $foo->push ('bardungle') print $foo->keys; # foo fetangle quux bardungle # keys are stored only once, # just like with a normal hash $foo->push ('fetangle') print $foo->keys; # foo fetangle quux bardungle print $foo->exists ('gelert'); # false print $foo->pop; # bardungle ```
Discussion section from the POD is below.
Any suggestions for a better name are appreciated.
The algorithm used is from https://stackoverflow.com/questions/5682218/data-structure-insert-remove-contains-get-random-element-all-at-o1/5684892#5684892
The algorithm used inserts keys at the end, but swaps keys around on deletion. Hence it is deterministic and repeatable, but only if the sequence of insertions and deletions is replicated exactly.
So why would one use this in the first place? The motivating use-case was a randomisation process where keys would be selected from a pool of keys, and sometimes inserted. e.g. the process might select and remove the 10th key, then the 257th, then insert a new key, followed by more selections and removals. The randomisations needed to produce the same results same for the same given PRNG sequence for reproducibility purposes.
Using a hash to store the data provides rapid access,
but getting the nth key requires the key list be generated
each time, and Perl's hashes do not provide their
keys in a deterministic
order across all versions and platforms.
Binary searches over sorted lists proved very effective for a while, but bottlenecks started to manifest when the data sets became much larger and the number of lists became both abundant and lengthy.
Since the order itself does not matter, only the ability to replicate it, this module was written.
One could also use Hash::Ordered, but it has the overhead of storing values, which are not needed here. I also wrote this module before I benchmarked against Hash::Ordered. Regardless, this module is faster for the example use-case described above - see the benchmarking results in bench.pl (which is part of this distribution). That said, some of the implementation details have been adapted/borrowed from Hash::Ordered.