## Synopsis

```
use strict;
use warnings;
use Statistics::Descriptive::LogScale;

my $stat = Statistics::Descriptive::LogScale->new();

# Read numbers from stdin.
while (<>) {
    chomp;
    $stat->add_data($_);
}

# Print a brief summary.
# Mean and standard deviation can be computed exactly in O(1) memory.
printf "Mean: %f +- %f\n", $stat->mean, $stat->standard_deviation;
# Percentiles require either storing the actual data, or approximating.
printf "25%%/50%%/75%%: %f, %f, %f\n",
    $stat->percentile(25), $stat->median, $stat->percentile(75);
```

## Description

This module provides a Statistics::Descriptive::Full-compatible interface; however, it doesn't keep all the data. Instead, the data is divided into logarithmic bins and only bin counts are stored. Thus rough properties of the statistical distribution can be calculated without using lots of memory.
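The binning idea can be sketched in a few lines. This is not the module's actual implementation, just an illustration of the principle: a geometric bin width (here an assumed ~1% per bin) keeps the *relative* precision roughly constant across magnitudes, and counting values per bin bounds memory regardless of sample size.

```
use strict;
use warnings;
use POSIX qw(floor);

my $base = 1.01;    # assumed ~1% relative bin width, for illustration only

# Map a positive value to a bin index; bins grow geometrically.
sub bin_index {
    my ($x) = @_;
    return floor( log($x) / log($base) + 0.5 );
}

# The bin's representative value: every sample in the bin is
# reported as this number, off by at most ~1%.
sub bin_center {
    my ($i) = @_;
    return $base ** $i;
}

# Store counts per bin instead of the samples themselves.
my %count;
$count{ bin_index($_) }++ for (0.095, 0.1, 0.105, 10, 1000);
```

Whatever the sample size, the hash holds at most one entry per distinct bin, which is what makes the memory footprint small.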

It was initially targeted at performance analysis. Knowing that 99% of requests finished in 0.1s +- 0.01s is more useful than knowing only the average request time of 0.1s +- 1s (standard deviation), which is what I observed more than once while trying to "analyze" our web applications' performance.

However, broader usage may exist; e.g. a long-running application may want to keep track of various useful numbers without leaking memory.

Ideally, this module could also become a short way of saying "I'm not sure why I need statistics, but it's nice to have, and simple." For those who *know* why they need statistics, there's R.

## Comments


While searching for prior art, I discovered a few papers on memory-efficient algorithms for percentiles/median that require only a small fraction of the sample size to run. I admit I didn't manage to read through any of them, let alone try to implement one. At second glance, they seem to use approximation, too.
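To make the approximation concrete, here is a minimal sketch (not the module's code) of how a percentile falls out of a bin-count histogram: walk the bins in ascending order of their representative values until the cumulative count reaches the requested fraction of the total. The `%hist` data below is made up for illustration.

```
use strict;
use warnings;

# Approximate a percentile from (bin_center => count) pairs.
sub approx_percentile {
    my ($pct, %count) = @_;

    my $total = 0;
    $total += $_ for values %count;
    my $need = $total * $pct / 100;

    # Accumulate counts over bins in ascending order; the bin
    # that crosses the threshold approximates the percentile.
    my $seen = 0;
    for my $center (sort { $a <=> $b } keys %count) {
        $seen += $count{$center};
        return $center if $seen >= $need;
    }
    return undef;    # empty histogram
}

# Hypothetical histogram: 97 requests near 0.1s, 2 near 0.2s, 1 outlier.
my %hist = (0.1 => 97, 0.2 => 2, 5.0 => 1);
printf "p50 ~ %g, p99 ~ %g\n",
    approx_percentile(50, %hist), approx_percentile(99, %hist);
```

The answer is only as precise as the bin width, but for the "99% of requests finished in about 0.1s" kind of question, that is usually precise enough.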

However, when precise values are needed, it would probably be more feasible to move the sample to a separate machine and process it there with R.
