PrePAN

PDLx::Algorithm::Center - Various ways of centering a dataset

Synopsis

$results = sigma_clip( coords => $coords,
                       weight => $weight,
                       mask   => $mask,
                       %opts);

Description

This module collects various algorithms for determining the center of a dataset into one place. It accepts data stored as PDL variables (piddles).

Currently it contains a single function, sigma_clip, an iterative algorithm that successively removes outliers by clipping data whose distance from the current center exceeds a given number of standard deviations.

sigma_clip finds the center of a data set by:

  1. ignoring the data whose distance to the current center is greater than a specified number of standard deviations
  2. calculating a new center by performing a (weighted) centroid of the remaining data
  3. calculating the standard deviation of the distance from the data to the center
  4. repeating from step 1 until either a convergence tolerance has been met or the iteration limit has been reached
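
The steps above can be sketched in plain Perl, without PDL, as a rough illustration of the loop. This is not the module's implementation: the function name sigma_clip_sketch, the helpers, and the option names nsigma, iterlim and tol are hypothetical, and the "standard deviation" is assumed here to be the weighted RMS distance of the retained points from the center.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical helper: Euclidean distance between two points.
sub distance {
    my ( $p, $q ) = @_;
    return sqrt( sum( map { ( $p->[$_] - $q->[$_] )**2 } 0 .. $#$p ) );
}

# Hypothetical helper: weighted centroid of the points whose indices are in $keep.
sub centroid {
    my ( $coords, $weight, $keep ) = @_;
    my $ndim   = @{ $coords->[0] };
    my $wsum   = sum( map { $weight->[$_] } @$keep );
    my @center = (0) x $ndim;
    for my $i (@$keep) {
        $center[$_] += $weight->[$i] * $coords->[$i][$_] for 0 .. $ndim - 1;
    }
    return [ map { $_ / $wsum } @center ];
}

# Assumption: "standard deviation" means the weighted RMS distance to the center.
sub rms_distance {
    my ( $coords, $weight, $keep, $center ) = @_;
    my $wsum = sum( map { $weight->[$_] } @$keep );
    my $d2   = sum( map { $weight->[$_] * distance( $coords->[$_], $center )**2 } @$keep );
    return sqrt( $d2 / $wsum );
}

# Sketch only; option names are illustrative, not the module's API.
sub sigma_clip_sketch {
    my (%arg)   = @_;
    my $coords  = $arg{coords};
    my $weight  = $arg{weight}  // [ (1) x @$coords ];
    my $nsigma  = $arg{nsigma}  // 3;
    my $iterlim = $arg{iterlim} // 10;
    my $tol     = $arg{tol}     // 1e-6;

    # Initial center and sigma from the entire dataset.
    my @keep   = 0 .. $#$coords;
    my $center = centroid( $coords, $weight, \@keep );
    my $sigma  = rms_distance( $coords, $weight, \@keep, $center );

    for my $iter ( 1 .. $iterlim ) {
        # Step 1: drop points farther than nsigma standard deviations.
        @keep = grep { distance( $coords->[$_], $center ) <= $nsigma * $sigma } @keep;
        last unless @keep;

        # Steps 2 and 3: new center, then new sigma about that center.
        my $new = centroid( $coords, $weight, \@keep );
        $sigma = rms_distance( $coords, $weight, \@keep, $new );

        # Step 4: stop once the center has moved less than the tolerance.
        my $shift = distance( $new, $center );
        $center = $new;
        last if $shift < $tol;
    }
    return { center => $center, sigma => $sigma, nelem => scalar @keep };
}

# Five points around the origin plus one distant outlier.
my $result = sigma_clip_sketch(
    coords => [ [ 0, 0 ], [ 1, 0 ], [ 0, 1 ], [ -1, 0 ], [ 0, -1 ], [ 100, 100 ] ],
    nsigma => 2,
);
printf "center = (%.3f, %.3f), %d points kept\n", @{ $result->{center} }, $result->{nelem};
```

With so few points a single outlier inflates the initial standard deviation considerably, which is why the example clips at 2 rather than 3 standard deviations.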

The initial center may be explicitly specified, or may be calculated by performing a (weighted) centroid of the data.

The initial standard deviation is calculated about the initial center, using either the entire dataset or a clipped region around that center.

sigma_clip can center sparse datasets (i.e., the input is a list of coordinates) or dense datasets (the input is a hyper-rectangle), with or without weights. It accepts a mask which directs it to use only certain elements in the dataset.
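
To make the sparse/dense distinction concrete, here is a small plain-Perl sketch (no PDL) that turns a dense 2-D grid into the equivalent sparse form, honoring a mask. The helper dense_to_sparse is hypothetical, and treating each cell's value as its weight is an assumption made for illustration rather than a statement about how the module interprets dense input.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a 2-D grid into a list of [x, y] coordinates
# (taken from the cell indices) and a parallel list of weights.
# Assumption: the value stored in each cell is used as that cell's weight.
sub dense_to_sparse {
    my ( $grid, $mask ) = @_;
    my ( @coords, @weight );
    for my $y ( 0 .. $#$grid ) {
        for my $x ( 0 .. $#{ $grid->[$y] } ) {
            # A false mask entry excludes the cell; no mask means use everything.
            next if $mask && !$mask->[$y][$x];
            push @coords, [ $x, $y ];
            push @weight, $grid->[$y][$x];
        }
    }
    return ( \@coords, \@weight );
}

my $grid = [ [ 0, 1, 0 ], [ 1, 4, 1 ], [ 0, 1, 9 ] ];
my $mask = [ [ 1, 1, 1 ], [ 1, 1, 1 ], [ 1, 1, 0 ] ];    # exclude the bright corner cell

my ( $coords, $weight ) = dense_to_sparse( $grid, $mask );
printf "%d of 9 cells retained\n", scalar @$coords;
```

The resulting coordinate and weight lists have the same shape as a sparse dataset, so the same centering loop can in principle be applied to either kind of input.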

The coordinates may be transformed using [PDL::Transform](https://metacpan.org/pod/PDL::Transform). This is mostly useful for dense datasets, where coordinates are generated from the indices of the passed hyper-rectangle. This functionality is not currently documented, as tests for it have not yet been written.

More information is available at the GitHub repository page: https://github.com/djerius/PDLx-Algorithm-Center
