Tuesday, 6 January 2026

Analytics Libraries Expect Regularised Data

This is a recurrent theme in quantitative computing. 

Analytics libraries expect clean, regularised data, e.g. time series with no missing values. Real-life data often has gaps and idiosyncrasies: it often needs to be cleaned (to create a golden source), and even then it may need to be reshaped again to suit each consumer's needs.
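As a rough illustration of "regularising", here is a minimal sketch that turns a gappy daily series into one value per calendar day. The function name `regularise_daily`, the sample prices and the forward-fill policy are all assumptions made for the example; a real consumer might prefer interpolation, or dropping the missing days altogether.

```python
from datetime import date, timedelta


def regularise_daily(observations, start, end):
    """Return one (day, value) pair per calendar day from start to end inclusive.

    Missing days are forward-filled with the last known value. Days before
    the first observation keep the value None. Forward-fill is only one
    possible policy, chosen here for simplicity.
    """
    regular = []
    last_value = None
    day = start
    while day <= end:
        if day in observations:
            last_value = observations[day]
        regular.append((day, last_value))
        day += timedelta(days=1)
    return regular


# Hypothetical raw data: 3 and 4 January are missing.
raw = {
    date(2026, 1, 1): 100.0,
    date(2026, 1, 2): 101.5,
    date(2026, 1, 5): 99.0,
}

regular = regularise_daily(raw, date(2026, 1, 1), date(2026, 1, 5))
for day, value in regular:
    print(day.isoformat(), value)
```

The golden source (`raw`) is left untouched; the regularised copy is built per consumer, so a different consumer can apply a different fill policy to the same source.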

This is akin to the Adapter Design Pattern in programming. In the Adapter you adapt an "interface" to another "interface" - for example, an XML dataset is "adapted" into a JSON dataset for JSON consumers.
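The XML-to-JSON case can be sketched in a few lines. This is only an illustration: the class names `XmlPriceSource` and `JsonPriceAdapter`, the XML layout and the prices are hypothetical, and only the Python standard library is used.

```python
import json
import xml.etree.ElementTree as ET


class XmlPriceSource:
    """Hypothetical producer that only speaks XML."""

    def read(self):
        return (
            "<prices>"
            "<price date='2026-01-05' value='99.0'/>"
            "<price date='2026-01-06' value='100.25'/>"
            "</prices>"
        )


class JsonPriceAdapter:
    """Wraps the XML source and exposes the JSON interface the consumer expects."""

    def __init__(self, xml_source):
        self._xml_source = xml_source

    def read_json(self):
        root = ET.fromstring(self._xml_source.read())
        rows = [
            {"date": node.get("date"), "value": float(node.get("value"))}
            for node in root.findall("price")
        ]
        return json.dumps(rows)


# The consumer only ever sees JSON; the XML source is not modified.
adapted = JsonPriceAdapter(XmlPriceSource()).read_json()
print(adapted)
```

The same shape applies to data regularisation: the source stays as it is, and a thin adapter presents it in the form the analytics library wants.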

Statistical libraries are especially picky about dataset consistency, particularly when comparing datasets to find relationships, or when measuring errors between actual and expected values.
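For example, before computing errors between actual and expected values, the two datasets have to be lined up on the same keys. A minimal sketch, assuming both datasets are plain dictionaries keyed by ISO date strings (the function name `aligned_errors` and the sample values are made up for illustration):

```python
def aligned_errors(actual, expected):
    """Compute actual - expected on the keys both datasets share.

    Returns (errors, unmatched): errors maps each shared key to its
    difference, unmatched is the sorted list of keys present in only
    one of the two datasets.
    """
    common = sorted(actual.keys() & expected.keys())
    unmatched = sorted(actual.keys() ^ expected.keys())
    errors = {key: actual[key] - expected[key] for key in common}
    return errors, unmatched


# Hypothetical data: each side has one date the other lacks.
actual = {"2026-01-01": 10.0, "2026-01-02": 12.0, "2026-01-04": 9.0}
expected = {"2026-01-01": 9.5, "2026-01-02": 12.5, "2026-01-03": 11.0}

errors, unmatched = aligned_errors(actual, expected)
print(errors)
print(unmatched)
```

Reporting the unmatched keys, rather than silently dropping them, makes it visible how much of the data the comparison actually covered.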
