Multivariate Analytics & ML Explorer: Correlation, PCA & Anomaly Detection

Upload a tabular engineering dataset to move from correlation structure through dimensionality reduction to supervised modelling and anomaly detection, with statistical rigour built in.

This tool computes Pearson, Spearman, and Kendall correlation matrices with Bonferroni or Benjamini-Hochberg correction for multiple testing, then runs Principal Component Analysis (PCA) with a choice of StandardScaler or RobustScaler. From there it extends to random-forest supervised modelling, k-means clustering, and isolation-forest anomaly detection, so a single tabular dataset can be examined for structure, drivers, and outliers in one workflow.

Why the corrections and scaling matter

Testing many variable pairs for correlation inflates the chance of false positives; multiple-testing correction keeps the reported relationships defensible rather than spurious. Scaling matters because PCA follows variance: when variables are measured in different units (a temperature in degrees and a pressure in megapascals), the variable with the numerically larger variance would otherwise dominate the leading components purely through its scale. RobustScaler is the safer choice when the data contain outliers, because it centres on the median and scales by the interquartile range, both of which are far less distorted by extreme values than the mean and variance used in standard normalisation.
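The scaling argument can be checked numerically. The sketch below uses NumPy only and synthetic data; the variable names, magnitudes, and helper functions are assumptions for illustration. `standard_scale` and `robust_scale` mimic the default behaviour of scikit-learn's StandardScaler and RobustScaler rather than calling them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Two synthetic, independent sensors on very different numerical scales.
temp_deg = 60.0 + rng.normal(scale=5.0, size=n)       # spread of a few degrees
pressure_mpa = 1.2 + rng.normal(scale=0.05, size=n)   # spread of hundredths of a MPa
X = np.column_stack([temp_deg, pressure_mpa])


def explained_variance_ratio(X):
    """Share of total variance carried by each principal component."""
    Xc = X - X.mean(axis=0)
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


def standard_scale(X):
    """Zero mean, unit variance per column."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def robust_scale(X):
    """Centre on the median, scale by the interquartile range per column."""
    q1, med, q3 = np.percentile(X, [25, 50, 75], axis=0)
    return (X - med) / (q3 - q1)


raw_ratio = explained_variance_ratio(X)
std_ratio = explained_variance_ratio(standard_scale(X))
print("PC1 share, unscaled:    ", round(float(raw_ratio[0]), 4))
print("PC1 share, standardised:", round(float(std_ratio[0]), 4))

# Inject a few extreme pressure spikes and compare the two scale estimates.
spiked = pressure_mpa.copy()
spiked[:5] = 10.0


def iqr(x):
    q1, q3 = np.percentile(x, [25, 75])
    return q3 - q1


std_inflation = spiked.std() / pressure_mpa.std()
iqr_inflation = iqr(spiked) / iqr(pressure_mpa)
print("std inflated by a factor of", round(float(std_inflation), 1))
print("IQR inflated by a factor of", round(float(iqr_inflation), 2))
```

Unscaled, the first component is almost entirely the temperature axis even though the two sensors are independent; after standardisation the variance is shared roughly evenly. The second half shows why a handful of spikes can distort a variance-based scaler while leaving the interquartile range nearly unchanged.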

Worked example

A condition-monitoring export of vibration, bearing temperature, discharge pressure, and flow rate is uploaded. PCA shows the first two components capture about 80 percent of total variance, suggesting that the four sensors carry largely redundant information and that most of their behaviour can be described by two underlying modes. The isolation forest flags roughly 3 percent of records as anomalies, and on inspection those timestamps cluster in the days preceding two recorded trips, giving an early-warning signature to monitor.
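The anomaly-detection step of a workflow like this can be sketched with scikit-learn's IsolationForest. Everything below is an assumption for illustration: the data are synthetic (not the export described above), the sensor magnitudes are invented, and `contamination=0.03` is chosen only to mirror the roughly 3 percent figure, not because the tool is known to use that setting.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Synthetic records with columns: vibration, bearing temperature,
# discharge pressure, flow rate. All magnitudes are invented.
normal = rng.normal(
    loc=[2.0, 65.0, 1.2, 40.0],
    scale=[0.2, 2.0, 0.05, 1.5],
    size=(970, 4),
)
# A block of degraded operation appended at the end of the record.
degraded = rng.normal(
    loc=[4.0, 85.0, 0.9, 30.0],
    scale=[0.3, 3.0, 0.05, 2.0],
    size=(30, 4),
)
X = np.vstack([normal, degraded])

# contamination sets the score threshold so that about 3 % of the
# training records are labelled as anomalies.
model = IsolationForest(contamination=0.03, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, +1 = normal

flagged = np.flatnonzero(labels == -1)
flagged_fraction = flagged.size / X.shape[0]
in_degraded_block = int(np.sum(flagged >= 970))

print(f"flagged {flagged.size} of {X.shape[0]} records ({flagged_fraction:.1%})")
print(f"{in_degraded_block} of the flagged records fall in the degraded block")
```

With real data the row indices in `flagged` would be mapped back to timestamps, which is how a clustering of anomalies ahead of recorded trips would become visible.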

For: condition-monitoring, process, and data-focused engineers analysing multi-sensor or multi-parameter datasets.
