Robust Linear Fit Using the RANSAC Algorithm

Robust Linear Fit Using the RANSAC Algorithm

Postby yiboj » Fri Sep 08, 2017 1:11 pm

RANSAC (RANdom SAmple Consensus) is a non-deterministic algorithm producing only a reasonable result with a certain probability, which is dependent on the number of iterations (see max_trials parameter). It is typically used for linear and non-linear regression problems and is especially popular in the fields of photogrammetric computer vision.

The algorithm splits the complete input sample data into a set of inliers, which may be subject to noise, and outliers, which are e.g. caused by erroneous measurements or invalid hypotheses about the data. The resulting model is then estimated only from the determined inliers.

Once data is in the presence of corrupt data: either outliers, or error in the model, RANSAC will fit a model from random subsets of inliers from the complete data set.

Here is the sample code using scikit python package::
Code: Select all
import numpy as np
from matplotlib import pyplot as plt
from netCDF4 import Dataset

from sklearn import linear_model, datasets

# Load data from your netcdf data file
filename = 'Your Sample'
ncin = Dataset(filename, 'r')
ssrd = ncin.variables['ssrd'][:]
tp = ncin.variables['tp'][:]

n_samples = len(ssrd)

X, y, coef = datasets.make_regression(n_samples=n_samples, n_features=1,
                                      n_informative=1, noise=10,
                                      coef=True, random_state=0)

X[:,0] = tp
y = ssrd

# Fit line using all data
lr = linear_model.LinearRegression(), y)

# Robustly fit linear model with RANSAC algorithm
ransac = linear_model.RANSACRegressor(), y)
inlier_mask = ransac.inlier_mask_
outlier_mask = np.logical_not(inlier_mask)

# Predict data of estimated models
line_X = np.arange(X.min(), X.max())[:, np.newaxis]
line_y = lr.predict(line_X)
line_y_ransac = ransac.predict(line_X)

# Plot the results
lw = 5

plt.scatter(X[inlier_mask], y[inlier_mask], color='yellowgreen',
plt.scatter(X[outlier_mask], y[outlier_mask], color='gold',
plt.plot(line_X, line_y, color='navy', linewidth=lw, label='Linear regressor')
plt.plot(line_X, line_y_ransac, color='red', linewidth=lw, label='RANSAC regressor')
plt.legend(loc='upper right')
plt.xlabel("Total Precipitation (mm)")
plt.ylabel("Surface Solar Radiation Downwards (J m**-2)")

Here is the final plot:
Figure 1, Robust linear fit.
robust_fit.png (76.76 KiB) Viewed 15900 times

User could also download the Docker image for this recipe from dockerhub, and run the recipe using jupyter nootbook:
Posts: 130
Joined: Mon Mar 30, 2015 11:22 am

Return to Data Recipes