Page 1 of 1

### Robust Linear Fit Using the RANSAC Algorithm Posted: Fri Sep 08, 2017 1:11 pm
RANSAC (RANdom SAmple Consensus) is a non-deterministic algorithm producing only a reasonable result with a certain probability, which is dependent on the number of iterations (see max_trials parameter). It is typically used for linear and non-linear regression problems and is especially popular in the fields of photogrammetric computer vision.

The algorithm splits the complete input sample data into a set of inliers, which may be subject to noise, and outliers, which are e.g. caused by erroneous measurements or invalid hypotheses about the data. The resulting model is then estimated only from the determined inliers.

Once data is in the presence of corrupt data: either outliers, or error in the model, RANSAC will fit a model from random subsets of inliers from the complete data set.

Here is the sample code using scikit python package::
Code: Select all
`import numpy as npfrom matplotlib import pyplot as pltfrom netCDF4 import Datasetfrom sklearn import linear_model, datasets# Load data from your netcdf data filefilename = 'Your Sample File.nc'ncin = Dataset(filename, 'r')ssrd = ncin.variables['ssrd'][:]tp = ncin.variables['tp'][:]ncin.close()n_samples = len(ssrd)X, y, coef = datasets.make_regression(n_samples=n_samples, n_features=1,                                      n_informative=1, noise=10,                                      coef=True, random_state=0)X[:,0] = tpy = ssrd# Fit line using all datalr = linear_model.LinearRegression()lr.fit(X, y)# Robustly fit linear model with RANSAC algorithmransac = linear_model.RANSACRegressor()ransac.fit(X, y)inlier_mask = ransac.inlier_mask_outlier_mask = np.logical_not(inlier_mask)# Predict data of estimated modelsline_X = np.arange(X.min(), X.max())[:, np.newaxis]line_y = lr.predict(line_X)line_y_ransac = ransac.predict(line_X)# Plot the resultslw = 5plt.scatter(X[inlier_mask], y[inlier_mask], color='yellowgreen',            label='Inliers')plt.scatter(X[outlier_mask], y[outlier_mask], color='gold',            label='Outliers')plt.plot(line_X, line_y, color='navy', linewidth=lw, label='Linear regressor')plt.plot(line_X, line_y_ransac, color='red', linewidth=lw, label='RANSAC regressor')plt.legend(loc='upper right')plt.xlabel("Total Precipitation (mm)")plt.ylabel("Surface Solar Radiation Downwards (J m**-2)")plt.show()`

Here is the final plot: Figure 1, Robust linear fit.
robust_fit.png (76.76 KiB) Viewed 14963 times

User could also download the Docker image for this recipe from dockerhub, and run the recipe using jupyter nootbook:
https://hub.docker.com/r/podaacdatarecipes/linearfit_ransac/