Spark regression models

Table of contents

System and experiment settings

Summary of results

\[RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(y_i-w^Tx_i)^2}\]
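As a quick illustration of the formula above, here is a minimal sketch in plain Python (no Spark needed). The helper names `predict` and `rmse` and the toy data are made up for this example; they are not taken from the code or the experiments in this post.

```python
import math


def predict(w, x):
    """Linear prediction w^T x for a single feature vector x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def rmse(w, xs, ys):
    """Root mean squared error of the linear model w over N examples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    # Squared residual (y_i - w^T x_i)^2 for every example.
    squared_errors = [(y - predict(w, x)) ** 2 for x, y in zip(xs, ys)]
    # Average over N examples, then take the square root.
    return math.sqrt(sum(squared_errors) / len(ys))


# Toy data, invented for illustration only.
w = [1.0, 2.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [1.0, 2.0, 5.0]
print(rmse(w, xs, ys))
```

In this toy case the first two examples are predicted exactly and the third is off by 2, so the value equals the square root of 4/3.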

Linear regression models

This blog post covers three linear regression models: least squares, ridge regression, and lasso. The application context is the single-label regression problem. Regression is closely related to classification, so I also recommend my blog post about running classification models on Spark.
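For reference, the three models differ only in the penalty added to the squared error. The objectives below are the standard textbook forms, written with the same symbols as the RMSE definition. The regularization strength \(\lambda \ge 0\) is a symbol introduced here for illustration, and the exact scaling constants used by a given Spark version may differ.

\[\text{least squares:}\quad \min_{w}\ \frac{1}{N}\sum_{i=1}^{N}(y_i-w^Tx_i)^2\]

\[\text{ridge regression:}\quad \min_{w}\ \frac{1}{N}\sum_{i=1}^{N}(y_i-w^Tx_i)^2+\lambda\|w\|_2^2\]

\[\text{lasso:}\quad \min_{w}\ \frac{1}{N}\sum_{i=1}^{N}(y_i-w^Tx_i)^2+\lambda\|w\|_1\]

Setting \(\lambda = 0\) in either penalized objective recovers least squares. The L1 penalty of lasso tends to drive some weights exactly to zero, while the L2 penalty of ridge regression shrinks weights without zeroing them.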

Load and save data files

Least squares (code)

Run least squares with parameter selection

Model test

Experimental results

Lasso and ridge regression (code)

Run Lasso/Ridge with parameter selection

Model test

Experimental results for Lasso

Experimental results for ridge regression

Decision tree regressor (code)

Experimental results

YearPredictionMSD dataset download

cadata dataset download

Coding details

Random forest regressor (code)

Experimental results

YearPredictionMSD dataset download

cadata dataset download

Coding details

External reading materials

Hongyu Su 19 October 2015