  • Ifrim Ciprian

Machine Learning Regression - Rainfall Forecasting

Updated: Apr 7, 2022

Using Machine Learning estimators, I can compare the data from my sensors against historical data in order to estimate how much rainfall there will be.



I have tested multiple datasets as follows:

  1. train dataset: 20 years (2000-2019), daily averages, 5 features / test dataset: 2 years (2020-2021)

  2. train dataset: 2 years (2021-2022), daily averages, 3 features / test dataset: 1 year (2019)

  3. train dataset: 2 years (2021-2022), daily averages, 6 features / test dataset: 1 year (2019)

  4. train dataset: 2 years (2021-2022), hourly averages, 6 features / test dataset: 1 year (2019)

  5. train dataset: 4 years (2018-2021), daily averages, 6 features / test dataset: 3 months (Jan-Mar 2022)

  6. train dataset: 4 years (2018-2021), hourly averages, 6 features / test dataset: 3 months (Jan-Mar 2022)


The features are as follows:

  1. 3 features: temp, humidity, pressure

  2. 5 features: temp, feels like temp, dew point, humidity, pressure

  3. 6 features: temp, feels like temp, dew point, humidity, pressure, uv index
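As a rough illustration of how hourly sensor readings could be reduced to the daily averages used in some of the datasets, here is a minimal sketch in plain Python. The column names (`feels_like`, `dew_point`, `uv_index`), the timestamp format and the sample readings are assumptions made up for this example, not taken from the actual datasets.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical column names for the three feature sets listed above.
FEATURE_SETS = {
    3: ["temp", "humidity", "pressure"],
    5: ["temp", "feels_like", "dew_point", "humidity", "pressure"],
    6: ["temp", "feels_like", "dew_point", "humidity", "pressure", "uv_index"],
}

def daily_averages(hourly_rows, features):
    """Average each feature per calendar day.

    hourly_rows: list of dicts with a "timestamp" key ("YYYY-MM-DD HH:MM")
    plus one numeric value per feature name.
    """
    by_day = defaultdict(list)
    for row in hourly_rows:
        day = row["timestamp"][:10]  # keep only the date part
        by_day[day].append(row)
    return {
        day: {f: mean(r[f] for r in rows) for f in features}
        for day, rows in sorted(by_day.items())
    }

# Made-up readings, for illustration only.
rows = [
    {"timestamp": "2022-01-01 00:00", "temp": 4.0, "humidity": 80.0, "pressure": 1010.0},
    {"timestamp": "2022-01-01 01:00", "temp": 6.0, "humidity": 90.0, "pressure": 1012.0},
    {"timestamp": "2022-01-02 00:00", "temp": 3.0, "humidity": 70.0, "pressure": 1008.0},
]
daily = daily_averages(rows, FEATURE_SETS[3])
```

Switching between the 3, 5 and 6 feature variants then only means passing a different entry of `FEATURE_SETS`.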


And for the Regression estimators, I have used:

  • KNN Regression

  • Decision Tree Regression

  • Random Forest Regression
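Of the three estimators, KNN regression is the simplest to write out by hand, so here is a minimal from-scratch sketch of the idea: predict the rainfall of a new reading as the average rainfall of its k closest training rows. This is not the code used for the experiments (which would more likely rely on a library implementation); the function name `knn_predict` and the sample values are made up for illustration.

```python
import math

def knn_predict(train_X, train_y, query, k=3):
    """Predict by averaging the targets of the k nearest training rows."""
    # Pair every training target with its Euclidean distance to the query.
    distances = sorted(
        (math.dist(x, query), y) for x, y in zip(train_X, train_y)
    )
    nearest = distances[:k]
    return sum(y for _, y in nearest) / len(nearest)

# Made-up (temp, humidity, pressure) rows and rainfall targets in mm.
train_X = [
    (10.0, 60.0, 1020.0),
    (12.0, 65.0, 1018.0),
    (8.0, 90.0, 1000.0),
    (9.0, 95.0, 998.0),
]
train_y = [0.0, 0.0, 6.0, 8.0]

# The two closest rows are the humid, low-pressure ones (6 mm and 8 mm).
prediction = knn_predict(train_X, train_y, (8.5, 92.0, 999.0), k=2)
```

Because the distance mixes features with very different ranges (pressure in the thousands, temperature in single digits), a real setup would normally scale the features first.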

The code outputs the predictions as a CSV file, the performance metrics as a txt file, and plots of the mean squared and mean absolute errors as a png.
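A minimal sketch of the CSV and txt outputs, using only the Python standard library, could look like the following. The file names, column headers and sample numbers are assumptions for illustration, and the png plot is left out because it would need a plotting library.

```python
import csv
import tempfile
from pathlib import Path

def save_results(out_dir, y_true, y_pred, metrics):
    """Write predictions to a CSV file and metrics to a txt file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # One row per sample: the measured value next to the predicted one.
    csv_path = out_dir / "predictions.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["actual", "predicted"])
        writer.writerows(zip(y_true, y_pred))

    # One "name: value" line per performance metric.
    txt_path = out_dir / "metrics.txt"
    with txt_path.open("w") as f:
        for name, value in metrics.items():
            f.write(f"{name}: {value:.4f}\n")

    return csv_path, txt_path

# Made-up values, written to a temporary directory.
out_dir = tempfile.mkdtemp()
csv_path, txt_path = save_results(
    out_dir, [0.0, 2.5], [0.1, 2.0], {"mse": 0.13, "mae": 0.3}
)
```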

The plots look as follows:



Here are my findings:

For the 3-feature dataset:


Metrics:

Here are some of the plotted histograms:


The same process was applied to all datasets, and the findings can be found in the 4 Excel files attached at the bottom of this post.


The estimators have been tested both with their default settings and with hyperparameters optimised through grid search or random search.
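To make the difference between the two search strategies concrete, here is a small library-free sketch. Grid search evaluates every combination of the candidate values, while random search only evaluates a fixed number of randomly drawn combinations. The hyperparameter names, the candidate values and the `toy_validation_error` function are stand-ins invented for this example; in practice the score would come from training and validating a real estimator.

```python
import itertools
import random

def grid_search(param_grid, score):
    """Try every combination in param_grid; return the lowest-scoring one."""
    names = list(param_grid)
    combos = [
        dict(zip(names, values))
        for values in itertools.product(*param_grid.values())
    ]
    return min(combos, key=score)

def random_search(param_grid, score, n_iter, seed=0):
    """Try n_iter randomly drawn combinations; return the lowest-scoring one."""
    rng = random.Random(seed)
    candidates = [
        {name: rng.choice(values) for name, values in param_grid.items()}
        for _ in range(n_iter)
    ]
    return min(candidates, key=score)

# Hypothetical search space for a tree-based estimator.
param_grid = {"max_depth": [2, 4, 8, 16], "min_samples_leaf": [1, 5, 10]}

def toy_validation_error(params):
    # Stand-in for "train the model, then measure its validation error".
    return abs(params["max_depth"] - 8) + 0.1 * params["min_samples_leaf"]

best_grid = grid_search(param_grid, toy_validation_error)
best_random = random_search(param_grid, toy_validation_error, n_iter=5)
```

Grid search is exhaustive (12 evaluations here), so it always finds the best combination in the grid; random search is cheaper on large grids but may miss it.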

The best estimator is the Decision Tree with default settings, grown to its full depth with pure leaves.

Simplified to a depth of 4, it looks as follows:

It reaches an R² error of only 10.34% and a Mean Absolute Deviation of 9.29%. In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variables.
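For reference, the metrics mentioned in this post can be computed by hand as in the sketch below, which uses the standard definitions: mean squared error, mean absolute error, and R² as one minus the ratio of the residual sum of squares to the total sum of squares. The sample values are made up and are not results from the experiments.

```python
def mean_squared_error(y_true, y_pred):
    """Average of the squared differences between true and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Made-up rainfall values in mm, for illustration only.
y_true = [0.0, 2.0, 4.0, 6.0]
y_pred = [0.5, 1.5, 4.5, 5.5]

mse = mean_squared_error(y_true, y_pred)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
```

An R² of 1 means the predictions explain all of the variation in the true values, while 0 means they do no better than always predicting the mean.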


Here is a histogram of true values vs predicted values on the 2-year test dataset:

Therefore, based on these metrics, we can conclude that the model is highly accurate on this test dataset.

So now, thanks to the micromlgen library, we can port this model to the Arduino: https://github.com/eloquentarduino/micromlgen



  • Predicted Statistical Analysis - Corrected Dataset.xlsx (144KB)

  • Prediction Statistical Analysis - 20 Years - 3&5 Features.xlsx (198KB)

  • Prediction Statistical Analysis - 2 Years - 3 Features.xlsx (203KB)

  • Prediction Statistical Analysis - 2 Years - 6 Features - Training.xlsx (69KB)
