
Machine Learning Regression - Rainfall Forecasting

  • Writer: Ifrim Ciprian
  • Mar 11, 2022
  • 2 min read

Updated: Apr 7, 2022

Using machine learning estimators trained on historical weather data, I can take the readings from my sensors and estimate how much rainfall there will be.



I tested the following dataset configurations:

  1. Train: 20 years (2000-2019), daily averages, 5 features; test: 2 years (2020-2021)

  2. Train: 2 years (2021-2022), daily averages, 3 features; test: 1 year (2019)

  3. Train: 2 years (2021-2022), daily averages, 6 features; test: 1 year (2019)

  4. Train: 2 years (2021-2022), hourly averages, 6 features; test: 1 year (2019)

  5. Train: 4 years (2018-2021), daily averages, 6 features; test: 3 months (January-March 2022)

  6. Train: 4 years (2018-2021), hourly averages, 6 features; test: 3 months (January-March 2022)
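The configurations above differ in date range, averaging interval and train/test years. Below is a minimal sketch of the daily averaging and the year-based split, assuming the readings are stored as (timestamp, dict of values) pairs with hypothetical field names such as `temp` and `rain`; the hourly configurations would group by hour instead of by day. The split is chronological, so no test year leaks into training.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def daily_averages(readings):
    """Collapse timestamped readings into one averaged row per calendar day.

    `readings` is a list of (datetime, {feature_name: value}) pairs.
    Returns a sorted list of (date, {feature_name: daily_mean}) pairs.
    """
    by_day = defaultdict(list)
    for ts, values in readings:
        by_day[ts.date()].append(values)
    rows = []
    for day in sorted(by_day):
        group = by_day[day]
        # Average every feature over all readings that fall on this day.
        rows.append((day, {name: mean(v[name] for v in group) for name in group[0]}))
    return rows


def split_by_year(rows, train_years, test_years):
    """Chronological split: rows are never shuffled across the train/test boundary."""
    train = [r for r in rows if r[0].year in train_years]
    test = [r for r in rows if r[0].year in test_years]
    return train, test


# Toy readings with hypothetical field names; real data would come from the sensors.
readings = [
    (datetime(2019, 1, 1, 0), {"temp": 2.0, "rain": 0.0}),
    (datetime(2019, 1, 1, 12), {"temp": 6.0, "rain": 1.0}),
    (datetime(2020, 1, 1, 0), {"temp": 4.0, "rain": 0.5}),
]
rows = daily_averages(readings)
# Mirrors configuration 1: train on 2000-2019, test on 2020-2021.
train, test = split_by_year(rows, range(2000, 2020), range(2020, 2022))
print(len(train), len(test))  # 1 1
print(train[0][1])            # {'temp': 4.0, 'rain': 0.5}
```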


The features are as follows:

  1. 3 features: temperature, humidity, pressure

  2. 5 features: temperature, feels-like temperature, dew point, humidity, pressure

  3. 6 features: temperature, feels-like temperature, dew point, humidity, pressure, UV index
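Each feature set is just a different choice of input columns for the same target. A short sketch of selecting them; the column names (`feels_like`, `dew_point`, `uv_index`, `rain`) are hypothetical, since the post does not give the actual field names.

```python
# Hypothetical column names for the three feature sets listed above.
FEATURE_SETS = {
    3: ["temp", "humidity", "pressure"],
    5: ["temp", "feels_like", "dew_point", "humidity", "pressure"],
    6: ["temp", "feels_like", "dew_point", "humidity", "pressure", "uv_index"],
}


def to_matrix(rows, n_features, target="rain"):
    """Turn (timestamp, {column: value}) rows into a feature matrix X and target vector y."""
    columns = FEATURE_SETS[n_features]
    X = [[values[c] for c in columns] for _, values in rows]
    y = [values[target] for _, values in rows]
    return X, y


row = {"temp": 4.0, "feels_like": 1.5, "dew_point": 0.8, "humidity": 80.0,
       "pressure": 1012.0, "uv_index": 1.0, "rain": 0.5}
X, y = to_matrix([("2019-01-01", row)], 3)
print(X, y)  # [[4.0, 80.0, 1012.0]] [0.5]
```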


For the regression estimators, I used:

  • KNN Regression

  • Decision Tree Regression

  • Random Forest Regression
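The estimators were presumably taken from a library (most likely scikit-learn, since micromlgen ports scikit-learn models; this is an assumption). To show what KNN regression actually computes, here is a from-scratch sketch: the prediction is the mean target of the k closest training samples. It standardises the features first, because otherwise pressure (around 1000 hPa) would dominate the Euclidean distance over temperature. The data values are made up.

```python
import math
from statistics import mean, pstdev


def standardize(X_train, X_test):
    """Scale each feature to zero mean and unit variance using training statistics only."""
    columns = list(zip(*X_train))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) or 1.0 for c in columns]  # guard against constant columns

    def scale(X):
        return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]

    return scale(X_train), scale(X_test)


def knn_regress(X_train, y_train, x, k=3):
    """Predict a target as the mean of the k nearest training targets (Euclidean distance)."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


# Toy data: [temperature in °C, pressure in hPa] -> rainfall in mm (made-up numbers).
X_train = [[10.0, 1000.0], [12.0, 1005.0], [15.0, 1020.0], [18.0, 1025.0]]
y_train = [6.0, 4.0, 1.0, 0.0]
X_test = [[11.0, 1002.0]]

Xs_train, Xs_test = standardize(X_train, X_test)
print(knn_regress(Xs_train, y_train, Xs_test[0], k=2))  # 5.0
```

The test point is closest to the two low-pressure, rainy samples, so the prediction is their average rainfall.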

The code outputs the predictions as a CSV file, the performance metrics as a TXT file, and plots of the mean squared and mean absolute errors as PNG images.
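A minimal sketch of the CSV and TXT outputs using only the standard library; the file names and the `write_outputs` helper are hypothetical. The PNG plots would typically be produced with matplotlib's `savefig`, which is omitted here.

```python
import csv
import tempfile
from pathlib import Path


def write_outputs(y_true, y_pred, metrics, out_dir, prefix="knn"):
    """Write predictions to <prefix>_predictions.csv and metrics to <prefix>_metrics.txt."""
    out_dir = Path(out_dir)
    csv_path = out_dir / f"{prefix}_predictions.csv"
    txt_path = out_dir / f"{prefix}_metrics.txt"
    with csv_path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["actual", "predicted"])
        writer.writerows(zip(y_true, y_pred))
    with txt_path.open("w") as f:
        for name, value in metrics.items():
            f.write(f"{name}: {value:.4f}\n")
    return csv_path, txt_path


with tempfile.TemporaryDirectory() as tmp:
    csv_path, txt_path = write_outputs([1.0, 2.0], [1.5, 2.0], {"mae": 0.25}, tmp)
    csv_text = csv_path.read_text()
    txt_text = txt_path.read_text()

print(csv_text.splitlines())  # ['actual,predicted', '1.0,1.5', '2.0,2.0']
print(txt_text)               # mae: 0.2500
```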


The plots look as follows:


Here are my findings:

For the 3-feature dataset:


Metrics:


Here are some of the plotted histograms:


The same process was applied to all datasets; the findings are in the 4 Excel files attached at the bottom of this post.


Each estimator was tested both with default hyperparameters and after tuning with grid search or random search.
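Grid search tries every hyperparameter combination, while random search scores only a random sample of them. If scikit-learn was used (an assumption), its `GridSearchCV` and `RandomizedSearchCV` do this with cross-validation. The sketch below shows the same idea in plain Python, tuning `k` for a KNN regressor on a single held-out validation split; the data and helper names are hypothetical.

```python
import itertools
import math
import random


def knn_predict(X_train, y_train, x, k):
    """Mean target of the k nearest training samples (Euclidean distance)."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(X_train, y_train))[:k]
    return sum(yi for _, yi in nearest) / len(nearest)


def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def validation_error(params, X_tr, y_tr, X_val, y_val):
    """Score one hyperparameter combination on a held-out validation split."""
    preds = [knn_predict(X_tr, y_tr, x, params["k"]) for x in X_val]
    return mae(y_val, preds)


def grid_search(grid, data):
    """Try every combination in the grid and keep the one with the lowest validation MAE."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
    return min(candidates, key=lambda p: validation_error(p, *data))


def random_search(grid, data, n_iter, seed=0):
    """Sample n_iter random combinations instead of trying them all."""
    rng = random.Random(seed)
    candidates = [{name: rng.choice(values) for name, values in grid.items()}
                  for _ in range(n_iter)]
    return min(candidates, key=lambda p: validation_error(p, *data))


# Toy data; for weather, the validation split should come from later dates than training.
data = ([[0.0], [1.0], [2.0], [3.0]], [0.0, 1.0, 4.0, 9.0], [[1.0], [2.0]], [1.0, 4.0])
grid = {"k": [1, 2, 3]}
print(grid_search(grid, data))  # {'k': 1}
print(random_search(grid, data, n_iter=5))
```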

The best estimator is the Decision Tree Regressor with default settings, i.e. with no depth limit, so the tree keeps splitting until its leaves are pure.

When simplified to a depth of 4, it looks as follows:

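To show what "default, grown until the leaves are pure" versus "limited to a depth of 4" means, here is a from-scratch sketch of a greedy regression tree that picks each split to minimise the summed squared error of the two children. It illustrates the technique and is not the library implementation; in scikit-learn's `DecisionTreeRegressor`, the default `max_depth=None` likewise expands nodes until leaves are pure or too small to split. The toy data are made up.

```python
from statistics import mean


def sse(ys):
    """Sum of squared deviations from the mean: the impurity a regression tree minimises."""
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def build_tree(X, y, max_depth=None, depth=0):
    """Greedy CART-style regression tree.

    With max_depth=None, splitting continues until every leaf is pure
    (all targets equal) or the remaining samples cannot be separated.
    """
    if len(set(y)) == 1 or (max_depth is not None and depth >= max_depth):
        return {"value": mean(y)}
    best = None
    for j in range(len(X[0])):
        levels = sorted({row[j] for row in X})
        for lo, hi in zip(levels, levels[1:]):
            t = (lo + hi) / 2  # candidate threshold halfway between neighbouring values
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # all samples have identical features: make a leaf
        return {"value": mean(y)}
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left], max_depth, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right], max_depth, depth + 1),
    }


def predict(tree, x):
    while "value" not in tree:
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree["value"]


def tree_depth(tree):
    if "value" in tree:
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))


X = [[float(i)] for i in range(8)]
y = [0.0, 0.0, 1.0, 1.0, 4.0, 4.0, 9.0, 9.0]
full = build_tree(X, y)                  # default: grow until leaves are pure
shallow = build_tree(X, y, max_depth=1)  # depth-limited, like the simplified plot
print(all(predict(full, x) == t for x, t in zip(X, y)))  # True
print(tree_depth(shallow), predict(shallow, [7.0]))      # 1 9.0
```

An unlimited tree reproduces every training target exactly, so its test-set metrics, not its training fit, are what show whether it generalises.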

It reaches an R² error of only 10.34% and a Mean Absolute Deviation of 9.29%. In statistics, the coefficient of determination, denoted R² or r² and pronounced "R squared", is the proportion of the variation in the dependent variable that is predictable from the independent variable(s); an R² of 1 means a perfect fit.
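A sketch of how these metrics are computed, with R² = 1 - SS_res / SS_tot. Since higher R² is better, an "R² error" that should be small is most plausibly 1 - R²; if that is what the 10.34% refers to (an assumption, as the post does not define it), it corresponds to R² of about 0.90. "Mean Absolute Deviation" is taken here to mean the mean absolute error between true and predicted values, also an assumption. scikit-learn offers the same quantities as `r2_score`, `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics`.

```python
def regression_metrics(y_true, y_pred):
    """MSE, MAE and the coefficient of determination R² = 1 - SS_res / SS_tot."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # total sum of squares
    r2 = 1 - ss_res / ss_tot
    return {
        "mse": ss_res / n,
        "mae": sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n,
        "r2": r2,
        "one_minus_r2": 1 - r2,  # share of the variance the model does not explain
    }


m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0])
print({name: round(v, 4) for name, v in m.items()})
# {'mse': 0.25, 'mae': 0.25, 'r2': 0.8, 'one_minus_r2': 0.2}
```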


Here is a histogram of true vs. predicted values on the 2-year test dataset:


Therefore, we can conclude that the model is highly accurate.

Thanks to the micromlgen library, we can now port this model to the Arduino: https://github.com/eloquentarduino/micromlgen
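micromlgen generates C code from trained scikit-learn models through its `port()` function; which estimators it supports should be checked against its README. To illustrate why a decision tree ports so easily to a microcontroller, here is a hypothetical, much simpler sketch (not micromlgen's actual output) that turns a tree stored as nested dicts into a C function of nested `if`/`else` statements.

```python
def emit_c(node, indent=1):
    """Recursively render a tree node as nested C if/else statements."""
    pad = "    " * indent
    if "value" in node:
        return f"{pad}return {node['value']:.6f}f;\n"
    return (
        f"{pad}if (x[{node['feature']}] <= {node['threshold']:.6f}f) {{\n"
        + emit_c(node["left"], indent + 1)
        + f"{pad}}} else {{\n"
        + emit_c(node["right"], indent + 1)
        + f"{pad}}}\n"
    )


def tree_to_c(tree, fn_name="predict_rainfall"):
    """Wrap the rendered tree in a C function taking a float feature array."""
    return f"float {fn_name}(const float *x) {{\n" + emit_c(tree) + "}\n"


# Hypothetical depth-1 tree: internal nodes hold feature/threshold/left/right, leaves hold value.
tree = {"feature": 0, "threshold": 5.5, "left": {"value": 1.6667}, "right": {"value": 9.0}}
print(tree_to_c(tree))
```

The generated function needs no heap allocation or libraries, only comparisons on the sensor readings, which is what makes trees a good fit for an Arduino.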




 
 
 


©2022 AURI Robotics - Environmental Data Collection with Smartwatches
