Ifrim Ciprian

Weather Datasets - Machine Learning Prospects

In line with the general plan, the idea was to add an Unsupervised Machine Learning model for data anomaly/outlier detection. That feature is in the works and will be added at a later stage.

However, one interesting feature to add would be to use the Temperature, Pressure and Humidity data from the Nicla Board with Supervised Machine Learning to predict weather conditions. This would provide a fully offline solution that, given a large and accurate dataset, can produce outputs within a certain percentage of those from an actual Weather Station. These outputs will be tested and compared against the reference values, and the results will be plotted as graphs to visualise the accuracy.
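The comparison against the Weather Station can be expressed as a percentage error. The following is only a minimal sketch of that check in plain C++; the function names `percentError` and `withinTolerance` are hypothetical, and the tolerance value is left to the caller because no target percentage has been fixed yet.

```cpp
#include <cmath>

// Absolute percentage difference between a predicted value and the
// reference value reported by the weather station.
// Returns -1.0 when the reference is zero, because the ratio is undefined there.
double percentError(double predicted, double reference) {
    if (reference == 0.0) return -1.0;
    return std::fabs(predicted - reference) / std::fabs(reference) * 100.0;
}

// True when the prediction lies within tolerancePercent of the reference.
// A zero reference never passes, since no percentage can be computed for it.
bool withinTolerance(double predicted, double reference, double tolerancePercent) {
    double err = percentError(predicted, reference);
    return err >= 0.0 && err <= tolerancePercent;
}
```

Rainfall is often exactly 0 mm, so for that output an absolute error in mm may be a better fit than a percentage.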


The idea started from this article, which describes how to build a similar system in Python for predicting rainfall in inches: https://www.educative.io/edpresso/ml-rainfall-prediction-using-linear-regression.

It was followed by the research paper "RAINFALL PREDICTION USING MODIFIED LINEAR REGRESSION" by S. Prabakaran, P. Naveen Kumar and P. Sai Mani Tarun (ARPN Journal of Engineering and Applied Sciences, 2017), which goes into depth on the mathematics and the model for the same rainfall prediction system: http://www.arpnjournals.org/jeas/research_papers/rp_2017/jeas_0617_6115.pdf


The systems would work as follows:

  • Supervised Decision Tree Model: classifies the temperature/pressure/humidity data into one of Fair/Cloudy/Rainy/Snowy/Foggy/Haze.

  • Supervised 2D Linear Regressor Model: a regression system that uses the temperature/pressure/humidity data to predict the rainfall amount in mm.

This weekend I worked on the 2 datasets that will be used; both are attached at the end of this blog.

The first dataset will be used for the weather condition and stores 248 entries for 2021, from the 1st of January to the 31st of December. The values are taken every 5 days, at 2:20AM, 12:20PM and 6:20PM, and the dataset looks as follows:

As can be seen from the coloured columns, I have assigned a different colour to each data column. For the weather condition, I simplified the detailed descriptions, so that Cloudy/Mostly Cloudy/Partly Cloudy all become just Cloudy, and the same approach has been applied to all conditions. The temperature has also been rounded to the nearest whole number, with the general rule of:

  1. 0.0-0.49 = rounded down;

  2. 0.5-0.99 = rounded up.
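The rounding rule above can be sketched in a few lines. This is an illustration only: the function name `roundTemperature` is hypothetical, and the handling of negative temperatures is an assumption (the fractional part is measured from the next lower whole number), since the rule as stated only covers the fractional ranges.

```cpp
#include <cmath>

// Rounds a temperature to the nearest whole degree:
// a fractional part below 0.5 rounds down, 0.5 and above rounds up.
// Assumption: for negative readings the fractional part is counted from
// the next lower whole number, so -2.3 (= -3 + 0.7) becomes -2.
int roundTemperature(double temperature) {
    return static_cast<int>(std::floor(temperature + 0.5));
}
```

For example, `roundTemperature(12.49)` gives 12 and `roundTemperature(12.5)` gives 13.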

Furthermore, to simplify the processing, I have added a condition index to all condition descriptions. This index maps directly to a string array in Arduino, from which the matching text is taken and presented. The string array will later be replaced by an array of the file names to be played by the DFPlayer, when switching to a voice line output system.
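A minimal sketch of how the condition index and the string array could fit together is shown below. The ordering of the conditions, the keyword stems and the function names `conditionIndex` and `conditionName` are all assumptions made for illustration; the attached dataset may number the conditions differently. The matching is case-sensitive and returns the first stem found.

```cpp
#include <cstring>

// Simplified condition names; the position in the array is the condition index.
// Assumption: this ordering is illustrative, not taken from the dataset.
const char* const CONDITION_NAMES[] = {"Fair", "Cloudy", "Rainy", "Snowy", "Foggy", "Haze"};
const int CONDITION_COUNT = sizeof(CONDITION_NAMES) / sizeof(CONDITION_NAMES[0]);

// Hypothetical keyword stems used to collapse detailed descriptions,
// e.g. "Mostly Cloudy" and "Partly Cloudy" both contain "Cloud".
const char* const CONDITION_STEMS[] = {"Fair", "Cloud", "Rain", "Snow", "Fog", "Haz"};

// Returns the condition index for a detailed description, or -1 if no stem matches.
int conditionIndex(const char* description) {
    for (int i = 0; i < CONDITION_COUNT; ++i) {
        if (std::strstr(description, CONDITION_STEMS[i]) != nullptr) return i;
    }
    return -1;
}

// Returns the text for a condition index, or "Unknown" when it is out of range.
const char* conditionName(int index) {
    if (index < 0 || index >= CONDITION_COUNT) return "Unknown";
    return CONDITION_NAMES[index];
}
```

When moving to the DFPlayer, only `CONDITION_NAMES` would need to change to hold file names; the index lookup stays the same.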


For the rainfall prediction in mm with the 2D Linear Regressor, I am using data taken directly from a Weather Station. The current table looks as follows, but will most probably be modified/filtered when applying it to the system:


In Arduino, both datasets will be implemented as arrays, with a separate array (for the Decision Tree only) representing the output.
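As a rough sketch of that layout, the feature rows and the output labels can live in two parallel arrays, which also makes it easy to measure how often a classifier reproduces the stored label. Everything here is an assumption for illustration: the names `SAMPLES`, `LABELS`, `Classifier` and `accuracy` are hypothetical, and the three rows are placeholder values, not entries from the attached dataset.

```cpp
#include <stddef.h>
#include <stdint.h>

// Placeholder rows {temperature, pressure, humidity}.
// These are NOT values from the attached dataset.
const float SAMPLES[][3] = {
    {12.0f, 1015.0f, 60.0f},
    { 8.0f, 1002.0f, 85.0f},
    { 3.0f,  998.0f, 93.0f},
};

// Separate output array: one condition index per row of SAMPLES (placeholders).
const uint8_t LABELS[] = {0, 1, 2};
const size_t SAMPLE_COUNT = sizeof(LABELS) / sizeof(LABELS[0]);

// Any function that maps one feature row to a condition index.
typedef int (*Classifier)(const float* features);

// Fraction of rows (0.0 to 1.0) for which the classifier returns the stored label.
float accuracy(Classifier classify) {
    size_t correct = 0;
    for (size_t i = 0; i < SAMPLE_COUNT; ++i) {
        if (classify(SAMPLES[i]) == LABELS[i]) ++correct;
    }
    return static_cast<float>(correct) / SAMPLE_COUNT;
}
```

On the board, the arrays would normally be kept in flash rather than RAM, but that detail is left out to keep the sketch portable.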

The code for the Decision Tree will look as follows:

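#include "DecisionTree.h"

// Classifier defined in DecisionTree.h
Eloquent::ML::Port::DecisionTree clf;

void setup() {
    Serial.begin(115200);
    Serial.println("Begin");
}

void loop() {
    // Placeholder input from the iris example; the weather version will pass
    // temperature, pressure and humidity readings instead.
    float irisSample[4] = {6.2, 2.8, 4.8, 1.8};
    Serial.print("Predicted label (you should see '2'): ");
    Serial.println(clf.predict(irisSample));
    delay(1000);
}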

The code for the 2D Linear Regressor:

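#include <LinearRegression2d.h>

LinearRegression2d lr = LinearRegression2d();
double values[2];

void setup() {
    Serial.begin(9600);
}

void loop() {
    Serial.println("Start learn");
    // Training samples in the form learn(temperature, humidity, rainfall).
    // Humidity is 93.0 in every sample here, so these points alone cannot
    // separate the effect of humidity from the constant offset.
    lr.learn(0.0, 93.0, 0.0);
    lr.learn(1.11, 93.0, 0.0);
    lr.learn(2.78, 93.0, 1.0);
    lr.learn(0.0, 93.0, 0.0);
    lr.learn(1.11, 93.0, 0.0);
    lr.learn(2.78, 93.0, 1.0);
    Serial.println("End learn");

    // Predict the rainfall for one test reading and print it as a whole number.
    float fake_temp = 2.78;
    float fake_hum = 93.0;
    Serial.println(int(lr.calculate(fake_temp, fake_hum)));

    //Serial.println("Reset");
    //lr.reset();

    delay(300000);  // wait 5 minutes before repeating
}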
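To make the idea behind a two-input linear regressor more concrete, here is a self-contained least-squares sketch that fits y = a*x1 + b*x2 + c from running sums and solves the normal equations with Cramer's rule. It is not the code of the LinearRegression2d library; the struct `Regressor2D`, its `fit` method and the determinant threshold are assumptions made for illustration, and only the `learn`/`calculate` naming mirrors the sketch above.

```cpp
#include <cmath>

// Least-squares fit of y = a*x1 + b*x2 + c (illustrative stand-in).
struct Regressor2D {
    double n = 0, sx1 = 0, sx2 = 0, sy = 0;
    double sx1x1 = 0, sx1x2 = 0, sx2x2 = 0, sx1y = 0, sx2y = 0;
    double a = 0, b = 0, c = 0;

    // Adds one training sample to the running sums.
    void learn(double x1, double x2, double y) {
        n += 1; sx1 += x1; sx2 += x2; sy += y;
        sx1x1 += x1 * x1; sx1x2 += x1 * x2; sx2x2 += x2 * x2;
        sx1y += x1 * y; sx2y += x2 * y;
    }

    // Solves the 3x3 normal equations with Cramer's rule.
    // Returns false when the samples do not determine a unique plane,
    // for example when one input has the same value in every sample.
    bool fit() {
        double det = det3(sx1x1, sx1x2, sx1,  sx1x2, sx2x2, sx2,  sx1, sx2, n);
        if (std::fabs(det) < 1e-9) return false;  // assumed threshold
        a = det3(sx1y, sx1x2, sx1,  sx2y, sx2x2, sx2,  sy, sx2, n) / det;
        b = det3(sx1x1, sx1y, sx1,  sx1x2, sx2y, sx2,  sx1, sy, n) / det;
        c = det3(sx1x1, sx1x2, sx1y,  sx1x2, sx2x2, sx2y,  sx1, sx2, sy) / det;
        return true;
    }

    // Evaluates the fitted plane at (x1, x2).
    double calculate(double x1, double x2) const { return a * x1 + b * x2 + c; }

    // Determinant of a 3x3 matrix given row by row.
    static double det3(double m00, double m01, double m02,
                       double m10, double m11, double m12,
                       double m20, double m21, double m22) {
        return m00 * (m11 * m22 - m12 * m21)
             - m01 * (m10 * m22 - m12 * m20)
             + m02 * (m10 * m21 - m11 * m20);
    }
};
```

The `fit` step fails by design when an input never changes across the training samples, as with a humidity of 93.0 in every sample, because the humidity coefficient and the constant offset can then no longer be told apart.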

These ML algorithms will be fully developed and tested for accuracy over the following week. I have attached the 2 datasets to this blog, as well as the research document on linear rainfall prediction models.


More articles and research papers on Embedded Machine Learning available at:


  • Prediction Models.docx (DOCX, 13KB)
  • temp_dataset.xlsx (XLSX, 36KB)
  • daily-areal-rainfall.csv (CSV, 231KB)
  • linear2d_arduino.zip (ZIP, 420B)



