Ifrim Ciprian

Machine Learning Classification - Weather Conditions

Updated: Apr 7, 2022

Continuing the Machine Learning work from last time, this post moves from a Regression problem to a Classification problem. Using the same datasets, we can classify the current environmental conditions as:

  • Fair

  • Rain

  • Cloudy

  • Overcast

  • Snow


I have tested different estimators that can be ported to Arduino by the MicroMLGen Library:

  1. Decision Tree

  2. Gaussian Naive Bayes

  3. Random Forest

  4. Support Vector Machines

  5. XGBoost Classifier

The code creates an output CSV file, a TXT metrics file and confusion matrix plots, which look as follows:

The metrics TXT files (which can be imported into Excel for easy table generation) look as follows:


The confusion matrices look as follows, depending on the estimator:

1) dtc_default_weather_data_2000_2019_weather_data_2020_2021_5f

2) xgb_default_weather_data_2018_2021_hourly_weather_data_2022_hourly_5f

3) rfc_default_weather_data_2000_2019_3c_weather_data_2020_2021_3c_5f

4) gnb_default_2000_2019_2020_2021_5f

All the confusion matrix files (a total of 77) can be found on GitHub for all the different runs.
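For readers who want to see what sits behind such a plot, here is a minimal sketch of how a confusion matrix is tallied. It uses only the Python standard library, the class names come from the list at the top of this post, and the label lists are made-up examples rather than results from the actual runs.

```python
# Class names taken from the post; the labels below are invented for illustration.
CLASSES = ["Fair", "Rain", "Cloudy", "Overcast", "Snow"]


def confusion_matrix(y_true, y_pred, classes):
    """Return a matrix where row i is the true class and column j the predicted class."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


# Hypothetical ground truth and predictions (not real model output).
y_true = ["Fair", "Fair", "Rain", "Cloudy", "Overcast", "Snow", "Rain"]
y_pred = ["Fair", "Cloudy", "Rain", "Cloudy", "Cloudy", "Snow", "Fair"]

cm = confusion_matrix(y_true, y_pred, CLASSES)
for name, row in zip(CLASSES, cm):
    print(f"{name:>8}: {row}")
```

Each diagonal entry counts correct predictions for a class, and everything off the diagonal is a confusion between two classes.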


The classifiers have been optimised using Grid Search or Random Search.

  1. Grid search exhaustively evaluates every combination in a manually specified subset of the hyperparameter space of the targeted algorithm.

  2. Random search samples a value for each hyperparameter independently from a probability distribution and evaluates a fixed number of these random combinations. It explores the same kind of search space as grid search, but it often finds comparable or better settings with fewer evaluations, particularly when only a few of the hyperparameters really matter.
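The difference between the two strategies can be sketched in a few lines of plain Python. This is only an illustration under assumptions: `toy_score` is a hypothetical stand-in for a cross-validated score, and the `C` and `gamma` ranges are invented example values, not the ones used in my runs.

```python
import itertools
import random


def toy_score(params):
    """Hypothetical stand-in for a cross-validated score; peaks at C=10, gamma=0.1."""
    return -abs(params["C"] - 10) - 100 * abs(params["gamma"] - 0.1)


def grid_search(grid, score):
    """Evaluate every combination in the grid and return the best one."""
    names = list(grid)
    candidates = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
    return max(candidates, key=score), len(candidates)


def random_search(samplers, score, n_iter, seed=0):
    """Draw each hyperparameter independently n_iter times and return the best draw."""
    rng = random.Random(seed)
    candidates = [{name: draw(rng) for name, draw in samplers.items()} for _ in range(n_iter)]
    return max(candidates, key=score), len(candidates)


# Assumed example search space (illustrative values only).
grid = {"C": [0.1, 1, 10, 100], "gamma": [0.001, 0.01, 0.1, 1]}
samplers = {
    "C": lambda rng: 10 ** rng.uniform(-1, 2),      # log-uniform over roughly [0.1, 100]
    "gamma": lambda rng: 10 ** rng.uniform(-3, 0),  # log-uniform over roughly [0.001, 1]
}

best_grid, n_grid = grid_search(grid, toy_score)
best_random, n_random = random_search(samplers, toy_score, n_iter=16)
print("grid  :", best_grid, "after", n_grid, "evaluations")
print("random:", best_random, "after", n_random, "evaluations")
```

Grid search can only ever return a point that was written into the grid, while random search can land between grid points. In scikit-learn the same two ideas are available as `GridSearchCV` and `RandomizedSearchCV`.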

Here are some pictures of the process:

1) Support Vector Machines Optimisation


Here is a video demo:


I have compared the different estimators by checking the F1 Score, Precision and Recall per class:


It seems that the XGBoost classifier is the best, as I am more interested in high accuracy on the Rain class, combined with Fair and Snow, than on Cloudy and Overcast.
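As a reminder of what these per-class scores mean, here is a minimal sketch that derives Precision, Recall and F1 Score for each class from a confusion matrix. The matrix values are invented for illustration and are not taken from any of the runs above.

```python
def per_class_metrics(matrix, classes):
    """Compute precision, recall and F1 per class from a confusion matrix (rows = true class)."""
    results = {}
    for i, name in enumerate(classes):
        tp = matrix[i][i]
        fn = sum(matrix[i]) - tp                  # true class i, predicted as something else
        fp = sum(row[i] for row in matrix) - tp   # predicted class i, actually something else
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[name] = {"precision": precision, "recall": recall, "f1": f1}
    return results


classes = ["Fair", "Rain", "Cloudy", "Overcast", "Snow"]
# Hypothetical confusion matrix (made-up counts, rows = true, columns = predicted).
matrix = [
    [50, 2, 6, 2, 0],
    [3, 40, 2, 5, 0],
    [10, 3, 20, 7, 0],
    [4, 6, 8, 22, 0],
    [0, 1, 0, 0, 9],
]

metrics = per_class_metrics(matrix, classes)
for name, scores in metrics.items():
    print(f"{name:>8}: P={scores['precision']:.2f} R={scores['recall']:.2f} F1={scores['f1']:.2f}")
```

Looking at the scores per class rather than at overall accuracy is what makes it possible to prefer an estimator for its Rain, Fair and Snow performance specifically.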


Since the dataset is imbalanced, it would be nice to test accuracy with an oversampled dataset, which is what I will be doing next.
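One simple way to do this is random oversampling, where minority-class samples are duplicated until every class is as large as the biggest one. The sketch below is an assumption about how that could look, not the method I have settled on; the toy `(temperature, humidity)` samples and the class counts are invented.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate minority-class samples at random until every class matches the largest."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(group) for group in by_class.values())
    out_samples, out_labels = [], []
    for label, group in by_class.items():
        # Draw the missing samples with replacement from the existing ones.
        extra = [rng.choice(group) for _ in range(target - len(group))]
        for sample in group + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


# Made-up, deliberately imbalanced toy data: (temperature, humidity) pairs.
labels = ["Fair"] * 6 + ["Rain"] * 3 + ["Snow"] * 1
samples = [(20.0 + i, 40.0 + i) for i in range(len(labels))]

new_samples, new_labels = random_oversample(samples, labels)
print("before:", Counter(labels))
print("after :", Counter(new_labels))
```

Oversampling should be applied to the training split only, so that duplicated samples do not leak into the test set and inflate the scores.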

Sun diagram of sample distribution per class:

Histogram of the sample distribution per class:



Classification Statistical Analysis - Original Dataset.xlsx (XLSX download, 372KB)



