Weather Classification - SEFR Estimator

Ifrim Ciprian
Mar 18, 2022

All the estimators I worked with in my previous blogs are ones that the MicroMLGen library can port.

I have noticed that there is a new estimator called SEFR, which stands for Scalable, Efficient, and Fast classifieR. It was first published last year in a paper from three Iranian researchers.

SEFR is a binary classifier, meaning it can tell apart two classes: the positive class and the negative class. Usually the positive class is the thing you’re looking for, and the negative class is the absence of that thing.

Binary classification is certainly important, but many interesting classification tasks involve multiple classes. You can extend SEFR to do multiclass classification as well, basically by running it multiple times: one binary SEFR per class, each separating that class from all the others (one vs all).

Since SEFR is a supervised learning algorithm, you’ll need to train it on a set of examples and their labels. The labels are 0 or 1, for the negative class and positive class, respectively. The examples themselves consist of numerical data.

The key idea in SEFR is that we want to determine for each feature whether it helps to identify positive examples, or whether it helps to identify negative examples.

During training, SEFR computes a weight value for each feature. If the training data has M features per example, then SEFR learns M weights in total.

The weight is just a number that tells us how much an individual feature “pulls” the example towards the positive class (the weight is close to +1), or how much it pulls the example towards the negative class (the weight is close to -1).

If the weight is close to 0, the feature isn’t very useful to the classification process and can be considered irrelevant.
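To make the idea of per-feature weights concrete, here is a minimal sketch in plain Python. It is not taken from MicroMLGen. The formula follows my reading of the SEFR paper (the difference of the per-class feature averages divided by their sum) and assumes all feature values are non-negative. The function name `sefr_weights`, the `eps` parameter and the toy data are made up for illustration.

```python
def sefr_weights(examples, labels, eps=1e-7):
    """Return one weight per feature (assumes non-negative feature values).

    Assumed formula, based on my reading of the SEFR paper:
    weight = (avg_pos - avg_neg) / (avg_pos + avg_neg + eps)
    """
    pos = [x for x, y in zip(examples, labels) if y == 1]
    neg = [x for x, y in zip(examples, labels) if y == 0]
    weights = []
    for j in range(len(examples[0])):
        avg_pos = sum(x[j] for x in pos) / len(pos)  # feature average, positive class
        avg_neg = sum(x[j] for x in neg) / len(neg)  # feature average, negative class
        # eps only guards against division by zero
        weights.append((avg_pos - avg_neg) / (avg_pos + avg_neg + eps))
    return weights


# Toy data (invented): feature 0 is high for positives, feature 1 is high
# for negatives, and feature 2 is identical in both classes.
examples = [
    [0.9, 0.1, 0.5],
    [0.8, 0.2, 0.5],
    [0.1, 0.9, 0.5],
    [0.2, 0.8, 0.5],
]
labels = [1, 1, 0, 0]

weights = sefr_weights(examples, labels)
print([round(w, 2) for w in weights])
```

With this toy data the first weight comes out clearly positive, the second clearly negative, and the third is zero, matching the "pull" described above. There are M = 3 features, so the model learns 3 weights.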

I have tested it quickly on the 20-year dataset just to compare the accuracy:

Training time: 0 ns
Training CV score: 0.419
Test accuracy: 0.435

The accuracy is actually worse than that of every other estimator I tried. That is expected: SEFR is a simpler algorithm with no optimisation possible, and it was made for binary classification, handling multiple classes only through one vs all.
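For reference, the one vs all scheme can be sketched in a few lines of plain Python. This is an illustration under assumptions, not the code behind the numbers above. The weight and bias formulas follow my reading of the SEFR paper and assume non-negative features. The function names, the class labels 0/1/2 and the toy data are all invented.

```python
def fit_binary_sefr(X, y, eps=1e-7):
    """Fit one binary SEFR model on 0/1 labels; returns (weights, bias).

    Assumed formulas (my reading of the SEFR paper, non-negative features).
    """
    pos = [x for x, t in zip(X, y) if t == 1]
    neg = [x for x, t in zip(X, y) if t == 0]
    weights = []
    for j in range(len(X[0])):
        avg_pos = sum(x[j] for x in pos) / len(pos)
        avg_neg = sum(x[j] for x in neg) / len(neg)
        weights.append((avg_pos - avg_neg) / (avg_pos + avg_neg + eps))

    def score(x):
        return sum(w * v for w, v in zip(weights, x))

    avg_pos_score = sum(score(x) for x in pos) / len(pos)
    avg_neg_score = sum(score(x) for x in neg) / len(neg)
    # Decision threshold between the two average scores, weighted by class size
    bias = -(len(neg) * avg_pos_score + len(pos) * avg_neg_score) / len(X)
    return weights, bias


def fit_one_vs_rest(X, y):
    """Train one binary model per class: that class (1) against the rest (0)."""
    models = {}
    for cls in sorted(set(y)):
        binary_labels = [1 if t == cls else 0 for t in y]
        models[cls] = fit_binary_sefr(X, binary_labels)
    return models


def predict_one_vs_rest(models, x):
    """Return the class whose binary model gives the largest margin."""
    def margin(cls):
        weights, bias = models[cls]
        return sum(w * v for w, v in zip(weights, x)) + bias
    return max(models, key=margin)


# Toy data (invented): three classes, each with one dominant feature.
X = [
    [0.9, 0.1, 0.1], [0.8, 0.2, 0.1],  # class 0
    [0.1, 0.9, 0.1], [0.2, 0.8, 0.2],  # class 1
    [0.1, 0.1, 0.9], [0.2, 0.1, 0.8],  # class 2
]
y = [0, 0, 1, 1, 2, 2]

models = fit_one_vs_rest(X, y)
print(predict_one_vs_rest(models, [0.85, 0.15, 0.1]))
```

Each binary model only learns "this class against everything else", so with K classes the whole thing is trained and evaluated K times. Picking the class with the largest margin is one common way to combine the binary models; other tie-breaking rules are possible.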
