Teaching Machines to See: SURF, HOG & Bagging Magic in Image Classification
What if we could teach machines to see—not just glance, but perceive, understand, and categorize? In this post, we step into the world of image classification using a blend of classical computer vision and ensemble machine learning. Picture it like giving a robot eyes (SURF + HOG) and a brain (Bagging + Decision Trees) to recognize visual patterns.
🔹 Step 1: Image Data Mounting and Exploration
Using Google Colab, we mount a Drive folder containing labeled subfolders of images. Each subfolder is a class: cats, dogs, raccoons—you name it.
from google.colab import drive
drive.mount('/content/drive')
We then peek into image dimensions to get a feel for our dataset’s scale and variety.
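A minimal exploration sketch (the base_path below is a placeholder; point it at your own Drive folder):

import os, glob
import cv2

base_path = '/content/drive/MyDrive/dataset'  # placeholder; adjust to your Drive layout
subfolders = [f for f in os.listdir(base_path)
              if os.path.isdir(os.path.join(base_path, f))]
for folder in subfolders:
    for img_path in glob.glob(os.path.join(base_path, folder, '*.jpg'))[:3]:
        img = cv2.imread(img_path)
        if img is not None:
            print(folder, img.shape)  # (height, width, channels)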
🔹 Step 2: Feature Extraction – Meet SURF and HOG
Imagine each image as a complex terrain map. SURF identifies critical landforms: edges, corners, and blobs. Because SURF is patented and absent from stock OpenCV builds, we fall back to SIFT when it is unavailable. HOG (Histogram of Oriented Gradients) captures the overall texture and flow of the terrain.
Each image goes through:
- Grayscale conversion
- SURF/SIFT descriptor flattening (max 1000 features)
- HOG feature extraction
- Feature vector concatenation
These vectors are then zero-padded to a common length so every image contributes a feature vector of the same dimension, as sketched below.
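Here is a minimal sketch of that pipeline. The detector parameters, the 1000-keypoint cap, and the HOG cell sizes are illustrative assumptions, not the post's exact settings; SURF additionally requires an opencv-contrib build with non-free modules enabled:

import numpy as np
import cv2
from skimage.feature import hog

MAX_FEATURES = 1000  # assumed cap on keypoints per image

try:
    detector = cv2.xfeatures2d.SURF_create()  # needs opencv-contrib with non-free modules
except AttributeError:
    detector = cv2.SIFT_create()              # fallback available in stock OpenCV

def extract_features(img):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, desc = detector.detectAndCompute(gray, None)
    desc = np.array([]) if desc is None else desc[:MAX_FEATURES].flatten()
    hog_vec = hog(gray, orientations=9, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    return np.concatenate([desc, hog_vec])

def pad_features(vectors):
    # Zero-pad every vector to the length of the longest one
    max_len = max(len(v) for v in vectors)
    return np.array([np.pad(v, (0, max_len - len(v))) for v in vectors])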
🔹 Step 3: Label Encoding
To make our labels machine-readable, we use LabelEncoder:
from sklearn.preprocessing import LabelEncoder

labelencoder = LabelEncoder()
enc_labels = labelencoder.fit_transform(labels)
This allows us to train classifiers without worrying about string categories.
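Because the encoder stores the mapping, integer predictions can always be translated back into class names:

print(labelencoder.classes_)                           # e.g. ['cats' 'dogs' 'raccoons']
print(labelencoder.inverse_transform(enc_labels[:5]))  # first five labels, decoded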
🔹 Step 4: Ensemble Learning with Bagging
We train a BaggingClassifier using DecisionTreeClassifier as the base estimator. It’s like assembling a team of scouts, each examining a different subset of the data and reporting back.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bagging_classifier = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # named 'base_estimator' in scikit-learn < 1.2
    n_estimators=10,   # 10 trees, each trained on a random subset
    max_samples=0.8,   # each tree sees 80% of the training data
    random_state=42
)
Split the padded feature vectors (stored here in images) and encoded labels into training and test sets, then fit the ensemble:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(images, enc_labels, test_size=0.2, random_state=42)
bagging_classifier.fit(X_train, y_train)
🔹 Step 5: Classical Decision Trees and Random Forests
We benchmark our ensemble with standalone classifiers:
DecisionTreeClassifier(criterion='entropy')
RandomForestClassifier(max_depth=7, random_state=0)
Each tree is like a rulebook; the forest is a democracy of rulebooks. Forests typically outperform individual trees because averaging many decorrelated trees reduces variance.
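A minimal benchmarking sketch, reusing the train/test split from Step 4 (the printed accuracies are illustrative, not the post's actual numbers):

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

for name, model in [
    ('Decision Tree', DecisionTreeClassifier(criterion='entropy')),
    ('Random Forest', RandomForestClassifier(max_depth=7, random_state=0)),
]:
    model.fit(X_train, y_train)
    print(f'{name} accuracy: {model.score(X_test, y_test):.3f}')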
🔹 Step 6: Evaluation and Confusion Matrix
Using classification_report and confusion_matrix, we assess precision, recall, and F1-score for each class. A heatmap of the confusion matrix reveals which classes are confused with one another:
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
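The heatmap call assumes cm and predictions already exist; a minimal sketch of those preceding steps, reusing the bagging model from Step 4:

from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns

y_pred = bagging_classifier.predict(X_test)
print(classification_report(y_test, y_pred, target_names=labelencoder.classes_))
cm = confusion_matrix(y_test, y_pred)  # feeds the heatmap above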
🔹 Step 7: Dataset Summary
We count how many images each class has:
image_counts = {folder: len(glob.glob(os.path.join(base_path, folder, "*.jpg"))) for folder in subfolders}
print(image_counts)
This reveals any class imbalance issues.
🔹 Step 8: Fast Linear Classifier – SGD
For a lightweight baseline, we deploy an SGDClassifier with hinge loss, which amounts to a linear SVM trained by stochastic gradient descent. Fast and surprisingly robust, SGD often serves as a sanity check for heavier models.
from sklearn.linear_model import SGDClassifier

sgd_classifier = SGDClassifier(loss='hinge', penalty='l2')
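Training and scoring follow the same pattern as the other models (a sketch; standardizing the features with StandardScaler beforehand often helps SGD converge):

sgd_classifier.fit(X_train, y_train)
print(f'SGD accuracy: {sgd_classifier.score(X_test, y_test):.3f}')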
📌 Summary
In this journey, we stitched classical computer vision (SURF + HOG) with robust ensemble learning (Bagging, Forests, SGD). It’s a practical example of fusing domain-specific handcrafted features with reliable ML architectures.
By the end, we have:
- Extracted meaningful visual descriptors
- Trained multiple classifiers
- Evaluated models using structured metrics
- Produced export-ready performance metrics
Coming Next: Pushing this into a real-world application with live image streams, data augmentation, and model compression for edge devices.