Simple Object Detection using CNN with TensorFlow and Keras

Table of contents

Introduction
Prerequisites
Project Structure Overview
Implementation
FAQs
Conclusion

Introduction

In this blog post, we’ll walk through a simple approach to object detection using Convolutional Neural Networks (CNNs), implemented with TensorFlow and Keras. You’ll learn how to prepare your dataset, build and train a model, and run predictions, all within a clean and scalable project structure.

Prerequisites

Make sure the following libraries and frameworks are installed at the listed versions before running this project.

Core Libraries

| Library       | Version   | Description                                          |
|---------------|-----------|------------------------------------------------------|
| PyYAML        | 6.0       | For reading configuration files                      |
| TensorFlow    | 2.10.0    | Deep learning framework                              |
| Keras         | 2.10.0    | High-level API for building and training models      |

Data Science & Visualization

| Library         | Version     | Description                                        |
|-----------------|-------------|----------------------------------------------------|
| NumPy           | 1.23.3      | Numerical operations                               |
| Pandas          | 1.5.0       | Data manipulation and analysis                     |
| OpenCV-Python   | 4.6.0.66    | Image processing                                   |
| Matplotlib      | 3.5.3       | Plotting and visualization                         |
| Scikit-learn    | 1.1.2       | Machine learning utilities and metrics             |
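
As a quick sanity check, the pinned versions in the tables above can be compared against what is actually installed. The sketch below is not part of the original project. It uses only the standard library and assumes the PyPI distribution names listed in PINNED; the helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional

# Versions copied from the tables above. The keys are assumed to be the
# PyPI distribution names under which the packages were installed.
PINNED: Dict[str, str] = {
    "PyYAML": "6.0",
    "tensorflow": "2.10.0",
    "keras": "2.10.0",
    "numpy": "1.23.3",
    "pandas": "1.5.0",
    "opencv-python": "4.6.0.66",
    "matplotlib": "3.5.3",
    "scikit-learn": "1.1.2",
}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def check_versions(pinned: Dict[str, str]) -> Dict[str, str]:
    """Map each package to 'ok', 'missing', or 'mismatch (found X)'."""
    report = {}
    for package, wanted in pinned.items():
        found = installed_version(package)
        if found is None:
            report[package] = "missing"
        elif found == wanted:
            report[package] = "ok"
        else:
            report[package] = f"mismatch (found {found})"
    return report


if __name__ == "__main__":
    for package, status in check_versions(PINNED).items():
        print(f"{package:<15} {PINNED[package]:<10} {status}")
```

A mismatch is not necessarily a problem, but it is the first thing to rule out if the code in this post behaves differently on your machine.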

Project Structure Overview

This organization is designed to keep the project clean, scalable, and easy to maintain. By separating configuration, data, models, and source code, it becomes much easier to manage experiments, training workflows, and inference pipelines. Each directory has a clear purpose, helping developers and collaborators navigate and extend the project efficiently.

project/
│
├── config/                     # YAML configuration files for model, data paths, and training parameters
│   └── config.yaml             # Main configuration file
│
├── data/                       # Raw and processed data, annotations, and image splits
│   ├── annotations/            # CSV files with bounding box and class labels
│   ├── processed/              # Preprocessed datasets for training, validation, and testing
│   └── images/                 # Image data organized by usage and class
│       ├── train/              # Training images
│       │   ├── animal/
│       │   ├── human/
│       │   └── vehicle/
│       ├── validation/         # Validation images
│       │   ├── animal/
│       │   ├── human/
│       │   └── vehicle/
│       ├── test/               # Test images
│       │   ├── animal/
│       │   ├── human/
│       │   └── vehicle/
│       └── predict/            # Images used for inference
│           └── sample.png
│
├── models/                     # Trained models and checkpoints
│
├── src/                        # Source code for data handling, model logic, and pipeline execution
│   ├── data_preprocessing.py   # Data cleaning, formatting, and splitting
│   ├── model.py                # CNN model definition, training, and evaluation
│   ├── predict.py              # Inference logic for new images
│   ├── utils.py                # Helper functions for data and model operations
│   ├── config.py               # Loads and parses YAML configuration
│   ├── pipeline.py             # Orchestrates preprocessing, training, evaluation, and prediction
│   └── main.py                 # CLI entrypoint to run pipeline steps
│
├── requirements.txt            # Python dependencies
├── .flake8                     # Linting configuration for flake8
└── README.md                   # Project documentation and usage instructions

To maintain clarity and modularity, the project is divided into the following key folders:
config/: Stores all configuration files in YAML format.
data/: Organizes both raw and processed datasets.
models/: Contains trained models, weights, and checkpoints. Useful for saving progress during training and for deploying models later.
src/: Houses all core scripts for preprocessing, training, prediction, and utility functions.
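
If you are starting from scratch, the empty directory skeleton can be created with a few lines of Python. This is a minimal sketch under the assumption that the three classes and three splits shown in the tree are the ones you want; the function name scaffold is hypothetical and not part of the project.

```python
from pathlib import Path
from typing import List

CLASSES = ["animal", "human", "vehicle"]
SPLITS = ["train", "validation", "test"]


def scaffold(root: Path) -> List[Path]:
    """Create the empty directory skeleton and return the created paths."""
    dirs = [
        root / "config",
        root / "data" / "annotations",
        root / "data" / "processed",
        root / "data" / "images" / "predict",
        root / "models",
        root / "src",
    ]
    # One sub-folder per class inside each image split
    dirs += [
        root / "data" / "images" / split / cls
        for split in SPLITS
        for cls in CLASSES
    ]
    for directory in dirs:
        directory.mkdir(parents=True, exist_ok=True)
    return dirs


if __name__ == "__main__":
    for path in scaffold(Path("project")):
        print(path)
```

Because mkdir is called with exist_ok=True, running the script again on an existing project leaves its contents untouched.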

Implementation

This section walks through the key steps to build a simple object detection system using a Convolutional Neural Network (CNN) with TensorFlow and Keras.

Step 1: Prepare Images and Annotations
This step involves organizing your image dataset and creating annotation files that describe the location and class of each object in the images. The annotations are stored in data/annotations/train_annotations.csv and data/annotations/validation_annotations.csv.

Each annotation CSV should follow this structure:

### data/annotations/train_annotations.csv

filename,label,xmin,ymin,xmax,ymax
image1.jpg,animal,10,20,100,200
image2.jpg,human,15,25,110,210
image3.jpg,vehicle,10,20,100,200
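
Malformed rows are easier to catch before training than after. The following sketch checks the header, the label, and the box coordinates of each row using only the standard library. The allowed labels are taken from the example above; the function name and the exact rules (non-negative integers, xmin < xmax, ymin < ymax) are assumptions made for illustration.

```python
import csv
import io
from typing import Iterable, List

ALLOWED_LABELS = {"animal", "human", "vehicle"}
REQUIRED_COLUMNS = ["filename", "label", "xmin", "ymin", "xmax", "ymax"]


def validate_annotations(lines: Iterable[str]) -> List[str]:
    """Return a list of human-readable problems found in the annotation rows."""
    reader = csv.DictReader(lines)
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    for row_number, row in enumerate(reader, start=2):  # the header is line 1
        if row["label"] not in ALLOWED_LABELS:
            problems.append(f"line {row_number}: unknown label {row['label']!r}")
        try:
            xmin, ymin, xmax, ymax = (int(row[k]) for k in REQUIRED_COLUMNS[2:])
        except (TypeError, ValueError):
            problems.append(f"line {row_number}: non-integer coordinate")
            continue
        if xmin < 0 or ymin < 0 or xmin >= xmax or ymin >= ymax:
            problems.append(f"line {row_number}: invalid box")
    return problems


sample = io.StringIO(
    "filename,label,xmin,ymin,xmax,ymax\n"
    "image1.jpg,animal,10,20,100,200\n"
    "image2.jpg,human,15,25,110,210\n"
)
print(validate_annotations(sample))  # an empty list means no problems were found
```

To check a real file, pass an open file handle instead of the StringIO object, for example open("data/annotations/train_annotations.csv", newline="").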

Step 2: Configuration
Keeping configurations separate makes the pipeline flexible and easy to tune without changing code.

### config/config.yaml

# Configuration settings for the image classification project

# Paths
data_dir: "data/images/"
train_dir: "data/images/train/"
validation_dir: "data/images/validation/"
test_dir: "data/images/test/"
annotations_dir: "data/annotations/"
processed_data_dir: "data/processed/"
model_dir: "models/"
checkpoint_dir: "models/checkpoints/"
saved_model: "models/saved_model.h5"
predict_image_path: "data/images/predict/car1.jpeg"

# Model Parameters
input_shape: [128, 128, 3]
num_classes: 3  # Adjust this if you add more classes
class_indices_order: ['animal', 'human', 'vehicle']  # Ensure this matches your model's output order

# Training Parameters
batch_size: 32
epochs: 10
learning_rate: 0.001

# Augmentation Parameters
rotation_range: 20
width_shift_range: 0.2
height_shift_range: 0.2
horizontal_flip: true
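
The other modules import upper-case names such as DATA_DIR and BATCH_SIZE from src/config.py, which is not listed in this post. One possible shape for that loader is sketched below. It is an assumption, not the project's actual file: the function names are hypothetical, and the only convention taken from the code in this post is that each YAML key appears as an upper-case constant.

```python
from typing import Any, Dict


def load_yaml(path: str) -> Dict[str, Any]:
    """Read a YAML file into a dictionary (requires PyYAML)."""
    import yaml  # imported here so to_constants() can be used without PyYAML

    with open(path, "r", encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def to_constants(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Upper-case the keys and convert input_shape from a list to a tuple."""
    constants = {key.upper(): value for key, value in settings.items()}
    if "INPUT_SHAPE" in constants:
        constants["INPUT_SHAPE"] = tuple(constants["INPUT_SHAPE"])
    return constants


# Inside a config.py module, the values could then be exposed as module-level
# names (DATA_DIR, BATCH_SIZE, ...), assuming the scripts are started from the
# project root so that the relative path resolves:
#     globals().update(to_constants(load_yaml("config/config.yaml")))

example = to_constants({"batch_size": 32, "input_shape": [128, 128, 3]})
print(example)
```

Keeping the key-to-constant mapping mechanical means a new setting added to config.yaml becomes importable without further changes to the loader.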

Step 3: Data Preprocessing
Clean and prepare the dataset by resizing images, normalizing pixel values, and formatting annotations. Split the data into training, validation, and test sets.
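
One part of formatting annotations is turning text labels into integer codes. If each file is encoded on its own from the labels it happens to contain, the same class can receive different codes in different files: a file with only vehicle and animal rows would map vehicle to 1, while a file with all three classes maps it to 2. A minimal sketch that avoids this by using one fixed class order (assumed here to mirror class_indices_order from the configuration; the function names are illustrative):

```python
from typing import Dict, List, Sequence

CLASS_ORDER = ["animal", "human", "vehicle"]  # mirrors class_indices_order


def build_label_map(class_order: Sequence[str]) -> Dict[str, int]:
    """Assign each class name the index of its position in class_order."""
    return {name: index for index, name in enumerate(class_order)}


def encode_labels(labels: Sequence[str], label_map: Dict[str, int]) -> List[int]:
    """Encode labels with a shared mapping; unknown labels raise KeyError."""
    return [label_map[label] for label in labels]


label_map = build_label_map(CLASS_ORDER)
print(encode_labels(["vehicle", "animal"], label_map))  # [2, 0]
```

The same label_map can be applied to the training and validation annotations so that both files agree on what each integer means.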

### src/data_preprocessing.py

import os
import pandas as pd
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.preprocessing import image
import numpy as np
from typing import Tuple
from config import DATA_DIR, BATCH_SIZE


class DataPreprocessor:
    def __init__(
        self,
        data_dir: str = DATA_DIR,
        img_size: Tuple[int, int] = (128, 128),
        batch_size: int = BATCH_SIZE,
    ) -> None:
        """
        Initialize the DataPreprocessor with config values for data directory and batch size.
        Sets up an ImageDataGenerator for preprocessing images.
        """
        self.data_dir = data_dir
        self.img_size = img_size
        self.batch_size = batch_size
        # ImageDataGenerator rescales pixel values from [0, 255] to [0, 1]
        self.datagen = ImageDataGenerator(rescale=1.0 / 255)

    def load_image(self, image_path: str) -> np.ndarray:
        """
        Load and preprocess a single image from the specified path.
        Args:
            image_path: Path to the image file.
        Returns:
            Preprocessed image as a numpy array.
        """
        img = image.load_img(image_path, target_size=self.img_size)
        x = image.img_to_array(img)
        x = x / 255.0
        x = np.expand_dims(x, axis=0)
        return x

    def load_images_from_directory(self, directory: str) -> Tuple:
        """
        Load and preprocess all images from a specified directory.
        Args:
            directory: Path to the directory containing images.
        Returns:
            A generator yielding batches of preprocessed images.
        """
        return self.datagen.flow_from_directory(
            directory,
            target_size=self.img_size,
            batch_size=self.batch_size,
            class_mode="categorical",
        )

    def load_data(self) -> Tuple:
        """
        Loads training and validation data from directories using ImageDataGenerator.
        Returns generators for training and validation datasets.
        """
        # Loads and rescales training images, returns batches with one-hot encoded labels
        train_data = self.load_images_from_directory(
            os.path.join(self.data_dir, "train")
        )

        # Loads and rescales validation images, returns batches with one-hot encoded labels
        val_data = self.load_images_from_directory(
            os.path.join(self.data_dir, "validation")
        )

        return train_data, val_data

    def preprocess_dataframe(self, df: pd.DataFrame) -> pd.DataFrame:
        """
        Preprocesses a pandas DataFrame by converting categorical labels to numerical codes.
        Returns the modified DataFrame.
        """
        # Convert categorical labels to integer codes for model compatibility
        df["label"] = pd.Categorical(df["label"]).codes
        # You can add more preprocessing steps here (e.g., normalization, augmentation)
        return df

    def save_processed_dataframe(self, df: pd.DataFrame, output_path: str) -> None:
        """
        Save the processed DataFrame to a CSV file.
        Args:
            df: The processed pandas DataFrame.
            output_path: Path to save the CSV file.
        """
        df.to_csv(output_path, index=False)
        print(f"Processed DataFrame saved to {output_path}")
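
The configuration file defines augmentation parameters (rotation_range, width_shift_range, height_shift_range, horizontal_flip), but the preprocessor above only rescales pixel values. One way to wire them in, sketched under the assumption that the settings are available as a plain dictionary, is to collect them into keyword arguments for ImageDataGenerator. The helper name augmentation_kwargs is hypothetical.

```python
from typing import Any, Dict

AUGMENTATION_KEYS = (
    "rotation_range",
    "width_shift_range",
    "height_shift_range",
    "horizontal_flip",
)


def augmentation_kwargs(settings: Dict[str, Any]) -> Dict[str, Any]:
    """Pick the augmentation settings and add the [0, 1] rescaling factor."""
    kwargs = {key: settings[key] for key in AUGMENTATION_KEYS if key in settings}
    kwargs["rescale"] = 1.0 / 255
    return kwargs


settings = {
    "batch_size": 32,
    "rotation_range": 20,
    "width_shift_range": 0.2,
    "height_shift_range": 0.2,
    "horizontal_flip": True,
}
train_kwargs = augmentation_kwargs(settings)
print(train_kwargs)

# With TensorFlow installed, the training generator could then be built as:
#     ImageDataGenerator(**train_kwargs)
```

Augmentation is normally applied to the training generator only; validation and test images would keep the plain rescale=1.0 / 255 generator so that evaluation runs on unmodified images.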

Step 4: Model Definition
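Build a custom CNN architecture using TensorFlow and Keras. The model below stacks two convolution and pooling blocks to extract features, followed by dense layers that predict one class label per image. It does not output bounding-box coordinates; predicting boxes would require an additional regression output.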

Step 5: Training
Compile the model with a suitable loss function and optimizer; the code below uses categorical cross-entropy and Adam. Train it on the prepared dataset and save the trained model for later use.

Step 6: Evaluation
Assess model performance on held-out test data. The code below reports loss and accuracy. IoU (Intersection over Union) becomes relevant once the model also predicts bounding boxes that can be compared against the ground truth.
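
For reference, IoU is the area where two boxes overlap divided by the area they cover together. The sketch below computes it for boxes in the (xmin, ymin, xmax, ymax) format used by the annotation files. The function name iou is illustrative and is not part of the project code.

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the overlap; zero if the boxes do not intersect
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


# Two 10x10 boxes overlapping in a 5x5 region: 25 / (100 + 100 - 25)
print(round(iou((0, 0, 10, 10), (5, 5, 15, 15)), 4))  # 0.1429
```

A predicted box is commonly counted as correct when its IoU with the ground-truth box exceeds a chosen threshold such as 0.5.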

### src/model.py

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from typing import Tuple, Any
from config import INPUT_SHAPE, NUM_CLASSES, EPOCHS, BATCH_SIZE


class CNNModel:
    """
    Encapsulates a Convolutional Neural Network for object detection.
    Provides methods for building, training, evaluating, saving, and predicting.
    """

    def __init__(
        self,
        input_shape: Tuple[int, int, int] = INPUT_SHAPE,
        num_classes: int = NUM_CLASSES,
    ) -> None:
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = self._build_model()

    def _build_model(self) -> Sequential:
        """
        Build and compile the CNN model architecture.
        Returns:
            Compiled Keras Sequential model.
        """
        model = Sequential(
            [
                Conv2D(
                    32,
                    kernel_size=(3, 3),
                    activation="relu",
                    input_shape=self.input_shape,
                ),
                MaxPooling2D(pool_size=(2, 2)),
                Conv2D(64, kernel_size=(3, 3), activation="relu"),
                MaxPooling2D(pool_size=(2, 2)),
                Flatten(),
                Dense(128, activation="relu"),
                Dense(self.num_classes, activation="softmax"),
            ]
        )
        model.compile(
            optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"]
        )
        return model

    def train(
        self,
        x_train: Any,
        y_train: Any = None,
        epochs: int = EPOCHS,
        batch_size: int = BATCH_SIZE,
    ) -> None:
        """
        Train the CNN model on the provided training data.
        Args:
            x_train: Training images or DirectoryIterator.
            y_train: Training labels (optional, not needed for DirectoryIterator).
            epochs: Number of training epochs.
            batch_size: Size of training batches (ignored for DirectoryIterator).
        """
        if y_train is None:
            self.model.fit(x_train, epochs=epochs)
        else:
            self.model.fit(x_train, y_train, epochs=epochs, batch_size=batch_size)

    def evaluate(self, x_test: Any, y_test: Any = None) -> Any:
        """
        Evaluate the model on test data.
        Args:
            x_test: Test images or DirectoryIterator.
            y_test: Test labels (optional, not needed for DirectoryIterator).
        Returns:
            Evaluation metrics (loss, accuracy).
        """
        if y_test is None:
            return self.model.evaluate(x_test)
        return self.model.evaluate(x_test, y_test)

    def save_model(self, file_path: str) -> None:
        """
        Save the trained model to the specified file path.
        Args:
            file_path: Path to save the model file (e.g., 'models/cnn_model.h5').
        """
        self.model.save(file_path)

    def predict(self, data: Any, batch_size: int = BATCH_SIZE) -> Any:
        """
        Make predictions using the trained model on the provided data.
        Args:
            data: Input data for prediction (e.g., DirectoryIterator).
        Returns:
            Prediction results from the model.
        """
        return self.model.predict(data, batch_size=batch_size)
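
To see where the model's parameters go, it helps to trace the feature-map size through the layers. The arithmetic below assumes the Keras defaults used in the model above (no padding and stride 1 for Conv2D, and a pooling stride equal to the pool size) together with the 128x128x3 input and 3 classes from the configuration.

```python
def conv_out(size: int, kernel: int = 3) -> int:
    """Output size of an unpadded convolution with stride 1."""
    return size - kernel + 1


def pool_out(size: int, pool: int = 2) -> int:
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool


size = 128
size = conv_out(size)  # 126 after the first Conv2D
size = pool_out(size)  # 63 after the first MaxPooling2D
size = conv_out(size)  # 61 after the second Conv2D
size = pool_out(size)  # 30 after the second MaxPooling2D
flattened = size * size * 64  # 57600 values enter the first Dense layer

# Weights plus biases per layer
params = {
    "conv1": 3 * 3 * 3 * 32 + 32,     # 896
    "conv2": 3 * 3 * 32 * 64 + 64,    # 18496
    "dense1": flattened * 128 + 128,  # 7372928
    "dense2": 128 * 3 + 3,            # 387
}
total = sum(params.values())  # 7392707

print(flattened, total)
```

Almost all of the roughly 7.4 million parameters sit in the first Dense layer, so a smaller input size or an extra pooling stage reduces the model size far more than changing the convolution filters.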

Step 7: Prediction
Load the trained model and run inference on new images. The code in this post prints the predicted class for an image; drawing bounding boxes on the image would additionally require box predictions from the model.

### src/predict.py

from typing import Any
from config import BATCH_SIZE


class Predictor:
    """
    Encapsulates prediction logic for a trained model.
    """

    def __init__(self, model: Any) -> None:
        self.model = model

    def predict(self, data: Any, batch_size: int = BATCH_SIZE) -> Any:
        """
        Make predictions using the provided model and input data.
        Args:
            data (Any): Input data for prediction.
            batch_size (int): Batch size for prediction (default from config).
        Returns:
            Any: Prediction result from the model.
        """
        return self.model.predict(data, batch_size=batch_size)

Step 8: Pipeline
pipeline.py ties the preprocessing, training, evaluation, and prediction steps together. This modular design allows you to run each step independently or as part of a full pipeline, making experimentation and debugging easier.

### src/pipeline.py

import os
import pandas as pd
import numpy as np
from config import (
    ANNOTATIONS_DIR,
    PROCESSED_DATA_DIR,
    SAVED_MODEL,
    TEST_DIR,
    PREDICT_IMAGE_PATH,
    CLASS_INDICES_ORDER,
)


class ObjectDetectionPipeline:
    def __init__(self):
        from data_preprocessing import DataPreprocessor
        from model import CNNModel

        self.preprocessor = DataPreprocessor()
        self.model = CNNModel()

    def preprocess_data(self):
        """
        Run the data preprocessing pipeline.
        Encodes the labels in the training and validation annotation CSVs
        and saves the results to the processed data directory.
        """
        print("Preprocessing data...")
        # Training annotations
        print("Loading and preprocessing training annotations...")
        train_annotations_path = os.path.join(ANNOTATIONS_DIR, "train_annotations.csv")
        processed_train_annotations_path = os.path.join(
            PROCESSED_DATA_DIR, "processed_train.csv"
        )
        train_df = pd.read_csv(train_annotations_path)
        processed_train_df = self.preprocessor.preprocess_dataframe(train_df)
        self.preprocessor.save_processed_dataframe(
            processed_train_df, processed_train_annotations_path
        )

        # Validation annotations
        print("Loading and preprocessing validation annotations...")
        validation_annotations_path = os.path.join(
            ANNOTATIONS_DIR, "validation_annotations.csv"
        )
        processed_validation_annotations_path = os.path.join(
            PROCESSED_DATA_DIR, "processed_val.csv"
        )
        val_df = pd.read_csv(validation_annotations_path)
        processed_val_df = self.preprocessor.preprocess_dataframe(val_df)
        self.preprocessor.save_processed_dataframe(
            processed_val_df, processed_validation_annotations_path
        )

        print("Data preprocessing complete.")

    def train_model(self):
        """
        Run the model training pipeline.
        Loads data, trains the CNN model, and saves the trained model.
        """
        print("Training model...")
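        # val_data is loaded here, but CNNModel.train() does not accept
        # validation data, so only train_data is used during fitting.
        train_data, val_data = self.preprocessor.load_data()
        self.model.train(train_data)
        self.model.save_model(SAVED_MODEL)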
        print("Model trained and saved.")

    def evaluate_model(self):
        """
        Run the model evaluation pipeline.
        Loads test data, evaluates the CNN model, and prints results.
        """
        print("Evaluating model...")
        self.model.model.load_weights(SAVED_MODEL)
        print(f"Loading test data from: {TEST_DIR}")
        test_data = self.preprocessor.load_images_from_directory(TEST_DIR)
        if hasattr(test_data, "samples") and test_data.samples == 0:
            print(
                f"No test data found in '{TEST_DIR}'. Ensure the directory exists and contains images organized in class subfolders."
            )
            print(
                "Expected structure: TEST_DIR/class1/, TEST_DIR/class2/, ... with images inside each class folder."
            )
            return
        loss, accuracy = self.model.evaluate(test_data)
        print(f"Test Loss: {loss:.4f}")
        print(f"Test Accuracy: {accuracy:.4f}")

    def predict_data(self):
        """
        Run the prediction pipeline.
        Loads a test image, makes predictions using the trained model, and prints results.
        """
        print("Running prediction...")
        self.model.model.load_weights(SAVED_MODEL)
        print("Loading test image...")
        from predict import Predictor

        predictor = Predictor(self.model)
        prediction = predictor.predict(self.preprocessor.load_image(PREDICT_IMAGE_PATH))
        print(f"Raw prediction probabilities: {prediction}")
        class_index = int(np.argmax(prediction))
        predicted_class = CLASS_INDICES_ORDER[class_index]
        print(f"Predicted class index: {class_index}")
        print(f"The predicted class is: {predicted_class}")
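
The prediction step maps the index of the largest probability to a name in CLASS_INDICES_ORDER, so that list has to match the order the model was trained with. Keras's flow_from_directory assigns class indices in alphanumeric order of the sub-folder names, which gives a simple check. The sketch below uses plain Python lists in place of the model output, and the function names are illustrative assumptions.

```python
from typing import Sequence, Tuple


def matches_directory_order(class_names: Sequence[str]) -> bool:
    """True if the names are in the sorted order flow_from_directory uses."""
    return list(class_names) == sorted(class_names)


def decode_prediction(
    probabilities: Sequence[float], class_names: Sequence[str]
) -> Tuple[str, float]:
    """Return the most likely class name and its probability."""
    if len(probabilities) != len(class_names):
        raise ValueError("one probability per class is required")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return class_names[best], float(probabilities[best])


classes = ["animal", "human", "vehicle"]
print(matches_directory_order(classes))            # True
print(decode_prediction([0.1, 0.2, 0.7], classes))  # ('vehicle', 0.7)
```

With a real model, the output for a single image has shape (1, num_classes), so the first row (prediction[0]) would be passed to decode_prediction.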

Step 9: Running the Pipeline via CLI
main.py acts as the command-line interface (CLI) entry point for executing the different stages of the object detection pipeline. It parses the requested stage and calls the matching method in pipeline.py; the configuration values come from config.yaml by way of config.py.

### src/main.py

import argparse
from pipeline import ObjectDetectionPipeline


def main():
    """
    Main entry point for the command-line interface.
    Parses arguments and runs the selected pipeline process.
    """
    parser = argparse.ArgumentParser(
        description="Run different processes: preprocess, train, evaluate, predict."
    )
    parser.add_argument(
        "process",
        choices=["preprocess", "train", "evaluate", "predict"],
        help="Process to run",
    )
    args = parser.parse_args()

    pipeline = ObjectDetectionPipeline()
    if args.process == "preprocess":
        pipeline.preprocess_data()
    elif args.process == "train":
        pipeline.train_model()
    elif args.process == "evaluate":
        pipeline.evaluate_model()
    elif args.process == "predict":
        pipeline.predict_data()


if __name__ == "__main__":
    main()
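
As the number of steps grows, the if/elif chain can be replaced by a dictionary that maps each command name to a callable. The sketch below shows that variant with stand-in steps instead of the real pipeline, so it runs without TensorFlow. The names build_parser and run are hypothetical; passing argv explicitly also makes the CLI easy to exercise in tests.

```python
import argparse
from typing import Callable, Dict, List, Optional


def build_parser(choices: List[str]) -> argparse.ArgumentParser:
    """Create a parser that accepts exactly one of the given step names."""
    parser = argparse.ArgumentParser(description="Run one pipeline step.")
    parser.add_argument("process", choices=choices, help="Process to run")
    return parser


def run(argv: Optional[List[str]], steps: Dict[str, Callable[[], None]]) -> str:
    """Parse argv, call the selected step, and return its name."""
    args = build_parser(list(steps)).parse_args(argv)
    steps[args.process]()
    return args.process


# Stand-in steps that only record that they were called
calls: List[str] = []
steps = {
    name: (lambda name=name: calls.append(name))
    for name in ["preprocess", "train", "evaluate", "predict"]
}

run(["train"], steps)
print(calls)  # ['train']
```

In the real entry point, the dictionary values would be the bound methods of ObjectDetectionPipeline, and run(None, steps) would read the arguments from the command line.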

Step 10: Run the Pipeline
Use the main entry point for all steps, running the commands from the project root.
Note: on platforms other than Windows, you may need to replace “python” with “python3”.

python src/main.py preprocess   # Preprocess data
python src/main.py train        # Train model
python src/main.py evaluate     # Evaluate model
python src/main.py predict      # Predict on new data

Code Quality (Optional)
Format and lint your code for consistency and style:

black src/      # Format code
flake8 src/     # Lint code

FAQs

What is object detection?
Object detection is a computer vision technique that allows machines to identify and locate objects within an image or video. It goes beyond simple image classification by not only recognizing what objects are present but also determining where they are using bounding boxes.

What is object detection used for?
Object detection is widely used in various fields:

Autonomous vehicles: Detecting pedestrians, traffic signs, and other vehicles.
Security systems: Identifying intruders or suspicious objects.
Healthcare: Detecting tumors or anomalies in medical scans.
Retail: Monitoring shelves and customer behavior.
Agriculture: Identifying crops, pests, or diseases.

What are the basic steps to implement object detection using CNN in TensorFlow/Keras?

Prepare the dataset: Organize images and annotations.
Preprocess data: Resize, normalize, and format labels.
Build the CNN model: Define architecture using Keras.
Train the model: Use annotated data to learn object features.
Evaluate performance: Use metrics like accuracy and IoU.
Run predictions: Apply the model to new images and visualize results.

Conclusion

This tutorial covered the essential steps: organizing your data, preprocessing, building a model, training, evaluating, and predicting. With a modular project structure and configurable pipeline, you’re now equipped to experiment, extend, and deploy your own object detection solutions.