
Computer Vision Algorithm

The first line of defence in protecting the value of the collected recyclables

Technical Details

In the following sections, we will dive deeper into the technical aspects of our computer vision technology, including the sampling process, data preprocessing, model building, and model training.


This video demonstrates the accuracy of the algorithm when identifying four states: clean, dirty, dropping and empty chamber. The bottom right of the video shows the identification result along with its confidence score.
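As a minimal sketch of how an identification result and a confidence value like those in the video could be derived, the snippet below applies a softmax to raw class scores and picks the most likely class. Whether the model on this page ends in a softmax layer is an assumption, as are the class order and the example scores.

```python
import math

# Class names come from the text above; their order here is an assumption.
CLASSES = ["clean", "dirty", "dropping", "empty chamber"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(scores):
    """Return the most likely class name and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASSES[best], probs[best]


# Made-up scores for illustration only.
label, confidence = classify([2.0, 0.5, -1.0, 0.1])
print(f"{label}: {confidence:.1%}")
```

A low confidence on a frame is a useful signal in itself: such frames can be logged and reviewed rather than acted on.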

The path to reliable classification

Creating a CV algorithm involves several steps, including sampling, data preprocessing, model building, model training, and model evaluation.

This section focuses on sampling, model building and model evaluation.

Data Sampling

Dynamic background sampling vs. static background sampling

Dynamic Background


  • High versatility. Adapts to different environments (Key advantage during rapid prototyping phase)


  • Requires more samples of a larger variety

  • Longer training time

  • Longer inference time

  • Difficult to verify model in all environments

  • Prone to data bias

Static Background


  • Stable and reliable in a controlled environment.

  • Requires less data and shorter training time.

  • Less prone to data bias


  • Overly sensitive to small changes in the environment.

  • Less versatile and may not perform as well in different environments.

  • Requires resampling if the environment changes

Chosen: Static background sampling

We chose static background sampling for its stability. Now that the prototyping phase is nearing its end, there are far fewer mechanical changes to the mechanism, so the algorithm does not have to adapt to changing environments as often.

Other disadvantages are mitigated through techniques like data augmentation.
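A minimal sketch of what data augmentation can look like is given below, using NumPy only. The specific perturbations (brightness, horizontal shift, pixel noise), the `strength` parameter and all constants are assumptions chosen for illustration; they are not the augmentation pipeline used in this project.

```python
import numpy as np


def augment(image, strength=0.2, rng=None):
    """Return a randomly perturbed copy of an HxWx3 uint8 image.

    strength in [0, 1] scales every perturbation; 0 leaves the image unchanged.
    """
    if rng is None:
        rng = np.random.default_rng()
    out = image.astype(np.float32)

    # Brightness: multiply by a factor in [1 - strength, 1 + strength].
    out = out * (1.0 + rng.uniform(-strength, strength))

    # Shift: roll horizontally by up to strength * 10% of the width.
    # np.roll wraps pixels around the edge, which is acceptable for a sketch.
    max_shift = int(strength * 0.1 * image.shape[1])
    if max_shift > 0:
        shift = int(rng.integers(-max_shift, max_shift + 1))
        out = np.roll(out, shift, axis=1)

    # Noise: Gaussian pixel noise whose standard deviation grows with strength.
    out = out + rng.normal(0.0, 10.0 * strength, size=out.shape)

    return np.clip(out, 0, 255).astype(np.uint8)


# Usage on a dummy grey image standing in for a camera frame.
frame = np.full((32, 40, 3), 128, dtype=np.uint8)
augmented = augment(frame, strength=0.3, rng=np.random.default_rng(0))
print(augmented.shape, augmented.dtype)
```

Keeping a single `strength` knob makes it easy to dial augmentation up when the model is too sensitive to small changes, and down again if the augmented images stop resembling real camera frames.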



Model Building

Custom model or transfer learning

In our efforts to develop a reliable and robust computer vision model, we conducted extensive testing on some of the most widely used pretrained models, including InceptionV3, EfficientNetB1, NASNetMobile, ResNet50V2, ResNet101V2, EfficientNetB3, Xception, ResNet50, InceptionResNetV2, MobileNetV2, DenseNet121, ResNet101, ResNet152V2, DenseNet169, MobileNet, DenseNet201, ResNet152, VGG16, and VGG19.

In our observations, transfer learning models often plateaued at around 80% accuracy during training, while custom models reached up to 98%. Transfer learning also tended to train more slowly and produced unnecessarily large models that suffered from the vanishing or exploding gradient problem. For comparison, our custom model achieves 98% accuracy at a compact 14MB, while Google's InceptionV3 model achieves only 87% accuracy at 129MB.

Model Evaluation

The model evaluation process is essential for identifying problems in all aspects of model building, including sampling, preprocessing, model architecture, and training. By evaluating the model's performance using a separate set of data, we can identify any issues that may be affecting the model's accuracy and reliability.

The machine learning process is often viewed as unpredictable and opaque, with no means of debugging beyond trial and error. However, this is not entirely accurate. In the following sections, we will provide examples of how to extract meaningful information from accuracy-per-epoch graphs and from metrics such as precision, recall, F1 score and the confusion matrix. By analyzing these metrics, we can gain insights into the model's performance and identify areas for improvement.
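To make the metrics above concrete, here is a small sketch that derives precision, recall and F1 for each class from a confusion matrix. The function name and the counts in the example matrix are illustrative assumptions, not results from this project.

```python
def per_class_metrics(matrix, labels):
    """Compute (precision, recall, F1) for each class.

    matrix[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    results = {}
    for k, name in enumerate(labels):
        true_pos = matrix[k][k]
        predicted_k = sum(row[k] for row in matrix)  # column sum
        actual_k = sum(matrix[k])                    # row sum
        precision = true_pos / predicted_k if predicted_k else 0.0
        recall = true_pos / actual_k if actual_k else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[name] = (precision, recall, f1)
    return results


# Illustrative counts only.
labels = ["clean", "dirty"]
matrix = [
    [90, 10],  # true "clean": 90 predicted clean, 10 predicted dirty
    [5, 95],   # true "dirty": 5 predicted clean, 95 predicted dirty
]
for name, (p, r, f1) in per_class_metrics(matrix, labels).items():
    print(f"{name}: precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Precision answers "when the model says dirty, how often is it right?", while recall answers "of all dirty bottles, how many did the model catch?". Which one matters more depends on the cost of each kind of mistake.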

Training process in chronological order

2022 November


The accuracy of the model reached 100% after just 3 training epochs, which is abnormal. After investigation, we found that during sampling we had accidentally left an object in view of the camera only when taking pictures of dirty bottles. The algorithm ended up identifying that object instead of the cleanliness of the bottle. We used image masking and cropping to prevent the algorithm from seeing that part of the image.
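The masking and cropping step could look roughly like the sketch below, which blacks out one rectangle and then keeps only a region of interest. The function name, the box convention and the coordinates are assumptions for illustration; the actual regions depend on the camera setup.

```python
import numpy as np


def mask_and_crop(image, mask_box, crop_box):
    """Black out mask_box, then keep only crop_box.

    Boxes are (top, left, bottom, right) in pixels; bottom and right
    are exclusive. The input image is not modified.
    """
    out = image.copy()
    top, left, bottom, right = mask_box
    out[top:bottom, left:right] = 0      # hide the distracting region
    top, left, bottom, right = crop_box
    return out[top:bottom, left:right]   # keep the region of interest


# Usage on a dummy frame; the coordinates are hypothetical.
frame = np.full((100, 120, 3), 200, dtype=np.uint8)
cleaned = mask_and_crop(frame, mask_box=(0, 0, 20, 30), crop_box=(10, 10, 90, 110))
print(cleaned.shape)
```

The same function has to be applied at inference time as well, so that the model sees identically prepared images in training and in production.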

2023 January


The red line represents the error rate, and it is increasing. This is a symptom of the vanishing/exploding gradient problem, where the model is not learning at all. We solved this by removing layers and reducing the size of the model.
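A toy way to see why depth matters here: during backpropagation the gradient is multiplied by roughly one factor per layer, so a factor slightly below 1 shrinks it towards zero and a factor slightly above 1 blows it up as layers are added. The factors 0.5 and 1.5 below are arbitrary illustration values, not measurements from this project.

```python
def gradient_scale(per_layer_factor, depth):
    """Size of a unit gradient after being scaled once per layer."""
    return per_layer_factor ** depth


for depth in (5, 20, 50):
    vanishing = gradient_scale(0.5, depth)
    exploding = gradient_scale(1.5, depth)
    print(f"depth={depth:2d} vanishing={vanishing:.3e} exploding={exploding:.3e}")
```

Fewer layers means fewer multiplications, which is one reason a smaller network can be easier to train.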

2023 February


The blue and yellow lines represent training accuracy and validation accuracy respectively, while the red line represents error. Although the metrics are trending in the right direction, they are very unstable, with the error suddenly spiking back up, to 0.3 and 0.15, within a few epochs. We solved this by reducing the learning rate.
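The effect of the learning rate can be illustrated with a deliberately simple example: gradient descent on f(w) = w². This is a toy under stated assumptions, not a reproduction of the training run above; the two learning rates are chosen only to contrast an overshooting step with a stable one.

```python
def descend(learning_rate, steps=20, start=1.0):
    """Minimise f(w) = w**2 by gradient descent; return the loss after each step."""
    w = start
    losses = []
    for _ in range(steps):
        grad = 2 * w                 # derivative of w**2
        w = w - learning_rate * grad
        losses.append(w * w)
    return losses


too_high = descend(learning_rate=1.05)  # each step overshoots the minimum
lower = descend(learning_rate=0.1)      # each step moves steadily closer

print("high learning rate, first and last loss:", too_high[0], too_high[-1])
print("low learning rate,  first and last loss:", lower[0], lower[-1])
```

With the larger rate every update jumps past the minimum and lands further away than before, so the loss grows; with the smaller rate the loss shrinks on every step. Real training curves are noisier, but sudden spikes in error are often a hint that the steps are too large.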

2023 April


This is the confusion matrix (a rather confusing name). The top picture shows how to read a confusion matrix.

In our case, the second row and column represent the class "Clean", and the third row and column represent the class "Dirty".


This means that 3 "Clean" pictures were wrongly identified as "Dirty", while 10 "Dirty" pictures were wrongly identified as "Clean".

We solved this by checking which pictures confused the algorithm and removing those that were bad data, e.g. pictures placed in the wrong directory.
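Finding the pictures that confused the algorithm can be as simple as pairing each file path with its true label and its prediction, as in the sketch below. The function name, file names and labels are hypothetical and only show the idea.

```python
def find_misclassified(paths, true_labels, predicted_labels):
    """Return (path, true, predicted) for every sample the model got wrong."""
    return [
        (path, truth, guess)
        for path, truth, guess in zip(paths, true_labels, predicted_labels)
        if truth != guess
    ]


# Hypothetical file names and predictions, for illustration only.
paths = ["clean/001.jpg", "clean/002.jpg", "dirty/001.jpg"]
truth = ["clean", "clean", "dirty"]
preds = ["clean", "dirty", "dirty"]

for path, t, p in find_misclassified(paths, truth, preds):
    print(f"{path}: labelled {t}, predicted {p}")
```

Each flagged picture then needs a human look: some are genuinely hard cases worth keeping, while others turn out to be mislabelled and should be moved or removed.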

2023 April


The blue and yellow lines represent training accuracy and validation accuracy respectively, while the green and red lines represent error and validation error.

The trend is that the model performs better in validation than in training. It is like a student who performs poorly on homework but scores 99 marks in a public exam. In our case, this means that the training data, after augmentation, does not resemble real-world data closely enough.

This is solved by reducing the strength of data augmentation.

2023 May


This is the final version of the algorithm. With accuracy approaching 99% and the error falling below 0.1, the algorithm is ready.

The journey

My name is Garmisch Wong

It has been a fruitful journey to train a computer vision algorithm without relying on existing solutions like Google Teachable Machine. A custom solution offers much more flexibility and has allowed me to learn much more.

Special thanks to Professor Desmond Tsoi for providing me with valuable feedback and support.

The video on the right demonstrates the machine running in full auto mode. The computer vision algorithm identifies the cleanliness of the bottles in real time.
