Improving Object Detection Accuracy with ESP32-CAM and OpenCV

Hey everyone,

I’ve been working on an object counting project using the ESP32-CAM and OpenCV. I got the basic setup working, but I’m running into issues with accuracy in changing lighting conditions. I’m wondering if anyone here has tried improving detection using background subtraction or more advanced tracking techniques? Any tips for reducing false positives or handling multiple objects moving at once would be great!

References:
https://www.theengineeringprojects.com/2025/03/object-counting-project-using-esp32-cam-and-opencv.html

https://www.youtube.com/watch?v=J4bB1FjY94s

https://randomnerdtutorials.com/esp32-cam-video-streaming-face-recognition-arduino-ide/

https://opencv.org/object-detection-using-opencv-and-python/

There are many ways to skin this cat…

Some of the ESP32-CAM boards I’ve used were a bit shoddy, but even with a good one, running naive object detection on an ESP32 will essentially max it out. You can train the face detection algos on images taken under different lighting conditions to help account for them.

You can also add a preprocessing step that measures each frame’s average brightness, normalizes it against a reference (other frames, for example), and then maximizes contrast. That should help with most everything you’ve described.
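To make the brightness and contrast idea concrete, here is a minimal sketch in plain Python. It is only an illustration: the function names, the target mean of 128, and the tiny sample frames are all assumptions, and it treats a grayscale frame as a list of rows of 0-255 values so it runs without any libraries.

```python
def mean_brightness(gray):
    """Average pixel value of a grayscale frame (rows of 0-255 ints)."""
    pixels = [p for row in gray for p in row]
    return sum(pixels) / len(pixels)


def normalize_brightness(gray, target_mean=128.0):
    """Apply a gain so the frame's mean lands near target_mean (an assumed setpoint)."""
    mean = mean_brightness(gray)
    gain = target_mean / mean if mean > 0 else 1.0
    return [[min(255, round(p * gain)) for p in row] for row in gray]


def stretch_contrast(gray):
    """Linearly map the darkest pixel to 0 and the brightest to 255."""
    lo = min(p for row in gray for p in row)
    hi = max(p for row in gray for p in row)
    if hi == lo:  # flat frame: nothing to stretch
        return [row[:] for row in gray]
    scale = 255.0 / (hi - lo)
    return [[round((p - lo) * scale) for p in row] for row in gray]


# Same made-up scene captured dim and then four times brighter.
dim = [[10, 20, 30], [20, 30, 40]]
bright = [[40, 80, 120], [80, 120, 160]]

print(stretch_contrast(dim))
print(stretch_contrast(bright))
```

Both frames come out identical after the stretch, which is the point: the detector downstream sees the same picture regardless of overall light level. On the PC side of an OpenCV setup you would more likely reach for `cv2.equalizeHist` or `cv2.createCLAHE` than hand-rolled loops, but the arithmetic is the same idea.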

If you decide you’re into using CV on embedded hardware, a Pi with an Arducam works great, and a Jetson Nano works great with most cameras. You can also use most USB webcams with a Pi or Jetson.

We also sell the OpenMV Cam H7 Plus (https://www.sparkfun.com/openmv-cam-h7-plus.html), which is an awesome lil fella with outstanding documentation (see “OpenMV Cam Tutorial” in the MicroPython 1.24 documentation). It even has its own IDE (see the OpenMV download page).

The human eye and brain do lots of image processing “in the background” that is extremely difficult to imitate with cameras and computer algorithms. For example, mere color recognition under changing light conditions is a far more difficult problem than most people realize.

Also keep in mind that the eye has a roughly logarithmic response to illumination levels spanning many orders of magnitude, whereas camera sensors have a linear response. As mentioned above, a better camera helps, and one with greater bit depth will help overcome that range problem.
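A rough way to see the difference is to push the same sensor readings through a linear rescale and a logarithmic one. The sketch below is only an illustration: the 10-bit input range, the 8-bit output range, and the `log1p` curve are assumptions chosen for the example, not a model of the eye or of any particular sensor.

```python
import math


def log_compress(value, in_max=1023, out_max=255):
    """Map a linear sensor reading (0..in_max) onto 0..out_max along a log curve."""
    return round(out_max * math.log1p(value) / math.log1p(in_max))


def linear_scale(value, in_max=1023, out_max=255):
    """Plain linear rescale of the same reading, for comparison."""
    return round(out_max * value / in_max)


# Hypothetical 10-bit readings, each roughly double the previous one.
for reading in (8, 16, 32, 64, 128, 256, 512, 1023):
    print(reading, linear_scale(reading), log_compress(reading))
```

With the linear rescale, the dark readings are squeezed into a handful of output levels, while the log curve gives each doubling of light a similar-sized step. That is also why extra bit depth matters: more input levels in the shadows means there is something left to expand.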

Thanks a lot for the suggestions! I’ll try tweaking the brightness threshold and contrast like you mentioned. Also, the OpenMV cam looks really interesting, might give that a shot for better performance.

That’s a great point. I hadn’t really thought about how much our eyes handle behind the scenes. Makes sense why lighting throws things off so much. I’ll definitely look into better camera options with higher dynamic range.
