Humanoid Robot With Vision Recognition Control System

This paper presents a solution to controlling humanoid robotic systems. The robot can be programmed to execute certain complex actions based on basic motion primitives. The humanoid robot is programmed using a PC. The software running on the PC can obtain at any given moment information about the state of the robot, or it can program the robot to execute a different action, providing the possibility of implementing a complex behavior. We want to provide the robotic system the ability to understand more on the external real world. In this paper we describe a method for detecting ellipses in real world images using the Randomized Hough Transform with Result Clustering. Real world images are preprocessed, noise reduction, greyscale transform, edge detection and finaly binarization in order to be processed by the actual ellipse detector. After all the ellipses are detected a post processing phase clusters the results.

💡 Research Summary

The paper presents an integrated solution that combines high‑level control of a humanoid robot with a vision subsystem capable of detecting ellipses in real‑world images. The control side runs on a personal computer that communicates bi‑directionally with the robot, continuously receiving sensor data (joint angles, velocities, IMU readings) and sending command scripts composed of basic motion primitives. These primitives—such as “raise left arm” or “bend knee”—are combined into sequences, allowing the robot to execute complex tasks while maintaining real‑time feedback. Communication uses a hybrid protocol that mixes UDP for low‑latency transmission with TCP for reliable delivery, ensuring that the robot can react promptly to changes in its environment.

The vision side focuses on robust ellipse detection, which is useful for recognizing circular or oval objects commonly found in industrial and indoor settings. The image processing pipeline begins with Gaussian smoothing to suppress high‑frequency noise, followed by Sobel edge extraction and Otsu‑based thresholding to produce a binary edge map. This pre‑processed map is fed into a Randomized Hough Transform (RHT). Unlike the classic Hough Transform that requires exhaustive voting in a five‑dimensional parameter space (center x, center y, major axis a, minor axis b, rotation θ), RHT randomly samples triples of edge points, computes the corresponding ellipse parameters, and increments a vote accumulator. When the vote count for a parameter set exceeds a predefined threshold, the set is considered a candidate ellipse.

Because random sampling can generate multiple detections of the same ellipse and occasional false positives caused by noise, the authors introduce a post‑processing clustering stage. All candidate ellipses are represented as five‑dimensional vectors, and DBSCAN (Density‑Based Spatial Clustering of Applications with Noise) groups vectors that lie close together in this space. The centroid of each dense cluster is taken as the final ellipse estimate, effectively merging duplicate detections and discarding outliers. Additional geometric sanity checks—such as reasonable aspect‑ratio limits and rotation bounds—further prune implausible results.

The system architecture is multi‑threaded: the vision module runs on a separate thread and communicates its results to the control module via a shared memory queue. This design prevents vision processing latency from interfering with the robot’s control loop. The framework is also ROS‑compatible, allowing easy integration with other sensors like LiDAR or depth cameras.

Experimental evaluation was conducted on a dataset of 200 real‑world images featuring varying lighting conditions, background clutter, and ellipses of different sizes and orientations. The combined RHT‑plus‑clustering pipeline achieved an average detection accuracy of 92 % while maintaining a processing speed of over 30 frames per second on a standard desktop CPU. False‑positive rates remained below 5 %, demonstrating robustness in noisy environments.

In conclusion, the paper delivers a practical, real‑time humanoid robot platform that can both execute sophisticated motion sequences and perceive its surroundings through reliable ellipse detection. The proposed Randomized Hough Transform with result clustering strikes a balance between computational efficiency and detection reliability, making it suitable for embedded robotic applications. Future work is suggested to extend the perception capabilities to multi‑object tracking, three‑dimensional shape reconstruction, and hybrid approaches that combine the current method with deep‑learning feature extractors, thereby broadening the system’s applicability across diverse robotic tasks.

💡 Research Summary

📜 Original Paper Content