Image Processing
The pipeline for processing the captured frames is simple yet effective and consists of the following stages:
- Color Filtering:
Removes green and blue channels to isolate the red component.
- Thresholding:
Segments the object of interest.
- Morphological Filtering:
Applies erosion followed by dilation to remove residual noise.
Despite its simplicity, this approach achieves strong segmentation of the target object.
Color filtering is a technique used to selectively enhance or suppress specific color channels in an image. In this pipeline, the RGB color space is employed to emphasize the red channel and reduce background noise, thereby increasing contrast between the object of interest (the pendulum) and the background.
Since the pendulum is represented by a Pokéball (a sphere with red and white hemispheres), isolating the red channel effectively enhances the object and diminishes background interference.
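# OpenCV stores channels in BGR order: index 0 is blue, 1 is green, 2 is red.
# Each channel is scaled by its filter factor and clipped to the valid 8-bit range.
filtered_im[:, :, 0] = np.clip(filtered_im[:, :, 0] * blue_filter, 0, 255)
filtered_im[:, :, 1] = np.clip(filtered_im[:, :, 1] * green_filter, 0, 255)
filtered_im[:, :, 2] = np.clip(filtered_im[:, :, 2] * red_filter, 0, 255)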
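The per-channel scaling above can also be written as a single broadcast multiplication. The following is a minimal sketch, not the project's actual code: the function name `scale_channels`, the gain parameter names, and the sample pixel values are illustrative assumptions.

```python
import numpy as np

def scale_channels(image_bgr, blue_gain, green_gain, red_gain):
    """Scale each channel of a BGR uint8 image and clip to [0, 255].

    Hypothetical helper for illustration; gains of 0.0 suppress a channel.
    """
    gains = np.array([blue_gain, green_gain, red_gain], dtype=np.float32)
    # Broadcasting multiplies the last axis (B, G, R) by the matching gain.
    scaled = image_bgr.astype(np.float32) * gains
    return np.clip(scaled, 0, 255).astype(np.uint8)

# Tiny 1x2 image in BGR order: a reddish pixel and a greenish pixel.
sample = np.array([[[10, 20, 200], [30, 220, 40]]], dtype=np.uint8)

# Keep only the red channel.
red_only = scale_channels(sample, blue_gain=0.0, green_gain=0.0, red_gain=1.0)
# red_only is [[[0, 0, 200], [0, 0, 40]]]
```

Working in floating point before clipping avoids 8-bit wrap-around when a gain greater than 1 pushes a value past 255.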
The next step in the pipeline is thresholding.
This is one of the simplest segmentation techniques. It separates the object from the background based on intensity: pixels above a chosen threshold become white, and all others become black.
# Convert to grayscale and apply threshold
gray_im = cv2.cvtColor(filtered_im, cv2.COLOR_BGR2GRAY)
_, thresholded_im = cv2.threshold(gray_im, threshold_value_setting, max_binary_value, threshold_type_value)
OpenCV provides various thresholding methods (e.g., Binary, Inverted, Truncate). We chose Binary Thresholding, where pixels in `gray_im` above `threshold_value_setting` are set to 255 and all others to 0. Although this is effective, some grainy noise often remains after thresholding, which we remove with erosion and dilation.
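To make the binary rule concrete, here is a small NumPy sketch of the same comparison. It is an illustration only, not a replacement for `cv2.threshold`; the function name `binary_threshold` and the sample values are assumptions.

```python
import numpy as np

def binary_threshold(gray, thresh, max_value=255):
    """Return max_value where gray > thresh, and 0 elsewhere.

    Hypothetical helper mirroring the binary rule described above.
    """
    return np.where(gray > thresh, max_value, 0).astype(np.uint8)

# 2x2 grayscale sample with values on both sides of the threshold.
gray = np.array([[12, 130], [200, 127]], dtype=np.uint8)
mask = binary_threshold(gray, thresh=127)
# mask is [[0, 255], [255, 0]]
```

Note that the comparison is strict, so a pixel exactly equal to the threshold (127 here) is set to 0.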
Erosion and dilation are complementary morphological operations that operate on binary images. Erosion removes pixels on object boundaries, while dilation adds pixels to these boundaries. When combined (erosion followed by dilation), they form an opening operation, which is useful for removing noise smaller than the structuring element.
The opening operation is defined as:
$$A \circ B = (A \ominus B) \oplus B$$
where A is the input image, B is the structuring element, $\ominus$ denotes erosion, and $\oplus$ denotes dilation. This operation removes noise artifacts while preserving the main object's shape.
The code below performs opening:
# Apply morphological transformations
element = cv2.getStructuringElement(morph_shape_val, (2 * kernel_val + 1, 2 * kernel_val + 1), (kernel_val, kernel_val))
eroded_im = cv2.erode(thresholded_im, element)
dilated_im = cv2.dilate(eroded_im, element)
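To see why opening removes small specks but keeps larger objects, the sketch below implements binary erosion and dilation directly in NumPy on a tiny mask. It is a didactic approximation under stated assumptions, not OpenCV's implementation: the helper names are hypothetical, the kernel is assumed symmetric, and pixels outside the image are treated as set during erosion and unset during dilation.

```python
import numpy as np

def _morph(mask, kernel, reducer, pad_value):
    """Combine the neighbors selected by `kernel` with `reducer` (np.all or np.any)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(mask, ((ph, ph), (pw, pw)), mode="constant",
                    constant_values=pad_value)
    h, w = mask.shape
    # One shifted view of the image per active kernel cell.
    layers = [padded[dy:dy + h, dx:dx + w]
              for dy in range(kh) for dx in range(kw) if kernel[dy, dx]]
    return reducer(np.stack(layers), axis=0)

def erode(mask, kernel):
    # A pixel survives only if every neighbor under the kernel is set.
    return _morph(mask, kernel, np.all, True)

def dilate(mask, kernel):
    # A pixel is set if any neighbor under the kernel is set.
    return _morph(mask, kernel, np.any, False)

def opening(mask, kernel):
    # Erosion followed by dilation with the same structuring element.
    return dilate(erode(mask, kernel), kernel)

# 7x7 mask: a 3x3 object plus a single-pixel speck of noise.
mask = np.zeros((7, 7), dtype=bool)
mask[1:4, 1:4] = True
mask[5, 5] = True

kernel = np.ones((3, 3), dtype=bool)
opened = opening(mask, kernel)
```

In this example the erosion shrinks the 3x3 object to its center pixel and deletes the speck entirely; the dilation then grows the center pixel back to the original 3x3 block, so only the noise is lost.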
The `morph_shape_val` for `cv2.getStructuringElement` can be `cv2.MORPH_RECT`, `cv2.MORPH_CROSS`, or `cv2.MORPH_ELLIPSE`. Here, `cv2.MORPH_ELLIPSE` was selected for its effectiveness after testing.
Example with `cv2.MORPH_ELLIPSE` and `kernel_val = 3`:
[[0 0 0 1 0 0 0]
[0 1 1 1 1 1 0]
[1 1 1 1 1 1 1]
[1 1 1 1 1 1 1]
[1 1 1 1 1 1 1]
[0 1 1 1 1 1 0]
[0 0 0 1 0 0 0]]
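For comparison with the elliptical element above, the other two shapes are easy to picture: a rectangle is a block of ones, and a cross has only its center row and center column set. The NumPy sketch below builds both at the same `2 * kernel_val + 1` side length. The helper names are hypothetical, and the arrays are meant to illustrate the shapes rather than to stand in for `cv2.getStructuringElement`.

```python
import numpy as np

def rect_kernel(kernel_val):
    """All-ones square of side 2 * kernel_val + 1 (rectangular shape)."""
    size = 2 * kernel_val + 1
    return np.ones((size, size), dtype=np.uint8)

def cross_kernel(kernel_val):
    """Center row and center column set, everything else zero (cross shape)."""
    size = 2 * kernel_val + 1
    kernel = np.zeros((size, size), dtype=np.uint8)
    kernel[kernel_val, :] = 1  # center row
    kernel[:, kernel_val] = 1  # center column
    return kernel

print(rect_kernel(3))
print(cross_kernel(3))
```

With `kernel_val = 3` both arrays are 7x7, matching the size of the elliptical element shown above.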