Motion detection in underwater video helps identify marine species

Digital video technology is rapidly improving, with emerging possibilities to record high-resolution digital imagery over long periods of time and at a much lower cost. Video- and image-based monitoring in the ocean offers many advantages over traditional diver-based observational techniques. The holy grail of video technology is software that can automatically identify species: software that accurately counts, sizes and identifies fish and other marine species in real time. Such applications will revolutionise the way marine science is conducted and drastically expand our data acquisition capabilities.

A common fish surveying technique researchers utilise is Baited Remote Underwater Video Stations (BRUVS). These stationary seafloor camera frames use bait to attract fish, capturing the activity on video. Footage is then analysed back in the laboratory, where researchers manually perform species identification and individual counts. Although beneficial in places difficult for humans to access, manually identifying, counting and measuring individuals is extremely labour intensive and a major disincentive to the application of this technology. Consequently, researchers are looking for more efficient ways to capture fish community data.

Continuous underwater video is a top priority of ours, from both a scientific and a public engagement point of view. Unlike BRUVS, our low-cost underwater cameras are designed to remain at sea for months at a time, live-streaming footage to the cloud. A downside of acquiring such large amounts of video data is that much of what we collect doesn't contain anything of interest; when analysing footage, we only care about frames that contain fish or other species of interest. This is where clever motion detection algorithms can play an important role in our ability to efficiently manage and analyse data. By utilising motion detection algorithms we can significantly reduce the amount of video data we send to the cloud (reducing data upload), whilst also reducing the amount of data that needs to be analysed. This is done by classifying parts of each frame as background or foreground, where foreground implies a moving object of interest. In other words, we can train software to do what humans do: analyse the footage, pick out anything of interest and discard the rest.

At the most basic level, motion detection can be achieved by comparing the pixels of two sequential frames of video. Colour differences are calculated for each pixel, and a measure of the overall difference between the frames is compared with a threshold; if it exceeds the threshold, we consider there to be motion in the video sequence. If we have our resolution set to 1280×720 (like the numbers you see for the resolution of your computer or TV screen), each frame has 1280 multiplied by 720, or 921,600, pixels! Because we send through 25 frames per second, the algorithm has to process about 23 million pixels for every second of video. Each pixel is represented by four channels: Red, Green, Blue and Alpha (which describes opacity). By comparing the colour channels of corresponding pixels in two frames, we can uncover any change of colour at that pixel. The corresponding code for this process might look something like this:

differenceRed := absDiff(red1, red2)
differenceGreen := absDiff(green1, green2)
differenceBlue := absDiff(blue1, blue2)
differenceTotal := differenceRed + differenceGreen + differenceBlue

The calculated difference (differenceTotal) is then added to a running sum of differences across the frame, which is compared to a threshold; if the sum exceeds the threshold, we say there is motion.
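Putting the pieces above together, a minimal sketch of this frame-differencing approach in Go might look as follows. The Frame type, motionDetected function and the whole-frame threshold are illustrative choices of ours, not our production code:

```go
package main

import "fmt"

// Frame is a hypothetical RGBA frame: a flat slice of bytes with
// four channels (Red, Green, Blue, Alpha) per pixel.
type Frame []byte

// absDiff returns the absolute difference between two channel values.
func absDiff(a, b byte) int {
	if a > b {
		return int(a - b)
	}
	return int(b - a)
}

// motionDetected compares two frames pixel by pixel, summing the
// per-channel colour differences (Alpha is ignored), and reports
// motion if the total difference exceeds the threshold.
func motionDetected(f1, f2 Frame, threshold int) bool {
	total := 0
	for i := 0; i+3 < len(f1) && i+3 < len(f2); i += 4 {
		total += absDiff(f1[i], f2[i])     // red
		total += absDiff(f1[i+1], f2[i+1]) // green
		total += absDiff(f1[i+2], f2[i+2]) // blue
	}
	return total > threshold
}

func main() {
	// Two tiny two-pixel frames: identical except the second
	// pixel's red channel jumps from 100 to 200.
	prev := Frame{10, 20, 30, 255, 100, 100, 100, 255}
	curr := Frame{10, 20, 30, 255, 200, 100, 100, 255}
	fmt.Println(motionDetected(prev, curr, 50)) // prints true
}
```

In practice the threshold would be tuned per camera, since ambient light and water clarity affect how much inter-frame difference counts as "normal".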

However, this method performs poorly when the background of a scene contains regular motion or noise, such as moving vegetation, turbidity or light variation. To mitigate these issues, we explored methods that are robust to background movement, in particular the Mixture of Gaussians (MOG) algorithm. MOG assumes that every pixel in the video can be modelled by a mixture of Gaussian distributions. Each pixel possesses a continually updated distribution describing its "normal" range of values, i.e. values that are considered consistent with the background. If a value does not fit the distribution well, it is considered to correspond to foreground.

On top of this, we can perform erosion and dilation on the images to eliminate false positives caused by noise, and to accentuate and fill areas of motion. Finally, the resulting pixel area of motion is compared to a threshold for classification. Occasionally there are still false positives, but we think we can develop methods to eliminate these. For example, many false positives are very short-lived, whereas actual motion tends to last for several seconds; perhaps we could also measure duration and adopt that as an additional classification parameter.

The methods discussed are quite demanding on the Raspberry Pi hardware that we are using. We can reduce the load on the system by firstly reducing the resolution of images before processing, and also by performing motion checks on only every nth frame. We are continually exploring avenues that will optimise this process.
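To give a feel for the idea behind MOG (though not the full algorithm, which maintains a weighted mixture of several Gaussians per pixel), here is a simplified sketch in Go that models each pixel with a single running Gaussian. The pixelModel type, parameter names and learning-rate update are our own illustrative simplifications:

```go
package main

import "fmt"

// pixelModel tracks a single running Gaussian for one pixel: a
// simplification of MOG, which keeps a mixture of such distributions.
type pixelModel struct {
	mean, variance float64
}

// update classifies a new value as foreground or background, then
// folds it into the model with learning rate alpha. A value is
// foreground if it lies more than stdDevs standard deviations from
// the mean, i.e. it doesn't fit the pixel's "normal" distribution.
func (p *pixelModel) update(value, alpha, stdDevs float64) (foreground bool) {
	d := value - p.mean
	foreground = d*d > stdDevs*stdDevs*p.variance
	// Exponential moving update of the mean and variance, so the
	// model slowly adapts to gradual background change.
	p.mean += alpha * d
	p.variance = (1-alpha)*p.variance + alpha*d*d
	return foreground
}

func main() {
	// A pixel whose background intensity has settled around 100.
	p := &pixelModel{mean: 100, variance: 25}
	fmt.Println(p.update(102, 0.05, 3)) // small change: background (false)
	fmt.Println(p.update(200, 0.05, 3)) // large change: foreground (true)
}
```

Because the model updates continuously, slow changes such as shifting light are absorbed into the background, while a fish swimming through the scene produces values well outside the distribution.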

Biomass estimation of marine species is highly valuable to marine scientists, fisheries and both environmental and species conservation agencies. Changes in abundance and relative distribution of species in different areas can provide important insights into the effects of natural and/or human induced changes to marine populations. This knowledge can then be transferred into appropriate management strategies such as species conservation programs or fishing quotas designed to regenerate populations. 

Follow us on Instagram, Facebook or Twitter for more photos, videos and stories.