Service Stream is checking field work with computer vision

Service Stream is increasingly using computer vision to verify that field work is completed to specification and safely.

The engineering contractor’s head of data and AI, Simon Fisher, told the AWS Summit Sydney that field work verification is critical to ensuring both that the company gets paid for its work and that its subcontractors get paid.

Traditionally, skilled analysts verify the work by analysing before-and-after images captured at the site and comparing them to maps, diagrams, written descriptions of the work tasks and other formal documents.

Verification varies in complexity; while it can be as simple as a single ‘before’ and ‘after’ image, for more complex jobs there can be hundreds of images captured per completed work order.

The volume of images, and the opportunity cost of having analysts manually parse them, led Service Stream to investigate whether computer vision and AI could be of assistance.

“We’ve got over 1 million images that we need to look at every month, so the volume is huge and it’s not really feasible to manually do all of this,” Fisher said.

“We also have teams of skilled analysts looking at these images and as our business grows, there are always more higher value opportunities that they could be employed at – for example, managing subcontractor relationships, improving processes, and working on projects like we’re talking about today.

“And there’s critical risk for our business here in the time it takes to do these approvals – for a number of reasons, but largely it can push out the time we pay our subcontractors and we get paid ourselves.”

Getting to 95 percent accuracy

Fisher said that computer vision is well-suited to field work verification because “we can essentially look for very common objects like pipes, conduits, ladders and concrete, describe them in words and [have] models look across all these images and reason about where they are and how they’ve got there.”

While the company has gotten to “94-95 percent accuracy on average” and a “50 percent time saving for what it would take a human to review” and verify that work was completed, it took some time to get right.

“Our first attempt at this was basically supplying all the images at once in a large dump and getting the model to reason across them with the expected work requirements – and it didn’t really go that well,” Fisher said.

The early iteration produced too many false positives, with the model validating work despite a lack of evidence, or choosing submitted images “that didn’t make sense”.

It also only marginally increased the speed at which verifications could be completed, while the costs were “starting to make the business second-guess the value proposition.”

“A lot of the issues … stemmed from the fact that we’re providing the model with simply too much information,” Fisher said.

“It was getting hundreds of images potentially, along with a huge set of instructions and criteria it has to follow.

“It was potentially too much work for the model. It was burning token costs and producing all these challenges we had.”

The company solved this in part by pre-processing the images. This included tiling them into one big image for the model to parse, which improved performance, and improving de-duplication to exclude, for example, images of the same asset repair taken from different angles.
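Service Stream did not describe how its pre-processing is implemented, so the following is only a minimal sketch of the two ideas mentioned: dropping near-duplicate images and laying the survivors out as tiles in one larger image. Every name here (`average_hash`, `deduplicate`, `tile_layout`, the `max_distance` threshold) is hypothetical, and images are assumed to be small grayscale grids already downscaled by some other step. A simple average hash only catches near-identical frames; recognising the same asset photographed from different angles would likely need a stronger similarity measure.

```python
from math import ceil, sqrt


def average_hash(pixels):
    """Return a tuple of bits: 1 where a pixel is brighter than the image mean."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    return tuple(1 if p > mean else 0 for p in flat)


def hamming(a, b):
    """Count the positions at which two equal-length hashes differ."""
    return sum(x != y for x, y in zip(a, b))


def deduplicate(images, max_distance=2):
    """Keep the first image of each group of near-identical hashes.

    `images` is a list of (name, pixels) pairs; `max_distance` is an
    assumed threshold, not a value reported by the company.
    """
    kept = []  # (name, hash) pairs retained so far
    for name, pixels in images:
        h = average_hash(pixels)
        if all(hamming(h, kept_hash) > max_distance for _, kept_hash in kept):
            kept.append((name, h))
    return [name for name, _ in kept]


def tile_layout(names, tile_w, tile_h):
    """Assign each image an (x, y) pixel offset in a near-square grid."""
    cols = ceil(sqrt(len(names))) if names else 0
    return {
        name: ((i % cols) * tile_w, (i // cols) * tile_h)
        for i, name in enumerate(names)
    }


if __name__ == "__main__":
    photos = [
        ("before_1", [[10, 200], [10, 200]]),
        ("before_1_retake", [[12, 198], [11, 201]]),  # near-identical frame
        ("after_1", [[200, 10], [200, 10]]),
    ]
    unique = deduplicate(photos)
    print(unique)
    print(tile_layout(unique, tile_w=512, tile_h=512))
```

The layout function only computes where each tile would go; actually compositing the pixels would be done with an imaging library of choice.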

It also matched the verification process undertaken by the model more closely with how human analysts approached the same task.

“When we first started out, [the models] used our written description of the work tasks and the formal documents we kept. They’re kind of like legal documents, describing exactly what has to take place for the work to be completed, with huge numbers of criteria and caveats, and it’s all really well spelled out,” Fisher said.

“But we realised that the teams and analysts actually doing this work and looking at images had deviated quite a bit from those documents, and they had all this built up intuition about how work should be validated.

“They could just scan a number of images and [conclude that a] subbie had done certain steps, so we don’t need to apply this criteria in this particular context.

“So what we had to do is go back and work deeply with these teams and really understand how they did their job, and how the process flow worked, and then we could tailor a really custom set of instructions for the models that actually matched the reality of this work validation task.”

This produced the results Service Stream wanted and gave the company confidence to apply the technology to additional field work verifications across the various vertical industries it serves.

Broader applications

“At the moment we’re rolling it out further across different business units: utilities and transport, Defence eventually, and different projects as well,” Fisher said.

Additional enhancements will make the verification process more real-time and agentic.

The company did not describe the architecture in detail but said it utilises Amazon Bedrock services and a “host of other” services and systems.

“We want to get to the point with more and more contracts that we can do this in real-time, with as much processing on the edge, on the mobile device as possible, and using cloud processing for the more advanced analyses,” Fisher said.

“We’re [also] actively making this more agentic by having the model automatically know what type of work [a subcontractor is] doing, having it prompt [them] if there’s something you’ve missed before going a step far, and automatically essentially completing the [verification] work on the spot and acting as sort of a personal assistant to the worker in the field.”

Fisher said that computer vision could be useful for verifying not only that the work is complete, but also that it is carried out safely.

“One way of improving safety quality is to reduce the friction in the site inspections and safety checks that our workers are mandated to do,” he said.

“[The way] we’re trying to do that is using video analysis, so the workers can just take quick videos throughout the job showing what they’re doing, and have automated checks with computer vision models to check for all the safety controls that are required for that particular stage of the job, and pre-filling in the site report so they can reduce the burden on them.”
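The stage-based safety check Fisher describes could be pictured roughly as follows. This is an illustrative sketch only: the stage names, control labels, `REQUIRED_CONTROLS` table and `prefill_site_report` function are all invented for the example, and the labels are assumed to come from some upstream computer vision model that is not shown.

```python
# Hypothetical mapping of job stage to the safety controls expected on site.
REQUIRED_CONTROLS = {
    "excavation": {"barricade", "hard_hat", "hi_vis_vest"},
    "working_at_height": {"harness", "hard_hat", "ladder_secured"},
}


def prefill_site_report(stage, detected_labels):
    """Compare labels detected in a video against the stage's required controls."""
    required = REQUIRED_CONTROLS[stage]
    detected = set(detected_labels)
    missing = required - detected
    return {
        "stage": stage,
        "controls_confirmed": sorted(required & detected),
        "controls_missing": sorted(missing),
        # Anything the model could not confirm is left for the worker to check.
        "needs_manual_check": bool(missing),
    }


if __name__ == "__main__":
    report = prefill_site_report("excavation", ["hard_hat", "barricade", "pipe"])
    print(report)
```

In a design like this, the worker only has to review the controls the model could not confirm, rather than fill in the whole report by hand.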

The safety and personal assistant applications required additional work on usability, however.

“One of the big challenges here is making it usable,” Fisher said.

“It’s a really difficult usability challenge because the workers are wearing gloves, so we had to carefully design it so that it doesn’t actually impede their work or slow down their workflows.

“We have to make it good and smooth enough that the workers actually want to use it.”

Ry Crozier travelled to AWS Summit Sydney as a guest of AWS.