Two-thirds of participants in a six-month trial of Microsoft 365 Copilot across the federal government used the tool “a few times a week” or less, with high expectations largely going “unmet”.
The Digital Transformation Agency (DTA) published the evaluation report late Wednesday, providing a detailed view of how some 5765 Copilot licences were used in the first six months of 2024.
The report has utility inside and outside of government. Despite the hype around generative AI tools, there is very little public domain knowledge about the actual value the tools generate in real-world settings.
However, there are also some key limitations in the evaluation, notably that it’s based entirely on self-assessments by users, and that executives are over-represented in the trial.
Still, there will be interest in how federal government entities fared in their adoption of the tool.
Overall use of Copilot over the six-month period is rated “moderate”, with “only a third of trial participants [using it] daily”, and “with its use concentrated in summarising meetings and information and re-writing content.”
By contrast, some 46 percent of users turned to the tool only a “few times a week” and 21 percent “a few times a month”.
The evaluation attributes this result to “user capability, perceived benefit of the tool and convenience, and user interface.”
There was a correlation between training and usage – the more training undertaken, the more the tool was used. Results were better where training was customised to an Australian Public Service (APS) context.
But some staff “couldn’t find time for training among other work commitments and time pressures.”
Others “had a poor first experience with the tool or [found] that it took more time to verify and edit outputs than it would take to create” the summary or transcript themselves.
The user interface was also a key issue, notably at the CSIRO, where Copilot’s presence wasn’t immediately apparent, and users simply forgot it existed.
“Focus groups with trial participants remarked that they often forgot Copilot was embedded into Microsoft 365 applications as it was not obviously apparent in the user interface,” the evaluation report states.
“Consequently, they neglected to use features, including forgetting to record meetings for transcription and summarisation.
“CSIRO identified through internal research with [its] trial participants that the user interface at times made it difficult to find features.
“Given one of the arguable advantages of Copilot is its current integration with existing Microsoft workflows, its reported lack of visibility amongst users largely diminishes its greatest value-add.”
The experience was also variable across the Microsoft suite.
Those hoping Copilot would make Excel analysis easier were underwhelmed.
Likewise, those wanting to make use of Outlook integration were let down when their entity wasn’t running a version of Outlook new enough to support Copilot.
That isn’t Microsoft’s fault, but the need for system upgrades and changes does weigh on the business case for ongoing use.
Great expectations
The evaluation report also examines the extent to which users’ expectations were met, and whether the experience lived up to the marketing and hype.
Users went in with high expectations, it is noted, but “there has been a reduction in positive sentiment across all activities that trial participants had expected Copilot to assist with.”
“Although the sentiment remains positive … their initial expectations of Copilot were unmet,” the report states.
“Features of Copilot (and generative AI more broadly) were marketed as being able to significantly save participants’ time, thereby heightening participants’ expectations.
“These expectations appear to have been tempered following Copilot’s use.
“There was a 32 percent decrease in the positive belief that Copilot allowed participants to ‘spend less time in emails’ and a 54 percent reduction in the belief that it would allow them to ‘attend fewer meetings.’”
The report makes no specific recommendations on embedding Copilot in daily government operations.
It does caution federal government entities to consider generative AI products carefully, and to give users better clarity on thorny issues that may arise from ongoing use of the technology.
For example, there were concerns that transcribing every meeting with AI could make detailed meeting minutes fair game for freedom of information requests, which, in turn, may make participants less likely to engage in those forums.