GitHub Copilot update stops AI model from revealing secrets

GitHub has updated the AI model behind Copilot, a programming assistant that generates source code and function recommendations in real time in Visual Studio, and says it’s now safer and more powerful.

The company says the new AI model, which will be rolled out to users this week, offers better-quality suggestions in a shorter time, which should raise the suggestion acceptance rate and make the developers who use it more efficient.

Copilot will introduce a new paradigm called “Fill-In-the-Middle,” which uses a library of known code suffixes and leaves a gap for the AI tool to fill, achieving better relevance and coherence with the rest of the project’s code.
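
To illustrate the general idea, here is a minimal Python sketch of how a fill-in-the-middle prompt could be assembled from the code on either side of the cursor. It is not GitHub's implementation: the sentinel strings and the function names `build_fim_prompt` and `splice_completion` are hypothetical, and real models define their own special tokens.

```python
# Hypothetical sentinel strings, chosen for illustration only.
FIM_PREFIX = "<fim_prefix>"
FIM_SUFFIX = "<fim_suffix>"
FIM_MIDDLE = "<fim_middle>"


def build_fim_prompt(source: str, cursor: int) -> str:
    """Split the file at the cursor and place prefix and suffix around a gap."""
    if not 0 <= cursor <= len(source):
        raise ValueError("cursor is outside the source text")
    prefix = source[:cursor]  # code before the cursor
    suffix = source[cursor:]  # code after the cursor
    # The model would be asked to continue after FIM_MIDDLE, filling the gap.
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


def splice_completion(source: str, cursor: int, completion: str) -> str:
    """Insert a completion into the gap at the cursor position."""
    return source[:cursor] + completion + source[cursor:]


if __name__ == "__main__":
    source = "def add(a, b):\n    \n\nprint(add(1, 2))\n"
    cursor = source.index("    ") + 4  # just after the indentation
    print(build_fim_prompt(source, cursor))
    print(splice_completion(source, cursor, "return a + b"))
```

Because the prompt carries the code after the cursor as well as the code before it, a suggestion can take into account how the surrounding code continues, which is the relevance gain the paradigm is meant to deliver.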

Additionally, GitHub has updated the Copilot client to reduce unwanted suggestions by 4.5%, improving overall code acceptance rates.

“When we first launched GitHub Copilot for Individuals in June 2022, more than 27% of developers’ code files on average were generated by GitHub Copilot,” Senior Director of Product Management Shuyin Zhao said.

“Today, GitHub Copilot is behind an average of 46% of a developers’ code across all programming languages—and in Java, that number jumps to 61%.”

Copilot’s accepted suggestions rate over time (GitHub)

More secure suggestions

One of the highlights of this Copilot update is a new security vulnerability filtering system that will help identify and block insecure suggestions such as hardcoded credentials, path injections, and SQL injections.

“The new system leverages LLMs (large language models) to approximate the behavior of static analysis tools—and since GitHub Copilot runs advanced AI models on powerful compute resources, it’s incredibly fast and can even detect vulnerable patterns in incomplete fragments of code,” Zhao said.

“This means insecure coding patterns are quickly blocked and replaced by alternative suggestions.”
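
As a rough picture of what blocking a suggestion means in practice, the Python sketch below screens candidate suggestions before they are shown. GitHub's filter is described as LLM-based. This sketch swaps in two simple regular expressions as a stand-in, so it is an assumption-laden illustration and not the actual system. The pattern set and the names `find_issues` and `filter_suggestions` are hypothetical.

```python
import re

# Illustrative patterns only; a real filter would need far broader coverage.
INSECURE_PATTERNS = {
    "hardcoded credential": re.compile(
        r"""\b(password|passwd|secret|api_key|token)\s*=\s*["'][^"']+["']""",
        re.IGNORECASE,
    ),
    "SQL built by string concatenation": re.compile(
        r"""["'][^"']*\b(select|insert|update|delete)\b[^"']*["']\s*\+""",
        re.IGNORECASE,
    ),
}


def find_issues(suggestion: str) -> list[str]:
    """Return the names of the insecure patterns found in a suggestion."""
    return [
        name
        for name, pattern in INSECURE_PATTERNS.items()
        if pattern.search(suggestion)
    ]


def filter_suggestions(candidates: list[str]) -> list[str]:
    """Keep only the candidates in which no insecure pattern was found."""
    return [c for c in candidates if not find_issues(c)]


if __name__ == "__main__":
    candidates = [
        'password = "hunter2"',
        'query = "SELECT * FROM users WHERE id = " + user_id',
        'cursor.execute("SELECT * FROM users WHERE id = %s", (user_id,))',
    ]
    for kept in filter_suggestions(candidates):
        print(kept)
```

Regular expressions like these miss many cases and cannot reason about context, which is presumably part of why GitHub describes using language models to approximate static analysis instead.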

The software company says Copilot may generate novel strings that follow the patterns of secrets seen in its training data, such as keys, credentials, and passwords. However, these aren’t usable as they’re entirely fictitious, and they will be blocked by the new filtering system.

Example of the real-time blocking system (GitHub)

The appearance of these secrets in Copilot’s code suggestions has drawn fierce criticism from the software development community, with many accusing Microsoft of using large sets of publicly available data to train its AI models with little regard for security, even including sets that mistakenly contain secrets.

By blocking unsafe suggestions in the editor in real time, GitHub might also provide some resistance against poisoned dataset attacks, which aim to covertly train AI assistants to make suggestions containing malicious payloads.

At this time, Copilot’s LLMs are still being trained to distinguish between vulnerable and non-vulnerable code patterns, so the AI model’s performance on that front is expected to improve gradually in the near future.


