
Microsoft’s Copilot AI assistant has inadvertently exposed content from over 20,000 private GitHub repositories associated with major companies such as Google, Intel, Huawei, PayPal, IBM, Tencent, and even Microsoft itself.
These repositories were originally public but were switched to private after developers discovered they contained sensitive information, such as authentication credentials that could enable unauthorized access. Despite the change, the complete contents of these private repositories remained accessible through Copilot even months later.
The discovery was made by the AI security firm Lasso in the second half of 2024. After finding in January that Copilot continued to store and serve access to private repositories, Lasso set out to measure the scale of the problem.
Zombie Repositories Exposed
“We were astonished to find out that any data on GitHub, even if briefly public, could be indexed and potentially revealed by tools like Copilot,” noted Lasso researchers Ophir Dror and Bar Lanyado in a post from Thursday. “Eager to explore the scope of the problem, we automated the identification process for what we termed ‘zombie repositories’—those that were once public but are now private—and validated our findings.”
The Lasso team then found that one of their own private repositories was accessible through Microsoft's Copilot. The cause was Bing's caching mechanism, which had indexed repository pages while they were public and never purged those entries after the repositories were set to private. Because Copilot relies on Bing for its search capabilities, the cached private data remained retrievable through the AI assistant.
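The classification step Lasso describes can be reduced to a simple rule: a repository is a "zombie" if it is no longer publicly visible on GitHub but a copy still turns up in a search-engine cache. The sketch below illustrates that rule in isolation; the names (`RepoStatus`, `is_zombie`) and example URLs are hypothetical, and the real work of probing GitHub and the Bing cache is assumed to have happened upstream.

```python
# Minimal sketch of "zombie repository" classification, assuming each repo's
# live GitHub visibility and cached-copy status have already been determined
# by some earlier crawling step. All names here are illustrative, not Lasso's.

from dataclasses import dataclass


@dataclass
class RepoStatus:
    url: str
    github_public: bool       # repo is currently visible on github.com
    cached_copy_found: bool   # a copy still appears in a search-engine cache


def is_zombie(status: RepoStatus) -> bool:
    """A 'zombie' repo is private (or deleted) on GitHub yet still cached."""
    return (not status.github_public) and status.cached_copy_found


repos = [
    RepoStatus("https://github.com/example/internal-secrets",
               github_public=False, cached_copy_found=True),
    RepoStatus("https://github.com/example/public-docs",
               github_public=True, cached_copy_found=True),
]
zombies = [r.url for r in repos if is_zombie(r)]
```

In practice the two boolean inputs would come from unauthenticated requests to GitHub (private and deleted repos return 404) and from queries against the search index, but those probing details are outside the scope of this sketch.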
Following Lasso's disclosure of the problem in November, Microsoft implemented changes aimed at resolving the issue. While Lasso confirmed that the private data was removed from Bing's cache, the researchers made a noteworthy finding: the contents of a GitHub repository that had been taken down after Microsoft filed a lawsuit, alleging it hosted tools for circumventing protective measures in Microsoft's generative AI services, were still accessible through Copilot. Although the repository had been deleted from GitHub, Copilot continued to surface its content, highlighting ongoing concerns about data privacy and AI tools.
