
Tuesday, May 30, 2023

Want to Keep AI From Sharing Secrets? Train It Yourself

Have come across companies thinking about this.

MosaicML delivers a secure platform for hosted AI
By Matthew S. Smith

On 11 March 2023, Samsung’s Device Solutions division permitted employee use of ChatGPT. Problems ensued. A report in The Economist Korea, published less than three weeks later, identified three cases of “data leakage.” Two engineers used ChatGPT to troubleshoot confidential code, and an executive used it to produce a transcript of a meeting. Samsung changed course, banning employee use not just of ChatGPT but of all external generative AI.

Samsung’s situation illustrates a problem facing anyone who uses third-party generative AI tools based on a large language model (LLM). The most powerful AI tools can ingest large chunks of text and quickly produce useful results, but this feature can easily lead to data leaks.

“That might be fine for personal use, but what about corporate use? […] You can’t just send all of your data to OpenAI, to their servers,” says Taleb Alashkar, chief technology officer of the computer vision company AlgoFace and MIT Research Affiliate.
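To make the concern concrete, here is a minimal sketch of the pattern Alashkar describes, assuming the OpenAI Python SDK; the file name and model are hypothetical. The point is that the confidential text leaves the organization and is processed on a third party's servers.

```python
# Minimal sketch: an engineer pastes proprietary code into a hosted LLM.
# File name and model are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical confidential source file the engineer wants help debugging.
with open("proprietary_module.py") as f:
    confidential_code = f.read()

# The request, including the confidential code, is sent to OpenAI's servers,
# outside the organization's control.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user",
         "content": "Find the bug in this code:\n\n" + confidential_code},
    ],
)
print(response.choices[0].message.content)
```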

Naïve AI users hand over private data

Generative AI’s data privacy issues boil down to two key concerns.

AI is bound by the same privacy regulations as other technology. Italy’s temporary ban of ChatGPT occurred after a security incident in March 2023 that let users see the chat histories of other users. This problem could affect any technology that stores user data. Italy lifted its ban after OpenAI added features to give users more control over how their data is stored and used.

But AI faces other unique challenges. Generative AI models aren’t designed to reproduce training data and are generally incapable of doing so in any specific instance, but it’s not impossible. A paper titled “Extracting Training Data from Diffusion Models,” published in January 2023, describes how Stable Diffusion can generate images similar to images in the training data. The Doe v. GitHub lawsuit includes examples of code generated by GitHub Copilot, a tool powered by an LLM from OpenAI, that match code found in training data.

[Image: a photograph of Ann Graham Lotz beside a closely similar AI-generated image created with Stable Diffusion. Researchers discovered that Stable Diffusion can sometimes produce images similar to its training data, which included the original photograph. Source: “Extracting Training Data from Diffusion Models”]
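The basic test behind such findings is straightforward to sketch: prompt a model with the start of material it may have trained on and check whether it completes it verbatim. The model name and snippet below are placeholders, not the setup used in the paper or the lawsuit.

```python
# Rough sketch of a memorization check on a causal language model.
# Model and "original" text are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/pythia-160m"   # stand-in for any open causal LLM
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Hypothetical document that may have appeared in the training set.
original = "def quicksort(arr):\n    if len(arr) <= 1:\n        return arr"
prefix = original[:30]

# Greedy decoding from the prefix; verbatim overlap with the original
# would be evidence of memorized training data.
inputs = tokenizer(prefix, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=40, do_sample=False)
continuation = tokenizer.decode(output[0], skip_special_tokens=True)

print(continuation)
print("verbatim match:", original in continuation)
```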

This leads to fears that generative AI controlled by a third party could unintentionally leak sensitive data, either in part or in whole. Some generative AI tools, including ChatGPT, worsen this fear by including user data in their training set. Organizations concerned about data privacy are left with little choice but to bar their use.

“Think about an insurance company, or big banks, or [Department of Defense], or Mayo Clinic,” says Alashkar, adding that “every CIO, CTO, security principal, or manager in a company is busy looking over those policies and best practices. I think most responsible companies are very busy now trying to find the right thing.”

Efficiency holds the answer to private AI

AI’s data privacy woes have an obvious solution. An organization could train using its own data (or data it has sourced through means that meet data-privacy regulations) and deploy the model on hardware it owns and controls. But the obvious solution comes with an obvious problem: It’s inefficient. The process of training and deploying a generative AI model is expensive and difficult to manage for all but the most experienced and well-funded organizations.  ...
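For organizations that do go this route, the workflow looks roughly like the following sketch: fine-tune a small open model on an in-house corpus and keep the data, the checkpoints, and the deployed model on local hardware. The model name, file path, and hyperparameters are illustrative assumptions, not any particular vendor's stack.

```python
# Minimal sketch of the "train it yourself" approach using Hugging Face
# Transformers: data and weights never leave hardware the organization controls.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "EleutherAI/pythia-160m"      # any small open LLM works for the sketch
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # this tokenizer has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Private corpus read from local disk; nothing is sent to a third-party API.
dataset = load_dataset("text", data_files={"train": "internal_docs.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="./private-llm",        # checkpoints written to local storage only
        per_device_train_batch_size=2,
        num_train_epochs=1,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("./private-llm")        # deploy behind the company firewall
```

Even this toy version hints at the inefficiency the article describes: the organization now owns the GPUs, the training pipeline, and the serving infrastructure that a hosted service would otherwise provide.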
