Businesses are always looking for ways to gain efficiency and productivity, and to many, diving into AI-based large language models (LLMs) like ChatGPT to generate content, chat with customers, and even build software appears promising. Yet many large enterprises have found the opposite: they have been forced to pull employees back from using these technologies. The question is whether other companies in their industries will follow suit.
Why ban AI? The reason: generative AI services use data inputs for further training, and in many cases reveal that data to external parties later on. For organizations that own or process sensitive data, maintain proprietary intellectual property, work in highly regulated industries, or produce closed-source software, this type of data leakage could be disastrous.
But these tools offer huge benefits as well. So how can CISOs determine whether to allow or ban ChatGPT and the like, especially when such a ban may stifle employee productivity, is difficult to enforce, and is ripe for subversion?
Because employees may perceive ChatGPT and other LLMs as making their jobs easier and their processes more efficient, they may use these tools in ways that, unbeknownst to them, result in data leakage.
Like all AI models, ChatGPT is designed to produce better results as it is fed more data. The unintended data leakage that may result is not necessarily a flaw, as these tools were not designed to be secure data vaults. It is similar to posting confidential information on social media platforms like LinkedIn or Instagram: those apps were not built to protect private data either.
One study found employees pasting regulated confidential information or intellectual property into these tools. In another case, Samsung engineers accidentally leaked confidential data after uploading it to ChatGPT, resulting in Samsung restricting the use of ChatGPT by employees.
As with any software, LLMs often contain bugs, some of which can result in data leaks. In March 2023, OpenAI revealed that a bug had caused parts of users' conversations with ChatGPT to be shown to other users.
Finally, there are compliance and regulatory concerns associated with these tools. There are no guarantees around how the data is handled, and sharing data could put a firm out of compliance with data security regulations. Just as with any external application, leaks or a lack of visibility into how the data is processed can result in a violation of the GDPR or other regulatory frameworks. Passing data to an LLM also breaks the data audit trail required for compliance.
Given the risks, several large companies have moved to stop their employees from using LLMs altogether.
Amazon implemented a ban on ChatGPT after it discovered ChatGPT responses that seemed to resemble internal Amazon data. Apple implemented its own internal ban on ChatGPT and GitHub Copilot, an AI-assisted coding tool, due to concerns that the tools could leak sensitive information.
The financial industry has been particularly proactive about stopping LLM usage. JPMorgan Chase put severe restrictions on internal use of ChatGPT, concerned about the potential leaking of protected financial information, which could lead to violations of national and industry data regulations. Large financial providers like Bank of America, Citigroup, and Deutsche Bank followed their example.
Finally, as mentioned above, Samsung has also banned ChatGPT for extended periods of time, lifting and reinstating the ban several times.
Based on these and other examples, companies that ban or restrict employee generative AI usage seem to do so for these main reasons:
Direct leaking of internal data
Concerns about how LLMs store, process, and leverage data inputs to improve their algorithms and responses, which can mimic private internal data and lead to accidental distribution of competitive intelligence
Concerns around the lack of logging of how LLMs process regulated data
Even if an organization does decide to ban or restrict the use of LLMs, it may find that enforcement is nearly impossible.
Setting a security standard does not mean internal users will follow that standard, or even be aware of the rule. Organizations already find it difficult to block the use of unsecured personal devices for remote work, or to stop the use of unauthorized SaaS apps thanks to cloud computing. While the use of non-approved apps is called "shadow IT," one might call the potential situation under an LLM ban "shadow AI."
Security teams can ban certain apps by blocking the IP addresses or URLs of these tools, but of course these restrictions are not perfectly effective. Personal devices may not have the right security clients installed; company equipment can be used on non-company networks. Determined users might even use a VPN to circumvent firewall rules and access banned tools.
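As a minimal sketch, assuming a web gateway or proxy applies a domain blocklist to outbound traffic, such a check might look like the following. The blocklist entries and function name are illustrative examples, not any specific product's configuration:

```python
# Hypothetical example of hostname-based blocking, as a web gateway or
# proxy filter might apply it. The blocklist entries are illustrative.
from urllib.parse import urlparse

BLOCKED_DOMAINS = {"chat.openai.com", "gemini.google.com"}  # example entries only

def is_blocked(url: str) -> bool:
    host = urlparse(url).hostname or ""
    # Match the blocked domain itself or any of its subdomains
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)

print(is_blocked("https://chat.openai.com/c/abc"))  # True
print(is_blocked("https://example.com/docs"))       # False
```

As noted above, rules like this are easy to sidestep: a personal device on a mobile network, or a VPN that tunnels around the corporate firewall, never passes through such a check at all.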
One thing that can be said with certainty about ChatGPT and similar services is that these tools are immensely popular. A ban may help curtail usage and the associated data leakage. But CISOs may want to assume their employees are using these tools anyway, whether on a corporate or a personal device. To that end, they should strongly consider applying a data loss prevention (DLP) solution.
DLP solutions use a variety of tactics to detect sensitive data and keep it from leaving a protected environment. These methods include pattern matching, keyword matching, file hash matching, and data fingerprinting, but most relevant for preventing AI tool data leakage is the ability to restrict copying and pasting, uploads, and keyboard inputs.
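As a rough illustration of the first two detection tactics, a minimal pattern- and keyword-matching check might look like this Python sketch. The patterns and keywords are illustrative examples, not a production rule set:

```python
# Minimal sketch of regex pattern matching and keyword matching,
# two of the detection tactics named above. Rules are examples only.
import re

PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),            # US Social Security number format
    "credit_card": re.compile(r"\b(?:\d[ -]?){13,16}\b"),   # loose card-number shape
}
KEYWORDS = {"confidential", "internal only", "do not distribute"}

def find_sensitive(text: str) -> list[str]:
    """Return the names of patterns and keywords found in the text."""
    hits = [name for name, rx in PATTERNS.items() if rx.search(text)]
    lowered = text.lower()
    hits += [kw for kw in KEYWORDS if kw in lowered]
    return hits

print(find_sensitive("SSN 123-45-6789, keep this internal only"))
# ['ssn', 'internal only']
```

Real DLP engines go considerably further, with validated detectors, file hash matching, and data fingerprints, but the underlying idea is the same: recognize sensitive content before it leaves the environment.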
DLP solutions (when paired with browser isolation) should be able to prevent copying and pasting by employees, stopping them from entering sensitive data into any web application, including LLMs. DLP may also be able to block data uploads, stop certain keyboard inputs, and detect confidential data in outgoing HTTP requests.
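To show where such a check might sit, the sketch below applies a simple sensitive-data pattern to an outgoing request body before it is allowed to leave. The endpoint URL is hypothetical, and real products perform this inspection inside their own gateways or isolated browsers rather than in application code:

```python
# Hypothetical sketch of DLP-style inspection of an outgoing HTTP request
# body. The endpoint and regex are illustrative examples only.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def should_block(url: str, body: str) -> bool:
    """Return True if the outgoing request appears to contain sensitive data."""
    if SSN_PATTERN.search(body):
        print(f"Blocked request to {url}: body matches a sensitive-data pattern")
        return True
    return False

# An employee pastes a customer record into a chat prompt (hypothetical endpoint)
should_block(
    "https://chat.example-llm.com/api/prompt",
    '{"prompt": "Summarize this record: SSN 123-45-6789"}',
)
```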
Organizations may or may not want to ban generative AI usage. Those that do may not be able to stop its use completely. But for organizations in either situation, DLP offers an alternative to both unfettered AI use and AI bans.
DLP is of course no guarantee that data will not be uploaded. Overall, CISOs will have to weigh the pros and cons of allowing the use of ChatGPT and other LLMs, and their conclusions will differ by industry. In heavily regulated industries like banking, uploading content to LLMs may be a nonstarter. In other industries, CISOs may evaluate AI use on a case-by-case basis, or simply allow it freely.
But every business has sensitive data to protect, and DLP can help keep that data out of LLM databases. Because of the importance of data protection in today's environment, Cloudflare offers DLP to help reduce the risk of data and code exposure even with the increasing use of generative AI tools in the workplace.
This article is part of a series on the latest trends and topics impacting today’s technology decision-makers.
Get the Simplifying the way we protect SaaS applications whitepaper to see how Cloudflare helps organizations protect their applications and data with a Zero Trust approach.
After reading this article you will be able to understand:
Why large language models (LLMs) put data at risk
Why several global companies have banned internal use of ChatGPT and other generative AI tools
How data loss prevention (DLP) can safely enable AI use