Could ChatGPT and Other AI Applications Expose Your Proprietary Information and Compromise Your Security Program?


What you don't know can have devastating consequences for your business.
Cybersecurity at Calian

Machine learning and artificial intelligence (AI) offer immense potential for streamlining processes and automating repetitive tasks. However, there are some important considerations to keep in mind when implementing these technologies. The risks and challenges associated with machine learning and AI can be as substantial as the efficiencies gained through their use.

Data Retention and Ownership
One of the fundamental aspects of machine learning is training algorithms to distinguish between good and bad information. This requires a large volume of sample data. As machine learning applications strive to improve the accuracy of their algorithms, they assimilate vast amounts of information into their pool of training data, which could include sensitive corporate information. While the vendors behind these applications may claim that the data is anonymized, organizations should still be concerned about data ownership and the potential loss of control over valuable intellectual property.

There's no way to tell whether an application is assimilating and using your information unless the vendor discloses it in the master sales agreement or the terms and conditions, or you ask the question directly and get an answer. So when an organization wants to adopt new applications to make employees' lives easier and more productive, the security team carrying out the threat risk assessment is under a lot of pressure to do its due diligence on how these tools are used.

Lack of Transparency
Unless explicitly stated in contracts or terms and conditions, it can be challenging to determine whether proprietary data is being assimilated and used by a machine learning tool. Recent incidents, such as the Samsung breach, demonstrate the potential consequences of unintentionally sharing sensitive data with public platforms.

Engineers and software developers at Samsung put their company's proprietary code into ChatGPT to streamline it and to solve a problem. What they didn't realize at the time was that the public version of ChatGPT could capture that code and assimilate it into its large pool of training data, which put their intellectual property beyond the company's control. This resulted in Samsung issuing a proprietary data breach notice to the public and blocking the use of ChatGPT for all Samsung employees.

Do Your Due Diligence
At Calian, we recently evaluated three SaaS tools that leverage machine learning and rejected two of them because of their terms and conditions. Sometimes you need to consider questions that you wouldn't normally think to ask. Even if a tool is just a database that stores your contracts to help you track and maintain compliance with their terms and conditions, you may not be aware of what using it opens you up to. In the terms and conditions of one such tool, we noticed that the vendor, in effect, claimed all rights to and ownership of all information and documentation that we would put into it. We would have been handing over to a third party all the data in our corporate contracts.

Even something as benign as Grammarly has risks. You put text into the tool and it does a grammar check for you. But it appears that all the information you put into that tool may be absorbed. Maybe it's not sensitive information: a letter to a colleague or a simple blog post. But what if you are writing about a contract or a potential acquisition? In that case it's the same tool, but the sensitivity level of the information is off the charts and you're potentially putting that into the public domain. So, data retention and data ownership are vital things that you need to pay attention to because, as with anything posted on the internet, once information is public you'll never be able to retrieve it.
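One way to act on this is to screen text before it leaves the organization. The sketch below is a minimal illustration only: the term list and the function names (`find_sensitive_terms`, `safe_to_submit`) are hypothetical, and a keyword match is a crude stand-in for a real data classification or data loss prevention policy.

```python
import re

# Hypothetical word stems that suggest the text may be sensitive.
SENSITIVE_TERMS = ("contract", "acquisition", "confidential", "proprietary")

# Match each stem as a whole word, allowing simple suffixes such as plurals.
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SENSITIVE_TERMS)) + r")\w*\b",
    re.IGNORECASE,
)


def find_sensitive_terms(text):
    """Return the sorted, lowercased stems from SENSITIVE_TERMS found in text."""
    return sorted({match.group(1).lower() for match in _PATTERN.finditer(text)})


def safe_to_submit(text):
    """Return True only if no sensitive stem appears in the text."""
    return not find_sensitive_terms(text)


if __name__ == "__main__":
    draft = "Notes on two contracts and the planned Acquisition."
    print(find_sensitive_terms(draft))
    print(safe_to_submit("Lunch on Friday?"))
```

A check like this will miss sensitive text that avoids the listed words and will flag harmless text that uses them, so it works best as a prompt for the employee to stop and think rather than as a guarantee.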

Unintended Consequences and Automation
Integrating machine learning tools into organizational environments also introduces challenges related to access control and data management. Do these tools adhere to access control policies and restrict searches to authorized information? If you ask a tool to find information on a certain topic, will it restrict the search to the information that you would normally have access to, or will it also pull data from your files, your history, or even your CEO's history? The potential for an application to access data from sources without proper authorization further complicates the security picture.
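To make the access control question concrete, here is a minimal sketch of a search that applies the requesting user's permissions before it looks at any content. The `Document` class, the group names, and `authorized_search` are hypothetical and exist only for illustration; a real tool would rely on the organization's existing identity and permission systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed_groups: frozenset  # groups permitted to read this document


def authorized_search(query, documents, user_groups):
    """Return IDs of documents that match the query and that the user may read."""
    user_groups = frozenset(user_groups)
    results = []
    for doc in documents:
        # Check permissions first so unauthorized content is never searched.
        if not (doc.allowed_groups & user_groups):
            continue
        if query.lower() in doc.text.lower():
            results.append(doc.doc_id)
    return results


# Hypothetical sample data: one general document and one restricted document.
DOCS = [
    Document("handbook", "Policy on acquisition of equipment", frozenset({"staff"})),
    Document("board-memo", "Confidential acquisition plan", frozenset({"executives"})),
]

if __name__ == "__main__":
    print(authorized_search("acquisition", DOCS, {"staff"}))
```

In this sketch a staff member searching for "acquisition" gets the handbook but not the board memo, even though both contain the word. That is the behaviour worth confirming with a vendor before deployment.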

In addition, while these tools can enhance productivity, there is a risk of them granting permissions or creating administrative accounts without human intervention or approval. The ability of these tools to autonomously perform tasks demands careful consideration of potential risks and the need for robust control mechanisms.
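One possible control mechanism is a gate that refuses privileged actions unless a named person has approved them. The sketch below assumes a hypothetical action list and a hypothetical `execute_action` function; it records the request instead of changing any real system.

```python
# Hypothetical set of actions that must never run without human sign-off.
PRIVILEGED_ACTIONS = {"grant_permission", "create_admin_account"}


class ApprovalRequired(Exception):
    """Raised when an automated request needs a human sign-off."""


def execute_action(action, target, approved_by=None):
    """Run an action requested by an automated tool, gating privileged ones."""
    if action in PRIVILEGED_ACTIONS and not approved_by:
        raise ApprovalRequired(f"'{action}' on '{target}' needs human approval")
    # A real system would perform the change here; this sketch only records it.
    return {"action": action, "target": target, "approved_by": approved_by}


if __name__ == "__main__":
    print(execute_action("summarize_report", "q3.docx"))
    try:
        execute_action("create_admin_account", "svc-bot")
    except ApprovalRequired as exc:
        print(f"Blocked: {exc}")
    print(execute_action("create_admin_account", "svc-bot", approved_by="alice"))
```

Routine tasks pass straight through, while anything on the privileged list stops until a person takes responsibility for it, which also leaves a record of who approved what.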

Machine learning and AI have transformative potential, but they also bring new risks and challenges. Organizations must carefully navigate issues related to data retention and ownership, transparency and privacy, access control and unintended consequences. By addressing these concerns and implementing safeguards, businesses can harness the power of these technologies while protecting their sensitive information. It is crucial for security teams to stay proactive and informed as the landscape of machine learning and AI continues to evolve.

Want to talk AI and Cyber? Join us at SecTor 2023 Oct 23rd-26th. Request a meeting with Calian at SecTor at
