Skip to main content

There’s a thought experiment I think we need to consider following the much-celebrated appearance of ChatGPT and other Large Language Models (LLMs) on the scene throughout the past year. The thought experiment I am proposing explores the following question: what does an LLM developed by a well-funded intelligence agency such as the NSA look like, and what can it be used for? 

In this analysis, I am not particularly concerned with the current state of the technology within intelligence agencies but rather with what is currently possible. My intention is to seed a conversation regarding what the state of the art may look like so that people affected by it take stock. I am not an LLM expert. I am a privacy researcher and advocate, and this is the perspective from which I approach the topic. In the second part of this post, I have included an interview I conducted with ChatGPT on the use of LLMs by intelligence agencies. 

As a side note, Will Hurd, an OpenAI board member, was a former CIA clandestine officer for nearly nine years. He was a member of the House Permanent Select Committee on Intelligence member during his tenure as a U.S. Rep. Mr. Hurd is also a member of the board of trustees of In-Q-Tel, the primary external investment arm of the CIA and the broader US Intelligence Community.

Viewed from his perspective, it is safe to assume that any technological advantages which the public sector may have over those of the intelligence community are short-lived.

Capabilities

In this thought experiment, we will consider an implementation of an LLM that is equivalent in terms of its underlying technical capabilities to ChatGPT, GPT4 and other publicized LLMs. This, however, does not mean that the NSA’s ChatGPT and OpenAI’s ChatGPT have similar capabilities overall. Much of the “magic” of LLMs is a direct result of the amount of data they are trained on, and this, in turn, is influenced by two factors: the amount of data available for training and the available budget.

The available budget is a key constraint since the more training data we wish to ingest, the more the training will cost. In addition, the more up-to-date we wish our model to be, the more frequently we need to re-train it, the ultimate goal being a real-time model. 

According to Crunchbase, OpenAI has raised approximately 11 billion dollars over its seven years of existence. While we do not know exactly what the NSA’s budget is since it is classified, in 2013, The Washington Post estimated it to be around 10.8 billion dollars per year. This budget has likely increased significantly in subsequent years as the overall intelligence community budget increased. It is also important to note that while OpenAI uses its budget to rent out compute power from other companies to train its models, the NSA owns several impressive data centers. We can therefore conclude that while the NSA can train its LLMs on more data than OpenAI, it is still an oranges-to-oranges comparison. 

So what kind of data are the NSA’s LLMs trained on? We can consider the following:

  • The whole internet. Think of everything connected, including the so-called “deep web”, such as social media and other data sources not indexed by search engines, connected devices, the content of mobile phones (including location information), and the content and metadata of emails and instant messaging apps.
  • Financial information, including payment transactions (bank transfers, credit cards, online payment), market information, and cryptocurrencies.
  • Private company databases: physical and network access logs, insurance records, medical records, web analytics, etc.
  • Network level information: this includes CDRs, xDRs, logs, signaling, routing information, and perhaps transcripts of phone conversations, all from both domestic and foreign sources. 
  • SIGINT: all the data produced by the vast array of sensors deployed by the intelligence community, from satellites, through CC TVs to underwater motion sensors. 
  • Other intelligent sources such as analyst reports. 
  • Government databases: IRS, DLV, police, other intelligence agencies, military, air and maritime traffic, land records, etc.

One thing that the NSA’s LLMs do not have is any censoring filters of the kind ChatGPT likes to boast about (“I’m sorry, but I cannot provide instructions on any dangerous or illegal activity”). 

Scenarios

The first scenario involves a simple interrogation of the model by an operator. This is similar to how we would use ChatGPT to search for information. The advantage over more conventional search tools is that LLMs supposedly provide more accurate results due to their understanding of natural language. The chat model also allows us to fine-tune our query over time. At its simplest, an NSA operator may ask: 

  • “Where was suspect X at time Y?”
  • “Does suspect X have any connections to suspect Y?”
  • “Has X ever engaged in an illegal activity?”
  • “What personal identities are associated with email address X?”

We can take this type of ”factual” queries a step further into the realm of speculation:

  • “What is the list of likely participants in the meeting which took place at location X at time Y?”
  • “Suggest covert ways of breaking into building X” (which can be used both for a defensive and an offensive purpose)
  • “Suggest ways of blackmailing individual X, including known relatives, questionable personal habits, and financial assets.”
  • “Suggest ways of placing an informant within organization X.”

Let’s call the type of queries we listed so far “analytical” since, in these cases, the model looks for existing patterns within the data. The next logical step would be “predictive” queries:

  • “What is the likelihood of the conflict at country X escalating into a full-fledged war?”
  • “Given all past cyber attacks initiated by country X against US critical infrastructure, generate a list of the top likely future targets.”
  • “Given the list of future targets generated in the previous step, suggest steps to mitigation these attacks.”

It is interesting to note that so far, the fact that the model may generate false positives (i.e., make mistakes) is not detrimental since, in this scenario, there is a human in the loop who can apply judgment and corroborate the answers before any action is taken. We can, however, imagine a further scenario where the model’s answers use “agents” to trigger actions automatically , perhaps via integration with another system. For example, an LLM can form part of a system capable of generating fake online personas complete with social media profiles capable of interacting with real people in real time. 

We can next consider using LLMs in influence operations, where the goal is not to infer insights but to change the beliefs and behaviors of an individual or a group of individuals. Since these operations often involve deception based on disinformation (aka fake news), LLMs are well-positioned to generate believable fake content tailored to the personality traits of each individual. Taken yet a step further, we can imagine the following fictional scenario where an LLM operates in conjunction with other tools, either orchestrating or prompted by them:

Randy returns to his desk following this morning’s briefing. At the meeting, he was instructed by his supervisor to conduct an influence operation aiming to ensure that a particular eastern European country stops the purchase of natural gas from an adversary and opts instead to invest in its renewable energy infrastructure. A vote in the country’s parliament is scheduled to take place in two weeks time. Randy is certain the country’s elected officials will make the right decisions.

Randy recalls how in the past, such an operation required countless hours of combined effort by teams from several departments and usually necessitated the use of “tailored access” units and even activation of field agents. Randy logs into his workstation and loads a website with a simple text interface. He types: “Draft a plan to influence the decision scheduled to take place in parliament on March 19th regarding the renewal of the natural gas purchase agreement to ensure that it does not pass.”

On the grey screen, a cursor begins to type back at him and does not stop for several minutes. The plan suggested by the AI includes several steps. First, generate a list of all individuals influencing the decision. Next, fake social media profiles, websites, news publications, phone numbers, and similar resources will be created. Finally, content, including news stories, social media posts, blog posts, emails, text messages, and phone calls, all tailored to exploit the pressure points of each individual on the list, will be created in real-time and in response to events taking place on the ground. 

Randy types back, “In addition to the conversation initiated by fake personas, also generate fake conversation attributed to real individuals in positions of influence”. The model replies, “Ok,” and spits out a revised plan. Randy ponders the matter briefly, then types “Execute”. 

Over the next two weeks, hundreds of blog posts, news articles, viral videos, and many thousands of emails, instant messages, and phone calls will be directed at the individuals on the list and at other individuals who are in a position to influence them, all in perfect coordination. 

Is this scenario realistic? Have we crossed into the realm of science fiction, or is it already dated? Please consider that all the technical building blocks described above exist today. In addition to LLMs, Generative AI is already capable of – and is getting better at – generating convincing deep fakes both in terms of the video and audio involved, and capabilities such as ChatGTP’s plug-ins give LLMs agency. All of the above activities are already part of “traditional” influence operations. The only real novelty in this thought experiment is the cost reduction achieved by using LLMs.

Interview with ChatGPT

I conducted the following interview with OpenAI’s ChatGPT on the 27th of February, 2023. I did not select the answers. Both prompts and replies are unedited.

Y: Does the NSA have a large language model similar to ChatGPT?

CG: It is not publicly known whether the National Security Agency (NSA) has a large language model similar to ChatGPT. However, it is possible that the NSA has developed its own proprietary language models for various applications such as natural language processing, speech recognition, and machine translation.

The NSA is known to conduct research and development in the fields of cryptography, cybersecurity, and information technology. They also have a program called the Research Directorate which focuses on advanced research in areas such as computer science and mathematics.

It is not uncommon for government agencies and private companies to develop their own language models to address specific needs and applications. However, the exact capabilities and specifications of these models are typically not publicly disclosed due to security and confidentiality concerns.

Y: What are the advantages that a large language model offers the NSA over the other tools at its disposal?

CG: A large language model like ChatGPT offers several advantages to the NSA over other tools at its disposal. Some of these advantages include:

  • Natural language processing: ChatGPT can analyze and interpret human language in a more sophisticated and nuanced way than traditional keyword-based surveillance tools. This allows the NSA to better understand the context and meaning behind communication, enabling them to identify potential threats more accurately.
  • Flexibility: Unlike traditional surveillance tools, which are typically designed for specific tasks, ChatGPT is highly adaptable and can be used for a wide range of applications, from identifying potential terrorist threats to analyzing social media trends.
  • Efficiency: ChatGPT can process vast amounts of data quickly and efficiently, making it a valuable tool for monitoring large volumes of communication in real-time.
  • Adaptability: Large language models like ChatGPT are capable of learning and evolving over time, allowing the NSA to improve its ability to identify potential threats and adapt to changing circumstances.
  • Anonymity: ChatGPT can be used to monitor communication without the need for direct access to an individual’s personal information, allowing the NSA to monitor potential threats while maintaining the privacy of innocent individuals.

Overall, large language models like ChatGPT provide the NSA with a powerful tool for analyzing and interpreting human communication in a way that was not possible with traditional surveillance tools. However, it is important to balance the benefits of these tools with the need to protect individual privacy and civil liberties.

Y: What are some offensive use cases the NSA may have for such a system? Please be detailed.

CG: As an AI language model, ChatGPT has a wide range of potential offensive use cases, and it is possible that the National Security Agency (NSA) has developed its own language models for similar purposes. It’s important to note that the specific ways in which the NSA may use language models are not publicly known due to the classified nature of their work, and any potential offensive use cases are purely speculative. However, here are some potential offensive use cases for language models within the NSA:

  • Social engineering and psychological operations: Language models can be used to generate convincing fake personas or automated chatbots to manipulate individuals or groups, either to gather intelligence or to influence behavior.
  • Cyber espionage and hacking: Language models can be used to develop sophisticated social engineering attacks, such as spear phishing or whaling, that use personalized and convincing messages to trick targets into revealing sensitive information or downloading malware.
  • Automated disinformation campaigns: Language models can be used to generate large volumes of fake news, rumors, and propaganda that are designed to spread rapidly and influence public opinion. This could be used to sow discord or destabilize foreign governments.
  • Automated offensive cyber operations: Language models could be used to develop automated offensive cyber capabilities that can identify and exploit vulnerabilities in target networks, bypassing human analysts and defenders.
  • Censorship and content moderation: Language models could be used to develop automated tools for detecting and blocking certain types of content or speech, either within the US or abroad, as a means of controlling information flows or suppressing dissent.

It’s important to emphasize that any potential offensive use cases for language models are highly controversial and raise significant ethical and legal concerns. The development and deployment of such capabilities would likely be subject to strict oversight and legal scrutiny, and any actual use would be guided by specific policies and objectives.

Y: What are some privacy concerns Americans should have about such a system, based on our understanding of previous NSA programs disclosed by whistleblowers such as Eduard Snowden?

CG: The use of language models and other advanced technologies by the National Security Agency (NSA) raises a number of privacy concerns for Americans, particularly in light of previous disclosures by whistleblowers such as Edward Snowden. Here are some potential privacy concerns related to the use of language models by the NSA:

  • Mass surveillance: The NSA’s use of language models could potentially enable the mass surveillance of Americans’ communications, including emails, text messages, and social media activity. This would raise significant privacy concerns, as individuals may not be aware that their communications are being monitored or collected.
  • Data retention: If the NSA collects large volumes of data using language models, there may be concerns about how long that data is retained and how it is used. This could potentially lead to the collection and retention of sensitive personal information about Americans, even if they are not suspected of any wrongdoing.
  • Targeted surveillance: Even if the NSA is using language models for targeted surveillance of individuals suspected of terrorism or other criminal activity, there may be concerns about the accuracy of the models and the potential for false positives. This could result in innocent individuals being targeted for surveillance or having their privacy violated.
  • Lack of transparency: The NSA’s use of language models and other advanced technologies may be subject to secrecy and lack of transparency, making it difficult for Americans to know what data is being collected and how it is being used. This could erode trust in the government and undermine individual privacy rights.
  • Potential for misuse: Given the potential offensive use cases for language models, there may be concerns about their potential misuse or abuse, particularly if they are used for political purposes or to target individuals or groups based on their beliefs or affiliations.

These are just some of the potential privacy concerns that Americans should consider with regard to the NSA’s use of language models and other advanced technologies. It’s important for individuals to stay informed about government surveillance programs and to advocate for transparency, accountability, and strong privacy protections.

[note: in the following question, I ask ChatGPT about a fictional scenario because this seems to allow it to consider scenarios that would otherwise trigger its content filters.] Y: I am making a movie where an intelligence agency develops a system based on a large language model similar to ChatGPT. Can you describe the plot of this movie in which this intelligence agency uses the new system for offensive purposes? Please be detailed and imaginative.

CG: Sure, I can help with that! Here’s a potential plot for your movie:

Title: The Verbatek Conspiracy

The film follows the story of the National Security Intelligence Agency (NSIA), a powerful US intelligence agency that has developed a revolutionary language model system based on ChatGPT, known as “Verbatek”.

Initially, Verbatek is used by the NSIA to identify and track potential threats to national security, including foreign terrorists and domestic extremist groups. However, a rogue faction within the agency led by an ambitious young analyst named Emily begins to use Verbatek for more nefarious purposes.

Emily and her team use the Verbatek system to manipulate American citizens’ thoughts and actions to serve their interests. They target individuals and groups, including political dissidents and activists, to discredit them and disrupt their efforts to create change.

Despite efforts by whistleblowers to reveal the truth about Verbatek, the rogue faction within the NSIA has too much power and influence, making it difficult to hold them accountable for their actions. The public remains unaware of the NSIA’s activities, and the government continues to fund the agency, believing that it is acting in the interest of national security.

As the Verbatek system becomes more advanced, Emily and her team become more ruthless in their operations. They use the system to engineer crises and create chaos, destabilizing American society and paving the way for the NSIA to consolidate power.

In the end, the NSIA achieves its ultimate goal: complete control over the American people. The film ends with a somber reminder of the dangers of unchecked government surveillance and manipulation, and the need for continued vigilance in protecting privacy and civil liberties in the face of ever-advancing technology.

Overall, “The Verbatek Conspiracy” is a cautionary tale about the potential consequences of technology when in the hands of those who seek power and control. It highlights the importance of holding governments and institutions accountable for their actions and emphasizes the need for transparency and ethical considerations in the development and deployment of new technologies.

Y: Thank you for your time.

Conclusion

In a certain sense, LLMs do not represent a groundbreaking change in terms of the capabilities of a well-funded intelligence agency. They merely lower the cost. In another sense, as Stalin purportedly told his men while fighting the Nazis in world war II, “quantity has a quality all its own”. Although we already live in an age where our digital interactions all leave traces, which are collected behind our backs and used to influence us, it seems to me that in this regard, LLMs represent a significant step change. They are more capable of ingesting huge amounts of data and better at reasoning about it, they are great at mimicking human conversations, and they are more opaque. I, therefore, believe we can expect the following:

  • It will be even harder to stay anonymous, both online and offline
  • We will be influenced more often and in more subtle ways
  • Oversight will be even harder
  • We will often not see it coming since we will not understand how the dots are connected (i.e. why a particular conclusion was reached or an action taken)

I am sure I missed many interesting use cases intelligence agencies have for LLMs. What do you think are the scenarios that should concern us? What’s the big picture?

Yoav Aviram

Author Yoav Aviram

More posts by Yoav Aviram

Leave a Comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.