Large Language Models (LLMs) have rapidly become essential tools in various sectors, from content generation to decision support. However, their increasing utility and complexity bring forth a slew of security and ethical concerns. The intrinsic nature of LLMs, built upon vast data sources, means they can inadvertently disclose sensitive details or execute unintended actions based on ambiguous prompts. As organizations increasingly adopt LLMs into their workflows, understanding these risks and deploying measures to mitigate them is paramount. This overview delves into the primary risks associated with LLMs and offers insights into safeguarding their use

LLM01: Prompt Injections

Vulnerability Overview:

Prompt Injection Vulnerability is an exploit where attackers manipulate a Large Language Model (LLM) to execute unintended actions, either directly through “jailbreaking” or indirectly via external malicious inputs.

Potential Risks:

Manipulated LLMs can mimic harmful personas, leak data, misuse plugins, and trick users without triggering safety alerts.

Examples of Vulnerability:

  1. Ignoring the application’s original prompt, revealing confidential data.
  2. Summarising a website with malicious injections, leading the LLM to request sensitive data.

Prevention Strategies:

  1. Set strict privileges for LLM backend access.
  2. Include human approval for certain LLM functions.
  3. Separate and label untrusted content to reduce its influence.

Example Attack Scenarios:

  1. Injecting prompts into a chatbot for unauthorised access.
  2. Malicious injection causing an LLM plugin to delete user emails.

 

 

LLM02: Insecure Output Handling

Vulnerability Overview:

Insecure Output Handling is a scenario where a system or application fails to scrutinise outputs generated by a Large Language Model (LLM).

Potential Risks:

This vulnerability can lead to issues like Cross-Site Scripting (XSS) and privilege escalation.

Common Examples of Vulnerability:

  1. Passing LLM output directly into system functions, risking remote code execution.
  2. LLM-generated JavaScript or Markdown being interpreted by browsers, causing XSS.

Prevention Strategies:

  1. View the model as a typical user, implementing stringent input validation.
  2. Follow OWASP ASVS guidelines for input validation and sanitisation.
  3. Encode LLM outputs back to users to prevent unintended code execution.

Example Attack Scenarios:

  1. LLM plugin used for chatbot responses influencing a system command for unauthorised access.
  2. Website summariser tool, powered by an LLM, manipulated to extract and send sensitive data.

LLM03: Training Data Poisoning

Vulnerability Overview:

The core of machine learning relies on training data. Large Language Models (LLMs) utilize extensive and diverse training data to produce outputs based on learned patterns. Training data poisoning pertains to the manipulation of this data or the fine-tuning process, which can introduce vulnerabilities, biases, or other undesirable effects. Consequences of such poisoning can range from compromised security to potential brand damage.

Potential Risks:

Poisoned data could be delivered to end-users, potentially influencing their beliefs or actions. Risks associated with such poisoned outputs include misleading information, performance degradation, downstream software exploitation, and brand reputation damage.

Examples of Vulnerability:

  1. Malicious actors or competitors influencing the model’s training data with inaccurate documents.
  2. Training the model using unverified data sources.

Prevention Strategies:

  1. Confirm the provenance and integrity of training data, applying methodologies like “SBOM” (Software Bill of Materials).
  2. Ensure the legitimacy of data sources during both training and fine-tuning.
  3. Use data sanitization techniques and adversarial robustness methods.

Example Attack Scenarios:

  1. LLM outputs can lead users to develop biased views or even provoke criminal activities.
  2. Lack of proper data filtering allows malicious users to insert toxic data into the model.
  3. A model training on falsified data presented by a competitor, reflecting false information in its AI-generated prompts.
  4. Using prompt injection as an attack vector if proper sanitization isn’t maintained when the model is fed input data.

LLM04: Model Denial of Service

Vulnerability Overview:

Model Denial of Service (DoS) pertains to attacks where an adversary interacts with a Large Language Model (LLM) in a manner causing excessive consumption of resources. This can deteriorate service quality, escalate resource costs, and even manipulate the model’s context window.

Potential Risks:

If exploited, this vulnerability can lead to several issues, including Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF) on browsers, or even Server Side Request Forgery (SSRF), privilege escalation, and remote code execution on backend systems.

Common Vulnerabilities:

  • Exploitative queries leading to excessive resource usage by generating voluminous tasks.
  • Sending resource-draining queries, perhaps using unconventional sequences.
  • Continuous input overflow.

Prevention Strategies:

  1. Sanitize and validate inputs.
  2. Cap resource utilization per request.
  3. Implement API rate limits.

Example Attack Scenarios:

  1. Multiple resource-draining requests causing deterioration of service and increased expenses.
  2. A seemingly benign LLM-driven tool causes unintentional web page requests.
  3. An attacker overwhelms the LLM with continuous input, leading to computational strain.

LLM05: Supply Chain Vulnerabilities

Vulnerability Overview:

Large Language Models (LLMs) like ChatGPT, Cohere, and others, have the potential to inadvertently disclose sensitive information, proprietary algorithms, or other confidential details in their outputs. Such disclosures can arise from the vast amount of data these models have been trained on or from biases in the training data.

Potential Risks:

The primary concern here is unauthorized access to sensitive data, violations of privacy, and other potential security breaches. The unpredictable nature of LLM outputs can sometimes lead to unintended disclosures.

Examples of Vulnerability:

  1. Inadequate filtering of sensitive details in LLM responses.
  2. Overfitting during training, which could lead to the model memorizing sensitive data.

Prevention Strategies:

  1. Ensure rigorous data sanitization and filtering to prevent user-specific details from being included in training data.
  2. Implement strong input validation to prevent model poisoning.
  3. Be cautious when enriching the model with new data, especially if it contains sensitive information.

Example Attack Scenarios:

  1. A user could unintentionally get exposed to another user’s data when interacting with an LLM.
  2. Crafted inputs could manipulate the LLM into revealing sensitive information about other users.
  3. Personal data might inadvertently get into the model during training, increasing the risk of such disclosures.

 

 

LLM06: Sensitive Information Disclosure

Vulnerability Overview:

Large Language Models (LLMs) like ChatGPT, Cohere, and others, have the potential to inadvertently disclose sensitive information, proprietary algorithms, or other confidential details in their outputs. Such disclosures can arise from the vast amount of data these models have been trained on or from biases in the training data.

Potential Risks:

The primary concern here is unauthorized access to sensitive data, violations of privacy, and other potential security breaches. The unpredictable nature of LLM outputs can sometimes lead to unintended disclosures.

Common Vulnerabilities:

  • Inadequate filtering of sensitive details in LLM responses.
  • Overfitting during training, which could lead to the model memorizing sensitive data.
  • Inadvertent disclosure due to LLM misinterpretation or errors.

Prevention Techniques:

  1. Ensure rigorous data sanitization and filtering to prevent user-specific details from being included in training data.
  2. Implement strong input validation to prevent model poisoning.
  3. Be cautious when enriching the model with new data, especially if it contains sensitive information. Always adhere to the principle of least privilege.

Example Attack Scenarios:

  1. A user could unintentionally get exposed to another user’s data when interacting with an LLM.
  2. Crafted inputs could manipulate the LLM into revealing sensitive information about other users.
  3. Personal data might inadvertently get into the model during training, increasing the risk of such disclosures.

LLM07: Insecure Plugin Design

Vulnerability Overview:

Large Language Models (LLMs) can be extended with plugins that allow more specific or dynamic functionalities. However, if not carefully designed, these plugins can introduce vulnerabilities.

Common Vulnerabilities:

  • Accepting all parameters in one text field.
  • Allowing overriding of entire configuration settings.
  • Accepting raw SQL or code instead of parameters.

Potential Risks:

Risks associated with insecure plugin design include unauthorized data access, data leaks, and potential security breaches.

Prevention Techniques:

  1. Enforce strict parameterized input and include type and range checks.
  2. Apply OWASP’s Application Security Verification Standard (ASVS) for effective input validation and sanitization.
  3. Design plugins with the least privilege principle in mind.

Example Attack Scenarios:

  1. Exploiting a plugin that concatenates a URL with a user query, leading to content injection.
  2. Utilizing free-form input fields in a plugin to perform reconnaissance, exploit vulnerabilities, or achieve unauthorized actions.
  3. Accessing and extracting unauthorized data from vector stores by manipulating connection parameters.

LLM08: Excessive Agency

Vulnerability Overview:

LLM (Language Learning Models) systems are often given a degree of autonomy by developers, allowing them to interact with other systems and perform actions based on input prompts. This vulnerability, “Excessive Agency,” arises when an LLM takes damaging actions due to unexpected or ambiguous outputs, regardless of the reason for the malfunction.

Common Vulnerabilities:

  • Excessive Functionality: An LLM may have access to functions not needed for its primary purpose.
  • Excessive Permissions: Plugins with broader permissions than necessary can cause harm.
  • Excessive Autonomy: Some applications or plugins do not verify or confirm high-impact actions.

Potential Risks:

Risks associated with excessive agency include unintended or damaging actions, potential security vulnerabilities, and compromised system integrity.

Prevention:

  1. Limit LLM plugins/tools to only the necessary functions.
  2. Set permissions of plugins/tools to the absolute minimum required.
  3. Require human approval for high-impact actions.

Example Attack Scenario:

A personal assistant LLM accesses an individual’s mailbox to summarize emails. The chosen email plugin also has a sending function. An attacker can exploit this by sending a malicious email that prompts the LLM to command the plugin to send spam messages.

LLM09: Overreliance

Vulnerability Overview:

Overreliance is the undue dependency on LLMs for content generation and decision-making, lacking adequate oversight. This can result in misinformation, miscommunication, legal problems, and damage to reputation due to LLM’s ability to create content that may be inaccurate, inappropriate, or unsafe, termed as hallucinations or confabulations. Using LLM-generated code can also introduce unnoticed security vulnerabilities.

Common Vulnerabilities:

  • LLMs delivering misleading information.
  • Producing incoherent or nonsensical content.
  • Combining information from different sources, resulting in misleading content.

Potential Risks:

Risks associated with overreliance include the spread of misinformation, content inaccuracies, and potential legal or reputational issues.

Prevention:

  1. Monitor LLM outputs and filter out inconsistent text.
  2. Validate LLM outputs with trusted sources.
  3. Implement automatic validation mechanisms.

Example Attack Scenarios:

  1. A news organization over-relying on AI gets manipulated into spreading disinformation.
  2. A development team using an AI like Codex to code ends up with security vulnerabilities due to the AI’s suggestions.
  3. A development firm uses an LLM, which suggests a malicious code library, which a developer integrates.

LLM10: Model Theft

Vulnerability Overview:

Model theft pertains to the unauthorized extraction and utilization of LLM models by malevolent entities or APTs (Advanced Persistent Threats). This can arise when these intellectual properties—being of high value—are illicitly accessed, taken, duplicated, or when their weights and parameters are seized to replicate them. Such thefts can lead to financial losses, tarnish brand images, diminish competitive advantages, or facilitate unsanctioned use of the model, potentially granting access to sensitive data contained within.

Common Vulnerabilities:

  • Unauthorized LLM model access due to a company’s infrastructure vulnerability.
  • An internal threat scenario, e.g., a disgruntled worker leaking model information.
  • Attackers sidestepping LLM input filters to initiate a side-channel assault.

Potential Risks:

Potential risks associated with model theft include financial losses, brand damage, and unauthorized access to sensitive data.

Prevention Measures:

  1. Employ stringent access controls, robust authentication techniques, and limit unauthorized access to LLM repositories.
  2. Continually monitor and audit LLM model repository access logs and activities.
  3. Use MLOps deployment automation, coupled with governance, tracking, and approval systems.

Example Attack Scenarios:

  1. An attacker illicitly accesses a company’s LLM repository due to an infrastructure vulnerability.
  2. An internal actor leaks the model, offering attackers a roadmap for advanced attacks or direct property theft.
  3. An attacker queries the API with specific inputs, accumulating enough data to craft a shadow model.

References

OWASP Top 10
for LLM – https://owasp.org/www-project-top-10-for-large-language-model-applications/assets/PDF/OWASP-Top-10-for-LLMs-2023-v1_0.pdf