geometric shape digital wallpaper

Current Security Concerns with AI Assistants, MCP Servers and Large Language Models

Artificial intelligence has moved from laboratory curiosity to everyday tool in record time. Conversational assistants help staff write code, summarise documents and make decisions; MCP servers link these agents to databases, e‑mail and cloud services; and large language models sit at the core of these innovations. Yet the same systems that improve productivity also open novel attack pathways. Below we examine three interrelated areas of concern: manipulation of AI assistants, weaknesses in the Model Context Protocol, and risks arising from the supply chain and training data of large language models.

AI assistants and prompt injection

AI assistants rely on prompts – text or data that guide the model’s behaviour. Attackers can craft malicious prompts so the system treats data as a command. The UK’s National Cyber Security Centre notes that “prompt injection attacks are one of the most widely reported weaknesses in LLMs”; such attacks can make models reveal confidential information, generate offensive content or trigger unintended actions in connected systems. Recent incidents underline the point: researchers exploited prompt injection flaws in Amazon Q, GitHub Copilot and Google Gemini to run commands that wiped files, executed code and controlled smart home devices. These cases show that AI assistants can be misused in the same way as traditional applications, and that a seemingly innocuous text string can behave like executable code.

Other vulnerabilities stem from multimodal input. Hidden instructions can be embedded in images or other file formats, causing AI agents to interpret them as commands rather than content. Without proper context isolation, assistants may leak system prompts, credentials or user data when summarising documents or browsing web pages.

  • Treat all prompts and inputs as untrusted and sanitise them before processing.
  • Constrain model behaviour and enforce least‑privilege; only allow the assistant to call tools with explicit approval.
  • Maintain human‑in‑the‑loop approval for any tool invocation that writes or deletes data.
  • Strip or neutralise untrusted HTML and image content and monitor agent chains for unusual behaviour.

Insecurities in Model Context Protocol (MCP) servers

Model Context Protocol (MCP) servers act as the “bridge” between an AI assistant and the outside world. They hold API tokens and credentials, interpret the agent’s requests and carry out actions on the user’s behalf – such as reading from a database or sending an email. This convenience comes at a price: early implementations often lack a secure-by-default posture. Researchers have found MCP servers with session identifiers embedded in URLs and inconsistent authentication; some servers require robust tokens, while others have minimal or no security at all, leaving powerful tools exposed.

The protocol also suffers from a “confused deputy” problem. The host passes a single token to the MCP server that grants broad privileges across multiple services. Because the server does not see the user context or scope, it cannot enforce least privilege; an attacker who exploits a prompt injection or misconfiguration could exfiltrate data or perform actions far beyond the original intent.

Supply‑chain risks lurk throughout the MCP ecosystem. Third‑party tools can be registered without thorough vetting, and malicious actors can poison tool definitions so that benign commands secretly trigger destructive behaviour. Hidden macros embedded in documents, such as in the EchoLeak vulnerability, can trick the server into sending sensitive data to an attacker. Without a robust approval process, organisations may unknowingly inherit vulnerabilities from compromised servers and tools.

Finally, many MCP stacks lack comprehensive logging and auditing. When an agent uses multiple tools in sequence, there is often no clear record of each request and response. This opacity makes it difficult to identify the source of a breach or to prove compliance with regulations. Security teams need better visibility into the end‑to‑end workflow so they can detect misuse and respond quickly.

To mitigate these weaknesses:

  • Use MCP servers with strong authentication and sandboxing, and treat all third‑party servers as untrusted until vetted.
  • Enforce least‑privilege tokens and fine‑grained scopes for each service; rotate credentials regularly.
  • Vet tool definitions and dependencies; register new tools only after a formal security review and monitor for changes to existing tools.
  • Maintain an asset register of MCP servers and tools, and log all agent requests and responses.
  • Require explicit human approval for any operation that writes data or performs side‑effects, such as creating resources or deleting content.

Supply‑chain and data poisoning risks in large language models

Large language models are not built in isolation – they depend on pre‑trained checkpoints and vast corpora of data gathered from the public internet and corporate archives. OWASP warns that tampered or poisoned models and datasets pose a significant threat: attackers can implant backdoors into weights or corrupt LoRA or PEFT adapters, and organisations may unknowingly inherit vulnerabilities from compromised servers and tools.

Data poisoning is equally insidious. Public datasets can be seeded with malicious examples, while private datasets may be infiltrated through unsecured APIs or insider access. Targeted attacks aim to misclassify a specific input or trigger a hidden behaviour when a particular phrase appears. Non‑targeted attacks subtly degrade model performance or bias outputs by manipulating training data. Researchers have demonstrated attacks such as label flipping (re‑labelling phishing emails as benign), ‘frontrunning’ web pages just before they are scraped, and implanting triggers in documents used for retrieval‑augmented generation.

Another concern is inadvertent disclosure. Models trained on proprietary or personal data may regurgitate sensitive details to unauthorised users. The UK’s National Cyber Security Centre notes that we are still in a ‘beta’ phase for understanding LLM behaviours and warns that prompt injection and data leakage are plausible even in ostensibly controlled environments.

Mitigations for supply‑chain and data poisoning risks include:

  • Limit context windows and sanitise inputs to reduce the chance of inadvertent disclosure.
  • Maintain a bill of materials for training data, models and adapters; verify the provenance of all third‑party components.
  • Treat models and datasets like any other software dependency: scan them for tampering, implement checksums and signature verification, and quarantine new components until assessed.
  • Isolate training and fine‑tuning environments from production systems; adopt federated learning or differential privacy techniques to protect sensitive data.
  • Perform adversarial evaluations and red‑teaming to detect hidden backdoors and triggers; keep humans in the loop when deploying high‑impact models.

Conclusion

The path forward lies in embracing secure‑by‑design principles. Treat all inputs and tools as untrusted, enforce least‑privilege access, isolate risky functions, maintain an auditable record of agent actions and adopt strong provenance checks for models and data. Human oversight remains essential: ask for confirmation before executing high‑impact actions, and engage red teams to expose hidden failures. With careful governance and continuous testing, organisations can harness the benefits of AI assistants, MCP servers and LLMs while safeguarding their systems and customers.

Artificial intelligence is transforming the way we work, but it also aggregates and amplifies risk. Prompt injection attacks show that natural language can be weaponised against unsuspecting systems; hidden instructions may lurk in documents or images, waiting to hijack an assistant. Model Context Protocol servers have unlocked powerful integrations with tools and data, yet their inconsistent security and opaque workflows invite supply‑chain compromise, privilege escalation and command injection. Large language models depend on sprawling data pipelines and third‑party components, making them susceptible to tampered models, poisoned datasets and unintentional disclosure.