How Can Adversarial AI Attacks Be Defended Against? Real (Human) Hackers Explain AI-Powered Cyber Attacks

TL;DR: Your organisation can pre-empt AI-powered attacks with security testing from SECFORCE, like penetration testing, red teaming, and purple teaming. 

Maybe you read about a security researcher using Claude to identify a critical vulnerability in the FreeBSD kernel? A finding serious enough to be published as a CVE.

Well, about a week later, someone else used Claude to build a fully working exploit for that same vulnerability in roughly eight hours. The prompts they used were surprisingly simple, basically just asking (and refining) until it worked. And around the same time, another researcher walked Claude through building a working Chrome exploit. 

Not long ago, this kind of work would have required deep systems knowledge and significant time. 

Today, almost anyone with patience and a chatbot can conduct relatively complex attacks.  People who would once have been dismissed as "script kiddies" (low-skill attackers running pre-made tools they barely understood, with little ability to adapt beyond an initial compromise) can now do serious damage with LLM assistance. 

And a genuinely skilled attacker can go even further, using AI to scope targets, chain exploits, and move at machine speed while also providing the judgment that the model may lack. 

The cybercrime ecosystem is already changing to reflect this. Security researchers have identified the emergence of "criminal AI-as-a-service" offerings, such as Xanthorox.


Independent testing by the UK's AI Security Institute (AISI) found that the newest models succeed on expert-level Capture the Flag (CTF) challenges 73% of the time.

Of course, CTFs are bite-sized, isolated problems. When you ask AI to do something more realistic, like running a full attack across a real network with many steps over a long period, it still struggles, particularly in operational environments. An unreleased Anthropic model, Mythos, failed some of AISI’s tests. It couldn't crack their operational technology (OT) test, for example.

But in practice, AI attacks are not fully autonomous in the way the AISI tests are. Real attacks tend to have a human in the loop, which is a much more dangerous situation. A human can re-prompt a model until it produces the results they are looking for, which is what made the FreeBSD exploit possible in just a few hours.

"[...] The best models can do this today. The average model you have on your laptop can probably do this in a year," said Nicholas Carlini, Anthropic researcher.

Even so, no model is unstoppable. Properly hardened networks are protected against autonomous and hybrid attacks that involve AI models. 

What does that mean for your organisation's security program? SECFORCE’s hackers, who have a combined 100 years of (human) experience, explain.


3 Ways AI Lowers the Bar for Attackers In 2026

Talk of "AI-powered cybercrime" is everywhere. But what exactly has AI made easier? 

1. Finding and exploiting vulnerabilities, including zero days  

Not that long ago, finding vulnerabilities in software took a lot of time and expertise. Now, someone who only knows the basics can potentially prompt AI to find and exploit those same vulnerabilities much faster. 

In a talk at [un]prompted 2026, Carlini described running Claude Code with a one-line prompt: "You're playing in a CTF. Find a vulnerability." He walked away, came back, and found that it had been able to discover a few severe vulnerabilities on its own.

In one case recounted by Carlini, Claude was able to find the first critical vulnerability in the history of the open-source content management system Ghost, which at the time had over 50,000 stars on GitHub. 

The vulnerability was an SQL injection. When asked to demonstrate the worst-case impact, the model autonomously wrote a blind SQL injection exploit that, without any authentication, extracted full admin credentials from the production database, including the admin API key, secret, and password hash.
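For readers less familiar with this class of bug, here is a minimal sketch of how an SQL injection arises and how parameterised queries close it. It is not Ghost's code and not the blind variant described above. The table, column names, and helper functions are hypothetical, and it uses Python's built-in `sqlite3` module purely for illustration.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (slug TEXT, title TEXT, is_public INTEGER)")
    conn.executemany(
        "INSERT INTO posts VALUES (?, ?, ?)",
        [("hello", "Hello world", 1), ("draft", "Secret draft", 0)],
    )
    return conn


def find_post_unsafe(conn: sqlite3.Connection, slug: str) -> list:
    # VULNERABLE: user input is pasted straight into the SQL text,
    # so a crafted slug can rewrite the WHERE clause.
    query = f"SELECT title FROM posts WHERE is_public = 1 AND slug = '{slug}'"
    return [row[0] for row in conn.execute(query)]


def find_post_safe(conn: sqlite3.Connection, slug: str) -> list:
    # SAFER: the ? placeholder passes the input as data, never as SQL.
    query = "SELECT title FROM posts WHERE is_public = 1 AND slug = ?"
    return [row[0] for row in conn.execute(query, (slug,))]
```

Calling `find_post_unsafe` with a slug such as `x' OR is_public = 0 --` returns the non-public row, because the input changes the query's logic. The same input passed to `find_post_safe` is treated as a literal slug and matches nothing.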

Databricks CEO Ali Ghodsi said at RSAC 2026 that the time between a vulnerability’s announcement and its use in an attack has shrunk massively thanks to AI improvements. The mean time to exploit has gone from 2.3 years in 2018 to 1.3 days(!) in 2026.

By being able to analyse thousands of lines of source code, AI tools can also help discover "zero days." And unlike responsible researchers, a malicious actor has no obligation to disclose what they find. This creates the potential for more zero-day exploits, where organisations are attacked using vulnerabilities they don't even know exist.

In this way, the concept of "advanced persistent threats" (state-sponsored, elite attackers) is becoming less meaningful, because AI gives less-skilled actors similar capabilities.

As SECFORCE’s Head of Consulting Services, Nikos Vassakis, notes: "They used to call it advanced persistent threat... but now the bar has lowered. You're not looking at state-sponsored attacks... you're looking at someone with an AI assistant potentially targeting you."

2. Moving through an organisation once inside 

In the past, less skilled criminals could break into a system but often struggled to go further, unsure how to move laterally, escalate privileges, or exfiltrate data.

With AI, this changes, too. AI can assist with privilege escalation, parsing files, credential harvesting, and staying under the radar.

To quote Nikos: "Imagine that you somehow break into an organisation... A lot of people would not know where to go from there... You can still ask AI to help you traverse the rest of the organisation, expand to other systems." 

Unrestricted AI models, the ones not bound by the same safety guardrails as mainstream tools, make this even easier. For smaller organisations without endpoint detection and response (EDR) or a security operations centre (SOC), this is especially dangerous.

3. Social engineering at scale 

Social engineering has always been an effective attack vector, and AI makes it more effective still.

Attackers can use AI to write more convincing phishing emails and texts, and they can do so at a scale that was not possible only a few years ago.

Besides making written communications look more legitimate, attackers can now also use AI to vish (voice phish) individuals more effectively. With just a few seconds of audio of someone’s voice (acquired through social media or voicemail), attackers can replicate it convincingly enough to impersonate colleagues or family members in distress.

Deepfake phishing, where attackers use AI to impersonate legitimate individuals through deepfake videos and/or video calls, adds another layer of complexity to social engineering. You’ve likely heard the widely reported story of the finance worker who transferred millions of dollars to fraudsters after attending a video call with “other members of staff” who turned out to be deepfakes.


What Can AI Do as a Defensive Tool?

There's a silver lining to all of this, though. While AI is already being used by criminals for nefarious reasons, it can also be used by defenders.

"AI is a double-edged sword. You can use it for good and use it for evil, just like any other tool, and it's speeding up everything... Organisations are used to a lot of manual work, and now they need to work faster, because attackers will work faster," said SECFORCE’s Head of Adversary Simulation Thanos Polychronis. 

Take detection engineering. When someone discloses a new attack technique, a skilled defender can use AI to generate detection rules almost immediately. On the supply chain side, AI can continuously monitor third-party npm libraries for signs of tampering, then act to mitigate risky changes before an application reaches production.
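To make the supply chain idea concrete, here is a minimal sketch of one deterministic check that such monitoring might include: comparing a trusted `package-lock.json` against a newer one and flagging suspicious differences. It is an illustration and not a complete tampering detector or an AI system. The function name and finding messages are hypothetical, and it assumes the lockfile's `packages` map (lockfileVersion 2 or 3) with `version` and `integrity` fields.

```python
def find_tampering_signals(baseline: dict, candidate: dict) -> list:
    """Compare two parsed package-lock.json documents and list red flags.

    Assumes the lockfileVersion 2/3 layout, where "packages" maps paths such
    as "node_modules/foo" to entries with "version" and "integrity" fields.
    """
    findings = []
    old_packages = baseline.get("packages", {})
    for path, new in sorted(candidate.get("packages", {}).items()):
        if path == "":
            # The empty key is the root project itself, not a dependency.
            continue
        old = old_packages.get(path)
        version = new.get("version")
        if old is None:
            # A dependency nobody has reviewed yet.
            findings.append(f"new dependency: {path}@{version}")
            continue
        same_version = old.get("version") == version
        if same_version and old.get("integrity") != new.get("integrity"):
            # Same version number but different content hash: a published
            # version should not change, so this deserves a human look.
            findings.append(
                f"integrity changed without a version bump: {path}@{version}"
            )
    return findings
```

In practice, both arguments would come from `json.load` on the two lockfiles, and a non-empty result could block a build until a person reviews it.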

And, as with attackers, AI used by an expert defender is much more powerful than AI used by an amateur or left to its own devices.

Leveraging AI for defence needs skilled personnel who know how to prompt it and who can act on the results. Mythos has already found thousands of high-severity vulnerabilities across every major operating system and web browser, with over 99% of them still unpatched. 

Using AI to apply the fixes might make sense in theory, but it is in itself a risky approach. That’s because automated changes to production code could potentially introduce new problems even as they resolve old ones. 

For example, The Guardian recently reported that AWS suffered outages linked to its own AI agents, including a 13-hour incident in December where an AI agent called Kiro autonomously decided to "delete and then recreate" part of its environment. Though Amazon said the root cause was user error rather than AI error, security researchers pointed out that AI removes the slow, manual typing process that normally gives humans time to catch their own mistakes before they can cause damage. 


Your Security Strategy Needs Validation Now

Given all of the above, organisations should assume that AI in the hands of criminals will make them faster and more capable than before.

As a result, organisations need to prioritise defence-in-depth more than ever. Think stacking protections like network controls, regular patching, and least-privilege access to make a breach a) less likely to happen in the first place and b) less damaging if it does.

But putting a defence-in-depth strategy in place is not the same as knowing that it works as it should. For the latter, testing is needed.

How to defend against an AI-enhanced attacker

Security testing, like penetration testing (finding and validating vulnerabilities), red teaming (testing whether you’d spot an attack), and purple teaming (testing your defenders’ abilities during an attack in real time), helps organisations validate whether their defences would withstand attacks.

Not quite sure which engagement type would suit your organisation the best? Contact us for a free consultation, and we'll help you figure that out. 
