I’ll reserve judgement until after the bugs are published. Until then, I am expecting minor issues only.
I mean if these tools help catch any issues in automated fashion that’s still a win.
They found ten issues, but how many hours spent filtering out the false positives?
We don’t know. However, if these are security-related issues, then it doesn’t matter. The cost of a breach would obviously be higher.
Compare it to the cost of humans finding them the normal way, not whatever breach you’re imagining.
Clearly the humans didn’t find them the normal way, because otherwise they wouldn’t be there to be found, would they?
We don’t know the details yet. Maybe they have a great new tool; perhaps they picked projects that are not maintained so well.
It will be awesome if they found bugs in curl, not so good to show if they picked my project.
What they did will be revealed in time
I’m sure we’ll get more info in due time.
Yes, hopefully in a couple of weeks
The last time Google did a media run about DeepMind finding bugs, it related to a vulnerability on a dev branch that hadn’t been deployed yet (and was not likely to have been deployed with the vulnerability).
So it found a vulnerability in the code it was given. 🤷
I don’t think anyone is suggesting that it is impossible for an LLM to find any vulnerabilities?
But right now we are specifically discussing the costs of a breach, and your post that I responded to specifically relied on a bug not being identified by a person.
The discussion isn’t whether an LLM can identify bugs, it’s whether it can do so in a useful way. In the single previous example, it was not useful.
But similar to the last time, it is likely that the limited utility will only become known well after the breathless reporting on how amazing AI is.
The false positive rate makes them a net loss.
https://daniel.haxx.se/blog/2025/07/14/death-by-a-thousand-slops/
That article isn’t referring to the specific system Google is using, so we don’t know what the false positive rate is.
Uh pretty high if it’s an LLM
That’s not a given.
But it is likely.
It really depends on how their particular system is set up. You’re just making sweeping vibe-based statements without any evidence to support them.
Yeah, like maybe this is one of those AIs that is actually just a guy in the Philippines being paid shit wages. Or maybe it’s a dumb LLM that makes lots of mistakes. Or maybe it’s all just bullshit from TechCrunch where an underpaid journalist is just recycling a fucking press release from Google and none of this actually happened anything like how it’s written.
It’s literally the 2nd paragraph lmao
Heather Adkins, Google’s vice president of security, announced Monday that its LLM-based vulnerability researcher Big Sleep found and reported 20 flaws in various popular open source software.
what specifically do you think this paragraph says lmao