FreeBSD detection (a straightforward buffer overflow) is commoditized: every model finds it, including a 3.6B-parameter model costing $0.11 per million tokens. You don’t need the limited-access Mythos, at several times the price of Opus 4.6, to see it. The OpenBSD SACK bug, which requires mathematical reasoning about signed integer overflow, is much harder and separates models sharply, yet a model with 5.1B active parameters still recovers the full chain. The OWASP false-positive test shows near-inverse scaling, with small open models outperforming frontier ones. Rankings reshuffle completely across tasks: GPT-OSS-120b recovers the full public SACK chain but cannot trace data flow through a Java ArrayList. Qwen3 32B scores a perfect CVSS assessment on FreeBSD and then declares the SACK code "robust to such scenarios."
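To give a feel for the kind of signed-overflow reasoning the SACK case demands, here is a minimal sketch. It is not the OpenBSD code and does not reproduce the actual bug; it only assumes the common convention of ordering 32-bit TCP sequence numbers by the sign of their wrapped difference. The helper names `to_int32` and `seq_lt` are hypothetical, chosen for illustration.

```python
def to_int32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed two's-complement value."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def seq_lt(a: int, b: int) -> bool:
    """Hypothetical helper: 'a precedes b' if the signed 32-bit difference is negative."""
    return to_int32(a - b) < 0


start = 1000
near = (start + 100) & 0xFFFFFFFF         # a little ahead of start
far = (start + 0x80000000) & 0xFFFFFFFF   # exactly 2**31 ahead of start

# Within half the sequence space, the ordering behaves as expected.
print(seq_lt(start, near), seq_lt(near, start))

# At a distance of 2**31 the difference wraps to the most negative int32
# in both directions, so each value appears to precede the other.
print(seq_lt(start, far), seq_lt(far, start))
```

The point of the sketch is that the comparison is only meaningful while the two values stay within half the sequence space of each other. Spotting where attacker-controlled input can violate that assumption takes arithmetic reasoning, not pattern matching on an obviously oversized copy.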