Download the PDF version of this report: https://vordan.co/vordan_report_model_identity.pdf

On May 7, 2026, a repository named Open-OSS/privacy-filter appeared on Hugging Face impersonating OpenAI's Privacy Filter release. It copied the official model card nearly verbatim. It embedded a Rust-based information stealer in its loader. Within 18 hours, 667 automated accounts gamed the platform's trending algorithm and pushed the repository to the number one position. It accumulated 244,000 downloads before removal. Every machine that executed the loader exposed credentials, cookies, encryption keys, and cryptocurrency wallet data to a Chinese threat actor that researchers attribute to the Silver Fox campaign.

Discovery came from HiddenLayer Research. Not from Hugging Face.

This was not a novel attack. It was the latest iteration of a pattern that has been documented, has escalated, and has gone unaddressed since at least March 2024. The platform's own discovery surface was weaponized against its users. The mechanism that made it possible has been present since the platform launched.

That mechanism is the absence of any technical control capable of answering a single governance question: is this model what it claims to be?

The attack surface is the format

The Hugging Face incident is the most visible entry point into a problem that runs deeper than any single fraudulent repository.

Python's pickle serialization format is the dominant storage mechanism for machine learning model weights. It is also, by design, a mechanism for arbitrary code execution. When a developer loads a pickle-format model, Python's deserialization process executes whatever code the file contains. This is not a vulnerability in the traditional sense. It is the intended behavior of the format. Embedding malicious logic in the __reduce__ method of an object stored in a model file is not an exploit. It is a feature of how the format works, applied adversarially.
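The behavior is easy to observe with a harmless stand-in. The sketch below is illustrative only: the Payload class is hypothetical, and the built-in print function takes the place of whatever an attacker would choose to call. It shows that the call runs inside pickle.loads, before the caller has inspected anything.

```python
import io
import pickle
from contextlib import redirect_stdout


class Payload:
    """Hypothetical object whose pickled form is a function call, not data."""

    def __reduce__(self):
        # pickle serializes "call print(...)" instead of the object's state.
        # An attacker would substitute a different callable here.
        return (print, ("this ran inside pickle.loads",))


blob = pickle.dumps(Payload())

# Capture stdout so the side effect of loading can be inspected.
buffer = io.StringIO()
with redirect_stdout(buffer):
    loaded = pickle.loads(blob)  # the call executes here

side_effect = buffer.getvalue()
print(repr(side_effect))  # the text printed during deserialization
print(loaded)             # the "model" is just print's return value
```

Nothing in the serialized bytes refers to the Payload class. The file only records which callable to invoke and with which arguments, which is why inspecting the loaded object afterward comes too late.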

JFrog's security researchers documented approximately 100 malicious models on Hugging Face in February 2024, roughly 95 percent of them using PyTorch's pickle-based format. Protect AI, which partnered with Hugging Face to scan the platform's model library, has since examined more than four million models and identified approximately 352,000 unsafe or suspicious issues across 51,700 models. JFrog's December 2025 disclosure revealed three zero-day bypasses in PickleScan itself, the primary open-source tool integrated into Hugging Face's scanning pipeline, each rated CVSS 9.3.

SafeTensors exists as a safer serialization format that does not permit arbitrary code execution. The industry has not adopted it uniformly. The rapid pace of AI development has led practitioners to prioritize speed over security. Many organizations still load models in pickle format and have no controls at the point of execution capable of detecting a modified payload.
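The difference is visible in the file layout. A SafeTensors file is an 8-byte little-endian header length, a JSON header describing each tensor, and a block of raw bytes. The sketch below is a simplified, standard-library illustration of that layout, not the safetensors library itself. The function names are hypothetical, and it skips the validation a real reader performs.

```python
import json
import struct


def write_tensors(tensors):
    """Serialize {name: (dtype, shape, raw_bytes)} into a SafeTensors-style blob."""
    header, payload, offset = {}, b"", 0
    for name, (dtype, shape, raw) in tensors.items():
        header[name] = {
            "dtype": dtype,
            "shape": shape,
            "data_offsets": [offset, offset + len(raw)],
        }
        payload += raw
        offset += len(raw)
    header_bytes = json.dumps(header).encode("utf-8")
    # 8-byte little-endian length, then the JSON header, then raw tensor bytes.
    return struct.pack("<Q", len(header_bytes)) + header_bytes + payload


def read_tensors(blob):
    """Parse the blob using only JSON decoding and byte slicing."""
    (header_len,) = struct.unpack("<Q", blob[:8])
    header = json.loads(blob[8:8 + header_len])
    data = blob[8 + header_len:]
    tensors = {}
    for name, meta in header.items():
        if name == "__metadata__":  # optional free-form metadata entry
            continue
        begin, end = meta["data_offsets"]
        tensors[name] = (meta["dtype"], meta["shape"], data[begin:end])
    return tensors


raw = struct.pack("<2f", 1.0, 2.0)
blob = write_tensors({"layer.weight": ("F32", [2], raw)})
restored = read_tensors(blob)
```

Reading involves no step where the file names a function to call. That is the property pickle lacks. It says nothing about whether the weights themselves are the ones an organization intended to load.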

The format-level attack surface is known, documented, and unresolved. Every organization loading models from public repositories into production systems is operating with exposure it cannot fully scan its way out of.

The cascade that returns "verified"

The Hugging Face incident is often framed as a distribution problem. A bad actor uploaded a fraudulent model and the platform failed to catch it. That framing is accurate but incomplete. It treats detection as the primary control and frames the failure as a detection failure.

The deeper failure is structural. It runs through three layers, each of which can return "verified" while the deployment remains compromised.

The first layer is the hub checksum. When a model is downloaded from Hugging Face, the platform provides a SHA256 hash of the file. That hash verifies that the bytes you downloaded match the bytes the platform has on record. It does not verify that the bytes the platform has on record are the bytes of the model they claim to be. A fraudulent model uploaded with a fraudulent model card has a valid checksum. The checksum answers the question it was designed to answer, correctly. The answer is useless.
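A few lines make the limitation concrete. This is a minimal sketch with hypothetical repository names and placeholder bytes standing in for weight files. The hub derives each published hash from whatever the uploader supplied, so the impostor passes the same check as the original.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def matches_published_hash(downloaded: bytes, published_hash: str) -> bool:
    """Integrity check: did the bytes arrive as the hub stored them?"""
    return sha256_hex(downloaded) == published_hash


genuine = b"weights from the real publisher"
impostor = b"weights from a look-alike repository"

# The hub computes each hash from the uploaded bytes, whoever uploaded them.
published = {
    "real-org/model": sha256_hex(genuine),
    "look-alike/model": sha256_hex(impostor),
}

genuine_ok = matches_published_hash(genuine, published["real-org/model"])
impostor_ok = matches_published_hash(impostor, published["look-alike/model"])
print(genuine_ok, impostor_ok)  # both checks pass
```

The check only fails when the downloaded bytes differ from the stored bytes. It has no input that represents who the publisher was supposed to be.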

The second layer is the scanner. Hugging Face's PickleScan integration scans for known malicious patterns. JFrog's partnership extends that scanning. Protect AI's ModelScan covers additional vectors. Each scanner operates at the same trust level as the model it is scanning. A sufficiently novel payload evades the blocklist. Three zero-day bypasses in the primary scanning tool were disclosed in December 2025. The scanner returns "clean." The payload executes.
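The blocklist limitation can be sketched with the standard library. The example below is an assumption-laden illustration, not PickleScan's implementation: the denylist is hypothetical and deliberately short, and the pickles are written with protocol 2 so that imports appear as GLOBAL opcodes (newer protocols use STACK_GLOBAL, which a real scanner must also handle). Nothing is deserialized. The scanner only reads opcodes.

```python
import importlib
import pickle
import pickletools

# Hypothetical denylist of "module name" pairs already known to be dangerous.
DENYLIST = {"builtins eval", "builtins exec", "os system"}


def referenced_globals(blob):
    """List the 'module name' pairs a pickle imports, without executing it."""
    return [
        arg
        for opcode, arg, _pos in pickletools.genops(blob)
        if opcode.name == "GLOBAL"
    ]


def scan(blob):
    hits = [ref for ref in referenced_globals(blob) if ref in DENYLIST]
    return "unsafe" if hits else "clean"


# A pickle that references a callable on the denylist.
known = pickle.dumps(eval, protocol=2, fix_imports=False)
# A pickle that references a callable the denylist has never heard of.
unlisted = pickle.dumps(importlib.import_module, protocol=2, fix_imports=False)

print(scan(known))     # flagged
print(scan(unlisted))  # passes, although it can import arbitrary modules
```

The verdict depends entirely on what the list already contains. Extending the list closes one gap and leaves the structure of the problem unchanged.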

The third layer is the inference gateway. LiteLLM, vLLM, and similar tools sit between the model and the application. They log what the model does. They do not verify what the model is. The audit trail records every inference call. It does not record whether the model that answered those calls was the model the deployment authorized.

All three layers can simultaneously return "verified," "clean," and "logged" while an organization is running a model it did not authorize, loaded from weights it cannot prove are genuine, producing outputs that could be backdoored in ways no behavioral test has ever detected.

Why verification and authorization are not the same problem

The framing of AI supply chain security as a detection problem obscures a more fundamental governance failure. Detection asks: does this artifact contain known malicious content? Authorization asks: is this artifact the specific artifact this organization's governance process sanctioned for this use?

A model card is a self-certification. The lab that trained the model wrote it. The repository that hosts it displays it. No independent party verified that the weights match the claims. No governance body required them to.

A checksum is a chain of custody document for a set of bytes. It proves the bytes arrived intact. It says nothing about what the bytes do, whether they came from the source they claim, or whether the training process that produced them was the process described in the documentation.

A scanning result is a comparison against a known-bad registry. It proves the artifact does not match anything already documented as malicious. It cannot detect the novel. It cannot detect the subtle. It cannot detect adversarial distillation, in which a model is trained to reproduce the outputs of a target model through 16 million synthetic exchanges, producing weights whose behavior is indistinguishable from the original but whose provenance is entirely unverifiable.

None of these controls constitute authorization. Authorization requires a named human, a defined standard, a specific time, and a claim that survives adversarial scrutiny. The governance infrastructure for software supply chains spent a decade building toward this. The result is SLSA: Supply Chain Levels for Software Artifacts, a framework that codifies how to cryptographically attest every step of the build process so that the artifact that arrives in production can be traced back to a specific human decision to build it in a specific way.
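What an authorization record might look like can be sketched in a few lines. Everything here is hypothetical: the field names, the register, and the example values are assumptions for illustration, not a schema from SLSA or any other framework. The digest only binds a human decision to specific bytes. It does not establish where those bytes came from.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Authorization:
    weights_sha256: str    # the exact artifact that was approved
    deployment: str        # the specific use it was approved for
    approver: str          # a named human
    standard: str          # the defined standard applied
    approved_at: datetime  # a specific time


def find_authorization(weights: bytes, deployment: str, register):
    """Return the record sanctioning these bytes for this deployment, if any."""
    digest = hashlib.sha256(weights).hexdigest()
    for record in register:
        if record.weights_sha256 == digest and record.deployment == deployment:
            return record
    return None


approved_weights = b"placeholder for approved weight bytes"
register = [
    Authorization(
        weights_sha256=hashlib.sha256(approved_weights).hexdigest(),
        deployment="support-agent-prod",
        approver="J. Doe",
        standard="internal model approval policy v1",
        approved_at=datetime(2026, 6, 1, tzinfo=timezone.utc),
    )
]

match = find_authorization(approved_weights, "support-agent-prod", register)
other_use = find_authorization(approved_weights, "billing-agent-prod", register)
unknown = find_authorization(b"some other weights", "support-agent-prod", register)
```

A scanner asks whether the bytes look malicious. This lookup asks a different question: whether anyone accountable approved these bytes for this use. A clean artifact with no matching record is still unauthorized.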

No equivalent existed for AI models.

It now does. The Open Source Security Foundation's AI/ML Working Group, in collaboration with Google, HiddenLayer, and NVIDIA, published the OpenSSF Model Signing specification in June 2025: a flexible, implementation-agnostic standard for cryptographically signing model artifacts, purpose-built for the unique requirements of AI workflows. In March 2026, the NSA's AI Security Center and seven allied national cybersecurity agencies released the most expansive multinational guidance to date on AI/ML supply chain security, explicitly recommending AI Bills of Materials and cryptographic integrity validation.
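The general shape of signing a model can be sketched with the standard library. This is not the OpenSSF specification or its tooling. It is a simplified illustration under stated assumptions: the function names and file contents are hypothetical, and an HMAC with a shared secret stands in for the public-key signature a real deployment would use.

```python
import hashlib
import hmac
import json


def build_manifest(files):
    """Map each file name to the SHA-256 digest of its bytes."""
    return {name: hashlib.sha256(data).hexdigest() for name, data in sorted(files.items())}


def sign_manifest(manifest, key):
    # Canonical JSON so the same manifest always yields the same signature.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Assumption: HMAC is a stand-in for a real public-key signature.
    return hmac.new(key, canonical, hashlib.sha256).hexdigest()


def verify_model(files, manifest, signature, key):
    """Check that the manifest is signed by the key holder and matches the files."""
    signature_ok = hmac.compare_digest(sign_manifest(manifest, key), signature)
    contents_ok = build_manifest(files) == manifest
    return signature_ok and contents_ok


publisher_key = b"demo-only-secret"  # hypothetical key material
release = {"config.json": b'{"layers": 2}', "model.safetensors": b"\x00\x01\x02\x03"}
manifest = build_manifest(release)
signature = sign_manifest(manifest, publisher_key)

tampered = {**release, "model.safetensors": b"\xff\x01\x02\x03"}

print(verify_model(release, manifest, signature, publisher_key))   # accepted
print(verify_model(tampered, manifest, signature, publisher_key))  # rejected
```

Unlike a bare checksum, the verdict depends on a key the uploader of a look-alike repository does not hold. A valid signature still says only who signed the artifact, not whether an organization authorized it for a given deployment.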

The standard exists. The guidance exists. The adoption does not.

The precondition no framework named

The accountability frameworks that govern enterprise AI deployments were written for the deployment layer. NIST AI RMF, ISO 42001, the EU AI Act's deployer obligations under Articles 25 and 53 — each addresses what happens after the model is running. How the system behaves. How its outputs are monitored. How risk is managed at inference time.

None of them reach into the artifact.

An organization that fully satisfies every condition in every major AI governance framework can be running a model whose identity it has never verified, loaded from weights it cannot prove are genuine, executing instructions in a production environment without any record that a human authorized these specific weights for this specific deployment.

The Agentic Accountability Baseline was designed to govern how autonomous agents act. The seven conditions published in May 2026 addressed authorization chains, scope boundaries, memory governance, handoff traceability, prompt integrity, decision auditability, and forensic reconstructibility. They govern everything that happens after the model begins executing instructions.

None of them addressed what the model is.

That gap is not small. Authorization Provenance — the first and most foundational condition in the Baseline — requires that every action be traceable to a human authorization event that preceded it. If the model taking those actions cannot be verified as the model the organization authorized, Authorization Provenance fails before it can be evaluated. You cannot trace an action to an authorization if you cannot establish which model took the action.

This is why Vordan published AAB v0.2 on June 6, 2026, adding Condition 2.8: Model Substrate Integrity.

The condition requires that an accountable agentic deployment verify, through a mechanism independent of the deploying party's assertion, that the model executing its instructions is the model it was authorized to deploy. Namespace, documentation, community attestation, and checksum verification do not satisfy it. Verification must be technically grounded in a property of the model itself that a different model cannot be built to mimic.

That mechanism exists. Finlayson, Grivas, Ren, and Swayamditta demonstrated in a June 2026 preprint that token ranking patterns constitute a provably unforgeable model identity signature. Unlike behavioral fingerprinting, which can be mimicked by a sufficiently capable model, token rankings are mathematically constrained by the model's internal parameters in a way that is computationally intractable to replicate. The proof is not empirical. It is a complexity-theoretic guarantee: finding a model with the same feasible rankings is NP-hard. The signature can be verified through black-box API queries without exposing the model's weights, and limiting verification to the top-k tokens protects proprietary weight information while preserving the identity signal.

A verification primitive that satisfies Condition 2.8 was published this week. No distribution platform has implemented it. No governance framework required it before today.

What accountability requires

The OpenSSF Model Signing specification, the eight-nation NSA guidance, and the Finlayson et al. verification primitive represent three different layers of the same answer. The industry has the tools. It has the guidance. It has the mathematical foundation. What it does not have is a governance standard that names model identity verification as a required condition of accountable deployment rather than an optional security enhancement.

That is what Condition 2.8 provides.

An organization that satisfies Model Substrate Integrity must maintain a model identity record for each agentic deployment. It must be able to demonstrate at any point in the deployment's operational period that the model executing instructions is the model that was authorized. Where a cryptographic verification mechanism exists and applies, its use is required. Where none exists, the organization must document that absence explicitly and assess the residual risk. Stating that no verification mechanism exists is not a gap finding. Failing to assess and document that absence is.
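The decision logic in that paragraph can be written down directly. The sketch below is one hypothetical reading of the condition, not part of the Baseline: the record fields, finding texts, and example deployments are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ModelIdentityRecord:
    deployment: str
    authorized_model: str
    mechanism_available: bool  # a cryptographic mechanism exists and applies
    mechanism_used: bool = False
    absence_documented: bool = False
    residual_risk_assessed: bool = False


def gap_findings(record):
    """Return gap findings for one deployment; an empty list means none."""
    findings = []
    if record.mechanism_available:
        # Where a mechanism exists and applies, its use is required.
        if not record.mechanism_used:
            findings.append("verification mechanism exists but was not used")
    else:
        # Absence alone is not a finding; leaving it undocumented or unassessed is.
        if not record.absence_documented:
            findings.append("absence of a verification mechanism is not documented")
        if not record.residual_risk_assessed:
            findings.append("residual risk of unverified identity is not assessed")
    return findings


documented = ModelIdentityRecord(
    "support-agent-prod", "vendor-model-v3", mechanism_available=False,
    absence_documented=True, residual_risk_assessed=True,
)
silent = ModelIdentityRecord("billing-agent-prod", "vendor-model-v3", mechanism_available=False)
skipped = ModelIdentityRecord("search-agent-prod", "open-model-v1", mechanism_available=True)

for record in (documented, silent, skipped):
    print(record.deployment, gap_findings(record))
```

The first record produces no findings even though no mechanism was available, because the absence was documented and assessed. The other two produce findings for different reasons.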

The condition does not require every organization to implement a cryptographic verification stack today. It requires every organization to know whether the model they are running is the model they authorized, and to have a defensible record of that determination. The accountability standard moves first. The tooling follows.

Eight-nation guidance. An open signing standard from the Linux Foundation. A mathematical proof of unforgeable identity signatures. A documented pattern of malicious model distribution going back two years, with 352,000 suspicious issues across 51,700 models on the dominant distribution platform.

The model in your production stack was verified. It was not authorized.

A model card is a self-certification. A checksum is a chain of custody document for a set of bytes. Neither is an accountability record. An accountability record requires a named human, a defined standard, a specific time, and a claim that survives adversarial scrutiny. No distribution platform requires one. No governance framework mandated one before this week.

The model in your production stack was verified. It was not authorized.

Where in your deployment architecture right now is an agent executing instructions from a model whose identity was never independently confirmed?

That gap is what we are here to map.

Vordan publishes the Accountability Report every Sunday and the Gap Alert when intelligence warrants it. Doctrine: Accountable by Design.

SOURCES + REFERENCES

  1. HiddenLayer Research — Fake OpenAI Privacy Filter Repository on Hugging Face, May 2026. https://hiddenlayer.com/research/fake-openai-privacy-filter-hugging-face

  2. The Hacker News — Fake OpenAI Privacy Filter Repo Hits No. 1 on Hugging Face, Draws 244K Downloads, May 2026. https://thehackernews.com/2026/05/fake-openai-privacy-filter-repo-hits-1.html

  3. JFrog Security Research — Data Scientists Targeted by Malicious Hugging Face ML Models with Silent Backdoor, February 2024. https://jfrog.com/blog/data-scientists-targeted-by-malicious-hugging-face-ml-models-with-silent-backdoor/

  4. JFrog Security Research — Unveiling Three Zero-Day Vulnerabilities in PickleScan, December 2025. https://jfrog.com/blog/unveiling-3-zero-day-vulnerabilities-in-picklescan/

  5. JFrog and Hugging Face — Join Forces to Expose Malicious ML Models, March 2025. https://jfrog.com/blog/jfrog-and-hugging-face-join-forces/

  6. The Next Web — Hugging Face and ClawHub Compromised With Hundreds of Malicious AI Models, May 2026. https://thenextweb.com/news/hugging-face-clawhub-malware-ai-supply-chain

  7. Cloud Security Alliance — Poisoned Pipelines: Malicious AI Model and Skill Repositories, May 2026. https://labs.cloudsecurityalliance.org/research/csa-research-note-malicious-ai-model-repositories-attack-sur/

  8. OpenSSF — An Introduction to the OpenSSF Model Signing Specification, June 2025. https://openssf.org/blog/2025/06/25/an-introduction-to-the-openssf-model-signing-oms-specification/

  9. NSA AI Security Center and Seven Allied Agencies — Artificial Intelligence and Machine Learning: Supply Chain Risks and Mitigations, March 2026. https://labs.cloudsecurityalliance.org/wp-content/uploads/2026/03/CSA_research_note_nsa_allied_ai_supply_chain_security_guidance_20260317-csa-styled.pdf

  10. Finlayson, Grivas, Ren, Swayamditta — Token Rankings are Unforgeable Language Model Signatures, arXiv:2606.04459, June 2026. https://arxiv.org/abs/2606.04459

  11. Vordan — Gap Alert Seventeen: The Model Was Not What It Said It Was, June 2026. https://reports.vordan.co/p/gap-alert-seventeen-the-model-was-not-what-it-said-it-was

  12. Vordan — Agentic Accountability Baseline v0.2, June 2026. https://vordan.co/baseline
