Training-Data Disclosure Pages Are Now Regulated Deliverables Under California’s AB 2013
California just turned AI transparency pages into regulated release artifacts, and any company serving the state inherits the compliance burden. AB 2013 requires dataset summaries before each public release or substantial modification, transforming static statements into recurring deliverables. Meeting the obligation demands controlled workflows, vendor attestations, and version history proving the disclosure tracked every retraining cycle, dataset refresh, and fine-tuning update.
Statute Creates a Recurring Trigger
AB 2013 requires covered developers to post training-data documentation on the developer’s website before each time a generative AI system or service, or a substantial modification to it, is made publicly available to Californians for use. The compliance date passed on Jan. 1, 2026, and the requirement applies whether access is paid or free.
That clause changes the compliance posture. A disclosure page stops being a one-time publishing exercise and starts looking like a release artifact that must track product reality. Model updates, data refreshes, and fine-tuning cycles can now trigger fresh disclosure duties when a company ships something that materially changes functionality or performance.
The statute requires posting before public availability but does not specify lead time. Companies should treat disclosure review as a release gate that clears before launch, not as a same-day publication task, to allow time for cross-functional sign-off.
The disclosure requirement also bakes in a core premise: training data drives downstream risk. The Senate Judiciary Committee analysis framed the policy case in plain terms, arguing that limited training-data transparency makes it harder to assess, diagnose, and address issues raised by rapid AI development. That framing matters because the analysis helps predict how regulators, plaintiffs, journalists, and enterprise buyers will read the disclosure: as a truth statement about what shaped the system’s outputs.
Definitions Drive Scope, Not Marketing
AB 2013 applies to generative artificial intelligence, defined as AI that generates derived synthetic content such as text, images, video, and audio that emulates the structure and characteristics of the training data. Coverage hinges on public availability to Californians and on release timing, since the disclosure duty targets systems, services, or substantial modifications released on or after Jan. 1, 2022.
Developer status also runs broader than many org charts assume. The statute defines a developer as a person or entity that designs, codes, produces, or substantially modifies a generative AI system or service for use by members of the public. That definition captures companies that fine-tune, re-release, or materially alter functionality, even when the base model originated elsewhere.
The statute defines substantial modification as a new version, new release, or other update that materially changes functionality or performance, including changes produced by retraining or fine-tuning. Compliance teams can treat that definition as the trigger logic for change control: when release notes claim materially better performance or meaningfully new capabilities, disclosure review belongs on the same checklist.
Disclosure Checklist Forces Data Decisions
AB 2013 lists specific elements that must appear in the high-level summary of datasets used in development. The list reads like a structured inventory, which means a disclosure program needs an internal data map that can answer each required field consistently across model iterations.
The statute’s required elements include the sources or owners of datasets, how datasets further the intended purpose, and the number of data points, which may be disclosed in general ranges, with estimated figures for dynamic datasets. The statute also requires a description of the types of data points, whether data is protected by copyright, trademark, or patent or is entirely in the public domain, and whether datasets were purchased or licensed.
Several requirements force legal and privacy questions into the same table. AB 2013 requires disclosure of whether datasets include personal information, as defined in the California Consumer Privacy Act, and whether datasets include aggregate consumer information, also as defined in the CCPA. Teams that run privacy programs in one system and ML data governance in another will feel the friction fast because the disclosure page needs a single, coherent answer.
Operational details also show up as mandated transparency. AB 2013 requires disclosure of whether datasets were cleaned, processed, or otherwise modified, with the intended purpose of those efforts, and requires disclosure of the time period during which data was collected, including whether collection is ongoing. The statute also requires the dates the datasets were first used during development and whether synthetic data generation was used, or is used on an ongoing basis, with an optional description of its functional need or desired purpose.
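To make those fields answerable release after release, some teams find it useful to encode the checklist as an internal record that travels with each dataset. The sketch below is one illustrative way to do that in Python; the field names, types, and the validation helper are assumptions about a hypothetical internal data map, not statutory language.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical internal schema; field names are illustrative, not statutory text.
@dataclass
class DatasetDisclosure:
    name: str                               # internal dataset identifier
    sources_or_owners: list[str]            # sources or owners of the dataset
    purpose_statement: str                  # how the dataset furthers the intended purpose
    data_point_range: str                   # general range, estimated for dynamic datasets
    data_point_types: list[str]             # e.g. ["text", "image", "audio"]
    ip_status: str                          # "copyright", "trademark", "patent", or "public_domain"
    purchased_or_licensed: Optional[str]    # "purchased", "licensed", or None
    contains_personal_info: bool            # CCPA personal information
    contains_aggregate_consumer_info: bool  # CCPA aggregate consumer information
    was_cleaned_or_modified: bool
    modification_purpose: Optional[str]
    collection_start: date
    collection_end: Optional[date]          # None signals ongoing collection
    first_used_in_development: date
    synthetic_data_used: bool
    synthetic_data_purpose: Optional[str] = None

def open_questions(d: DatasetDisclosure) -> list[str]:
    """Flag conditional fields that still need answers before the page can publish."""
    gaps = []
    if d.was_cleaned_or_modified and not d.modification_purpose:
        gaps.append("modification_purpose")
    if d.synthetic_data_used and not d.synthetic_data_purpose:
        gaps.append("synthetic_data_purpose")
    return gaps
```

A structured record like this makes it harder for a required field to silently drop out when the page is rewritten for a new release.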
Build a Disclosure Stack as a Workflow
A defensible AB 2013 program treats the disclosure page as a regulated deliverable with owners, gates, and version history. A practical approach starts by assigning the page a system-of-record home, then running each required disclosure field back to an internal source that can be audited later.
Most companies end up needing four layers of control to keep the disclosure aligned with shipping reality: dataset inventory, transformation log, release mapping, and publication controls.
A dataset inventory serves as the normalized catalog that identifies each dataset, the source or owner, acquisition terms, collection period, and the products or models that used it. The transformation log records cleaning, deduplication, filtering, labeling, and other modifications, tied to the intended purpose of each transformation step. Release mapping creates linkage between model versions, substantial modifications, and the datasets used in each development cycle, including dynamic dataset estimates and first-use dates. Publication controls establish review and approval workflow that treats the website disclosure as a controlled document, with sign-off and change control tied to product release governance.
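One way to picture those four layers is as linked records keyed by dataset and model-version identifiers, so a shipped release can be traced back to the datasets, transformations, and approvals behind its disclosure. The sketch below is illustrative only; the class and field names are assumptions, and it sits alongside the hypothetical DatasetDisclosure record shown earlier.

```python
from dataclasses import dataclass
from datetime import date

# Illustrative linkage between the four layers; identifiers and fields are assumptions.
@dataclass
class TransformationLogEntry:
    dataset_id: str        # key into the dataset inventory
    step: str              # e.g. "dedupe", "filter", "label"
    intended_purpose: str  # why the transformation was performed
    performed_on: date

@dataclass
class ReleaseMapping:
    model_version: str                 # e.g. "assistant-v3.2"
    is_substantial_modification: bool
    dataset_ids: list[str]             # datasets used in this development cycle
    first_use_dates: dict[str, date]   # dataset_id -> first use in development
    estimated_ranges: dict[str, str]   # dataset_id -> estimated data-point range

@dataclass
class PublicationRecord:
    model_version: str
    disclosure_version: str     # version of the public page approved for this release
    approvals: dict[str, date]  # reviewer role -> sign-off date, e.g. {"privacy": ...}
```

The point is not the tooling; a shared spreadsheet with the same keys provides the same traceability, as long as one owner reconciles it against each release.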
That stack does not require exotic tooling. Many compliance failures come from fractured ownership, not from missing technology. A spreadsheet that cannot be reconciled to the shipped model is worse than no spreadsheet because it creates a false sense of completeness. Governance should focus on traceability that survives scrutiny.
For datasets where complete records are unavailable, developers must disclose the methods used to search for training-data information and document reconstruction efforts. The statute requires reasonable diligence, not perfection, but teams should log vendor inquiries, repository reviews, and other search efforts to demonstrate good-faith compliance.
Vendor Datasets Become Attestation Problems
AB 2013 requires disclosures that many companies cannot answer cleanly without third-party cooperation. Sources or owners, licensing status, IP protection status, and collection periods often sit with dataset vendors, brokers, or upstream aggregators. The disclosure page therefore pulls procurement into the compliance workflow.
A workable program treats vendor data as an attestation problem: the company needs reliable representations about what the vendor provided and what rights traveled with the data. Contracting teams already build representations and warranties for privacy, security, and IP clearance in many contexts. AB 2013 pressures companies to adapt that muscle memory to AI training corpora, then to keep attestations current as vendors refresh or swap underlying sources.
Companies that rely on multiple upstream data providers often discover a second-order risk: inconsistent terminology. One vendor’s “public domain” label may mean publicly accessible, not free of copyright. A disclosure workflow that harmonizes terms across vendors reduces the chance that the AB 2013 page becomes a patchwork of incompatible claims.
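A lightweight way to enforce that harmonization is a mapping table that translates each vendor’s labels into a single internal rights vocabulary and routes anything unrecognized to legal review. The vendor names, labels, and mappings below are hypothetical placeholders, not real terms of any provider.

```python
# Hypothetical normalization of vendor rights labels into one internal vocabulary.
# Real mappings would come from contracts, attestations, and legal review.
VENDOR_TERM_MAP = {
    ("vendor_a", "public domain"): "no_copyright_asserted",
    ("vendor_b", "public domain"): "publicly_accessible",  # not the same claim
    ("vendor_b", "cc0"): "no_copyright_asserted",
    ("vendor_c", "licensed"): "licensed_for_training",
}

def normalize_rights_label(vendor: str, label: str) -> str:
    """Map a vendor's rights label to internal terms; unknown labels need legal review."""
    return VENDOR_TERM_MAP.get((vendor, label.strip().lower()), "needs_legal_review")
```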
Change Control Tracks Substantial Modifications
AB 2013’s disclosure obligation is recurring. The trigger is not only the original public release, but also each time a substantial modification is made publicly available to Californians. A compliance program should therefore connect the disclosure page to the same governance that approves releases, retraining cycles, and fine-tuning pushes.
A clean operational rule treats disclosure review as a release gate. When product teams propose a new version or update that materially changes functionality or performance, the disclosure page should be reviewed for dataset changes, transformation changes, and synthetic-data changes. That review belongs alongside security review and privacy review, not after launch.
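Building on the illustrative records sketched above, a release gate can be expressed as a check that blocks launch until the approved disclosure matches the release it describes. This is a minimal sketch, assuming the hypothetical ReleaseMapping and PublicationRecord structures; the required sign-off roles are also assumptions.

```python
# Minimal pre-release gate; returns blocking issues, and an empty list means proceed.
REQUIRED_SIGNOFFS = {"privacy", "ip", "security", "product"}

def disclosure_gate(release: ReleaseMapping,
                    publication: PublicationRecord,
                    inventory_ids: set[str]) -> list[str]:
    issues = []
    if publication.model_version != release.model_version:
        issues.append("disclosure approved for a different model version")
    unknown = set(release.dataset_ids) - inventory_ids
    if unknown:
        issues.append(f"datasets missing from the inventory: {sorted(unknown)}")
    missing = REQUIRED_SIGNOFFS - publication.approvals.keys()
    if missing:
        issues.append(f"missing sign-offs: {sorted(missing)}")
    return issues
```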
Dynamic datasets present a special challenge that the statute anticipates through estimated figures and general ranges for data points. Governance still needs a method for estimating and documenting ranges that remains consistent across updates. A team that changes estimation methods between releases risks making the disclosure page misleading even when the underlying data use stayed stable.
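One way to keep estimates consistent is to fix the general ranges up front and derive the published range from the estimated count with a deterministic rule, so the method cannot drift between releases. The bucket boundaries below are illustrative assumptions, not guidance on what ranges the statute expects.

```python
# Illustrative fixed buckets so published data-point ranges stay comparable across releases.
RANGE_BUCKETS = [
    (1_000_000, "under 1 million"),
    (10_000_000, "1 million to 10 million"),
    (100_000_000, "10 million to 100 million"),
    (1_000_000_000, "100 million to 1 billion"),
]

def data_point_range(estimated_count: int) -> str:
    """Map an estimated data-point count to a stable, pre-agreed general range."""
    for upper_bound, label in RANGE_BUCKETS:
        if estimated_count < upper_bound:
            return label
    return "over 1 billion"
```

Documenting the estimation method alongside the buckets gives reviewers something concrete to check when a dynamic dataset’s count changes between releases.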
Trade Secrets Require Publication Strategy
The statute demands public disclosure, yet the statute does not create a dedicated trade-secret safe harbor inside the disclosure obligation. That gap pushes companies toward careful drafting decisions that preserve competitive information while still meeting statutory requirements.
“High-level summary” is the statutory phrase that does most of the work. The law allows general ranges for data points and estimated figures for dynamic datasets. Those allowances provide room to draft in a way that informs without handing competitors a blueprint. A disciplined program still needs internal guardrails so teams do not confuse “high level” with “vague,” since overly generic disclosures invite credibility risk with regulators and buyers.
The disclosure page can reveal model development strategy indirectly through dataset selection, dataset volume, and the description of how datasets further intended purpose. A structured review that flags competitive sensitivities early reduces last-minute rewrites that can introduce errors.
Privacy Lives in CCPA Cross-References
The disclosure requirement explicitly cross-references the CCPA for personal information and aggregate consumer information. That cross-reference matters because teams cannot safely treat personal data as a purely technical classification. The disclosure page is public-facing, and statements about personal information can influence consumer trust, regulator interest, and litigation narratives.
Privacy teams can use AB 2013 as a forcing function to reconcile AI training practices with existing data maps. A company that already maintains a record of processing activities, a vendor inventory, and privacy impact assessments has a head start, because many AB 2013 fields overlap with established privacy controls. Coordination still needs to be explicit, since training datasets often include derived or scraped content that never flowed through standard product telemetry pipelines.
Exemptions also require careful reading. AB 2013 exempts a generative AI system or service whose sole purpose is to help ensure security and integrity, as defined by reference to the CCPA’s security and integrity concept. The statute also exempts systems whose sole purpose is operating aircraft in national airspace and systems developed for national security, military, or defense purposes that are made available only to a federal entity. Teams that want to rely on any exemption should document the factual basis, since sole purpose is a narrow standard that rarely fits multi-use products.
AB 2013 exists alongside other California AI transparency laws. Senate Bill 942, effective Jan. 1, 2026, requires large AI platforms to provide free AI detection tools and to include manifest and latent disclosures in generated audiovisual content, creating complementary transparency for outputs where AB 2013 addresses inputs. Companies must coordinate compliance across both statutes since disclosure obligations intersect at the dataset and model development layers.
Publish Like Compliance, Not Marketing
A disclosure page written like a press release will not age well. AB 2013 asks for concrete dataset facts: sources or owners, collection periods, IP status, whether licensing occurred, and whether synthetic data generation was used. Precision and consistency beat persuasion.
Companies can improve defensibility by treating the disclosure page like a controlled policy artifact. Basic hygiene includes an internal owner, a documented review cadence, and a version history that tracks changes across releases. A public last-updated date helps readers understand the snapshot, yet internal logs matter more because disputes usually turn on what the company knew and when approval occurred.
Cross-functional sign-off should reflect the risk profile. Privacy review covers personal-information statements and alignment with privacy notices. IP review covers copyright, trademark, and patent disclosures and the licensing statements that support them. Security review validates whether any exemption claim is plausible and whether publication reveals sensitive security details. Product review confirms that the dataset narrative matches how the model was actually trained, tested, validated, or fine-tuned.
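Those sign-offs and the version history are easier to defend when each published revision of the page is captured as a structured entry. The sketch below is one hypothetical way to record that history; the version identifiers, roles, and fields are assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical version-history entry for the public disclosure page.
@dataclass
class DisclosureVersion:
    version: str                      # e.g. "2026-01-15-r2"
    published_on: date
    model_versions_covered: list[str]
    change_summary: str               # what changed on the page and why
    approvals: dict[str, date]        # reviewer role -> sign-off date
    supersedes: Optional[str] = None  # prior disclosure version, if any

def audit_trail(history: list[DisclosureVersion]) -> list[str]:
    """Render a chronological log suitable for internal audit or counsel review."""
    return [
        f"{v.published_on.isoformat()} {v.version}: {v.change_summary} "
        f"(approved by {', '.join(sorted(v.approvals))})"
        for v in sorted(history, key=lambda v: v.published_on)
    ]
```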
Enforcement Through the Unfair Competition Law
The disclosure requirement contains no explicit enforcement mechanism or penalty structure within the statute itself. The Assembly Committee on Privacy and Consumer Protection analysis describes the requirement as enforceable under California’s Unfair Competition Law, codified at Business and Professions Code section 17200 et seq.
The UCL defines unfair competition as any unlawful, unfair, or fraudulent business act or practice and unfair, deceptive, untrue, or misleading advertising. Section 17204 authorizes enforcement by the California Attorney General, district attorneys, county counsel, and city attorneys. The UCL also provides a private right of action for plaintiffs who suffered injury-in-fact and lost money or property as a result of the violation. Whether AB 2013 noncompliance supports a private action depends on whether a plaintiff can demonstrate economic injury from inadequate or misleading training-data disclosures.
UCL remedies include injunctive relief to stop specific business practices and restitution to return money or property acquired through unfair competition. Public enforcers can seek civil penalties. What constitutes a discrete violation for AB 2013 purposes will turn on enforcement practice and judicial interpretation.
The December 2025 executive order on AI directs federal agencies to challenge certain state AI laws, but executive orders do not repeal state statutes. AB 2013 remains enforceable unless and until courts rule on preemption claims. Companies face dual exposure: noncompliance risk now, plus uncertainty about future federal preemption.
What AB 2013 Means for AI Governance
AB 2013 forces dataset governance into the same compliance architecture that already manages privacy, security, and IP clearance. The statute treats training data as a disclosure obligation with recurring triggers, not as a one-time transparency gesture. Companies that build the four-layer stack, coordinate vendor attestations, and treat the disclosure page as a controlled document will meet the statutory standard. Companies that approach the requirement as a marketing exercise will discover the gap when regulators, plaintiffs, or enterprise buyers demand proof that the published disclosure matched what actually trained the model.
The statute’s design assumes that transparency about training data reduces downstream risk. Whether that premise holds will depend on how disclosure pages get used in practice. What matters now is that the disclosure obligation is enforceable, the triggers are clear, and the operational requirements are specific enough to demand structured workflows. Teams that treat AB 2013 as a release gate, not as a website update, will be ready when scrutiny arrives.
Sources
- California Assembly Committee on Privacy and Consumer Protection: AB 2013 Analysis (April 30, 2024)
- California Business and Professions Code § 17200: Unfair Competition Law
- California Civil Code § 1798.140: CCPA Definitions
- California Civil Code § 3111: AB 2013 Training-Data Transparency Disclosure Requirements
- California Legislature: AB 2013 Chaptered Text (Chapter 817, Statutes of 2024)
- California Legislature: SB 942 California AI Transparency Act (Chapter 291, Statutes of 2024)
- California Senate Judiciary Committee: AB 2013 (Irwin) Senate Judiciary Analysis (June 17, 2024)
- Goodwin Procter: California’s AB 2013 – Generative AI Developers Must Show Their Data (June 10, 2025)
- Institute for Law & AI: xAI’s Challenge to California’s AI Training Data Transparency Law (AB2013) by Bahrad Sokhansanj (Jan. 2026)
- Jones Day: California Enacts AI Transparency Law Requiring Disclosures for AI Content (Oct. 2024)
- Jones Day: California Bill Will Require Disclosures About Data Used in Training AI Models (Sept. 2025)
- The White House: Executive Order on Ensuring a National Policy Framework for Artificial Intelligence (December 11, 2025)
This article is provided for informational purposes only and does not constitute legal advice. Readers should consult qualified counsel for guidance on specific legal or compliance matters.
See also: Lost in the Cloud: The Long-Term Risks of Storing AI-Driven Court Records

Jon Dykstra, LL.B., MBA, is a legal AI strategist and founder of Jurvantis.ai. He is a former practicing attorney who specializes in researching and writing about AI in law and its implementation for law firms. He helps lawyers navigate the rapid evolution of artificial intelligence in legal practice through essays, tool evaluation, strategic consulting, and full-scale A-to-Z custom implementation.
