The Future of Workers Unions: From Job Protection to Data Ownership

Musicians don’t work for Spotify, but they get paid every time their songs stream. Actors don’t work for Netflix, but they receive residuals when their shows are watched. What if factory workers, customer service reps, and logistics coordinators got royalties every time their expertise trained—or ran—a robot?

This isn’t science fiction. It’s the logical evolution of collective bargaining in an age when AI learns by watching humans work. Just as ASCAP and BMI collect performance royalties for songwriters, “Worker Data PROs” (Performance Rights Organizations) could license human expertise to robotics and AI companies.

The question isn’t whether workers should have governance rights over the data that automates their jobs. The question is whether unions will reinvent themselves to make it happen—or whether tech companies will capture that value by default.

This transition is already underway. The window for action is measured in months, not years.

But let’s be clear: this is hard. The obstacles are real—international arbitrage, measurement challenges, employer resistance, and the long-run trend toward robots generating their own training data. Worker data licensing won’t stop automation or save every job. What it can do is ensure that workers with high-skill, high-tacit-knowledge expertise capture fair value during the 5-15 year window when human data still matters, while building coordination mechanisms that benefit the broader labor movement.

The Core Principle: No Substitution Without Consent

Before diving into mechanisms, let’s be clear about the fundamental right at stake:

No one’s work should be used for substitutional, competitive training without consent and remuneration.

This isn’t a new principle—it’s the logical extension of existing voice and likeness rights. Actors have long controlled how their image and voice are used. Musicians receive residuals when their performances are replayed. Authors get royalties when their words are reprinted.

Worker expertise captured as training data is no different. When a company records how a skilled welder works, then uses that recording to train a robot that replaces the welder, they’ve converted human expertise into automation capital. The worker deserves consent and compensation—just as an actor deserves compensation when their likeness trains an AI performance model.

What’s needed in every Collective Bargaining Agreement: Explicit data-use clauses that require:

  • Informed opt-in consent for any data capture intended for automation
  • Licensing terms specifying compensation for training, fine-tuning, and inference use
  • Audit rights allowing union review of how data is used
  • Prohibition on substitutional automation without negotiated terms

Three Rights Framework: Separating Governance, Bargaining, and Economic Rights

Before we discuss mechanisms, we need clarity about what worker data rights actually mean. Current discussions blur three distinct categories:

1. Governance Rights (Consent & Control)

What it means: Workers have the right to know when their performance data is being captured and to grant or withhold consent for specific uses.

Not about money: This is about autonomy and informed consent—the right to say “no” to being automated without your permission.

Legal basis: Privacy laws (GDPR, CCPA), workplace monitoring regulations, and collective bargaining over working conditions.

Example: A hospital nurse can refuse to have their patient interaction techniques recorded for AI training, regardless of compensation offered.

2. Bargaining Rights (Collective Terms)

What it means: Workers collectively negotiate the terms under which their expertise can be licensed, just as unions negotiate wages and benefits.

Not individual deals: Like minimum wage floors, licensing terms work when set collectively, not when each worker bargains alone.

Legal basis: The National Labor Relations Act (NLRA) requires bargaining over terms and conditions of employment, and the NLRB has treated workplace monitoring as a mandatory bargaining subject.

Example: UAW negotiates that any robotics training using autoworker data requires union approval, third-party audits, and royalty rates tied to production volume.

3. Economic Rights (Compensation)

What it means: Workers receive ongoing payment when AI systems trained on their expertise generate value—through training fees, fine-tuning royalties, or inference-time payments.

Not one-time buyouts: Like music streaming residuals, payment tied to actual usage over time.

Legal basis: Contract law, intellectual property principles, and emerging licensing frameworks (see Copyright.sh for web content precedent).

Example: When a robot trained on a machinist’s techniques runs on a factory floor, that machinist receives micropayments tied to production output.

Why this framework matters: Blurring these categories causes confusion. You can have governance rights (consent) without economic rights (payment). You can have collective bargaining without individual compensation. The strongest position combines all three: collective consent frameworks, union-negotiated terms, and individual revenue streams tied to usage.

The Industry Built on Human Labor (That Workers Don’t Control)

AI doesn’t emerge from algorithms alone. Every robot learning to pick warehouse boxes, every chatbot learning to handle customer complaints, every autonomous forklift learning to navigate factory floors—they all learn from human-labeled training data.

The data labeling industry is worth an estimated $2-4 billion in 2024 (estimates vary by source), growing at approximately 25–29% CAGR toward $15–17 billion by 2030 (Grand View Research, Mordor Intelligence). Companies like Scale AI, Appen, Sama, and Labelbox employ millions of workers globally to annotate images, transcribe conversations, and label sensor data.

Here’s what that looks like in practice:

Manufacturing & Robotics:

  • Human workers demonstrate how to assemble components
  • Cameras capture their movements, annotated by data labelers
  • Robots learn from thousands of labeled examples
  • Annotation vendors market SLAs promising 99%+ accuracy to clients like Continental, Precision AI, and Microsoft—though actual achieved rates depend on task complexity and annotation methodology

Customer Service:

  • Service reps handle support tickets while AI systems watch
  • Their responses get labeled: helpful/unhelpful, empathetic/cold, accurate/wrong
  • AI chatbots train on this labeled conversation data
  • Human reps eventually train their own replacements

Logistics & Warehousing:

  • Forklift operators navigate complex environments
  • Sensor data (cameras, LIDAR, GPS) gets annotated frame-by-frame
  • Autonomous vehicles learn safe navigation from human expertise
  • Workers teach the systems that will eliminate their positions

The irony is brutal: workers generate the most valuable data at the exact moment their jobs become automatable. And under current arrangements, they capture none of that value.

But here’s the hard truth: Not all work generates equally valuable data. High-skill, high-tacit-knowledge workers—elite welders, ICU nurses, senior machinists, expert troubleshooters—possess expertise that’s difficult to document and expensive to replicate. This is where licensing creates real value.

Conversely, if your work is already well-documented in training manuals and standard operating procedures, the marginal value of your performance data is lower. Employers will simply improve their documentation rather than pay royalties. This isn’t defeatist—it’s strategic targeting. Start with workers whose tacit knowledge commands premium prices.

Reframing Value: From Training Data to Inference Revenue

Here’s what most discussions of worker data miss: training is only part of the stack.

The real value isn’t locked up at training time—it’s realized during fine-tuning and inference. This is where sector-specific expertise becomes actionable, and where licensing, attribution, and metering can actually be enforced.

Why inference matters more than training:

  • Training happens once; inference happens millions of times. A model trained on worker data runs continuously in production, generating value with every prediction.
  • Fine-tuning is where domain expertise creates differentiation. A generic robotics model becomes valuable when fine-tuned on specific assembly techniques from skilled autoworkers.
  • Inference is where you can meter usage. Unlike training data that’s consumed once, inference calls can be tracked, logged, and billed—just like Spotify counts streams.

Technical mechanisms that enable inference-time licensing:

  • Enterprise adapters: Custom model adapters trained on licensed worker data, deployed only for licensed clients
  • Retrieval layers: Worker expertise stored in vector databases, retrieved at inference time with attribution and billing per-query
  • Domain heads: Specialized output layers that bind model behavior to consent and remuneration agreements

    This reframing changes the economics entirely. Instead of a one-time payment for training data, workers receive ongoing revenue share tied to model deployment and usage. Every time a robot uses techniques learned from a specific worker’s data, that worker earns a micropayment.

    Copyright.sh already does this for web content—meta tags that specify licensing terms, HMAC-versioned usage logging, and per-inference royalty tracking. The same infrastructure applies to worker data.
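
To make that concrete, here is a minimal sketch of tamper-evident, per-inference usage logging. The field names, signing scheme, and key handling are illustrative assumptions for this article, not Copyright.sh's actual format:

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared secret issued by the Worker Data PRO to the licensee.
PRO_SECRET = b"replace-with-pro-issued-key"

def log_inference_batch(model_id: str, dataset_id: str, num_calls: int) -> dict:
    """Create a signed usage record for a batch of inference calls.

    The PRO can recompute the HMAC to verify the record was not altered
    before royalties are calculated. All field names are illustrative.
    """
    record = {
        "model_id": model_id,      # which deployed model ran inference
        "dataset_id": dataset_id,  # registry ID of the licensed worker data
        "num_calls": num_calls,    # metered inference calls in this batch
        "timestamp": int(time.time()),
    }
    payload = json.dumps(record, sort_keys=True).encode()
    record["signature"] = hmac.new(PRO_SECRET, payload, hashlib.sha256).hexdigest()
    return record

def verify_record(record: dict) -> bool:
    """PRO-side check that a usage record is authentic and unmodified."""
    unsigned = {k: v for k, v in record.items() if k != "signature"}
    payload = json.dumps(unsigned, sort_keys=True).encode()
    expected = hmac.new(PRO_SECRET, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(record.get("signature", ""), expected)
```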

    Realistic timeline: This creates value during the transitional period when human expertise still outperforms synthetic data—roughly 5-15 years depending on the domain. Eventually, robots will generate their own training data through experience, and human data premiums will erode. Worker data licensing isn’t “royalties forever”—it’s extracting fair value during the window when human knowledge provides competitive advantage.

    Why Unions Should Build Worker Data Licensing Infrastructure

    The parallel to music streaming is precise. Before Spotify, piracy threatened musicians’ livelihoods. Performance Rights Organizations (PROs) like ASCAP, BMI, and SESAC emerged to:

  • Track usage: Monitor when and where songs are played
  • License in bulk: Negotiate rates with streaming platforms on behalf of millions of artists
  • Collect royalties: Gather payments and distribute to members based on actual usage
  • Provide leverage: Individual artists can’t negotiate with Spotify. PROs can.

Worker Data PROs could do exactly the same—but unions already have strengths PROs don’t: collective bargaining rights, pension capital, and demonstrated strike power in chokepoint sectors.

    Acknowledging union power where it exists: Unions in ports (ILWU), airlines (pilots, flight attendants), entertainment (SAG-AFTRA, WGA), public sector (teachers, transit), and skilled trades (electricians, plumbers) control genuine chokepoints. These unions have proven strike leverage.

    The data strike advantage: What worker data licensing adds is NEW leverage in sectors where traditional strikes are weakening. When automation threatens to replace workers, “we’ll stop showing up” loses power. But “we’ll withhold the training data you need to automate” creates bargaining power BEFORE replacement happens. This is complementary to traditional union strength, not a replacement.

    1. Tracking: Monitor When Worker Data Trains AI Systems

    Just as BMI tracks radio plays and Spotify streams, Worker Data PROs would track when companies use worker-generated data. This isn’t technically complex—Copyright.sh already does it for web content with meta tags and HMAC versioning.

    In manufacturing, this looks like:

    • Workers wear sensors that capture their techniques (with consent)
    • Data gets tagged with a unique ID and licensing terms
    • When robotics companies train models or run inference, they report usage (just like Spotify reports streams)
    • PRO calculates royalties based on data consumption—including ongoing inference calls
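
To make the tagging step above concrete, here is a minimal sketch of the provenance metadata that could travel with each captured demonstration. The `ProvenanceTag` fields and registry ID formats are hypothetical, not an established standard:

```python
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceTag:
    """Metadata attached to every captured demonstration.

    Field names are illustrative; a real PRO registry would standardize
    these across unions and vendors.
    """
    worker_id: str       # pseudonymous registry ID, not a legal name
    consent_record: str  # reference to the signed opt-in consent form
    license_terms: str   # e.g. "training+inference, royalty-bearing"
    capture_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    captured_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

def tag_capture(worker_id: str, consent_record: str, license_terms: str) -> dict:
    """Return the provenance metadata that travels with the sensor data."""
    return asdict(ProvenanceTag(worker_id, consent_record, license_terms))

# Example: tagging one welding demonstration session.
tag = tag_capture("W-48213", "consent/2025-03-114", "training+inference, royalty-bearing")
```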

    Administrability challenge: Can this actually be measured and enforced? Let’s be honest about what’s feasible:

    What CAN be audited:

    • Training data provenance through cryptographic tagging and version control
    • Fine-tuning jobs using worker-specific datasets (logged in training pipelines)
    • Inference calls in enterprise deployments with contractual audit rights
    • Model performance attribution when worker data creates measurable accuracy gains

    What’s HARDER to audit:

    • Training data mixed into massive web-scraped datasets (requires watermarking)
    • Inference in closed systems without external monitoring
    • Attribution when multiple data sources contribute to outputs
    • Circumvention through synthetic data generation or foreign training facilities

    Mitigation strategies:

    • Start with regulated industries (healthcare, aviation, defense) where audit trails already exist
    • Focus on high-value, specialized datasets where provenance matters for quality
    • Use procurement rules to require licensing compliance as condition of sale
    • Build guild/union certification programs that buyers demand (like Fair Trade labels)

    Realistic expectation: We won’t catch 100% of usage. But we don’t need to—ASCAP doesn’t track every radio play either. What matters is establishing the norm that licensed data is required, creating enough enforcement to make compliance cheaper than evasion, and building mechanisms that work for 70-80% of commercial use cases.

    2. Licensing: Negotiate Collective Terms

    Individual workers have zero leverage negotiating with Boston Dynamics or Tesla’s robotics division. But a union representing 50,000 autoworkers whose assembly techniques are captured in training datasets? That’s a bargaining position.

    But here’s the coordination problem: if one union licenses data at $X and another licenses similar data at $X/2, buyers gravitate to the cheaper source. Without coordination, multiple unions create a race to the bottom.

    Coordination mechanisms to prevent this:

  • Sector-Wide Standards Bodies: Form cross-union licensing consortia (like how ASCAP, BMI, SESAC coexist but coordinate on baseline rates). Example: “Manufacturing Data Licensing Consortium” sets minimum rates and audit standards.
  • Antitrust Safe Harbors: Seek regulatory clarity that collective licensing by labor organizations qualifies for antitrust exemptions (like sports leagues, agricultural co-ops, and existing PROs enjoy).
  • Model Licensing Templates: Publish reference agreements that establish market norms. When 5-10 major unions use identical terms, buyers accept them as industry standard.
  • Buyer Certification Programs: Create “Ethical AI Training” certifications that require licensed data. Buyers seeking certification must meet consortium standards, eliminating low-ball competition.

Licensing models could work several ways:

    Model A: Per-Use Licensing (like TollBit for web content)

    • Robotics companies pay per training run and per inference call using worker data
    • Rates negotiated by union, scaled by data volume
    • Workers receive royalties proportional to their data’s usage

    Model B: Subscription Access (like RSL Collective)

    • Companies pay annual fees for access to union training data libraries
    • Revenue distributed to workers based on contribution
    • Tiered pricing: research use vs. commercial deployment vs. inference at scale

    Model C: Performance-Based (emerging AI licensing model)

    • Base licensing fee + royalty if worker data improves robot accuracy beyond benchmarks
    • Aligns incentives: better data = higher compensation
    • Union negotiates performance metrics and audit rights
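
To illustrate the economics of Model A, here is a toy royalty calculation; the rates are placeholders standing in for union-negotiated terms:

```python
def per_use_royalty(training_runs: int, inference_calls: int,
                    rate_per_run: float = 500.0,
                    rate_per_1k_calls: float = 0.25) -> float:
    """Model A sketch: bill per training run plus per metered inference call.

    Rates here are placeholders; actual rates would be union-negotiated
    and scaled by data volume.
    """
    return training_runs * rate_per_run + (inference_calls / 1000) * rate_per_1k_calls

# Example: 3 training runs plus 40 million inference calls in one quarter.
quarterly_bill = per_use_royalty(training_runs=3, inference_calls=40_000_000)
# 3 * 500 + 40_000 * 0.25 = 1,500 + 10,000 = $11,500 owed to the PRO
```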

    3. Collection: Distribute Royalties Like Streaming Residuals

    This is where unions become data co-ops. Instead of individual workers tracking their data usage (impossible), the PRO:

    • Maintains a registry of worker-contributed training data
    • Receives bulk payments from AI companies
    • Distributes royalties based on verified usage
    • Handles accounting, auditing, and disputes

    Just as ASCAP doesn’t require Taylor Swift to personally track every Spotify stream, Worker Data PROs would handle the infrastructure while workers receive quarterly checks.
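
Mechanically, the distribution step is simple pro-rata accounting. A sketch, assuming usage has already been verified and ignoring the PRO's administrative fee:

```python
def distribute_royalties(bulk_payment: float,
                         usage_by_worker: dict[str, int]) -> dict[str, float]:
    """Split a bulk licensing payment pro rata by verified usage.

    `usage_by_worker` maps registry IDs to metered uses of each worker's
    data (training examples consumed, attributed inference calls, etc.).
    A real PRO would also withhold an administrative fee; omitted here.
    """
    total_usage = sum(usage_by_worker.values())
    if total_usage == 0:
        return {worker: 0.0 for worker in usage_by_worker}
    return {
        worker: bulk_payment * uses / total_usage
        for worker, uses in usage_by_worker.items()
    }

# Example quarterly distribution of a $100,000 bulk payment.
payouts = distribute_royalties(100_000.0, {"W-48213": 700, "W-11902": 300})
# -> {"W-48213": 70000.0, "W-11902": 30000.0}
```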

    Addressing the equity critique: What about non-union workers who get displaced but receive no royalties?

    This is a real problem. Licensing structures that only benefit union members create two-tier systems where the organized get protection while the vulnerable get nothing. Solutions:

  • Sectoral Bargaining: Extend negotiated terms industry-wide, not just to union shops (like European sectoral agreements or California’s proposed FAST Act model).
  • Portable Benefits Funds: Pool a percentage of licensing revenue into industry-wide funds that support ALL displaced workers—training programs, income support, transition assistance.
  • Procurement-Driven Standards: Government contracts and major buyers require ethical AI sourcing, creating baseline protections for all workers regardless of union status.
  • Guild Models: Professional guilds (see section below) can organize workers across employers, creating broader coverage than single-employer unions.

This isn’t either/or: Worker data licensing AND wealth redistribution (UBI, job guarantees, robust social safety nets) are both necessary. Licensing provides bargaining leverage and transitional income; redistribution addresses systemic inequality and long-run automation. We need both.

    4. Strategic Ownership: Pension Funds as Equity Weapons

    Here’s the strategic advantage unions already have: many control significant pension capital. CalPERS (California Public Employees’ Retirement System) manages over $500 billion. Major union pension funds collectively control hundreds of billions more.

    This creates a unique opportunity that unions should act on now:

    Acquire Stakes in AI/Data Infrastructure Companies:

    • Pension funds should pursue minority stakes with board seats in data labeling companies, robotics firms, and AI infrastructure players
    • Deals can be leveraged or unleveraged depending on risk appetite—member votes should approve major moves
    • Board representation enables direct enforcement of responsible data handling and worker compensation policies
    • Example: A 5% stake in a data labeling company worth $10B = $500M position with potential board seat

    Playbook for Strategic Ownership:

  • Shareholder Proposals: File proposals at AI companies demanding worker data royalty programs, consent requirements, and audit transparency. Even proposals that don’t pass signal investor sentiment and shape corporate behavior.
  • Proxy Campaigns: Coordinate with other institutional investors (ESG funds, state pension funds, endowments) to vote as a bloc on worker data issues. 10-15% coordinated voting can influence board elections.
  • Governance Policies: Adopt union-wide investment policies that require portfolio companies to:
    – Disclose worker data usage in training and fine-tuning
    – Implement consent frameworks before automation data capture
    – Share revenue with workers whose data trains AI systems

  • Conditional Investment: “You want our pension capital? You’ll share training data revenue and give workers audit rights.”

Strike Threats That Actually Work:

    • Traditional strikes lose power when robots replace workers
    • But “we’ll withhold training data” hits companies before automation completes
    • Can’t train robots without human expertise to learn from
    • Data strikes can be surgical—targeted at specific product lines or customers

    URGENT: Worker Consent and Immediate Union Action

    Employees have not consented to being automated.

    This bears repeating: the vast majority of workers being recorded, tracked, and analyzed have never explicitly consented to their data being used for training AI systems that will replace them.

    Unions must immediately:

  • Oppose data tracking for training/fine-tuning/automation unless workers have given informed, opt-in consent with clear licensing terms
  • Negotiate opt-in programs that include:
    – Explicit consent for data capture
    – Defined royalty rates for training and inference use
    – Audit rights to verify how data is being used
    – Revocation rights if terms are violated

  • Add data-use clauses to all CBAs treating worker expertise like voice and likeness—requiring consent and compensation for any commercial use

Clarifying what workers own vs. what employers own:

    • Employers own: Business process artifacts (SOPs, documented procedures, workflow diagrams, generalized best practices)
    • Workers own: Person-level performance traces (individual movement patterns, problem-solving techniques, tacit knowledge that isn’t documentable)

    Example: A hospital owns the clinical protocol for administering medication (documented process). The hospital doesn’t own the specific diagnostic instincts and patient interaction techniques of an experienced ICU nurse (tacit expertise).

    This distinction determines what’s licensable. Documented, standardized processes have low royalty value—employers will just improve documentation. Undocumented, tacit expertise that’s hard to capture in manuals? That’s where licensing creates real value.

    SPECIAL NOTE FOR CUSTOMER SERVICE WORKERS

    Customer service is ground zero for AI automation. Chatbots are being trained right now on millions of conversation transcripts from human agents—often without explicit consent or compensation.

    Customer service unions and workers should:

  • Pursue rapid collectivization of workers across companies being automated
  • File complaints and litigation against employers who have used interaction logs to train customer service bots without explicit consent and compensation
  • Demand emergency bargaining on:
    – Immediate moratorium on training AI using agent transcripts
    – Retroactive compensation for data already used
    – Ongoing royalties for any future use
    – Audit access to AI training datasets

  • Document everything: Keep records of when you were recorded, what systems monitored your work, and any AI tools deployed to assist or replace you

The legal theory is straightforward: your conversation techniques, empathy patterns, and problem-solving approaches are your expertise. Using them to train a replacement without consent is appropriation of your professional skills.

    Acknowledging second-order effects: Employers may respond to licensing demands by reducing worker discretion, minimizing transparency about data collection, or reclassifying work to avoid royalty obligations. Unions must anticipate this and build mitigations into CBAs: protections against retaliation, mandatory disclosure of all data collection systems, job quality standards that prevent deskilling to evade licensing costs.

    Best Entry Points: Where Worker Data Licensing Works First

    Not all sectors are equally ready for worker data licensing. The strongest starting points combine:

    • High-skill, high-tacit-knowledge work (hard to document in manuals)
    • Existing regulatory frameworks (audit trails, liability requirements)
    • Guild-like professional structures (credentialing, peer review)
    • Buyers who value provenance and quality (willing to pay premiums for ethical sourcing)

    Top Candidates:

    Healthcare (Nurses, Diagnosticians, Specialists)

    Why it works:

    • Patient safety regulation requires documented training data provenance
    • High-liability work (medical AI errors create legal exposure)
    • Tacit expertise is core value (clinical judgment, bedside manner)
    • Buyers (hospitals, device makers) already pay premiums for quality

    Action: National Nurses United and SEIU negotiate consent frameworks in next contract cycles, pilot licensing with medical device companies training surgical robots.

    Aviation (Pilots, Air Traffic Controllers, Mechanics)

    Why it works:

    • FAA certification requirements create natural audit mechanisms
    • Safety-critical systems demand transparent training data
    • Union chokepoint leverage (can’t fly without certified pilots)
    • High compensation floors make royalties economically meaningful

    Action: ALPA (Air Line Pilots Association) establishes licensing terms for simulator data and flight pattern analysis used in autonomous systems.

    Creative/Technical Guilds (VFX Artists, Voice Actors, Editors)

    Why it works:

    • Guild structures already exist (VES, SAG-AFTRA voice division)
    • Clear provenance (individual artist contributions are tracked)
    • Buyers already negotiate licensing (studios, game companies)
    • Rapid automation threat creates urgency

    Action: Visual Effects Society creates standard licensing terms for training data from VFX workflows, negotiated into studio contracts.

    Government/Defense Procurement

    Why it works:

    • Procurement rules can require ethical AI sourcing
    • Buyers are mission-driven (not just profit-maximizing)
    • Transparency requirements (public contracts, FOIA)
    • Domestic data preferences (national security considerations)

    Action: Federal acquisition regulations updated to require worker consent and licensing for any AI training data in government contracts, creating instant market for compliant datasets.

    Why NOT start with easily standardized work: Warehouse picking, basic assembly, scripted customer service—these generate lower-value data because they’re easier to document and standardize. Start where tacit knowledge commands premium prices.

    Which Workers Need This Most: Follow the Automation Money (But Acknowledge Reality)

    Prioritizing by:

  • High automation exposure + High capital investment in robotics
  • High-skill, high-tacit-knowledge work (not easily documented)
  • Unions with existing pension funds and organizing capacity
  • Chokepoint leverage or regulatory advantage

Manufacturing & Assembly (UAW, IAM, Steelworkers)

    • Why: Robotics companies pay premium rates for accurate assembly demonstrations from skilled tradespeople
    • Data value: Video of skilled welders, machinists, assembly line workers teaching robots dexterity
    • Market size: Industrial robotics ~$17B market growing ~10% annually
    • Union leverage: UAW has significant pension assets ($50B+), history of effective strikes, chokepoint control in auto sector
    • Reality check: Basic assembly work has lower licensing value; focus on specialty trades and high-skill operations

    Logistics & Transportation (Teamsters, ILWU)

    • Why: Autonomous vehicles and warehouse robots learn from human drivers and operators
    • Data value: Forklift operation, delivery routes, loading dock procedures
    • Market size: Warehouse automation market approaching $30B by 2026
    • Union leverage: ILWU controls port chokepoints; Teamsters have substantial pension capital
    • Reality check: Route optimization data has value; basic driving patterns less so

    Healthcare & Clinical Work (SEIU, National Nurses United)

    • Why: Medical AI needs labeled patient data, diagnostic procedures, care protocols
    • Data value: Nurse workflow data, doctor diagnostic patterns, therapy techniques—high tacit knowledge
    • Market size: Healthcare AI market projected at $190B by 2030
    • Union leverage: NNU has proven strike effectiveness; healthcare has regulatory advantages
    • Reality check: This is high-value territory—clinical judgment is hard to standardize

    Customer Service & Hospitality (UFCW, UNITE HERE)

    • Why: Chatbots and service AI train on human interaction data
    • Data value: Conversation transcripts, conflict resolution techniques, empathy patterns
    • Market size: Conversational AI market reaches $14B by 2025
    • Union leverage: UFCW has 1.3M members, service sector generates massive training data volume
    • Reality check: High volume but lower per-worker value; sectoral funds may be better than individual royalties

    Addressing International Arbitrage: Why Can’t They Just Train Abroad?

    The hard question: If US/EU workers demand licensing fees, what stops companies from training AI on Indian or Southeast Asian workers without paying royalties?

    Short answer: Nothing stops them technically. But enforcement mechanisms exist if we build them:

    1. Procurement Rules & Buyer Standards

    Model: Like conflict-free minerals, Fair Trade coffee, or sustainable timber certification

    Mechanism:

    • Government contracts require “Ethical AI Training” certification
    • Certified providers must prove worker consent and compensation
    • Major buyers (Amazon, Walmart, hospitals) adopt voluntary standards
    • Market pressure favors compliant datasets even when alternatives exist

    Precedent: Apple’s supply chain audits, Patagonia’s fair labor certification—buyers DO pay premiums for ethical sourcing when reputation matters.

    2. Data Localization & Import Restrictions

    Model: EU data protection rules, China’s data sovereignty laws

    Mechanism:

    • AI systems trained on foreign data face import restrictions or tariffs
    • Domestic deployment requires domestic training data (or licensed foreign data)
    • National security rationale: critical infrastructure shouldn’t depend on foreign data

    Precedent: GDPR already restricts cross-border data flows; extending to AI training data isn’t radical.

    3. Training Data Standards & Certification

    Model: ISO standards, organic food labels, LEED building certification

    Mechanism:

    • Industry consortia (unions + buyer coalitions) establish “Certified Ethical AI Training” standards
    • Standards require:

    – Documented worker consent at time of data capture
    – Fair compensation tied to usage (not exploitative one-time payments)
    – Third-party audits of compliance
    – Supply chain transparency (where was data collected, under what terms?)

    • Buyers seeking certification must source only certified data
    • Market bifurcates: premium “certified” vs. commodity “uncertified”
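
In software terms, a certification audit reduces to checking a dataset's metadata against the consortium standard above. A toy compliance check, with field names invented for illustration:

```python
def meets_certification(dataset: dict) -> tuple[bool, list[str]]:
    """Sketch of a check against the consortium standard described above.

    The required metadata fields are assumptions about what auditors
    would demand, not an existing certification schema.
    """
    required = {
        "consent_records": "documented worker consent at capture time",
        "usage_based_compensation": "compensation tied to usage",
        "third_party_audit": "third-party compliance audit on file",
        "supply_chain_disclosure": "where and under what terms data was collected",
    }
    failures = [desc for key, desc in required.items() if not dataset.get(key)]
    return (len(failures) == 0, failures)

# Example: a dataset missing its audit report fails certification.
ok, problems = meets_certification({"consent_records": True,
                                    "usage_based_compensation": True,
                                    "supply_chain_disclosure": True})
# ok == False; problems == ["third-party compliance audit on file"]
```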

    Why this works: Just as “organic” food commands price premiums despite cheaper conventional alternatives, “ethically trained AI” can command premiums from buyers who value transparency, liability protection, and reputation.

    4. Liability & Legal Risk

    Model: Product liability law, medical malpractice

    Mechanism:

    • AI systems trained on unlicensed worker data face legal exposure when they fail
    • “Your surgical robot was trained on data stolen from unconsenting nurses” becomes viable lawsuit
    • Buyers prefer licensed data because it includes liability protection

    Precedent: Getty Images suing Stability AI for training on unlicensed photos; class action suits against AI companies for copyright infringement.

    Will this stop ALL foreign training? No. Commodity use cases will chase the lowest cost. But high-value, high-liability, reputation-sensitive applications (healthcare, aviation, defense, premium consumer products) will pay for licensed data when certification standards exist and legal risks are clear.

    Strategic implication: Worker data licensing succeeds first in regulated, high-stakes sectors where provenance matters, THEN expands to commodity markets as standards become normalized.

    The Legal Hooks Already Exist (Sort Of)

    Worker data licensing isn’t starting from zero. Several legal frameworks provide partial foundation:

    1. GDPR (Europe) & CCPA (California): Data Subject Rights

Under GDPR Articles 15–22, workers have rights to:

    • Access their personal data
    • Correct inaccuracies
    • Delete data in some circumstances
    • Object to automated decision-making

    Gap: These laws focus on protection (opt-out, deletion) not commercialization (licensing, royalties). Workers can stop data use but can’t monetize it.

    What’s needed: Amendments treating worker data as intellectual property with licensing rights, not just privacy rights.

    2. Contract Law: Terms of Employment vs. Data Ownership

    Currently, most employment contracts include language like “work product belongs to employer.” This covers physical outputs (manufactured goods) and intellectual outputs (designs, code).

    Open question: Does “work product” include biometric data captured while working? Movement patterns? Conversation techniques?

    Courts haven’t definitively ruled. Union-negotiated contracts could explicitly carve out data governance rights, similar to how actors’ contracts address residuals.

    3. Collective Bargaining Agreements: Data as Negotiable Terms

    The National Labor Relations Act (NLRA) requires employers to bargain in good faith over “wages, hours, and other terms and conditions of employment.”

Legal argument: Worker training data usage qualifies as a “condition of employment” subject to collective bargaining—just as safety conditions, scheduling, and monitoring practices are bargainable. The NLRB has long treated workplace monitoring as a mandatory bargaining subject.

    Key precedent: Screen Actors Guild (SAG-AFTRA) struck from July 14 to November 9, 2023 and won AI usage protections including consent requirements and compensation for AI-generated performances using actors’ likenesses. The Writers Guild of America (WGA) struck from May 2 to September 27, 2023 (148 days) and won protections against AI writing tools, including disclosure requirements and restrictions on using writers’ work to train AI without consent.

    These entertainment industry precedents establish that AI training data usage is negotiable—and that unions can win meaningful protections through collective action.

    4. Trade Secret Law: Worker Expertise as Proprietary Knowledge

Skilled workers possess expertise that may qualify for trade secret protection under the Defend Trade Secrets Act. A master welder’s technique, a senior nurse’s diagnostic instincts, a logistics coordinator’s routing optimization—these derive economic value precisely from not being generally known.

    Legal hook: When companies capture this expertise as training data, they’re commercializing workers’ trade secrets. Workers could claim governance rights and demand licensing fees.

    Challenge: Trade secret law typically protects employers’ secrets from departing employees, not the reverse. This framing is unconventional but arguable—and unions should push for CBA carve-outs that explicitly recognize worker expertise as worker-owned intellectual property.

    Who Profits from the Data Intermediaries Right Now

    To understand the upside of worker-owned licensing, look at who’s capturing value today:

    Scale AI: The $29 Billion Data Giant

Scale AI provides “AI training data as a service” to companies like OpenAI, Toyota, and the U.S. military. In mid-2025, Meta finalized an investment acquiring approximately a 49% stake in Scale AI at a valuation of roughly $29 billion. Scale AI’s revenue is estimated to have approached $870 million in 2024.

    They pay human workers $15-25/hr to label images, transcribe audio, and annotate sensor data. Then they sell that labeled data to AI companies at massive markup.

    The margin: Revenue approaching $1B while labor costs are a fraction of total expenses. The difference is captured as profit, not paid to the workers who created the value.

    Appen: 1 Million “AI Training Specialists” Globally

    Appen operates a gig platform where workers label data remotely. They’re paid per task—often pennies for each labeled image or transcription. Appen’s clients include Microsoft, Adobe, and major robotics firms.

    The economics: Appen’s market cap peaked at over $1 billion. Workers receive task-based payments with no governance rights over the datasets they create. Once labeled, that data is resold to multiple clients.

    Sama: “Ethical AI” That Still Extracts Worker Value

    Sama positions itself as socially responsible, employing workers in Kenya, Uganda, and India at better-than-local wages. They provide AI training data to companies like Microsoft, Walmart, and Continental (automotive).

    The pitch: “We help people lift themselves out of poverty through AI work.”

    The reality: Workers get $2-9/hour depending on location. The datasets they create sell for thousands to millions per contract. Workers see none of the upside if their labeled data proves particularly valuable, and have no ongoing rights to data they annotated.

    The True Cost of Automation (And Why That’s Appropriate)

    Here’s an economic reality that needs acknowledgment:

    When creator/worker data costs are properly accounted for, automation costs will rise substantially—likely at least double current estimates.

    This isn’t bad news. It’s appropriate pricing of the true cost of automation.

    Currently, automation economics assume near-zero cost for human expertise used in training. When that expertise is properly licensed—when workers receive fair compensation for the knowledge that makes robots work—automation becomes more expensive.

    This is a feature, not a bug:

  • Higher automation costs create transition time. Workers need years, not months, to retrain. Slower automation gives them that runway.
  • Fair pricing aligns incentives. When automation costs include worker compensation, companies make more thoughtful decisions about what to automate and when.
  • Revenue flows back to workers. Instead of all automation value accruing to capital, workers receive ongoing income streams from the expertise they contributed.

Policy implications:

    Nations that protect workers and maintain wage floors should consider:

    • Tariffs on automation products from jurisdictions that don’t require worker consent or compensation for training data
    • Import restrictions on AI systems trained on uncompensated worker data
    • Proceeds earmarked for displaced workers and re-skilling programs

    This isn’t protectionism—it’s enforcement of fair labor standards in the AI economy, just as tariffs can enforce environmental standards or prevent dumping.

    Acknowledging the limits: This slows automation but doesn’t stop it. Eventually, as robots generate their own training data and synthetic data improves, human data premiums erode. Worker licensing buys time and captures transitional value—it’s not a permanent solution to technological unemployment. That requires broader wealth redistribution.

    Guilds: The Faster Path for Specialized Workers

    While unions navigate slow-moving collective bargaining processes, guilds can act faster to establish data governance norms.

    Professional guilds—think SAG-AFTRA for actors, but applied to technical and creative fields—have advantages:

    What guilds can do immediately:

  • Set inference-time licensing norms: Publish standard rates for model usage based on member data. Companies want clear terms; guilds can provide them.
  • Create model-usage codes of conduct: Define acceptable use (research, limited commercial) versus prohibited use (substitutional automation without consent) for member expertise.
  • Negotiate access terms for sector datasets: Customer support transcripts, medical workflows, creative works—guilds can license these collectively while individuals cannot.
  • Mandate consent and provenance tags: Guild charters can require all member data contributions include cryptographic provenance, consent records, and licensing terms.
  • Require paid adapters for fine-tuning: Any model fine-tuned on guild member data must pay licensing fees before deployment.
  • Enforce disclosure requirements: Companies using guild member data must disclose inference logs and usage metrics tied to member data.

Guild + Union Coordination:

    The optimal structure combines guild agility with union capital:

    • Guilds set standards: Technical licensing terms, consent frameworks, pricing guidelines
    • Unions enforce through collective bargaining: CBA provisions that incorporate guild standards
    • Both pursue board seats: Guilds target AI companies directly; unions use pension capital for broader governance

    Rapid-Response Guild Committees:

    Guilds should establish standing committees with authority to:

    • File complaints with labor boards and regulators
    • Initiate litigation against unauthorized data use
    • Run pilot attribution and royalty registries
    • Negotiate directly with AI companies on behalf of members

    Target timeline: Stand up functional guild committees within 90–180 days.

    180-Day Roadmap: How Unions Launch Worker Data Licensing NOW

    The transition to worker data governance is already underway. Unions that act in the next 6 months will shape the market. Those that wait will find the terms already set by others.

    Phase 1: Foundation (Days 0–60)

    Consent and Provenance Infrastructure:

    • Deploy opt-in consent frameworks for pilot groups (500-1000 workers)
    • Implement provenance tagging for all data capture (unique IDs, timestamps, licensing terms)
    • Establish secure registry for worker-contributed data with version control
    • Partner with technical providers (Copyright.sh, blockchain provenance platforms) for infrastructure
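
One way to implement the “secure registry with version control” above is an append-only, hash-chained log, so auditors can detect any retroactive edit. A minimal sketch with illustrative field names:

```python
import hashlib
import json

class DataRegistry:
    """Append-only registry sketch: each entry hashes the previous one,
    so any retroactive edit breaks the chain and is detectable at audit.
    Structure and field names are illustrative assumptions.
    """

    def __init__(self) -> None:
        self.entries: list[dict] = []

    def register(self, capture_id: str, worker_id: str, license_terms: str) -> dict:
        prev_hash = self.entries[-1]["entry_hash"] if self.entries else "genesis"
        entry = {
            "capture_id": capture_id,
            "worker_id": worker_id,
            "license_terms": license_terms,
            "prev_hash": prev_hash,
        }
        payload = json.dumps(entry, sort_keys=True).encode()
        entry["entry_hash"] = hashlib.sha256(payload).hexdigest()
        self.entries.append(entry)
        return entry

    def verify_chain(self) -> bool:
        """Recompute every hash link; False means history was tampered with."""
        prev_hash = "genesis"
        for entry in self.entries:
            unsigned = {k: v for k, v in entry.items() if k != "entry_hash"}
            if unsigned["prev_hash"] != prev_hash:
                return False
            payload = json.dumps(unsigned, sort_keys=True).encode()
            if hashlib.sha256(payload).hexdigest() != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True
```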

    Governance Preparation:

    • Draft shareholder proposals for AI companies in pension fund portfolios
    • Identify target companies for board seat campaigns
    • Develop union investment policy language on AI data practices
    • Coordinate with other institutional investors on proxy voting strategy

    Legal Foundation:

    • Draft model CBA data-use clauses for next negotiation cycle
    • File initial complaints where clear consent violations exist
    • Establish legal defense fund for data rights litigation
    • Engage labor law firms with AI/technology expertise

    Phase 2: Pilot Licensing (Days 60–120)

    Employer Pilot Programs:

    • Launch licensing pilot with at least one employer per sector
    • Test per-use and inference-time billing mechanisms
    • Implement metering and usage reporting
    • Run first royalty distribution to pilot participants

    Technical Validation:

    • Verify provenance tracking through full data lifecycle
    • Test audit mechanisms with employer cooperation
    • Measure licensing compliance rates
    • Document cost/benefit for both workers and employers

    Governance Execution:

    • Submit shareholder proposals at spring annual meetings
    • Execute first coordinated proxy votes on AI data issues
    • Begin negotiations for board observer seats where stakes justify

    Phase 3: Scale and Standardization (Days 120–180)

    Sector Consortium Formation:

    • Expand pilots to multiple employers per sector
    • Form cross-union working groups for standard terms
    • Publish standardized licensing grammar for worker data (compatible with Copyright.sh format)
    • Create sector-wide registries for cross-company licensing

    Public Standard Terms:

    • Publish reference licensing agreements for common use cases
    • Establish audit process standards and certification
    • Define compliance requirements for AI companies
    • Create public pricing guidelines by data type and use case

    Policy Engagement:

    • Submit recommendations to labor boards on data as bargainable term
    • Engage legislators on worker data governance legislation
    • Coordinate with international unions on cross-border standards
    • Build coalition with consumer groups, academics, and civil society

    Policy Appendix: Inference-Time Licensing Template

    For unions and guilds ready to implement, here’s a template framework for inference-time licensing:

    Standard License Terms:

```
WORKER DATA INFERENCE LICENSE

Licensor: [Union/Guild Name] on behalf of contributing workers
Licensee: [AI Company Name]

1. SCOPE OF LICENSE
   a. Licensed Data: Worker-contributed [type] data as registered in [Registry Name]
   b. Permitted Uses:
      – Model inference in production systems: LICENSED
      – Model fine-tuning: LICENSED with additional fee
      – Redistribution of raw data: PROHIBITED
      – Training foundation models: SEPARATE LICENSE REQUIRED

2. ATTRIBUTION
   a. Licensee shall maintain logs of inference calls using Licensed Data
   b. Attribution to source workers via registry ID required
   c. Quarterly usage reports required

3. COMPENSATION
   a. Per-inference fee: $[X] per 1,000 inference calls
   b. Fine-tuning fee: $[X] per model fine-tuned
   c. Revenue share: [X]% of products substantially dependent on Licensed Data
   d. Minimum annual payment: $[X]

4. AUDIT RIGHTS
   a. Licensor may audit usage logs with 30 days notice
   b. Third-party auditor acceptable by mutual agreement
   c. Non-compliance triggers 2x remediation payment

5. TERM AND TERMINATION
   a. Initial term: 12 months
   b. Auto-renewal unless 90 days notice
   c. Immediate termination for material breach

6. CONSENT VERIFICATION
   a. All Licensed Data includes verified worker consent
   b. Licensee warrants not to circumvent consent mechanisms
   c. New data requires re-verification of consent status
```

    This template can be customized by sector, data type, and negotiating position.
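
The compensation terms in Section 3 are straightforward to encode so that licensor and licensee compute identical invoices. A sketch with placeholder numbers:

```python
# Hypothetical machine-readable encoding of Section 3 of the template above.
# All dollar amounts are placeholders, not recommended rates.
LICENSE_TERMS = {
    "per_1k_inference_fee": 0.50,  # Section 3a: $ per 1,000 inference calls
    "fine_tune_fee": 25_000.0,     # Section 3b: $ per model fine-tuned
    "minimum_annual": 50_000.0,    # Section 3d: floor regardless of usage
}

def quarterly_invoice(inference_calls: int, models_fine_tuned: int,
                      terms: dict = LICENSE_TERMS) -> float:
    """Compute one quarter's bill under the template terms.

    The annual minimum is prorated across four quarters for simplicity.
    """
    usage_fees = (inference_calls / 1000) * terms["per_1k_inference_fee"]
    usage_fees += models_fine_tuned * terms["fine_tune_fee"]
    return max(usage_fees, terms["minimum_annual"] / 4)

# Example: 10 million inference calls and one fine-tune in a quarter.
bill = quarterly_invoice(10_000_000, 1)
# usage = 10_000 * 0.50 + 25_000 = $30,000, above the $12,500 prorated minimum
```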

    The Objections (And Why They’re Partially Right)

    “Workers already get paid for their labor. Why should they get paid twice?”

    The analogy: Musicians already get paid to record songs. Then they get streaming royalties. Actors get paid to film shows. Then they get residuals. Authors get paid advances. Then they get royalty checks.

    The principle: When your work generates ongoing value, you should receive ongoing compensation. AI training data has ongoing value every time a model retrains or—more importantly—every time a model runs inference.

    The caveat: This works best for high-value, hard-to-replicate expertise. If your work is easily standardized, licensing has less economic power.

    “Companies own the data because they own the workplace equipment that captures it.”

    The counter: By that logic, record labels would own musicians’ performance rights because they own the recording studios. But copyright law recognizes that creative output belongs to creators, not to whoever owns the equipment.

    The distinction: Employers own business process artifacts (documented procedures, standardized workflows). Workers have governance rights over person-level performance traces (individual techniques, tacit knowledge, undocumented expertise).

    The caveat: Courts haven’t definitively ruled on this. It’s an open legal question that unions must push through collective bargaining.

    “This will make automation more expensive and slow down progress.”

    The response: Licensing music didn’t kill Spotify. It made streaming sustainable by ensuring artists could afford to keep creating. Similarly, worker data licensing makes automation sustainable by ensuring workers can afford the transition.

    The reality: Yes, it raises costs. That’s appropriate pricing of true automation costs. Faster automation without compensation creates catastrophic social costs.

    The caveat: Eventually, as robots generate their own training data, human data premiums erode. This is transitional value extraction, not permanent protection.

    “Workers will never organize effectively around data ownership.”

    The precedent: SAG-AFTRA struck successfully from July 14 to November 9, 2023 over AI usage and won contractual protections. The Writers Guild struck from May 2 to September 27, 2023 over AI in screenwriting and won. Unions ARE already organizing around automation and data issues.

    The reality: The infrastructure doesn’t exist yet, but neither did music streaming royalties before Spotify forced their creation. Markets adapt when there’s economic and political pressure. Unions provide both.

    The caveat: This is genuinely hard. It requires technical infrastructure, legal innovation, international coordination, and sustained organizing. Many attempts will fail. The question is whether unions try or surrender by default.

    Why Copyright.sh Cares About Worker Data (Beyond Web Licensing)

    Copyright.sh’s mission is “fair compensation when AI learns from human expertise.” We started with web content creators because:

  • Lowest friction: A meta tag is easier to implement than sensor networks
  • Largest immediate market: Billions of web pages already exist
  • Proof of concept: Show licensing infrastructure can work

But worker data is the same problem at larger scale. If we solve licensing for bloggers, journalists, and photographers, the same infrastructure can extend to factory workers, nurses, and drivers.

    Our role isn’t to build Worker Data PROs—unions should do that. But the technical primitives we’re developing—licensing grammars, usage tracking, payment automation, provenance verification—apply equally to:

    • Web content (what we do now)
    • Worker-generated training data (what unions could do)
    • Any human expertise that AI learns from (the general case)

    We want to see worker data collectives succeed because every creator—whether they create blog posts or create assembly techniques—deserves governance rights and compensation when AI learns from their work.

    What You Can Do (Even If You’re Not in a Union)

    If you’re a worker in a high-automation-risk field:

    Assert your rights immediately. You have not consented to being automated. Make this explicit to your employer in writing. Document any data capture systems monitoring your work.

    Ask your union about data governance. If your local doesn’t have answers, escalate to regional or national leadership. Make this a contract priority in next negotiations.

    Document your expertise. Even without formal data capture, start thinking about your job skills as intellectual property. What techniques do you have that would be valuable for AI to learn? How would you license that if you could?

    Connect with others. Worker data licensing will only happen if enough workers demand it. Join or form discussion groups about automation and data rights. Build collective awareness before automation completes.

    If you’re a union organizer or labor activist:

    Start the 180-day clock now. The roadmap above is actionable. Identify your first pilot group, first employer partner, and first shareholder proposal targets this week.

    Study the music industry PRO model. ASCAP, BMI, and SESAC provide proven templates for collective licensing and royalty distribution. The infrastructure exists; it just needs adaptation.

    Partner with data rights organizations. Groups working on data governance (like Data & Society, AI Now Institute, Mozilla Foundation) have legal expertise and policy connections.

    Pilot small-scale experiments. Find a sympathetic employer willing to try worker data licensing in limited scope. Prove the concept works before negotiating industry-wide.

    If you’re in tech or AI:

    Support worker data governance. If you work at a robotics or AI company, advocate internally for data licensing programs. Point to music streaming as precedent—it worked, and so can this.

    Build the infrastructure. We need better tools for data provenance, licensing, and royalty distribution. Open source projects that solve these problems make worker data PROs feasible.

    Buy ethically sourced training data. If your company purchases data from labeling services, ask about worker compensation. Demand transparency about how much of your payment reaches the people who created the data.

    The Larger Question: What Do We Owe the People Who Teach Machines?

    AI doesn’t learn from nothing. Every capability—recognizing faces, translating languages, driving cars, diagnosing diseases—comes from human expertise converted to training data.

    Musicians get paid when machines play their songs. Shouldn’t workers get paid when machines learn their skills?

    The technology to make this happen exists. The legal frameworks are evolving. What’s missing is political will and collective organization. Unions have an opportunity to reinvent their purpose: from protecting jobs (which automation makes impossible) to protecting the economic value of human knowledge (which automation makes essential).

    Worker Data PROs aren’t a perfect solution. They won’t stop automation or save every job. The obstacles are real—international arbitrage, measurement challenges, employer resistance, and the long-run trend toward synthetic data. But during the 5-15 year window when human expertise still outperforms robots, worker data licensing could ensure that the humans whose expertise builds the robots get fairly compensated for that contribution.

    The window is closing. Unions and guilds that act in the next 180 days will shape the market. Those that wait will accept terms set by others.

    Because if we don’t build these coordination mechanisms now—if we let tech companies extract worker knowledge for free until automation completes—we’ll look back and realize we gave away the most valuable asset workers have: the knowledge that makes them worth automating.

    *This article represents Copyright.sh’s position on worker data governance as complementary to our web content licensing mission. We build the technical infrastructure for fair AI licensing; unions and workers must build the organizational power to demand it.*