The Future of Workers Unions: From Job Protection to Data Ownership
Musicians don’t work for Spotify, but they get paid every time their songs stream. Actors don’t work for Netflix, but they receive residuals when their shows are watched. What if factory workers, customer service reps, and logistics coordinators got royalties every time their expertise trained—or ran—a robot?
This isn’t science fiction. It’s the logical evolution of collective bargaining in an age when AI learns by watching humans work. Just as ASCAP and BMI collect performance royalties for songwriters, “Worker Data PROs” (Performance Rights Organizations) could license human expertise to robotics and AI companies.
The question isn’t whether workers should have governance rights over the data that automates their jobs. The question is whether unions will reinvent themselves to make it happen—or whether tech companies will capture that value by default.
This transition is already underway. The window for action is measured in months, not years.
But let’s be clear: this is hard. The obstacles are real—international arbitrage, measurement challenges, employer resistance, and the long-run trend toward robots generating their own training data. Worker data licensing won’t stop automation or save every job. What it can do is ensure that workers with high-skill, high-tacit-knowledge expertise capture fair value during the 5-15 year window when human data still matters, while building coordination mechanisms that benefit the broader labor movement.
The Core Principle: No Substitution Without Consent
Before diving into mechanisms, let’s be clear about the fundamental right at stake:
No one’s work should be used for substitutional, competitive training without consent and remuneration.
This isn’t a new principle—it’s the logical extension of existing voice and likeness rights. Actors have long controlled how their image and voice are used. Musicians receive residuals when their performances are replayed. Authors get royalties when their words are reprinted.
Worker expertise captured as training data is no different. When a company records how a skilled welder works, then uses that recording to train a robot that replaces the welder, they’ve converted human expertise into automation capital. The worker deserves consent and compensation—just as an actor deserves compensation when their likeness trains an AI performance model.
What’s needed in every Collective Bargaining Agreement: Explicit data-use clauses that require:
- Informed opt-in consent for any data capture intended for automation
- Licensing terms specifying compensation for training, fine-tuning, and inference use
- Audit rights allowing union review of how data is used
- Prohibition on substitutional automation without negotiated terms
Three Rights Framework: Separating Governance, Bargaining, and Economic Rights
Before we discuss mechanisms, we need clarity about what worker data rights actually mean. Current discussions blur three distinct categories:
1. Governance Rights (Consent & Control)
What it means: Workers have the right to know when their performance data is being captured and to grant or withhold consent for specific uses.
Not about money: This is about autonomy and informed consent—the right to say “no” to being automated without your permission.
Legal basis: Privacy laws (GDPR, CCPA), workplace monitoring regulations, and collective bargaining over working conditions.
Example: A hospital nurse can refuse to have their patient interaction techniques recorded for AI training, regardless of compensation offered.
2. Bargaining Rights (Collective Terms)
What it means: Workers collectively negotiate the terms under which their expertise can be licensed, just as unions negotiate wages and benefits.
Not individual deals: Like minimum wage floors, licensing terms work when set collectively, not when each worker bargains alone.
Legal basis: National Labor Relations Act (NLRA) treats workplace monitoring and data use as mandatory bargaining subjects.
Example: UAW negotiates that any robotics training using autoworker data requires union approval, third-party audits, and royalty rates tied to production volume.
3. Economic Rights (Compensation)
What it means: Workers receive ongoing payment when AI systems trained on their expertise generate value—through training fees, fine-tuning royalties, or inference-time payments.
Not one-time buyouts: Like music streaming residuals, payment tied to actual usage over time.
Legal basis: Contract law, intellectual property principles, and emerging licensing frameworks (see Copyright.sh for web content precedent).
Example: When a robot trained on a machinist’s techniques runs on a factory floor, that machinist receives micropayments tied to production output.
Why this framework matters: Blurring these categories causes confusion. You can have governance rights (consent) without economic rights (payment). You can have collective bargaining without individual compensation. The strongest position combines all three: collective consent frameworks, union-negotiated terms, and individual revenue streams tied to usage.
The Industry Built on Human Labor (That Workers Don’t Control)
AI doesn’t emerge from algorithms alone. Every robot learning to pick warehouse boxes, every chatbot learning to handle customer complaints, every autonomous forklift learning to navigate factory floors—they all learn from human-labeled training data.
The data labeling industry is worth an estimated $2-4 billion in 2024 (estimates vary by source), growing at approximately 25-29% CAGR toward $15-17 billion by 2030 (Grand View Research, Mordor Intelligence). Companies like Scale AI, Appen, Sama, and Labelbox employ millions of workers globally to annotate images, transcribe conversations, and label sensor data.
Here’s what that looks like in practice:
Manufacturing & Robotics:
- Human workers demonstrate how to assemble components
- Cameras capture their movements, annotated by data labelers
- Robots learn from thousands of labeled examples
- Vendors sell these annotation services to companies like Continental and Microsoft, with SLAs promising 99%+ accuracy—though actual achieved rates depend on task complexity and annotation methodology
Customer Service:
- Service reps handle support tickets while AI systems watch
- Their responses get labeled: helpful/unhelpful, empathetic/cold, accurate/wrong
- AI chatbots train on this labeled conversation data
- Human reps eventually train their own replacements
Logistics & Warehousing:
- Forklift operators navigate complex environments
- Sensor data (cameras, LIDAR, GPS) gets annotated frame-by-frame
- Autonomous vehicles learn safe navigation from human expertise
- Workers teach the systems that will eliminate their positions
The irony is brutal: workers generate the most valuable data at the exact moment their jobs become automatable. And under current arrangements, they capture none of that value.
But here’s the hard truth: Not all work generates equally valuable data. High-skill, high-tacit-knowledge workers—elite welders, ICU nurses, senior machinists, expert troubleshooters—possess expertise that’s difficult to document and expensive to replicate. This is where licensing creates real value.
Conversely, if your work is already well-documented in training manuals and standard operating procedures, the marginal value of your performance data is lower. Employers will simply improve their documentation rather than pay royalties. This isn’t defeatist—it’s strategic targeting. Start with workers whose tacit knowledge commands premium prices.
Reframing Value: From Training Data to Inference Revenue
Here’s what most discussions of worker data miss: training is only part of the stack.
The real value isn’t locked up at training time—it’s realized during fine-tuning and inference. This is where sector-specific expertise becomes actionable, and where licensing, attribution, and metering can actually be enforced.
Why inference matters more than training: a training run happens once, but inference happens every time the model is deployed and used. That makes it a recurring, meterable event that licensing terms can attach to.
Technical mechanisms that enable inference-time licensing:
- Enterprise adapters: Custom model adapters trained on licensed worker data, deployed only for licensed clients
- Retrieval layers: Worker expertise stored in vector databases, retrieved at inference time with attribution and billing per-query
- Domain heads: Specialized output layers that bind model behavior to consent and remuneration agreements
This reframing changes the economics entirely. Instead of a one-time payment for training data, workers receive ongoing revenue share tied to model deployment and usage. Every time a robot uses techniques learned from a specific worker’s data, that worker earns a micropayment.
Copyright.sh already does this for web content—meta tags that specify licensing terms, HMAC-versioned usage logging, and per-inference royalty tracking. The same infrastructure applies to worker data.
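As a rough illustration of what that logging layer could look like, here is a minimal Python sketch of HMAC-signed per-inference usage records. The key, royalty rate, field names, and function names are all hypothetical assumptions for illustration, not Copyright.sh's actual interface.

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared secret between the AI deployer and the licensing
# organization; in practice this would be provisioned per contract.
SECRET_KEY = b"shared-secret-between-deployer-and-pro"
ROYALTY_PER_CALL = 0.0001  # illustrative USD rate per inference call

def log_inference(dataset_id: str, worker_ids: list[str], timestamp: float) -> dict:
    """Record one inference call against a licensed worker dataset,
    signing the entry so the report can be audited later."""
    entry = {
        "dataset_id": dataset_id,
        "worker_ids": worker_ids,  # contributors whose data was used
        "timestamp": timestamp,
        "royalty_usd": ROYALTY_PER_CALL,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return entry

def verify_entry(entry: dict) -> bool:
    """The PRO re-computes the HMAC to check a reported entry wasn't altered."""
    body = {k: v for k, v in entry.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(entry["signature"], expected)

entry = log_inference("weld-techniques-v3", ["w-1042", "w-2177"], time.time())
assert verify_entry(entry)
```

The signature means a deployer's usage report can be spot-audited: any retroactive change to the royalty amount or worker attribution breaks verification.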
Realistic timeline: This creates value during the transitional period when human expertise still outperforms synthetic data—roughly 5-15 years depending on the domain. Eventually, robots will generate their own training data through experience, and human data premiums will erode. Worker data licensing isn’t “royalties forever”—it’s extracting fair value during the window when human knowledge provides competitive advantage.
Why Unions Should Build Worker Data Licensing Infrastructure
The parallel to music streaming is precise. Before Spotify, piracy threatened musicians’ livelihoods. Performance Rights Organizations (PROs) like ASCAP, BMI, and SESAC emerged to track usage, negotiate blanket licenses, and collect and distribute royalties to rights holders.
Worker Data PROs could do exactly the same—but unions already have strengths PROs don’t: collective bargaining rights, pension capital, and demonstrated strike power in chokepoint sectors.
Acknowledging union power where it exists: Unions in ports (ILWU), airlines (pilots, flight attendants), entertainment (SAG-AFTRA, WGA), public sector (teachers, transit), and skilled trades (electricians, plumbers) control genuine chokepoints. These unions have proven strike leverage.
The data strike advantage: What worker data licensing adds is NEW leverage in sectors where traditional strikes are weakening. When automation threatens to replace workers, “we’ll stop showing up” loses power. But “we’ll withhold the training data you need to automate” creates bargaining power BEFORE replacement happens. This is complementary to traditional union strength, not a replacement.
1. Tracking: Monitor When Worker Data Trains AI Systems
Just as BMI tracks radio plays and Spotify streams, Worker Data PROs would track when companies use worker-generated data. This isn’t technically complex—Copyright.sh already does it for web content with meta tags and HMAC versioning.
In manufacturing, this looks like:
- Workers wear sensors that capture their techniques (with consent)
- Data gets tagged with a unique ID and licensing terms
- When robotics companies train models or run inference, they report usage (just like Spotify reports streams)
- PRO calculates royalties based on data consumption—including ongoing inference calls
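The registry-and-calculation steps above can be sketched in a few lines. The `LicensedDataset` fields, the rates, and the royalty formula are illustrative assumptions, not a description of any existing system.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class LicensedDataset:
    """A captured worker dataset tagged with a unique ID and licensing terms."""
    worker_id: str
    description: str
    train_rate_usd: float       # hypothetical rate per training run
    inference_rate_usd: float   # hypothetical rate per 1,000 inference calls
    dataset_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def royalty_owed(ds: LicensedDataset, training_runs: int, inference_calls: int) -> float:
    """Convert reported usage (like Spotify's reported streams) into a royalty."""
    return (training_runs * ds.train_rate_usd
            + (inference_calls / 1000) * ds.inference_rate_usd)

ds = LicensedDataset("w-1042", "assembly technique capture", 500.0, 2.0)
owed = royalty_owed(ds, training_runs=3, inference_calls=250_000)
# 3 runs at $500 plus 250k calls at $2 per thousand: $2,000 owed
```

The point of the sketch is that once data carries an ID and terms, royalty accounting is ordinary metering, not exotic technology.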
Administrability challenge: Can this actually be measured and enforced? Let’s be honest about what’s feasible:
What CAN be audited:
- Training data provenance through cryptographic tagging and version control
- Fine-tuning jobs using worker-specific datasets (logged in training pipelines)
- Inference calls in enterprise deployments with contractual audit rights
- Model performance attribution when worker data creates measurable accuracy gains
What’s HARDER to audit:
- Training data mixed into massive web-scraped datasets (requires watermarking)
- Inference in closed systems without external monitoring
- Attribution when multiple data sources contribute to outputs
- Circumvention through synthetic data generation or foreign training facilities
Mitigation strategies:
- Start with regulated industries (healthcare, aviation, defense) where audit trails already exist
- Focus on high-value, specialized datasets where provenance matters for quality
- Use procurement rules to require licensing compliance as condition of sale
- Build guild/union certification programs that buyers demand (like Fair Trade labels)
Realistic expectation: We won’t catch 100% of usage. But we don’t need to—ASCAP doesn’t track every radio play either. What matters is establishing the norm that licensed data is required, creating enough enforcement to make compliance cheaper than evasion, and building mechanisms that work for 70-80% of commercial use cases.
2. Licensing: Negotiate Collective Terms
Individual workers have zero leverage negotiating with Boston Dynamics or Tesla’s robotics division. But a union representing 50,000 autoworkers whose assembly techniques are captured in training datasets? That’s a bargaining position.
But here’s the coordination problem: if one union licenses data at $X and another licenses similar data at $X/2, buyers gravitate to the cheaper source. Multiple unions without coordination creates a race to the bottom.
Coordination mechanisms to prevent this: shared rate floors across unions, a common registry of licensed datasets, and joint certification standards, so buyers can’t play one union’s members against another’s.
Licensing models could work several ways:
Model A: Per-Use Licensing (like TollBit for web content)
- Robotics companies pay per training run and per inference call using worker data
- Rates negotiated by union, scaled by data volume
- Workers receive royalties proportional to their data’s usage
Model B: Subscription Access (like RSL Collective)
- Companies pay annual fees for access to union training data libraries
- Revenue distributed to workers based on contribution
- Tiered pricing: research use vs. commercial deployment vs. inference at scale
Model C: Performance-Based (emerging AI licensing model)
- Base licensing fee + royalty if worker data improves robot accuracy beyond benchmarks
- Aligns incentives: better data = higher compensation
- Union negotiates performance metrics and audit rights
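The three models above reduce to simple payout formulas. Every rate, threshold, and function name below is a hypothetical illustration, not a proposed schedule.

```python
def per_use(training_runs: int, inference_calls: int,
            run_rate: float, call_rate: float) -> float:
    """Model A: pay per training run and per inference call."""
    return training_runs * run_rate + inference_calls * call_rate

def subscription(annual_fee: float, worker_share: float) -> float:
    """Model B: flat annual access fee; worker_share is the fraction
    distributed to contributing workers."""
    return annual_fee * worker_share

def performance_based(base_fee: float, accuracy: float,
                      benchmark: float, bonus_per_point: float) -> float:
    """Model C: base fee plus a royalty for each accuracy point
    the worker data adds above the negotiated benchmark."""
    bonus_points = max(0.0, accuracy - benchmark) * 100
    return base_fee + bonus_points * bonus_per_point

a = per_use(2, 1_000_000, 10_000.0, 0.0005)         # ≈ $20,500
b = subscription(100_000.0, 0.6)                     # ≈ $60,000 to workers
c = performance_based(25_000.0, 0.94, 0.90, 1_000)   # ≈ $29,000
```

Putting the models side by side makes the negotiating trade-off concrete: Model A scales with deployment, Model B is predictable, and Model C pays only when worker data demonstrably moves the benchmark.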
3. Collection: Distribute Royalties Like Streaming Residuals
This is where unions become data co-ops. Instead of individual workers tracking their data usage (impossible), the PRO:
- Maintains a registry of worker-contributed training data
- Receives bulk payments from AI companies
- Distributes royalties based on verified usage
- Handles accounting, auditing, and disputes
Just as ASCAP doesn’t require Taylor Swift to personally track every Spotify stream, Worker Data PROs would handle the infrastructure while workers receive quarterly checks.
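The distribution step is a pro-rata split of a bulk payment by verified usage, analogous to streaming residuals. A minimal sketch, with made-up amounts and worker IDs:

```python
def distribute(bulk_payment: float, usage_by_worker: dict[str, int]) -> dict[str, float]:
    """Split one bulk payment from an AI company proportionally to each
    worker's verified usage count in the PRO's registry."""
    total = sum(usage_by_worker.values())
    if total == 0:
        return {worker: 0.0 for worker in usage_by_worker}
    return {worker: bulk_payment * count / total
            for worker, count in usage_by_worker.items()}

payouts = distribute(10_000.0, {"w-1042": 600, "w-2177": 300, "w-3310": 100})
# w-1042 receives $6,000; w-2177 $3,000; w-3310 $1,000
```

This is the same accounting shape ASCAP uses: the organization handles metering and disputes centrally, and individual members simply receive their share.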
Addressing the equity critique: What about non-union workers who get displaced but receive no royalties?
This is a real problem. Licensing structures that only benefit union members create two-tier systems where the organized get protection while the vulnerable get nothing. Partial solutions include sectoral licensing funds that pool royalties across an industry rather than paying only individual contributors, and extending negotiated licensing terms to whole sectors rather than single bargaining units.
This isn’t either/or: Worker data licensing AND wealth redistribution (UBI, job guarantees, robust social safety nets) are both necessary. Licensing provides bargaining leverage and transitional income; redistribution addresses systemic inequality and long-run automation. We need both.
4. Strategic Ownership: Pension Funds as Equity Weapons
Here’s the strategic advantage unions already have: many control significant pension capital. CalPERS (California Public Employees’ Retirement System) manages over $500 billion. Major union pension funds collectively control hundreds of billions more.
This creates a unique opportunity that unions should act on now:
Acquire Stakes in AI/Data Infrastructure Companies:
- Pension funds should pursue minority stakes with board seats in data labeling companies, robotics firms, and AI infrastructure players
- Deals can be leveraged or unleveraged depending on risk appetite—member votes should approve major moves
- Board representation enables direct enforcement of responsible data handling and worker compensation policies
- Example: A 5% stake in a data labeling company worth $10B = $500M position with potential board seat
Playbook for Strategic Ownership: use equity positions and board seats to require portfolio companies to:
– Disclose worker data usage in training and fine-tuning
– Implement consent frameworks before automation data capture
– Share revenue with workers whose data trains AI systems
Strike Threats That Actually Work:
- Traditional strikes lose power when robots replace workers
- But “we’ll withhold training data” hits companies before automation completes
- Can’t train robots without human expertise to learn from
- Data strikes can be surgical—targeted at specific product lines or customers
URGENT: Worker Consent and Immediate Union Action
Employees have not consented to being automated.
This bears repeating: the vast majority of workers being recorded, tracked, and analyzed have never explicitly consented to their data being used for training AI systems that will replace them.
Unions must immediately negotiate contract language that secures:
– Explicit consent for data capture
– Defined royalty rates for training and inference use
– Audit rights to verify how data is being used
– Revocation rights if terms are violated
Clarifying what workers own vs. what employers own:
- Employers own: Business process artifacts (SOPs, documented procedures, workflow diagrams, generalized best practices)
- Workers own: Person-level performance traces (individual movement patterns, problem-solving techniques, tacit knowledge that isn’t documentable)
Example: A hospital owns the clinical protocol for administering medication (documented process). The hospital doesn’t own the specific diagnostic instincts and patient interaction techniques of an experienced ICU nurse (tacit expertise).
This distinction determines what’s licensable. Documented, standardized processes have low royalty value—employers will just improve documentation. Undocumented, tacit expertise that’s hard to capture in manuals? That’s where licensing creates real value.
SPECIAL NOTE FOR CUSTOMER SERVICE WORKERS
Customer service is ground zero for AI automation. Chatbots are being trained right now on millions of conversation transcripts from human agents—often without explicit consent or compensation.
Customer service unions and workers should demand:
– Immediate moratorium on training AI using agent transcripts
– Retroactive compensation for data already used
– Ongoing royalties for any future use
– Audit access to AI training datasets
The legal theory is straightforward: your conversation techniques, empathy patterns, and problem-solving approaches are your expertise. Using them to train a replacement without consent is appropriation of your professional skills.
Acknowledging second-order effects: Employers may respond to licensing demands by reducing worker discretion, minimizing transparency about data collection, or reclassifying work to avoid royalty obligations. Unions must anticipate this and build mitigations into CBAs: protections against retaliation, mandatory disclosure of all data collection systems, job quality standards that prevent deskilling to evade licensing costs.
Best Entry Points: Where Worker Data Licensing Works First
Not all sectors are equally ready for worker data licensing. The strongest starting points combine:
- High-skill, high-tacit-knowledge work (hard to document in manuals)
- Existing regulatory frameworks (audit trails, liability requirements)
- Guild-like professional structures (credentialing, peer review)
- Buyers who value provenance and quality (willing to pay premiums for ethical sourcing)
Top Candidates:
Healthcare (Nurses, Diagnosticians, Specialists)
Why it works:
- Patient safety regulation requires documented training data provenance
- High-liability work (medical AI errors create legal exposure)
- Tacit expertise is core value (clinical judgment, bedside manner)
- Buyers (hospitals, device makers) already pay premiums for quality
Action: National Nurses United and SEIU negotiate consent frameworks in next contract cycles, pilot licensing with medical device companies training surgical robots.
Aviation (Pilots, Air Traffic Controllers, Mechanics)
Why it works:
- FAA certification requirements create natural audit mechanisms
- Safety-critical systems demand transparent training data
- Union chokepoint leverage (can’t fly without certified pilots)
- High compensation floors make royalties economically meaningful
Action: ALPA (Air Line Pilots Association) establishes licensing terms for simulator data and flight pattern analysis used in autonomous systems.
Creative/Technical Guilds (VFX Artists, Voice Actors, Editors)
Why it works:
- Guild structures already exist (VES, SAG-AFTRA voice division)
- Clear provenance (individual artist contributions are tracked)
- Buyers already negotiate licensing (studios, game companies)
- Rapid automation threat creates urgency
Action: Visual Effects Society creates standard licensing terms for training data from VFX workflows, negotiated into studio contracts.
Government/Defense Procurement
Why it works:
- Procurement rules can require ethical AI sourcing
- Buyers are mission-driven (not just profit-maximizing)
- Transparency requirements (public contracts, FOIA)
- Domestic data preferences (national security considerations)
Action: Federal acquisition regulations updated to require worker consent and licensing for any AI training data in government contracts, creating instant market for compliant datasets.
Why NOT start with easily standardized work: Warehouse picking, basic assembly, scripted customer service—these generate lower-value data because they’re easier to document and standardize. Start where tacit knowledge commands premium prices.
Which Workers Need This Most: Follow the Automation Money (But Acknowledge Reality)
Prioritizing by data value, automation market size, and existing union leverage:
Manufacturing & Assembly (UAW, IAM, Steelworkers)
- Why: Robotics companies pay premium rates for accurate assembly demonstrations from skilled tradespeople
- Data value: Video of skilled welders, machinists, assembly line workers teaching robots dexterity
- Market size: Industrial robotics ~$17B market growing ~10% annually
- Union leverage: UAW has significant pension assets ($50B+), history of effective strikes, chokepoint control in auto sector
- Reality check: Basic assembly work has lower licensing value; focus on specialty trades and high-skill operations
Logistics & Transportation (Teamsters, ILWU)
- Why: Autonomous vehicles and warehouse robots learn from human drivers and operators
- Data value: Forklift operation, delivery routes, loading dock procedures
- Market size: Warehouse automation market approaching $30B by 2026
- Union leverage: ILWU controls port chokepoints; Teamsters have substantial pension capital
- Reality check: Route optimization data has value; basic driving patterns less so
Healthcare & Clinical Work (SEIU, National Nurses United)
- Why: Medical AI needs labeled patient data, diagnostic procedures, care protocols
- Data value: Nurse workflow data, doctor diagnostic patterns, therapy techniques—high tacit knowledge
- Market size: Healthcare AI market projected at $190B by 2030
- Union leverage: NNU has proven strike effectiveness; healthcare has regulatory advantages
- Reality check: This is high-value territory—clinical judgment is hard to standardize
Customer Service & Hospitality (UFCW, UNITE HERE)
- Why: Chatbots and service AI train on human interaction data
- Data value: Conversation transcripts, conflict resolution techniques, empathy patterns
- Market size: Conversational AI market projected to reach $14B by 2025
- Union leverage: UFCW has 1.3M members, service sector generates massive training data volume
- Reality check: High volume but lower per-worker value; sectoral funds may be better than individual royalties
Addressing International Arbitrage: Why Can’t They Just Train Abroad?
The hard question: If US/EU workers demand licensing fees, what stops companies from training AI on Indian or Southeast Asian workers without paying royalties?
Short answer: Nothing stops them technically. But enforcement mechanisms exist if we build them:
1. Procurement Rules & Buyer Standards
Model: Like conflict-free minerals, Fair Trade coffee, or sustainable timber certification
Mechanism:
- Government contracts require “Ethical AI Training” certification
- Certified providers must prove worker consent and compensation
- Major buyers (Amazon, Walmart, hospitals) adopt voluntary standards
- Market pressure favors compliant datasets even when alternatives exist
Precedent: Apple’s supply chain audits, Patagonia’s fair labor certification—buyers DO pay premiums for ethical sourcing when reputation matters.
2. Data Localization & Import Restrictions
Model: EU data protection rules, China’s data sovereignty laws
Mechanism:
- AI systems trained on foreign data face import restrictions or tariffs
- Domestic deployment requires domestic training data (or licensed foreign data)
- National security rationale: critical infrastructure shouldn’t depend on foreign data
Precedent: GDPR already restricts cross-border data flows; extending to AI training data isn’t radical.
3. Training Data Standards & Certification
Model: ISO standards, organic food labels, LEED building certification
Mechanism:
- Industry consortia (unions + buyer coalitions) establish “Certified Ethical AI Training” standards
- Standards require:
– Documented worker consent at time of data capture
– Fair compensation tied to usage (not exploitative one-time payments)
– Third-party audits of compliance
– Supply chain transparency (where was data collected, under what terms?)
- Buyers seeking certification must source only certified data
- Market bifurcates: premium “certified” vs. commodity “uncertified”
Why this works: Just as “organic” food commands price premiums despite cheaper conventional alternatives, “ethically trained AI” can command premiums from buyers who value transparency, liability protection, and reputation.
4. Liability & Legal Risk
Model: Product liability law, medical malpractice
Mechanism:
- AI systems trained on unlicensed worker data face legal exposure when they fail
- “Your surgical robot was trained on data stolen from unconsenting nurses” becomes viable lawsuit
- Buyers prefer licensed data because it includes liability protection
Precedent: Getty Images suing Stability AI for training on unlicensed photos; class action suits against AI companies for copyright infringement.
Will this stop ALL foreign training? No. Commodity use cases will chase the lowest cost. But high-value, high-liability, reputation-sensitive applications (healthcare, aviation, defense, premium consumer products) will pay for licensed data when certification standards exist and legal risks are clear.
Strategic implication: Worker data licensing succeeds first in regulated, high-stakes sectors where provenance matters, THEN expands to commodity markets as standards become normalized.
The Legal Hooks Already Exist (Sort Of)
Worker data licensing isn’t starting from zero. Several legal frameworks provide partial foundation:
1. GDPR (Europe) & CCPA (California): Data Subject Rights
Under GDPR Articles 15-22, workers have rights to:
- Access their personal data
- Correct inaccuracies
- Delete data in some circumstances
- Object to automated decision-making
Gap: These laws focus on protection (opt-out, deletion) not commercialization (licensing, royalties). Workers can stop data use but can’t monetize it.
What’s needed: Amendments treating worker data as intellectual property with licensing rights, not just privacy rights.
2. Contract Law: Terms of Employment vs. Data Ownership
Currently, most employment contracts include language like “work product belongs to employer.” This covers physical outputs (manufactured goods) and intellectual outputs (designs, code).
Open question: Does “work product” include biometric data captured while working? Movement patterns? Conversation techniques?
Courts haven’t definitively ruled. Union-negotiated contracts could explicitly carve out data governance rights, similar to how actors’ contracts address residuals.
3. Collective Bargaining Agreements: Data as Negotiable Terms
The National Labor Relations Act (NLRA) requires employers to bargain in good faith over “wages, hours, and other terms and conditions of employment.”
Legal argument: Worker training data usage qualifies as a “condition of employment” subject to collective bargaining—just as safety conditions, scheduling, and monitoring practices are bargainable. Courts have consistently held that workplace monitoring is a mandatory bargaining subject.
Key precedent: Screen Actors Guild (SAG-AFTRA) struck from July 14 to November 9, 2023 and won AI usage protections including consent requirements and compensation for AI-generated performances using actors’ likenesses. The Writers Guild of America (WGA) struck from May 2 to September 27, 2023 (148 days) and won protections against AI writing tools, including disclosure requirements and restrictions on using writers’ work to train AI without consent.
These entertainment industry precedents establish that AI training data usage is negotiable—and that unions can win meaningful protections through collective action.
4. Trade Secret Law: Worker Expertise as Proprietary Knowledge
Skilled workers possess trade knowledge that arguably qualifies as trade secrets under the Defend Trade Secrets Act. A master welder’s technique, a senior nurse’s diagnostic instincts, a logistics coordinator’s routing optimization—these are valuable, non-obvious, and economically beneficial.
Legal hook: When companies capture this expertise as training data, they’re commercializing workers’ trade secrets. Workers could claim governance rights and demand licensing fees.
Challenge: Trade secret law typically protects employers’ secrets from departing employees, not the reverse. This framing is unconventional but arguable—and unions should push for CBA carve-outs that explicitly recognize worker expertise as worker-owned intellectual property.
Who Profits from the Data Intermediaries Right Now
To understand the upside of worker-owned licensing, look at who’s capturing value today:
Scale AI: The $29 Billion Data Giant
Scale AI provides “AI training data as a service” to companies like OpenAI, Toyota, and the U.S. military. In mid-2025, Meta invested roughly $14 billion for an approximately 49% stake in Scale AI, valuing the company at about $29 billion. Scale AI’s revenue is estimated to have approached $870 million in 2024.
They pay human workers $15-25/hr to label images, transcribe audio, and annotate sensor data. Then they sell that labeled data to AI companies at massive markup.
The margin: Revenue approaching $1B while labor costs are a fraction of total expenses. The difference is captured as profit, not paid to the workers who created the value.
Appen: 1 Million “AI Training Specialists” Globally
Appen operates a gig platform where workers label data remotely. They’re paid per task—often pennies for each labeled image or transcription. Appen’s clients include Microsoft, Adobe, and major robotics firms.
The economics: Appen’s market cap peaked at over $1 billion. Workers receive task-based payments with no governance rights over the datasets they create. Once labeled, that data is resold to multiple clients.
Sama: “Ethical AI” That Still Extracts Worker Value
Sama positions itself as socially responsible, employing workers in Kenya, Uganda, and India at better-than-local wages. They provide AI training data to companies like Microsoft, Walmart, and Continental (automotive).
The pitch: “We help people lift themselves out of poverty through AI work.”
The reality: Workers get $2-9/hour depending on location. The datasets they create sell for thousands to millions per contract. Workers see none of the upside if their labeled data proves particularly valuable, and have no ongoing rights to data they annotated.
The True Cost of Automation (And Why That’s Appropriate)
Here’s an economic reality that needs acknowledgment:
When creator/worker data costs are properly accounted for, automation costs will rise substantially—likely at least double current estimates.
This isn’t bad news. It’s appropriate pricing of the true cost of automation.
Currently, automation economics assume near-zero cost for human expertise used in training. When that expertise is properly licensed—when workers receive fair compensation for the knowledge that makes robots work—automation becomes more expensive.
This is a feature, not a bug.
Policy implications:
Nations that protect workers and maintain wage floors should consider:
- Tariffs on automation products from jurisdictions that don’t require worker consent or compensation for training data
- Import restrictions on AI systems trained on uncompensated worker data
- Proceeds earmarked for displaced workers and re-skilling programs
This isn’t protectionism—it’s enforcement of fair labor standards in the AI economy, just as tariffs can enforce environmental standards or prevent dumping.
Acknowledging the limits: This slows automation but doesn’t stop it. Eventually, as robots generate their own training data and synthetic data improves, human data premiums erode. Worker licensing buys time and captures transitional value—it’s not a permanent solution to technological unemployment. That requires broader wealth redistribution.
Guilds: The Faster Path for Specialized Workers
While unions navigate slow-moving collective bargaining processes, guilds can act faster to establish data governance norms.
Professional guilds—think SAG-AFTRA for actors, but applied to technical and creative fields—can move faster than formal bargaining units: they can set licensing terms, consent frameworks, and pricing guidelines for their members immediately, without waiting on a certification or negotiation cycle.
Guild + Union Coordination:
The optimal structure combines guild agility with union capital:
- Guilds set standards: Technical licensing terms, consent frameworks, pricing guidelines
- Unions enforce through collective bargaining: CBA provisions that incorporate guild standards
- Both pursue board seats: Guilds target AI companies directly; unions use pension capital for broader governance
Rapid-Response Guild Committees:
Guilds should establish standing committees with authority to:
- File complaints with labor boards and regulators
- Initiate litigation against unauthorized data use
- Run pilot attribution and royalty registries
- Negotiate directly with AI companies on behalf of members
Target timeline: Stand up functional guild committees within 90–180 days.
180-Day Roadmap: How Unions Launch Worker Data Licensing NOW
The transition to worker data governance is already underway. Unions that act in the next 6 months will shape the market. Those that wait will find the terms already set by others.
Phase 1: Foundation (Days 0–60)
Consent and Provenance Infrastructure:
- Deploy opt-in consent frameworks for pilot groups (500-1000 workers)
- Implement provenance tagging for all data capture (unique IDs, timestamps, licensing terms)
- Establish secure registry for worker-contributed data with version control
- Partner with technical providers (Copyright.sh, blockchain provenance platforms) for infrastructure
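The provenance-tagging step above can be sketched in a few lines of code: wrap each raw capture in an envelope carrying a content-derived unique ID, a timestamp, and the licensing terms. This is an illustrative assumption of what such a tag might look like, not Copyright.sh's or any platform's actual schema; all function and field names here are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def tag_record(worker_id: str, payload: dict, license_terms: str) -> dict:
    """Wrap a raw data capture in a provenance envelope:
    content-derived unique ID, timestamp, and licensing terms."""
    body = json.dumps(payload, sort_keys=True)
    # The ID depends only on worker + content, so it is reproducible
    # and survives re-ingestion into other systems.
    record_id = hashlib.sha256(f"{worker_id}:{body}".encode()).hexdigest()[:16]
    return {
        "record_id": record_id,
        "worker_id": worker_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "license": license_terms,  # e.g. "inference-only, no-redistribution"
        "payload": payload,
    }

# A registry keyed by record_id gives every downstream use a stable handle.
registry: dict[str, dict] = {}
rec = tag_record("worker-0042", {"task": "weld-seam", "frames": 120},
                 "inference-only")
registry[rec["record_id"]] = rec
```

Because the ID is derived from content rather than assigned sequentially, two independent systems tagging the same capture arrive at the same handle, which is what makes cross-company registries feasible later in the roadmap.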
Governance Preparation:
- Draft shareholder proposals for AI companies in pension fund portfolios
- Identify target companies for board seat campaigns
- Develop union investment policy language on AI data practices
- Coordinate with other institutional investors on proxy voting strategy
Legal Foundation:
- Draft model CBA data-use clauses for next negotiation cycle
- File initial complaints where clear consent violations exist
- Establish legal defense fund for data rights litigation
- Engage labor law firms with AI/technology expertise
Phase 2: Pilot Licensing (Days 60–120)
Employer Pilot Programs:
- Launch licensing pilot with at least one employer per sector
- Test per-use and inference-time billing mechanisms
- Implement metering and usage reporting
- Run first royalty distribution to pilot participants
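The metering and first-royalty-distribution steps above can be sketched as follows. The rate is a made-up illustration (actual per-inference fees are a negotiated term), and the record-to-worker mapping is assumed to come from the registry:

```python
from collections import Counter

PER_1000_CALLS = 0.50  # illustrative rate: $0.50 per 1,000 inference calls

def meter_usage(call_log: list[str]) -> Counter:
    """Count inference calls per contributing record_id."""
    return Counter(call_log)

def distribute(usage: Counter, record_to_worker: dict[str, str]) -> dict[str, float]:
    """Pay each worker for the inference calls that touched
    records they contributed."""
    payouts: dict[str, float] = {}
    for record_id, calls in usage.items():
        worker = record_to_worker[record_id]
        payouts[worker] = payouts.get(worker, 0.0) + calls / 1000 * PER_1000_CALLS
    return payouts

usage = meter_usage(["r1"] * 3000 + ["r2"] * 1000)
print(distribute(usage, {"r1": "worker-a", "r2": "worker-b"}))
# → {'worker-a': 1.5, 'worker-b': 0.5}
```

Even this toy version surfaces the pilot's real questions: who holds the call log, how often it is reconciled, and what happens when a call touches records from multiple workers.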
Technical Validation:
- Verify provenance tracking through full data lifecycle
- Test audit mechanisms with employer cooperation
- Measure licensing compliance rates
- Document cost/benefit for both workers and employers
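One way to verify provenance through the full data lifecycle, as the validation checklist above calls for, is a hash chain: each lifecycle event (capture, labeling, sale, training run) commits to the hash of the previous event, so an auditor who recomputes the chain detects any retroactive tampering. A minimal sketch, illustrative rather than any specific platform's mechanism:

```python
import hashlib

def chain_step(prev_hash: str, event: str) -> str:
    """Commit a lifecycle event by hashing it with the previous hash."""
    return hashlib.sha256(f"{prev_hash}|{event}".encode()).hexdigest()

def build_chain(events: list[str]) -> list[str]:
    """Hash every event into a chain rooted at a fixed genesis value."""
    hashes, h = [], "GENESIS"
    for event in events:
        h = chain_step(h, event)
        hashes.append(h)
    return hashes

def audit(events: list[str], recorded: list[str]) -> bool:
    """Recompute the chain from the claimed events and compare against
    the recorded hashes; one altered event breaks all later hashes."""
    return build_chain(events) == recorded

log = ["captured:worker-0042", "labeled:vendor-x", "licensed:inference-only"]
recorded = build_chain(log)
assert audit(log, recorded)                                      # untampered
assert not audit(["captured:worker-9999"] + log[1:], recorded)   # tampered
```

The design choice matters for the audit mechanism bullet above: the employer can hold the event log while the union (or a third-party auditor) holds only the recorded hashes, and neither side can silently rewrite history.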
Governance Execution:
- Submit shareholder proposals at spring annual meetings
- Execute first coordinated proxy votes on AI data issues
- Begin negotiations for board observer seats where stakes justify
Phase 3: Scale and Standardization (Days 120–180)
Sector Consortium Formation:
- Expand pilots to multiple employers per sector
- Form cross-union working groups for standard terms
- Publish standardized licensing grammar for worker data (compatible with Copyright.sh format)
- Create sector-wide registries for cross-company licensing
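A "standardized licensing grammar" ultimately means machine-readable terms that any party can check mechanically. The descriptor below is a hypothetical illustration of the idea only; the field names are invented for this sketch and are not Copyright.sh's actual format:

```python
# Hypothetical machine-readable license descriptor; every field name
# here is illustrative, not a real schema.
license_terms = {
    "licensor": "Sector Consortium / Local 123",
    "data_type": "assembly-technique-traces",
    "permitted": ["inference"],
    "prohibited": ["redistribution", "foundation-model-training"],
    "fee_per_1000_inferences_usd": 0.50,
    "attribution": "registry-id",
    "term_months": 12,
}

def permits(terms: dict, use: str) -> bool:
    """Standard terms make compliance checks mechanical:
    a use must be explicitly permitted and not prohibited."""
    return use in terms["permitted"] and use not in terms["prohibited"]

assert permits(license_terms, "inference")
assert not permits(license_terms, "redistribution")
```

The point of publishing one grammar sector-wide is exactly this: an AI company's compliance tooling can evaluate any union's terms without a bespoke legal reading per contract.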
Public Standard Terms:
- Publish reference licensing agreements for common use cases
- Establish audit process standards and certification
- Define compliance requirements for AI companies
- Create public pricing guidelines by data type and use case
Policy Engagement:
- Submit recommendations to labor boards on data as bargainable term
- Engage legislators on worker data governance legislation
- Coordinate with international unions on cross-border standards
- Build coalition with consumer groups, academics, and civil society
Policy Appendix: Inference-Time Licensing Template
For unions and guilds ready to implement, here’s a template framework for inference-time licensing:
Standard License Terms:
```
WORKER DATA INFERENCE LICENSE

Licensor: [Union/Guild Name] on behalf of contributing workers
Licensee: [AI Company Name]

1. GRANT AND SCOPE
   a. Licensed Data: Worker-contributed [type] data as registered in [Registry Name]
   b. Permitted Uses:
      - Model inference in production systems: LICENSED
      - Model fine-tuning: LICENSED with additional fee
      - Redistribution of raw data: PROHIBITED
      - Training foundation models: SEPARATE LICENSE REQUIRED

2. USAGE TRACKING AND REPORTING
   a. Licensee shall maintain logs of inference calls using Licensed Data
   b. Attribution to source workers via registry ID required
   c. Quarterly usage reports required

3. FEES AND ROYALTIES
   a. Per-inference fee: $[X] per 1,000 inference calls
   b. Fine-tuning fee: $[X] per model fine-tuned
   c. Revenue share: [X]% of products substantially dependent on Licensed Data
   d. Minimum annual payment: $[X]

4. AUDIT RIGHTS
   a. Licensor may audit usage logs with 30 days notice
   b. Third-party auditor acceptable by mutual agreement
   c. Non-compliance triggers 2x remediation payment

5. TERM AND TERMINATION
   a. Initial term: 12 months
   b. Auto-renewal unless 90 days notice
   c. Immediate termination for material breach

6. CONSENT AND WARRANTIES
   a. All Licensed Data includes verified worker consent
   b. Licensee warrants not to circumvent consent mechanisms
   c. New data requires re-verification of consent status
```
This template can be customized by sector, data type, and negotiating position.
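The template's fee schedule can be made concrete with a small calculator. The numbers below are illustrative placeholders for the $[X] blanks, not recommended rates; the 2x multiplier mirrors the non-compliance remediation clause:

```python
def annual_invoice(inference_calls: int, models_fine_tuned: int,
                   dependent_revenue: float, *,
                   per_1000_fee: float = 0.50,
                   fine_tune_fee: float = 10_000.0,
                   revenue_share: float = 0.02,
                   minimum_annual: float = 50_000.0,
                   compliant: bool = True) -> float:
    """Annual license fee per the template: per-inference fee plus
    fine-tuning fees plus revenue share, subject to a minimum annual
    payment; non-compliance doubles the bill (remediation clause)."""
    fee = (inference_calls / 1000 * per_1000_fee
           + models_fine_tuned * fine_tune_fee
           + dependent_revenue * revenue_share)
    fee = max(fee, minimum_annual)
    return fee if compliant else 2 * fee

# 10M inference calls, 2 fine-tunes, $1M in dependent product revenue:
# 5,000 + 20,000 + 20,000 = 45,000, below the floor, so the minimum applies.
print(annual_invoice(10_000_000, 2, 1_000_000))
# → 50000.0
```

Note how the minimum-annual-payment clause interacts with the metered fees: it guarantees workers a floor even in a low-usage year, which is why it belongs in the template at all.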
The Objections (And Why They’re Partially Right)
“Workers already get paid for their labor. Why should they get paid twice?”
The analogy: Musicians already get paid to record songs. Then they get streaming royalties. Actors get paid to film shows. Then they get residuals. Authors get paid advances. Then they get royalty checks.
The principle: When your work generates ongoing value, you should receive ongoing compensation. AI training data has ongoing value every time a model retrains or—more importantly—every time a model runs inference.
The caveat: This works best for high-value, hard-to-replicate expertise. If your work is easily standardized, licensing has less economic power.
“Companies own the data because they own the workplace equipment that captures it.”
The counter: By that logic, record labels would own musicians’ performance rights because they own the recording studios. But copyright law recognizes that creative output belongs to creators, not to whoever owns the equipment.
The distinction: Employers own business process artifacts (documented procedures, standardized workflows). Workers have governance rights over person-level performance traces (individual techniques, tacit knowledge, undocumented expertise).
The caveat: Courts haven’t definitively ruled on this. It’s an open legal question that unions must push through collective bargaining.
“This will make automation more expensive and slow down progress.”
The response: Licensing music didn’t kill Spotify. It made streaming sustainable by ensuring artists could afford to keep creating. Similarly, worker data licensing makes automation sustainable by ensuring workers can afford the transition.
The reality: Yes, it raises costs. That’s appropriate pricing of true automation costs. Faster automation without compensation creates catastrophic social costs.
The caveat: Eventually, as robots generate their own training data, human data premiums erode. This is transitional value extraction, not permanent protection.
“Workers will never organize effectively around data ownership.”
The precedent: SAG-AFTRA struck successfully from July 14 to November 9, 2023 over AI usage and won contractual protections. The Writers Guild struck from May 2 to September 27, 2023 over AI in screenwriting and won. Unions ARE already organizing around automation and data issues.
The reality: The infrastructure doesn’t exist yet, but neither did music streaming royalties before Spotify forced their creation. Markets adapt when there’s economic and political pressure. Unions provide both.
The caveat: This is genuinely hard. It requires technical infrastructure, legal innovation, international coordination, and sustained organizing. Many attempts will fail. The question is whether unions try or surrender by default.
Why Copyright.sh Cares About Worker Data (Beyond Web Licensing)
Copyright.sh’s mission is “fair compensation when AI learns from human expertise.” We started with web content creators.
But worker data is the same problem at larger scale. If we solve licensing for bloggers, journalists, and photographers, the same infrastructure can extend to factory workers, nurses, and drivers.
Our role isn’t to build Worker Data PROs—unions should do that. But the technical primitives we’re developing—licensing grammars, usage tracking, payment automation, provenance verification—apply equally to:
- Web content (what we do now)
- Worker-generated training data (what unions could do)
- Any human expertise that AI learns from (the general case)
We want to see worker data collectives succeed because every creator—whether they create blog posts or create assembly techniques—deserves governance rights and compensation when AI learns from their work.
What You Can Do (Even If You’re Not in a Union)
If you’re a worker in a high-automation-risk field:
Assert your rights immediately. You have not consented to being automated. Make this explicit to your employer in writing. Document any data capture systems monitoring your work.
Ask your union about data governance. If your local doesn’t have answers, escalate to regional or national leadership. Make this a contract priority in next negotiations.
Document your expertise. Even without formal data capture, start thinking about your job skills as intellectual property. What techniques do you have that would be valuable for AI to learn? How would you license that if you could?
Connect with others. Worker data licensing will only happen if enough workers demand it. Join or form discussion groups about automation and data rights. Build collective awareness before automation completes.
If you’re a union organizer or labor activist:
Start the 180-day clock now. The roadmap above is actionable. Identify your first pilot group, first employer partner, and first shareholder proposal targets this week.
Study the music industry PRO model. ASCAP, BMI, and SESAC provide proven templates for collective licensing and royalty distribution. The infrastructure exists; it just needs adaptation.
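The core of the ASCAP/BMI model is pro-rata distribution: the collected royalty pool is split in proportion to each member's tracked usage. A simplified sketch of how that adapts to worker data (the pool size and play counts are made up for illustration):

```python
def pro_rata_payouts(pool: float, usage: dict[str, int]) -> dict[str, float]:
    """PRO-style distribution: split a collected royalty pool in
    proportion to each member's tracked usage share."""
    total = sum(usage.values())
    return {member: pool * n / total for member, n in usage.items()}

# A $100,000 quarterly pool split across three workers whose data
# accounted for 500, 300, and 200 tracked uses respectively:
print(pro_rata_payouts(100_000, {"w1": 500, "w2": 300, "w3": 200}))
# → {'w1': 50000.0, 'w2': 30000.0, 'w3': 20000.0}
```

This is the key structural difference from per-use billing: the PRO negotiates the pool with licensees up front, then distribution among members becomes an internal accounting problem, which is exactly the adaptation work a Worker Data PRO would inherit.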
Partner with data rights organizations. Groups working on data governance (like Data & Society, AI Now Institute, Mozilla Foundation) have legal expertise and policy connections.
Pilot small-scale experiments. Find a sympathetic employer willing to try worker data licensing in limited scope. Prove the concept works before negotiating industry-wide.
If you’re in tech or AI:
Support worker data governance. If you work at a robotics or AI company, advocate internally for data licensing programs. Point to music streaming as precedent—it worked, and so can this.
Build the infrastructure. We need better tools for data provenance, licensing, and royalty distribution. Open source projects that solve these problems make worker data PROs feasible.
Buy ethically sourced training data. If your company purchases data from labeling services, ask about worker compensation. Demand transparency about how much of your payment reaches the people who created the data.
The Larger Question: What Do We Owe the People Who Teach Machines?
AI doesn’t learn from nothing. Every capability—recognizing faces, translating languages, driving cars, diagnosing diseases—comes from human expertise converted to training data.
Musicians get paid when machines play their songs. Shouldn’t workers get paid when machines learn their skills?
The technology to make this happen exists. The legal frameworks are evolving. What’s missing is political will and collective organization. Unions have an opportunity to reinvent their purpose: from protecting jobs (which automation makes impossible) to protecting the economic value of human knowledge (which automation makes essential).
Worker Data PROs aren’t a perfect solution. They won’t stop automation or save every job. The obstacles are real—international arbitrage, measurement challenges, employer resistance, and the long-run trend toward synthetic data. But during the 5-15 year window when human expertise still outperforms robots, worker data licensing could ensure that the humans whose expertise builds the robots get fairly compensated for that contribution.
The window is closing. Unions and guilds that act in the next 180 days will shape the market. Those that wait will accept terms set by others.
Because if we don’t build these coordination mechanisms now—if we let tech companies extract worker knowledge for free until automation completes—we’ll look back and realize we gave away the most valuable asset workers have: the knowledge that makes them worth automating.
—
*This article represents Copyright.sh’s position on worker data governance as complementary to our web content licensing mission. We build the technical infrastructure for fair AI licensing; unions and workers must build the organizational power to demand it.*