The Future of Workers Unions: From Job Protection to Data Ownership
Musicians don’t work for Spotify, but they get paid every time their songs stream. Actors don’t work for Netflix, but they receive residuals when their shows are watched. What if factory workers, customer service reps, and logistics coordinators got royalties every time their expertise trained—or ran—a robot?
This isn’t science fiction. It’s the logical evolution of collective bargaining in an age when AI learns by watching humans work. Just as ASCAP and BMI collect performance royalties for songwriters, “Worker Data PROs” (Performance Rights Organizations) could license human expertise to robotics and AI companies.
The question isn’t whether workers should have governance rights over the data that automates their jobs. The question is whether unions will reinvent themselves to make it happen—or whether tech companies will capture that value by default.
This transition is already underway. The window for action is measured in months, not years.
But let’s be clear: this is hard. The obstacles are real—international arbitrage, measurement challenges, employer resistance, and the long-run trend toward robots generating their own training data. Worker data licensing won’t stop automation or save every job. What it can do is ensure that workers with high-skill, high-tacit-knowledge expertise capture fair value during the 5-15 year window when human data still matters, while building coordination mechanisms that benefit the broader labor movement.
The Core Principle: No Substitution Without Consent
Before diving into mechanisms, let’s be clear about the fundamental right at stake:
No one’s work should be used for substitutional, competitive training without consent and remuneration.
This isn’t a new principle—it’s the logical extension of existing voice and likeness rights. Actors have long controlled how their image and voice are used. Musicians receive residuals when their performances are replayed. Authors get royalties when their words are reprinted.
Worker expertise captured as training data is no different. When a company records how a skilled welder works, then uses that recording to train a robot that replaces the welder, they’ve converted human expertise into automation capital. The worker deserves consent and compensation—just as an actor deserves compensation when their likeness trains an AI performance model.
What’s needed in every Collective Bargaining Agreement: Explicit data-use clauses that require:
- Informed opt-in consent for any data capture intended for automation
- Licensing terms specifying compensation for training, fine-tuning, and inference use
- Audit rights allowing union review of how data is used
- Prohibition on substitutional automation without negotiated terms
Three Rights Framework: Separating Governance, Bargaining, and Economic Rights
Before we discuss mechanisms, we need clarity about what worker data rights actually mean. Current discussions blur three distinct categories:
1. Governance Rights (Consent & Control)
What it means: Workers have the right to know when their performance data is being captured and to grant or withhold consent for specific uses.
Not about money: This is about autonomy and informed consent—the right to say “no” to being automated without your permission.
Legal basis: Privacy laws (GDPR, CCPA), workplace monitoring regulations, and collective bargaining over working conditions.
Example: A hospital nurse can refuse to have their patient interaction techniques recorded for AI training, regardless of compensation offered.
2. Bargaining Rights (Collective Terms)
What it means: Workers collectively negotiate the terms under which their expertise can be licensed, just as unions negotiate wages and benefits.
Not individual deals: Like minimum wage floors, licensing terms work when set collectively, not when each worker bargains alone.
Legal basis: National Labor Relations Act (NLRA) treats workplace monitoring and data use as mandatory bargaining subjects.
Example: UAW negotiates that any robotics training using autoworker data requires union approval, third-party audits, and royalty rates tied to production volume.
3. Economic Rights (Compensation)
What it means: Workers receive ongoing payment when AI systems trained on their expertise generate value—through training fees, fine-tuning royalties, or inference-time payments.
Not one-time buyouts: Like music streaming residuals, payment tied to actual usage over time.
Legal basis: Contract law, intellectual property principles, and emerging licensing frameworks (see Copyright.sh for web content precedent).
Example: When a robot trained on a machinist’s techniques runs on a factory floor, that machinist receives micropayments tied to production output.
Why this framework matters: Blurring these categories causes confusion. You can have governance rights (consent) without economic rights (payment). You can have collective bargaining without individual compensation. The strongest position combines all three: collective consent frameworks, union-negotiated terms, and individual revenue streams tied to usage.
The Industry Built on Human Labor (That Workers Don’t Control)
AI doesn’t emerge from algorithms alone. Every robot learning to pick warehouse boxes, every chatbot learning to handle customer complaints, every autonomous forklift learning to navigate factory floors—they all learn from human-labeled training data.
The data labeling industry is worth an estimated $2-4 billion in 2024 (estimates vary by source), growing at approximately 25-29% CAGR toward $15-17 billion by 2030 (Grand View Research, Mordor Intelligence). Companies like Scale AI, Appen, Sama, and Labelbox employ millions of workers globally to annotate images, transcribe conversations, and label sensor data.
Here’s what that looks like in practice:
Manufacturing & Robotics:
- Human workers demonstrate how to assemble components
- Cameras capture their movements, annotated by data labelers
- Robots learn from thousands of labeled examples
- Vendors sell these annotation services to companies like Continental and Microsoft, with SLAs promising 99%+ accuracy—though actual achieved rates depend on task complexity and annotation methodology
Customer Service:
- Service reps handle support tickets while AI systems watch
- Their responses get labeled: helpful/unhelpful, empathetic/cold, accurate/wrong
- AI chatbots train on this labeled conversation data
- Human reps eventually train their own replacements
Logistics & Warehousing:
- Forklift operators navigate complex environments
- Sensor data (cameras, LIDAR, GPS) gets annotated frame-by-frame
- Autonomous vehicles learn safe navigation from human expertise
- Workers teach the systems that will eliminate their positions
The irony is brutal: workers generate the most valuable data at the exact moment their jobs become automatable. And under current arrangements, they capture none of that value.
But here’s the hard truth: Not all work generates equally valuable data. High-skill, high-tacit-knowledge workers—elite welders, ICU nurses, senior machinists, expert troubleshooters—possess expertise that’s difficult to document and expensive to replicate. This is where licensing creates real value.
Conversely, if your work is already well-documented in training manuals and standard operating procedures, the marginal value of your performance data is lower. Employers will simply improve their documentation rather than pay royalties. This isn’t defeatist—it’s strategic targeting. Start with workers whose tacit knowledge commands premium prices.
Reframing Value: From Training Data to Inference Revenue
Here’s what most discussions of worker data miss: training is only part of the stack.
The real value isn’t locked up at training time—it’s realized during fine-tuning and inference. This is where sector-specific expertise becomes actionable, and where licensing, attribution, and metering can actually be enforced.
Why inference matters more than training: a training run happens once, but inference happens every time the model is deployed and used. That makes it a recurring, meterable event that licensing terms can attach to.
Technical mechanisms that enable inference-time licensing:
- Enterprise adapters: Custom model adapters trained on licensed worker data, deployed only for licensed clients
- Retrieval layers: Worker expertise stored in vector databases, retrieved at inference time with attribution and billing per-query
- Domain heads: Specialized output layers that bind model behavior to consent and remuneration agreements
This reframing changes the economics entirely. Instead of a one-time payment for training data, workers receive ongoing revenue share tied to model deployment and usage. Every time a robot uses techniques learned from a specific worker’s data, that worker earns a micropayment.
Copyright.sh already does this for web content—meta tags that specify licensing terms, HMAC-versioned usage logging, and per-inference royalty tracking. The same infrastructure applies to worker data.
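As a rough illustration of what that logging layer could look like, here is a minimal Python sketch of HMAC-signed per-inference usage records. The key, royalty rate, field names, and function names are all hypothetical assumptions for illustration, not Copyright.sh's actual interface.

```python
import hashlib
import hmac
import json
import time

# Hypothetical shared secret between the AI deployer and the licensing
# organization; in practice this would be provisioned per contract.
SECRET_KEY = b"shared-secret-between-deployer-and-pro"
ROYALTY_PER_CALL = 0.0001  # illustrative USD rate per inference call

def log_inference(dataset_id: str, worker_ids: list[str], timestamp: float) -> dict:
    """Record one inference call against a licensed worker dataset,
    signing the entry so the report can be audited later."""
    entry = {
        "dataset_id": dataset_id,
        "worker_ids": worker_ids,  # contributors whose data was used
        "timestamp": timestamp,
        "royalty_usd": ROYALTY_PER_CALL,
    }
    payload = json.dumps(entry, sort_keys=True).encode()
    entry["signature"] = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return entry

def verify_entry(entry: dict) -> bool:
    """The PRO re-computes the HMAC to check a reported entry wasn't altered."""
    body = {k: v for k, v in entry.items() if k != "signature"}
    payload = json.dumps(body, sort_keys=True).encode()
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()
    return hmac.compare_digest(entry["signature"], expected)

entry = log_inference("weld-techniques-v3", ["w-1042", "w-2177"], time.time())
assert verify_entry(entry)
```

The signature means a deployer's usage report can be spot-audited: any retroactive change to the royalty amount or worker attribution breaks verification.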
Realistic timeline: This creates value during the transitional period when human expertise still outperforms synthetic data—roughly 5-15 years depending on the domain. Eventually, robots will generate their own training data through experience, and human data premiums will erode. Worker data licensing isn’t “royalties forever”—it’s extracting fair value during the window when human knowledge provides competitive advantage.
Why Unions Should Build Worker Data Licensing Infrastructure
The parallel to music streaming is precise. Before Spotify, piracy threatened musicians’ livelihoods. Performance Rights Organizations (PROs) like ASCAP, BMI, and SESAC emerged to track usage, negotiate blanket licenses, and collect and distribute royalties to rights holders.
Worker Data PROs could do exactly the same—but unions already have strengths PROs don’t: collective bargaining rights, pension capital, and demonstrated strike power in chokepoint sectors.
Acknowledging union power where it exists: Unions in ports (ILWU), airlines (pilots, flight attendants), entertainment (SAG-AFTRA, WGA), public sector (teachers, transit), and skilled trades (electricians, plumbers) control genuine chokepoints. These unions have proven strike leverage.
The data strike advantage: What worker data licensing adds is NEW leverage in sectors where traditional strikes are weakening. When automation threatens to replace workers, “we’ll stop showing up” loses power. But “we’ll withhold the training data you need to automate” creates bargaining power BEFORE replacement happens. This is complementary to traditional union strength, not a replacement.
1. Tracking: Monitor When Worker Data Trains AI Systems
Just as BMI tracks radio plays and Spotify streams, Worker Data PROs would track when companies use worker-generated data. This isn’t technically complex—Copyright.sh already does it for web content with meta tags and HMAC versioning.
In manufacturing, this looks like:
- Workers wear sensors that capture their techniques (with consent)
- Data gets tagged with a unique ID and licensing terms
- When robotics companies train models or run inference, they report usage (just like Spotify reports streams)
- PRO calculates royalties based on data consumption—including ongoing inference calls
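The registry-and-calculation steps above can be sketched in a few lines. The `LicensedDataset` fields, the rates, and the royalty formula are illustrative assumptions, not a description of any existing system.

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class LicensedDataset:
    """A captured worker dataset tagged with a unique ID and licensing terms."""
    worker_id: str
    description: str
    train_rate_usd: float       # hypothetical rate per training run
    inference_rate_usd: float   # hypothetical rate per 1,000 inference calls
    dataset_id: str = field(default_factory=lambda: str(uuid.uuid4()))

def royalty_owed(ds: LicensedDataset, training_runs: int, inference_calls: int) -> float:
    """Convert reported usage (like Spotify's reported streams) into a royalty."""
    return (training_runs * ds.train_rate_usd
            + (inference_calls / 1000) * ds.inference_rate_usd)

ds = LicensedDataset("w-1042", "assembly technique capture", 500.0, 2.0)
owed = royalty_owed(ds, training_runs=3, inference_calls=250_000)
# 3 runs at $500 plus 250k calls at $2 per thousand: $2,000 owed
```

The point of the sketch is that once data carries an ID and terms, royalty accounting is ordinary metering, not exotic technology.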
Administrability challenge: Can this actually be measured and enforced? Let’s be honest about what’s feasible:
What CAN be audited:
- Training data provenance through cryptographic tagging and version control
- Fine-tuning jobs using worker-specific datasets (logged in training pipelines)
- Inference calls in enterprise deployments with contractual audit rights
- Model performance attribution when worker data creates measurable accuracy gains
What’s HARDER to audit:
- Training data mixed into massive web-scraped datasets (requires watermarking)
- Inference in closed systems without external monitoring
- Attribution when multiple data sources contribute to outputs
- Circumvention through synthetic data generation or foreign training facilities
Mitigation strategies:
- Start with regulated industries (healthcare, aviation, defense) where audit trails already exist
- Focus on high-value, specialized datasets where provenance matters for quality
- Use procurement rules to require licensing compliance as condition of sale
- Build guild/union certification programs that buyers demand (like Fair Trade labels)
Realistic expectation: We won’t catch 100% of usage. But we don’t need to—ASCAP doesn’t track every radio play either. What matters is establishing the norm that licensed data is required, creating enough enforcement to make compliance cheaper than evasion, and building mechanisms that work for 70-80% of commercial use cases.
2. Licensing: Negotiate Collective Terms
Individual workers have zero leverage negotiating with Boston Dynamics or Tesla’s robotics division. But a union representing 50,000 autoworkers whose assembly techniques are captured in training datasets? That’s a bargaining position.
But here’s the coordination problem: if one union licenses data at $X and another licenses similar data at $X/2, buyers gravitate to the cheaper source. Multiple unions without coordination creates a race to the bottom.
Coordination mechanisms to prevent this: shared rate floors across unions, a common registry of licensed datasets, and joint certification standards, so buyers can’t play one union’s members against another’s.
Licensing models could work several ways:
Model A: Per-Use Licensing (like TollBit for web content)
- Robotics companies pay per training run and per inference call using worker data
- Rates negotiated by union, scaled by data volume
- Workers receive royalties proportional to their data’s usage
Model B: Subscription Access (like RSL Collective)
- Companies pay annual fees for access to union training data libraries
- Revenue distributed to workers based on contribution
- Tiered pricing: research use vs. commercial deployment vs. inference at scale
Model C: Performance-Based (emerging AI licensing model)
- Base licensing fee + royalty if worker data improves robot accuracy beyond benchmarks
- Aligns incentives: better data = higher compensation
- Union negotiates performance metrics and audit rights
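The three models above reduce to simple payout formulas. Every rate, threshold, and function name below is a hypothetical illustration, not a proposed schedule.

```python
def per_use(training_runs: int, inference_calls: int,
            run_rate: float, call_rate: float) -> float:
    """Model A: pay per training run and per inference call."""
    return training_runs * run_rate + inference_calls * call_rate

def subscription(annual_fee: float, worker_share: float) -> float:
    """Model B: flat annual access fee; worker_share is the fraction
    distributed to contributing workers."""
    return annual_fee * worker_share

def performance_based(base_fee: float, accuracy: float,
                      benchmark: float, bonus_per_point: float) -> float:
    """Model C: base fee plus a royalty for each accuracy point
    the worker data adds above the negotiated benchmark."""
    bonus_points = max(0.0, accuracy - benchmark) * 100
    return base_fee + bonus_points * bonus_per_point

a = per_use(2, 1_000_000, 10_000.0, 0.0005)         # ≈ $20,500
b = subscription(100_000.0, 0.6)                     # ≈ $60,000 to workers
c = performance_based(25_000.0, 0.94, 0.90, 1_000)   # ≈ $29,000
```

Putting the models side by side makes the negotiating trade-off concrete: Model A scales with deployment, Model B is predictable, and Model C pays only when worker data demonstrably moves the benchmark.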
3. Collection: Distribute Royalties Like Streaming Residuals
This is where unions become data co-ops. Instead of individual workers tracking their data usage (impossible), the PRO:
- Maintains a registry of worker-contributed training data
- Receives bulk payments from AI companies
- Distributes royalties based on verified usage
- Handles accounting, auditing, and disputes
Just as ASCAP doesn’t require Taylor Swift to personally track every Spotify stream, Worker Data PROs would handle the infrastructure while workers receive quarterly checks.
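The distribution step is a pro-rata split of a bulk payment by verified usage, analogous to streaming residuals. A minimal sketch, with made-up amounts and worker IDs:

```python
def distribute(bulk_payment: float, usage_by_worker: dict[str, int]) -> dict[str, float]:
    """Split one bulk payment from an AI company proportionally to each
    worker's verified usage count in the PRO's registry."""
    total = sum(usage_by_worker.values())
    if total == 0:
        return {worker: 0.0 for worker in usage_by_worker}
    return {worker: bulk_payment * count / total
            for worker, count in usage_by_worker.items()}

payouts = distribute(10_000.0, {"w-1042": 600, "w-2177": 300, "w-3310": 100})
# w-1042 receives $6,000; w-2177 $3,000; w-3310 $1,000
```

This is the same accounting shape ASCAP uses: the organization handles metering and disputes centrally, and individual members simply receive their share.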
Addressing the equity critique: What about non-union workers who get displaced but receive no royalties?
This is a real problem. Licensing structures that only benefit union members create two-tier systems where the organized get protection while the vulnerable get nothing. Partial solutions include sectoral licensing funds that pool royalties across an industry rather than paying only individual contributors, and extending negotiated licensing terms to whole sectors rather than single bargaining units.
This isn’t either/or: Worker data licensing AND wealth redistribution (UBI, job guarantees, robust social safety nets) are both necessary. Licensing provides bargaining leverage and transitional income; redistribution addresses systemic inequality and long-run automation. We need both.
4. Strategic Ownership: Pension Funds as Equity Weapons
Here’s the strategic advantage unions already have: many control significant pension capital. CalPERS (California Public Employees’ Retirement System) manages over $500 billion. Major union pension funds collectively control hundreds of billions more.
This creates a unique opportunity that unions should act on now:
Acquire Stakes in AI/Data Infrastructure Companies:
- Pension funds should pursue minority stakes with board seats in data labeling companies, robotics firms, and AI infrastructure players
- Deals can be leveraged or unleveraged depending on risk appetite—member votes should approve major moves
- Board representation enables direct enforcement of responsible data handling and worker compensation policies
- Example: A 5% stake in a data labeling company worth $10B = $500M position with potential board seat
Playbook for Strategic Ownership: use equity positions and board seats to require portfolio companies to:
– Disclose worker data usage in training and fine-tuning
– Implement consent frameworks before automation data capture
– Share revenue with workers whose data trains AI systems
Strike Threats That Actually Work:
- Traditional strikes lose power when robots replace workers
- But “we’ll withhold training data” hits companies before automation completes
- Can’t train robots without human expertise to learn from
- Data strikes can be surgical—targeted at specific product lines or customers
URGENT: Worker Consent and Immediate Union Action
Employees have not consented to being automated.
This bears repeating: the vast majority of workers being recorded, tracked, and analyzed have never explicitly consented to their data being used for training AI systems that will replace them.
Unions must immediately negotiate contract language that secures:
– Explicit consent for data capture
– Defined royalty rates for training and inference use
– Audit rights to verify how data is being used
– Revocation rights if terms are violated
Clarifying what workers own vs. what employers own:
- Employers own: Business process artifacts (SOPs, documented procedures, workflow diagrams, generalized best practices)
- Workers own: Person-level performance traces (individual movement patterns, problem-solving techniques, tacit knowledge that isn’t documentable)
Example: A hospital owns the clinical protocol for administering medication (documented process). The hospital doesn’t own the specific diagnostic instincts and patient interaction techniques of an experienced ICU nurse (tacit expertise).
This distinction determines what’s licensable. Documented, standardized processes have low royalty value—employers will just improve documentation. Undocumented, tacit expertise that’s hard to capture in manuals? That’s where licensing creates real value.
SPECIAL NOTE FOR CUSTOMER SERVICE WORKERS
Customer service is ground zero for AI automation. Chatbots are being trained right now on millions of conversation transcripts from human agents—often without explicit consent or compensation.
Customer service unions and workers should demand:
– Immediate moratorium on training AI using agent transcripts
– Retroactive compensation for data already used
– Ongoing royalties for any future use
– Audit access to AI training datasets
The legal theory is straightforward: your conversation techniques, empathy patterns, and problem-solving approaches are your expertise. Using them to train a replacement without consent is appropriation of your professional skills.
Acknowledging second-order effects: Employers may respond to licensing demands by reducing worker discretion, minimizing transparency about data collection, or reclassifying work to avoid royalty obligations. Unions must anticipate this and build mitigations into CBAs: protections against retaliation, mandatory disclosure of all data collection systems, job quality standards that prevent deskilling to evade licensing costs.
Best Entry Points: Where Worker Data Licensing Works First
Not all sectors are equally ready for worker data licensing. The strongest starting points combine:
- High-skill, high-tacit-knowledge work (hard to document in manuals)
- Existing regulatory frameworks (audit trails, liability requirements)
- Guild-like professional structures (credentialing, peer review)
- Buyers who value provenance and quality (willing to pay premiums for ethical sourcing)
Top Candidates:
Healthcare (Nurses, Diagnosticians, Specialists)
Why it works:
- Patient safety regulation requires documented training data provenance
- High-liability work (medical AI errors create legal exposure)
- Tacit expertise is core value (clinical judgment, bedside manner)
- Buyers (hospitals, device makers) already pay premiums for quality
Action: National Nurses United and SEIU negotiate consent frameworks in next contract cycles, pilot licensing with medical device companies training surgical robots.
Aviation (Pilots, Air Traffic Controllers, Mechanics)
Why it works:
- FAA certification requirements create natural audit mechanisms
- Safety-critical systems demand transparent training data
- Union chokepoint leverage (can’t fly without certified pilots)
- High compensation floors make royalties economically meaningful
Action: ALPA (Air Line Pilots Association) establishes licensing terms for simulator data and flight pattern analysis used in autonomous systems.
Creative/Technical Guilds (VFX Artists, Voice Actors, Editors)
Why it works:
- Guild structures already exist (VES, SAG-AFTRA voice division)
- Clear provenance (individual artist contributions are tracked)
- Buyers already negotiate licensing (studios, game companies)
- Rapid automation threat creates urgency
Action: Visual Effects Society creates standard licensing terms for training data from VFX workflows, negotiated into studio contracts.
Government/Defense Procurement
Why it works:
- Procurement rules can require ethical AI sourcing
- Buyers are mission-driven (not just profit-maximizing)
- Transparency requirements (public contracts, FOIA)
- Domestic data preferences (national security considerations)
Action: Federal acquisition regulations updated to require worker consent and licensing for any AI training data in government contracts, creating instant market for compliant datasets.
Why NOT start with easily standardized work: Warehouse picking, basic assembly, scripted customer service—these generate lower-value data because they’re easier to document and standardize. Start where tacit knowledge commands premium prices.
Which Workers Need This Most: Follow the Automation Money (But Acknowledge Reality)
Prioritizing by data value, automation market size, and existing union leverage:
Manufacturing & Assembly (UAW, IAM, Steelworkers)
- Why: Robotics companies pay premium rates for accurate assembly demonstrations from skilled tradespeople
- Data value: Video of skilled welders, machinists, assembly line workers teaching robots dexterity
- Market size: Industrial robotics ~$17B market growing ~10% annually
- Union leverage: UAW has significant pension assets ($50B+), history of effective strikes, chokepoint control in auto sector
- Reality check: Basic assembly work has lower licensing value; focus on specialty trades and high-skill operations
Logistics & Transportation (Teamsters, ILWU)
- Why: Autonomous vehicles and warehouse robots learn from human drivers and operators
- Data value: Forklift operation, delivery routes, loading dock procedures
- Market size: Warehouse automation market approaching $30B by 2026
- Union leverage: ILWU controls port chokepoints; Teamsters have substantial pension capital
- Reality check: Route optimization data has value; basic driving patterns less so
Healthcare & Clinical Work (SEIU, National Nurses United)
- Why: Medical AI needs labeled patient data, diagnostic procedures, care protocols
- Data value: Nurse workflow data, doctor diagnostic patterns, therapy techniques—high tacit knowledge
- Market size: Healthcare AI market projected at $190B by 2030
- Union leverage: NNU has proven strike effectiveness; healthcare has regulatory advantages
- Reality check: This is high-value territory—clinical judgment is hard to standardize
Customer Service & Hospitality (UFCW, UNITE HERE)
- Why: Chatbots and service AI train on human interaction data
- Data value: Conversation transcripts, conflict resolution techniques, empathy patterns
- Market size: Conversational AI market projected to reach $14B by 2025
- Union leverage: UFCW has 1.3M members, service sector generates massive training data volume
- Reality check: High volume but lower per-worker value; sectoral funds may be better than individual royalties
Addressing International Arbitrage: Why Can’t They Just Train Abroad?
The hard question: If US/EU workers demand licensing fees, what stops companies from training AI on Indian or Southeast Asian workers without paying royalties?
Short answer: Nothing stops them technically. But enforcement mechanisms exist if we build them:
1. Procurement Rules & Buyer Standards
Model: Like conflict-free minerals, Fair Trade coffee, or sustainable timber certification
Mechanism:
- Government contracts require “Ethical AI Training” certification
- Certified providers must prove worker consent and compensation
- Major buyers (Amazon, Walmart, hospitals) adopt voluntary standards
- Market pressure favors compliant datasets even when alternatives exist
Precedent: Apple’s supply chain audits, Patagonia’s fair labor certification—buyers DO pay premiums for ethical sourcing when reputation matters.
2. Data Localization & Import Restrictions
Model: EU data protection rules, China’s data sovereignty laws
Mechanism:
- AI systems trained on foreign data face import restrictions or tariffs
- Domestic deployment requires domestic training data (or licensed foreign data)
- National security rationale: critical infrastructure shouldn’t depend on foreign data
Precedent: GDPR already restricts cross-border data flows; extending to AI training data isn’t radical.
3. Training Data Standards & Certification
Model: ISO standards, organic food labels, LEED building certification
Mechanism:
- Industry consortia (unions + buyer coalitions) establish “Certified Ethical AI Training” standards
- Standards require:
– Documented worker consent at time of data capture
– Fair compensation tied to usage (not exploitative one-time payments)
– Third-party audits of compliance
– Supply chain transparency (where was data collected, under what terms?)
- Buyers seeking certification must source only certified data
- Market bifurcates: premium “certified” vs. commodity “uncertified”
Why this works: Just as “organic” food commands price premiums despite cheaper conventional alternatives, “ethically trained AI” can command premiums from buyers who value transparency, liability protection, and reputation.
4. Liability & Legal Risk
Model: Product liability law, medical malpractice
Mechanism:
- AI systems trained on unlicensed worker data face legal exposure when they fail
- “Your surgical robot was trained on data stolen from unconsenting nurses” becomes viable lawsuit
- Buyers prefer licensed data because it includes liability protection
Precedent: Getty Images suing Stability AI for training on unlicensed photos; class action suits against AI companies for copyright infringement.
Will this stop ALL foreign training? No. Commodity use cases will chase the lowest cost. But high-value, high-liability, reputation-sensitive applications (healthcare, aviation, defense, premium consumer products) will pay for licensed data when certification standards exist and legal risks are clear.
Strategic implication: Worker data licensing succeeds first in regulated, high-stakes sectors where provenance matters, THEN expands to commodity markets as standards become normalized.
The Legal Hooks Already Exist (Sort Of)
Worker data licensing isn’t starting from zero. Several legal frameworks provide partial foundation:
1. GDPR (Europe) & CCPA (California): Data Subject Rights
Under GDPR Articles 15-22, workers have rights to:
- Access their personal data
- Correct inaccuracies
- Delete data in some circumstances
- Object to automated decision-making
Gap: These laws focus on protection (opt-out, deletion) not commercialization (licensing, royalties). Workers can stop data use but can’t monetize it.
What’s needed: Amendments treating worker data as intellectual property with licensing rights, not just privacy rights.
2. Contract Law: Terms of Employment vs. Data Ownership
Currently, most employment contracts include language like “work product belongs to employer.” This covers physical outputs (manufactured goods) and intellectual outputs (designs, code).
Open question: Does “work product” include biometric data captured while working? Movement patterns? Conversation techniques?
Courts haven’t definitively ruled. Union-negotiated contracts could explicitly carve out data governance rights, similar to how actors’ contracts address residuals.
3. Collective Bargaining Agreements: Data as Negotiable Terms
The National Labor Relations Act (NLRA) requires employers to bargain in good faith over “wages, hours, and other terms and conditions of employment.”
Legal argument: Worker training data usage qualifies as a “condition of employment” subject to collective bargaining—just as safety conditions, scheduling, and monitoring practices are bargainable. Courts have consistently held that workplace monitoring is a mandatory bargaining subject.
Key precedent: Screen Actors Guild (SAG-AFTRA) struck from July 14 to November 9, 2023 and won AI usage protections including consent requirements and compensation for AI-generated performances using actors’ likenesses. The Writers Guild of America (WGA) struck from May 2 to September 27, 2023 (148 days) and won protections against AI writing tools, including disclosure requirements and restrictions on using writers’ work to train AI without consent.
These entertainment industry precedents establish that AI training data usage is negotiable—and that unions can win meaningful protections through collective action.
4. Trade Secret Law: Worker Expertise as Proprietary Knowledge
Skilled workers possess trade knowledge that arguably qualifies as trade secrets under the Defend Trade Secrets Act. A master welder’s technique, a senior nurse’s diagnostic instincts, a logistics coordinator’s routing optimization—these are valuable, non-obvious, and economically beneficial.
Legal hook: When companies capture this expertise as training data, they’re commercializing workers’ trade secrets. Workers could claim governance rights and demand licensing fees.
Challenge: Trade secret law typically protects employers’ secrets from departing employees, not the reverse. This framing is unconventional but arguable—and unions should push for CBA carve-outs that explicitly recognize worker expertise as worker-owned intellectual property.
Who Profits from the Data Intermediaries Right Now
To understand the upside of worker-owned licensing, look at who’s capturing value today:
Scale AI: The $29 Billion Data Giant
Scale AI provides “AI training data as a service” to companies like OpenAI, Toyota, and the U.S. military. In mid-2025, Meta invested roughly $14 billion for an approximately 49% stake in Scale AI, valuing the company at about $29 billion. Scale AI’s revenue is estimated to have approached $870 million in 2024.
They pay human workers $15-25/hr to label images, transcribe audio, and annotate sensor data. Then they sell that labeled data to AI companies at massive markup.
The margin: Revenue approaching $1B while labor costs are a fraction of total expenses. The difference is captured as profit, not paid to the workers who created the value.
Appen: 1 Million “AI Training Specialists” Globally
Appen operates a gig platform where workers label data remotely. They’re paid per task—often pennies for each labeled image or transcription. Appen’s clients include Microsoft, Adobe, and major robotics firms.
The economics: Appen’s market cap peaked at over $1 billion. Workers receive task-based payments with no governance rights over the datasets they create. Once labeled, that data is resold to multiple clients.
Sama: “Ethical AI” That Still Extracts Worker Value
Sama positions itself as socially responsible, employing workers in Kenya, Uganda, and India at better-than-local wages. They provide AI training data to companies like Microsoft, Walmart, and Continental (automotive).
The pitch: “We help people lift themselves out of poverty through AI work.”
The reality: Workers get $2-9/hour depending on location. The datasets they create sell for thousands to millions per contract. Workers see none of the upside if their labeled data proves particularly valuable, and have no ongoing rights to data they annotated.
The True Cost of Automation (And Why That’s Appropriate)
Here’s an economic reality that needs acknowledgment:
When creator/worker data costs are properly accounted for, automation costs will rise substantially—likely at least double current estimates.
This isn’t bad news. It’s appropriate pricing of the true cost of automation.
Currently, automation economics assume near-zero cost for human expertise used in training. When that expertise is properly licensed—when workers receive fair compensation for the knowledge that makes robots work—automation becomes more expensive.
This is a feature, not a bug.
Policy implications:
Nations that protect workers and maintain wage floors should consider:
- Tariffs on automation products from jurisdictions that don’t require worker consent or compensation for training data
- Import restrictions on AI systems trained on uncompensated worker data
- Proceeds earmarked for displaced workers and re-skilling programs
This isn’t protectionism—it’s enforcement of fair labor standards in the AI economy, just as tariffs can enforce environmental standards or prevent dumping.
Acknowledging the limits: This slows automation but doesn’t stop it. Eventually, as robots generate their own training data and synthetic data improves, human data premiums erode. Worker licensing buys time and captures transitional value—it’s not a permanent solution to technological unemployment. That requires broader wealth redistribution.
Guilds: The Faster Path for Specialized Workers
While unions navigate slow-moving collective bargaining processes, guilds can act faster to establish data governance norms.
Professional guilds—think SAG-AFTRA for actors, but applied to technical and creative fields—can move faster than formal bargaining units: they can set licensing terms, consent frameworks, and pricing guidelines for their members immediately, without waiting on a certification or negotiation cycle.
Guild + Union Coordination:
The optimal structure combines guild agility with union capital:
- Guilds set standards: Technical licensing terms, consent frameworks, pricing guidelines
- Unions enforce through collective bargaining: CBA provisions that incorporate guild standards
- Both pursue board seats: Guilds target AI companies directly; unions use pension capital for broader governance
Rapid-Response Guild Committees:
Guilds should establish standing committees with authority to:
- File complaints with labor boards and regulators
- Initiate litigation against unauthorized data use
- Run pilot attribution and royalty registries
- Negotiate directly with AI companies on behalf of members
Target timeline: Stand up functional guild committees within 90–180 days.
180-Day Roadmap: How Unions Launch Worker Data Licensing NOW
The transition to worker data governance is already underway. Unions that act in the next 6 months will shape the market. Those that wait will find the terms already set by others.
Phase 1: Foundation (Days 0–60)
Consent and Provenance Infrastructure:
- Deploy opt-in consent frameworks for pilot groups (500-1000 workers)
- Implement provenance tagging for all data capture (unique IDs, timestamps, licensing terms)
- Establish secure registry for worker-contributed data with version control
- Partner with technical providers (Copyright.sh, blockchain provenance platforms) for infrastructure
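The provenance-tagging step above can be sketched in a few lines of code: wrap each raw capture in an envelope carrying a content-derived unique ID, a timestamp, and the licensing terms. This is an illustrative assumption of what such a tag might look like, not Copyright.sh's or any platform's actual schema; all function and field names here are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

def tag_record(worker_id: str, payload: dict, license_terms: str) -> dict:
    """Wrap a raw data capture in a provenance envelope:
    content-derived unique ID, timestamp, and licensing terms."""
    body = json.dumps(payload, sort_keys=True)
    # The ID depends only on worker + content, so it is reproducible
    # and survives re-ingestion into other systems.
    record_id = hashlib.sha256(f"{worker_id}:{body}".encode()).hexdigest()[:16]
    return {
        "record_id": record_id,
        "worker_id": worker_id,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "license": license_terms,  # e.g. "inference-only, no-redistribution"
        "payload": payload,
    }

# A registry keyed by record_id gives every downstream use a stable handle.
registry: dict[str, dict] = {}
rec = tag_record("worker-0042", {"task": "weld-seam", "frames": 120},
                 "inference-only")
registry[rec["record_id"]] = rec
```

Because the ID is derived from content rather than assigned sequentially, two independent systems tagging the same capture arrive at the same handle, which is what makes cross-company registries feasible later in the roadmap.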
Governance Preparation:
- Draft shareholder proposals for AI companies in pension fund portfolios
- Identify target companies for board seat campaigns
- Develop union investment policy language on AI data practices
- Coordinate with other institutional investors on proxy voting strategy
Legal Foundation:
- Draft model CBA data-use clauses for next negotiation cycle
- File initial complaints where clear consent violations exist
- Establish legal defense fund for data rights litigation
- Engage labor law firms with AI/technology expertise
Phase 2: Pilot Licensing (Days 60–120)
Employer Pilot Programs:
- Launch licensing pilot with at least one employer per sector
- Test per-use and inference-time billing mechanisms
- Implement metering and usage reporting
- Run first royalty distribution to pilot participants
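The metering and first-royalty-distribution steps above can be sketched as follows. The rate is a made-up illustration (actual per-inference fees are a negotiated term), and the record-to-worker mapping is assumed to come from the registry:

```python
from collections import Counter

PER_1000_CALLS = 0.50  # illustrative rate: $0.50 per 1,000 inference calls

def meter_usage(call_log: list[str]) -> Counter:
    """Count inference calls per contributing record_id."""
    return Counter(call_log)

def distribute(usage: Counter, record_to_worker: dict[str, str]) -> dict[str, float]:
    """Pay each worker for the inference calls that touched
    records they contributed."""
    payouts: dict[str, float] = {}
    for record_id, calls in usage.items():
        worker = record_to_worker[record_id]
        payouts[worker] = payouts.get(worker, 0.0) + calls / 1000 * PER_1000_CALLS
    return payouts

usage = meter_usage(["r1"] * 3000 + ["r2"] * 1000)
print(distribute(usage, {"r1": "worker-a", "r2": "worker-b"}))
# → {'worker-a': 1.5, 'worker-b': 0.5}
```

Even this toy version surfaces the pilot's real questions: who holds the call log, how often it is reconciled, and what happens when a call touches records from multiple workers.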
Technical Validation:
- Verify provenance tracking through full data lifecycle
- Test audit mechanisms with employer cooperation
- Measure licensing compliance rates
- Document cost/benefit for both workers and employers
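One way to verify provenance through the full data lifecycle, as the validation checklist above calls for, is a hash chain: each lifecycle event (capture, labeling, sale, training run) commits to the hash of the previous event, so an auditor who recomputes the chain detects any retroactive tampering. A minimal sketch, illustrative rather than any specific platform's mechanism:

```python
import hashlib

def chain_step(prev_hash: str, event: str) -> str:
    """Commit a lifecycle event by hashing it with the previous hash."""
    return hashlib.sha256(f"{prev_hash}|{event}".encode()).hexdigest()

def build_chain(events: list[str]) -> list[str]:
    """Hash every event into a chain rooted at a fixed genesis value."""
    hashes, h = [], "GENESIS"
    for event in events:
        h = chain_step(h, event)
        hashes.append(h)
    return hashes

def audit(events: list[str], recorded: list[str]) -> bool:
    """Recompute the chain from the claimed events and compare against
    the recorded hashes; one altered event breaks all later hashes."""
    return build_chain(events) == recorded

log = ["captured:worker-0042", "labeled:vendor-x", "licensed:inference-only"]
recorded = build_chain(log)
assert audit(log, recorded)                                      # untampered
assert not audit(["captured:worker-9999"] + log[1:], recorded)   # tampered
```

The design choice matters for the audit mechanism bullet above: the employer can hold the event log while the union (or a third-party auditor) holds only the recorded hashes, and neither side can silently rewrite history.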
Governance Execution:
- Submit shareholder proposals at spring annual meetings
- Execute first coordinated proxy votes on AI data issues
- Begin negotiations for board observer seats where stakes justify
Phase 3: Scale and Standardization (Days 120–180)
Sector Consortium Formation:
- Expand pilots to multiple employers per sector
- Form cross-union working groups for standard terms
- Publish standardized licensing grammar for worker data (compatible with Copyright.sh format)
- Create sector-wide registries for cross-company licensing
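A "standardized licensing grammar" ultimately means machine-readable terms that any party can check mechanically. The descriptor below is a hypothetical illustration of the idea only; the field names are invented for this sketch and are not Copyright.sh's actual format:

```python
# Hypothetical machine-readable license descriptor; every field name
# here is illustrative, not a real schema.
license_terms = {
    "licensor": "Sector Consortium / Local 123",
    "data_type": "assembly-technique-traces",
    "permitted": ["inference"],
    "prohibited": ["redistribution", "foundation-model-training"],
    "fee_per_1000_inferences_usd": 0.50,
    "attribution": "registry-id",
    "term_months": 12,
}

def permits(terms: dict, use: str) -> bool:
    """Standard terms make compliance checks mechanical:
    a use must be explicitly permitted and not prohibited."""
    return use in terms["permitted"] and use not in terms["prohibited"]

assert permits(license_terms, "inference")
assert not permits(license_terms, "redistribution")
```

The point of publishing one grammar sector-wide is exactly this: an AI company's compliance tooling can evaluate any union's terms without a bespoke legal reading per contract.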
Public Standard Terms:
- Publish reference licensing agreements for common use cases
- Establish audit process standards and certification
- Define compliance requirements for AI companies
- Create public pricing guidelines by data type and use case
Policy Engagement:
- Submit recommendations to labor boards on data as bargainable term
- Engage legislators on worker data governance legislation
- Coordinate with international unions on cross-border standards
- Build coalition with consumer groups, academics, and civil society
Policy Appendix: Inference-Time Licensing Template
For unions and guilds ready to implement, here’s a template framework for inference-time licensing:
Standard License Terms:
```
WORKER DATA INFERENCE LICENSE

Licensor: [Union/Guild Name] on behalf of contributing workers
Licensee: [AI Company Name]

1. GRANT AND SCOPE
   a. Licensed Data: Worker-contributed [type] data as registered in [Registry Name]
   b. Permitted Uses:
      - Model inference in production systems: LICENSED
      - Model fine-tuning: LICENSED with additional fee
      - Redistribution of raw data: PROHIBITED
      - Training foundation models: SEPARATE LICENSE REQUIRED

2. USAGE TRACKING AND REPORTING
   a. Licensee shall maintain logs of inference calls using Licensed Data
   b. Attribution to source workers via registry ID required
   c. Quarterly usage reports required

3. FEES AND ROYALTIES
   a. Per-inference fee: $[X] per 1,000 inference calls
   b. Fine-tuning fee: $[X] per model fine-tuned
   c. Revenue share: [X]% of products substantially dependent on Licensed Data
   d. Minimum annual payment: $[X]

4. AUDIT RIGHTS
   a. Licensor may audit usage logs with 30 days notice
   b. Third-party auditor acceptable by mutual agreement
   c. Non-compliance triggers 2x remediation payment

5. TERM AND TERMINATION
   a. Initial term: 12 months
   b. Auto-renewal unless 90 days notice
   c. Immediate termination for material breach

6. CONSENT AND WARRANTIES
   a. All Licensed Data includes verified worker consent
   b. Licensee warrants not to circumvent consent mechanisms
   c. New data requires re-verification of consent status
```
This template can be customized by sector, data type, and negotiating position.
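The template's fee schedule can be made concrete with a small calculator. The numbers below are illustrative placeholders for the $[X] blanks, not recommended rates; the 2x multiplier mirrors the non-compliance remediation clause:

```python
def annual_invoice(inference_calls: int, models_fine_tuned: int,
                   dependent_revenue: float, *,
                   per_1000_fee: float = 0.50,
                   fine_tune_fee: float = 10_000.0,
                   revenue_share: float = 0.02,
                   minimum_annual: float = 50_000.0,
                   compliant: bool = True) -> float:
    """Annual license fee per the template: per-inference fee plus
    fine-tuning fees plus revenue share, subject to a minimum annual
    payment; non-compliance doubles the bill (remediation clause)."""
    fee = (inference_calls / 1000 * per_1000_fee
           + models_fine_tuned * fine_tune_fee
           + dependent_revenue * revenue_share)
    fee = max(fee, minimum_annual)
    return fee if compliant else 2 * fee

# 10M inference calls, 2 fine-tunes, $1M in dependent product revenue:
# 5,000 + 20,000 + 20,000 = 45,000, below the floor, so the minimum applies.
print(annual_invoice(10_000_000, 2, 1_000_000))
# → 50000.0
```

Note how the minimum-annual-payment clause interacts with the metered fees: it guarantees workers a floor even in a low-usage year, which is why it belongs in the template at all.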
The Objections (And Why They’re Partially Right)
“Workers already get paid for their labor. Why should they get paid twice?”
The analogy: Musicians already get paid to record songs. Then they get streaming royalties. Actors get paid to film shows. Then they get residuals. Authors get paid advances. Then they get royalty checks.
The principle: When your work generates ongoing value, you should receive ongoing compensation. AI training data has ongoing value every time a model retrains or—more importantly—every time a model runs inference.
The caveat: This works best for high-value, hard-to-replicate expertise. If your work is easily standardized, licensing has less economic power.
“Companies own the data because they own the workplace equipment that captures it.”
The counter: By that logic, record labels would own musicians’ performance rights because they own the recording studios. But copyright law recognizes that creative output belongs to creators, not to whoever owns the equipment.
The distinction: Employers own business process artifacts (documented procedures, standardized workflows). Workers have governance rights over person-level performance traces (individual techniques, tacit knowledge, undocumented expertise).
The caveat: Courts haven’t definitively ruled on this. It’s an open legal question that unions must push through collective bargaining.
“This will make automation more expensive and slow down progress.”
The response: Licensing music didn’t kill Spotify. It made streaming sustainable by ensuring artists could afford to keep creating. Similarly, worker data licensing makes automation sustainable by ensuring workers can afford the transition.
The reality: Yes, it raises costs. That’s appropriate pricing of true automation costs. Faster automation without compensation creates catastrophic social costs.
The caveat: Eventually, as robots generate their own training data, human data premiums erode. This is transitional value extraction, not permanent protection.
“Workers will never organize effectively around data ownership.”
The precedent: SAG-AFTRA struck successfully from July 14 to November 9, 2023 over AI usage and won contractual protections. The Writers Guild struck from May 2 to September 27, 2023 over AI in screenwriting and won. Unions ARE already organizing around automation and data issues.
The reality: The infrastructure doesn’t exist yet, but neither did music streaming royalties before Spotify forced their creation. Markets adapt when there’s economic and political pressure. Unions provide both.
The caveat: This is genuinely hard. It requires technical infrastructure, legal innovation, international coordination, and sustained organizing. Many attempts will fail. The question is whether unions try or surrender by default.
Why Copyright.sh Cares About Worker Data (Beyond Web Licensing)
Copyright.sh’s mission is “fair compensation when AI learns from human expertise.” We started with web content creators.
But worker data is the same problem at larger scale. If we solve licensing for bloggers, journalists, and photographers, the same infrastructure can extend to factory workers, nurses, and drivers.
Our role isn’t to build Worker Data PROs—unions should do that. But the technical primitives we’re developing—licensing grammars, usage tracking, payment automation, provenance verification—apply equally to:
- Web content (what we do now)
- Worker-generated training data (what unions could do)
- Any human expertise that AI learns from (the general case)
We want to see worker data collectives succeed because every creator—whether they create blog posts or create assembly techniques—deserves governance rights and compensation when AI learns from their work.
What You Can Do (Even If You’re Not in a Union)
If you’re a worker in a high-automation-risk field:
Assert your rights immediately. You have not consented to being automated. Make this explicit to your employer in writing. Document any data capture systems monitoring your work.
Ask your union about data governance. If your local doesn’t have answers, escalate to regional or national leadership. Make this a contract priority in next negotiations.
Document your expertise. Even without formal data capture, start thinking about your job skills as intellectual property. What techniques do you have that would be valuable for AI to learn? How would you license that if you could?
Connect with others. Worker data licensing will only happen if enough workers demand it. Join or form discussion groups about automation and data rights. Build collective awareness before automation completes.
If you’re a union organizer or labor activist:
Start the 180-day clock now. The roadmap above is actionable. Identify your first pilot group, first employer partner, and first shareholder proposal targets this week.
Study the music industry PRO model. ASCAP, BMI, and SESAC provide proven templates for collective licensing and royalty distribution. The infrastructure exists; it just needs adaptation.
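The core of the ASCAP/BMI model is pro-rata distribution: the collected royalty pool is split in proportion to each member's tracked usage. A simplified sketch of how that adapts to worker data (the pool size and play counts are made up for illustration):

```python
def pro_rata_payouts(pool: float, usage: dict[str, int]) -> dict[str, float]:
    """PRO-style distribution: split a collected royalty pool in
    proportion to each member's tracked usage share."""
    total = sum(usage.values())
    return {member: pool * n / total for member, n in usage.items()}

# A $100,000 quarterly pool split across three workers whose data
# accounted for 500, 300, and 200 tracked uses respectively:
print(pro_rata_payouts(100_000, {"w1": 500, "w2": 300, "w3": 200}))
# → {'w1': 50000.0, 'w2': 30000.0, 'w3': 20000.0}
```

This is the key structural difference from per-use billing: the PRO negotiates the pool with licensees up front, then distribution among members becomes an internal accounting problem, which is exactly the adaptation work a Worker Data PRO would inherit.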
Partner with data rights organizations. Groups working on data governance (like Data & Society, AI Now Institute, Mozilla Foundation) have legal expertise and policy connections.
Pilot small-scale experiments. Find a sympathetic employer willing to try worker data licensing in limited scope. Prove the concept works before negotiating industry-wide.
If you’re in tech or AI:
Support worker data governance. If you work at a robotics or AI company, advocate internally for data licensing programs. Point to music streaming as precedent—it worked, and so can this.
Build the infrastructure. We need better tools for data provenance, licensing, and royalty distribution. Open source projects that solve these problems make worker data PROs feasible.
Buy ethically sourced training data. If your company purchases data from labeling services, ask about worker compensation. Demand transparency about how much of your payment reaches the people who created the data.
The Larger Question: What Do We Owe the People Who Teach Machines?
AI doesn’t learn from nothing. Every capability—recognizing faces, translating languages, driving cars, diagnosing diseases—comes from human expertise converted to training data.
Musicians get paid when machines play their songs. Shouldn’t workers get paid when machines learn their skills?
The technology to make this happen exists. The legal frameworks are evolving. What’s missing is political will and collective organization. Unions have an opportunity to reinvent their purpose: from protecting jobs (which automation makes impossible) to protecting the economic value of human knowledge (which automation makes essential).
Worker Data PROs aren’t a perfect solution. They won’t stop automation or save every job. The obstacles are real—international arbitrage, measurement challenges, employer resistance, and the long-run trend toward synthetic data. But during the 5-15 year window when human expertise still outperforms robots, worker data licensing could ensure that the humans whose expertise builds the robots get fairly compensated for that contribution.
The window is closing. Unions and guilds that act in the next 180 days will shape the market. Those that wait will accept terms set by others.
Because if we don’t build these coordination mechanisms now—if we let tech companies extract worker knowledge for free until automation completes—we’ll look back and realize we gave away the most valuable asset workers have: the knowledge that makes them worth automating.
—
*This article represents Copyright.sh’s position on worker data governance as complementary to our web content licensing mission. We build the technical infrastructure for fair AI licensing; unions and workers must build the organizational power to demand it.*