The real question every publisher asks: “If I put a price tag on my content for AI companies, what should it be?”
This isn’t an abstract market design problem. It’s the concrete challenge facing every creator who wants to participate in AI licensing: set the price too high, and AI systems filter you out. Set it too low, and you’re underselling content that took real work to create. Set it just right, and you have no idea if that’s even possible.
This article won’t solve price discovery—nobody can, yet. But we can share what we’re learning about the factors that should inform your pricing decisions, the ranges emerging from real deals, and frameworks for thinking through the tradeoffs.
The Core Tension: Visibility vs. Revenue
Here’s the uncomfortable truth about AI content licensing that nobody wants to say out loud:
If you set a price, some AI systems will skip your content entirely.
AI companies running retrieval-augmented generation (RAG) systems face a budget constraint. When Perplexity or ChatGPT pulls in web sources to answer a query, they’re not going to spend $50 on licensing fees for a single response. If your content is priced above the threshold they’ve set for that query type, you get filtered out.
This creates a fundamental tension:
- Higher price = More revenue per use, but fewer uses (some systems skip you)
- Lower price = More uses, but less revenue per use (you might be underselling)
- No price (free) = Maximum visibility, zero revenue (the current default)
The “right” price depends entirely on your goals. Do you want maximum reach? Set a low price or license freely. Do you want to capture premium value? Set a higher price and accept lower volume.
What Factors Should Inform Your Price?
There’s no formula, but there are factors. Here’s what we’ve learned matters:
1. Content Uniqueness
Generic content competes on price. If your blog post covers “10 Tips for Better Sleep,” you’re competing with thousands of similar articles. AI systems have options. You have no pricing power.
Unique content commands premiums. Original research, proprietary data, expert analysis, first-hand reporting—if AI systems can’t get it elsewhere, you have leverage. A medical researcher’s clinical trial data is worth more than a health blogger’s summary of that data.
Questions to ask:
- Can AI systems find equivalent content elsewhere?
- How much would it cost to recreate this content from scratch?
- Does this content have unique authority (credentials, access, expertise)?
2. Content Type and Typical Use
Different content types get used differently in AI systems:
| Content Type | Typical AI Use | Pricing Considerations |
|---|---|---|
| News/Journalism | Real-time citation, factual grounding | Time-sensitive; value decays quickly after publication |
| Reference/Documentation | Training data, knowledge bases | Stable value; used repeatedly across sessions |
| Creative Writing | Style training, content generation | Harder to track; value in aggregate, not individual pieces |
| Research/Analysis | Domain expertise, specialized queries | High value per use, lower volume |
| Code/Technical | Training, code completion | Very high frequency; consider per-token vs per-document |
3. Your Traffic Economics
This is the math most publishers avoid but shouldn’t:
What’s a pageview worth to you?
If your site earns $5 CPM from advertising, every 1,000 pageviews generates $5. If AI licensing pays $0.05 per 1,000 tokens (~750 words), you need to ask: does the licensing revenue replace or supplement the advertising revenue you’re losing?
The replacement calculation:
Monthly ad revenue from article: $X
Expected AI licensing revenue: $Y
If Y > X: Pricing is sustainable
If Y < X but AI access doesn't cannibalize traffic: Still worth it
If Y < X and AI access replaces visits: You're losing money
For many publishers, AI access replaces visits—users get the answer from ChatGPT instead of clicking through. If that’s your situation, licensing revenue needs to exceed your ad revenue, or you’re subsidizing AI companies.
4. The Filtering Threshold Problem
Here’s where it gets uncomfortable: we don’t know exactly where AI systems set their cost thresholds.
What we do know:
- Enterprise RAG systems often have per-query budgets ($0.10 – $1.00 per response)
- That budget gets split across multiple sources (typically 3-10 retrieved documents)
- If your content’s price exceeds your share of the budget, you’re filtered out
Rough math: If a query budget is $0.50 and the system retrieves 5 sources, your ceiling is ~$0.10 per retrieval. Price above that, and you might not get retrieved at all.
This creates an information asymmetry problem. AI companies know their thresholds; you don’t. Until transparent market data emerges, you’re pricing blind.
Emerging Price Ranges from Real Deals
Based on disclosed deals and our own data, here are the price ranges we’re seeing in 2024-2025:
Text Content (Per 1,000 Tokens)
| Tier | Price Range | Typical Content |
|---|---|---|
| Floor | $0.001 – $0.01 | User-generated content, forums, comments |
| Standard | $0.02 – $0.08 | Blog posts, general web content |
| Premium | $0.10 – $0.25 | Journalism, professional documentation |
| Ultra-Premium | $0.50 – $2.00 | Research, proprietary data, exclusive content |
| Bespoke | Negotiated | Major publications, front-list books |
Context for These Numbers
News Corp’s OpenAI deal ($250M over 5 years) works out to roughly $0.05-$0.10 per 1,000 tokens if you assume their content gets used in ~5-10% of relevant queries. That’s a major publisher with significant leverage.
Reddit’s $203M deal (early 2024) covered billions of comments. On a per-1,000-tokens basis, that’s likely under $0.005—but Reddit had a unique asset (conversational data) that no other source could provide.
Academic publishers (Wiley, Taylor & Francis) are reportedly negotiating $0.15-$0.30 per 1,000 tokens for research content, reflecting the specialized value.
The Brickroad research found that 77% of disclosed AI licensing revenue went to aggregators, not creators. When platforms negotiate, they capture most of the value.
The “Skip Threshold” We’re Observing
Anecdotally, we’re seeing AI systems become more selective above $0.10 per 1,000 tokens for general web content. Below that threshold, most licensed content gets retrieved normally. Above it, some systems filter to cheaper alternatives.
This isn’t a hard rule—it varies by AI company, query type, and content necessity. But if you’re pricing above $0.10 for general content, you should expect lower retrieval rates.
A Framework for Setting Your Price
Given all these factors, here’s a decision framework:
Step 1: Categorize Your Content
- Commodity content (replaceable, widely available): Price at floor ($0.01-$0.03)
- Quality content (well-researched, professional): Price at standard ($0.04-$0.08)
- Differentiated content (unique, authoritative): Price at premium ($0.10-$0.20)
- Irreplaceable content (exclusive, primary sources): Price at ultra-premium ($0.25+)
Step 2: Consider Your Goals
Maximize reach: Price at the lower end of your category. You’ll get retrieved more often, earn less per use.
Maximize revenue per use: Price at the higher end. Fewer retrievals, but more revenue when it happens.
Exclude training use: Set a high training price or deny training entirely. Many publishers want inference access (retrieval, citation) but not training use (model weights).
Step 3: Test and Iterate
The honest answer: you’re going to have to experiment.
Start with a price in the middle of your category. Monitor:
- How often your content gets retrieved (if you have visibility)
- Revenue per month
- Whether adjusting price up/down changes volume
There’s no dashboard for this yet. You’re flying partially blind. But any price you set is better than the default (free), and you can adjust as data comes in.
Pricing by Stage: A More Granular Approach
Copyright.sh’s meta tag syntax lets you set different prices for different AI use cases:
<!-- Base rate for all access -->
<meta name="ai-license" content="allow;price:0.08;payto:yoursite.com">
<!-- Override for inference (real-time queries) -->
<meta name="ai-license-infer" content="allow;price:0.05;payto:yoursite.com">
<!-- Higher price for training use -->
<meta name="ai-license-train" content="allow;price:0.25;payto:yoursite.com">
<!-- Block embedding entirely -->
<meta name="ai-license-embed" content="deny">
This granularity makes sense because different uses have different value:
| Stage | What It Means | Typical Pricing Strategy |
|---|---|---|
| Infer | Real-time retrieval, RAG, citation | Lower price, high volume |
| Embed | Vector database, semantic search | Medium price, one-time per document |
| Tune | Fine-tuning, model adaptation | Higher price, premium permission |
| Train | Foundation model training | Highest price or deny |
Most publishers should consider blocking or pricing high for training, since you’re permanently contributing to model weights. Inference is more transactional—you’re cited, users might click through, and you can track usage.
The Infrastructure Cost Floor
There’s a minimum price below which licensing isn’t worth the overhead:
If it costs more to track and collect than you earn, don’t bother.
Rough math:
- Payment processor fees: 2.9% + $0.30 per transaction
- Minimum viable transaction: ~$1.00 (below this, fees eat too much)
- If you’re earning $0.01 per retrieval, you need 100 retrievals to hit a $1 payout
For small publishers, aggregate before collecting. Build up a balance before triggering payouts. Copyright.sh handles this by batching micropayments, but be aware that ultra-low pricing only works at scale.
What About Blocking Instead?
Some publishers conclude: “If I can’t price fairly, I’ll just block AI entirely.”
That’s a valid choice, but consider:
Blocking costs too. If AI-powered search becomes the dominant discovery channel (it’s heading that way), blocking means invisibility. You might preserve your direct traffic, but you lose the AI-mediated traffic that would have found you.
The middle ground: License inference at a low-but-nonzero price, block training. You participate in AI-mediated discovery, get some revenue, but don’t contribute to permanent model weights.
Honest Unknowns
Here’s what we don’t know yet:
- What prices actually get filtered out? AI companies don’t publish their cost thresholds. We’re inferring from behavior.
- Will prices converge? Is the market heading toward commoditized rates (like ad CPMs) or will differentiation persist?
- How do AI companies budget per query? If a user asks a complex question, does the budget expand? We don’t know.
- Does pricing affect quality of retrieval? If you’re priced low, do you get used for throwaway queries? If priced high, only for premium queries?
These are empirical questions the market hasn’t answered yet. Anyone claiming certainty is speculating.
What We’re Learning at Copyright.sh
From our early data (small sample, not statistically significant yet):
- $0.05-$0.08/1K tokens seems to be a “safe” range—content gets retrieved normally, revenue accumulates
- Above $0.15 we see retrieval drop-off for general content
- Specialty content (technical documentation, research) sustains higher prices
- Time-sensitive content (news) should price lower to maximize retrieval before value decays
We’re sharing this not as definitive guidance but as early signal. As more publishers set prices and we get more data, we’ll update these observations.
Practical Steps
If you’re a publisher wanting to participate in AI licensing:
1. Start with a price in the safe range ($0.04-$0.08 for general content). You can always adjust.
2. Use stage-specific pricing. Price inference low for reach, training high for control.
3. Monitor what you can. If you’re using Copyright.sh, you’ll get usage reports. Watch for patterns.
4. Expect to iterate. This market doesn’t have established pricing yet. You’re part of the price discovery process.
5. Don’t let perfect be the enemy of good. Any price is better than $0.00.
The Bigger Picture
Price discovery in AI content markets is going to take years to stabilize. Right now, we’re in the “wild west” phase where individual publishers are setting prices without market signals, AI companies are making filtering decisions without transparency, and everyone is flying somewhat blind.
That will change. As more licensing transactions happen, as more data accumulates, and as market infrastructure matures, pricing will become more efficient. The publishers who participate early are the ones who’ll have data to guide future decisions.
The goal isn’t to nail the perfect price today. It’s to start participating in the market so you’re positioned when the market matures.
This article reflects our current understanding (November 2025) and will be updated as we learn more. If you’re a publisher with pricing data or experiences to share, we’d love to hear from you.
Leave a Reply