Internal Prototype · Marketing SaaS

One Brief In, Three Scored Ads Out: A Copy Generation API

An embeddable API that turns a campaign brief into platform-ready ad variations — each one grounded in your product data and ranked by a persuasion score, not vibes.

3 scored variants per call
80–85% first-attempt accuracy
< 20s from brief to ready ads
FastAPI · LangChain · pgvector · OpenAI · Python · PostgreSQL

Transparency note: This is an internal prototype, not a paid client engagement. It was built on public documentation patterns and synthetic campaign data to demonstrate what our studio can build and adapt for real SaaS teams.

01

Overview

Marketing SaaS platforms made campaign launching easy. But the copy still gets written the same slow way: a brief goes to a copywriter, the copywriter writes one or two drafts, someone reformats them for each channel, and the whole cycle repeats for every product and audience segment. The copy is the bottleneck nobody talks about.

This prototype is an embeddable copy generation API — not a standalone "AI writer" tool. You send it a structured brief (product, platform, tone, audience), and it returns three distinct ad variations, each formatted for the target channel and each scored on a 0–100 persuasion scale. The API is designed to sit inside a marketing SaaS product where teams already build campaigns, so no one has to context-switch to another tool.
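A sketch of what one call might look like. The field names below are illustrative assumptions, not the prototype's actual schema:

```python
import json

# Hypothetical request the SaaS app would send (field names are illustrative).
request = {
    "product_id": "prod_123",
    "platform": "instagram",
    "audience": "fitness enthusiasts, 25-34",
    "tone": "casual",
    "constraints": {"max_hashtags": 5},
}

# Hypothetical response shape: three variants, each scored 0-100,
# plus a confidence flag and the source context used (for auditability).
response = {
    "variations": [
        {"text": "...", "persuasion_score": 82, "platform": "instagram"},
        {"text": "...", "persuasion_score": 74, "platform": "instagram"},
        {"text": "...", "persuasion_score": 68, "platform": "instagram"},
    ],
    "confidence": "high",
    "source_context_ids": ["chunk_9", "chunk_14"],
}

print(json.dumps(response["variations"][0], indent=2))
```

The confidence flag and source IDs are what let the calling app display quality signals instead of raw text.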

The key difference from a generic LLM wrapper: every generation is grounded in retrieved product data and brand constraints, and every output comes with a structured quality signal. Teams can compare options using the persuasion score instead of guessing which variant "feels better."

02

The Math Problem

Here is the math that makes this painful. A mid-size agency running 20 campaigns per week, each needing variants for 4 platforms (Facebook, Instagram, Google Ads, email), in 2 tones (professional and casual), needs 160 pieces of copy per week minimum. Most teams also want A/B test variants, which doubles it. That is 320 ad variations — every week — just to run the campaigns properly.

Weekly copy volume for a typical mid-size agency:

20 campaigns × 4 platforms × 2 tone variants = 160+ copy pieces / week
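The volume math can be sanity-checked in a few lines:

```python
# Weekly copy volume for the mid-size agency described above.
campaigns_per_week = 20
platforms = 4          # Facebook, Instagram, Google Ads, email
tone_variants = 2      # professional, casual

base_copy = campaigns_per_week * platforms * tone_variants
with_ab_tests = base_copy * 2  # most teams also want A/B variants

print(base_copy)      # 160
print(with_ab_tests)  # 320
```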

Where the bottleneck actually lives:

  • Copywriting is the gating step. A campaign that takes 15 minutes to set up in the platform can take 2–3 hours to write copy for across channels. The platform is fast; the human isn't.
  • Teams under-test because they can't produce enough variants. They run 2 versions when they should run 6. The testing infrastructure is there — the copy supply isn't.
  • Reformatting burns time silently. A great Google Ads headline is useless as an Instagram caption. Someone manually reshapes every piece for every channel, and that work is invisible in sprint planning.

The insight behind this prototype: the problem isn't "generate text with AI." Every tool does that now. The problem is generate the right text, for the right channel, grounded in real product data, and tell you how good it is — at API speed, so it can be embedded in the workflow where campaigns are already built.

03

How It Works

The API follows a strict flow: retrieve real product context first, generate copy anchored to that context, then score the output before returning it. Nothing ships to the user without a quality signal attached.

Single API Call Flow

1. Brief comes in

The SaaS app sends a structured request: product_id, target platform (e.g., Instagram), audience segment, tone, and any constraints.

2. Context is retrieved — not generated

Vector search pulls the most relevant product descriptions, brand guidelines, past campaign snippets, and tone rules. This is what keeps the copy grounded instead of generic.

3. Three variations are drafted

The LLM generates three distinct angles — each tailored to the platform's format constraints (character limits, hashtag conventions, CTA patterns).

4. Each variation is scored

A persuasion scoring layer evaluates CTA strength, emotional resonance, clarity, and use of social proof. Each variant gets a 0–100 score so teams can compare without reading every word.

5. Response returns with metadata

The API returns all three variations, their scores, a confidence flag, and the source context used — so the result is auditable and the calling app can display quality signals to the user.
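The five steps can be sketched as a single handler. This is a simplified sketch with stubbed retrieval, generation, and scoring; the real pipeline uses LangChain, pgvector, and an LLM call:

```python
from dataclasses import dataclass

@dataclass
class Variant:
    text: str
    score: int  # 0-100 persuasion score

@dataclass
class CopyResponse:
    variants: list
    confidence: str
    sources: list

def retrieve_context(product_id: str, platform: str) -> list:
    # Stub: the real system runs vector search over product/brand chunks.
    return [{"id": "chunk_1", "text": "Value prop for " + product_id}]

def generate_variants(context: list, platform: str, tone: str) -> list:
    # Stub: the real system calls an LLM with platform format constraints.
    return ["Angle A", "Angle B", "Angle C"]

def score_variant(text: str) -> int:
    # Stub: the real scorer rates CTA, emotion, clarity, social proof.
    return min(100, 50 + len(text))

def handle_brief(product_id: str, platform: str, tone: str) -> CopyResponse:
    context = retrieve_context(product_id, platform)           # step 2
    drafts = generate_variants(context, platform, tone)        # step 3
    variants = [Variant(t, score_variant(t)) for t in drafts]  # step 4
    confidence = "high" if context else "low"                  # step 5
    return CopyResponse(variants, confidence, [c["id"] for c in context])

resp = handle_brief("prod_123", "instagram", "casual")
```

The useful property of this shape is that retrieval, generation, and scoring are separate stages, so each can be swapped or tuned independently.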

What makes this different from "just call the LLM"

Retrieval, not hallucination

Every generation starts from retrieved product data. The model can't invent features or claim benefits that don't exist in the indexed content. When retrieved context is thin, the system falls back to safe phrasing and flags the response as low-confidence.
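A minimal sketch of that thin-context fallback decision. The chunk-count and similarity thresholds here are assumed values, not the prototype's documented settings:

```python
# Assumed thresholds for deciding whether retrieval found enough context.
MIN_CHUNKS = 2
MIN_SIMILARITY = 0.75

def assess_context(chunks: list) -> str:
    """Return 'high' only when retrieval found enough sufficiently-similar
    product context; otherwise flag 'low' so the caller falls back to
    safe phrasing and surfaces the flag to the user."""
    strong = [c for c in chunks if c["similarity"] >= MIN_SIMILARITY]
    return "high" if len(strong) >= MIN_CHUNKS else "low"

assess_context([{"similarity": 0.9}, {"similarity": 0.8}])  # "high"
```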

Scoring removes the guesswork

The persuasion score isn't a gimmick — it evaluates CTA strength, emotional language, clarity, and use of social proof. In testing, the highest-scored variant was preferred by reviewers in the majority of cases. It turns "which one feels better?" into a quantified decision.

Platform-native output

An Instagram caption and a Google Ads headline have nothing in common structurally. The API handles platform formatting natively — character limits, hashtag conventions, headline/description splits — so no one has to manually reshape the output.
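One way to encode those constraints is a per-platform rules table. The limits below are rough approximations for illustration, not official platform specs:

```python
# Illustrative platform constraints (approximate, not official limits).
PLATFORM_RULES = {
    "google_ads_headline": {"max_chars": 30, "hashtags": False},
    "instagram_caption":   {"max_chars": 2200, "hashtags": True},
    "email_subject":       {"max_chars": 60, "hashtags": False},
}

def fits_platform(text: str, platform: str) -> bool:
    """Check a draft against its target platform's format rules."""
    rules = PLATFORM_RULES[platform]
    if len(text) > rules["max_chars"]:
        return False
    if "#" in text and not rules["hashtags"]:
        return False
    return True

fits_platform("Run faster. Recover smarter.", "google_ads_headline")
```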

Feedback loop for missing context

When the system can't confidently generate strong copy — usually because the product positioning is thin or the brand guidelines are unclear — the request gets logged. Over time this builds a map of exactly which documentation gaps are costing the team the most.

04

Under the Hood

Backend
Python, FastAPI
Orchestration
LangChain RAG pipeline
Storage
PostgreSQL + pgvector
AI Layer
OpenAI-compatible embeddings & LLM
Interface
REST API + lightweight review UI
Evaluation
RAGAS-style metrics + manual QA

Context structuring (not just chunking)

  • Product descriptions and brand docs are split into semantically coherent sections: value proposition, feature list, objection handling, tone and style guidance. This is more structured than generic text chunking because copy generation needs specific context types, not just "nearby text."
  • Only sections relevant to a given request are included in the prompt — product features for a product-focused ad, tone guidance for a brand-sensitive audience — to avoid context overload and keep generations focused.
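The section-typed selection described above can be sketched like this. The section labels follow the structure in the text; the routing map is an illustrative assumption:

```python
# Docs split into typed sections rather than generic chunks.
SECTIONS = [
    {"type": "value_prop", "text": "Saves teams 10 hours/week"},
    {"type": "features", "text": "Auto-scheduling, analytics, exports"},
    {"type": "tone_guide", "text": "Friendly, never salesy"},
    {"type": "objections", "text": "Handles 'too expensive' pushback"},
]

# Which section types each request kind needs in its prompt
# (illustrative routing, not the prototype's actual map).
RELEVANT = {
    "product_ad": {"value_prop", "features"},
    "brand_sensitive": {"value_prop", "tone_guide"},
}

def select_context(request_kind: str) -> list:
    """Include only the section types relevant to this request,
    avoiding context overload."""
    wanted = RELEVANT[request_kind]
    return [s for s in SECTIONS if s["type"] in wanted]

len(select_context("product_ad"))  # two sections, not the whole doc
```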

Multi-tenant retrieval

  • Vector similarity search pulls the most relevant chunks based on the incoming request (product_id, platform, audience), with metadata filters ensuring we never mix contexts across different brands or accounts.
  • In a real deployment, this would scope retrieval per customer account, so an agency managing 50 brands never gets brand A's guidelines leaking into brand B's copy.
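A tenant-scoped pgvector query might look like the following. Table and column names are illustrative assumptions; `<=>` is pgvector's cosine-distance operator:

```python
# Sketch of a tenant-scoped similarity query against PostgreSQL + pgvector.
# Table/column names are assumed, not the prototype's actual schema.
def build_retrieval_query(limit: int = 5) -> str:
    return (
        "SELECT id, content "
        "FROM context_chunks "
        "WHERE tenant_id = %(tenant_id)s "     # hard tenant boundary
        "AND product_id = %(product_id)s "
        "ORDER BY embedding <=> %(query_embedding)s "
        f"LIMIT {limit}"
    )

query = build_retrieval_query()
```

The point of the WHERE clause is that tenant isolation happens in the database, before similarity ranking, so brand A's guidelines can never surface for brand B no matter how similar the embeddings are.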

Persuasion scoring layer

  • After generation, each variant is evaluated on four axes: CTA strength (is the action clear and compelling?), emotional resonance (does it create urgency or desire?), clarity (is the value prop understandable in 3 seconds?), and social proof usage.
  • The composite 0–100 score makes copy quality a measurable metric rather than a subjective opinion — critical for agencies that need to justify creative decisions to clients with data.
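A sketch of how the composite might be computed. The four axes come from the description above; the weights are illustrative assumptions:

```python
# Assumed axis weights (the prototype's actual weights aren't documented here).
WEIGHTS = {
    "cta_strength": 0.35,
    "emotional_resonance": 0.25,
    "clarity": 0.25,
    "social_proof": 0.15,
}

def composite_score(axis_scores: dict) -> int:
    """Combine per-axis scores (each 0-100) into a weighted 0-100 total."""
    total = sum(axis_scores[axis] * w for axis, w in WEIGHTS.items())
    return round(total)

composite_score({
    "cta_strength": 90,
    "emotional_resonance": 70,
    "clarity": 80,
    "social_proof": 40,
})  # 75
```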

Testing approach

  • Manually curated test set of synthetic briefs measured whether generated copy correctly reflected the product description, followed tone guidelines, and included a clear CTA.
  • RAG-style evaluation metrics (faithfulness, answer relevance) were approximated using automated checks plus human spot-review. The emphasis was: does the copy say true things about the product and sound right for the channel?
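One of those automated checks, CTA presence, could be as simple as a keyword scan. The verb list is an illustrative assumption; the real harness likely did more:

```python
# Assumed CTA verb list for a first-pass automated check.
CTA_VERBS = ("get", "try", "start", "book", "shop", "sign up", "learn more")

def has_clear_cta(copy_text: str) -> bool:
    """Flag whether a draft contains at least one call-to-action verb."""
    lowered = copy_text.lower()
    return any(verb in lowered for verb in CTA_VERBS)

has_clear_cta("Tired of manual scheduling? Start your free trial today.")
```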

The architecture is designed to be embedded, not standalone. Every component — retrieval scope, prompt templates, scoring weights, platform formatting rules — is configurable per customer so a SaaS team can iterate without touching core infrastructure.

05

What Happened in Testing

These results are from internal tests using synthetic briefs and products, not production traffic. They show how the system behaves; they are not ROI claims.

< 20s
brief → 3 scored ads

A human copywriter takes 30–60 minutes to produce the same spread across platforms. The API returns all three, formatted and scored, in under 20 seconds.

80–85%
usable on first pass

That share of generated variations accurately reflected the product description, matched the requested tone, and included a clear CTA without needing manual edits.

Scored > unscored
persuasion ranking worked

In blind comparisons, the highest-scored variants were preferred by reviewers in the majority of cases — the scoring wasn't decorative.

Where it struggled (honest notes)

  • When product positioning was vague in the indexed content, the generated copy was technically accurate but bland — it said true things without saying anything compelling. The system needs strong input data to produce strong output.
  • Humor and culturally specific references were hit-or-miss. The model could produce professional and casual tones reliably, but "witty" or "provocative" tones need more guardrails and likely a human-in-the-loop before going live.
  • The persuasion score was directionally useful but not always calibrated at the extremes — scores between 60–80 were more reliable indicators of quality than scores above 90.

Industry Context (not claims from this prototype)

  • Public benchmarks for AI-assisted marketing workflows report significant reductions in time to first draft, with some teams running 3–5x more creative experiments per campaign without increasing headcount.
  • Teams also report meaningful improvements in CTR and conversion rates when they systematically generate and test more targeted variants, even though AI never fully replaces human review and strategy.

06

What This Means If You're Building a Marketing SaaS

If you're a CTO, Head of Product, or VP of Engineering at a marketing or adtech SaaS, this prototype shows something specific: an AI copy feature doesn't have to be a bolt-on chatbot. It can be an API-native capability that your product team controls — with structured inputs, scored outputs, and retrieval grounded in your customer's own brand data.

Your users generate more — and better — variants per campaign, so they actually run the tests they've been planning to run for months.

Copy stays consistent with brand and product data because it's retrieved, not hallucinated. A junior marketer using the tool produces on-brand output by default.

The persuasion score turns copy review from a subjective "which one feels better?" into a data-backed decision — which matters when an agency is reporting to a client.

To take this from prototype to your product:

Plug in your data

Connect to your product catalog, brand guidelines, and past campaign examples. Scope retrieval per customer account so brand contexts never bleed across tenants.

Embed in your builder

Expose the API inside your campaign builder — as a "generate variations" button, a sidebar assistant, or a batch generation endpoint — so it lives where the work already happens.

Ship in your cloud

Deploy in your VPC or preferred cloud, with your auth layer, your logging, and your compliance requirements. We hand off a system your team owns and can iterate on.

Building a Marketing SaaS? Let's Talk Copy Infrastructure.

This is an internal prototype built on synthetic data to show what's buildable. If you're looking to add a grounded, scored copy generation feature to your marketing SaaS, I can walk you through a live demo and map it to your product.

We'd start with your existing brand data and campaign workflows to prove the approach works for your specific use case — before building anything production-grade.