What Is Synthetic Market Research?

The Complete Guide to Synthetic Personas and Synthetic Market Research

Synthetic market research is a new way to answer an old question: how do people actually behave - across segments, contexts, and time - when you can’t afford months of recruiting and fieldwork for every decision?

In traditional research, we work hard to build representative samples, craft neutral instruments, and control bias. But we still face structural constraints:

  • recruiting is slow and expensive, especially for niche audiences,

  • participants get fatigued and are influenced by context and question order,

  • and re-contacting the same respondents under precisely repeated conditions is difficult.

Ditto’s “Simulating the Human Mind” essay states this plainly: even with statistically representative samples, measurement introduces noise from fatigue, context effects, demand characteristics, and social desirability bias.

Synthetic market research aims to reduce those structural frictions by replacing (or more realistically, augmenting) human fieldwork with synthetic personas: simulated respondents that are (1) statistically grounded to a real population structure, and (2) “cognitively grounded” so they behave consistently over time, reacting to information and context in ways researchers can measure.

Synthetic market research, defined

Synthetic market research is the practice of running research workflows (surveys, concept tests, message tests, pricing experiments, scenario simulations) using synthetic personas - statistically grounded, simulated agents that stand in for segments of a real market.

Ditto frames it this way: people don’t have fixed preferences; they have conditional behaviour. Their answers depend on what they read, their mood, whether they’re rushed or hungry, and what’s happening in the world.

The important point for researchers is this: synthetic research isn’t “random AI opinions.” A credible system is built like any other measurement system:

  1. A population foundation (sampling frame + representativeness),

  2. A behavioural model (how attributes + context drive choices),

  3. A cognitive runtime (memory, attention, emotion, routines - so outputs are stable and longitudinal),

  4. Calibration and validation (benchmarking against ground truth where possible),

  5. Governance (documentation, drift monitoring, privacy-by-design).

The core components of synthetic market research

1) Population-true foundations: how you make synthetic personas representative

If you want synthetic outputs to be analytically defensible, you need an explicit link to real-world population structure. This means building the synthetic personas from real survey data collected from people, so that the generated panel can be checked against, and shown to be representative of, a country’s population. A common U.S. backbone is ACS PUMS (Public Use Microdata Sample): individual- and household-level records with disclosure protection.

Ditto’s “How we build digital twins” page describes the same “start with the real world” philosophy: anchor to trusted public statistics (census-style age, household, income bands, etc.) and ensure the “big picture adds up” before adding detail.

The researcher’s translation: sampling frames still matter

Synthetic personas can be incredibly useful - but only if you can answer:

  • Which population is represented? (country, region, category users vs all adults)

  • What variables are controlled? (age/sex/income/household/urbanicity, etc.)

  • What’s the effective resolution? (national vs state vs metro vs neighborhood)

  • What slices are too sparse? (and therefore merged/flagged)

Ditto explicitly notes that unstable thin slices should be merged or flagged, and that validation should include holdouts and back-testing.
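One way to picture the "merge or flag" step is a simple cell-count check. The sketch below is illustrative only: the function name, the record layout, and the threshold of 30 are assumptions for the example, not Ditto's actual rule.

```python
from collections import Counter

MIN_CELL_SIZE = 30  # assumed threshold; a real study would set its own


def flag_thin_slices(records, keys, min_size=MIN_CELL_SIZE):
    """Return {cell: count} for every cell with fewer than min_size records."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {cell: n for cell, n in counts.items() if n < min_size}


# Made-up records: region x age band.
records = (
    [{"region": "North", "age_band": "18-34"}] * 40
    + [{"region": "North", "age_band": "65+"}] * 5
    + [{"region": "South", "age_band": "18-34"}] * 35
)

thin = flag_thin_slices(records, keys=("region", "age_band"))
for cell, n in thin.items():
    print(f"thin slice {cell}: only {n} records; merge or flag before reporting")
```

Cells returned here would either be merged into a coarser category or reported with an explicit caveat.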

2) Statistical generation: the mechanics behind credible synthetic panels

A solid synthetic panel pipeline typically uses well-established statistical techniques:

A) Weighting / raking / IPF to match known totals

In survey science and microsimulation, iterative proportional fitting (IPF) and related calibration/raking procedures are used to align a sample to known population totals (“control totals”).
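As a minimal sketch of the idea, the two-way case of IPF can be written in a few lines of plain Python. The seed counts and control totals below are invented for illustration; a production pipeline would rake over more dimensions and use a tested library.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Two-way iterative proportional fitting on a table of positive counts."""
    table = [[float(cell) for cell in row] for row in seed]
    for _ in range(max_iter):
        # Scale each row so it sums to its control total.
        for i, target in enumerate(row_targets):
            total = sum(table[i])
            table[i] = [cell * target / total for cell in table[i]]
        # Scale each column so it sums to its control total.
        for j, target in enumerate(col_targets):
            total = sum(row[j] for row in table)
            for row in table:
                row[j] *= target / total
        # Columns now match exactly; stop once rows still match too.
        if all(abs(sum(table[i]) - t) <= tol for i, t in enumerate(row_targets)):
            break
    return table


# Made-up sample counts: rows are age bands, columns are regions.
seed = [[40, 30], [35, 50], [25, 20]]
age_totals = [300.0, 450.0, 250.0]   # assumed control totals (sum to 1000)
region_totals = [600.0, 400.0]       # assumed control totals (sum to 1000)

fitted = ipf(seed, age_totals, region_totals)
for row in fitted:
    print([round(cell, 1) for cell in row])
```

The fitted cells keep the association pattern of the seed sample while matching both sets of margins, which is the property that makes raking useful for aligning a sample to known totals.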

B) Integerisation: turning weights into “people”

IPF often produces non-integer weights. If you want a discrete panel of individual personas (rather than just weighted records), you need a way to convert fractional weights into integer counts.

A widely cited approach is Truncate–Replicate–Sample (TRS), introduced for integerising IPF weights to generate representative integer results.
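A simplified reading of TRS can be sketched as follows: truncate each weight to its integer part, replicate each record that many times, then sample the remaining slots without replacement, with probability proportional to the leftover fractions. The function name and the example weights are assumptions for illustration; this is a sketch of the general idea, not a reference implementation.

```python
import math
import random


def trs_integerise(weights, rng):
    """Convert fractional weights to integer counts (simplified TRS sketch)."""
    # Truncate: keep the integer part of each weight.
    counts = [math.floor(w) for w in weights]
    fractions = [w - c for w, c in zip(weights, counts)]
    # Number of extra records needed to reach the rounded population total.
    deficit = round(sum(fractions))
    candidates = list(range(len(weights)))
    # Sample: pick records without replacement, weighted by leftover fraction.
    for _ in range(deficit):
        chosen = rng.choices(
            candidates, weights=[fractions[i] for i in candidates], k=1
        )[0]
        counts[chosen] += 1
        candidates.remove(chosen)
    return counts


weights = [2.3, 0.7, 1.5, 3.5]          # made-up IPF output, sums to 8.0
counts = trs_integerise(weights, random.Random(42))

# Replicate: expand record indices into a discrete panel of "people".
panel = [i for i, n in enumerate(counts) for _ in range(n)]
print(counts, len(panel))
```

Each record ends up with either the floor of its weight or one more, and the panel size matches the rounded total.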

Why this matters in practice

This is where synthetic market research earns credibility with quant-minded stakeholders: you can show, margin by margin, that the synthetic panel reproduces known distributions (age bands, regions, household size, etc.) within acceptable tolerances - then quantify where it’s weaker.
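A margin-by-margin check of this kind might look like the sketch below. The panel, the target shares, and the tolerance are all made up for the example; real targets would come from the published control totals.

```python
from collections import Counter


def margin_report(panel, variable, target_shares, tolerance=0.03):
    """Compare observed category shares in a panel against target shares."""
    counts = Counter(person[variable] for person in panel)
    n = len(panel)
    report = {}
    for category, target in target_shares.items():
        observed = counts.get(category, 0) / n
        gap = observed - target
        report[category] = {
            "observed": observed,
            "target": target,
            "gap": gap,
            "ok": abs(gap) <= tolerance,
        }
    return report


# Toy panel of ten personas and assumed target shares.
panel = (
    [{"age_band": "18-34"}] * 3
    + [{"age_band": "35-54"}] * 4
    + [{"age_band": "55+"}] * 3
)
targets = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

report = margin_report(panel, "age_band", targets)
for category, row in report.items():
    status = "within tolerance" if row["ok"] else "outside tolerance"
    print(f"{category}: observed {row['observed']:.2f} vs target {row['target']:.2f} ({status})")
```

Running the same report for every controlled variable gives stakeholders a plain table of where the panel is strong and where it is weaker.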

3) Persona enrichment: detailed descriptions without “fiction drift”

Once you have population-true foundations, you can enrich each persona into something that’s usable for research and strategy:

  • life stage and household context,

  • category relationship (habits, triggers, barriers),

  • budgets and constraints,

  • identity cues and values.

The key researcher rule: expressive narrative is fine; constraints must remain real. A persona can sound vivid, but if the story violates the underlying demographics, you’ll get persuasive nonsense.

Ditto’s method emphasizes keeping only attributes that “actually move behaviour” and avoiding noise variables that don’t add signal.

The big upgrade: “mind simulation” as the engine of longitudinal realism

A lot of early “synthetic AI persona” approaches were essentially: prompt + LLM + answer. That can be useful for brainstorming, but it fails the moment you need longitudinal consistency, realistic within-person variance, and measurable reactions to context.

Ditto’s “Simulating the Human Mind” article describes a different approach: synthetic personas are not static prompts; they’re continuously updating agents with persistent internal state, structured memory, probabilistic beliefs, affect dynamics, and social relationships - so decision-making routes through those components.

4) The Mind Loop: how a synthetic person “runs”

At the center is a closed-loop runtime (“Mind Loop”) with six steps: Perceive → Appraise → Update State → Decide → Act → Reflect.

From a market research standpoint, this matters because it produces:

  • stable individual differences (between-person variance),

  • realistic fluctuations (within-person variance),

  • and repeatability under controlled manipulations.

That’s what lets you treat synthetic results as analyzable distributions rather than one-off “chat answers.”
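To make the six steps concrete, here is a deliberately tiny toy loop. Only the step names come from the description above; the class, its fields, and all the numeric rules are hypothetical and far simpler than any real cognitive runtime.

```python
from dataclasses import dataclass, field


@dataclass
class ToyAgent:
    mood: float = 0.0                      # persistent state in [-1, 1]
    journal: list = field(default_factory=list)

    def perceive(self, observation):
        return observation                 # a real system would filter here

    def appraise(self, observation):
        return observation.get("valence", 0.0)

    def update_state(self, appraisal):
        # Mood drifts toward the appraised valence (hypothetical rule).
        self.mood = max(-1.0, min(1.0, 0.8 * self.mood + 0.2 * appraisal))

    def decide(self, observation):
        score = self.mood + observation.get("appeal", 0.0)
        return "buy" if score > 0.5 else "skip"

    def act(self, decision):
        return decision

    def reflect(self, observation, action):
        self.journal.append((observation["label"], action))


def mind_loop_step(agent, observation):
    """Perceive -> Appraise -> Update State -> Decide -> Act -> Reflect."""
    seen = agent.perceive(observation)
    appraisal = agent.appraise(seen)
    agent.update_state(appraisal)
    action = agent.act(agent.decide(seen))
    agent.reflect(seen, action)
    return action


concept = {"label": "new snack concept", "appeal": 0.4}
calm = ToyAgent()
stressed = ToyAgent()
calm_action = mind_loop_step(calm, {**concept, "valence": 1.0})
stressed_action = mind_loop_step(stressed, {**concept, "valence": -1.0})
print(calm_action, stressed_action)
```

The same concept produces different actions because the decision routes through state that the context has just changed, and the journal gives each agent a record to build on in later steps.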

5) Perception: news ingestion, weather, messages, and time-of-day cues

The Mind Loop explicitly includes ingestion of structured observations from external and internal environments - including news and weather, messages, tasks, and time-of-day cues.

It also decomposes perception into streams:

  • Exteroception (news, pricing changes, product exposure, advertising stimuli, trend cues),

  • Social perception (tone, norms, status cues),

  • Chronoception (time passage, deadlines, seasonality),

  • Interoception (energy, hunger, discomfort, stress load).

For synthetic market research, this is where “digital twins” become much more than demographics. It’s how a concept test can meaningfully differ on a calm Wednesday versus a stressful week with negative headlines and bad weather.

6) Attention & salience: why context effects and question order effects emerge

Human cognition is bounded. People don’t process everything evenly; attention is selective.

Ditto describes a salience model that scores incoming observations and selects a top subset into working memory, using drivers like novelty, emotional intensity, goal relevance, social significance, uncertainty, and repetition.

This is a subtle but important research benefit: it yields plausible context effects and question-order effects because salience changes with the information environment.
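A salience model of this shape can be sketched as a weighted score plus a top-k selection. The driver names follow the list above; the weights, the capacity of two, and the example observations are assumptions chosen only to show the mechanism.

```python
import heapq

# Illustrative weights only; a real model would learn or calibrate these.
SALIENCE_WEIGHTS = {
    "novelty": 0.20,
    "emotional_intensity": 0.25,
    "goal_relevance": 0.25,
    "social_significance": 0.10,
    "uncertainty": 0.10,
    "repetition": 0.10,
}


def salience(observation):
    """Weighted sum of driver scores, each assumed to lie in [0, 1]."""
    drivers = observation["drivers"]
    return sum(w * drivers.get(name, 0.0) for name, w in SALIENCE_WEIGHTS.items())


def select_working_memory(observations, capacity=2):
    """Keep only the most salient observations (bounded attention)."""
    return heapq.nlargest(capacity, observations, key=salience)


observations = [
    {"label": "price rise headline",
     "drivers": {"novelty": 0.8, "emotional_intensity": 0.7, "goal_relevance": 0.9}},
    {"label": "routine weather update",
     "drivers": {"novelty": 0.1, "repetition": 0.9}},
    {"label": "friend recommends brand",
     "drivers": {"social_significance": 0.9, "goal_relevance": 0.6,
                 "emotional_intensity": 0.4}},
]

working_memory = select_working_memory(observations)
print([o["label"] for o in working_memory])
```

Because only the top items reach working memory, changing what else is in the information environment changes what the persona ends up reasoning about, which is one way order and context effects can arise.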

7) Internal state: goals, constraints, beliefs, affect, identity

A cognitively grounded persona needs explicit state that persists across sessions - otherwise the LLM becomes the “state container,” which can drift and become unauditable.

Ditto’s architecture treats state as formal latent variables researchers can interpret:

  • goals,

  • constraints (time, financial, obligations),

  • beliefs with probabilistic confidence + source attribution,

  • affect (mood baseline, emotions, arousal/energy),

  • identity/self-model (“I’m the kind of person who…”),

  • context (time, location, routine phase).

This is exactly what you need to model things researchers see every day: “I like it, but I can’t afford it,” identity-driven resistance to persuasion, and different responses by daypart or setting.
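The state variables listed above could be held in explicit, inspectable records rather than inside a prompt. The sketch below uses hypothetical field names and a made-up decision rule to show how "I like it, but I can't afford it" falls out of a constraint check.

```python
from dataclasses import dataclass, field


@dataclass
class Belief:
    statement: str
    confidence: float      # probability in [0, 1]
    source: str            # where the belief came from


@dataclass
class Affect:
    mood_baseline: float = 0.0
    arousal: float = 0.5
    emotions: dict = field(default_factory=dict)


@dataclass
class PersonaState:
    goals: list
    monthly_budget: float          # financial constraint
    free_hours_per_week: float     # time constraint
    beliefs: list
    affect: Affect
    identity: list                 # "I'm the kind of person who..." statements
    context: dict                  # time, location, routine phase


def respond_to_offer(state, price, liking):
    """Hypothetical rule: liking and affordability are evaluated separately."""
    if liking >= 0.6 and price > state.monthly_budget:
        return "I like it, but I can't afford it"
    if liking >= 0.6:
        return "I would buy it"
    return "Not for me"


persona = PersonaState(
    goals=["eat healthier"],
    monthly_budget=40.0,
    free_hours_per_week=6.0,
    beliefs=[Belief("meal kits save time", 0.7, "friend")],
    affect=Affect(mood_baseline=0.1),
    identity=["I'm the kind of person who cooks from scratch"],
    context={"daypart": "evening", "location": "home"},
)

print(respond_to_offer(persona, price=65.0, liking=0.8))
```

Keeping state in typed fields like these is what makes it auditable: a researcher can read off which constraint or belief produced a given response.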

8) Memory: episodic, semantic, procedural, working, autobiographical, social

Memory is where synthetic personas either become realistic - or become omniscient and weird.

Ditto explicitly decomposes memory types and includes forgetting (salience-based decay) and consolidation (compression), to avoid perfect recall artifacts.

This supports realistic phenomena like:

  • a remembered “bad brand episode” shaping avoidance,

  • habit persistence without constant deliberation,

  • selective recall biased by current concerns.
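Salience-based forgetting can be illustrated with a simple exponential decay in which more salient memories fade more slowly. The half-life formula, the threshold, and the example memories are assumptions for this sketch, and consolidation is not modelled here.

```python
def memory_strength(salience, age_days, base_half_life_days=7.0):
    """Exponential decay; higher salience stretches the half-life (assumed rule)."""
    half_life = base_half_life_days * (1.0 + 3.0 * salience)
    return 0.5 ** (age_days / half_life)


def recallable(memories, now_day, threshold=0.25):
    """Return memories whose decayed strength is still above the threshold."""
    return [
        m for m in memories
        if memory_strength(m["salience"], now_day - m["day"]) >= threshold
    ]


memories = [
    {"label": "bad brand episode", "salience": 0.9, "day": 0},
    {"label": "routine purchase", "salience": 0.1, "day": 0},
]

still_remembered = recallable(memories, now_day=30)
print([m["label"] for m in still_remembered])
```

After thirty simulated days the vivid negative episode is still available to shape avoidance, while the routine purchase has dropped out of recall.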

9) Motivation and drives: what actually powers choice

Traditional surveys struggle with the stated-vs-revealed gap because motivations often operate beneath conscious articulation.

Ditto represents drives as internal deficit/satiation variables (setpoints), including security, affiliation, status, competence, autonomy, novelty, comfort, meaning, and more - parameterized per agent so the same stimulus lands differently across personas.

This is where synthetic research can shine for segmentation and messaging: it’s not only who someone is demographically; it’s which motivational systems get activated under which contexts.
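A minimal way to express "the same stimulus lands differently" is to compute each persona's drive deficits against its own setpoints and weight them by how strongly a stimulus speaks to each drive. All numbers and names below are invented for illustration.

```python
def drive_deficits(levels, setpoints):
    """Deficit per drive: how far the current level sits below the setpoint."""
    return {d: max(0.0, setpoints[d] - levels.get(d, 0.0)) for d in setpoints}


def stimulus_appeal(deficits, relevance):
    """Appeal grows when a stimulus addresses drives that are currently unmet."""
    return sum(deficits.get(d, 0.0) * r for d, r in relevance.items())


# Two hypothetical personas with different setpoints and current levels.
cautious = drive_deficits(
    levels={"security": 0.3, "novelty": 0.3},
    setpoints={"security": 0.8, "novelty": 0.3},
)
explorer = drive_deficits(
    levels={"security": 0.4, "novelty": 0.2},
    setpoints={"security": 0.4, "novelty": 0.9},
)

# A "price-lock guarantee" message speaks mostly to security.
price_lock = {"security": 1.0, "novelty": 0.1}

cautious_score = stimulus_appeal(cautious, price_lock)
explorer_score = stimulus_appeal(explorer, price_lock)
print(round(cautious_score, 2), round(explorer_score, 2))
```

The security-deprived persona responds far more strongly to the same message, which is the kind of motivational segmentation the paragraph above describes.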

10) Emotion: not a label - an engine (plus regulation strategies)

Emotion is not just “happy/sad.” It changes what is processed, remembered, and chosen.

Ditto treats emotion as an appraisal system mapping events to action tendencies (fear, anger, sadness, joy, shame, etc.), and includes emotion regulation strategies (reappraisal, suppression, distraction, problem-solving, social soothing).

From a market research view, this enables more realistic simulation of:

  • inflation anxiety and brand switching,

  • identity threat and resistance,

  • the “tone” of consumer narratives under stress.

11) Executive function: cognitive control, planning depth, fatigue effects

Executive function sets budgets for cognition - planning depth, inhibition, error monitoring, task switching costs - and predicts realistic fatigue effects: more habit reliance, reduced planning, narrowed attention under stress.

That’s directly relevant to how consumers behave when overloaded: they satisfice, defer, default to the usual brand, or avoid complexity.

12) Habits: routine consumption and path dependence

Many purchases aren’t re-optimized; they’re habitual.

Ditto describes a habit module keyed by triggers like time of day, location, mood/stress, social setting, and cues (promotions, reminders, pantry state), with habits strengthening when they efficiently reduce drive deficits and weakening when outcomes disappoint or the environment changes (e.g., price increases, availability disruptions).

This supports one of the most valuable things a synthetic system can do: model path dependence (how today’s experience changes tomorrow’s behaviour).
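Path dependence of this kind can be sketched with a habit strength that moves up after satisfying outcomes and down after disappointing ones. The update rule, the learning rate, the trigger key, and the reward values are assumptions for the example, not a description of Ditto's habit module.

```python
def update_habit(strength, reward, learning_rate=0.2):
    """Nudge strength toward 1 after good outcomes and toward 0 after bad ones.

    reward is assumed to lie in [-1, 1]; strength stays within [0, 1].
    """
    target = 1.0 if reward > 0 else 0.0
    step = learning_rate * abs(reward)
    return strength + step * (target - strength)


# Habits keyed by a trigger tuple (time of day, location), as an illustration.
habits = {("morning", "commute"): 0.5}
trigger = ("morning", "commute")

# Five satisfying purchases strengthen the habit.
for _ in range(5):
    habits[trigger] = update_habit(habits[trigger], reward=1.0)
after_good_run = habits[trigger]

# A price increase makes the next three outcomes disappointing.
for _ in range(3):
    habits[trigger] = update_habit(habits[trigger], reward=-0.8)
after_price_rise = habits[trigger]

print(round(after_good_run, 3), round(after_price_rise, 3))
```

Because each update starts from the previous strength, the order of experiences matters: the same price rise hits an entrenched habit differently from a new one.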

13) Social cognition: relationships, norms, and “theory of mind”

Consumers live inside social networks and households.

A mind simulation that models relationship history, trust/closeness, inferred traits, and norms can better simulate:

  • household purchase negotiation,

  • recommendation and word-of-mouth dynamics,

  • social desirability pressure.

Ditto’s architecture includes structured representations for relationships and social memory for these reasons.

Summary: Synthetic Market Research is a huge new opportunity for market researchers

Synthetic market research is moving from “AI personas” toward measurable simulated respondents: population-grounded panels that live in context (news, weather, time pressure), maintain memory and emotion, exhibit habits and bounded attention, and can be calibrated to external benchmarks like Michigan Consumer Sentiment and prediction market signals.

This combination - population truth + cognitive runtime + calibration - is what turns synthetic research from an interesting demo into a method that's trusted and valuable in a board room, senior management meeting, or CEO one-on-one session.

Our customers are making multi-million- and billion-dollar decisions based on our timely, accurate, and insightful research. It's time you started doing that.

Phillip Gales

About the author

Phillip is a serial tech entrepreneur who specializes in applying AI and machine learning solutions to antiquated and heavy industries. He has been a senior leader or founder at a number of successful startups.

Phillip holds an MBA from Harvard Business School, an MEng from the University of Cambridge, and is a Y-Combinator alum.
