Synthetic Market Research Definitions
Quick and clear definitions of key terms in synthetic market research - including "synthetic data", "synthetic personas" and "digital twins".
Synthetic market research is full of confident-sounding phrases that are often used imprecisely. Some are borrowed from engineering, others from statistics, and a few from marketing departments with a penchant for grand claims.
This post is intended to put the key terms on firmer ground, in plain English, with enough nuance to be useful and enough scepticism to be safe.
What is synthetic market research?
Synthetic market research uses AI and large datasets to create research groups of synthetic personas in order to perform insightful market research.
Instead of recruiting people, running surveys, and waiting weeks for results, a researcher queries a series of synthetic personas that produce accurate simulated responses on, for example, purchase intent, brand perception, concept feedback, segmentation insights, and so on.
Properly done, it is an excellent replacement for real-life market research studies, and it can be a disciplined shortcut through uncertainty. It can be used to explore hypotheses, pressure-test messaging, identify likely objections, and triage which questions deserve expensive human fieldwork. However, if poorly done, it is merely a fast way to produce plausible nonsense with a veneer of confidence.
What is synthetic data?
Synthetic data is artificially generated data that is designed to mimic the statistical properties of real data. It may be produced to protect privacy, to expand a small dataset, to simulate rare events, or to create test environments where real customer data is too sensitive or too scarce.
Synthetic data is particularly important in fields like medicine and health care, where patient privacy may make it incredibly difficult to directly access real-life data, but where there are strong and demanding research objectives (e.g. curing cancer).
Synthetic market research is about using AI to recreate human responses.
Synthetic data is about using AI to create data.
It's a bit like the difference between playing a computer game and seeing whether the player won or lost. Both are simulations, but synthetic market research is about the activity, whereas synthetic data is about the result.
The crucial point is that synthetic data is defined by its relationship to a target distribution. The goal is not to fabricate random numbers, but to reproduce patterns that matter: correlations, frequencies, conditional relationships, and edge cases. A useful synthetic dataset behaves like the real one for the purpose at hand, even though no row corresponds to an actual person.
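As a minimal sketch of what "reproducing patterns" can mean, the example below estimates segment frequencies and a conditional purchase rate from a small set of real rows, then samples new rows from those estimates. The function names (`fit_conditional_model`, `sample_synthetic`), the segments and the counts are all illustrative assumptions, not a reference to any particular tool or dataset.

```python
import random
from collections import Counter, defaultdict


def fit_conditional_model(rows):
    """Estimate counts for P(segment) and P(outcome | segment) from (segment, outcome) pairs."""
    segment_counts = Counter(segment for segment, _ in rows)
    outcome_counts = defaultdict(Counter)
    for segment, outcome in rows:
        outcome_counts[segment][outcome] += 1
    return segment_counts, outcome_counts


def sample_synthetic(model, n, seed=0):
    """Draw n synthetic rows: first a segment, then an outcome given that segment."""
    segment_counts, outcome_counts = model
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    segments = list(segment_counts)
    segment_weights = [segment_counts[s] for s in segments]
    synthetic = []
    for _ in range(n):
        segment = rng.choices(segments, weights=segment_weights)[0]
        outcomes = list(outcome_counts[segment])
        outcome_weights = [outcome_counts[segment][o] for o in outcomes]
        outcome = rng.choices(outcomes, weights=outcome_weights)[0]
        synthetic.append((segment, outcome))
    return synthetic


# Made-up "real" data: 60% urban, with a higher purchase rate in urban than rural.
real_rows = (
    [("urban", "buys")] * 30
    + [("urban", "skips")] * 30
    + [("rural", "buys")] * 10
    + [("rural", "skips")] * 30
)

model = fit_conditional_model(real_rows)
synthetic_rows = sample_synthetic(model, 10_000)
```

No synthetic row is a copy of a specific real respondent, yet the urban share and the purchase rate within each segment should stay close to the originals. The same mechanism also carries over any skew in the source rows.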
Common forms of synthetic data
Statistical synthetic data, generated from parametric or probabilistic models calibrated on real datasets.
Agent-based synthetic data, produced by simulated agents interacting under rules in an environment.
Generative synthetic data, created by modern generative models that learn patterns from examples and then produce new ones.
A caution: synthetic data can preserve biases in subtle but compelling ways. If the original data under-represents a group, the synthetic version may faithfully continue to do so, unless corrected deliberately.
What are synthetic personas?
Synthetic personas are very complex and carefully engineered simulations of humans using data and AI.
They are created to represent a population, segment, or customer archetype. In traditional marketing, a persona is a narrative device, a lightly researched character sketch with a name, a backstory and a few preferences. A synthetic persona is significantly more than a creative writing exercise.
In synthetic market research, a synthetic persona is typically a structured profile with attributes such as demographics, household context, constraints, motivations, and behavioural tendencies. The profile is then used to generate responses to questions, scenarios and stimuli. A large set of such personas can form a synthetic panel, allowing analysis by segment, region, income bracket, or any other variable.
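One way to picture a structured profile and a panel is sketched below. The `SyntheticPersona` fields and the `group_by` helper are hypothetical names chosen for illustration; a real system would likely carry many more attributes.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SyntheticPersona:
    """Illustrative structured profile; the field names are assumptions."""
    persona_id: str
    age: int
    region: str
    income_bracket: str
    household_size: int
    motivations: tuple = ()
    constraints: tuple = ()


def group_by(panel, attribute):
    """Split a panel into segments keyed by the value of one attribute."""
    groups = defaultdict(list)
    for persona in panel:
        groups[getattr(persona, attribute)].append(persona)
    return dict(groups)


# A tiny hand-written panel, purely for demonstration.
panel = [
    SyntheticPersona("p1", 29, "north", "low", 1, ("price",), ("tight budget",)),
    SyntheticPersona("p2", 41, "north", "high", 4, ("convenience",), ("little free time",)),
    SyntheticPersona("p3", 63, "south", "mid", 2, ("trust",), ()),
]

by_region = group_by(panel, "region")
by_income = group_by(panel, "income_bracket")
```

Because every persona shares the same structure, the panel can be cut by any attribute without bespoke code for each segment.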
What makes a synthetic persona credible?
Calibration to known population statistics (age, income, geography, household composition, and so forth).
Internal consistency so that a persona’s constraints, preferences and life circumstances do not contradict one another.
Stability so that repeated questioning produces coherent answers rather than random drift.
Variance so that a panel contains real diversity, not one personality copied a thousand times.
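Two of the checks above, internal consistency and variance, can be sketched as simple automated tests. The rules, field names and panel below are invented for illustration; real consistency rules would depend on the population being modelled.

```python
def consistency_issues(persona):
    """Return a list of rule violations for one persona (rules are illustrative)."""
    issues = []
    if persona["age"] < 18 and persona["is_primary_earner"]:
        issues.append("minor listed as primary earner")
    if persona["monthly_rent"] > persona["monthly_income"]:
        issues.append("rent exceeds income")
    if persona["children"] >= persona["household_size"]:
        issues.append("children do not fit within household size")
    return issues


def distinct_profile_share(panel, keys):
    """Share of the panel that is unique on the given attributes (1.0 means no duplicates)."""
    profiles = {tuple(persona[k] for k in keys) for persona in panel}
    return len(profiles) / len(panel)


# Made-up panel: the second persona is contradictory, the third duplicates the first.
panel = [
    {"age": 34, "is_primary_earner": True, "monthly_income": 3200,
     "monthly_rent": 1100, "household_size": 3, "children": 1},
    {"age": 16, "is_primary_earner": True, "monthly_income": 400,
     "monthly_rent": 900, "household_size": 1, "children": 0},
    {"age": 34, "is_primary_earner": True, "monthly_income": 3200,
     "monthly_rent": 1100, "household_size": 3, "children": 1},
]

flagged = {}
for index, persona in enumerate(panel):
    issues = consistency_issues(persona)
    if issues:
        flagged[index] = issues

diversity = distinct_profile_share(panel, ["age", "monthly_income", "household_size"])
```

Here `flagged` picks out the contradictory persona, and `diversity` falls below 1.0 because one profile is repeated.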
Done well, synthetic personas can be used to test messaging and concepts at speed, to explore how different segments might interpret the same claim, or to anticipate objections that a founder is too enamoured to notice. Done badly, they become a mirror in which a team sees only its own assumptions reflected back.
What are digital twins?
A digital twin is a digital representation of a real-world entity used to simulate, predict or optimise its behaviour.
The term comes from engineering, where a digital twin might represent a jet engine, a wind turbine, or a factory line, updated with sensor data and used to run “what-if” scenarios.
In the context of market research, a digital twin is typically an attempt to recreate a specific person.
This can sound like a very compelling and interesting value proposition - for example, "I want to sell to
Worse still, the real human gets treated like their online persona. The user forms assumptions and tries to build rapport based on preconceived notions of who that human is, drawn from their digital presence. There's no interest in building a true connection, and time is wasted on both sides under the pretence of a scientific method.
Digital twin vs synthetic persona
This distinction is about the source and use - representation versus digital presence.
A synthetic persona is a representative profile used to generate plausible feedback. It doesn't correspond to any actual person, but it closely models how a human like that would react.
A digital twin is a recreation of a real human based upon their digital presence. It may be sophisticated, but it's like chatting with someone's Instagram account: almost entirely fake.
Other essential terms in synthetic market research
Synthetic panel
A synthetic panel is a collection of synthetic personas designed to represent a target population. Like a traditional consumer panel, it can be segmented and analysed. Unlike a traditional panel, it can be created quickly and re-weighted easily as the target changes.
Calibration and grounding
Calibration is the process of adjusting a synthetic system so that its outputs match known benchmarks, such as census distributions or observed category purchase rates. Grounding refers to anchoring the model to external facts or constraints so that it does not float away into wishful invention. In practice, grounding may include structured data, documented brand claims, price points, or rules that constrain behaviour.
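One common calibration technique is post-stratification: each persona is weighted so that the panel's group shares match a benchmark such as a census distribution. The sketch below assumes this technique for illustration; the group labels, target shares and purchase-intent scores are made up.

```python
from collections import Counter


def poststratification_weights(panel_groups, target_shares):
    """Weight each panel member by target share divided by observed panel share."""
    counts = Counter(panel_groups)
    total = len(panel_groups)
    return [target_shares[group] / (counts[group] / total) for group in panel_groups]


def weighted_mean(values, weights):
    """Mean of values, with each value counted in proportion to its weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Made-up panel that over-represents younger personas (80% versus a 50% benchmark).
panel_groups = ["under_35"] * 8 + ["35_plus"] * 2
purchase_intent = [1.0] * 8 + [0.0] * 2
target_shares = {"under_35": 0.5, "35_plus": 0.5}  # hypothetical benchmark

weights = poststratification_weights(panel_groups, target_shares)
raw_estimate = sum(purchase_intent) / len(purchase_intent)
calibrated_estimate = weighted_mean(purchase_intent, weights)
```

In this toy case the raw estimate of 0.8 drops to roughly 0.5 once the over-represented group is weighted down, which shows how much an uncalibrated panel can flatter a concept.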
Bias and representativeness
Bias is systematic error, often invisible to those who benefit from it. Representativeness is the degree to which a sample, synthetic or otherwise, reflects the population of interest. Synthetic research can amplify bias if it learns from skewed inputs, or reduce it if it is explicitly corrected using benchmarks. The outcome depends on choices, not slogans.
Validation
Validation is the process of checking whether synthetic outputs track reality for a particular use case. Validation can include back-testing against historical outcomes, parallel runs with human studies, and sensitivity analysis to see how much conclusions depend on assumptions. Without validation, synthetic market research is theatre.
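A parallel run can be summarised with very simple measures. The sketch below compares synthetic and human results per concept using mean absolute error and a check on whether both sources rank the concepts in the same order. The function names and all figures are placeholders, not results from any real study.

```python
def mean_absolute_error(synthetic, human):
    """Average absolute gap between synthetic and human scores on shared items."""
    shared = synthetic.keys() & human.keys()
    if not shared:
        raise ValueError("no items in common to compare")
    return sum(abs(synthetic[item] - human[item]) for item in shared) / len(shared)


def same_ranking(synthetic, human):
    """True if both sources order the shared items identically, lowest to highest."""
    shared = sorted(synthetic.keys() & human.keys())
    return sorted(shared, key=synthetic.get) == sorted(shared, key=human.get)


# Placeholder purchase-intent shares for three concepts.
synthetic_scores = {"concept_a": 0.42, "concept_b": 0.30, "concept_c": 0.55}
human_scores = {"concept_a": 0.38, "concept_b": 0.33, "concept_c": 0.50}

error = mean_absolute_error(synthetic_scores, human_scores)
rank_match = same_ranking(synthetic_scores, human_scores)
```

Which measure matters depends on the use case: if the decision is only which concept to take forward, agreement on ranking may be enough, whereas a sales forecast needs the absolute gap to be small.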
Scenario testing
Scenario testing means exploring “what if” questions: what if prices rise 10%, a competitor launches a similar product, or regulations change. Synthetic systems are well suited to scenario work because they can run many variations cheaply, exposing which assumptions matter and which are mere decoration.
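A price scenario can be sketched with a deliberately crude rule: a persona buys if the price is at or below its willingness to pay. That rule, the function names and the numbers below are assumptions for illustration only; a real system would use a richer response model.

```python
def purchase_share(willingness_to_pay, price):
    """Share of personas whose willingness to pay is at least the price."""
    buyers = sum(1 for amount in willingness_to_pay if amount >= price)
    return buyers / len(willingness_to_pay)


def run_scenarios(willingness_to_pay, base_price, price_changes):
    """Map each relative price change (0.10 means +10%) to the resulting purchase share."""
    return {
        change: purchase_share(willingness_to_pay, base_price * (1 + change))
        for change in price_changes
    }


# Made-up willingness-to-pay values for an eight-persona panel.
willingness_to_pay = [8, 9, 10.5, 11.5, 12, 13, 15, 20]

results = run_scenarios(willingness_to_pay, base_price=10.0, price_changes=[0.0, 0.10, 0.25])
```

Running many such variations is cheap, so the useful output is less any single number than the shape of the response: where demand falls away sharply, and which assumptions move it most.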




