March 13, 2025
5 min
Incrementality has become a bit of a buzzword: used liberally and often misunderstood. And while many claim to measure it, few adhere to the data-science-backed methods needed to actually prove it (as seen here and here). Incrementality is not attribution or new-to-brand metrics; it is a controlled experiment comparing a holdout group (users who do not see an ad) with an experimental group (users who do). This eliminates the guesswork and proves whether ad exposure drives measurable increases in metrics like conversions and revenue.
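To make the comparison concrete, here is a minimal sketch of the lift calculation behind such an experiment. The function name and all the numbers are illustrative, not real campaign data:

```python
# Hypothetical sketch: measuring incrementality from a controlled experiment.
# Group sizes and conversion counts below are made-up examples.

def incremental_lift(exposed_conversions, exposed_users,
                     holdout_conversions, holdout_users):
    """Compare conversion rates between the exposed and holdout groups."""
    exposed_rate = exposed_conversions / exposed_users
    holdout_rate = holdout_conversions / holdout_users
    absolute_lift = exposed_rate - holdout_rate
    relative_lift = absolute_lift / holdout_rate  # lift as a % of the baseline
    return exposed_rate, holdout_rate, relative_lift

exposed_rate, holdout_rate, lift = incremental_lift(
    exposed_conversions=1_300, exposed_users=50_000,
    holdout_conversions=1_000, holdout_users=50_000,
)
print(f"exposed {exposed_rate:.2%}, holdout {holdout_rate:.2%}, lift {lift:.0%}")
```

The holdout group's conversion rate is the baseline; anything the exposed group converts above that baseline is what the ad actually caused.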
So why aren’t more people properly measuring incrementality?
According to Anthony Kilili, a Data Science Leader at Kroger Precision Marketing, “Retailers need expert media teams and sophisticated tools capable of rapidly and repeatedly executing randomized controlled trials and other testing methodologies.” The TL;DR? Most companies don’t have the tools or teams to execute incrementality correctly. That is likely why they turn to third-party lift studies from providers like Kantar or Nielsen, which offer only high-level directional impact analysis rather than outcome-based metrics.
Incrementality testing in the Amazon ecosystem
There are several ways to execute a data-science-backed incrementality test, including randomized control groups, geo-based testing, and intent-to-treat/ghost ads. But for large-scale advertising campaigns, especially on the Amazon DSP and with Streaming TV (STV) ads, a geo-based test is the right fit: the scale, ad delivery method, data accessibility, and reliability of the Amazon DSP ensure clean experimental conditions at the regional level.
Why a geo-based test over other popular tests like randomized control groups?
When users are randomly split, ad spillover can occur: users in the holdout group might still see the ads in shared environments (households, public spaces, etc.), contaminating the holdout group and reducing the accuracy of the results. With a geo holdout, the risk of spillover and contamination is minimized because ads are either shown or not shown to entire regions. This reduces the likelihood of holdout users seeing the ad, ensuring a more accurate measurement of the campaign's incremental impact.
Additionally, a randomized control group test could assign users across regions, leading to geographic imbalances where market dynamics (e.g., competitive conditions, economic factors, seasonality) vary significantly, making the results harder to generalize to the broader population. With a geo-based test, marketers can measure incrementality at a market-wide level, capturing the broader impact of the campaign.
Why geo-based incrementality tests are so effective within the Amazon DSP:
Rich zip-code-level total sales data from Seller Central/Vendor Central can be leveraged in the holdout model and combined with 1P data to create DMA-level groupings
Designated Market Areas (DMAs) can easily be excluded from DSP campaigns
High degree of control with the experiment and holdout group makeup
For marketers who demand greater transparency and control, this level of customization offers the ability to intentionally pull select levers within the test. For example, a seasonal winter brand would likely want colder and warmer regions evenly distributed between the two test groups to avoid skewing the results toward a specific seasonality. That control would be impossible in a randomized test, or in any test whose parameters are a black box.
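One common way to get this kind of balance is stratified assignment: split the DMAs within each stratum (here, a "cold"/"warm" climate label) so both groups end up with the same mix. A minimal sketch, with hypothetical DMA names and labels:

```python
import random

# Hypothetical sketch of stratified DMA assignment. The climate labels and
# DMA names are illustrative; a real test would balance on whatever
# covariates matter to the brand (seasonality, market size, etc.).

def split_dmas(dmas_by_stratum, seed=42):
    """Randomly split DMAs within each stratum so both groups stay balanced."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    holdout, experimental = [], []
    for stratum, dmas in dmas_by_stratum.items():
        shuffled = dmas[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        holdout.extend(shuffled[:mid])
        experimental.extend(shuffled[mid:])
    return holdout, experimental

dmas = {
    "cold": ["Minneapolis", "Denver", "Buffalo", "Anchorage"],
    "warm": ["Phoenix", "Miami", "San Diego", "Houston"],
}
holdout, experimental = split_dmas(dmas)
```

Because the shuffle happens inside each stratum, every group receives half of the cold DMAs and half of the warm ones, no matter how the randomness falls.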
Considerations for control—tradeoffs when curating your DMA groupings:
Incrementality testing can be scary. Not all brands are willing to risk 50% of their audience not seeing their ads throughout the testing period. And while DMA curation allows marketers to adjust their holdout/experimental breakdown, there are tradeoffs. Brands that move away from an even 50/50 split and push their experimental group to 60 or even 70 percent of the test risk statistical insignificance, because the holdout group may be too small to support a reliable comparison. Alternatively, brands more bullish on proving that their ads drive incremental revenue could skew their holdout group to 60-70%. That would likely provide greater statistical significance and better assess whether ads are cannibalizing sales.
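The cost of shrinking the holdout can be sketched with a standard two-proportion z-test (one common way to test lift significance; the conversion numbers below are made up). Holding the underlying rates and the total audience fixed, a 70/30 split yields a weaker test statistic than a 50/50 split:

```python
import math

# Hypothetical sketch: two-proportion z-test for lift between an exposed
# group and a holdout group. Conversion counts are illustrative only.

def z_score(exposed_conv, exposed_n, holdout_conv, holdout_n):
    """Larger |z| means stronger evidence that the lift is real."""
    p1 = exposed_conv / exposed_n
    p2 = holdout_conv / holdout_n
    pooled = (exposed_conv + holdout_conv) / (exposed_n + holdout_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / exposed_n + 1 / holdout_n))
    return (p1 - p2) / se

# Same underlying rates (2.2% exposed vs 2.0% holdout), 100k users total:
even = z_score(1_100, 50_000, 1_000, 50_000)    # 50/50 split
skewed = z_score(1_540, 70_000, 600, 30_000)    # 70/30 split, smaller holdout
```

Under these assumptions the 50/50 split produces the larger z-score: the uneven split spends users on exposure at the expense of the baseline the comparison depends on.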
Introducing Gigi’s approach to incrementality
At Gigi, we’re addressing this demand for transparent and controllable incrementality testing with geo-based tests built specifically for Amazon advertisers. No more black-box measurement and no more guesswork. Once DMA grouping parameters are confirmed, Gigi’s proprietary DMA curation engine builds the holdout and experimental groups, while our DMA exclusion ensures all agreed-upon DMAs are removed from all STV campaigns. This is built directly into the Amazon DSP and automated when building audience segmentation at the line-item level, eliminating manual effort and saving time. Plus, via data collaboration, marketers can use their 1P data in their DMA creation and measurement models, allowing them to measure the incremental omnichannel impact of their STV ads with robust reporting on iROAS, lift in revenue, incremental lift percentages, and detailed statistical analysis.