When Shopify introduced Rollouts, a lot of merchants got excited. They saw a built-in way to stage theme changes gradually, monitor performance, and roll back if something went wrong, all without a third-party app or dev tickets. For teams that have been waiting on this kind of infrastructure, the reaction made sense.
Rollouts is genuinely useful, but it's not really new. It's closer to a rebadge of Shopify Launch, a feature Shopify had experimented with and shelved, now reintroduced in a more polished form. That's a good thing for merchants; native deployment controls are the kind of infrastructure Shopify should've had in place long ago. The issue is how Rollouts is being talked about outside of Shopify. Shopify itself hasn’t labeled it an A/B testing tool. But that hasn't stopped agencies, app stores, and merchant communities from describing it as one anyway, and once "built-in A/B testing" starts circulating as a description, CRO decisions start to follow.
Rollouts is useful for deployment management, not experimentation, and the gap between those two jobs is wider than most brands realize before they've already acted on Rollout data.
What Shopify Rollouts does well
Rollouts is a controlled theme deployment and traffic-splitting tool built into Shopify Admin. You set a percentage of traffic to see a new theme variant, choose a launch date, and monitor how that variant performs against your control before committing to a full rollout.

For what it does, it works well:
- Side-by-side control and variant comparison. Rollouts shows performance for both the control and variant in the same dashboard. That's a meaningful baseline for theme testing inside Shopify.
- Funnel impact reporting. It reports how each variant affected orders, add-to-cart rate, and reached-checkout rate, so you can see where the variant moved the needle.
- Confidence intervals on behavioral metrics. Conversion rate, add-to-cart, reached checkout, and bounce rate all show confidence intervals in the dashboard.
- Autopublish on winning variants. Rollouts can promote the winning variant automatically, which removes some of the manual decision-making at the end of a test.
- Rollback and scheduling controls. You can schedule the launch, pause mid-test, and roll back without breaking your storefront.
For a merchant pushing a meaningful visual update to their published theme and wanting to validate that it didn't tank conversion, that's a useful set of capabilities. The question is what happens when the testing program needs to do more than that.
Where Rollouts runs out
The data Rollouts produces is real. What's narrow is the scope of what it can test and the operating model around how those tests can run. Here's where the constraints start to matter.
It only tests visual-layer theme changes.
Rollouts is scoped to changes made through the theme editor: layout, sections, content, basic visual edits. Liquid template changes, theme settings, and app embeds are out of scope. For brands with sophisticated theme development, the most impactful changes are often the ones that touch the logic layer, things like metafield-driven layouts, conditional content rules, or custom sections. Those are the changes you most want to test, and they're exactly what Rollouts can't measure.
It only tests theme variants. No price, URL, or template tests.
Theme A/B testing is one type of CRO test, but it isn't the whole program. Price testing, URL split testing, template-level testing, and targeted element tests all sit outside what Rollouts can do. For brands trying to validate pricing changes, test landing page variants, or run a focused experiment on a single template like a PDP, Rollouts isn't the right tool. It isn't built to be.
It only works on the currently published theme.
The variant in a Rollout has to be created inside the rollout flow itself, derived from the live published theme. Draft themes, vintage themes, and any theme version your team has been building separately can't be used as a baseline. That blocks pre-launch testing entirely and forces brands to rebuild work they've already done.
Tests and deployments don't coexist.
If you have an active rollout running and you need to push a theme update or hotfix, you have to archive the rollout first. The test ends. For any brand doing regular theme updates or running parallel development, that's a structural conflict. You're either testing or shipping, not both. Active CRO programs and active theme development can't share the same calendar.
No mutual exclusion across tests.
A high-velocity CRO program eventually needs to run multiple tests concurrently while making sure visitors aren't being exposed to overlapping experiments. That requires mutual exclusion logic, which Rollouts doesn't provide. Effectively, it's one test at a time per storefront.
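To make the mutual exclusion idea concrete, here's a minimal Python sketch of one common approach, assuming each shopper has a stable visitor ID (for example, a first-party cookie value). Each visitor is hashed into one of 100 traffic slots, each concurrent experiment owns its own non-overlapping slice of those slots, and a second hash picks control or variant within that slice. The experiment names and slot ranges are hypothetical, and this isn't how Rollouts or any particular platform is documented to work.

```python
import hashlib
from typing import Optional, Tuple


def bucket(key: str, buckets: int = 100) -> int:
    """Map a string key to a stable bucket in [0, buckets) via SHA-256."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % buckets


# Hypothetical concurrent experiments, each owning a disjoint slice of 100 slots.
# Any slots left unassigned would act as a holdout that sees no experiment.
EXPERIMENTS = {
    "pdp_layout": range(0, 50),
    "price_test": range(50, 100),
}


def assign(visitor_id: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (experiment, arm) for a visitor.

    Because the slices don't overlap, a visitor lands in at most one
    experiment, and the same visitor ID always gets the same answer.
    """
    slot = bucket("exclusion-layer:" + visitor_id)
    for name, slots in EXPERIMENTS.items():
        if slot in slots:
            # A second, experiment-specific hash splits that slice 50/50.
            arm = "variant" if bucket(name + ":" + visitor_id, 2) == 1 else "control"
            return name, arm
    return None, None  # holdout: no experiment


print(assign("visitor-42"))
```

The trade-off of this design is traffic: every experiment added to the layer gets a smaller share of visitors, which is why mutual exclusion is usually paired with deliberate decisions about which tests actually need to be isolated from each other.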
Revenue-side metrics don't get statistical framing.
Confidence intervals show up on conversion rate, add-to-cart, reached checkout, and bounce rate. They don't show up on AOV, gross sales, or total revenue. For brands where the test hypothesis is fundamentally about revenue impact, the metrics that matter most are the ones reported without statistical context.
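Part of why revenue metrics are harder to frame statistically: conversion is a yes/no outcome per visitor, while order values are skewed, and a handful of large orders can swing the average. One common, generic way to put an interval on an AOV difference anyway is a percentile bootstrap. The Python sketch below assumes you have per-order values for each arm; the order values are made up, and this isn't presented as the method Rollouts or any specific platform uses.

```python
import random
import statistics
from typing import List, Tuple


def bootstrap_diff_ci(
    control: List[float],
    variant: List[float],
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for mean(variant) - mean(control), e.g. AOV lift."""
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each arm's orders with replacement, keeping the arm sizes.
        c = rng.choices(control, k=len(control))
        v = rng.choices(variant, k=len(variant))
        diffs.append(statistics.fmean(v) - statistics.fmean(c))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_resamples)]
    hi = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Hypothetical order values: mostly modest baskets plus a few large outliers.
control_orders = [40.0, 55.0, 38.0, 120.0, 47.0, 52.0, 60.0, 35.0, 44.0, 250.0] * 5
variant_orders = [x * 1.1 for x in control_orders]
print(bootstrap_diff_ci(control_orders, variant_orders))
```

With only 50 orders per arm and a few outliers, the interval around a 10% AOV lift is likely to be wide enough to include zero, which is exactly the context a bare revenue number on a dashboard leaves out.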
Targeting, segmentation, and segmented reporting are unclear or limited.
A serious experimentation program needs to be able to target specific audiences (new vs. returning visitors, geographies, traffic sources, device types) and break results down by segment after the test ends. Public documentation on what Rollouts supports here is thin. For brands whose hypotheses are segment-specific, that's a question worth asking before relying on Rollouts as the primary testing tool.
Localized storefronts aren't supported.
International brands running localized storefronts through Markets can't run Rollouts on those localized versions. That's a real constraint for any DTC brand operating across multiple geographies with different content, pricing, or merchandising strategies per market.
What full Shopify experimentation looks like
When you run a full CRO program on a purpose-built Shopify A/B testing platform, the scope expands in a few directions at once.

The variant doesn't have to be built inside the test flow. Any theme, draft or published, can serve as a control or a variant. Pre-launch testing, vintage theme comparisons, and reusing existing development work are all possible.
The scope of what's testable extends past the visual layer and past full-theme tests. Liquid template changes, theme settings, app embed behavior, individual templates, URLs, and prices are all in scope. The logic-layer changes that drive the largest impact for sophisticated brands are testable rather than guessed at.
Tests and deployments can run in parallel. Pushing a theme update or hotfix doesn't require ending an active test. That keeps CRO velocity from competing with development velocity.
Multiple tests can run concurrently with mutual exclusion. Visitors are kept inside a single test at a time, so results stay clean even when the testing calendar is full.
Statistical framing extends across all key metrics, including revenue-side ones. AOV, gross sales, and total revenue carry the same confidence and significance signals as conversion rate, so the decisions that hinge on revenue impact are backed by the same statistical rigor as the rest.
Targeting and segmented reporting are first-class. Tests can be scoped to specific audience segments, and results can be broken down by segment after the fact, which is how brands learn whether a winning variant actually won for the audience they care about.
Hypothesis generation is built in. LiftAssist analyzes shopper behavior and surfaces test ideas grounded in observed friction points, which means the program isn't bottlenecked on someone's gut feel about what to test next.
Using them for what they're each built for
Rollouts makes sense when your question is: did this visual theme change deploy without breaking anything, and is it moving funnel metrics in the right direction? It's a controlled deployment tool with real testing capability inside its scope. Use it for visual updates to your published theme when you don't have active development running in parallel.

A dedicated Shopify A/B testing platform makes sense when your testing program needs to extend past visual-layer changes on the live theme. Whether you're testing Liquid logic, evaluating a draft theme before launch, running Shopify split testing on prices or URLs, validating hypotheses with revenue-side statistical confidence, or running multiple tests concurrently, the scope and operating model need to match the program.
Rollouts isn't the wrong tool because it's poorly built. It's the wrong tool for sophisticated CRO programs because it was built for a narrower job.
Our assessment
Shopify shipping Rollouts is a net positive for the ecosystem. Native theme testing has been a real gap for years, and having it available inside Shopify Admin raises the baseline for every merchant on the platform. It's also a good signal: Shopify is investing in the infrastructure around storefront experimentation, and that's directionally good for brands that care about CRO.
What Rollouts isn't, and what Shopify has been careful not to claim it is, is a complete experimentation platform. It's a scoped tool that works well inside its scope. The further your testing program moves past visual changes on a published theme, the faster you'll hit its ceiling.
If your program is heading there, reach out to get a demo of Shoplift.
Frequently asked questions
Q: When should I use Shopify Rollouts vs. a dedicated A/B testing platform?
A: Use Rollouts when you're staging a visual theme change on your published storefront and want a controlled deployment with funnel metrics. Use a dedicated A/B testing platform when your program needs to test outside the visual layer, run concurrent experiments, or validate revenue-side metrics across prices, URLs, and templates.
Q: Can Shopify Rollouts test prices, URLs, or templates?
A: No, Rollouts only tests full theme variants against each other. Price tests, URL split tests, and template-level tests fall outside its scope and require a CRO platform built for those test types.
Q: Can I test a draft theme with Shopify Rollouts before it goes live?
A: No, the variant in a Rollout has to be derived from your currently published theme and built inside the rollout flow. Pre-launch testing on a draft or vintage theme isn't supported.
Q: Does Shopify Rollouts work with localized storefronts?
A: No, Rollouts doesn't support testing on localized versions of a storefront. International brands running multiple locales through Markets need a CRO platform with localized testing built in.
Q: Can I run multiple A/B tests at the same time with Shopify Rollouts?
A: Effectively, no. Rollouts runs one test at a time per storefront and doesn't include mutual exclusion logic to keep visitors out of overlapping experiments. High-velocity CRO programs need concurrent testing with mutual exclusion to avoid contamination across tests.
