
Data Analysis

Practical deep-dives into causal inference, experiment design, and predictive modeling. The articles range from A/B testing methodology to LTV prediction and ad auction mechanics, covering the core topics every data analyst needs to master.

TWFE with Product Color × Search Query Interactions

11 min read · 2025-01-20

Learn how to apply Two-Way Fixed Effects (TWFE) models to estimate interaction effects between product color and search queries. We cover the bias pitfalls of TWFE under staggered adoption and practical strategies for robust causal estimation in e-commerce settings.

Causal Inference · TWFE · DiD · Search


Sequential A/B Testing and Alpha Spending

10 min read · 2025-01-15

Checking results repeatedly and stopping as soon as they look significant inflates the Type I error rate. This article explains how alpha spending functions keep false positive rates under control, using a practical two-look scenario with interim analyses spaced one week apart.

A/B Testing · Sequential Testing · Alpha Spending · Statistics
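
As a rough illustration of the two-look scenario in the summary above, the sketch below computes how much of a two-sided α = 0.05 budget is spent at each look. It assumes an O'Brien-Fleming-type (Lan-DeMets) spending function and looks at information fractions 0.5 and 1.0; the article itself may use a different spending function, and the function name `obf_alpha_spent` is made up for this example.

```python
from statistics import NormalDist

def obf_alpha_spent(t, alpha=0.05):
    """Cumulative two-sided alpha spent at information fraction t (0 < t <= 1).

    Assumed form: Lan-DeMets O'Brien-Fleming-type spending,
    alpha*(t) = 2 - 2 * Phi(z_{alpha/2} / sqrt(t)).
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return 2 * (1 - NormalDist().cdf(z / t ** 0.5))

# Two looks: an interim at half the planned sample, then the final analysis.
looks = [0.5, 1.0]
spent = [obf_alpha_spent(t) for t in looks]
# Alpha newly spent at each look is the increase in the cumulative amount.
increments = [spent[0]] + [b - a for a, b in zip(spent, spent[1:])]

for t, s, inc in zip(looks, spent, increments):
    print(f"t={t:.1f}  cumulative alpha={s:.4f}  spent at this look={inc:.4f}")
```

Under this assumed function very little alpha is spent at the interim look, so most of the budget is preserved for the final analysis.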


Multiple Hypothesis Testing: Bonferroni and FDR

9 min read · 2025-01-10

When you test many metrics simultaneously, the family-wise error rate (FWER) climbs quickly: 20 independent tests at α = 0.05 give roughly a 64% chance of at least one false positive. This article compares Bonferroni correction and Benjamini-Hochberg FDR control, with concrete guidance on which to apply in recommendation and advertising experiments.

Multiple Testing · Bonferroni · FDR · Statistics
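
To make the comparison in the summary above concrete, here is a minimal sketch of both procedures in plain Python. The p-values are invented for illustration, and the helper names `bonferroni` and `benjamini_hochberg` are not taken from the article.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0_i when p_i <= alpha / m (controls the family-wise error rate)."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Step-up procedure: find the largest rank k with p_(k) <= (k / m) * alpha,
    then reject the k smallest p-values (controls the false discovery rate)."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    reject = [False] * m
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject

# Hypothetical p-values for five metrics in one experiment.
p_values = [0.001, 0.012, 0.014, 0.03, 0.2]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print("Bonferroni rejects:", sum(bonf), "of", len(p_values))
print("BH rejects:        ", sum(bh), "of", len(p_values))
```

On these made-up inputs Bonferroni rejects one hypothesis and Benjamini-Hochberg rejects four, which shows the usual trade-off: stricter error control against more discoveries.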


Experiment Window Design: How Long Should You Run a Test?

8 min read · 2025-01-05

Experiment duration affects result validity more than most practitioners realize. We explore novelty effects, seasonality, and user learning curves, then lay out statistical and business criteria for choosing the right experiment window.

Experiment Design · A/B Testing · Novelty Effect · Power Analysis


Stratified Sampling in Practice: 4 Features, 2⁴ Strata

10 min read · 2024-12-28

With 1,000 users and 4 binary features, you can construct up to 16 strata — but should you? This article walks through the variance reduction benefits of stratified sampling and the sparsity trade-offs that emerge when strata become too granular.

Sampling · Experiment Design · Variance Reduction · Statistics
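
The sparsity concern in the summary above can be seen in a small simulation. This sketch draws 1,000 users with 4 binary features and counts users per stratum; the feature prevalences in `rates` and the random seed are assumptions chosen only to make the imbalance visible.

```python
import random
from collections import Counter
from itertools import product

rng = random.Random(42)

# Hypothetical prevalence of each binary feature (not from the article).
rates = [0.5, 0.3, 0.2, 0.1]

# Each user is a tuple of four 0/1 flags, drawn independently per feature.
users = [tuple(int(rng.random() < p) for p in rates) for _ in range(1000)]

# Count users in every one of the 2**4 = 16 possible strata, including empty ones.
counts = Counter(users)
sizes = {stratum: counts.get(stratum, 0) for stratum in product((0, 1), repeat=4)}

# Print strata from smallest to largest to expose the sparse cells.
for stratum, size in sorted(sizes.items(), key=lambda kv: kv[1]):
    print(stratum, size)
```

With an even split the average stratum would hold about 62 users, but under unequal prevalences the rarest combinations hold only a handful, which is where stratum-level estimates become unstable.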


CUPED Is Equivalent to the Frisch-Waugh-Lovell Theorem

12 min read · 2024-12-20

CUPED (Controlled-experiment Using Pre-Experiment Data) is widely used to reduce variance in A/B tests, but its theoretical foundation is rarely discussed. This article formally shows its equivalence to the Frisch-Waugh-Lovell theorem and explains why covariate adjustment preserves unbiasedness while shrinking variance.

CUPED · FWL Theorem · Variance Reduction · Econometrics
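
The equivalence claimed in the summary above can be checked numerically on simulated data. This is a sketch under stated assumptions, not the article's derivation: the data-generating process is invented, and the CUPED coefficient `theta` is estimated from within-arm demeaned data, which is the choice that makes the match exact. With a `theta` pooled across arms, the two estimates agree only approximately.

```python
import random

def mean(v):
    return sum(v) / len(v)

def residuals(y, x):
    """Residuals from a simple OLS regression of y on x with an intercept."""
    mx, my = mean(x), mean(y)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]

# Simulated experiment (hypothetical): x is the pre-experiment metric,
# t the treatment flag, y the outcome with a true effect of 0.5.
rng = random.Random(0)
n = 2000
t = [rng.randint(0, 1) for _ in range(n)]
x = [rng.gauss(10, 2) for _ in range(n)]
y = [0.8 * xi + 0.5 * ti + rng.gauss(0, 1) for xi, ti in zip(x, t)]

def arm_mean(v, arm):
    return mean([vi for vi, ti in zip(v, t) if ti == arm])

def demean_by_arm(v):
    m0, m1 = arm_mean(v, 0), arm_mean(v, 1)
    return [vi - (m1 if ti == 1 else m0) for vi, ti in zip(v, t)]

# CUPED: adjust y by theta * x, then take the difference in arm means.
xd, yd = demean_by_arm(x), demean_by_arm(y)
theta = sum(a * b for a, b in zip(xd, yd)) / sum(a * a for a in xd)
y_cv = [yi - theta * xi for xi, yi in zip(x, y)]
cuped = arm_mean(y_cv, 1) - arm_mean(y_cv, 0)

# FWL: partial x out of both y and t, then regress residual on residual.
y_res, t_res = residuals(y, x), residuals(t, x)
fwl = sum(a * b for a, b in zip(t_res, y_res)) / sum(a * a for a in t_res)

print(f"CUPED estimate: {cuped:.6f}")
print(f"FWL estimate:   {fwl:.6f}")
```

Both numbers equal the coefficient on `t` in a regression of `y` on `t` and `x`, which is why they coincide up to floating-point error.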


LTV Prediction: Surrogate Index vs. Deep Learning Approaches

13 min read · 2024-12-12

Predicting long-term LTV from short-term behavioral signals is one of the hardest problems in growth analytics. We compare the causal surrogate index framework from Athey et al. (2025) with deep learning approaches from Google researchers, highlighting when each method is appropriate.

LTV · Surrogate Index · Causal Inference · Prediction


CRM and Uplift Modeling: Finding Users Most Likely to Convert

11 min read · 2024-12-05

Predicting who will purchase is not the same as predicting who will purchase because of your campaign. This article explains the difference between response modeling and uplift modeling (CATE estimation), compares T-Learner, S-Learner, and X-Learner architectures, and shows how to maximize ROI on CRM interventions like retention coupons.

CRM · Uplift Modeling · CATE · Marketing
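
The distinction drawn in the summary above between response and uplift can be shown with a toy calculation. The sketch below is a T-Learner in its simplest possible form, with per-segment conversion rates standing in for the two fitted models; the segment names and counts are invented for illustration.

```python
# Hypothetical campaign results: (conversions, users) per segment and arm.
segments = {
    "sure_things":  {"control": (400, 1000), "treated": (410, 1000)},
    "persuadables": {"control": (100, 1000), "treated": (250, 1000)},
    "lost_causes":  {"control": (20, 1000),  "treated": (22, 1000)},
}

def rate(conversions_and_users):
    conversions, users = conversions_and_users
    return conversions / users

# "Model" for each arm: the observed conversion rate in that arm.
response = {name: rate(arms["treated"]) for name, arms in segments.items()}
# Uplift (a crude CATE estimate): treated rate minus control rate.
uplift = {name: rate(arms["treated"]) - rate(arms["control"])
          for name, arms in segments.items()}

best_by_response = max(response, key=response.get)
best_by_uplift = max(uplift, key=uplift.get)

print("Highest predicted response:", best_by_response)
print("Highest estimated uplift:  ", best_by_uplift)
```

Targeting by response would send coupons to users who mostly buy anyway, while targeting by uplift picks the segment whose behavior the campaign changes.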


Ghost Bidding: Counterfactual Estimation in Ad Auctions

10 min read · 2024-11-28

Ghost bidding simulates counterfactual ad exposures for losing bids, enabling causal measurement of ad effectiveness without the interference problems that plague standard A/B tests. We explain the mechanics of ghost bidding within auction systems and compare it to holdout-based experimental designs.

Ad Auction · Ghost Bidding · Causal Inference · Advertising
