Readme Driven Development for an A/B testing analysis package

Dec 11, 2023 a/b testing statistics tea-tasting python

tea-tasting: statistical analysis of A/B tests #

tea-tasting is a Python package for statistical analysis of A/B tests that features:

  • Student's t-test, Z-test, and Bootstrap out of the box.
  • Extensible API: Define and use statistical tests of your choice.
  • Delta method for ratio metrics.
  • Variance reduction with CUPED/CUPAC (also in combination with delta method for ratio metrics).
  • Fieller's confidence interval for percent change.
  • Sample ratio mismatch check.
  • Power analysis.
  • A/A tests.
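To make one of these features concrete, here is a minimal sketch of a sample ratio mismatch check for two variants. It is not the tea-tasting API (the package is still at the planning stage); the function name `srm_p_value` and its parameters are hypothetical, and it uses only the Python standard library. It runs a chi-square goodness-of-fit test of the observed user counts against the planned split.

```python
import math


def srm_p_value(n_control: int, n_treatment: int,
                expected_share_control: float = 0.5) -> float:
    """Chi-square goodness-of-fit p-value for a two-variant split (1 degree of freedom)."""
    total = n_control + n_treatment
    expected_control = total * expected_share_control
    expected_treatment = total - expected_control
    chi2 = (
        (n_control - expected_control) ** 2 / expected_control
        + (n_treatment - expected_treatment) ** 2 / expected_treatment
    )
    # With 1 degree of freedom, the chi-square survival function
    # equals erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


# Hypothetical counts: a planned 50/50 split that came out as 5000 vs 5300 users.
print(srm_p_value(5000, 5300))
```

A small p-value suggests that the split differs from the planned one, so the assignment or logging should be investigated before the experiment results are trusted. With more than two variants, a general chi-square test such as `scipy.stats.chisquare` would be the usual choice.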

Currently, tea-tasting is in the planning stage, and I'm starting with a README that outlines the proposed API — an approach known as Readme Driven Development (RDD).

In this blog post, I'll explain the motivation for creating this package and the benefits of the RDD approach.

Explore the README here.

I welcome your insights! If you have any suggestions, questions, or ideas for alternative designs, please join the conversation on GitHub.

Why a new package? #

One might wonder why a new package like tea-tasting is needed when SciPy and statsmodels already exist. Both are general-purpose statistical packages offering a wide range of statistical tests. However, they lack certain methods that are specific to A/B test analysis.

tea-tasting aims to fill this gap. For example, it includes methods such as the delta method for ratio metrics, CUPED/CUPAC for variance reduction, and methods for calculating confidence intervals for percentage changes.
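As a rough illustration of what such a method involves, here is a minimal sketch of the CUPED adjustment written with the standard library only. It is not the tea-tasting API: the function name `cuped_adjust` is hypothetical and the data are synthetic. The adjusted metric is `y - theta * (x - mean(x))`, where `x` is a pre-experiment covariate and `theta = cov(x, y) / var(x)`.

```python
import random
import statistics


def cuped_adjust(metric, covariate):
    """Return the CUPED-adjusted metric: y - theta * (x - mean(x))."""
    mean_x = statistics.fmean(covariate)
    mean_y = statistics.fmean(metric)
    # theta = cov(x, y) / var(x) minimizes the variance of the adjusted metric.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, metric))
    var_x = sum((x - mean_x) ** 2 for x in covariate)
    theta = cov_xy / var_x
    return [y - theta * (x - mean_x) for x, y in zip(covariate, metric)]


# Synthetic data (made up for illustration): the in-experiment metric is
# strongly correlated with the pre-experiment covariate.
rng = random.Random(0)
pre = [rng.gauss(10, 2) for _ in range(1000)]
post = [x + rng.gauss(0, 1) for x in pre]

adjusted = cuped_adjust(post, pre)
# The mean is unchanged, while the variance drops.
print(statistics.variance(post), statistics.variance(adjusted))
```

The adjustment leaves the metric's mean unchanged and lowers its variance when the covariate is correlated with the metric. In an actual A/B test, `theta` would typically be estimated on data pooled across variants and then applied to each variant before comparing them.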

The intention behind tea-tasting is not only to automate the analysis process but also to reduce the probability of making a mistake. Statistical analysis of A/B tests can be challenging, with many opportunities for errors. tea-tasting is designed to mitigate these risks.

Many companies develop in-house platforms for A/B testing, but these platforms often don't cover every scenario, such as non-standard randomization units used in cluster-level or switchback experiments, or the analysis of new metrics not yet integrated into the platform. tea-tasting is also useful in these cases.

In summary, tea-tasting is designed for a more specialized, error-minimizing approach to A/B testing analysis, filling in the gaps left by existing general-purpose statistical packages.

Package name #

The package name "tea-tasting" is a play on words that refers to two subjects:

  • Lady tasting tea is a famous experiment which was devised by Ronald Fisher. In this experiment, Fisher developed the null hypothesis significance testing framework to analyze a lady's claim that she could discern whether the tea or the milk was added first to a cup.
  • "tea-tasting" phonetically resembles "t-testing" or Student's t-test, a statistical test developed by William Gosset.

Lady tasting tea by DALL-E
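For readers curious about the arithmetic behind the experiment, here is a small sketch. It assumes the classic design of eight cups, four with milk poured first and four with tea poured first, where the lady is told to pick the four milk-first cups. The function name `p_at_least` is hypothetical. Under the null hypothesis of pure guessing, the number of correct picks follows a hypergeometric distribution.

```python
from math import comb


def p_at_least(correct: int, cups_per_kind: int = 4) -> float:
    """P(at least `correct` milk-first cups are identified) under pure guessing."""
    total = comb(2 * cups_per_kind, cups_per_kind)
    favorable = sum(
        # Choose k of the true milk-first cups and the rest from the tea-first cups.
        comb(cups_per_kind, k) * comb(cups_per_kind, cups_per_kind - k)
        for k in range(correct, cups_per_kind + 1)
    )
    return favorable / total


print(p_at_least(4))  # 1/70: all four cups identified by chance
print(p_at_least(3))  # 17/70: three or more identified by chance
```

Only a perfect score gives a p-value below the conventional 0.05 threshold in this design; three correct cups out of four is easily explained by chance.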

Readme Driven Development #

Readme Driven Development (RDD) is a software development approach where you start by writing the README file first, before writing any code. The idea is similar to Amazon's Working Backwards approach, where a press release is drafted before starting a new product.

Writing the README first encourages thinking from a user's perspective, focusing on how users will interact with the API. It also allows for better project planning: you don't need to revise code every time you decide to add something to the public API.

Don't confuse RDD with the Waterfall approach. It's not about creating exhaustive specifications or detailed documentation upfront. Admittedly, with tea-tasting I might have gone a bit deeper into details than usual, but this was a conscious choice to thoroughly plan the package's functionality.

To learn more about RDD, I recommend reading Tom Preston-Werner's blog post.

Request for feedback #

One of the key benefits of Readme Driven Development (RDD) is the ability to adapt design choices with minimal cost. After reviewing the README, you might have insights that could enhance this project. It could be anything from improved naming conventions and additional features to an entirely different API approach.

Your perspective is valuable! If the concept of this package interests you and you have ideas for its design, I encourage you to share your thoughts on GitHub. Let's collaborate to make tea-tasting as effective and user-friendly as possible.

P.S. Can ChatGPT write a Python package? #

Tom Preston-Werner wrote his blog post in 2010. There were no LLMs at that time. Thinking about RDD in 2023, I couldn't resist asking ChatGPT to write the package code based on the README.

Well, no, we're not there yet. ChatGPT resisted performing a large task at once. After I insisted, ChatGPT started implementing it class by class. In the first class, it started hallucinating about the SciPy API. I gave up shortly after.

For those interested in how this experiment unfolded, you can view the entire conversation here: chat.

© Evgeny Ivanov 2023