BOIN vs 3+3: A Practitioner's View After Running Both

[Figure: dose escalation trial design comparison chart showing BOIN and 3+3 decision rules]

Dose escalation design debates have been ongoing since the 3+3 rule's first formalization in the 1990s. Bayesian optimal interval (BOIN) designs entered widespread discussion after Yuan and colleagues published the framework in 2016, and uptake among academic oncology centers has been substantial since then. The conversation in most clinical pharmacology circles has become somewhat one-sided: BOIN is statistically better, therefore BOIN should be used.

The statistical argument is largely correct. But statistical superiority in simulation does not automatically translate to operational superiority in specific trial contexts. This article examines both designs from the perspective of a clinical team that has run multiple Phase I oncology studies with each, focusing on the practical trade-offs that simulation studies cannot fully capture.

What the 3+3 Actually Is - and Why It Persists

The 3+3 design is a rule-based algorithm, not a statistical model. Cohorts of three patients are enrolled at a given dose level. If none of the three experiences a dose-limiting toxicity (DLT), the next cohort is enrolled at the next dose level. If one of the three has a DLT, the cohort is expanded to six patients; if no further DLTs occur among the additional three, escalation continues. If two or more DLTs occur at a dose level, whether among three or six patients, escalation stops and that level is declared above the maximum tolerated dose (MTD), with the MTD usually taken as the next lower level.
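The rule is small enough to write down as a single function. The sketch below is a minimal illustration, not a protocol-ready implementation: the function name and return strings are hypothetical, and it assumes the common variant in which one DLT among six patients permits escalation.

```python
def three_plus_three_decision(n_treated: int, n_dlt: int) -> str:
    """Next action at the current dose level under a conventional 3+3 rule.

    Assumes decisions are made only after 3 or 6 evaluable patients, and that
    1 DLT in 6 allows escalation (the common variant; protocols may differ).
    """
    if n_dlt < 0 or n_dlt > n_treated:
        raise ValueError("n_dlt must be between 0 and n_treated")
    if n_treated == 3:
        if n_dlt == 0:
            return "escalate"
        if n_dlt == 1:
            return "expand to 6"
        return "stop: dose exceeds MTD"
    if n_treated == 6:
        if n_dlt <= 1:
            return "escalate"
        return "stop: dose exceeds MTD"
    raise ValueError("3+3 decisions are made after 3 or 6 evaluable patients")


# Example: one DLT in the first three patients triggers expansion.
print(three_plus_three_decision(3, 1))
```

Every branch is a comparison against a small integer, which is the transparency the design is valued for.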

The 3+3's persistence is not ignorance of its statistical limitations. Those limitations are real and documented: the design has approximately 25-30% probability of selecting a dose with a true DLT rate of 33% as the MTD, and it tends to enroll disproportionate numbers of patients at low, subtherapeutic dose levels. These are serious limitations in a context where patient exposures and trial efficiency both matter.
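One property of the 3+3 can be computed exactly rather than simulated: the chance that the rule escalates past a dose with a given true DLT rate. The sketch below assumes independent patients and the common variant in which escalation happens on 0 DLTs in 3, or on 1 DLT in 3 followed by 0 DLTs in the 3 expansion patients. The function name is hypothetical, and the calculation is not meant to reproduce the selection percentages quoted above, which depend on the full dose-toxicity scenario.

```python
def prob_escalate_3plus3(p_dlt: float) -> float:
    """Probability that a 3+3 rule escalates past a dose with true DLT rate p_dlt."""
    if not 0.0 <= p_dlt <= 1.0:
        raise ValueError("p_dlt must be between 0 and 1")
    q = 1.0 - p_dlt
    p_zero_of_three = q ** 3               # no DLTs in a cohort of three
    p_one_of_three = 3 * p_dlt * q ** 2    # exactly one DLT in a cohort of three
    # Escalate directly on 0/3, or on 1/3 followed by 0/3 in the expansion.
    return p_zero_of_three + p_one_of_three * p_zero_of_three


for rate in (0.10, 0.20, 0.33, 0.50):
    print(f"true DLT rate {rate:.2f}: P(escalate) = {prob_escalate_3plus3(rate):.3f}")
```

Under these assumptions, a dose with a true DLT rate of 0.33 is still escalated past with probability of about 0.43, which shows how little information a cohort of three to six patients carries.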

The design persists because it is transparent. An institutional review board member, a referring oncologist, or a patient can understand the stopping rules without a statistical primer. Regulatory reviewers recognize it and know how to evaluate it in an IND. Protocol deviations are easy to identify because the rules are simple and binary. In a multi-site trial where the site coordinators change every 18 months and the PI is covering three protocols simultaneously, transparency has operational value.

BOIN: What the Statistics Buy You

BOIN is built around a prespecified target DLT rate (typically 0.25 or 0.30). After each cohort, the observed DLT rate at the current dose is compared with two boundaries derived from the Bayesian optimal interval calculation to decide whether to escalate, stay, or de-escalate. Those boundaries are precomputed and presented as simple lookup tables - in this sense BOIN tries to offer the statistical properties of a model-based design within a structure that looks operationally similar to a rule-based one.
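The boundaries have a closed form, so they can be checked by hand. The sketch below uses the formulas from the BOIN literature with the commonly used defaults of 0.6 and 1.4 times the target for the lower and upper reference rates; those defaults, and the function name, are assumptions here and should be replaced by whatever the protocol specifies.

```python
import math


def boin_boundaries(target, phi1=None, phi2=None):
    """Return (lambda_e, lambda_d), the BOIN escalation and de-escalation boundaries.

    target: target DLT rate.
    phi1:   highest DLT rate considered clearly subtherapeutic (assumed default 0.6 * target).
    phi2:   lowest DLT rate considered clearly too toxic (assumed default 1.4 * target).
    """
    phi1 = 0.6 * target if phi1 is None else phi1
    phi2 = 1.4 * target if phi2 is None else phi2
    if not 0 < phi1 < target < phi2 < 1:
        raise ValueError("require 0 < phi1 < target < phi2 < 1")
    lambda_e = math.log((1 - phi1) / (1 - target)) / math.log(
        target * (1 - phi1) / (phi1 * (1 - target))
    )
    lambda_d = math.log((1 - target) / (1 - phi2)) / math.log(
        phi2 * (1 - target) / (target * (1 - phi2))
    )
    return lambda_e, lambda_d


lam_e, lam_d = boin_boundaries(0.30)
print(f"escalate if observed DLT rate <= {lam_e:.3f}")
print(f"de-escalate if observed DLT rate >= {lam_d:.3f}")
```

For a target of 0.30 this gives boundaries of roughly 0.236 and 0.358. The sketch leaves out the dose-elimination safety rule that BOIN protocols normally include.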

The advantages in simulation are consistent. BOIN correctly identifies the MTD (within one dose level) approximately 15-20% more often than 3+3 in standard simulation scenarios. It enrolls fewer patients at doses far below the MTD in scenarios where the true MTD is in the upper portion of the dose range. When designed for a target DLT rate of 0.25, it assigns approximately 50% of patients to doses within one level of the MTD versus 35-40% for 3+3.

More importantly for Phase I studies where PK data are collected, BOIN integrates cleanly with exposure-response modeling. The BOIN-ET variant allows joint use of DLT observations and PK or biomarker data in escalation decisions, which is the framework described in the ROMI design and similar exposure-informed approaches. For agents where the PK-toxicity relationship is already partially characterized from preclinical or prior clinical work, this integration has genuine scientific value.

The DLT Window Problem That Both Designs Share

Neither BOIN nor 3+3 solves a problem they share: both inherit a DLT assessment window that was designed around targeted therapies with rapid-onset toxicity. Standard DLT windows are 21 or 28 days. This is adequate for immunotherapy-related adverse events and tyrosine kinase inhibitor toxicities, which typically manifest within the first cycle.

For myelosuppressive regimens, the nadir of hematologic toxicity may not occur until day 12-18, and a secondary nadir sometimes appears in cycle 2. A patient with a cycle-1 nadir ANC of 0.3 x10^9/L on day 15 has grade 4 neutropenia under CTCAE v5.0 (ANC below 0.5 x10^9/L), but the event counts as febrile neutropenia only if a qualifying fever occurred and was documented before the DLT window closed. If the window closes at day 21 and the patient's only fever is on day 20, the documentation margin is thin. Both BOIN and 3+3 protocols should specify a secondary DLT assessment at cycle 2, day 1, for myelosuppressive agents - a point we cover in more depth in our article on the DLT window problem in myelosuppressive regimens.

Accrual Rate Is the Deciding Practical Factor

The most underappreciated factor in design selection is accrual rate - specifically, whether your trial can maintain cohort-by-cohort synchrony. The 3+3 and BOIN designs both, in their standard formulations, enroll cohorts sequentially. You enroll three patients, wait for all three to complete their DLT window, then make the next escalation decision. If accrual is rapid and patients complete their DLT window before the next cohort is ready to enroll, this is not a constraint. If accrual is slow - common in rare tumor types, pediatric trials, or sites with tight eligibility criteria - the between-cohort waiting time dominates trial duration and the design choice matters less than the enrollment problem.

The accelerated titration variant of 3+3 and the time-to-event extension of BOIN (TITE-BOIN) both address this. TITE-BOIN allows enrollment to continue while DLT data are pending, similar to the TITE-CRM approach, and is worth considering for any trial where accrual is expected to be brisk and the DLT window is long.

Regulatory Acceptance and IND Review

In our experience, FDA reviewers in CDER oncology groups are comfortable with BOIN and have reviewed multiple BOIN-based IND applications since 2018. The key requirement is that the dose-escalation algorithm be fully prespecified in the protocol and statistical analysis plan, with the decision boundaries presented as lookup tables so there is no ambiguity in how escalation decisions will be made in real time.
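A lookup table of this kind is just the two boundaries converted to DLT counts for each possible number of patients treated at a dose. The sketch below is illustrative only: the function name is hypothetical, the boundaries 0.236 and 0.358 are the values usually quoted for a target DLT rate of 0.30 and are passed in as assumed inputs, and the dose-elimination rule that a real BOIN table also carries is omitted.

```python
import math


def boin_lookup_table(lambda_e, lambda_d, max_n):
    """Return rows of (n, escalate_if_dlts_at_most, deescalate_if_dlts_at_least).

    A dose is escalated when observed DLTs / n <= lambda_e and de-escalated
    when observed DLTs / n >= lambda_d; anything in between means stay.
    """
    rows = []
    for n in range(1, max_n + 1):
        escalate_if_at_most = math.floor(lambda_e * n)
        deescalate_if_at_least = math.ceil(lambda_d * n)
        rows.append((n, escalate_if_at_most, deescalate_if_at_least))
    return rows


# Assumed boundaries for a target DLT rate of 0.30.
table = boin_lookup_table(lambda_e=0.236, lambda_d=0.358, max_n=12)
print("n  escalate if DLTs <=  de-escalate if DLTs >=")
for n, esc, deesc in table:
    if n % 3 == 0:  # show only the usual cohort-of-three checkpoints
        print(f"{n:<3}{esc:<21}{deesc}")
```

With these inputs, three patients give escalate on 0 DLTs and de-escalate on 2 or more, and six patients give escalate on 1 or fewer and de-escalate on 3 or more. Printing the table this way in the protocol appendix lets a coordinator apply it without recomputing anything.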

The BOIN algorithm itself does not require a statistician on-call to make each escalation decision - the precomputed tables handle that. What BOIN does require is a data review committee process with defined roles: who calls the DLT review, who applies the table, and who signs off on the escalation recommendation before the next cohort opens. For 3+3, this process is often informal because the decision rules are so simple. For BOIN, formalizing it is important for regulatory documentation purposes.

When We Would Choose 3+3 Today

Three situations favor 3+3 despite its statistical limitations. First: academic investigator-initiated trials with limited biostatistics support, where a BOIN implementation would have to be managed by the PI and the study coordinator without dedicated statistical oversight. Second: combination regimens where one component has an established toxicity profile and the design is really answering a narrow question about the tolerability of adding the new agent; the statistical precision of BOIN matters most when the dose-toxicity relationship is genuinely unknown. Third: trials where the sponsor or funder prefers the simpler design and the PK data will provide the exposure characterization that the dose-escalation design cannot.

When We Would Choose BOIN

BOIN is the better choice when the target DLT rate is well-defined in the protocol and the team can prespecify the decision boundaries before IND submission. It is essential when the design will use exposure data alongside DLT data (BOIN-ET) and when the sponsor expects a precise MTD estimate rather than simply a recommended Phase II dose derived from the observed safety data. For first-in-human studies of novel mechanisms in tumor types with high unmet need, the improved probability of identifying the true MTD is scientifically and ethically important.

The two designs are not interchangeable, and the decision should be made jointly by the clinical pharmacologist, the biostatistician, and the principal investigator before protocol finalization - not treated as a template selection in the protocol-writing phase. Both designs benefit from integration with an exposure monitoring system that tracks each patient's PK during the DLT window, so that dose escalation decisions can be informed by actual drug concentrations rather than dose levels alone.

Conclusion: Design Selection Is a Clinical Decision, Not a Statistics Decision

BOIN is statistically superior to 3+3 in most simulation scenarios. The argument for 3+3 is operational, not statistical. Making the right choice requires an honest assessment of the trial's accrual environment, the team's biostatistics infrastructure, and the specific scientific question the dose-escalation phase is intended to answer.

Neither design answers the exposure question without PK data. Both designs benefit from real-time AUC monitoring that flags patients with unusually high or low exposure during their DLT window. DoseMind integrates with both design frameworks to provide per-patient exposure data alongside the toxicity observations that drive escalation decisions. Reach out to discuss how PK monitoring fits into your dose-escalation study design.