Avoid These 5 AR Pitfalls: Solve Problems Without Wasting Budget

Every dollar spent on epidemiological research should bring us closer to understanding a health problem. Yet too many projects burn through budget chasing questions that were never well-posed, collecting data that cannot answer them, or analyzing results in ways that obscure rather than clarify. This guide names the five most common pitfalls we see in applied research—and shows how to avoid them without inflating your budget.

We write from the perspective of practitioners who have watched teams struggle with these same issues. The advice is grounded in everyday epidemiology: outbreak investigations, cohort studies, and program evaluations where resources are limited and decisions must be made quickly. If you are designing a study, reviewing a protocol, or trying to stretch a grant further, these lessons will help you spend where it counts.

1. Why This Topic Matters Now

Epidemiology is under pressure to deliver faster answers with smaller budgets. Public health emergencies, shrinking grants, and the expectation of real-time evidence mean that every research dollar must count. Yet many teams repeat the same costly mistakes: chasing hypotheses that are too broad, collecting data that cannot be analyzed, or ignoring confounding until it is too late.

The stakes are high. A poorly designed study not only wastes money but can mislead policy, delay interventions, or erode public trust. In contrast, a well-scoped project—even with modest funding—can produce clear, actionable findings. The difference often comes down to avoiding a handful of predictable errors.

We have seen projects fail because the research question was not specific enough to guide data collection. We have watched teams collect hundreds of variables without a clear analysis plan, then struggle to interpret results. And we have observed studies that ignored known confounders, producing associations that later dissolved under scrutiny. Each of these mistakes has a common root: a gap between the problem we want to solve and the methods we actually use.

This article is not about theoretical perfection. It is about practical decisions that save time, money, and credibility. The five pitfalls we cover are the ones we encounter most often in field epidemiology—and the ones that are easiest to fix with upfront planning.

Who this is for: Epidemiologists, public health students, program evaluators, and anyone who commissions or reviews research. If you have ever wondered why a study cost more than expected or delivered less than promised, these pages will help you diagnose the problem.

2. Core Idea in Plain Language

At its heart, good epidemiology is about asking the right question and then choosing a method that can answer it honestly. The core idea of this guide is that most budget waste comes from a mismatch between the question and the design—not from technical incompetence.

Think of it this way: if you want to know whether a new vaccine reduces hospitalizations, you need a study that compares vaccinated and unvaccinated groups while controlling for differences in age, underlying conditions, and exposure risk. If instead you simply track hospitalizations among vaccinated people, you have no comparison group—and no answer. That mismatch is pitfall number one: an ill-defined research question that cannot be answered by the data you plan to collect.

The other four pitfalls follow the same logic. Pitfall two is over-collecting data without a plan—gathering variables because they might be interesting rather than because they are needed to test a specific hypothesis. This inflates costs, complicates analysis, and often leads to post-hoc fishing expeditions that produce false positives.

Pitfall three is ignoring confounding during design. Many teams collect data on confounders but fail to account for them in the analysis, or they rely on statistical adjustment alone when matching or restriction would have been more efficient. The result is a study that cannot convincingly separate cause from association.

Pitfall four is using convenience samples when a more systematic approach is feasible. Convenience samples are fast and cheap, but they often introduce selection bias that cannot be corrected later. The budget saved on sampling is then spent on complex statistical models that still fail to produce generalizable results.

Pitfall five is poor communication of findings—either overstating conclusions or burying them in jargon. Research that cannot be understood by decision-makers is research that has no impact, regardless of how well it was designed. The budget spent on data collection and analysis is wasted if the results do not inform action.

These five pitfalls are interconnected. Fixing one often helps avoid others. For example, a well-defined question (pitfall one) naturally limits the variables you need to collect (pitfall two) and clarifies which confounders must be measured (pitfall three). The key is to invest time in planning before you collect any data.

3. How It Works Under the Hood

Avoiding these pitfalls requires a systematic approach to study design. We break it down into four steps that mirror the typical workflow of an epidemiological project.

Step 1: Define the research question with precision

Start by writing the question in a single sentence. Use the PICOT framework (Population, Intervention, Comparison, Outcome, Time) to ensure completeness. For example: 'Among adults aged 50+ in rural Zambia, does a mobile health intervention compared to standard care reduce time to tuberculosis diagnosis within six months?' This question tells you exactly who to study, what to compare, what outcome to measure, and over what period. Any question that cannot be stated this clearly is likely too vague to guide data collection.
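One way to enforce this is to treat PICOT as a checklist that will not produce a question until every component is filled in. The sketch below uses a hypothetical helper (the `PicotQuestion` class and its methods are illustrative, not part of any library). It rebuilds the Zambia example sentence and flags a vague question by listing the components it is missing.

```python
from dataclasses import dataclass, fields


@dataclass
class PicotQuestion:
    """Hypothetical helper: one field per PICOT component."""
    population: str = ""
    intervention: str = ""
    comparison: str = ""
    outcome: str = ""
    time: str = ""

    def missing(self) -> list[str]:
        # A blank component means the question is not yet specific enough.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def sentence(self) -> str:
        gaps = self.missing()
        if gaps:
            raise ValueError(f"Question is incomplete; missing: {', '.join(gaps)}")
        return (f"Among {self.population}, does {self.intervention} "
                f"compared to {self.comparison} {self.outcome} within {self.time}?")


q = PicotQuestion(
    population="adults aged 50+ in rural Zambia",
    intervention="a mobile health intervention",
    comparison="standard care",
    outcome="reduce time to tuberculosis diagnosis",
    time="six months",
)
print(q.sentence())

# A vague question: no intervention, comparison, or time frame.
vague = PicotQuestion(population="adults in Zambia", outcome="improve TB outcomes")
print(vague.missing())
```

Raising an error instead of returning a half-formed sentence mirrors the advice above: if a component cannot be filled in, the question is not ready to guide data collection.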

Once the question is written, ask: can this be answered with the resources available? If not, narrow the scope. A smaller, well-answered question is more useful than a large, ambiguous one.

Step 2: Map the causal pathway

Draw a directed acyclic graph (DAG) showing the hypothesized relationship between exposure and outcome, including known confounders, mediators, and colliders. This step forces you to think about which variables are essential to measure and which are not. It also identifies variables that should not be adjusted for (colliders) because doing so would introduce bias.

For example, if you are studying the effect of air pollution on asthma exacerbations, a DAG would show that socioeconomic status confounds the relationship (poorer neighborhoods have higher pollution and worse health access). It would also show that medication adherence might be a mediator—adjusting for it would block part of the effect you want to estimate. Without a DAG, teams often adjust for everything, which can distort results.

Step 3: Choose a design that fits the question and context

Different questions require different designs. A table can help compare options:

Design	Best for	Common pitfall
Cohort study	Incidence, multiple outcomes	Loss to follow-up, confounding
Case-control study	Rare outcomes (efficient sampling)	Recall bias, selection of controls
Cross-sectional study	Prevalence, associations	Temporal ambiguity
Randomized trial	Causal inference	Cost, generalizability

Choose the simplest design that can answer the question. If a cross-sectional study can provide enough evidence for a decision, do not default to a cohort. Simpler designs are cheaper and faster, and they often suffer from fewer operational problems.

Step 4: Plan the analysis before collecting data

Write a statistical analysis plan (SAP) that specifies primary and secondary analyses, subgroup analyses, and sensitivity analyses. The SAP should include the models you will fit, the covariates you will adjust for, and how you will handle missing data. This prevents the temptation to 'try different models until something is significant'—a practice that inflates false-positive rates.
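The arithmetic behind that inflation is simple under an illustrative assumption: each model tried is an independent test at alpha = 0.05 and there is no true effect. The chance of at least one "significant" result across k tries is then 1 - (1 - alpha)^k. The sketch below (the function name is hypothetical) computes it. Real model variants fitted to the same data are correlated, so these figures overstate the inflation somewhat, but the direction holds.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive across n independent tests
    when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(f"{k:>2} models tried: {familywise_error_rate(k):.0%} chance of a spurious 'hit'")
```

With ten tries, the chance of at least one spurious finding is already about 40%.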

An SAP also forces you to estimate sample size requirements. If the required sample is larger than you can afford, you need to revisit the question or design before spending money on data collection.
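For a binary outcome compared between two groups, a common starting point is the normal-approximation formula for two proportions. The sketch below assumes a two-sided test, equal group sizes, and independent individuals; the function name and the planning proportions are hypothetical. Clustered or difference-in-differences designs need adjusted formulas (for example, inflating the result by a design effect).

```python
import math
from statistics import NormalDist


def n_per_group(p_control: float, p_intervention: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate sample size per group for comparing two proportions
    (two-sided test, normal approximation, equal allocation)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_intervention * (1 - p_intervention)
    n = (z_alpha + z_beta) ** 2 * variance / (p_control - p_intervention) ** 2
    return math.ceil(n)  # round up: a fractional participant is not possible


# Hypothetical planning values: 20% of the comparison group has the outcome,
# and the intervention is hoped to lower that to 15%.
print(n_per_group(0.20, 0.15))
```

Under these assumptions the function asks for about 900 people per group. Because n scales with the inverse square of the difference, halving the detectable difference roughly quadruples the requirement, which is why narrowing the question is often cheaper than enlarging the sample.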

4. Worked Example or Walkthrough

Let us apply these steps to a realistic scenario: a local health department wants to know whether a new community health worker program reduces diabetes-related emergency department visits among adults with type 2 diabetes in a low-income urban neighborhood.

Step 1: Define the question

The team writes: 'Among adults aged 18+ with type 2 diabetes in the Southside district, does enrollment in the Community Health Worker (CHW) program compared to usual care reduce the rate of emergency department visits for hyperglycemia or hypoglycemia within 12 months?' This is specific: population (Southside adults with diabetes), intervention (CHW program), comparison (usual care, i.e., no CHW), outcome (ED visits for glycemic emergencies), time (12 months).

Step 2: Map the causal pathway

The team draws a DAG. They identify confounders: age, diabetes duration, baseline HbA1c, health literacy, and insurance status. They also identify a potential mediator: medication adherence. They note that adjusting for adherence would block part of the program's effect (if the program works partly by improving adherence), so they decide not to include it as a covariate in the primary analysis. They also identify a collider: referral to a specialist (which might be affected by both the program and the outcome). Adjusting for specialist referral would introduce bias, so they exclude it.
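For larger diagrams, the role screening the team did by hand can be sketched in code. The example below encodes the DAG described above as hypothetical (cause, effect) pairs, with an assumed direct program-to-ED-visits arrow, and labels each covariate using simple reachability rules. It is a rough screen for illustration, not a full backdoor-criterion check; dedicated tools such as DAGitty handle that properly.

```python
from collections import defaultdict

CONFOUNDERS = ("age", "diabetes_duration", "baseline_hba1c", "health_literacy", "insurance")

# Hypothetical encoding of the team's DAG as (cause, effect) pairs.
EDGES = [(c, t) for c in CONFOUNDERS for t in ("program", "ed_visits")] + [
    ("program", "ed_visits"),              # assumed direct effect
    ("program", "adherence"),              # mediated path ...
    ("adherence", "ed_visits"),            # ... through adherence
    ("program", "specialist_referral"),    # collider: caused by the exposure
    ("ed_visits", "specialist_referral"),  # ... and by the outcome
]


def reachable(start, children, avoid=None):
    """Nodes reachable from `start` along directed edges, never entering `avoid`."""
    seen, stack = set(), [start]
    while stack:
        for child in children[stack.pop()]:
            if child != avoid and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def classify(edges, exposure, outcome):
    """Rough role screen for each covariate; not a full backdoor-criterion check."""
    children = defaultdict(set)
    for cause, effect in edges:
        children[cause].add(effect)
    after_exposure = reachable(exposure, children)
    after_outcome = reachable(outcome, children)
    roles = {}
    for v in sorted({node for edge in edges for node in edge} - {exposure, outcome}):
        if v in after_exposure and v in after_outcome:
            roles[v] = "collider (or its descendant): do not adjust"
        elif v in after_exposure and outcome in reachable(v, children):
            roles[v] = "mediator: do not adjust (total effect)"
        elif v in after_exposure:
            roles[v] = "other descendant of exposure: do not adjust"
        elif exposure in reachable(v, children) and outcome in reachable(v, children, avoid=exposure):
            # Causes the exposure and reaches the outcome by a path not via the exposure.
            roles[v] = "confounder: adjust"
        else:
            roles[v] = "unclassified: review by hand"
    return roles


for variable, role in classify(EDGES, "program", "ed_visits").items():
    print(f"{variable:<20} {role}")
```

The `avoid` argument matters: a variable that affects ED visits only through the program (an instrument-like variable) is not flagged as a confounder.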

Step 3: Choose a design

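Randomization is not feasible because the program is already being rolled out neighborhood-wide. The team considers a cohort study comparing enrolled vs. non-enrolled residents, but they worry about selection bias: people who enroll may be more motivated and healthier. They decide instead on a difference-in-differences design, comparing the change in ED visit rates before and after program implementation in Southside with the change over the same period in a similar comparison neighborhood. Provided both neighborhoods would have followed parallel trends without the program, this removes stable differences between the neighborhoods and secular trends common to both. Because the comparison is between neighborhoods rather than between enrolled and non-enrolled individuals, it estimates the effect of offering the program in Southside, not the effect of enrollment itself; the question from Step 1 should be reworded to match.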

Step 4: Plan the analysis

The SAP specifies a Poisson regression model of ED visit counts, with person-time as an offset and an interaction term for time (pre/post) by group (Southside vs. comparison). Covariates are the confounders identified in the DAG: age, diabetes duration, baseline HbA1c, health literacy, and insurance status. The SAP also pre-specifies a check of parallel trends in the pre-period and how multiple comparisons will be handled. The sample size calculation shows that with 6 months of pre- and post-data, they have 80% power to detect a 20% reduction in ED visits.
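To see what the interaction term estimates: in an unadjusted Poisson model with group, period, their interaction, and a log(person-time) offset, the exponentiated interaction coefficient equals the ratio of rate ratios computed directly from the four group-period cells. The sketch below computes that quantity from hypothetical counts (the numbers and the `Cell` helper are illustrative only, not results from any real program). Adjusting for the DAG covariates would require fitting the model itself, for example with a GLM library.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    """ED visits and person-time for one group in one period (hypothetical data)."""
    visits: int
    person_years: float

    @property
    def rate(self) -> float:
        return self.visits / self.person_years


def ratio_of_rate_ratios(treated_pre, treated_post, control_pre, control_post):
    """Difference-in-differences on the multiplicative (rate ratio) scale."""
    treated_change = treated_post.rate / treated_pre.rate
    control_change = control_post.rate / control_pre.rate
    return treated_change / control_change


# Illustrative numbers only.
southside_pre, southside_post = Cell(100, 2000), Cell(80, 2000)
comparison_pre, comparison_post = Cell(90, 1800), Cell(99, 1800)

rrr = ratio_of_rate_ratios(southside_pre, southside_post, comparison_pre, comparison_post)
print(f"Ratio of rate ratios: {rrr:.2f}")
```

Here Southside's rate falls by 20% while the comparison neighborhood's rises by 10%, so the estimated program effect is a rate ratio of about 0.73: a larger reduction than the before/after change in Southside alone would suggest.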

By following these steps, the team avoids pitfall one (vague question), pitfall two (collecting unnecessary variables—they only collect confounders identified in the DAG), and pitfall three (ignoring confounding—they adjust for known confounders and avoid colliders). They also avoid pitfall four (convenience sample) by using a systematic comparison neighborhood. The remaining risk is pitfall five (poor communication), which they address by planning a plain-language summary for policymakers alongside the technical report.

5. Edge Cases and Exceptions

No set of guidelines covers every situation. Here are common edge cases where the standard advice may need adjustment.

When the question is exploratory

If you are generating hypotheses rather than testing them, a broader question may be acceptable. However, even exploratory work should have a focused aim. For example, 'What factors are associated with delayed diagnosis of Lyme disease in New England?' is broad but still guides data collection toward symptoms, healthcare access, and diagnostic testing. Avoid 'Let's collect everything and see what we find.' That approach wastes budget and produces unreliable results.

When data are already collected

Sometimes you inherit a dataset that was not designed for your question. In that case, you cannot go back and add variables. The solution is to reframe the question to fit the data, not to force the data to answer a question they cannot. For example, if the dataset lacks information on confounders, consider whether the question can be answered using a different design (e.g., a self-controlled case series) or whether you need to acknowledge the limitation upfront.

When confounding is unavoidable

In some settings, key confounders are unmeasured (e.g., genetic predisposition, health literacy). You can still proceed, but you must address the problem explicitly: instrumental variables or negative controls can help work around or detect the confounding, and quantitative sensitivity analyses can bound how much bias it could introduce. These methods rely on strong assumptions that should be specified in the SAP. Do not pretend the confounding does not exist; that is pitfall three in disguise.
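One widely used sensitivity analysis of this kind, not named above, is the E-value of VanderWeele and Ding: the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both exposure and outcome to fully explain away an observed estimate. The sketch below implements the published formula for a point estimate and for the confidence limit closest to the null; the function names and the estimate are hypothetical.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale.

    Protective estimates (rr < 1) are inverted first, as in the published formula.
    """
    if rr <= 0:
        raise ValueError("a risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))


def e_value_for_ci(lower: float, upper: float) -> float:
    """E-value for the confidence limit closest to the null (1.0)."""
    if lower <= 1 <= upper:
        return 1.0  # the interval already includes no effect
    closest = lower if lower > 1 else upper
    return e_value(closest)


# Hypothetical estimate: rate ratio 0.80 with 95% CI 0.70 to 0.92.
print(f"E-value (estimate): {e_value(0.80):.2f}")
print(f"E-value (CI limit): {e_value_for_ci(0.70, 0.92):.2f}")
```

A large E-value means only a strong unmeasured confounder could overturn the finding; an E-value near 1 for the confidence limit means even weak confounding could move the interval to include no effect.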

When the sample is inherently convenience-based

In outbreak investigations, you often have no choice but to sample whoever is available. The key is to acknowledge the limitations and be cautious about generalizing. For example, if you investigate a foodborne outbreak by interviewing people who reported illness to the health department, you know that mild cases are underrepresented. Your findings apply only to the subset of cases that seek care. Do not claim your sample is representative without stating the selection mechanism.

When communicating to non-specialists

Pitfall five is especially tricky when the audience includes journalists, policymakers, or community members. The solution is to prepare two versions of your findings: a technical appendix with full statistical details and a plain-language summary with key messages, limitations, and confidence intervals expressed in everyday terms (e.g., 'the reduction could be as small as 2% or as large as 15%'). Avoid categorical statements like 'the program works' without quantifying uncertainty.
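Generating the everyday-language sentence directly from the confidence interval keeps the plain-language summary consistent with the technical appendix. The sketch below assumes the result is a rate ratio with its 95% confidence limits; the function name and wording templates are hypothetical.

```python
def describe_rate_ratio(lower: float, upper: float) -> str:
    """Turn a 95% CI for a rate ratio into a plain-language sentence."""
    def pct(rr: float) -> int:
        # Percentage change relative to the comparison group, rounded.
        return round(abs(1 - rr) * 100)

    if upper < 1:
        return (f"The reduction could be as small as {pct(upper)}% "
                f"or as large as {pct(lower)}%.")
    if lower > 1:
        return (f"The increase could be as small as {pct(lower)}% "
                f"or as large as {pct(upper)}%.")
    return (f"The data are compatible with anything from a {pct(lower)}% reduction "
            f"to a {pct(upper)}% increase, so the direction of the effect is uncertain.")


print(describe_rate_ratio(0.85, 0.98))
```

The third branch handles intervals that cross 1, which is exactly where categorical statements like "the program works" are most tempting and least justified.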

These edge cases remind us that guidelines are tools, not rules. The goal is to make thoughtful trade-offs, not to follow a checklist blindly.

6. Limits of the Approach

The framework we have described works well for many applied research questions, but it has limits. Acknowledging them helps you decide when to adapt or supplement the approach.

It assumes you have time to plan

In acute outbreak investigations, you may need to start data collection within hours. In such cases, you cannot spend days refining the question or drawing DAGs. The solution is to have pre-prepared templates and protocols for common scenarios (e.g., foodborne illness, respiratory outbreaks). These templates embed the planning steps so that you can execute them quickly. Even in emergencies, however, take five minutes to write down the core question and the key confounders—it will save time later.

It does not eliminate bias entirely

No study design can eliminate all bias. The goal is to reduce bias to a level where it does not change the practical conclusion. Sensitivity analyses can help you assess how robust your findings are to unmeasured confounding or selection bias. If the conclusions change under plausible assumptions, you need to collect more data or adjust your interpretation.

It requires domain knowledge

Drawing a valid DAG requires understanding the subject matter. If you are new to a disease or population, consult with clinicians or local experts. A DAG drawn by someone who does not know the field may miss important confounders or include irrelevant variables. Budget for expert input early—it is cheaper than redoing the analysis later.

It may not fit complex systems

When outcomes are influenced by many interacting factors (e.g., social determinants, policy changes, individual behavior), a simple causal diagram may be insufficient. In such cases, consider systems epidemiology approaches such as agent-based modeling or system dynamics modeling. These methods require more data and expertise, but they can capture feedback loops and nonlinearities that traditional designs miss.

It does not guarantee impact

Even a perfectly designed study can fail to influence policy if the results are not communicated at the right time to the right audience. Pitfall five is not just about writing clearly—it is about engaging stakeholders throughout the research process. Involve decision-makers when defining the question, so they have ownership of the findings. Present preliminary results in forums where they can ask questions and shape the final interpretation.

Despite these limits, the five-pitfall framework provides a practical starting point for most epidemiological projects. By avoiding the most common errors, you increase the chance that your research will be valid, useful, and worth the investment.

Your next steps:

Before your next study, write a one-sentence research question and check it against the PICOT criteria.
Draw a DAG for the hypothesized relationship, including every confounder, mediator, and potential collider you can identify.
Write a statistical analysis plan before collecting any data, specifying primary and sensitivity analyses.
Prepare a plain-language summary of your findings alongside the technical report.
Review the five pitfalls with your team before finalizing the study protocol. Discuss which ones pose the greatest risk for your specific project.

By taking these steps, you will spend less time fixing problems after data collection and more time solving the health problems that matter.
