How to Find the Sample Size for a Case–Control Study When the Exposure Variable Is Measured on a Nominal Scale
In medical, epidemiological, and public health research, case–control studies are frequently used to examine associations between exposure and disease. They are especially useful for studying rare outcomes, emerging diseases, and conditions that take a long time to develop. Choosing the right sample size is one of the most important steps in designing a reliable case–control study. When the exposure variable is measured on a nominal scale, such as smoking status (yes/no), blood group, vaccination status, or occupational category, the sample size calculation takes a specific form and needs to be done carefully.
Understanding Case–Control Studies:
A case–control study compares two groups:
Case group: Individuals with the outcome/disease.
Control group: Individuals without the outcome/disease.
Researchers then assess exposure history retrospectively, by looking back in time.
These designs are efficient, rapid, and cost-effective, especially for rare conditions.
Why Sample Size Matters in Case–Control Studies:
A carefully determined sample size helps ensure:
1. Sufficient Statistical Power:
Power is the probability of detecting a true association. Studies with a small sample size are underpowered, which raises the possibility of Type II errors (false negatives).
2. Reliable Estimates of Association:
Measures such as the odds ratio (OR) are estimated more precisely, with narrower confidence intervals, when the sample size is adequate.
3. Generalizability:
Larger, properly calculated samples increase the likelihood that results represent the target population.
4. Ethical and Economic Balance:
Using too many participants wastes time and resources, while using too few makes results scientifically weak.
Why Nominal Exposure Variables Need Special Consideration:
Nominal variables represent categories with no intrinsic order. Examples include:
• Sex (male/female)
• Smoking status (yes/no)
• Blood group (A/B/O/AB)
• Vaccination status (vaccinated/unvaccinated)
• Type of occupation
• Risk factor (present/absent)
When the exposure is nominal (especially binary categories like yes/no), sample size calculations rely on comparing proportions of exposure across cases and controls.
Sample Size formula for Case–Control Studies:
The sample size is calculated using the following formula:

Source: Lwanga SK, Lemeshow S. Sample size determination in health studies: a practical manual. Geneva: World Health Organization; 1991.
Where:
n = sample size in each group
Z1 = value for the selected alpha level (Zα; 1.96 at the 95% confidence level)
Z2 = value associated with the selected power (Zβ; 1.28 at 90% power)
P1 = Probability of exposure in cases
P2 = Probability of exposure in controls
P = Average of P1 and P2
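The formula itself is not reproduced in this text. As an assumption, the expression below is a widely used form for comparing two proportions that matches the symbols defined above. The formula in the cited manual may differ in detail, for example in how the pooled term is defined, so treat this as a sketch rather than a quotation.

```latex
n = \frac{\left[ Z_1 \sqrt{2P(1-P)} + Z_2 \sqrt{P_1(1-P_1) + P_2(1-P_2)} \right]^2}{(P_1 - P_2)^2},
\qquad P = \frac{P_1 + P_2}{2}
```

Here n is the number of subjects in each group for a 1:1 design.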
Key Parameters Required to Calculate Sample Size
To determine the sample size for a case–control study with nominal exposure, you need the following parameters:
1. Expected Proportion of Exposure among Cases (P1) and Controls (P2):
These are the proportions of cases and of controls expected to have the exposure.
Example: 30% of lung cancer cases smoke and 10% of healthy individuals smoke.
If no prior study or published literature provides these values, use either:
✔ Pilot data
✔ Expert opinion
2. Expected Odds Ratio (OR)
This is the strength of association that the researcher expects to observe.
For instance, OR = 2.0 indicates that the odds of exposure are twice as high among cases as among controls. When the disease is rare, this is approximately a twofold increase in the risk of disease among the exposed.
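P1, P2, and the odds ratio are tied together, so fixing any two determines the third. The sketch below shows the conversion in both directions; the function names are illustrative and do not come from the article.

```python
def odds_ratio(p1: float, p2: float) -> float:
    """Odds ratio implied by exposure proportions in cases (p1) and controls (p2)."""
    return (p1 * (1 - p2)) / (p2 * (1 - p1))


def case_exposure_from_or(p2: float, or_value: float) -> float:
    """Exposure proportion in cases implied by control exposure p2 and an odds ratio."""
    return (or_value * p2) / (1 + p2 * (or_value - 1))


# Exposure of 30% in cases and 10% in controls implies an OR of about 3.86.
implied_or = odds_ratio(0.30, 0.10)

# With 10% exposure in controls, OR = 2.0 implies about 18.2% exposure in cases.
p1_for_or_2 = case_exposure_from_or(0.10, 2.0)

print(round(implied_or, 2), round(p1_for_or_2, 3))
```

This is useful when the literature reports an odds ratio and a control exposure prevalence but not the exposure prevalence among cases.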
3. Confidence Level (Zα)
Typically 95% confidence level → Zα = 1.96.
4. Study Power (Zβ)
Common power values:
80% → Zβ = 0.84
90% → Zβ = 1.28
The necessary sample size rises with higher power.
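The Z values quoted above are quantiles of the standard normal distribution. A minimal sketch using Python's standard library reproduces them for any confidence level or power; the helper names are illustrative.

```python
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided critical value for a confidence level, e.g. 0.95 gives about 1.96."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def z_for_power(power: float) -> float:
    """Standard normal quantile for the chosen power, e.g. 0.90 gives about 1.28."""
    return NormalDist().inv_cdf(power)


z_alpha = z_for_confidence(0.95)
z_beta_80 = z_for_power(0.80)
z_beta_90 = z_for_power(0.90)
print(round(z_alpha, 2), round(z_beta_80, 2), round(z_beta_90, 2))  # 1.96 0.84 1.28
```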
5. Controls to Cases Ratio (r):
The most common ratio of cases to controls is 1:1.
Increasing the number of controls per case (a case-to-control ratio of 1:2 or 1:3) improves power, but beyond 1:4 the additional benefit is small.
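The diminishing return from extra controls can be illustrated numerically. The sketch assumes the usual approximation in which the required number of cases scales with (1 + 1/r), where r is the number of controls per case; it is a rule of thumb, not a formula taken from the article.

```python
def relative_cases_needed(r: int) -> float:
    """Cases needed with r controls per case, relative to a 1:1 design.

    Assumes the number of cases scales with (1 + 1/r), a common approximation.
    """
    return (r + 1) / (2 * r)


for r in range(1, 7):
    print(f"1:{r} -> {relative_cases_needed(r):.3f} of the 1:1 case count")
```

Under this approximation, moving from 1:1 to 1:2 cuts the required number of cases by 25%, while moving from 1:4 to 1:5 lowers it only from 62.5% to 60% of the 1:1 case count.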
6. Adjusting for Non-Response:
Most case–control studies face 5–15% non-response or missing data, so the calculated sample size should be inflated to compensate.
Example: Calculating the Sample Size for a Case–Control Study:
Suppose you want to study the association between smoking and lung cancer in a case–control study. The prevalence of smoking among lung cancer patients is 0.30 and the prevalence of smoking among controls is 0.10; these values come from a pilot study or the literature. Using a 95% confidence level and 90% power, what is the minimum sample size in the case and control groups?
| Parameter | Description | Value |
|---|---|---|
| P1 | Probability of exposure in cases | 0.30 |
| P2 | Probability of exposure in controls | 0.10 |
| P | Average of P1 and P2 | 0.20 |
| Z1 | Value for 95% confidence level | 1.96 |
| Z2 | Value for 90% power of test | 1.28 |
Substituting these values into the above formula:

If non-response rate = 10%:

The required sample size for the study, after allowing for 10% non-response, is 47 subjects in each group.
With a 1:1 ratio, we will therefore enroll 47 subjects in the case group and 47 in the control group.
If you prefer a 1:2 ratio of cases to controls, you can enroll 47 subjects in the case group and 94 in the control group.
Note: The sample size can be reduced by lowering the power of the study, but power should not be set below 80%.
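The worked example can be checked in code. The formula and the intermediate steps are not reproduced in this text, so the sketch assumes the widely used two-proportion form n = [Z1·√(2P(1−P)) + Z2·√(P1(1−P1) + P2(1−P2))]² / (P1 − P2)², and it assumes non-response is handled by dividing by (1 − rate). Both are assumptions, and the function names are illustrative.

```python
import math


def n_per_group(p1: float, p2: float, z_alpha: float, z_beta: float) -> int:
    """Per-group sample size under the assumed two-proportion form (1:1 design)."""
    p_bar = (p1 + p2) / 2  # average exposure proportion, P in the article
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p1 - p2) ** 2)


def inflate_for_non_response(n: int, rate: float) -> int:
    """Inflate n so that about n subjects remain after the expected non-response."""
    return math.ceil(n / (1 - rate))


n = n_per_group(p1=0.30, p2=0.10, z_alpha=1.96, z_beta=1.28)
n_adjusted = inflate_for_non_response(n, rate=0.10)
print(n, n_adjusted)
```

Under these assumptions the example inputs give 82 subjects per group before inflation and 92 after allowing for 10% non-response, so this form does not reproduce the 47 quoted above. A figure of 47 arises from (Z1 + Z2)² · P(1 − P) / (P1 − P2)² ≈ 42 plus 10%, so the result is worth rechecking against the formula in the cited manual before it is used to plan a study.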
Practical Tips for Choosing Parameter Values for a Case–Control Study:
✔ Use previous studies to estimate P1 and P2: Search the literature for similar populations. If no estimates are available, conduct a pilot study or use expert opinion to obtain the proportions of exposure among cases (P1) and controls (P2).
✔ Choose a realistic odds ratio: Avoid overestimating the odds ratio (OR), which artificially reduces the sample size.
✔ Use at least 80% power for study: Higher power → better scientific credibility.
✔ Keep a 1:1 ratio unless cases are scarce or hard to recruit: In that situation add controls, but increasing beyond four controls per case gives minimal benefit.
✔ Account for missing data: Inflate the calculated sample size by 10–15%.
Statistical Software for Sample Size Calculation:
The following are some user-friendly tools for calculating sample size:
• OpenEpi: Free online software, widely used in medical research, WHO-supported, and well suited to epidemiology.
• G*Power: Excellent for comparing proportions
• PASS: Professional-grade software
• Stata: Commands in the power family, such as power twoproportions (and power mcc for matched case–control designs)