Statistics
- What: Statistics in NDA Maths covers measures of central tendency (mean, median, mode), measures of dispersion (variance, standard deviation), and bivariate analysis (correlation and regression).
- Why it matters: This topic has appeared in every NDA paper since 2010, typically contributing 5–9 questions per paper and ranging from straightforward calculations to conceptual statement-type questions.
- Key fact: Variance is independent of change of origin but NOT of change of scale — multiplying each observation by k multiplies the variance by k².
Statistics is one of the highest-yield topics in NDA Mathematics. Every paper carries questions on it, sometimes as many as 9 in a single sitting. The good news: the syllabus is focused. Master mean, median, mode, variance, standard deviation, and the basics of correlation and regression, and you will cover almost every question type that appears.
This page walks you through every concept tested, shows you PYQ solutions step by step, and tells you exactly which question patterns to expect on exam day.
What This Topic Covers
Sub-topics in scope
- Measures of central tendency — arithmetic mean (simple and weighted), geometric mean, harmonic mean, median, mode
- Measures of dispersion — range, mean deviation, variance, standard deviation, coefficient of variation
- Frequency distributions — grouped and ungrouped data, class intervals, cumulative frequency, ogive
- Graphical representation — histogram, frequency polygon, ogive (less-than and more-than), pie chart
- Bivariate analysis — correlation coefficient, lines of regression (y on x, and x on y)
- Properties of measures — effect of change of origin and scale on mean, variance, and standard deviation
NDA questions fall into two broad types: calculation-based (find the mean of a dataset, compute the standard deviation) and concept/statement-based (decide which statements about variance or correlation are correct). Both types appear every year, so you need both computational speed and conceptual clarity.
The two ogive curves (less-than and more-than) intersect at the median. The abscissa of that intersection point gives the median value. This fact has appeared in multiple papers including 2015-I, 2017-I, and 2011-I.
Exam Pattern & Weightage
The table below is built from PYQ data. It shows how many Statistics questions appeared in each paper and which subtopics were tested.
| Year | Paper(s) | Questions | Key Subtopics Tested |
|---|---|---|---|
| 2010 | I & II | 7 | Standard deviation (shift by k), regression lines, median class, combined mean |
| 2011 | I & II | 8+ | Mean, median, mode, ogive, correlation coefficient, variance units, combined mean |
| 2012 | I & II | 7 | Mean shift, mode, variance scaling, regression lines, coefficient of variation |
| 2013 | I & II | 8 | Median (raw data), variance properties, regression coefficients, cumulative frequency |
| 2014 | I & II | 7 | Regression lines, combined SD, correlation coefficient, mean deviation, histogram |
| 2015 | I & II | 6 | Excluded observation, ogive intersection, geometric mean, regression coefficients |
| 2016 | I & II | 8 | Regression (y on x), variance formula, correlation coefficient from covariance |
| 2017 | I & II | 9 | Variance scaling, empirical relation, ogive median, regression equation, CV |
| 2018 | I | 6 | Correlation coefficient, median of raw data, regression lines intersection, pie chart |
Takeaway: Statistics consistently delivers 5–9 questions per paper. Correlation and regression, variance scaling, and median/ogive concepts are the most repeated subtopics.
Statement-type questions (decide which of the given statements is/are correct) make up roughly 40% of Statistics questions. These test conceptual knowledge — you rarely need to calculate anything. Know your properties cold.
Core Concepts
Arithmetic Mean
The arithmetic mean of \(n\) observations \(x_1, x_2, \ldots, x_n\) is their sum divided by \(n\). For a frequency distribution, multiply each value by its frequency, sum the products, then divide by the total frequency:
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$
Effect of change of origin and scale: If every observation is increased by \(k\), the mean increases by \(k\). If every observation is multiplied by \(k\), the mean is multiplied by \(k\). The algebraic sum of deviations from the mean is always zero.
Median
For raw data, arrange the observations in ascending order and pick the middle value (for an even number of observations, average the two middle values). For a frequency distribution, find the median class using cumulative frequencies, then apply the interpolation formula:
$$\text{Median} = L + \frac{\frac{N}{2} - cf}{f} \times h$$
where \(L\) = lower limit of median class, \(N\) = total frequency, \(cf\) = cumulative frequency of the classes before the median class, \(f\) = frequency of median class, \(h\) = class width
The cumulative frequency curve is called an ogive; the less-than ogive and the more-than ogive intersect at the median.
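A minimal Python sketch of the interpolation formula follows. The function name `grouped_median` and the five-class table are hypothetical, chosen only to exercise the formula; the sketch assumes equal class widths and classes listed in ascending order.

```python
def grouped_median(lower_limits, freqs, width):
    """Median of a grouped distribution with equal class width `width`.

    Applies Median = L + ((N/2 - cf) / f) * h, where the median class is the
    first class whose cumulative frequency reaches N/2.
    """
    n = sum(freqs)
    half = n / 2
    cum = 0  # cumulative frequency of all classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if f > 0 and cum + f >= half:
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

# Hypothetical table: classes 0-10, 10-20, 20-30, 30-40, 40-50
lowers = [0, 10, 20, 30, 40]
freqs = [5, 8, 12, 10, 5]
print(grouped_median(lowers, freqs, 10))
```

Here \(N = 40\), so the median class is 20–30 (cumulative frequency 13 before it), and the function returns \(20 + \frac{20 - 13}{12} \times 10 \approx 25.83\).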
Mode
The mode is the value with the highest frequency in a dataset. For a frequency distribution, the modal class is the class with the maximum frequency. In moderately asymmetric distributions, the empirical relation connects mean, median, and mode:
$$\text{Mode} = 3\,\text{Median} - 2\,\text{Mean}$$
This 3-2-1 rule is a frequent one-mark NDA question: given any two of mean, median, and mode, the third follows instantly.
The sum of absolute deviations \(\sum |x_i - A|\) is least when the point \(A\) is the median, not the mean. Setters often swap these two in statement questions: the mean minimises the sum of squared deviations; the median minimises the sum of absolute deviations.
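The two minimisation properties can be checked numerically. This sketch uses a small hypothetical, deliberately skewed dataset and a brute-force scan over candidate centres in steps of 0.1 (a scan rather than calculus, purely for illustration).

```python
from statistics import mean, median

data = [2, 3, 3, 7, 15]  # hypothetical, deliberately skewed sample

def sum_abs_dev(c):
    """Sum of absolute deviations of the data from the point c."""
    return sum(abs(x - c) for x in data)

def sum_sq_dev(c):
    """Sum of squared deviations of the data from the point c."""
    return sum((x - c) ** 2 for x in data)

# Candidate centres 0.0, 0.1, ..., 20.0; keep the best point under each criterion
candidates = [i / 10 for i in range(201)]
best_abs = min(candidates, key=sum_abs_dev)
best_sq = min(candidates, key=sum_sq_dev)

print("median", median(data), "minimiser of sum |x - c|", best_abs)
print("mean", mean(data), "minimiser of sum (x - c)^2", best_sq)
```

The scan lands on 3 (the median) for absolute deviations and on 6 (the mean) for squared deviations, so the two criteria pick different centres whenever the data are skewed.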
Variance and Standard Deviation
Variance measures how spread out the data is. Standard deviation is the positive square root of variance. If values are measured in cm, variance is in cm², so it carries squared units.
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$$
Equivalently: $$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2$$
Key properties of variance:
- Adding a constant \(k\) to every observation leaves the variance unchanged (change of origin does not affect variance).
- Multiplying every observation by \(k\) multiplies the variance by \(k^2\).
- The standard deviation of identical observations is \(0\).
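The two variance formulas and the shift and scale properties can be confirmed on any small dataset. A sketch in Python with hypothetical observations; `var_definition` and `var_shortcut` are illustrative helper names, and the standard library's `statistics.pvariance` (which divides by \(n\)) serves as an independent check.

```python
from statistics import mean, pvariance

data = [4, 8, 6, 5, 7]  # hypothetical observations

def var_definition(xs):
    """sigma^2 = (1/n) * sum((x - xbar)^2)"""
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def var_shortcut(xs):
    """sigma^2 = (1/n) * sum(x^2) - xbar^2"""
    return sum(x * x for x in xs) / len(xs) - mean(xs) ** 2

v = var_definition(data)
shifted = [x + 10 for x in data]  # change of origin
scaled = [3 * x for x in data]    # change of scale, k = 3

print(v, var_shortcut(data), pvariance(data))  # the three values agree
print(var_definition(shifted))                 # unchanged by the shift
print(var_definition(scaled), 9 * v)           # multiplied by k^2 = 9
```

Adding 10 leaves the variance at 2, while multiplying by 3 raises it to 18, which is \(3^2 \times 2\).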
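The standard deviation of the first \(n\) natural numbers has a direct formula, \(\sigma = \sqrt{\frac{n^2 - 1}{12}}\), which turns a common question into a one-liner: for \(n = 10\), \(\sigma = \sqrt{99/12} \approx 2.87\). There is no need to compute the mean and run the variance summation.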
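A quick check that the closed form \(\sigma = \sqrt{(n^2 - 1)/12}\) for the first \(n\) natural numbers agrees with a direct computation. It assumes the population convention (divide by \(n\)), which is what Python's `statistics.pstdev` uses; the helper name `sd_first_n_naturals` is ours.

```python
from math import isclose, sqrt
from statistics import pstdev

def sd_first_n_naturals(n):
    """Closed form for the population SD of 1, 2, ..., n."""
    return sqrt((n * n - 1) / 12)

for n in (5, 10, 25):
    direct = pstdev(list(range(1, n + 1)))  # population SD, dividing by n
    assert isclose(direct, sd_first_n_naturals(n))

print(round(sd_first_n_naturals(10), 2))  # 2.87
```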
"Adding a constant doesn't change the SD" is a goldmine. If SD of a dataset is 5 and you add 10 to every observation, SD is still 5. Setters slip this in among options like 15 or 5 + 10 = 15 to catch candidates who confuse origin and scale.
This is the single most-tested property in Statistics: variance is independent of change of origin but NOT of change of scale. If variance is V and each observation is multiplied by 3, the new variance is 9V, not 3V. Confirmed in NDA papers 2013-I, 2014-I, 2016-I, 2017-I.
Coefficient of Variation
The coefficient of variation (CV) lets you compare variability across datasets with different means. A lower CV means less relative variability.
$$\text{CV} = \frac{\sigma}{\bar{x}} \times 100\%$$
From a 2012-II question: if mean = 40 and SD = 8, then \(\text{CV} = \frac{8}{40} \times 100 = 20\%\).
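The CV calculation is a one-liner. In the sketch below the 2012-II figures are reused, and the two-group comparison uses hypothetical means and SDs to show how the lower CV identifies the relatively less variable group.

```python
def coefficient_of_variation(mean, sd):
    """CV as a percentage: (sd / mean) * 100."""
    return 100 * sd / mean

print(coefficient_of_variation(40, 8))  # the 2012-II figures

# Hypothetical comparison: the lower CV marks the relatively less variable group
group_a = coefficient_of_variation(50, 10)
group_b = coefficient_of_variation(80, 12)
print("A" if group_a < group_b else "B", "is less variable")
```

Group A has the smaller SD but the larger CV (20% against 15%), so relative to its mean it is the more variable group.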
Correlation Coefficient
The Pearson correlation coefficient \(r\) measures the strength and direction of the linear relationship between two variables \(x\) and \(y\): $$r = \frac{\operatorname{Cov}(x, y)}{\sigma_x \sigma_y}$$
Range: $$-1 \le r \le 1$$
- \(r = +1\) or \(-1\): perfect linear relationship.
- \(r = 0\): no linear relationship; the two regression lines are then perpendicular to each other.
- \(r^2\) is the coefficient of determination: the proportion of the variance in one variable that is explained by its linear relationship with the other.
- Both regression coefficients always have the same sign as \(r\).
- If one regression coefficient exceeds 1 in magnitude, the other must be less than 1 in magnitude, because their product is \(r^2 \le 1\).
- \(r\) is independent of change of origin and scale (provided the scale factors are positive).
$$r = \pm\sqrt{b_{yx} \cdot b_{xy}}$$ — sign is the same as the sign of both regression coefficients
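The relationship between \(r\), the covariance and the two regression coefficients can be verified on raw data. A minimal sketch, assuming the divide-by-\(n\) convention and five hypothetical data pairs; it computes \(r = \operatorname{Cov}(x, y)/(\sigma_x \sigma_y)\) and checks that \(b_{yx} \cdot b_{xy} = r^2\).

```python
from math import isclose, sqrt

# Hypothetical paired data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
cov = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / n
sd_x = sqrt(sum((a - x_bar) ** 2 for a in x) / n)
sd_y = sqrt(sum((b - y_bar) ** 2 for b in y) / n)

r = cov / (sd_x * sd_y)  # Pearson correlation coefficient
b_yx = r * sd_y / sd_x   # equals cov / var(x)
b_xy = r * sd_x / sd_y   # equals cov / var(y)

print(round(r, 4), round(b_yx, 4), round(b_xy, 4))
assert isclose(r ** 2, b_yx * b_xy)  # r^2 = b_yx * b_xy
```

For these pairs \(b_{yx} = 0.6\) and \(b_{xy} = 1\); their product 0.6 is \(r^2\), so \(r \approx 0.775\), positive like both coefficients.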
Lines of Regression
Two regression lines exist for any bivariate dataset. They intersect at the point \((\bar{x}, \bar{y})\) — the means of \(x\) and \(y\). When \(r = 0\), the two lines are perpendicular. When \(r = \pm 1\), the two lines coincide.
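Regression line of \(y\) on \(x\): $$y - \bar{y} = b_{yx}(x - \bar{x})$$
where $$b_{yx} = r \cdot \frac{\sigma_y}{\sigma_x}$$
Regression line of \(x\) on \(y\): $$x - \bar{x} = b_{xy}(y - \bar{y})$$
where $$b_{xy} = r \cdot \frac{\sigma_x}{\sigma_y}$$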
To find \(\bar{x}\) and \(\bar{y}\) from two regression equations, solve the pair of equations simultaneously — the intersection point is \((\bar{x}, \bar{y})\).
Worked Examples
Example 1 — Effect of Shift on Standard Deviation (2010-I)
Question: A set of \(n\) values has standard deviation \(\sigma\). What is the standard deviation of the \(n\) values obtained by adding \(k\) to each value?
- Standard deviation measures spread from the mean. Adding a constant \(k\) shifts every value and the mean by the same amount.
- The difference $$(x_i + k) - (\bar{x} + k) = x_i - \bar{x}$$ is unchanged for every observation.
- Since the deviations from the mean are identical, the standard deviation remains \(\sigma\).
- Answer: (a) \(\sigma\)
Example 2 — Variance Scaling (2017-I)
Question: The variance of 20 observations is 5. If each observation is multiplied by 3, what is the new variance?
- Let the observations be \(x_1, x_2, \ldots, x_{20}\) with variance \(\sigma^2 = 5\).
- New observations are \(3x_1, 3x_2, \ldots, 3x_{20}\). New mean = \(3\bar{x}\).
- New variance: $$\frac{1}{n}\sum (3x_i - 3\bar{x})^2 = 9 \cdot \frac{1}{n}\sum (x_i - \bar{x})^2 = 9 \times 5 = 45$$
- Answer: (d) 45
Example 3 — Correlation Coefficient from Regression Coefficients (2014-I)
Question: For two variables \(x\) and \(y\), \(b_{yx} = -3/2\) and \(b_{xy} = -1/6\). Find the correlation coefficient.
- Use $$r^2 = b_{yx} \cdot b_{xy} = \left(-\frac{3}{2}\right) \cdot \left(-\frac{1}{6}\right) = \frac{3}{12} = \frac{1}{4}$$
- So \(r = \pm 1/2\). Since both regression coefficients are negative, \(r\) is negative.
- \(r = -1/2\).
- Answer: (c) \(-1/2\)
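Pattern 3 can be wrapped in a small helper that also rejects impossible inputs (opposite signs, or a product above 1). The function name is hypothetical; `Fraction` keeps the given coefficients exact until the final square root.

```python
from fractions import Fraction
from math import sqrt

def r_from_regression_coefficients(b_yx, b_xy):
    """Return r = ±sqrt(b_yx * b_xy), taking the common sign of the coefficients."""
    product = b_yx * b_xy
    if product < 0:
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("b_yx * b_xy = r^2 cannot exceed 1")
    magnitude = sqrt(product)
    return -magnitude if b_yx < 0 else magnitude

print(r_from_regression_coefficients(Fraction(-3, 2), Fraction(-1, 6)))  # -0.5
```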
Example 4 — Mean of Combined Distributions (2010-I)
Question: Distribution X has 36 observations with mean 4. Distribution Y has 64 observations with mean 3. What is the mean of the combined distribution X + Y?
- Combined mean: $$\bar{x}_{\text{combined}} = \frac{n_1 \bar{x}_1 + n_2 \bar{x}_2}{n_1 + n_2}$$
- $$= \frac{36 \times 4 + 64 \times 3}{36 + 64} = \frac{144 + 192}{100} = \frac{336}{100} = 3.36$$
- Answer: (c) 3.36
Example 5 — Finding \(\bar{x}\) and \(\bar{y}\) from Two Regression Lines (2018-I)
Question: Two lines of regression are \(4x - 5y + 33 = 0\) and \(20x - 9y = 107\). Find the values of \(\bar{x}\) and \(\bar{y}\).
- The two regression lines intersect at \((\bar{x}, \bar{y})\). Solve the system simultaneously.
- Equation 1: \(4x - 5y = -33\). Multiply by 5: \(20x - 25y = -165\).
- Equation 2: \(20x - 9y = 107\). Subtract equation 2 from the scaled equation 1:
- $$(20x - 25y) - (20x - 9y) = -165 - 107 \;\Rightarrow\; -16y = -272 \;\Rightarrow\; y = 17$$
- Substitute into equation 1: $$4x - 5(17) = -33 \;\Rightarrow\; 4x = -33 + 85 = 52 \;\Rightarrow\; x = 13$$
- Answer: (c) \(\bar{x} = 13,\ \bar{y} = 17\)
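Solving the two lines is plain linear algebra. A sketch using Cramer's rule with exact fractions, assuming each line is first rewritten in the form \(ax + by = c\) with integer coefficients; `intersect` is an illustrative name.

```python
from fractions import Fraction

def intersect(line1, line2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule.

    Each line is a tuple (a, b, c) of integers. For two regression lines
    the solution is the point of means (x_bar, y_bar).
    """
    (a1, b1, c1), (a2, b2, c2) = line1, line2
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("lines are parallel or coincident")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y

# 4x - 5y + 33 = 0 becomes 4x - 5y = -33; the second line is 20x - 9y = 107
print(intersect((4, -5, -33), (20, -9, 107)))
```

Calling `intersect((3, 2, 26), (6, 1, 31))` reproduces the Shortcut 3 answer \((4, 7)\), and `intersect((1, -1, -1), (2, -1, -4))` gives the 2012-I point \((-3, -2)\).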
Example 6 — Correcting a Misread Observation
Question: The mean of 100 observations is 40. Later it was discovered that one observation was misread as 83 instead of the correct value 53. Find the corrected mean.
- Use the correction shortcut: $$\text{Correct Mean} = \text{Old Mean} + \frac{\text{Correct} - \text{Incorrect}}{N}$$.
- Substitute: $$40 + \frac{53 - 83}{100} = 40 + \frac{-30}{100} = 40 - 0.3$$.
- No need to reconstruct the full sum — adjust only the affected term.
- Answer: 39.7
Example 7 — SD After Scale Multiplication
Question: The standard deviation of a dataset of 20 observations is 5. If every observation is multiplied by 3 and then 7 is added to each, find the new standard deviation.
- SD is independent of change of origin: adding 7 to every observation has no effect on it.
- SD is dependent on change of scale: multiplying every observation by 3 multiplies SD by \(|3| = 3\).
- New SD = \(3 \times 5 = 15\).
- Answer: 15. (The variance, by contrast, goes from 25 to \(3^2 \times 25 = 225\).)
Exam Shortcuts (Pro-Tips)
Statistics rewards pattern recognition. Each of the three shortcuts below has appeared in past papers, and each turns a multi-step calculation into a quick solve of well under a minute. Memorise them.
Shortcut 1 — Incorrect Observation Correction
When a mean is reported and one observation is later found to be misread, do not recompute the whole sum. Apply the correction directly to the mean.
Example: mean of 100 observations is 40; an observation 83 was actually 53. Correct mean \(= 40 + (53 - 83)/100 = 39.7\). For corrected variance, use $$\text{Correct } \sum x^2 = \text{Old } \sum x^2 - (\text{Incorrect})^2 + (\text{Correct})^2$$ and reapply the variance formula.
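Both corrections can be done from the running totals alone. A minimal sketch, assuming the divide-by-\(n\) variance; the original variance of 25 is hypothetical (the worked example gives only the mean), and `corrected_stats` is an illustrative name.

```python
from fractions import Fraction

def corrected_stats(n, mean, variance, wrong, right):
    """Return (mean, variance) after replacing one misread observation.

    Works from the totals: sum(x) = n * mean and
    sum(x^2) = n * (variance + mean^2), using the divide-by-n variance.
    """
    total = n * mean - wrong + right
    total_sq = n * (variance + mean ** 2) - wrong ** 2 + right ** 2
    new_mean = Fraction(total) / n
    new_variance = Fraction(total_sq) / n - new_mean ** 2
    return new_mean, new_variance

# Mean 40 of 100 observations, 53 misread as 83; the variance 25 is hypothetical
m, v = corrected_stats(100, 40, 25, 83, 53)
print(float(m), float(v))
```

The corrected mean is 39.7, as in Example 6, and the assumed variance of 25 drops to 8.11 once the misread 83 is replaced.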
Shortcut 2 — Combined Variance Formula
When two groups are merged, combined variance is not the simple weighted average of the individual variances: you must add a correction for how far each group's mean sits from the combined mean \(\bar{x}_{12}\).
$$\sigma_{12}^2 = \frac{n_1(\sigma_1^2 + d_1^2) + n_2(\sigma_2^2 + d_2^2)}{n_1 + n_2}$$
where $$d_1 = \bar{x}_1 - \bar{x}_{12}$$ and $$d_2 = \bar{x}_2 - \bar{x}_{12}$$
Compute the combined mean first, then the deviations \(d_1, d_2\) of each group mean from it, and plug in. Forgetting the \(d^2\) terms is the single most common error in this pattern.
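The combined-variance formula can be cross-checked against raw data. This sketch uses exact fractions and the divide-by-\(n\) convention; the two small groups and the variances 1 and 2 attached to Example 4's groups are hypothetical, and `combined_mean_variance` is an illustrative name.

```python
from fractions import Fraction
from statistics import mean, pvariance

def combined_mean_variance(n1, mean1, var1, n2, mean2, var2):
    """Mean and variance of two merged groups (divide-by-n convention)."""
    n = n1 + n2
    mean12 = Fraction(n1 * mean1 + n2 * mean2) / n
    d1 = mean1 - mean12  # offset of group 1's mean from the combined mean
    d2 = mean2 - mean12  # offset of group 2's mean from the combined mean
    var12 = (n1 * (var1 + d1 ** 2) + n2 * (var2 + d2 ** 2)) / n
    return mean12, var12

# Cross-check against the raw data of two small hypothetical groups
a = [Fraction(v) for v in (2, 4, 6)]
b = [Fraction(v) for v in (10, 12)]
result = combined_mean_variance(len(a), mean(a), pvariance(a), len(b), mean(b), pvariance(b))
assert result == (mean(a + b), pvariance(a + b))

# Example 4's groups (36 obs, mean 4; 64 obs, mean 3) with hypothetical variances 1 and 2
m12, v12 = combined_mean_variance(36, 4, 1, 64, 3, 2)
print(float(m12), float(v12))
```

With those assumed variances the combined variance is 1.8704, not the naive weighted average 1.64; the gap of 0.2304 comes entirely from the \(d^2\) terms.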
Shortcut 3 — Regression Line Quick Solve
If a question gives two regression line equations and asks for the means \(\bar{x}\) and \(\bar{y}\), ignore every statistics formula. The two lines always intersect at \((\bar{x}, \bar{y})\), so just solve them as ordinary simultaneous linear equations.
Example: given \(3x + 2y - 26 = 0\) and \(6x + y - 31 = 0\), solve to get \(x = 4, y = 7\). So \(\bar{x} = 4, \bar{y} = 7\) — done in under 30 seconds. This pattern appeared in NDA 2012-I and 2018-I.
Common Question Patterns
How NDA Tests Statistics
After analysing papers from 2010 to 2018, six recurring question patterns emerge. Every paper tests at least three of these.
Pattern 1 — Variance and SD After Scaling or Shifting
You are told the variance (or SD) of a dataset, then asked to find the new variance if each observation is multiplied by \(k\) or increased by \(k\). The rule is: adding \(k\) does nothing to variance; multiplying by \(k\) multiplies variance by \(k^2\). This pattern appeared in 2010-I, 2011-II, 2013-I, 2014-II, 2016-I, 2016-II, 2017-I.
Pattern 2 — Median from Frequency Distribution
You are given a grouped frequency table (sometimes with a missing frequency), told the median value, and asked to find the missing frequency or the median class. Apply the interpolation formula. Appeared in 2010-II (the TV tubes question with median life 17 months), 2011-I, 2012-I.
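When the unknown frequency sits before the median class, both \(N\) and \(cf\) depend on it, so the formula becomes a small linear equation. A brute-force alternative that handles any position is to try integer values until the formula reproduces the given median. The sketch below does that with exact fractions; the table and the median of 26 are hypothetical, not the 2010-II data, and both function names are illustrative.

```python
from fractions import Fraction

def grouped_median(lowers, freqs, width):
    """Median = L + ((N/2 - cf) / f) * h, computed exactly."""
    half = Fraction(sum(freqs), 2)
    cum = 0
    for lower, f in zip(lowers, freqs):
        if f > 0 and cum + f >= half:
            return lower + (half - cum) / f * width
        cum += f
    raise ValueError("empty distribution")

def find_missing_frequency(lowers, freqs, width, median, max_f=1000):
    """Return every integer f in [0, max_f] that reproduces the given median.

    `freqs` contains exactly one None marking the unknown frequency.
    """
    i = freqs.index(None)
    matches = []
    for f in range(max_f + 1):
        trial = freqs[:i] + [f] + freqs[i + 1:]
        if sum(trial) > 0 and grouped_median(lowers, trial, width) == median:
            matches.append(f)
    return matches

# Hypothetical table: classes 0-10, ..., 40-50, one frequency unknown, median 26
print(find_missing_frequency([0, 10, 20, 30, 40], [5, None, 20, 15, 5], 10, 26))
```

Only \(f = 11\) works: then \(N = 56\), \(cf = 16\) before the 20–30 class, and \(20 + \frac{28 - 16}{20} \times 10 = 26\).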
Pattern 3 — Correlation Coefficient from Regression Coefficients
Two regression coefficients \(b_{yx}\) and \(b_{xy}\) are given. Use \(r = \pm\sqrt{b_{yx} \cdot b_{xy}}\). The sign of \(r\) matches the sign of both coefficients. Appeared in 2014-I, 2015-II, 2017-II, 2018-I.
Pattern 4 — Finding Intersection of Two Regression Lines
Two regression line equations are given. Solve them simultaneously — the solution is \((\bar{x}, \bar{y})\). Appeared in 2012-I (regression lines \(x - y + 1 = 0\) and \(2x - y + 4 = 0\) giving intersection \((-3, -2)\)), and 2018-I.
Pattern 5 — Properties of Measures (Statement Type)
Two or three statements about mean, median, variance, regression, or correlation are given. You pick which are correct. These questions test knowledge of properties like "algebraic sum of deviations from mean is zero" (2013-II), "both regression coefficients have the same sign" (2013-II), "variance is independent of origin" (2018-I). Prepare a list of all key properties.
Pattern 6 — Combined Mean or Corrected Mean
Either the mean of a group is given and one observation is later found to be wrong, so you must find the corrected mean, or two groups with given means and sizes are combined. In both cases, work with totals: new total = old total − incorrect value + correct value (or the sum of the two group totals), then divide by \(n\) (or the combined \(n\)). Appeared in 2012-I, 2013-I, 2017-I.
Preparation Strategy
Week 1 — Central Tendency and Dispersion
Start with mean, median, and mode for both raw and grouped data. Practise the combined mean formula with 3–4 PYQ examples. Then move to variance and standard deviation, computing them by hand for small datasets. Memorise the scaling rule (variance scales by \(k^2\)) and the shift rule (variance is unchanged by adding \(k\)).
Week 2 — Correlation and Regression
Learn the correlation coefficient formula and its properties. Then learn how to find regression lines, how to extract \(\bar{x}\) and \(\bar{y}\) by solving two regression equations simultaneously, and how to compute \(r\) from \(b_{yx}\) and \(b_{xy}\). Practise with PYQs from 2010 to 2018.
Week 3 — Statement-Type Questions
List every key property from the PYQs you have solved. Convert them into flashcard-style statements. For each one, know whether it is true or false and why. Statement-type questions in Statistics are almost always about properties you have already seen — the wording changes, but the fact does not.
High-Value Properties to Memorise
- Algebraic sum of deviations from the mean = 0.
- Mean deviation is least when measured about the median.
- Variance is independent of change of origin; coefficient of variation is independent of units.
- Both regression coefficients have the same sign. If one exceeds 1 in magnitude, the other must be less than 1 in magnitude.
- The two regression lines intersect at \((\bar{x}, \bar{y})\). When \(r = 0\), they are perpendicular. When \(|r| = 1\), they coincide.
- The abscissa of the intersection of less-than and more-than ogives is the median.
- Geometric mean is used in construction of index numbers.
Time Allocation in the Exam
Statement-type questions: 30–45 seconds each (no calculation, just recall). Calculation questions like variance scaling or combined mean: 60–90 seconds. Regression line intersection (solve two simultaneous equations): 90–120 seconds. Grouped data median or mode: 2 minutes if the table is complex. Skip and return if stuck — Statistics has enough easy questions that you can score well without attempting the hardest ones.
Frequently Asked Questions
How many questions from Statistics appear in NDA Maths?
Based on PYQ data from 2010 to 2018, Statistics consistently delivers 5–9 questions per paper. Papers in 2011, 2013, 2016, and 2017 had 8–9 questions. It is one of the most heavily weighted topics in the NDA Maths syllabus.
What happens to variance when each observation is multiplied by a constant k?
The variance is multiplied by \(k^2\). So if variance is 5 and each observation is multiplied by 3, the new variance is \(9 \times 5 = 45\). The standard deviation is multiplied by \(|k|\) (not \(k^2\)). Adding a constant to every observation has no effect on variance or standard deviation.
What is the difference between the two lines of regression?
The regression line of \(y\) on \(x\) minimises the sum of squared vertical deviations — use it to predict \(y\) from \(x\). The regression line of \(x\) on \(y\) minimises the sum of squared horizontal deviations — use it to predict \(x\) from \(y\). Both lines pass through the point \((\bar{x}, \bar{y})\). They coincide only when \(|r| = 1\), and they are perpendicular when \(r = 0\).
How do I find the mean and SD of y given the two regression lines?
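From the 2010-II PYQ: the lines were \(8x - 10y + 66 = 0\) and \(40x - 18y = 214\), with variance of \(x\) equal to 9 (so \(\sigma_x = 3\)). Solving the two lines simultaneously gives the means, \(\bar{x} = 13\) and \(\bar{y} = 17\). To get \(\sigma_y\), first decide which line is \(y\) on \(x\). Assume the first one is: \(y = \frac{4}{5}x + 6.6\) gives \(b_{yx} = 4/5\), and the second, read as \(x\) on \(y\), gives \(x = \frac{9}{20}y + 5.35\), so \(b_{xy} = 9/20\). The product \(b_{yx} \cdot b_{xy} = 9/25\) lies between 0 and 1, so the assumption holds (the opposite assignment gives \(\frac{20}{9} \cdot \frac{5}{4} = \frac{25}{9} > 1\), which is impossible). Hence \(r^2 = 9/25\) and \(r = 3/5\), positive like both coefficients. Finally, \(b_{yx} = r \cdot \sigma_y/\sigma_x\) gives \(\frac{4}{5} = \frac{3}{5} \cdot \frac{\sigma_y}{3}\), so \(\sigma_y = 4\).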
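The full 2010-II procedure can be scripted. A sketch with exact fractions, reading the first line as \(8x - 10y + 66 = 0\), the standard form of this question (with \(8x - 10y = 66\) the means change, but \(r\) and \(\sigma_y\) do not). It assumes each line is written as \(ax + by = c\) with non-zero \(a\) and \(b\) and that \(r \ne 0\); it tries both assignments of the lines, keeps the one with \(0 \le b_{yx} b_{xy} \le 1\), and uses \(\sigma_y^2 = \sigma_x^2 \cdot b_{yx}/b_{xy}\), which follows from dividing the two regression-coefficient formulas. All helper names are hypothetical.

```python
from fractions import Fraction
from math import sqrt

# Each line as (a, b, c) meaning a*x + b*y = c
LINE_1 = (8, -10, -66)   # 8x - 10y + 66 = 0
LINE_2 = (40, -18, 214)  # 40x - 18y = 214
VAR_X = 9

def coeff_y_on_x(line):
    """Slope dy/dx if the line is read as the regression of y on x."""
    a, b, _ = line
    return Fraction(-a, b)

def coeff_x_on_y(line):
    """Slope dx/dy if the line is read as the regression of x on y."""
    a, b, _ = line
    return Fraction(-b, a)

def analyse(line_1, line_2, var_x):
    # Try both assignments; a valid pair has b_yx * b_xy = r^2 in [0, 1].
    for yx_line, xy_line in ((line_1, line_2), (line_2, line_1)):
        b_yx, b_xy = coeff_y_on_x(yx_line), coeff_x_on_y(xy_line)
        if 0 <= b_yx * b_xy <= 1:
            break
    else:
        raise ValueError("no valid assignment of the two lines")
    if b_xy == 0:
        raise ValueError("r = 0: sigma_y cannot be recovered from the lines")
    r = sqrt(b_yx * b_xy) * (1 if b_yx >= 0 else -1)
    var_y = var_x * b_yx / b_xy  # since b_yx / b_xy = var_y / var_x
    # The lines meet at (x_bar, y_bar); solve them by Cramer's rule
    (a1, b1, c1), (a2, b2, c2) = line_1, line_2
    det = a1 * b2 - a2 * b1
    x_bar = Fraction(c1 * b2 - c2 * b1, det)
    y_bar = Fraction(a1 * c2 - a2 * c1, det)
    return x_bar, y_bar, r, sqrt(var_y)

print(analyse(LINE_1, LINE_2, VAR_X))
```

It returns \(\bar{x} = 13\), \(\bar{y} = 17\), \(r = 0.6\) and \(\sigma_y = 4\), matching the hand calculation.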
What is the coefficient of variation and when is it tested?
\(\text{CV} = (\sigma / \bar{x}) \times 100\%\). It is a unit-free relative measure of dispersion. NDA tests it in two ways: direct calculation (e.g., mean = 40, SD = 8 \(\to\) CV = 20%, from 2012-II) and as a conceptual statement (CV is independent of the unit of measurement, which is true). It is also used to compare variability between two groups — the group with a lower CV is less variable relative to its mean.
How does the median change when new observations are added?
From the 2012-II PYQ: the median of 27 observations was 18, so the 14th value in sorted order is 18. Three more observations, 16, 18, and 50, were added, giving 30 observations whose median is the average of the 15th and 16th values. One new value lies below 18, one equals it, and one lies above, so the 15th and 16th values of the new list are both 18 and the median stays at \((18 + 18)/2 = 18\). The general lesson: re-sort, find the new middle position(s), and check where each added value lands before concluding that the median is unchanged.
Which measure of central tendency is used in constructing index numbers?
Geometric mean. This was asked directly in 2015-I. The geometric mean is preferred for index numbers because it gives equal weight to equal ratios of change, making it suitable for combining price relatives across different commodities.