Detecting a change (difference) between a sample and the population, or between two samples. Steps:
1. Check whether the data is variable or proportional.
2. Check whether the data needs discrimination.
3. Determine the minimum sample size.
4. Perform a qualitative analysis: check for a difference in the distribution.
5. Compare sigma (quantitative analysis): Excel chi^2 for proportional data, and the F-test for variable data (ANOVA also uses an F statistic, but it compares means, not sigmas).
6. Compare the averages: probability (Zt or binomial test) for proportional data, and the t-test for variable data.
Note: sigma and average analysis alone are not enough; study the distribution first.
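To compare the sigma of a sample vs. the population: Chi^2 = (n-1)*s^2/S^2, where n is the sample size, s the sample standard deviation, and S the population standard deviation.
To compare the sigmas of two samples: F = s1^2/s2^2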
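A minimal Python sketch of the two sigma statistics above, using only the standard library. The function names are hypothetical, and the sketch only computes the statistic: it still has to be compared with a critical value from a chi-square table (df = n - 1) or an F table (df = n1 - 1, n2 - 1).

```python
import statistics

def chi2_sigma_vs_population(sample, pop_sd):
    """Chi^2 = (n-1) * s^2 / S^2: one sample against a known population sigma S."""
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance s^2 (n-1 denominator)
    return (n - 1) * s2 / pop_sd ** 2

def f_ratio(sample1, sample2):
    """F = s1^2 / s2^2 between two samples (larger variance on top, a common table convention)."""
    v1 = statistics.variance(sample1)
    v2 = statistics.variance(sample2)
    return max(v1, v2) / min(v1, v2)

# Usage with made-up illustrative numbers
chi2 = chi2_sigma_vs_population([10, 12, 11, 13, 9], pop_sd=1.0)  # df = 4
f = f_ratio([10, 12, 11, 13, 9], [5, 6, 7])                         # df = (4, 2)
```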
Note: tests do not prove sameness; they can only show a difference.
Q: Is chi^2 computed against the expected values, against the population ((n-1)*s^2/S^2), or against another sample (s1^2/s2^2)? A: The expected values are the population.
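For count (proportional) data, the chi^2 test compares observed counts with the counts expected from the population proportions. A minimal sketch of the Pearson statistic follows; the function name and the uniform population proportions in the usage line are illustrative assumptions. Excel's CHISQ.TEST(actual_range, expected_range) returns the p-value for this comparison directly.

```python
def pearson_chi2(observed, expected_proportions):
    """Chi^2 = sum((O - E)^2 / E), with E taken from the population proportions."""
    total = sum(observed)
    expected = [p * total for p in expected_proportions]  # expected counts
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Illustrative counts of four categories vs an assumed uniform population
chi2 = pearson_chi2([18, 22, 20, 40], [0.25, 0.25, 0.25, 0.25])  # df = 4 - 1 = 3
```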
6 Sigma -> Analysis -> statistical change.
Variable data (a.k.a. continuous, measurable): calculate Xbar and S, assume a normal distribution, and use a t-test to detect a change in the mean between two samples, or between a sample and the population.
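A sketch of the two t statistics, assuming normally distributed data as the note states. The function names are hypothetical. The two-sample version uses the pooled variance, which assumes equal sigmas; that is why sigma is compared before the means. The resulting t is compared with a two-tailed t-table value for the stated degrees of freedom.

```python
import math
import statistics

def t_vs_population(sample, pop_mean):
    """One-sample t = (xbar - mu) / (s / sqrt(n)), df = n - 1."""
    n = len(sample)
    return (statistics.mean(sample) - pop_mean) / (statistics.stdev(sample) / math.sqrt(n))

def t_two_samples(a, b):
    """Pooled two-sample t, df = na + nb - 2; assumes the two sigmas are equal."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
```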
Proportional data (frequencies/counts of attributes): use the binomial distribution and Excel chi^2 to detect a change in sigma, and probabilities (single-trial success rate, number of tries, number of successes) to detect a change in the mean.
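The probability approach for proportional data can be sketched with the binomial distribution: given the population success rate p and n tries, how likely are k or more successes if nothing changed? A small result suggests a change. The function name and the numbers in the usage line are hypothetical; in Excel the same value is 1 - BINOM.DIST(k-1, n, p, TRUE).

```python
from math import comb

def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of k or more successes in n tries."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))

# Hypothetical example: population success rate 0.1, 20 tries, 5 successes observed
p_value = prob_at_least(5, 20, 0.1)
```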
DMAIC Analyze steps: data discrimination (optional, variable data only) >> histogram >> estimate the distribution >> estimate the probability of no change >> conclude whether there is a change >> identify the KPIV (key process input variable); if there is no change, exclude this KPIV >> DMAIC Improve.
Data discrimination: converting variable data to proportional data (categories, ranges, discrete values). Rule of thumb: use about 10 steps.
Next, draw a histogram of the variable data (the frequency of each range of values). The histogram is needed to estimate the shape of the distribution: normal, bimodal, skewed, exponential, etc.
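A minimal sketch of data discrimination followed by a text histogram. It assumes equal-width ranges between the minimum and maximum value (the notes do not specify how the 10 steps are chosen), and the function names are hypothetical.

```python
def discriminate(values, steps=10):
    """Convert variable data into `steps` equal-width ranges; returns a range index per value."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / steps
    if width == 0:  # all values identical: a single range
        return [0] * len(values)
    # min() keeps the maximum value in the last range instead of index == steps
    return [min(int((v - lo) / width), steps - 1) for v in values]

def histogram(values, steps=10):
    """Count the frequency of each range and print a text histogram."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / steps
    counts = [0] * steps
    for idx in discriminate(values, steps):
        counts[idx] += 1
    for i, c in enumerate(counts):
        print(f"{lo + i * width:10.2f} | {'#' * c}")  # range start and a bar of '#'
    return counts
```

Dividing each count by the total turns the histogram into the proportional data described above.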
Why we need data distributions: to compute probabilities (the probability that an event results in an outcome value x), and to compare the distribution of a sample with that of the population or another sample.
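Once the histogram looks normal, probabilities can be read from a normal distribution fitted to the data. This sketch uses the standard library's statistics.NormalDist; the helper names are hypothetical, and the normality assumption must come from the histogram check.

```python
from statistics import NormalDist

def prob_above(sample, x):
    """P(outcome > x), assuming the sample's histogram looked normal."""
    dist = NormalDist.from_samples(sample)  # mean and sample stdev of the data
    return 1 - dist.cdf(x)

def prob_between(sample, a, b):
    """P(a < outcome < b) under the same normal assumption."""
    dist = NormalDist.from_samples(sample)
    return dist.cdf(b) - dist.cdf(a)
```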
Sample size: samples that are too small lead to wrong conclusions. Large samples are expensive and take a long time to collect (sometimes long enough for the trend under study to expire).
Heuristically: a minimum of 11, preferably 30 or more.
Statistically: for variable data, n = (Z*S/h)^2
For proportional data, n = (Z*sqrt(p(1-p))/h)^2
where Z is the z-value for the target confidence level (two-tailed: Z = 1.96 for 95%, etc.), S is the population standard deviation, h is the smallest change we want to be able to detect, and p is the expected proportion (the probability of the individual event) under consideration. Round n up to the next whole number.
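Both sample-size formulas in a short sketch. The helper names are hypothetical; Z is derived from the confidence level with the standard library's inverse normal CDF (two-tailed), and n is rounded up.

```python
import math
from statistics import NormalDist

def z_two_tailed(confidence):
    """Two-tailed z-value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def n_variable(sd, h, confidence=0.95):
    """n = (Z*S/h)^2, rounded up."""
    return math.ceil((z_two_tailed(confidence) * sd / h) ** 2)

def n_proportional(p, h, confidence=0.95):
    """n = (Z*sqrt(p(1-p))/h)^2, rounded up."""
    return math.ceil((z_two_tailed(confidence) * math.sqrt(p * (1 - p)) / h) ** 2)
```

For example, detecting a 5-point change in a proportion near 0.5 at 95% confidence needs n_proportional(0.5, 0.05), i.e. 385 items.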
Estimating the population Xbar and S: assume they equal those of a large sample, or average the statistics (xbar and s) of multiple samples. Keep the confidence at 95% so as not to miss an opportunity.
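A sketch of the second option, averaging the statistics of several samples. The function name is hypothetical; note that averaging s does not remove its small downward bias for small samples, so this is a rough estimate.

```python
from statistics import mean, stdev

def estimate_population(samples):
    """Estimate population Xbar and S as the averages of each sample's xbar and s."""
    xbars = [mean(s) for s in samples]
    sds = [stdev(s) for s in samples]
    return mean(xbars), mean(sds)
```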
Checking for a change (sample vs. population):
1- Plot histograms and compare the distributions; remove any outliers. If the distributions are similar:
2- Compare sigmas (chi-square or F test for variable data; Zt test for proportional data). If the sigmas are similar:
3- Compare means (ANOVA or t-test for variable data; Zt test for proportional data).
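Steps 2 and 3 for variable data, sketched as one function. Step 1 (comparing histograms) is a visual check and is assumed to be done already. The function name is hypothetical, and the critical values are passed in rather than computed: look them up in chi-square and t tables for df = n - 1, or with Excel's CHISQ.INV(0.025, df), CHISQ.INV(0.975, df) and T.INV.2T(0.05, df).

```python
import math
import statistics

def check_change_vs_population(sample, pop_mean, pop_sd, chi2_low, chi2_high, t_crit):
    """Compare sigma first (chi-square), then the mean (t-test), against the population."""
    n = len(sample)
    # Step 2: Chi^2 = (n-1) s^2 / S^2 must fall between the two-tailed critical values
    chi2 = (n - 1) * statistics.variance(sample) / pop_sd ** 2
    if not chi2_low <= chi2 <= chi2_high:
        return "sigma changed"
    # Step 3: one-sample t against the population mean
    t = (statistics.mean(sample) - pop_mean) / (statistics.stdev(sample) / math.sqrt(n))
    if abs(t) > t_crit:
        return "mean changed"
    return "no change detected"  # not proof of sameness
```

For n = 5 (df = 4) at 95% confidence, the table values are about 0.484 and 11.143 for chi-square and 2.776 for t.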
Proportional vs. variable data: variable data require a smaller minimum sample size than proportional data, so they are easier to test.
We can also treat proportional data as variable data, and thus use the chi-square and F tests for sigma analysis and a t-test instead of the Zt test for the average.
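One way to treat proportional data as variable data is to encode each attribute result as 1 (success) or 0 (failure); xbar is then the proportion and s comes out as sqrt(p(1-p)*n/(n-1)), so both can feed the chi-square, F and t formulas above. A minimal sketch with a hypothetical function name:

```python
import statistics

def as_variable(attribute_results):
    """Encode pass/fail attribute results as 1/0 and return (xbar, s)."""
    values = [1 if r else 0 for r in attribute_results]
    return statistics.mean(values), statistics.stdev(values)
```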