**Detecting a change/difference between a sample and a population or another sample: Steps:**

1. Check: is the data variable or proportional?

2. Check: does the data need discrimination?

3. Determine minimum Sample size.

4. Then perform qualitative analysis: check for a difference in distribution.

5. Then sigma (quantitative) analysis: Excel chi^2 for proportional data, and ANOVA (F-test) for variable data.

6. Then compare averages: probability, Zt or binomial test for proportional data, and t-test for variable data.

**Note:** sigma and average analysis alone are not enough; studying the distribution first is important.

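To compare the sigma of a sample vs. a population: $\chi^2 = (n-1)s^2/S^2$, where $n$ is the sample size, $s$ the sample standard deviation, and $S$ the population standard deviation.

To compare sigma between two samples: $F = s_1^2/s_2^2$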
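The two sigma statistics above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the sample values are hypothetical, and the resulting statistic still has to be compared against a chi-square or F table (with n-1 degrees of freedom per sample) to decide whether the change is significant.

```python
import statistics

def chi2_variance_statistic(sample, population_sd):
    """Chi^2 = (n-1) * s^2 / S^2: sample sigma vs. a known population sigma."""
    n = len(sample)
    s2 = statistics.variance(sample)  # sample variance (n-1 denominator)
    return (n - 1) * s2 / population_sd ** 2

def f_variance_statistic(sample1, sample2):
    """F = s1^2 / s2^2: sigma of one sample vs. sigma of another sample."""
    return statistics.variance(sample1) / statistics.variance(sample2)

# Hypothetical measurements, for illustration only.
sample_a = [9.8, 10.2, 10.1, 9.9, 10.0]
sample_b = [9.0, 11.0, 10.5, 9.5, 10.0]

chi2 = chi2_variance_statistic(sample_a, population_sd=0.2)
f = f_variance_statistic(sample_b, sample_a)
print(f"chi^2 = {chi2:.3f}, F = {f:.3f}")
```

By convention the larger variance is usually placed in the numerator of F so that the statistic is at least 1.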

**Note:** Tests don't prove sameness; they can only show a difference.

Q: Is $\chi^2$ computed against the expected value or population, $(n-1)s^2/S^2$, or against another sample, $s_1^2/s_2^2$? A: The expected value is the population.

6 Sigma -> Analysis -> statistical change.

**Data types:**

**Variable** (a.k.a. continuous, measurable): calculate Xbar and S, assume a normal distribution, and use a t-test for a change in the mean between two samples or between a sample and the population.

**Proportional**: frequencies/counts of attributes; use binomial and Excel chi^2 for a change in sigma, or probabilities (single success rate, tries, number of successes) for a change in the mean.
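For proportional data, the probability built from a single success rate, the number of tries, and the number of successes is the binomial probability. The sketch below is one way to compute it with the standard library; the function names and the example numbers (a 10% historical rate, 20 tries, 5 occurrences) are assumptions for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n tries with success rate p."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_upper_tail(k, n, p):
    """Probability of k or more successes in n tries with success rate p."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

# Hypothetical case: population rate 10%, sample of 20 tries, 5 successes seen.
p_no_change = binomial_upper_tail(5, 20, 0.10)
print(f"P(5 or more out of 20 | p = 0.10) = {p_no_change:.4f}")
```

A small probability here means the observed count would be unlikely if the rate had not changed.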

**DMAIC Analysis Steps:** Data Discrimination (optional, only for variable data) >> Histogram >> Estimate Distribution >> Estimate Probability of no-change >> Conclude Change >> Discover KPIV (or if no change then exclude this KPIV) >> DMAIC Improve.

**Data discrimination:** converting variable data to proportional data (categorical, ranges, discrete). Rule of thumb: 10 steps.

Then we draw histograms of the variable data, plotting the frequency of each range of values. We need the histograms to estimate the distribution of the data: normal, bimodal, skewed, exponential, etc.
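As a rough sketch of discrimination followed by a histogram, the code below splits variable data into 10 equal-width ranges (the rule of thumb above) and prints a text histogram. The function name `discriminate` and the measurements are hypothetical; equal-width ranges are an assumption, since the notes do not say how the steps are chosen.

```python
def discriminate(values, steps=10):
    """Split the range of `values` into `steps` equal-width ranges and count each."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("all values are identical; nothing to bin")
    width = (hi - lo) / steps
    counts = [0] * steps
    for v in values:
        # Clamp the index so the maximum value lands in the last range.
        idx = min(int((v - lo) / width), steps - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(steps + 1)]
    return edges, counts

# Hypothetical measurements, for illustration only.
measurements = [9.1, 9.4, 9.6, 9.8, 9.9, 10.0, 10.0, 10.1, 10.1, 10.2,
                10.2, 10.3, 10.4, 10.5, 10.7, 10.9, 11.1]

edges, counts = discriminate(measurements)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f} | {'#' * count}")
```

The shape of the printed bars is what gets compared by eye between a sample and the population or another sample.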

**Why we need data distributions:** We need the analysis of data distribution for probabilities (probability that an event resulted in an outcome value x) and to compare a distribution of a sample to a population or another sample.

**Sample size:** small samples lead to errors in conclusions. Large samples are expensive and take a long time to collect (sometimes long enough for a trend to expire).

**Heuristically –** minimum 11, and preferably 30+.

**Statistically –** For variable data: $n = (Z \cdot S / h)^2$

For proportional data: $n = \left(1.96 \sqrt{p(1-p)} / h\right)^2$

Where *Z* is the z-score of the target confidence level (if the question is two-tailed, Z = 1.96 for 95%, etc.), *S* is the population standard deviation, and *h* is the smallest change we want to be able to detect. *p* is the proportion of individual events expected to take the value under consideration.
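A minimal sketch of the two sample-size formulas, assuming the usual squared form for both, $n = (Z S / h)^2$ and $n = (Z \sqrt{p(1-p)} / h)^2$, and a two-tailed Z. The function names and example inputs are hypothetical; results are rounded up because a sample size must be a whole number.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_for_confidence(confidence=0.95):
    """Two-tailed z-score for a confidence level, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def sample_size_variable(s, h, confidence=0.95):
    """n = (Z * S / h)^2 for variable data, rounded up."""
    z = z_for_confidence(confidence)
    return ceil((z * s / h) ** 2)

def sample_size_proportional(p, h, confidence=0.95):
    """n = (Z * sqrt(p * (1 - p)) / h)^2 for proportional data, rounded up."""
    z = z_for_confidence(confidence)
    return ceil((z * sqrt(p * (1 - p)) / h) ** 2)

# Hypothetical inputs: S = 2.0 and a smallest detectable change h = 0.5;
# a proportion p = 0.5 and a smallest detectable change h = 0.05.
n_variable = sample_size_variable(s=2.0, h=0.5)
n_proportional = sample_size_proportional(p=0.5, h=0.05)
print(n_variable, n_proportional)
```

Using p = 0.5 gives the largest required sample for a given h, so it is a conservative choice when p is unknown.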

**Estimating population X bar & S:** assume they equal those of a large sample, or average the statistics (x̄ and s) of multiple samples. Keep confidence at 95% so as not to miss an opportunity.
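The second option, averaging the statistics of several samples, might look like the sketch below. It takes the notes literally and averages the sample means and the sample standard deviations; the function name and data are hypothetical, and it assumes samples of equal size (a pooled variance would be a more careful choice when sizes differ).

```python
from statistics import mean, stdev

def estimate_population(samples):
    """Estimate population Xbar and S by averaging per-sample means and sigmas."""
    xbar = mean(mean(sample) for sample in samples)
    s = mean(stdev(sample) for sample in samples)
    return xbar, s

# Hypothetical equal-size samples, for illustration only.
samples = [
    [9.8, 10.2, 10.0],
    [10.1, 9.9, 10.3],
]

xbar_est, s_est = estimate_population(samples)
print(f"Xbar = {xbar_est:.3f}, S = {s_est:.3f}")
```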

**Checking for a change: sample vs population:**

1- Plot histograms and compare distributions. If there are outliers, remove them. If similar:

2- Compare sigmas (chi-square or F test for variable data; Zt test for proportional data). If similar:

3- Compare means (ANOVA/t-test for variable data, Zt test for proportional data).
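For step 3, a rough sketch of the mean comparison against a population is given below. It assumes the "Zt test" for proportional data means a one-proportion Z test, which the notes do not spell out; the function names and numbers are hypothetical. The t statistic must still be checked against a t table with n-1 degrees of freedom.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def t_statistic(sample, population_mean):
    """One-sample t = (xbar - mu) / (s / sqrt(n)) for variable data."""
    n = len(sample)
    return (mean(sample) - population_mean) / (stdev(sample) / sqrt(n))

def z_statistic_proportion(successes, n, population_p):
    """One-proportion z = (p_hat - p) / sqrt(p * (1 - p) / n)."""
    p_hat = successes / n
    standard_error = sqrt(population_p * (1 - population_p) / n)
    return (p_hat - population_p) / standard_error

def two_tailed_p_from_z(z):
    """Probability of a z at least this extreme if nothing changed."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical variable data: sample vs. a population mean of 10.0.
t = t_statistic([10.4, 10.6, 10.5, 10.3, 10.7], population_mean=10.0)

# Hypothetical proportional data: 30 successes in 100 tries vs. a 20% rate.
z = z_statistic_proportion(successes=30, n=100, population_p=0.20)
p_value = two_tailed_p_from_z(z)
print(f"t = {t:.3f}, z = {z:.3f}, p = {p_value:.4f}")
```

A p-value below 0.05 corresponds to concluding a change at the 95% confidence level used in these notes.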

**Proportional vs. variable data:** variable data require a smaller minimum sample size than proportional data, so they are easier to test.

We can treat proportional data as variable data, and thus use chi^2 and F for sigma analysis, and a t-test instead of a Zt test for analysis of the average.