Excel SPSS Chi Squared Goodness of fit test

This SPSS/Excel tutorial explains how to use Chi Squared Goodness of fit test in Excel, SPSS and manual calculation.

You may also want to read:

SPSS Excel Chi Squared Test of Contingency Table

Chi Squared Goodness of fit test

Chi Squared Goodness of fit test is used to test hypotheses about a population proportion of nominal data. A coin flipping experiment has the chance of tossing head and tail (p=0.5), we call the experiment binomial experiment, where we only have two possibilities and each trial is independent. For Chi Squared Goodness of fit test, it is a multinominal experiment where more than 2 possibilities can happen for each trial. We can use Chi Squared Goodness of fit test to evaluate if there is significant evidence of a proportion change.

Chi Squared Goodness of fit test – Manual Calculation

Assume that in 2010, supermarket A has the market share of 45%, supermarket B has 40%, supermarket C has 15%. In 2016 we conduct new survey to see if customer preferences have changed. Of the 200 customers we asked, 102 selects supermarket A, 82 selects supermarket B, 16 selects supermarket C. To evaluate if the proportion has a significant change, we create hypothesis:

Ho: p1 = 0.45, p2 = 0.4, p3 = 0.15
H1: At least one pi is changed

If  the proportion has not changed, then in the 2016 survey, frequency of supermarket A should be 90 (200*0.45), B to be 80 (200*0.4), C to be 30 (200*0.15).

Apparently the actual frequency would not be exactly the same as the expected frequency, the point is we will see if the difference is significant.

Create a table as below and do some calculations for the blue columns.

Observed Frequency
(2016 survey result)
Expected Frequency
(if no change)
Differencesquare of difference/

expected frequency

Supermarket A10290121.6
Supermarket B828020.05
Supermarket C1630-146.53
Chi Square = 1.6+0.05+6.533333 = 8.1833333

The next step is to calculate the probability of Chi Square value to see if it is <0.05. Since we cannot manually calculate the value, we have to use Excel Function CHISQ.DIST.RT. There are two arguments in this function, the first parameter is the Chi Square that we just calculated, the 2nd is the degree of freedom. The degree of freedom of Chi Squared Goodness of fit test is

v = n-1

In our example, we have 3 supermarkets, therefore degree of freedom is 3-1 = 2

Now apply the formula

FormulaResult
=CHISQ.DIST.RT(8.18,2)0.016711358

Since the P value is <0.05, we conclude that there is significant evidence to infer that proportions have change in 2016 survey.

Chi Squared Goodness of fit test – using Excel

Assume that your source data is like this

Excel Chi Squared Goodness of fit test 02

Use CountIf Function to create a summery table

Observed Frequency
(2016 survey result)
Expected Frquency
(if no change)
Supermarket A=COUNTIF(A2:A201,”Supermarket A”)=COUNTIF(B2:B201,”Supermarket A”)
Supermarket B=COUNTIF(A2:A201,”Supermarket B”)=COUNTIF(B2:B201,”Supermarket B”)
Supermarket C=COUNTIF(A2:A201,”Supermarket C”)=COUNTIF(B2:B201,”Supermarket C”)

 

Excel Chi Squared Goodness of fit test 01

 

Now we can directly calculate the probability using CHITEST function.

There are two arguments for CHITEST Function. The first one is the observed frequency range, the second one is expected frequency range. Now apply the formula

FormulaResult
=CHITEST(B2:B4,C2:C4)0.016711358

Chi Squared Goodness of fit test – using SPSS

The input requirement of SPSS is quite different from Excel.

First, you need to have data source in numerical value instead of nominal value Supermarket A B C. Assign each supermarket a number, for example, Supermarket A represents 1, B for 2, C for 3.

Excel Chi Squared Goodness of fit test 03

 

Then go to the Variable View tab, Values field to add a supermarket label for 1 2 3. This step is optional but it makes you easier to read the output file.

Excel Chi Squared Goodness of fit test 04

 

Go to the menu tab > Analyze > Non Parametric Tests > Legacy Dialogs > Chi-Square

Excel Chi Squared Goodness of fit test 05

Move the “Observed Frequency” to the Test Variable List

At the bottom, type the Expected Frequency percentage in order of Supermarket A B C (because we have named them as 1,2,3 so the proportion has to follow the ascending order of numerical value).

Excel Chi Squared Goodness of fit test 06

Now we have the Chi-Square value as well as the P value.

 

SPSS Excel one sample T Test

This tutorial explains how to use Excel and SPSS to perform one sample T Test.

You may also want to read:

Statistics Hypothesis Testing

Calculate Z Score and probability using SPSS and Excel

What is T Test

In statistical inference, we are interested to know whether a small sample comes from a population. To inference using sample mean, when the population standard deviation and population mean are known, we can use Z test to interference the population mean from sample mean.

In reality, we do not have data of the whole population. Without the population standard deviation, we use T Test (also known as Student’s t Test) to interference the population mean from sample mean.

There are basically three kinds of T Test in statistics:

One sample T TestTest whether a target mean value is equal to the population mean
Independent samples T TestTest whether the population mean from two individual samples are equal
Paired-samples T TestTest whether the population mean between paired observations are equal

This tutorial will only talk about One sample T Test.

One sample T Test – manual calculation

Formula

t value of particular test mean value

spss excel t test 01
x bar: sample mean
μo: population mean to test
s: sample standard deviation
n: sample size

Confidence interval estimator of population mean (the value range of population mean at particular confidence level α)

spss excel one sample t test 08

Degree of freedom (ν)

ν = n-1

Example

To test whether population mean of student’s exam marks is over 50, given the below sample data

Student IDAverage marks
140
250
360
436
570

Step 1: Hypothesis

H1: μ > 50
Ho: μ = 50

Click here to learn more about Hypothesis Testing

Step 2: Calculate t of test mean value

spss excel t test 01

sample mean = 51.2
sample standard deviation = 14.043
square root of sample size = 2.236

t = (51.2-50)/(14.043/2.236)
t = 0.191

Step 3: Find t at 5% Significance level from t table

Find the t value for df = 4, p = 0.05 in t table

The value of t is 2.132

test statistics table

Conclusion

The rejection region is t > 2.132. Now that t for 50 is 0.191, therefore we do not have enough evidence to reject null hypothesis.

We can also calculate the confidence interval for our reference.

spss excel one sample t test 08

Interval = 2.571*14.043/2.236 = 16.15

Thus

Lower limit = 50-16.15 = 33.85
Upper limit = 50+16.15 = 66.15

One sample T Test – Excel

Create a spreadsheet as below. Column A is optional, column C is a dummy column to be used later as explained below.

spss excel one sample t test 01

 

In Excel 2010 / 2013, navigate to Data > Data Analysis

spss excel one sample t test 02

 

Since Excel does not have One Sample T Test, the closet we can use  is t-Test: Two Sample Assuming Unequal Variances. We will compare column B with column C as a workaround.

Enter the information as below, and the click OK.

spss excel one sample t test 03

Variable 1 Range is your data source. Variable 2 Range is the hypothesis mean in this case.

Hypotheses Mean Difference is 0 because we just want to know if there is any difference, but not how much difference.

Labels means whether the first data of variable range is a header, which will be used in the output table.

Alpha is the significance level of one tail.

Output Range is where we want to output the result table.

 

This is the generated result table. I have highlighted the two T value for easy reference. They are same as manual calculation.

spss excel one sample t test 04

 

To find the confidence interval in Excel, navigate to Data > Data Analysis

Select Descriptive Statistics

spss excel one sample t test 09

 

Input the value as below, click on OK

spss excel one sample t test 10

 

A table is generated. The value I highlight is the confidence interval, therefore

Lower limit = 50-17.44 = 32.56
Upper limit = 50+17.44 = 67.44

spss excel one sample t test 11

 

One sample T Test – SPSS

Import data into SPSS

spss excel one sample t test 05

 

Navigate to Analyze > Compare Means > One-Sample T Test

spss excel one sample t test 06

Move the average marks field to the right box for analysis, and then enter 50 as Test Value, click on OK.

 

spss excel one sample t test 07

Interpreting the result:

tThis is the t value for mean = 50
dfdegree of freedom equals to sample size-1
Sig (2-tailed)The total probability that mean > 50 and mean < 50 is 0.858 (probability not equal to 50). Since our hypothesis is testing mean > 50, we divide 0.858 by 2 = 0.429.
Since 0.429 > 0.05 significance level, we cannot reject Null hypothesis.
Mean DifferenceDifference between sample mean (51.2) and test value (50)
95% Confidence Interval of the DifferenceAs we do not know the true mean of the population, from here we can estimate 95% chance that the mean would lie between 34.8 (51.2-16.24) and 69.84 (51.2+18.64).

 

95% confidence level is the default value, to change the confidence level,  go back to the One-Sample T Test box > click on Options button

spss excel one sample t test 06

 

You may change the confidence level right here

spss excel one sample t test 12

 

Outbound References

https://www.youtube.com/watch?v=tdaVp9BOdjk

 

SPSS Select Cases (filter data)

This SPSS tutorial explains how to filter data using Select Cases Function.

SPSS Select Cases (filter data)

In Excel, you can apply Autofilter and enter criteria to select data what you want. SPSS has similar function but it is called “Select Cases“. In this tutorial, we will talk about the options are available to facilitate data filtering.

Navigate to Data > Select Cases

SPSS select cases 01

 

There are several options in this Select Cases box. We will talk about each option in the below section.

SPSS select cases 02

 

SPSS Select Cases – If condition is satisfied

This option allows you to select data based on criteria.

Select the first option If condition is satisfied > click on If… button

SPSS select cases 03

 

For example, if we only want to use data where Salary is larger than 20000, then we type the followings in the text box

Type Salary > 20000

Press Continue button > Press OK button to exit the Select Cases dialog box

SPSS select cases 05

You can also use AND OR operator in the Text Box, for example

Salary>20000 AND Allowance=0    (0 can represent Null data)

 

Now you can see row 6 and row 8 (salary = 9545) have a slash on the row number. The slash means those rows are excluded in the dataset.

When you run statistics report, you will only find the population size N is only 6.

SPSS select cases 06

 

SPSS Select Cases – Random sample of cases

This option allows you to randomly select rows in the data.

Select the second option Random sample of cases > click on Sample… button

SPSS select cases 07

 

The first option is to filter a certain percentage of cases. But note that it is only “Approximately”, the number of cases filtered could vary significantly.

SPSS select cases 08

 

Lets say we enter 50% in the box. Go back to Data View and you can see only 3 cases out of 8 cases are filtered.

SPSS select cases 09

 

Go back to the previous screen and select the second option. Enter 4 cases from the first 7 cases. Click on Continue button and go back to Data View.

SPSS select cases 10

 

Now you can see 4 random rows from row 1 to 7 remain.

SPSS select cases 11

 

SPSS Select Cases – Based on time or case range

This option allows you to filter specific row number range.

Select the third option Based on time or case range > click on Range button

SPSS select cases 12

 

Enter 2 and 6 as follows

SPSS select cases 13

 

Go back to Data View, now you can see only row 2 to row 6 are left.

SPSS select cases 14

 

SPSS Select Cases – Use filter variable

This option is to manually select which row of data to include or exclude.

First, in Variable View, create a field called filterA

In the Select Cases Option, select the forth option Use filter variable and then move filterA to the box

SPSS select cases 15

 

Go back to Data View. Now you can enter “1” in filterA column to indicate you want to include that row of data, enter 0 (or Null input) to exclude data.

SPSS select cases 16

 

Outbound References

https://www.youtube.com/watch?v=5AVWlYyPnQo

 

Statistics Hypothesis Testing

This tutorial explains the concepts of hypothesis testing in statistics.

Hypothesis Testing

Many text book uses the court case example to explain hypothesis testing. In the U.S. (but not all countries), the court finds evidence to prove the defendant guilty, instead of proving the defendant innocent.

There are two hypothesis for a court case:

  1. The defendant is guilty – when we have a theory and try to prove it, the hypothesis is called H1

  2. The defendant is innocent –  the alternative of H1 (you can simply interpret as something is normally true, i.e. everyone should be innocent) , the hypothesis is called H0 (Null hypothesis)

We don’t prove someone is innocent because we have assumed so, we just need to see if there is enough evidence to prove the defendant guilty.

If there is enough evidence, we say: There is enough evidence to conclude that the defendant is guilty

If there is not enough evidence, we say: There is not enough evidence to conclude that the defendant is guilty, but we cannot say the person is innocent

Hypothesis Testing Errors

As we try to prove defendant  is guilty, an innocent person may be judged guilty (which is very serious) or a guilty person is judged innocent. Using the statistics terminology to describe these two errors:

  1. Type I Error / False Positive (probability is α): We found enough evidence to conclude the defendant is guilty, but it turns out the person is innocent. α is the level of significance we allow to reject null hypothesis.

  2. Type II Error / False Negative (probability is β): We could not find enough evidence to conclude the defendant is guilty, but it turns out the person is guilty

The term Power describes the probability of rejecting a false null hypothesis. Specifically, it is the probability that a randomly selected sample will show that the null hypothesis is false.when the null hypothesis is indeed false.

Examples of Hypothesis Testing

Given μ = 100, if we try to prove population mean >100, then

H0: μ = 100

H1: μ < 100

If we try to prove population mean ≠100, then

H0: μ = 100

H1: μ ≠100

Outbound References

http://www.sagepub.com/sites/default/files/upm-binaries/40007_Chapter8.pdf

 

SPSS Import Text File

This SPSS tutorial explains how to import text file into SPSS.

SPSS Import Text File

Navigate to File > Open > Data

spss import text 01

 

Select Text in Files of Type, and then select the target text file

spss import text 02

 

Step 1

spss import text 03

Does your text file match a predefined format?

Leave the option default as No if this is the first time you import this file.

Select Yes if you have previously saved the format in your last import.

 

Step 2

spss import text 04

How are your variables arranged?

Whether you want to separate the data by characters like comma, colon, tab, or anything you specify (which will be defined later).

Fixed width means to separate data by character length.

In our example, we want to import txt file, which should be delimited by Tab, therefore we leave this option default as Delimited

Are variable names included at the top of your file?

Whether the first row of your data is the header. Since my text file has headers, I choose Yes

 

Step 3

spss import text 05

The first case of data begins on which line number?

Since I said the first row is header in the last step, the data begins in line 2.

How are your cases represented?

If each row of data represents data set of each person (a case), then you should choose Each line represents a case

If you have combined different persons (cases) in one data row, SPSS can help you move the data down the next row if you choose A specific number of variables represents a case

For example, if you data looks like this

Name  Address   Gender   Name  Address   Gender   
Peter    HK      M       Mary    US        F

When you tell SPSS 3 variables represent a case, then SPSS will help you move the data down

Name  Address   Gender 
Peter    HK       M
Mary     US       F

 

Step 4

spss import text 06

Which delimiters appear between variables?

We want to separate the text by Tab, because txt file is Tab delimited

What is the text qualifier?

This option is to prevent data from being mistakenly delimited. Some data such as CSV file are delimited by comma,  but the data itself may contain comma and you don’t want to delimit them with comma. In that case, you may enclose the data with another symbol to indicate that is one data.

For example, if your data is enclosed with double quote

 "May, John, Mary"    "Peter"    "Zoe"

As you select Double quote in this option, SPSS will split the data as below

May, John, Mary    Peter     Zoe

 

Step 5

spss import text 08

Variable name

SPSS has specific rules on the naming (e.g. no space is allowed), it will suggest you an acceptable Variable name

Data Format

SPSS automatically guesses the data type for you and you can verify if the guess is correct, correct it if wrong

 

Step 6

spss import text 09

Would you like to save this file format for future use?

This is for use in Step 1

Would you like to paste the syntax?

Choosing Yes will generate the code in Syntax Editor for all the options you selected. The import action will not execute until you click on the Run button in the Syntax Editor.

Cache data locally

This option is only useful when the SPSS file is located in network drive. Cache data locally will improve the performance especially for large database.

 

Txt file successfully imported

spss import text 10

Outbound References

http://www-01.ibm.com/support/docview.wss?uid=swg21480547

 

SPSS Excel Covariance, Correlation, Least Squares Method

This SPSS Excel tutorial explains how to calculate Covariance, Correlation and Least Squares Method in Excel and SPSS.

Covariance and Correlation

spss excel correlation 01

When we plot a scattered graph to see the relationship between two interval variables, we can see how straight a line is formed (whether the relationship is strong) and whether the relationship is positive or negative (upward or download).

However, when we want to look deeper to see the strength of the relationship, we need to calculate covariance and correlation. Covariance is a process to help us find the Correlation, which is the number we ultimately need for the strength.

Covariance

The below formula is for calculation of Population Covariance. For Sample Covariance, divide n-1 instead of N.

population covariance

While σx is denoted as standard variation of x, σxy is denoted as Covariance.

After we calculate the covariance, we can check the sign whether it is negative or positive.  Positive covariance means positive relationship (y increases as x increases), negative covariance means a negative relationship (y decreases as x increases). However, we cannot see the strength of relationship at this point.

Calculate Covariance in Excel

For sample covariance

=COVARIANCE.S(array1, array2)

For population covariance

=COVARIANCE.P(array1, array2)

Calculate Covariance in SPSS

Note that SPSS can only calculate sample Covariance.

Navigate to Analyze > Correlate > Bivariate

spss covariance 01

Move x and y to the right hand side

spss covariance 02

Click on Option, and then check the box Cross-product deviations and covariances

spss covariance 03

 

In the output table, 26.164 is the sample covariance. 1897.659 on the left is the variance of x.

The significance level (2 tailed) < 0.05 means there is a statistically significant correlation between two variables.

spss covariance 04

Coefficient of Correlation

Again, Covariance is just a step to calculate correlation. The next step is to calculate Coefficient of Correlation using Covariance.

Formula of Population coefficient of correlation:  (σ is the standard deviation)

ρ = σxy / (σx * σy)

Sample coefficient of correlation:

r = Sxy / (Sx * Sy)

The calculated result of Coefficient of Correlation ranges between -1 and 1. The positive sign indicates positive relationship while negative sign indicates negative relationship.

In addition, 1 indicates the strength of linear relationship is very strong, 0 indicates no linear relationship.

Calculate Coefficient of Correlation in Excel

=CORREL(array x, array y)

Calculate Coefficient of Correlation in SPSS

You can find the Correlation in the above output table.

spss correlation 01

Least Squares Method

Least Squares Method is to find the linear equation of best fit line for the x and y.

The equation of best fit line is

mean of y = slope * (mean of x) + constant

To create the equation, first calculate the slope of the straight line

slope = covariance x y / variance of x

Afterwards, substitute mean of x and mean of y to find constant.

Calculate slope in Excel using Slope Function

=SLOPE(array y, array x)

or calculate indirectly

=COVARIANCE.S(array x, array y)/VAR.S(array y)

Calculate constant in Excel using Intercept Function

=INTERCEPT(array y, array x)

Calculate slope and constant in SPSS

Navigate to Analyze > Regression > Linear

spss linear regression 01

Select y as Dependent variable and x as Independent variable, click on OK

spss linear regression 02

In the output, one of the table is called Coefficients, where you can find the constant is 0.15 and slope is 0.14

spss linear regression 03

 

Calculate Z Score and probability using SPSS and Excel

This tutorial explains how to calculate Z Score and probability using SPSS and Excel.

You may also want to read

SPSS Excel one sample T Test

Calculate probability of a range using Z Score

Calculate Z Score and probability using SPSS and Excel

In statistical inference, we are interested to know whether a small sample comes from a population. To inference using sample mean, when the population standard deviation and population mean are known, we can use Z test to interference the population mean from sample mean.

In reality, we do not have data of the whole population. Without the population standard deviation, we use T Test (also known as Student’s t Test) to interference the population mean from sample mean.

Use Excel to calculate Z Score

There are two ways for Excel to calculate Z Score. Assume that we have a population data range named data_rng.

Method 1:  Use AVERAGE and STDEV.P Functions

If we calculate Z Score manually, we use the below formula

Z score = (X-μ)/σ
        = (target value - population mean) / population standard deviation

Follow the Z score formula with the help of Average Function to calculate mean and use STEDEV.P to calculate the population standard deviation. If you want to calculate sample standard deviation, use STEDEV.S

For example, type the below formula to find Z score for X = 10

=(10-AVERAGE(data_rng))/STDEV.P(data_rng)

Method 2:  Use Standardize Function

Syntax of Standardize Function

STANDARDIZE(x, mean, standard_dev)

This Function is a bit cleaner than Method 1, but we still need the help of AVERAGE and STDEV.P Functions.

For X = 10

=STANDARDIZE(10,AVERAGE(data_rng),STDEV.P(data_rng))

Use Excel to calculate Z Score and probability

How Excel calculates probability is different from the Standard Normal Table. Standard Normal Table finds the probability from 0 to Z, while Excel calculates from infinity to Z.

Therefore, if you are trying to get the same result as Standard Normal Table does, subtract 0.5 by the Excel result and then apply absolute value.

For example, for Z score = 2.41, probability = 0.492 according to the Standard Normal Table. In Excel, the probability is 0.07869, which is approximately equal to 0.5-0.492.

DataZ ScoreP value (Excel)0.5-P value=ABS(0.5- P value)
1-1.41420.07860.42140.4214
2-0.70710.23980.26020.2602
30.00000.50000.00000.0000
40.70710.7602-0.26020.2602
51.41420.9214-0.42140.4214

Method 1:  Use NORM.S.DIST to convert Z score to probability

Syntax of Norm.S.DIST Function

NORM.S.DIST(z,cumulative)
ZRequired. The value for which you want the distribution.
CumulativeRequired. Cumulative is a logical value that determines the form of the function. If cumulative is TRUE, NORMS.DIST returns the cumulative distribution function; if FALSE, it returns the probability mass function.

This method directly converts Z Score to probability. Assume that we have already found Z Score, which is 1

=NORM.S.DIST(1,TRUE)

Assume that we haven’t found the Z score for X=10, then

=NORM.S.DIST(STANDARDIZE(10,AVERAGE(data_rng),STDEV.P(data_rng)),TRUE)

Method 2:  Use NORM.DIST to calculate probability directly

Syntax of Norm.DIST Function

NORM.DIST(x,mean,standard_dev,cumulative)
XRequired. The value for which you want the distribution.
MeanRequired. The arithmetic mean of the distribution.
Standard_devRequired. The standard deviation of the distribution.
CumulativeRequired. A logical value that determines the form of the function. If cumulative is TRUE, NORM.DIST returns the cumulative distribution function; if FALSE, it returns the probability mass function.

This method assumes that we still have not found the Z score yet. Assume that X=10

=NORM.S.DIST(STANDARDIZE(10,AVERAGE(data_rng),STDEV.P(data_rng)),TRUE)

Use SPSS to calculate Z Score and probability

Input the value 1 to 5 in SPSS Data View.

spss z score 01

Navigate to Analyze > Descriptive Statistics > Descriptives

spss z score 02

Move the variable to the right and check the box “Save standardized values as variables”

spss z score 03

Close the Statistics Viewer (because we don’t need this) and go back to Data View.

Now we see a new column is created, which is the Z value for each data.

spss z score 04

Lets compare SPSS result with Excel result, we can see that SPSS is using sample standard deviation to calculate Z Score (SPSS always uses sample instead of population).

excel z score 01

After calculating the Z Score in SPSS, you can use Excel or refer to Standard Normal Table to find the probability.

z score 02

Excel vs SPSS – calculate Z Score and probability

  1. SPSS fails to calculate Z Score for variable value not in the data set, while Excel can calculate any value
  2. SPSS can only calculate Z Score using sample standard deviation but not population standard deviation, while Excel can do both
  3. SPSS fails to convert Z Score to probability, while Excel can do
  4. SPSS fails to directly calculate Z Score from data set, while Excel can do

Outbound References

http://spssx-discussion.1045642.n5.nabble.com/Can-anyone-help-me-get-a-population-standard-deviation-td1075352.html

Calculate probability of a range using Z Score

This tutorial explains how to calculate probability of a range using Z score (standard normal random variable).

You may also want to read:

Excel Range, Variance, Standard Deviation

Calculate Z Score and probability using SPSS and Excel

SPSS Excel one sample T Test

Calculate probability of a range using Z Score

Assume that a random variable is a normally distributed (a normal curve), given that we have the standard deviation and mean, we can find the probability that a certain value range would occur.

I will demonstrate the this concept using an example.

Question

A fund has a return with a mean of 10% and standard deviation of 5%. What is the probability of losing money?

Answer

Losing money means the return < 0%. When calculating probability, we represent this statement as

P (X < 0)

Step 1 – Calculate Z Score

The first step is to standardize the target variable value into a standard normal random variable (Z Score) using the known standard deviation and mean. Z Score is an indicator of how far the value is away from the mean.

Formula:

Z score = (X-μ)/σ
        = (target value - population mean) / population standard deviation
        = (0 - 10)/5
        = -2  (2 standard deviation below mean)

Meaning of the Z score result:

z score 01

Z score is a standardized standard deviation (no matter what the actual standard deviation is, it is standardized from -4 to 4), and therefore we can calculate the investment return and probability. You can image each Z value corresponds to a standard deviation, probability and investment return value.

Z score = 0 : variable value = mean (10% investment return)

Z score > 0 : variable value > mean, Z score = 1 means 1 standard deviation above the mean, 2 = 2 standard deviation

Z score < 0 : variable value < mean

Step 2 – Look up probability from Standard Normal Table

The value in the first column (0.00, 0.01, 0.02…) is the first decimal place of Z, the value in the first row (0.00, 0.01, 0.02…) is the second decimal place of Z.

For example, the calculated Z value is 0.13, then we look for the 0.10  in the first column, and look for 0.03 in the first row, the result is 0.0517.

Standard Normal Table only provides positive Z value but not negative Z value, it is because the normal curve is a regular bell shape, it makes no difference between positive Z and negative Z. To save space, it is no point to repeat the whole table again with a negative Z value. In other words, the table only shows probability of half the curve, therefore the maximum probability you can find is 0.4998 (ideally it should be 0.5).

In our example, since the the probability of +2 and -2 are the same under the normal curve, we can simply  look for Z = 2.00, the result is 0.4772.

z score 02

Step 3 – Interpreting the result

As spoken above, the probability we looked at is half the normal curve from the middle point. Result 0.4772 is referring to the probability from Z = 0  (the mean of 10% return) to Z= -2 (0% return), but since we want the probability of <0% return return (Z < -2), we need to subtract 0.4772 from 0.5 (probability of half the normal curve area).

To represent this concept in formula,

P (X < 0)

= P (Z < -2)

= 0.5 – 0.4772

=0.0228

According to the formula

Z score = (X-μ)/σ
        = (target value - mean) / standard deviation

Increasing the standard deviation will decrease Z score, lower Z score means lower probability, thus increasing chance of losing money.

What’s Next?

Calculating Z Score manually and finding the probability in Standard Normal Table is only for you to learn the concept, you can get the value quickly using SPSS and Excel.

You should also read the below article:

Calculate Z Score and probability using SPSS and Excel

Outbound References

http://faculty.tarleton.edu/crawford/math131.html

Import Excel into SPSS

This SPSS tutorial explains how to import Excel into SPSS.

Import Excel into SPSS

It is easy to import Excel into SPSS but you should note that SPSS will automatically define each variable, some of them can be correct and some incorrect (this is very similar to Microsoft Access). In this tutorial, I will demonstrate how to import Excel into SPSS and how SPSS guesses the variable types, I will try too cover the most common scenario.

Lets say we have the following table in Excel, which contains time stamp, date, date with blank records, text, number in text format, number with 2 decimal places, integer, integer with blank records.

spss excel 01

Open SPSS, navigate to File > Open > Data

The name “Open” is confusing. In fact it is not open, it is import. Any change you made in SPSS is in SPSS, it does not modify the actual Excel file.

spss excel 02

Under “Files of type”, select Excel, then choose the workbook you want to import

spss excel 03

Select the worksheet to be imported > OK

spss excel 04

After the worksheet is successfully imported, the Variable View pops up.

spss excel 05

Switch to Data View to have a look

spss excel 07

Lets compare them with the data we originally imported.

spss excel 01

Here are some observations of the import.

  1. All Type (numeric / date / string) are correctly converted even though there are some blank fields on the top (which means SPSS does look through all data in the field to make the guess, instead of just looking at the few cells on top)
  2. All Measure (scale / nominal)  are correctly converted, except ordinal data where manual correction is required
  3. Invalid Excel column names such as spacing are removed and convert in Name column
  4. Date / number format in Excel are not converted into SPSS, manual correction in Variable View is required

In conclusion, after importing Excel into SPSS, while the overall data quality is good, you still need to check carefully whether the fields are converted correctly.

Outbound References

http://libguides.library.kent.edu/SPSS/ImportData

SPSS define variables in Variable View

This SPSS tutorial explains how to define variables in Variable View.

You may also want to read:

Types of data in statistics (Interval, Nominal, Ordinal)

SPSS – Variable View

In SPSS, understanding the Variable View is the first step before using any analytic functions. The purpose of Variable View is to define variables.

In Excel, you don’t have to define whether a column is number, date, or text, because Excel guesses the data type for you but of course it could be wrong in some cases. However, you still need to do the Cell formatting such as how many decimal places to display or add the dollar sign. In SPSS, these can be done in Variable View (it is very similar to Design View of Microsoft Access).

spss variable view

Define Variables in Variable View

AttributeExplanation
NameA unique field name without space, the first character must be a letter
Typespss variable view 2

Numeric: Number in standard format or in scientific notation
Comma: Format a number with thousands separator using comma
Dot: Format a number with thousands separator using dot
Scientific notation: Format a number with E or D
Date: Format a number in Date or times tamp format
Dollar: Format a number with $
Custom currency: Format a number in custom currency defined in currency tab
String: Text, can be number, letter, or a mix of both
Restricted Numeric (integer with leading zeros): Pad a positive integer with leading zero

WidthLength of integer / string
DecimalsNumber of decimal places
LabelDescription of the field to display in output
ValuesDefine a number to represent a label, the Data View uses number as input but using label in output report

spss variable view 3

MissingDefine what values are deemed as missing to exclude in analysis

spss variable view 4

ColumnsThe column width in Data View
AlignLeft, Right or Center
MeasureScale: number
Nominal: text
Ordinal: text with order
RoleInput: Independent Variable
Target: Dependent Variable
Both: Dependent and Independent Variable
None: No role
Partition: The variable will partition the data into separate samples
Split: Only applicable for SPSS Modeler

Outbound References

http://libguides.library.kent.edu/SPSS/DefineVariables