A 3.5 inch floppy disk, either high density or double density, is required. Disks are available at the SFU Bookstore or Quad Books. No computer account is necessary.
Table of Contents
Part I: Basics
1. Starting Minitab
2. Minitab windows
3. Entering, Editing and Saving Data
   3.1 Entering Data
       (1) Entering data from the Data window
       (2) Entering data from the Session window
       (3) Opening a Minitab Worksheet
   3.2 Editing and Manipulating Data
   3.3 Saving Worksheet
4. Saving the Session Window (Minitab Output)
   4.1 Recording Session method
   4.2 Save Window As... method
5. Printing Output and Quitting Minitab
   5.1 Printing Session Window
   5.2 Quitting Minitab
6. Editing and Printing Output
7. Ejecting your Disk
Part II: More Minitab Commands and Examples
1. Arithmetic Commands
2. Plotting Data
3. Basic Statistics
4. Regression
5. Analysis of Variance
6. Tables
7. Random Data and Distribution
References
If your floppy disk is new, it may need to be formatted. Insert the disk and follow the instructions on the screen.
To start Minitab, move the mouse pointer onto Aliases and click the mouse button twice, quickly. (This is called a double click.) Then double click on Minitab Accelerated. Minitab has started when both the Session and Data windows are open; the Session window is hidden behind the Data window.
The Session window, the Data window and the Help window are used most frequently and are discussed later. The other three windows are Info, History and Graph. Info summarizes the data in the current worksheet, History contains a record of previously executed commands, and a Graph window is usually created when you create a new graph.
The Session window scrolls as more output goes into it. You can scroll up and down to see various parts of your output. However, the number of screens you can scroll back through is limited. It is usually between 5 and 15 screens. If your output is long and you want to save more of it, you should save your output continuously in an outfile. See section 4.1 Recording Session Method.
Exercise 1: Open each of the five windows other than Graph.
father's: 64 65 64 64 63 62 62 63 65 68 68 64 65 65 66 66 65 63 63 63
son's:    65 66 66 65 64 63 63 65 66 68 68 65 67 67 66 64 65 62 65 64
Select the Data window. Click on the small arrow in the upper left hand corner of the worksheet until the arrow is pointing down. Click the cell just below the column number C1; type the word father and press return to name C1 as father; type the first number 64 and press return. Continue until you have typed the last piece of the father's height data and pressed return. Similarly, we can enter the son's height into column C2 and name C2 as son.
------------------------------------------------------------------------------
MTB > set c3
DATA> 1 2 3 4 30 30 30 7 7
DATA> end
------------------------------------------------------------------------------
MTB > read c4 c5
DATA> 1 2
DATA> 2 4
DATA> 3 6
DATA> 4 8
DATA> end
------------------------------------------------------------------------------
Entering patterned data from the Session window is rather simple. For example,
MTB > set c6
DATA> 1:4 3(30) 2(7)
DATA> end
puts 1 2 3 4 30 30 30 7 7 into column c6.
You can also use Set Patterned Data under Edit.
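The patterned-data shorthand accepted by set can be imitated in a few lines of Python, which makes the expansion rule explicit. This is only an illustrative sketch. expand_pattern is a hypothetical helper, not part of Minitab, and it handles only three forms: the two shown above (a:b for consecutive integers with a <= b, and n(v) for n copies of v) plus plain single numbers.

```python
import re

def expand_pattern(text):
    """Expand patterned data such as '1:4 3(30) 2(7)' into a list of numbers.

    Hypothetical helper; it supports only these token forms (an assumption,
    not Minitab's full syntax):
      a:b   the consecutive integers a, a+1, ..., b (a <= b)
      n(v)  n copies of the value v
      v     a single number
    """
    values = []
    for token in text.split():
        run = re.fullmatch(r"(-?\d+):(-?\d+)", token)
        repeat = re.fullmatch(r"(\d+)\((-?\d+(?:\.\d+)?)\)", token)
        if run:
            first, last = int(run.group(1)), int(run.group(2))
            values.extend(range(first, last + 1))
        elif repeat:
            count, value = int(repeat.group(1)), float(repeat.group(2))
            values.extend([value] * count)
        else:
            values.append(float(token))
    return values

print(expand_pattern("1:4 3(30) 2(7)"))  # [1, 2, 3, 4, 30.0, 30.0, 30.0, 7.0, 7.0]
```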
To enter a new value into the active cell: type the value and press return. It overwrites the previous contents of the cell.
To correct the active cell: type the correct data and press return.
To delete the active cell: choose Editor > Delete Cell.
To delete the active row: choose Editor > Delete Row.
To insert one cell above the active cell: choose Editor > Insert Cell.
To insert one row above the active cell: choose Editor > Insert Row.
To repeat the last insertion or deletion: choose Editor > Repeat.
To restore the previous value of the cell: choose Edit > Undo Change Within Cell.
To erase a column (variable): choose Calc > Erase Variables, type the column number and press return.
One main advantage of this method is that the current session can be appended to previous sessions if you save it to the same outfile each time. This allows you to accumulate your computer output.
describe c1
histogram c1
plot 'son' 'father'
boxplot c2
Then stop recording as described above in 4.1. The following (except for the notes following the <-- symbols) will appear in your Session window.
---------------------------------------------------------------------------
MTB > OutFile 'father-son'.   <-- choose File > Other Files > Start Recording Session
MTB > Retrieve 'father-son.MTW' <-- choose File > Open Worksheet...
WORKSHEET SAVED 5/19/1995
Worksheet retrieved from file: father-son.MTW
MTB > describe c1
N MEAN MEDIAN TRMEAN STDEV SEMEAN
father 20 64.400 64.000 64.333 1.698 0.380
MIN MAX Q1 Q3
father 62.000 68.000 63.000 65.000
MTB > histogram c1
Histogram of father N = 20
Midpoint Count
62 2 **
63 5 *****
64 4 ****
65 5 *****
66 2 **
67 0
68 2 **
MTB > plot 'son' 'father'
[Character scatter plot: son (vertical axis, 62.0 to 68.0) versus father (horizontal axis, 62.4 to 68.4); a digit marks points that coincide.]
MTB > boxplot c2
[Character boxplot of son; horizontal axis from 62.4 to 68.4.]
MTB > Nooutfile <-- choose File > Other Files > Stop Recording
To print on the Stats Lab DeskWriter, choose Chooser from the apple menu, click on the DeskWriter icon, then math.sfu.ca in the Zone window and Stats Lab DeskWriter in the printer window. These are usually highlighted automatically. Close the chooser by clicking on the little square on the upper-right corner of the chooser window. Then follow the instructions in Section 5.1 for sending a job to the printer.
To print on a university LaserWriter, follow the steps below:
Note: To print on a university printer, you will need an SFU Printing Card, available from the green and black vending machines in the MCF and the Library. The card costs one dollar; to add printing credit, re-insert the card and insert additional dollar coins. Be sure to remove your card from the card reader when you are finished printing.
The Data window cannot be printed. However, the data may be printed by first displaying them in the Session window and then printing the Session window. To display data in the Session window, e.g., columns C1 to C10 and constant K1, type the command print C1-C10 K1. Then you can select the displayed data and print the selection as described below.
See a TA in the lab if you have any questions regarding the use of Minitab in the Stats Lab.
Practice Problem: Open the worksheet employ.MTW from Desktop > SLS 0c Applications > Statistics > Minitab8.2 > Data. Use the Minitab commands introduced in Section 4 to find the means of Trade, Food and Metals. Are the histograms for Trade, Food and Metals symmetric? Does the scatter plot of Metals vs. Trade show a positive association between the two variables?
All Minitab commands can be typed in the Session window, and most of them can also be selected from the pop-up menus. The Minitab commands presented in this part are to be typed into the Session window. Commands, except those in the examples, are underlined. In a command line only the bold-faced characters are necessary; characters within [] are optional. Commands are not case sensitive. Also, the first four characters of a command name are sufficient; that is, Correlation C1 C2 and corr c1 c2 are the same.
let E = expression   computes an algebraic expression. For example,
MTB > let c3 = 2*(c1+c2)    <--- c3 is double the sum of c1 and c2
MTB > let k1 = mean(c1)     <--- k1 is the mean of c1
MTB > let k2 = c1(2)        <--- k2 is the second element of column c1

count, sum, mean, median, stdev, minimum, maximum, sqrt and ssq each compute a statistic for a column. For example,
MTB > mean c1     <--- computes the mean of column c1
MTB > count c1    <--- counts the number of values in c1
MTB > stdev c1    <--- computes the sample standard deviation of column c1
MTB > ssq c1      <--- computes the sum of the squared values in c1

Accordingly, rcount, rsum, rmean, rmedian, rstdev, rminimum, rmaximum, rsqrt and rssq each compute a statistic rowwise. For example,
MTB > rmean c1-c3 c11
MTB > rsum c1-c3 c12
MTB > rssq c1-c3 c13

 c1  c2  c3          c11  c12  c13
  2   3   4   --->     3    9   29
  3   4   5            4   12   50

indicator variables for values in c, put into c...c   creates indicator (dummy) variables. For example,
MTB > indicator 'sex' c21 c22

 'sex'           c21  c22
   1               1    0
   1    ---->      1    0
   2               0    1
   2               0    1

sort c...c put into c...c   sorts by the first column and carries along the additional columns. Sorting by multiple columns can be done with the subcommand by c...c. Sorting is done in ascending order unless you use the subcommand descending. To see how the command sort works, open worksheet in Minitab8.2 > Data and type the command sort c1-c3 c11-c13 to see what happens.

rank the values in c put ranks into c   ranks the smallest number as 1, the second smallest as 2, and so on; ties are assigned the average rank. For example,
MTB > rank c1 c2

  c1          c2
   1         2.5
 1.5  --->   4
   0         1
   2         5
   1         2.5
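The rules behind rank (average ranks for ties) and the rowwise commands rmean, rsum and rssq can be sketched in Python. The function names rank_average_ties and row_stats are hypothetical. The sketch assumes only the behaviour described above, and it reproduces the two small examples from the text.

```python
def rank_average_ties(xs):
    """Rank values with the smallest as 1; tied values share the average rank."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])  # indices, smallest value first
    ranks = [0.0] * len(xs)
    start = 0
    while start < len(order):
        end = start
        # extend the block while the next value ties with this one
        while end + 1 < len(order) and xs[order[end + 1]] == xs[order[start]]:
            end += 1
        average = (start + 1 + end + 1) / 2  # mean of the ranks start+1, ..., end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks

def row_stats(*columns):
    """Rowwise mean, sum and sum of squares (what rmean, rsum and rssq store)."""
    rows = list(zip(*columns))
    rmean = [sum(r) / len(r) for r in rows]
    rsum = [sum(r) for r in rows]
    rssq = [sum(v * v for v in r) for r in rows]
    return rmean, rsum, rssq

print(rank_average_ties([1, 1.5, 0, 2, 1]))  # [2.5, 4.0, 1.0, 5.0, 2.5]
print(row_stats([2, 3], [3, 4], [4, 5]))     # ([3.0, 4.0], [9, 12], [29, 50])
```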
histogram c...c prints a separate histogram for each column.
increment = k   specifies the interval width to be k
start = k       specifies the first midpoint to be k
by c            produces a separate histogram for each value in c
same            uses the same scale for all columns listed on histogram
For example,
MTB > histogram 'father';
SUBC> start = 62;
SUBC> increment = 2.
Histogram of father N = 20
Midpoint Count
62.00 2 **
64.00 8 ********
66.00 7 *******
68.00 3 ***
dotplot c...c produces a separate dotplot for each column.
increment = k   specifies the distance between tick marks on the axis
start = k       specifies the first tick mark
by c            produces a separate dotplot for each value in c
same            uses the same scale for all columns listed on dotplot
For example,
MTB > dotplot 'father'
[Character dotplot of father; horizontal axis from 62.4 to 68.4.]
stem-and-leaf c...c produces a separate stem-and-leaf display for each column. How to read a stem-and-leaf display is too lengthy to explain here; open the Help window for an explanation.
For example,
MTB > stem-and-leaf 'father'
Stem-and-leaf of father N = 20
Leaf Unit = 0.10
2 62 00
7 63 00000
(4) 64 0000
9 65 00000
4 66 00
2 67
2 68 00
boxplot c produces a boxplot.
by c   one boxplot is produced for each level given in c. For example,
MTB > Retrieve 'pulse.MTW'
WORKSHEET SAVED 4/ 1/1991
Worksheet retrieved from file: pulse.MTW
MTB > boxplot 'pulse1';
SUBC> by 'sex'.
[Character boxplots of PULSE1 for SEX = 1 and SEX = 2 on a common horizontal axis from 50 to 100; asterisks mark possible outliers, which occur only for SEX = 1.]
plot c versus c prints a scatter diagram with the first column on the
vertical (y) axis and the second column on the horizontal (x) axis. Each
point is plotted with the symbol * ordinarily. When two or more points fall
on the same spot, a count is printed. When the count is over nine, the
symbol + is used. Plot has the following subcommands to control labeling
and to specify symbol and scales:
title ='text'
footnote ='text'
xlabel ='text'
ylabel ='text'
symbol ='symbol'
xincrement =k
xstart =k
yincrement =k
ystart =k
For example,
MTB > plot c2 c1;
SUBC> title='son vs. father';
SUBC> symbol = 'h'.
son vs. father
[Character scatter plot: son (vertical axis, 62.0 to 68.0) versus father (horizontal axis, 62.4 to 68.4), with each point plotted as h; a digit marks points that coincide.]
mplot c vs. c,...,c vs. c plots several pairs of columns on the same axes.
The m is for multiple. The first pair of columns is plotted with the symbol
A, the second pair with B, and so on. If several points fall on the same
spot, a count is given. Up to nine pairs of columns may be plotted in one mplot. Mplot shares the same subcommands as plot, except symbol.
For example,
MTB > Retrieve 'fa.MTW'
WORKSHEET SAVED 4/ 1/1991
Worksheet retrieved from file: fa.MTW
MTB > mplot 'y1' 'x' 'y2' 'x' 'y3' 'x'
[Character plot of Y1, Y2 and Y3 (vertical axis, 3.5 to 10.5) versus X (horizontal axis, 4.0 to 14.0); a digit marks points that coincide.]
A = Y1 vs. X   B = Y2 vs. X   C = Y3 vs. X
lplot c vs. c using letters as coded in c   plots data using letters as plotting symbols. The l is for letter. As in plot, the first column is on the vertical (y) axis and the second column is on the horizontal (x) axis. Each point is plotted with a letter determined by the number in the last column, using the following correspondence:

code:    ...  -2  -1   0   1   2   3  ...  22  23  24  25  26  27  28  29  ...
letter:  ...   X   Y   Z   A   B   C  ...   V   W   X   Y   Z   A   B   C  ...

If several points fall on the same spot, a count is printed. Lplot has the same eight subcommands as mplot to control labeling and specify scales. For example,
MTB > Retrieve 'pulse.MTW'
WORKSHEET SAVED 4/ 1/1991
Worksheet retrieved from file: pulse.MTW
MTB > lplot 'height' 'weight' 'sex'
[Character plot of height (vertical axis, 60.0 to 75.0) versus weight (horizontal axis, 100 to 225), with each point plotted as A or B according to its SEX code; a digit marks points that coincide.]
where B represents female and A represents male.
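The lplot letter correspondence is a cycle of length 26: codes 1 to 26 map to A to Z, 27 starts again at A, and 0, -1, -2 continue backwards to Z, Y, X. A short Python sketch of that rule follows. The function name lplot_letter is hypothetical.

```python
def lplot_letter(code):
    """Plotting letter for an integer code: 1 -> A, ..., 26 -> Z, then the cycle repeats."""
    # Python's % returns a value in 0..25 even for negative codes,
    # so 0 -> Z, -1 -> Y, -2 -> X and 27 -> A, 28 -> B.
    return chr(ord("A") + (code - 1) % 26)

print("".join(lplot_letter(k) for k in (-2, -1, 0, 1, 2, 3)))  # XYZABC
print("".join(lplot_letter(k) for k in range(22, 30)))         # VWXYZABC
```

In the pulse example, SEX = 1 is plotted as A and SEX = 2 as B.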
tsplot [period = k] c   does a time series plot of a column of data, usually observations made at equally spaced intervals of time (y axis), against the integers 1, 2, 3, ..., which indicate when the observations were obtained (x axis).
Tsplot uses special plotting symbols to indicate a cycle. All you need to do is specify the length of the cycle, called the period k, in the tsplot command. For example, if the data are collected monthly, the period is k = 12. With k = 12, tsplot plots each observation from January with a "1", from February with a "2", ..., from September with a "9", from October with a "0", from November with an "A" and from December with a "B".
If the period k is not specified, period 10 is assumed, and plotting
symbols 1, 2,..., 9, 0, 1,... are used.
Subcommands for tsplot:
increment=k, start=k[end=k] to specify the scale for the y axis.
origin=k to specify the time value associated with the first observation.
For example, if origin = 1940 is used, then the first observation on the
plot will be labeled 1940 on the time (x) axis, the second 1941, and so on.
tstart=k [end = k] allows you to plot a subset of your time series. For
example, if tstart = 5 is used, then the first observation plotted is the
5th observation and the first 4 observations are omitted from the plot.
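The tsplot symbol rule described above can be written as a small Python function. Positions 1 to 9 in the cycle use the digits 1 to 9, position 10 uses 0, and positions 11 and 12 use A and B. Continuing with C, D, ... for periods longer than 12 is an assumption, not something stated here. tsplot_symbols is a hypothetical name.

```python
def tsplot_symbols(n_obs, period=10):
    """Plotting symbol for each of n_obs observations in a cycle of length period."""
    symbols = []
    for i in range(n_obs):
        position = i % period + 1  # 1, 2, ..., period, then start again
        if position <= 9:
            symbols.append(str(position))
        elif position == 10:
            symbols.append("0")
        else:
            # 11 -> A, 12 -> B; positions beyond 12 (C, D, ...) are an assumption
            symbols.append(chr(ord("A") + position - 11))
    return "".join(symbols)

print(tsplot_symbols(24, period=12))  # two years of monthly data: 1234567890AB1234567890AB
print(tsplot_symbols(12))             # default period 10: 123456789012
```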
mtsplot [period k] c...c plots several time series all on the same axes.
High-Resolution Graphics
Here are most of the High-Resolution Graphics commands implemented in
Minitab:
Ghistogram C...C
increment = k
start = k
by c
same scales for all columns
Gboxplot C
Gplot C C
Gmplot C vs. C,..., C vs. C
Glplot C C C
The last three commands share the same subcommands:
title = 'text'
footnote = 'text'
xlabel = 'text'
ylabel = 'text'
xincrement = k
xstart = k
yincrement = k
ystart = k
lines [style = k ] connecting pairs in c c
symbol = 'symbol' (only works with gplot)
As you may have noticed, the subcommand lines is not available with the ordinary plotting commands. For example, the following commands produce Picture 6 in Part I.
MTB > regress c2 1 c1 c3 c4
MTB > name c4 'fit-son'
MTB > gmplot c2 c1 c4 c1;
SUBC> lines c4 c1.
Note: Each high-resolution graph is displayed in a separate Graph window. You can print the active high-resolution graph window by selecting Print Window... from the File menu. High-resolution graphs cannot be saved with the Session window output; however, they can be saved as picture files, which can be inserted into files of other software such as MS Word.
Due to the page limit of this booklet, computer output will not be explained in much detail here. Please use Help... or ask a TA to understand it better.
describe zinterval ztest tinterval ttest
twosample twot correlation covariance centre
describe c...c prints ten descriptive statistics for each column.
by c   separate statistics are produced for each value in c. The values in c must be between -9999 and +9999. For example,
Worksheet retrieved from file: pulse.MTW
MTB > describe 'pulse1';
SUBC> by 'sex'.
SEX N MEAN MEDIAN TRMEAN STDEV SEMEAN
PULSE1 1 57 70.42 70.00 70.27 9.95 1.32
2 35 76.86 78.00 76.65 11.62 1.96
SEX MIN MAX Q1 Q3
PULSE1 1 48.00 92.00 63.00 75.00
2 58.00 100.00 66.00 86.00
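The ten statistics printed by describe can be recomputed in Python, here for the father heights entered in Part I. Two conventions in this sketch are assumptions, chosen because they reproduce the Part I output (MEAN 64.400, TRMEAN 64.333, STDEV 1.698, SEMEAN 0.380, Q1 63, Q3 65). The trimmed mean drops 5% of the values, rounded, from each end. The quartiles are interpolated at positions (n+1)/4 and 3(n+1)/4 of the sorted data.

```python
import math
import statistics

def quartile(sorted_xs, p):
    """Value at 1-based position p*(n+1), interpolating between neighbours."""
    n = len(sorted_xs)
    pos = p * (n + 1)  # e.g. 5.25 for Q1 when n = 20
    if pos <= 1:
        return sorted_xs[0]
    if pos >= n:
        return sorted_xs[-1]
    lower = int(pos)
    frac = pos - lower
    return sorted_xs[lower - 1] + frac * (sorted_xs[lower] - sorted_xs[lower - 1])

def describe(values):
    xs = sorted(values)
    n = len(xs)
    k = round(0.05 * n)        # assumed: trim 5% from each end, rounded
    sd = statistics.stdev(xs)  # sample standard deviation (divisor n - 1)
    return {
        "N": n,
        "MEAN": statistics.mean(xs),
        "MEDIAN": statistics.median(xs),
        "TRMEAN": statistics.mean(xs[k:n - k]),
        "STDEV": sd,
        "SEMEAN": sd / math.sqrt(n),
        "MIN": xs[0],
        "MAX": xs[-1],
        "Q1": quartile(xs, 0.25),
        "Q3": quartile(xs, 0.75),
    }

father = [64, 65, 64, 64, 63, 62, 62, 63, 65, 68,
          68, 64, 65, 65, 66, 66, 65, 63, 63, 63]
for name, value in describe(father).items():
    print(f"{name:7s} {value:.3f}")
```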
zinterval [k% confidence], sigma = k for c...c calculates a k%
confidence interval for the mean, separately for each column. If k
is not specified, then a 95% confidence interval for the population mean
will be calculated. For example,
MTB > zinterval 95 10 'pulse1'
THE ASSUMED SIGMA =10.0
N MEAN STDEV SE MEAN 95.0 PERCENT C.I.
PULSE1 92 72.87 11.01 1.04 ( 70.82, 74.92)
ztest [mu=k] sigma=k for c...c performs a separate z-test on each column.
alternative = k   k = -1 gives H1: mu < mu0; k = 1 gives H1: mu > mu0.
If mu is not specified, then H0: mu = 0 is used. If the subcommand alternative = k is not used, a two-sided (H1: mu ≠ mu0) z-test is done. For example,
MTB > ztest 75 10 'pulse1';
SUBC> alternative = -1.
TEST OF MU = 75.00 VS. MU L.T. 75.00
THE ASSUMED SIGMA = 10.0
N MEAN STDEV SE MEAN Z P VALUE
PULSE1 92 72.87 11.01 1.04 -2.04 0.021
The P-value of 0.021 tells us that we can reject H0: mu = 75 at the 5% level and conclude that the population mean is significantly less than 75.
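The zinterval and ztest formulas use the standard normal distribution, available in Python's standard library as statistics.NormalDist. The sketch below uses the rounded mean 72.87 from the output above, so its last digits can differ slightly from Minitab's. The function names are hypothetical. The alternative codes -1, 0 and 1 follow the ztest subcommand described above.

```python
import math
from statistics import NormalDist

def z_interval(mean, sigma, n, confidence=0.95):
    """Confidence interval for the mean when sigma is known."""
    se = sigma / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return mean - z * se, mean + z * se

def z_test(mean, sigma, n, mu0, alternative=0):
    """z statistic and P value; alternative = -1, 0 (two-sided) or 1."""
    z = (mean - mu0) / (sigma / math.sqrt(n))
    phi = NormalDist().cdf
    if alternative == -1:
        p = phi(z)                   # H1: mu < mu0
    elif alternative == 1:
        p = 1 - phi(z)               # H1: mu > mu0
    else:
        p = 2 * (1 - phi(abs(z)))    # H1: mu != mu0
    return z, p

print(z_interval(72.87, 10, 92))                    # close to (70.82, 74.92)
print(z_test(72.87, 10, 92, 75, alternative=-1))    # z close to -2.04, P close to 0.021
```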
tinterval [k% confidence] for c...c calculates a separate k% confidence interval for each column. If k is not specified, a 95% confidence interval is calculated.
ttest [mu=k] for c...c performs a separate t-test on each column.
alternative = k   k = -1 gives H1: mu < mu0; k = 1 gives H1: mu > mu0.
If mu is not specified, then H0: mu = 0 is used. If the subcommand alternative = k is not used, a two-sided (H1: mu ≠ mu0) t-test is done. For example,
MTB > ttest 75 'pulse1';
SUBC> alter=-1.
TEST OF MU = 75.00 VS. MU L.T. 75.00
N MEAN STDEV SE MEAN T P VALUE
PULSE1 92 72.87 11.01 1.15 -1.86 0.033
The P-value of 0.033 indicates significant evidence, at the 5% level, against H0: the mean of pulse1 is 75. The test result favors the alternative that the mean of pulse1 is less than 75.
Note: zinterval and ztest need the population standard deviation sigma, but tinterval and ttest don't. The t-statistic has n-1 degrees of freedom, where n is the sample size.
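To see where the ttest numbers come from, the sketch below computes the one-sample t statistic with n - 1 degrees of freedom and a one-sided P value. Python's standard library has no t distribution, so this sketch integrates the t density numerically with Simpson's rule. That is an illustrative choice, not how Minitab computes it. The inputs are the rounded values printed above, and the function names are hypothetical.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule for the area between 0 and |x|, plus symmetry."""
    b = abs(x)
    h = b / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area

def one_sample_t(mean, stdev, n, mu0):
    """t statistic for H0: mu = mu0, its standard error, and n - 1 degrees of freedom."""
    se = stdev / math.sqrt(n)
    return (mean - mu0) / se, se, n - 1

t, se, df = one_sample_t(72.87, 11.01, 92, 75)
p_value = t_cdf(t, df)  # alternative = -1, i.e. H1: mu < mu0
print(f"SE MEAN = {se:.2f}  T = {t:.2f}  DF = {df}  P = {p_value:.3f}")
```

As a cross-check, t_cdf(-1.7531, 15) comes out very close to 0.05, which matches the invcdf example at the end of this booklet.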
twosample [k% confidence] for c1 c2   does a two-sample t-test of H0: mu1 = mu2 and calculates a confidence interval for (mu1 - mu2). If k is not specified, a 95% confidence interval is calculated.
alternative = k   k = -1 gives H1: mu1 < mu2; k = 1 gives H1: mu1 > mu2.
pooled   the common variance is estimated by the pooled variance, under the assumption that the two populations have the same variance.
If the subcommand alternative = k is not used, a two-sided (H1: mu1 ≠ mu2) two-sample t-test is done. For example,
MTB > twosample c1 c2
TWOSAMPLE T FOR father VS. son
N MEAN STDEV SE MEAN
father 20 64.60 1.76 0.39
son 20 65.20 1.61 0.36
95 PCT CI FOR MU father - MU son: (-1.68, 0.48)
TTEST MU father = MU son (VS. NE): T= -1.13 P=0.27 DF= 37
The degrees of freedom of the t-statistic used in a non-pooled test and confidence interval are given by

$$ df = \frac{(VAR_1 + VAR_2)^2}{VAR_1^2/(n_1 - 1) + VAR_2^2/(n_2 - 1)} $$

where $VAR_1 = s_1^2/n_1$ and $VAR_2 = s_2^2/n_2$. Minitab truncates the number to an integer when necessary.
When Pooled is used, the t-statistic has a degrees of freedom of n1+n2-2.
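The degrees-of-freedom formula above can be checked against the twosample output. This Python sketch uses the rounded means and standard deviations printed there. With these inputs it gives DF = 37 as Minitab does, and T close to -1.12 rather than the printed -1.13, because the inputs are rounded. two_sample_t is a hypothetical name. The pooled branch uses n1 + n2 - 2 degrees of freedom, as stated above.

```python
import math

def two_sample_t(mean1, s1, n1, mean2, s2, n2, pooled=False):
    """Two-sample t statistic and degrees of freedom (unpooled by default)."""
    if pooled:
        sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)  # pooled variance
        se = math.sqrt(sp2 * (1 / n1 + 1 / n2))
        df = n1 + n2 - 2
    else:
        var1, var2 = s1**2 / n1, s2**2 / n2  # VAR1 and VAR2 as defined above
        se = math.sqrt(var1 + var2)
        df_exact = (var1 + var2) ** 2 / (var1**2 / (n1 - 1) + var2**2 / (n2 - 1))
        df = math.floor(df_exact)  # truncated to an integer
    return (mean1 - mean2) / se, df

t, df = two_sample_t(64.60, 1.76, 20, 65.20, 1.61, 20)
print(f"T = {t:.2f}, DF = {df}")
```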
twot [k% confidence] data in c, groups in c
alternative= k
pooled
does exactly the same test and confidence interval as twosample. The only difference is the form of the data. Twot expects the data for both groups in the first column, and group codes that specify which group each observation belongs to in the second column.
Group codes must be integers between -10,000 and 10,000, or the missing data code *.
It is convenient to use twot when data for two groups are mixed together.
correlation c...c [put in m] calculates the linear correlation coefficient for all pairs of columns and, optionally, stores them in a matrix m.
MTB > Retrieve 'pulse.MTW'
MTB > correlation 'PULSE1' 'PULSE2' 'HEIGHT' 'WEIGHT'
PULSE1 PULSE2 HEIGHT
PULSE2 0.616
HEIGHT -0.212 -0.143
WEIGHT -0.202 -0.169 0.785
This output says that the linear correlation coefficient between WEIGHT and HEIGHT is 0.785, and so on.
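The linear (Pearson) correlation coefficient reported by correlation is the sum of cross-products of deviations, divided by the square root of the product of the two sums of squared deviations. Here is a minimal Python sketch, applied to the father and son heights from Part I. The function name is hypothetical, and no printed value is claimed for the father-son pair.

```python
import math

def correlation(x, y):
    """Pearson linear correlation coefficient of two equal-length columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

father = [64, 65, 64, 64, 63, 62, 62, 63, 65, 68,
          68, 64, 65, 65, 66, 66, 65, 63, 63, 63]
son = [65, 66, 66, 65, 64, 63, 63, 65, 66, 68,
       68, 65, 67, 67, 66, 64, 65, 62, 65, 64]
print(f"r(father, son) = {correlation(father, son):.3f}")
```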
covariance c...c [put in m] calculates the covariance for all pairs of columns and, optionally, stores them in a matrix m. For example,
centre c...c put into c...c   standardizes each column into z-scores by subtracting its mean and dividing by its standard deviation, when no subcommand is given.
location [subtracting k...k] when no k's are given, each
column is transformed by subtracting its mean.
scale [dividing by k..k] when no k's are given, each
column is transformed by dividing by its standard
deviation.
minmax [ min=k, max=k] when no k's are given, all columns are
transformed to have minimum -1 and maximum +1.
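Two of the centre transformations can be sketched in Python: the default (z-scores) and minmax with its default range -1 to +1. This sketch assumes that centre divides by the sample standard deviation, which is the same n - 1 divisor stdev uses. The function names are hypothetical.

```python
import statistics

def standardize(xs):
    """Default centre: subtract the mean, then divide by the sample standard deviation."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    return [(x - m) / s for x in xs]

def minmax(xs, new_min=-1.0, new_max=1.0):
    """centre with minmax: rescale linearly so the column runs from new_min to new_max."""
    lo, hi = min(xs), max(xs)
    return [new_min + (x - lo) * (new_max - new_min) / (hi - lo) for x in xs]

z = standardize([2, 4, 4, 4, 5, 5, 7, 9])
print([round(v, 3) for v in z])  # the z-scores have mean 0 and standard deviation 1
print(minmax([0, 5, 10]))        # [-1.0, 0.0, 1.0]
```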
where c2=c1**2, c3=c1**3,..., ck=c1**k
noconstant                 fits the equation without a constant term
coefficients put into c    stores the coefficients b0, b1, ..., bk in column c
residual put into c        stores the residuals in column c
predict for E...E          computes fitted Y's for given values of X
mse put into k             stores the mean square error in k
hi put into c              stores the leverages in c
cookd put into c           stores Cook's distances in c
vif                        prints the variance inflation factor associated with each predictor
pure                       prints the results of the usual pure error test for lack of fit

More subcommands can be found from apple menu > Help... > Commands by Name > Regression.

Here is an example of simple regression and a short interpretation of its output.

Worksheet size: 123723 cells
MTB > Retrieve 'pulse.MTW'
WORKSHEET SAVED 4/ 1/1991
Worksheet retrieved from file: pulse.MTW
MTB > regress 'pulse1' 1 'height';
SUBC> residual c11.
-------------------------output part 1----------------------------------
The regression equation is
PULSE1 = 117 - 0.637 HEIGHT      <--- Ŷ = b0 + b1X, where Ŷ is called the fitted value

Predictor       Coef      Stdev    t-ratio        p
Constant      116.65      21.33       5.47    0.000
HEIGHT       -0.6372     0.3099      -2.06    0.043

s = 10.82      R-sq = 4.5%      R-sq(adj) = 3.4%
-------------------------output part 2---------------------------------
Analysis of Variance

SOURCE       DF         SS         MS        F        p
Regression    1      494.7      494.7     4.23    0.043
Error        90    10533.8      117.0
Total        91    11028.4
-------------------------output part 3---------------------------------
Unusual Observations
Obs.  HEIGHT   PULSE1      Fit  Stdev.Fit  Residual  St.Resid
 29     63.0   100.00    76.51       2.10     23.49     2.21R
 31     68.0    96.00    73.33       1.15     22.67     2.11R
 54     68.0    48.00    73.33       1.15    -25.33    -2.35R

R denotes an obs. with a large st. resid.

MTB > name c11 'resid'
MTB > print c11            <--- resid = y - Ŷ

R-sq = 4.5% is R², calculated by

$$ R^2 = \frac{\text{SS Regression}}{\text{SS Total}} $$

R-sq(adj) = 3.4% is R² adjusted for degrees of freedom, and is calculated by

$$ R^2(\text{adj}) = 1 - \frac{\text{SS Error}/\text{DF Error}}{\text{SS Total}/\text{DF Total}} $$

... Data, execute the following commands, and then compare your output with the output in the example above. What is the same? What is different?
This is a multiple regression example.
MTB > regress 'pulse1' 2 'height' 'weight';
SUBC> residual c11;
SUBC> mse k1;
SUBC> coefficient c12.
MTB > name c11 'resid' c12 'coefft'
MTB > print k1
MTB > print c12
MTB > print c11

As you will notice, the Analysis of Variance table now includes the sequential sums of squares. Here is a short note on them.

SOURCE   DF   SEQ SS    <--- sequential sums of squares
HEIGHT    1    494.7    <--- SS(b1|b0), the increase in SSR due to adding 'height' to the constant model
WEIGHT    1     37.2    <--- SS(b2|b0, b1), the increase in SSR due to adding 'weight' to the model Ŷ = b0 + b1'height'

Best Subset Regression:
breg c on predictors c...c   does best subset regression using the maximum R² criterion. Suppose you specify m predictors on the command breg. Breg first looks at all one-predictor models and chooses the model with the largest R². Four statistics (R², adjusted R², Cp and s) are printed for this model and for the next best one. Then breg looks at all two-predictor models, selects the model with the largest R², and prints information on this and the next best model. This process stops when all m predictors are used.

When comparing models with the same number of predictors, choosing the model with the highest R² is equivalent to choosing the model with the smallest SSE. When comparing models with different numbers of predictors, choosing the model with the highest adjusted R² is equivalent to choosing the model with the smallest MSE. In general, we look for a model whose Cp is small and close to p, the number of parameters in the model.

Here is an example of breg:
MTB > breg 'pulse1' 'height' 'weight' 'activity'

Best Subsets Regression of PULSE1 (predictors: HEIGHT, WEIGHT, ACTIVITY)

                   Adj.
Vars   R-sq   R-sq    C-p        s
   1    4.5    3.4    0.6   10.819   X        <--- best one-predictor model
   1    4.1    3.0    0.9   10.841   X        <--- the next best one-predictor model
   2    4.8    2.7    2.3   10.860   X X      <--- best two-predictor model
   2    4.7    2.6    2.4   10.867   X X      <--- the next best two-predictor model
   3    5.1    1.9    4.0   10.906   X X X    <--- model with all three predictors

Each line of the output represents a different model. Vars is the number of variables (predictors) in the model, and R-sq is R². A predictor in the model is indicated by an "X".

Subcommands of breg:
include c...c in all models   the specified columns are included as predictors in all the models
best k models                 prints the best k models of each size; the default is 2
nvars k [k]                   e.g. nvars 2 4 tells Minitab to print only the best 2-, 3- and 4-predictor models
noconstant                    the constant term is omitted from the model

5. Analysis of Variance
aovoneway on c...c   does a one-way analysis of variance. Data for each group (level) are to be put into a separate column.

oneway aov, data in c, levels in c [put resids in c [fits in c]]
Tukey [family error rate k]
Fisher [individual error rate k]
Dunnett [family error rate k] control level is k
MCB [family error rate k] best is k
is similar to aovoneway, but all data are in the first column and the corresponding levels are in the second column, and multiple comparisons can be made with subcommands. Tukey and Fisher provide confidence intervals for all pairwise differences between level means. Dunnett provides a confidence interval for the difference between each treatment mean and a control mean. MCB provides a confidence interval for the difference between each level mean and the best of the other level means. In MCB there are two choices for best: set k = -1 when the smallest mean is considered the best, and set k = 1 when the largest mean is considered the best.

An example of oneway and aovoneway:
MTB > Retrieve 'poplar2.MTW'
WORKSHEET SAVED 3/13/1991
Worksheet retrieved from file: poplar2.MTW
< QuickStart < Minitab 8.2

Note: -99.00 was used to represent missing values in this data set. Before proceeding with the following commands, change -99.00 in c4 into *, which stands for a missing value in Minitab.
MTB > oneway c4 c3;
SUBC> tukey 0.05.
ANALYSIS OF VARIANCE ON Diameter
SOURCE      DF        SS        MS        F        p
Treatmnt     3     54.76     18.25     6.67    0.000
ERROR      291    796.81      2.74
TOTAL      294    851.57
                               INDIVIDUAL 95 PCT CI'S FOR MEAN
                               BASED ON POOLED STDEV
LEVEL     N     MEAN    STDEV  --+---------+---------+---------+----
    1    74    4.652    1.682     (------*-----)
    2    75    4.918    1.749          (-----*-----)
    3    74    4.471    1.591  (------*-----)
    4    72    5.613    1.588                     (------*-----)
                               --+---------+---------+---------+----
POOLED STDEV =   1.655         4.20      4.80      5.40      6.00

Tukey's pairwise comparisons

    Family error rate = 0.0500
Individual error rate = 0.0107

Critical value = 3.63

Intervals for (column level mean) - (row level mean)

              1          2          3
    2    -0.962
          0.430

    3    -0.517     -0.248
          0.880      1.143

    4    -1.664     -1.395     -1.845
         -0.258      0.006     -0.439

To apply the command aovoneway to the same data set, first use the command unstack with the subcommand subscripts to separate the data for different levels into different columns, and then apply aovoneway.
MTB > unstack c4;
SUBC> subscripts c3.
MTB > aovoneway c11-c14

ANALYSIS OF VARIANCE
SOURCE      DF        SS        MS        F        p
FACTOR       3     54.76     18.25     6.67    0.000
ERROR      291    796.81      2.74
TOTAL      294    851.57
                               INDIVIDUAL 95 PCT CI'S FOR MEAN
                               BASED ON POOLED STDEV
LEVEL     N     MEAN    STDEV  --+---------+---------+---------+----
  C11    74    4.652    1.682     (------*-----)
  C12    75    4.918    1.749          (-----*-----)
  C13    74    4.471    1.591  (------*-----)
  C14    72    5.613    1.588                     (------*-----)
                               --+---------+---------+---------+----
POOLED STDEV =   1.655         4.20      4.80      5.40      6.00

twoway aov, data in c, levels in c c [put resids in c [fits in c]]
additive model                <--- to fit a model without the interaction term
mean for factors c [and c]    <--- prints marginal means and 95% C.I., separately for each factor specified
does a two-way analysis of variance for balanced data (equal number of obs. in each cell).

An example of twoway:
MTB > Retrieve 'anom.MTW'
MTB > twoway c1 c2 c3;
SUBC> mean c2 c3.
ANALYSIS OF VARIANCE C1

SOURCE        DF        SS        MS
C2             2     22.72     11.36
C3             2    198.22     99.11
INTERACTION    4      3.28      0.82
ERROR         27     71.00      2.63
TOTAL         35    295.22

             Individual 95% CI
C2     Mean  ------+---------+---------+---------+-----
 1     5.42   (--------*---------)
 2     6.08         (---------*--------)
 3     7.33                      (--------*---------)
             ------+---------+---------+---------+-----
                 5.00      6.00      7.00      8.00

             Individual 95% CI
C3     Mean  ---------+---------+---------+---------+--
 1     3.17  (----*----)
 2     6.83                    (----*----)
 3     8.83                              (----*----)
             ---------+---------+---------+---------+--
                    4.00      6.00      8.00     10.00

6. Tables
tally the data in c...c   prints a one-way table for each column. The default output contains frequency counts. The columns must contain integers from -9999 to 9999.
percents       output contains percents
cumcounts      output contains cumulative counts
cumpercents    output contains cumulative percents
all            output contains all four statistics
For example,
MTB > Retrieve 'pulse.MTW'
MTB > tally 'sex' 'smokes'

 SEX  COUNT      SMOKES  COUNT
   1     57           1     28
   2     35           2     64
  N=     92          N=     92

MTB > tally 'sex';
SUBC> percent.

 SEX  PERCENT
   1    61.96
   2    38.04

chisquare test on table stored in c...c   does a χ² test for association on a contingency table that has been stored in the columns c...c. For raw data, use the command table with its subcommand chisquare, which forms the contingency table and does the test.

table the data classified by c...c   displays one-way, two-way and multi-way tables. Some subcommands for table are as follows:
counts          includes a count of the total number of observations in each cell
rowpercents     includes a row percent in each cell
colpercents     includes a column percent in each cell
totpercents     includes the total percent in each cell
chisquare [k]   does a χ² test of independence between the rows and columns of each two-way table printed. k = 1, the default, puts only the count in each cell; k = 2 puts the count and the expected value (under the assumption of independence) into each cell; k = 3 puts the count, the expected value and the standardized residual into each cell.

An example of table:
MTB > Retrieve 'pulse.MTW'
MTB > table 'sex' 'activity';
SUBC> chisquare 2.

ROWS: SEX     COLUMNS: ACTIVITY

            0        1        2        3      ALL

  1         1        5       35       16       57
         0.62     5.58    37.79    13.01    57.00

  2         0        4       26        5       35
         0.38     3.42    23.21     7.99    35.00

ALL         1        9       61       21       92
         1.00     9.00    61.00    21.00    92.00

CHI-SQUARE =    3.118  WITH D.F. =   3

  CELL CONTENTS --
                  COUNT
                  EXP FREQ

MTB > table 'smokes' 'sex';
SUBC> rowpercents.

ROWS: SMOKES     COLUMNS: SEX

             1         2       ALL

  1      71.43     28.57    100.00
  2      57.81     42.19    100.00
ALL      61.96     38.04    100.00

  CELL CONTENTS --
                  % OF ROW

7. Random Data and Distribution
random k observations into each of c...c
bernoulli p = k
binomial n = k p = k
poisson mu = k
integer a = k b = k
discrete values in c, probabilities in c
normal [mu = k [sigma = k]]
uniform [a = k b = k]
t df = k
f df1 = k df2 = k
chisquare df = k
generates a separate random sample of k observations into each column, from the distribution specified by a subcommand. If no subcommand is given, data are generated from the standard normal distribution.

Example: Generate 20 samples of size 50 from a normal population with mean = 2 and standard deviation = 1.
Solution: We can use the commands
MTB > random 50 c1-c20;
SUBC> normal 2 1.
to make each column a random sample of size 50, or use
MTB > random 20 c1-c50;
SUBC> normal 2 1.
to make each row a random sample of size 50. The latter is useful when the sample averages of all the random samples are required: the command rmean c1-c50 c51 puts the 20 sample means into column c51.

sample k rows from c...c, put into c...c
replace
takes a random sample of k rows from each listed column and puts it into the corresponding output column. When the subcommand replace is used, sampling is done with replacement; otherwise, sampling is done without replacement. For example, to randomly select 6 numbers without replacement from the integers 1 to 49, we can do the following.
MTB > set c1
DATA> 1:49
DATA> end
MTB > sample 6 c1 c2
MTB > print c2
C2
   43   25    3   31   11   17
Quick pick (Lotto 6/49), anyone?

pdf for values in E [put into E]
bernoulli p = k
binomial n = k p = k
poisson mu = k
integer a = k b = k
discrete values in c, probabilities in c
normal [mu = k [sigma = k]]
uniform [a = k b = k]
t df = k
F df1 = k df2 = k
chisquare df = k
For a discrete distribution, pdf calculates probabilities for the specified values in E. For a continuous distribution, pdf calculates the probability density function. For example,
MTB > set c3
DATA> 1 4 5
DATA> end
MTB > pdf c3;
SUBC> binomial 5 0.5.
    K    P( X = K)
 1.00      0.1562
 4.00      0.1562
 5.00      0.0312
MTB > pdf c3;
SUBC> normal 3 2.
   1.0000    0.1210
   3.0000    0.1995
   5.0000    0.1210

CDF for values in E...E [put into E...E]
bernoulli p = k
binomial n = k p = k
poisson mu = k
integer a = k b = k
discrete values in c, probabilities in c
normal [mu = k [sigma = k]]
uniform [a = k b = k]
t df = k
f df1 = k df2 = k
chisquare df = k
CDF, the cumulative distribution function, calculates the probability that a random variable X has a value less than or equal to x; that is, CDF(x) = Pr(X ≤ x). For example,
MTB > cdf c3;
SUBC> binomial 5 0.5.
    K  P( X LESS OR = K)
 1.00       0.1875
 3.00       0.8125
 5.00       1.0000
MTB > cdf c3;
SUBC> normal 3 2.
   1.0000    0.1587
   3.0000    0.5000
   5.0000    0.8413

INVCDF for values in E [put into E]
bernoulli p = k
binomial n = k p = k
poisson mu = k
integer a = k b = k
discrete values in c, probabilities in c
normal [mu = k [sigma = k]]
uniform [a = k b = k]
t df = k
f df1 = k df2 = k
chisquare df = k
INVCDF finds the value x corresponding to a given probability p with respect to the specified distribution. For example,
MTB > invcdf 0.05;
SUBC> t 15.
   0.0500   -1.7531
The command INVCDF is particularly useful for finding the critical value for a given significance level α when testing hypotheses or computing confidence intervals.

Acknowledgment
I would like to thank Myra Andrews for her most helpful assistance in editing this manual.