Answers to questions
2.8 (1) The value recorded from only one sampling or experimental unit may not be very representative of the remainder of the population.

2.8 (2) The ‘hypothetico-deductive’ model is that science is done by proposing an hypothesis, which is an idea about a phenomenon or process that may or may not be true. The hypothesis is used to generate predictions that can be tested by doing a mensurative or a manipulative experiment. If the results of the experiment are consistent with the predictions, the hypothesis is retained. If they are not (for an experiment that appears to be a good test of the predictions), the hypothesis is rejected. By convention, an hypothesis is stated as two alternatives: the null hypothesis of no effect or no difference, and the alternate hypothesis which states an effect. For example, ‘Light affects the behaviour of millipedes’ is an alternate hypothesis, and the null is ‘Light does not affect the behaviour of millipedes.’ Importantly, an hypothesis can never be proven because there is always the possibility that new evidence may be found to disprove it. A ‘negative’ outcome, where the alternate hypothesis is rejected, is still progress in our understanding of the natural world and therefore just as important as a ‘positive’ outcome where the null hypothesis is rejected.

4.10 (1) The experiment was manipulative, in that the handle of the pump was removed, but unreplicated because there was only one pump. Nor was there a control treatment of a suspect pump/well that remained open. The experiment was also confounded in time – the number of new cholera cases may have declined even if the pump had not been disabled. Having said this, the result of the experiment was consistent with the hypothesis and led to further work that provided more evidence that cholera could be contracted by drinking contaminated water.
4.10 (2) The biologist meant that even though the class mark had been similar from 2004 to 2009 and increased by 12% after the introduction of the new textbook, the experiment is confounded in time and there was no control of another class that did not receive the new textbook. Therefore, the biologist could not confidently attribute the change to the new textbook, even though the result is consistent with an improvement.

4.10 (3) An example of confusing a correlation with causality is when two variables are related (that is, they vary together) but neither causes the other to change. For example, as depth in the ocean increases, light intensity decreases and pressure increases, but the decrease in light intensity does not cause the increased pressure or vice versa.

4.10 (4) ‘Apparent replication’ is when an experiment (either mensurative or manipulative) contains replicates, but the placement or collective treatment of the replicates reduces the true amount of replication. For example, if you had two different treatments replicated several times within only two incubators set at different temperatures, the level of replication is actually the incubator in each treatment (and therefore one). Another example could be two different fertiliser treatments applied to each of ten plots, but all ten plots of one treatment were clustered together in one part of a field and all ten of the other treatment were clustered together in another part of the field.

4.10 (5) The colleague meant that the experimental conditions were not like those occurring in the natural world. For example, snails collected from desert areas can often survive for several days at 60°C in the laboratory, but those from cooler habitats only survive for a few hours. But prolonged exposure to such a high temperature is never experienced in nature, so the experiment is unrealistic.

5.6 (1) Copying the mark for an assignment and using it to represent an examination mark is grossly dishonest.
First, the two types of assessment are different. Second, the lecturer admitted the variation between the assignment and exam mark was ‘give or take 15%’, so the relationship between the two marks is not very precise and may severely disadvantage some students. Third, there is no reason why the relationship between the assignment and exam mark observed in past classes will necessarily apply in the future. Fourth, the students are being misled: their performance in the exam is being ignored.

5.6 (2) It is not necessarily true that a result with a small number of replicates will be the same if the number of replicates is increased, because a small number is often not representative of the population. Furthermore, to claim that a larger number was used is dishonest.

6.12 (1) Many scientists would be uneasy about a probability of 0.06 for the result of a statistical test because this non-significant outcome is very close to the generally accepted significance level of 0.05. It would be helpful to repeat the experiment.

6.12 (2) Type 1 error is the probability of rejecting the null hypothesis when it is true. Type 2 error is the probability of rejecting the alternate hypothesis when it is true.

6.12 (3) The 0.05 level is the commonly agreed upon probability used for significance testing. If the outcome of an experiment has a probability of less than 0.05, the null hypothesis is rejected. The 0.01 probability level is sometimes used when the risk of a Type 1 error (i.e. rejecting the null hypothesis when it is true) has very important consequences. For example, you might use the 0.01 level when assessing a new filter material for reducing the airborne concentration of hazardous particles. You would need to be reasonably confident that a new material was better than existing ones before recommending it as a replacement.

8.12 (1) For a population of snails with a mean length of 100 mm and a standard deviation of 10 mm, the occurrence of a 75 mm shell is somewhat unlikely (because it is more than 1.96 standard deviations away from the mean), but not impossible: 5% of individuals in the population would be expected to have shells longer than 119.6 mm or shorter than 80.4 mm.
8.12 (2) The variance calculated from a sample is corrected by dividing by n – 1 and not n in an attempt to give a more realistic indication of the variance of the population from which it has been taken, because a small sample is unlikely to include sampling units from the most extreme upper and lower tails of the population that will nevertheless make a large contribution to the population variance.

8.12 (3) The 95% confidence intervals for the two populations are (a) North Keppel Island: 1000 ± 1.96 × 400/√16 = 804 to 1196 g and (b) South Keppel Island: 650 ± 1.96 × 400/√16 = 454 to 846 g. Therefore, it is more likely that the sample of 16 rats with a mean weight of 875 g from ‘site 3’ was collected at North Keppel Island because 875 g is within the 95% confidence interval for North Keppel but outside the 95% confidence interval for South Keppel.

9.12 (1) (a) These data could be analysed with a paired-samples t test because the two samples are related (the same ten people are in each). (b) The test would be two-tailed because the alternate hypothesis is non-directional (it only specifies that the time taken to respond to the alarm may change). (c) The test gives a significant result (t9 = 3.17, P < 0.05). (d) The mean response time was 2.19 minutes on day 1 and 4.35 minutes on day 2, which shows the people took significantly longer to respond to the second false alarm.

9.12 (2) This exercise will initially give a t statistic of zero and a probability of 1.0, meaning that the likelihood of this amount of difference or greater between the sample mean and the expected value is 100%. As the sample mean becomes increasingly different to the expected mean, the value of t will increase and the probability of the difference will decrease and eventually be less than 5%.

10.8 (1) A non-significant result in a statistical test may not necessarily be correct because there is always a risk of either Type 1 error or Type 2 error. Small sample size will particularly increase the risk of Type 2 error – rejection of the alternate hypothesis when in reality it is correct.

10.8 (2) A significant result, and therefore rejection of the null hypothesis, following an experiment with only 10% power may still occur, even though the probability of Type 1 error is relatively low.
11.9 (1) You would expect an F ratio of about 1.00 if there was no significant difference among treatments analysed by a single-factor ANOVA because both the ‘treatment’ and ‘error’ variances would only be estimating error. Therefore, the ‘treatment’ variance and ‘error’ variance are likely to be similar and dividing the former by the latter would be expected to give an F statistic of about 1.00 (although departures from this are possible due to chance).

11.9 (2) The three treatment means are similar. (a) A single-factor ANOVA shows no significant difference among the treatments: F2,9 = 2.4, NS (P > 0.05). (b) The within-group (error) mean square is 1.6667. (c) When the data for one treatment group are changed to 21, 22, 23 and 24, the ANOVA shows a highly significant difference among treatments: F2,9 = 304.8, P < 0.001. (d) The within-group (error) mean square is still 1.6667 because there is still the same amount of variation within each treatment (the variance for the treatment group containing 21, 22, 23 and 24 is the same as the variance within the groups containing 1, 2, 3 and 4, and 2, 3, 4 and 5).

11.9 (3) (a) Model II – three lakes are selected as random representatives of the total of 21. (b) Model I – the three lakes are specifically being compared.

11.9 (4) This is true. An F ratio of 0.99 can never be significant because it is slightly less than 1.00, which is the value expected if there is no effect of treatment. For ANOVA, critical values of F are numbers greater than 1.00, with the actual significant value dependent on the number of degrees of freedom.

12.7 (1) (a) Yes, F2,21 = 7.894, P < 0.05. (b) Yes, a posteriori testing is needed. A Tukey test shows that Variety 2 yielded a significantly greater weight of fruit than the other two, which do not differ significantly from each other.

12.7 (2) An a priori comparison between Varieties 1 and 3 using a t test showed no significant difference: t14 = −0.066, NS. This result is consistent with the Tukey test in 12.7 (1).

12.7 (3) (a) The analysis is Model I – the researcher is only interested in these specific repellents. (b) A single-factor ANOVA shows a highly significant difference among treatments: F5,24 = 82.3, P < 0.001.
(c) An a posteriori Tukey test separates the treatments into three distinct groups: (a) the Clear Off and Slap treatments recorded the lowest numbers of bites and were significantly lower than a second group (b), which contains the untreated control, Go-Way and Outdoors. Finally, Holiday recorded significantly more bites than all the other treatments, including the control. The means of the treatments are: Control 25.8, Slap 16.2, Go-Way 27.2, Outdoors 28, Holiday 39.4 and Clear Off 15.6. Holiday seems to stimulate biting, so you are unlikely to recommend it. Both Go-Way and Outdoors cannot be distinguished from the control, so you are unlikely to recommend either. From this experiment, you could only recommend Clear Off and Slap.

13.10 (1) (a) For this contrived example where all cell means are the same: Factor A, F2,18 = 0.0, NS; Factor B, F1,18 = 0.0, NS; Interaction, F2,18 = 0.0, NS. (b) This is quite difficult, and drawing a rough graph showing the cell means for each treatment combination is likely to help. One solution is to increase every value within the three B2 treatments by ten units, thereby making each cell with B2: 11, 12, 13, 14. This will give Factor A, F2,18 = 0.0, NS; Factor B, F1,18 = 360.0, P < 0.001; Interaction, F2,18 = 0.0, NS. (c) Here too a graph of the cell means will help. One solution is to change the data to the following, which, when graphed (with Factor A on the X axis, the value for the means on the Y axis and the two levels of Factor B indicated as separate lines as in Figure 13.7), shows why there is no interaction:
Factor A:       A1           A2           A3
Factor B:    B1    B2     B1    B2     B1    B2
              1    11     11    21     21    31
              2    12     12    22     22    32
              3    13     13    23     23    33
              4    14     14    24     24    34
13.10 (2) Here you need a significant effect of Factor A and Factor B as well as a significant interaction. One easy solution is to grossly increase the values for one cell only (e.g. by making the data in cell A3/B2 (the one on the far right of the table above) 61, 62, 63 and 64).

14.9 (1) Transformations can reduce heteroscedasticity, make a skewed distribution more symmetrical and reduce the truncation of distributions at the lower and upper ends of fixed ranges such as percentages.

14.9 (2) (a) Yes, the variance is very different among treatments and the ratio of the largest to smallest is 9:1 (15.0:1.67), which is more than the maximum recommended ratio of 4:1. (b) The variance increases as the mean increases, so a square-root transformation is recommended. (c) The transformation has reduced the ratio to 2.7:1 (0.286:0.106). (d) A single-factor ANOVA on the transformed data shows a significant effect of treatment: F3,12 = 20.97, P < 0.001. (e) A posteriori testing is needed because the ANOVA is Model I. (f) A Tukey test on the transformed data separated the treatments into two groups: (a) Bugroff and Bitefree had significantly fewer bites than (b) the untreated control and Nobite.

14.9 (3) A Levene test shows significantly greater variance in the drug treatment compared to the control (F1,18 = 4.91, P = 0.039). Even though the treated group has significantly lower blood pressure than the control, the effectiveness of the drug appears to vary among individuals.

15.9 (1) (a) There is a significant effect of distance: F2,8 = 8.267, P < 0.05, and of depth: F4,8 = 3935.1, P < 0.001. (b) When analysed by single-factor ANOVA, ignoring depth, there is no significant difference among the three cores: F2,12 = 0.006, NS. The variation among depths within each core has obscured the difference among the three cores, so the researcher would mistakenly conclude there was no significant change in the concentrations of PAHs with distance from the refinery.

15.9 (2) The fisheries biologist is using the wrong analysis because the design has locations nested within ponds, so a nested ANOVA is appropriate.

15.9 (3) (a) This design could be analysed by a repeated-measures ANOVA. (b) There is a significant difference in the percentage of plaque among toothpaste treatments (F3,15 = 6.378, P < 0.01). An a posteriori Tukey test separates the treatments into two groups: (a) Plarkoff (12.5%), ScrubOff (13.33%) and Whiteup (17.67%) and (b) Whiteup (17.67%) and Abrade (21.67%). The result is ambiguous because Whiteup is in both groups. From this analysis, you could recommend Plarkoff and ScrubOff, but definitely not Abrade.

16.8 (1) (a) ‘. . . can be predicted from . . .’. (b) ‘. . . varies with . . .’

16.8 (2) (a) The value of r is –0.045, NS. (b) You need to do this by having Y increasing as X increases. (c) You need to do this by having Y decreasing as X increases.

17.13 (1) (a) For this contrived case r2 = 0.000. The slope of the regression is not significant: the ANOVA for the slope gives F1,7 = 0.0. The intercept is significantly different to zero: t7 = 20.49, P < 0.001. (b) The data can be modified to give an intercept of 20 and a zero slope by increasing each value of Y by 10. (c) Data with a significant negative slope need to have the value of Y consistently decreasing as X increases.

17.13 (2) (a) r2 = 0.995. The relationship is significant: ANOVA F1,7 = 1300.35, P < 0.001. (b) The relationship is Y (weight in grams recovered) = –2.2 + 242 × X (volume of foliage processed). (c) The intercept does not differ significantly from zero, which is not surprising because when no foliage is processed no opioid will be recovered.

17.13 (3) (a) A multiple linear regression for the combined effects of these three elements on fruit production gives the equation: Y (fruit) = 6.996 + 2.36 X1 (nitrogen) + 5.40 X2 (phosphorus) – 1.41 X3 (nickel), which is highly significant overall (F3,10 = 247.73, P < 0.001). (b) The value of r2 is 0.987. (c) The table of coefficients shows the intercept and each of the three independent variables are significant.

Source of variation   Sum of squares   df   Mean square        F   Probability
Regression                   2574.22    3        858.07   247.73       < 0.001
Residual                       34.64   10          3.46
Total                        2608.86   13

Model                  Value   Probability
Constant (a)            6.99        < 0.01
Nitrogen (b1)           2.36       < 0.001
Phosphorus (b2)         5.40       < 0.001
Nickel (b3)            –1.41       < 0.001
18.7 (1) ANCOVA only compares the data for two or more treatments or groups at a ‘standard value’ (which is usually the grand mean of the covariate), so if the regression lines for the relationship between the dependent variable and the covariate are grossly non-parallel, the difference (or lack thereof) at the standard value will not represent the difference across the range of the covariate.

18.7 (2) (a) A graph of the blood pressure of the control group and the drug-treated group against age suggests this relationship differs between treatments. (b) A preliminary analysis (with blood pressure as the response variable, drug treatment as the factor and age as the covariate) confirms a significant lack of parallelism: interaction between treatment and age, F1,16 = 6.92, P < 0.05. Therefore, it is not appropriate to use ANCOVA to compare the blood pressure between the two treatments. (c) The data could be analysed as separate regression lines. Blood pressure increases with age in the untreated control group, but not in the group treated with Systolsyn B. The regression for blood pressure against age for the control group is: blood pressure = 93.3 + 0.516 × age, r2 = 0.854, and the (positive) slope of the line is highly significant (F1,8 = 46.69, P < 0.001). In contrast, there is no significant relationship between blood pressure and age for the group given Systolsyn B: blood pressure = 107.96 + 0.103 × age, r2 = 0.06, F1,8 = 0.512, NS. Therefore, it appears Systolsyn B may prevent an increase in blood pressure with age in humans.

20.10 (1) (a) The data could be compared to the expected frequencies of 1.4 left-handed and 12.6 right-handed by using an exact or randomisation test and (b) the result will be highly significant (P < 0.001). Because students were assigned to groups at random, it seems this high proportion of left-handers occurred by chance, so the significant result appears to be an example of Type 1 error.

20.10 (2) (a) The value of chi-square will be zero.
(b) The value of chi-square will increase, and the probability will decrease.

20.10 (3) This is not appropriate because the numbers of fillings in the two mouths are independent of each other. The numbers are not mutually exclusive or contingent between mouths.
20.10 (4) (a) No. (b) The experiment lacked controls for time, the ‘predator shape’ and disturbance. The change in background may have occurred in response to any shape. (c) It would be helpful to have controls for time and for the disturbance of being exposed to any silhouette.

21.10 (1) (a) The rank sums are: Group 1: 85, Group 2: 86. (b) There is no significant difference between the two samples: Mann–Whitney U = 40.0, NS. (c) One possible change to the data that gives a significant result is to increase the value of every datum in Group 2 by 20.

21.10 (2) (a) The ‘outdoor workers’ sample appears to be bimodal, but the ‘indoor’ sample does not, and there is a gross difference in variance between the two samples. (b) One solution is to transform the data to a nominal scale by expressing them as the number of observations within the two mutually exclusive categories of ‘three or fewer keratoses’ and ‘four or more keratoses’. This will give a 2 × 2 table (indoor workers 23:0; outdoor workers 8:15) that can be analysed using chi-square (χ21 = 22.26, P < 0.001; Yates’ corrected χ21 = 19.39, P < 0.001) or a Fisher Exact Test (P < 0.001). A significantly greater proportion of outdoor workers have relatively high numbers of solar keratoses on their hands compared to indoor workers.

22.7 (1) If there are no correlations within a multivariate data set, then a principal components analysis will show that, for the variables measured, there appears to be little separation among objects. This finding that all objects appear to be relatively similar can be useful in the same way that a ‘negative’ result of hypothesis testing still improves our understanding of the natural world.

22.7 (2) Eigenvalues that explain more than 10% of variation are usually used in a graphical display, so components 1–3 would be used.
22.7 (3) ‘Stress’, in the context of a two-dimensional summary of the results from a multidimensional scaling analysis, indicates how well objects from a multidimensional space equal to the number of initial variables will actually fit into a two-dimensional plane and still be appropriate distances apart. As stress increases, it means the two-dimensional plane has to be distorted more and more to accommodate the objects at their ‘true’ distances from each other.

22.7 (4) The ‘groups’ produced by cluster analysis are artificial divisions of continuous data into categories based upon percentage similarity and therefore may not correspond to true nominal categories or states.