I fully concur with you that D&H’s design is insightful. It would have been better if they had given the same degree of consideration to the presentation of their results. Frankly, I’m surprised that the Science editors let the graph through the editorial process.
When I wrote that there was no pre-test, I meant that the participants in Study 1 were not given a test beforehand that could serve as a baseline for comparison with their Study 1 results, as a standard pretest-posttest design would require. Do you consider a procedure that screens participants for their suitability for a study to be the same as a pre-test?
Study 1 results showed a very narrow difference at the very low end of the achievement scale. D&H’s insistence on statistical significance falls prey to a classic error: mistaking significance for importance. The p-value may be very low, but all that tells us is that a difference of this size would be unlikely to arise by chance if the groups were really the same. It says nothing about how large the difference is, and the real difference between these groups is minimal.
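To make the significance-versus-importance point concrete, here is a minimal sketch in Python. The numbers are hypothetical and are not taken from D&H's data; the function name and the assumptions of equal group sizes, a shared standard deviation, and a simple two-sample z-test are mine, chosen only for illustration. It reports the p-value alongside Cohen's d, a standard effect-size measure.

```python
import math


def z_test_and_effect_size(mean_a, mean_b, sd, n_per_group):
    """Return (two-sided p-value, Cohen's d) for two equal-sized groups.

    Assumes both groups share the same standard deviation `sd` and that
    a normal approximation (z-test) is reasonable.
    """
    diff = mean_a - mean_b
    # Standard error of the difference between two independent means.
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = diff / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    # Cohen's d: the difference expressed in standard-deviation units.
    cohens_d = diff / sd
    return p_value, cohens_d


# Hypothetical example, NOT D&H's figures: the groups differ by a tenth
# of a standard deviation, with 10,000 participants in each group.
p_value, cohens_d = z_test_and_effect_size(0.1, 0.0, 1.0, 10_000)
print(f"p = {p_value:.2e}, Cohen's d = {cohens_d:.2f}")
```

With these made-up inputs the p-value comes out far below 0.001, yet Cohen's d is only 0.1, which is conventionally read as a small effect. A large enough sample will make almost any difference "significant", which is why the effect size needs to be reported alongside the p-value.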