The specific point I’d like to question is your assertion that “[Qualitative research] is NOT intended for testing hypotheses in experimental design or determining causality”. I have no issue with the second part of this statement, but the first part deserves a closer look.

I’ll keep the focus squarely on educational research (EFL in particular) and limit my examples to that field. My method was to select two articles from Language Testing, as it is one of the most respected EFL journals. I limited myself to the first two articles to keep this response short while retaining some element of randomness. I took the main hypothesis from each and asked whether it could be addressed by the other methodology.

Table 1. Two quantitative research articles and their primary hypotheses

| Access | Article | Hypothesis |
| --- | --- | --- |
| Powers, Donald E.; Powers, Andrew. Language Testing. Apr 2015, Vol. 32 Issue 2, p151-167. 17p. DOI: 10.1177/0265532214551855. | The incremental contribution of TOEIC® Listening, Reading, Speaking, and Writing tests to predicting performance on real-life English language tasks. | The objective of the study was to determine if real-life performance in a domain, such as speaking, can be better predicted by considering not only the TOEIC test that corresponds to that domain |
| Kim, Ah-Young (Alicia). Language Testing. Apr 2015, Vol. 32 Issue 2, p227-258. 32p. DOI: 10.1177/0265532214558457. | Exploring ways to provide diagnostic feedback with an ESL placement test: Cognitive diagnostic assessment of L2 reading ability. | What are the major L2 reading attributes involved in successfully completing a reading test? |


Powers and Powers attempt to correlate TOEIC test scores with real-life language ability. Two face-validity concerns present themselves immediately. The first is that the primary author is an employee of ETS (Educational Testing Service), which produces the TOEIC, and so has a conflict of interest. The authors state that the article received no funding, which suggests that it was written in Powers’ private time. The second is that the instrument used to generate the data was a self-assessment survey. Japanese respondents are well known to rate their own abilities low, as self-deprecation is valued in Japan (Lapinski et al., 2007; Kim et al., 2007). Admittedly, collecting data from 2,300 respondents would be practically impossible using qualitative methods such as interviews or discourse analysis. Even so, the same hypothesis may be better served by a qualitative methodology using fewer respondents. The face-validity issues are too strong for me to place much confidence in the results; a smaller, properly selected sample whose English ability was assessed by raters, using either qualitative methods or better quantitative methods, would go a long way towards providing believable results.

Kim’s study involved 1,982 ESL placement test papers. Her aim was to identify the attributes of successful reading. She trained five proficient users of English to code the test papers, but she doesn’t say what the coding scheme was or how it generated results. We have to assume from the highly numerical nature of her equations that each coder looked at a section of the test papers and assigned a numeric value to it. Again, we have to presume that this process was quicker than simply reading the papers of the 1,982 participants and assigning holistic values in certain categories. (I’m actually not sure that this wasn’t done.) However, it cannot have been much faster, as it involved training sessions and a lot of new learning on the part of the coders. The mathematics in the study is impressive, and it is clear that Kim is an expert in quantitative methodologies.

The question now is whether this same information could have been derived using qualitative methods. This raises the question of what constitutes a qualitative method when applied to a written text. Kim operationalised the construct of L2 reading mastery into ten variables. Mastery was judged by the degree to which test items were answered correctly. This is a very common method, but it must be remembered that a construct is not the same thing as its operationalisation. L2 reading mastery may just as well be operationalised as the degree to which L2 users can respond in writing to given circumstances. (This goes some way, but not all the way, towards qualitative methods.) Alternatively, respondents could use think-aloud protocols to describe what they are thinking while reading a text. With this, we are in the qualitative realm. Although Kim’s choice of method may have been driven by time constraints, there is nothing in the hypotheses that explicitly rules out using different methods.


Kim, S. Y., Lapinski, M., Rimal, R., Glazer, E., Nebashi-Nakahara, R., Sherman, S., & Detenber, B. (2007). Is it humility or self-depreciation? Self-Deprecation and Culture. Conference Papers, International Communication Association. 2007 Annual Meeting.

Lapinski, M., Kim, S. Y., Rimal, R., Glazer, E., Nebashi-Nakahara, R., Sherman, S., & Detenber, B. (2007). “I’m not an expert, but …”: The impact of self-deprecation and source expertise in 3 countries. Self-Deprecation and Culture. Conference Papers, International Communication Association. 2007 Annual Meeting.

