Yesterday I posted a tale about what a working quantitative scientist might bring to this discussion. I thought about it later and decided to send him what I had written, to see whether I had represented him well. His reply was illuminating in many ways, so I’ll post it here. It’s a bit long, so apologies.
The TLDR version:
He does say that the choice of quantitative or qualitative is a matter of balance, which is in line with the general tenor on this board. However, I was initially surprised at his comment that all science is about finding a qualitative difference. After some thought, this made sense, as there must eventually be a difference in the ‘quality’ of a studied object even if we operationalise that difference numerically.
Ian’s own terminology, ‘guesstimate’, reminds us of the subjectivity involved in this type of research. How should we understand ‘guesstimate’? As the result of years of experience building up an intuition, an accumulation of various types of knowledge that will differ in kind and amount between individuals?
I was appropriately criticised for my ethnography example, which was too weak to allow any kind of comparison to be drawn. And the current trend in octopus studies is to use Bayesian statistics, which work in a different manner from the ones we’re likely to see in a beginner stats textbook.
Here’s the reply.
I don’t remember much about this discussion, but it certainly sounds like the sort of thing I would have said. However, I think it depends on what you want to do – what ideas you have about a research topic and what it is you’re interested in trying to show. My statements related to my experiences in experimental physiological science, where you can’t even fart without quantitative data.
You mention “quantitative versus qualitative” but in the end isn’t it a balance? – whatever quantity of data you have, your final interest will be in demonstrating a qualitative difference of some kind, albeit with some quantification as a measure of that. If you have to work with a context-rich environment and you want to write your findings up into a paper within a reasonable period, you clearly have to find a suitable method to test the bits (as in pieces of information) you’re interested in that relate to your working hypothesis.
As an example of what I mean, some of my recent research involves dealing with gene sequences. Certain genes vary very little among individuals within a certain species (say, less than 0.3%), but are predictably different when you compare species (around 3% between closely related species). This is the 10-times rule (the difference between individuals of closely related species is 10 times that of variation among individuals of the same species).
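As a rough illustration of the rule (my own sketch, not part of the reply): the snippet below measures the proportion of differing sites between two aligned sequences and checks the 10-times ratio against the figures quoted above. The function names and the toy sequences are made up for the example, and real analyses use far longer alignments and more careful distance measures.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return mismatches / len(seq_a)


def divergence_ratio(within: float, between: float) -> float:
    """How many times larger the between-species distance is than the
    within-species variation."""
    if within <= 0:
        raise ValueError("within-species variation must be positive")
    return between / within


def passes_ten_times_rule(within: float, between: float, factor: float = 10.0) -> bool:
    """Rule of thumb only: between-species distance is at least `factor`
    times the within-species variation."""
    return divergence_ratio(within, between) >= factor


# Figures quoted in the text: about 0.3% within a species,
# about 3% between closely related species.
quoted_ratio = divergence_ratio(within=0.003, between=0.03)

# Toy (invented) aligned fragments, differing at one site out of ten.
toy_distance = p_distance("ACGTACGTAC", "ACGTACGTAA")
```

With the quoted figures the ratio comes out at about 10, which is all the rule says; the hard part in practice is everything that pushes real data away from that tidy number.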
All sorts of things affect this in practice, though, so when comparing species to guesstimate their phylogenetic relationship (i.e., construct a family tree for the group of species), there are numerous factors that you have to take into account (technical factors such as the rate of evolution of the group and problems with sequence information disappearing into background noise the farther back in time you try to go).
To deal with all this, one of the favourite tools of the moment is software based on Bayesian statistics, because you can plug in a guesstimate, called a “prior,” and the output is a “posterior probability,” which is a statistical adjustment of your original prior (your original guess adjusted statistically against the data). So, rather than trying to test against a null hypothesis (which is often no good anyway because the “null hypothesis” often is not well chosen and may be irrelevant in really “proving” your hypothesis), you end up with estimates something along the lines of “probably right” or “more likely to be incorrect” rather than blunt (& unrealistic?) “right” or “wrong.” (Although we also say that the only certainty about the phylogenetic tree you produce is that it will be wrong.)
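For anyone who hasn't met priors and posteriors before, here is a minimal sketch of the idea (again my own, not part of the reply, and not how phylogenetics software actually works internally). It applies Bayes' rule to two competing hypotheses: each prior guess is multiplied by how well that hypothesis explains the data, and the results are rescaled to sum to 1. The hypothesis labels and all the numbers are invented for illustration.

```python
def posterior(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Bayes' rule for a finite set of hypotheses.

    priors:      initial guess P(hypothesis) for each hypothesis
    likelihoods: P(data | hypothesis) for the same hypotheses
    returns:     P(hypothesis | data), normalised to sum to 1
    """
    if priors.keys() != likelihoods.keys():
        raise ValueError("priors and likelihoods must cover the same hypotheses")
    unnormalised = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(unnormalised.values())
    if total == 0:
        raise ValueError("the data are impossible under every hypothesis")
    return {h: value / total for h, value in unnormalised.items()}


# Invented numbers: we start out favouring tree A (the "guesstimate"),
# but the data are four times more probable under tree B.
priors = {"tree_A": 0.7, "tree_B": 0.3}
likelihoods = {"tree_A": 0.02, "tree_B": 0.08}

result = posterior(priors, likelihoods)
```

Here the initial 70/30 guess in favour of tree A ends up at roughly 37/63 in favour of tree B. Neither tree is declared “right” or “wrong”; the data have simply shifted how much weight each one gets.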
My interest for one study was in trying to calibrate a phylogenetic tree to arrive at estimates of when each species separated from its nearest relative. The result was based on Bayesian guesstimates, but I was able to plug in priors based on my hypotheses and the results obtained seemed feasible (against the background of our present understanding). The point I want to make is that, in the midst of a large amount of (context-rich) information, I chose methods that could pull out the data points sufficiently to reach a conclusion for my area of interest. I could imagine similar methods being applied in language research.
The ethnography interview example doesn’t seem particularly relevant (in my understanding, and in the absence of specific examples, anyway). You conduct interviews and then make statements about cultures, customs, habits and differences among different ethnic groups. Making “truth” statements may be “soft science” but I don’t see how it’s relevant to the quantitative vs. qualitative debate (which, to me, implies that you need and define such data in order to test hypotheses and propose theories). Perhaps I’m sitting at one end of a continuum, though.