Quasi-revolution in psychology and reproducibility of extraordinary results
Artem Akopyan
Schmidt (2009) distinguishes eight classes of conditions that together define a psychological experiment:

Class 1: Primary information focus, immaterial. The instructions, materials, and events that create a particular stimulus complex for the participant; the immaterial subclass covers the information conveyed to participants.
Class 2: Primary information focus, material. The material realization necessary to convey this information.
Class 3: Participant characteristics (e.g., gender).
Class 4: Specific research history of participants, including prior experiences and motivation for participating in the experiment.
Class 5: Cultural and historical context in which the study is embedded (e.g., the point in time at which the experiment is performed).
Class 6: Control agent, i.e., the experimenter who interacts with the participants.
Class 7: Specific task variables: minute material circumstances such as the typeface, the color of the paper, etc.
Class 8: Modes of data reduction and presentation, i.e., how the assessment of the experimental effect is transformed and reported.
If one considers an experimental manipulation (treatment) with all of these potential contributing factors in mind, an experiment may be likened to a function that maps the Class 2-7 characteristics onto corresponding scores, which are later subjected to statistical analysis. Let [a1, b1, c1, ..., λ1] be the set of personality characteristics of participant 1; likewise, the sets of personality characteristics of subsequent participants will be indexed according to the order in which those participants appear in the discussion. The hypothetical experiment then involves the application of a treatment set T = [t1, t2, ..., ti], which consists of all characteristics of the primary information focus pertaining to that particular treatment. A literal replication of the initial study in which T was employed maps [ai, bi, ..., λi] onto T(ai, bi, ..., λi), whereas a conceptual replication performs a set of manipulations T* = [t*1, t*2, ..., t*i] on [ai, bi, ..., λi], resulting in T*(ai, bi, ..., λi). Schmidt (2009) acknowledged that a replication can never be exact in the sense of performing the very same experiment twice, because the Class 2-7 specifications cannot be emulated with perfect accuracy. A literal replication nevertheless embodies the best approximation to the initial experiment because the original experimental procedure is followed as closely as possible. The recently introduced emphasis on literal replications is therefore in some sense justified, but mostly because human beings are not always good at making inferences about environments (in this case, results of experiments) that appear to differ in more ways than they appear similar.
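The mapping just described can be made concrete with a small simulation. The following Python sketch assumes a simple additive model in which each treatment element weights one participant characteristic; the specific weights, the noise term, and the function names are illustrative assumptions rather than part of the formalism above.

```python
import random

# A minimal sketch of the "experiment as a function" idea, assuming a simple
# additive model. The participant tuples, the treatment weights, and the
# noise term are illustrative assumptions, not part of Schmidt's framework.

def apply_treatment(treatment, participant, noise_sd=1.0):
    """Map one participant's characteristics through a treatment set to an output score."""
    # Each treatment element t_n weights one participant characteristic.
    signal = sum(t * x for t, x in zip(treatment, participant))
    return signal + random.gauss(0.0, noise_sd)

# Participant i is a tuple (a_i, b_i, ..., lambda_i) of personality characteristics.
participants = [(random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1))
                for _ in range(40)]

T      = [0.5, 0.2, 0.0]   # treatment set used in the initial study
T_star = [0.5, 0.3, 0.1]   # deliberately altered set for a conceptual replication

literal_scores    = [apply_treatment(T, p) for p in participants]       # T(a_i, ..., lambda_i)
conceptual_scores = [apply_treatment(T_star, p) for p in participants]  # T*(a_i, ..., lambda_i)
```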
Which of the two types of replication is more difficult to achieve? The personality characteristics [ai, bi, ci, ..., λi] react to the treatment, and the resulting scores are collected (albeit with less-than-perfect precision) by the psychologist; whether a statistically significant result is achieved depends precisely on how the treatment parameters interact with those personality characteristics. If the combined treatments do not exert a sufficiently unequivocal effect on the output scores, the researcher will have obtained a so-called "failed replication": an effect of treatment that does not exceed what would be expected by chance. Conversely, if the net effect of the treatment set T (or T*) is directional enough to yield an above-chance result, even a replication that differs from the initial study in multiple respects should still detect a statistically significant effect. Because many psychologists (Open Science Collaboration, 2012) see literal replication as the ultimate test of whether a hypothesis or theory is sound, the interpretation of "failed" literal replications is problematic: the precise influence of treatment specifications on personality characteristics is not known. In such cases, psychologists duly fall back on fundamental, "classic" concepts in their area of interest (priming in cognitive psychology, out-group bias in social psychology, among others). If one replicates a classic psychological experiment and obtains a significant result, that result is in a certain sense taken for granted because priming, for instance, is a priori expected to facilitate recall. Schmidt indicated that a psychologist's decisions about the treatment to be administered might (and do) lead to the study subsequently being presented as either a replication or an independent test of a similar yet different hypothesis. In practice, a fellow researcher is most inclined to emphasize the potential confounding introduced by procedural differences when he or she disagrees with the result of the study as such; if the results appear plausible, however, the study is cited as yet another confirmation of the underlying theory because "priming ought always to facilitate recall in principle." Thus, understanding the real effect of an experimental treatment set [t1, t2, ..., ti] is central to interpreting experimental findings. Advocates of the popularization of literal replications dislike the ambiguity arising from the mixed results produced by conceptual replications, yet the confirmatory power of a literal replication hinges on both the specific personality characteristics and the treatment set. Given the importance of procedural distinctions in producing statistically significant results, systematic investigation of how individual treatment elements affect output scores is in line with this approach. The ultimate goal of such efforts is the identification of one-to-one correspondences between a given element tn of the (hypothetical) treatment set and a commensurate change in the set of output scores. If theories rest on propositions about the interaction between experimental manipulations and notable results, the meta-analytic approach described above would avoid the conflicting interpretations of experimental findings that result from disagreement about some aspect of the theoretical construct in question, including which other constructs should or should not covary with it.
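To illustrate the point about "failed replications", the hedged simulation below treats the net effect of a treatment set as a single effect size and asks how often a study of that effect reaches significance; the effect sizes, the sample size, and the rough |t| > 2 criterion are assumptions made purely for illustration.

```python
import random
import statistics

# A hedged simulation of the "failed replication" scenario: whether a study
# reaches significance depends on how strongly the treatment set shifts the
# output scores. The effect sizes, sample size, and the |t| > 2 rule are
# illustrative assumptions chosen for brevity.

def run_study(effect, n=30, noise_sd=1.0):
    """Return True if a simulated two-group study yields an above-chance result."""
    control   = [random.gauss(0.0, noise_sd) for _ in range(n)]
    treatment = [random.gauss(effect, noise_sd) for _ in range(n)]
    var_pooled = (statistics.variance(control) + statistics.variance(treatment)) / 2
    t = (statistics.mean(treatment) - statistics.mean(control)) / (2 * var_pooled / n) ** 0.5
    return abs(t) > 2.0   # rough stand-in for p < .05

def significance_rate(effect, runs=2000):
    """Proportion of simulated replications that detect the effect."""
    return sum(run_study(effect) for _ in range(runs)) / runs

# A weak net treatment effect produces many "failed replications" even when
# every procedural detail is reproduced; a strong effect survives variation.
print(f"weak effect   (d = 0.2): {significance_rate(0.2):.2f} of studies significant")
print(f"strong effect (d = 0.8): {significance_rate(0.8):.2f} of studies significant")
```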
The use of replication, if conducted properly, can help protect the scientific community from the proliferation of mistaken (and, in some cases, fraudulent) claims. Such insulation is especially important in the study of phenomena not readily embraced by the majority of psychologists; parapsychological events are a perfect example of such ambivalence. For instance, a recent controversy was instigated by Daryl Bem's (2011) article, which reported statistically significant results supporting the existence of precognition. Participants were able to predict the side of the screen (left/right) on which an erotic stimulus would appear after the prediction had been registered; in addition, participants' performance on word recall appeared to be significantly better for words that were rehearsed after the test.
The main difficulty in settling the dispute lay in the inferential procedures: strictly speaking, no number of experiments would allow scientists to gain absolute confidence in the correctness of either hypothesis. Moreover, in spite of the overwhelming skepticism surrounding the issue of psi, Dean Radin, one of the world's leading specialists in parapsychology, suggested that the apparent disagreement between precognition and traditional science is illusory and merely the result of preconceived notions instilled in aspiring psychologists by peer pressure from their more conservative and eminent colleagues (Science and the taboo of psi, 2008). If that is the case, the issue of precognition remains unresolved, as scientists project their beliefs about psi onto a pre-selected subset of published studies that do not find the supposed effect of precognition, if any, significantly exceeding that of a chance finding. Because the studies of Bem (2011) were challenged by subsequent replications and by the critical assessments of other investigators, an open-minded scientist is confined to drawing conflicting conclusions from the articles published thus far and, ultimately, to the aforementioned preference for one set of beliefs or another in spite of the wealth of published experiments (LeBel & Peters, 2011).
The idea of a replication (conceptual or literal) is simple and potentially powerful, for there is no such thing as a "replication attempt" or a "failed replication," provided the researcher is diligent in conducting the replication. However, a replication that produces data consistent with the primary source is not in itself an unequivocal validation of it; likewise, a replication that does not lead to the same conclusion cannot justify a dismissal of the "source" hypothesis. The popularization of literal replications is not a panacea for uncertainty in scientific discourse, because establishing the laws of nature requires knowledge of the number of factors determining an outcome as well as insight into which of those factors are most likely to be confounded.
Still, psychologists must be aware of the benefits of literal replications, as well as of the distinction between confirmatory and exploratory research (Wagenmakers, Wetzels, Borsboom, van der Maas, & Kievit, 2012), in order to plan studies appropriately and to interpret those of their colleagues correctly. Moreover, statistical science is currently being used to formulate fine-grained cognitive models; for instance, Dr. Etienne LeBel at the University of Western Ontario is developing a sophisticated technique for understanding participants' individual differences based on a modification of multinomial modeling (see Batchelder & Riefer, 1990). Websites such as PLoS and PsychFileDrawer, the Open Science Framework's Reproducibility Project (Open Science Collaboration, 2012), and Bayesian statistics (centered on Bayes' theorem) are being introduced into contemporary research practice, allowing for meta-analyses of data-to-hypothesis fit based on series of literal replications. With initiatives like these in place, the practical significance of psychological findings will rise considerably. Literal replications are a vital part of the ongoing quasi-revolution in psychology: by reaching consensus with the initial author about how a literal replication is best carried out, an inquisitive researcher can be confident that any differences in materials, procedures, or participant demographics are conceptually negligible, and that the output scores obtained in the later study are comparable with the initial set and may be merged with it to yield more compelling evidence in favour of the theory being scrutinized. Conceptual equivalence among researchers is therefore central to the practical merit of literal replications.
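As a rough illustration of how Bayesian statistics can aggregate evidence across a series of literal replications, the sketch below applies Bayes' theorem sequentially; the prior probability and the per-study Bayes factors are invented numbers, not values drawn from any of the studies discussed here.

```python
# A minimal sketch of how Bayes' theorem lets evidence from a series of
# literal replications accumulate. The prior probability and the per-study
# Bayes factors below are invented numbers used purely for illustration.

def update_posterior(prior_prob_h1, bayes_factors):
    """Sequentially update P(H1 | data) given one Bayes factor per replication."""
    odds = prior_prob_h1 / (1.0 - prior_prob_h1)
    for bf in bayes_factors:          # BF > 1 favours H1, BF < 1 favours H0
        odds *= bf                    # posterior odds = prior odds x Bayes factor
    return odds / (1.0 + odds)

# Three hypothetical literal replications: two mildly supportive, one not.
print(update_posterior(prior_prob_h1=0.5, bayes_factors=[3.0, 2.5, 0.6]))
```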
References

Batchelder, W. H., & Riefer, D. M. (1990). Multinomial processing models of source monitoring. Psychological Review, 97(4), 548-564.

Bem, D. J. (2011). Feeling the future: Experimental evidence for anomalous retroactive influences on cognition and affect. Journal of Personality and Social Psychology, 100(3), 407-425. https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi/docview/851236583?accountid=15115 (accessed December 17, 2012).

LeBel, E., & Peters, K. (2011). Fearing the future of empirical psychology: Bem's (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology, 15(4), 371-379.

Open Science Collaboration. (2012). An open, large-scale, collaborative effort to estimate the reproducibility of psychological science. Submitted to Perspectives on Psychological Science.

Science and the taboo of psi with Dean Radin. (2008). Retrieved December 17, 2012, from http://www.youtube.com/watch?v=qw_O9Qiwqew

Schmidt, S. (2009). Shall we really do it again? The powerful concept of replication is neglected in the social sciences. Review of General Psychology, 13(2), 90-100. https://www.lib.uwo.ca/cgi-bin/ezpauthn.cgi/docview/621988722?accountid=15115 (accessed December 17, 2012).

Wagenmakers, E.-J., Wetzels, R., Borsboom, D., van der Maas, H. J. L., & Kievit, R. A. (2012). An agenda for purely confirmatory research. Submitted to Perspectives on Psychological Science.