The Automated Discovery of Implied Causation Statements in Medical Research Literature: The Text-based Causation Signal "caused by"

Medicine and Health Sciences


The goal of the IMPLICATION project is to provide a tool that keeps busy physicians and scientists up to date on the latest discoveries in their specialties by giving them faster access to new material. The tool will automatically process text and extract from it critical information known as causation statements. These statements are identified with the help of causation signals, which appear in almost everything we read. A causation signal is a word or phrase that links the cause and the effect in a statement and indicates the presence of causation in the text. In this study, we examined the causation signal "caused by" in articles from PubMed. Using a pre-existing software tool, we obtained 300 abstracts from the PubMed database. The first fifty abstracts were used to identify the characteristics of causation statements and to formulate rules for extracting the antecedent (cause) and the consequent (effect), which we call the pseudo-code extraction algorithm. To construct the algorithm, we used iterative refinement: for each causation statement, we noted the position of the signal "caused by" relative to the antecedent and consequent, and we revised the pseudo-code each time we observed an exception to the existing rules. The common pattern in the initial set of statements was that the consequent occurred upstream of (before) the signal, while the antecedent occurred downstream of (after) it. The pseudo-code extraction algorithm was then evaluated on the remaining 250 abstracts. It correctly extracted the consequent 93.6% of the time and the antecedent 89.6% of the time. Building on these results, the project aims to identify multiple chains of causation in order to provide physicians and scientists with accurate information quickly.
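The baseline pattern described above (consequent upstream of "caused by", antecedent downstream) can be illustrated with a short Python sketch. This is not the project's pseudo-code extraction algorithm, whose exception rules are not given here. It assumes input arrives one sentence at a time, and the function `extract_caused_by`, the word list `_TRAILING_WORDS`, and the clause-boundary rules are all hypothetical choices made for illustration.

```python
import re
from typing import Optional, Tuple

# Hypothetical list of auxiliaries and adverbs that often sit between the
# consequent and the signal ("pneumonia is mainly caused by ..."); they are
# stripped so they do not end up inside the extracted consequent.
_TRAILING_WORDS = {
    "is", "are", "was", "were", "be", "been", "being",
    "mainly", "mostly", "often", "usually", "primarily",
    "partly", "frequently", "commonly",
}


def extract_caused_by(sentence: str) -> Optional[Tuple[str, str]]:
    """Return (antecedent, consequent) for the first "caused by" in a sentence.

    Baseline rule only (hypothetical sketch):
      consequent = text upstream of the signal, back to the last clause boundary
      antecedent = text downstream of the signal, up to a semicolon or sentence end
    """
    match = re.search(r"\bcaused by\b", sentence, flags=re.IGNORECASE)
    if match is None:
        return None  # no causation signal in this sentence

    # Upstream side: keep only the clause that immediately precedes the signal.
    upstream = re.split(r"[;,:]", sentence[: match.start()])[-1].strip()
    words = upstream.split()
    # Drop trailing auxiliaries/adverbs such as "is" or "mainly".
    while words and words[-1].lower() in _TRAILING_WORDS:
        words.pop()
    consequent = " ".join(words)

    # Downstream side: stop at a semicolon, then drop the final period only,
    # so abbreviations such as "S. aureus" survive.
    downstream = sentence[match.end():].split(";", 1)[0]
    antecedent = downstream.strip().rstrip(".").strip()

    if not consequent or not antecedent:
        return None  # signal found, but one side is empty
    return antecedent, consequent


if __name__ == "__main__":
    print(extract_caused_by("Tuberculosis is caused by Mycobacterium tuberculosis."))
    print(extract_caused_by("In children, pneumonia is mainly caused by bacteria, viruses and fungi."))
```

Running the file prints `('Mycobacterium tuberculosis', 'Tuberculosis')` and `('bacteria, viruses and fungi', 'pneumonia')`. Exceptions such as a list of effects before the signal ("Fever, cough and fatigue caused by influenza") are handled poorly by this naive version. Cases like that are the kind of exception the iterative refinement described in the study was meant to catch.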


Copyright all authors

This document is currently not available here.