Automatic customization of health-education brochures for individual patients

Graeme Hirst, PhD (1)
Chrysanne DiMarco, PhD (2)

(1) Department of Computer Science, University of Toronto, Toronto, Ontario M5S 3G4; gh@cs.toronto.edu
(2) Department of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1; cdimarco@logos.uwaterloo.ca

keywords: health education, customization, tailoring

Abstract

Many studies have shown that health-education messages and patient instructions are more effective when closely tailored to the particular condition and characteristics of the individual recipient. But in situations where many factors interact -- for example, in explaining the pros and cons of hormone replacement therapy -- the number of different combinations is far too large for a set of appropriately tailored messages to be produced in advance. The HealthDoc project is presently developing linguistic techniques for producing, on demand, health-education and patient-information brochures that are customized to the medical and personal characteristics of an individual patient. For each topic, HealthDoc requires a `master document' written by an expert on the subject with the help of a program called an `authoring tool'. The writer decides upon the basic elements of the text -- clauses and sentences -- and the patient conditions under which each element should be included in the output. The program assists the writer in building correctly structured master-document fragments and annotating them with the relationships and conditions for inclusion. When a clinician wishes to give a patient a particular brochure from HealthDoc, she will select it from a menu and specify the name of the patient.
HealthDoc will use information from the patient's on-line medical record to then create and print a version of the document appropriate to that patient, by selecting the appropriate pieces of material and then performing the necessary linguistic operations to combine them into a single, coherent text.

1. Customizing health-education documents

Health-education and patient-information brochures and leaflets are often limited in their effectiveness by the need to address them to a wide audience. What is generally produced is either a minimal, generic document that contains only the information relevant to everyone, or a maximal document that tries to provide all the information that might be relevant to someone (and hence much that is irrelevant to many). But documents that contain irrelevant information, or omit relevant information, or that for any other reason just don't seem to be addressed to the particular reader, are likely to be discounted or ignored. Recognizing this, health educators have paid much attention to methods of identifying different segments of their audience and their differing needs and constructing material accordingly [1]. But the documents remain, to a significant degree, generic.

However, recent experiments (see below) have shown that health-education documents can be much more effective if they are customized for individual readers in accordance with their medical conditions, demographic variables, personality profile, or other relevant factors. This kind of customization involves much more than just producing each brochure or leaflet in half a dozen different versions for different audiences. Rather, given the number of independent variables and their range, the number of different combinations of factors can easily be in the tens or hundreds of thousands. Thus, each brochure must be produced individually for each recipient.
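
The arithmetic behind this estimate can be illustrated with a small Python sketch. The factors and the number of values assigned to each are purely hypothetical and are not taken from the HealthDoc project; the point is only that a handful of independent factors, each with a few values, multiplies out to tens of thousands of versions.

```python
import math

# Hypothetical factors and the number of values each might take.
# None of these figures come from the HealthDoc project.
factor_values = {
    "age band": 6,
    "diagnosis subtype": 5,
    "family history": 4,
    "current medication": 6,
    "reading level": 4,
    "health-belief profile": 5,
    "smoking status": 3,
}


def count_versions(factors):
    """Number of distinct combinations if every factor varies independently."""
    return math.prod(factors.values())


total = count_versions(factor_values)
print(f"{len(factor_values)} factors -> {total:,} possible versions")
```

Under these assumed figures, seven factors already give 43,200 combinations, far more than could be written out in advance.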
For example, Strecher and colleagues sent unsolicited leaflets to patients of family practices on topics such as giving up smoking [2], improving dietary behaviour [3], or having a mammogram [4]. Each leaflet was `tailored' to the recipient by a program that selected fragments of text for inclusion on the basis of answers that the recipient had given in an earlier telephone survey. In each study, the tailored leaflets had a significantly greater effect on the patients' behaviour than `generic' leaflets had upon patients in a control group.

In these studies, the leaflets to be tailored were represented on-line simply as a large set of simple fragments of text to be included when appropriate in both content and form. While this is straightforward in principle, it requires that an extremely large number of bits and pieces of text be available: each fact expressed in each possible way. Sarah Kobrin reports (p.c.) that in extensions to the work of [2], the creation and management of the large number of text fragments involved became extremely difficult. And the assembly of such bits and pieces suffers from the problem that the resulting document might not be coherent or cohesive, or at the very least, not stylistically polished.

These kinds of deleterious effects usually become apparent only over stretches of 100-200 words or more, so lack of space precludes us from giving complete examples. But we can show a simplified case. Suppose that the following two fragments are possible components of a text:

    {\it People with respiratory disorders} have a high risk of developing Glaumann's syndrome.

    {\it People with respiratory disorders} should take immediate action to quit smoking.

If in some document these fragments are both selected to appear, and they do so adjacent to one another, then the result is linguistically clumsy; the italicized noun phrase should be replaced by the pronoun {\it they} in whichever one comes second.
Moreover, if the two facts have a causal relationship and appear in the order shown above, then inserting the word {\it therefore} at the start of the second sentence, rather than leaving the relationship implicit, would improve clarity.

It might be objected that the fragments could be carefully constructed so that all possible selections resulted in a well-formed document. Indeed, Strecher and colleagues attempted just that. However, they found it difficult even for their fairly simple documents (Sarah Kobrin and Victor Strecher, p.c.); it would surely be very hard to achieve for complex documents unless the granularity were extremely coarse, thereby increasing the number of distinct elements required. In the limit, one would simply have a distinct document pre-written for every single combination of possibilities, a situation that would be quite impractical.

Rather, what is needed is a system for the production of tailored health-education and patient-education documents that would, on demand, customize a `master document' to the needs of a particular individual. The HealthDoc project is building such a system.

2. The conceptual framework of the HealthDoc project

The HealthDoc project aims to develop techniques for producing health-information and patient-education documents that are customized to the personal and medical characteristics of the individual patient receiving them. Information from an on-line medical record or from the clinician will be used as the basis for deciding how best to fit the document to the patient. The project is concentrating on the production of printed materials---brochures and leaflets that the patient can take away to read and refer to whenever they wish. Nonetheless, many of the techniques that we are developing will also be applicable in the interactive, hypertext-like systems that others are developing [5,6].
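
As a concrete illustration of the kind of text repair that such customization calls for, the two-fragment example of section 1 can be sketched in Python. This is a minimal sketch under stated assumptions, not HealthDoc's mechanism: the `Fragment` class, its `referent` and `relation` fields, the `MARKERS` table, and both functions are hypothetical names introduced here. The sketch handles only plural subjects and only fragments that are adjacent to one another.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fragment:
    referent: str                   # hypothetical symbol shared by coreferential subjects
    subject: str                    # full noun phrase, written with a lower-case initial
    predicate: str                  # rest of the sentence, without the final period
    relation: Optional[str] = None  # rhetorical relation to the preceding fragment


# Assumed mapping from relation names to discourse markers.
MARKERS = {"CAUSE": "therefore", "CONTRAST": "however"}


def capitalize_first(text: str) -> str:
    return text[:1].upper() + text[1:]


def concatenate(fragments: List[Fragment]) -> str:
    """Naive assembly: every fragment is printed exactly as stored."""
    return " ".join(capitalize_first(f"{f.subject} {f.predicate}.") for f in fragments)


def repair(fragments: List[Fragment]) -> str:
    """Pronominalize a repeated subject and make the rhetorical relation explicit."""
    sentences = []
    previous = None
    for frag in fragments:
        subject = frag.subject
        marker = None
        if previous is not None:
            if previous.referent == frag.referent:
                subject = "they"    # this sketch assumes plural subjects
            marker = MARKERS.get(frag.relation)
        sentence = f"{subject} {frag.predicate}."
        if marker:
            sentence = f"{marker}, {sentence}"
        sentences.append(capitalize_first(sentence))
        previous = frag
    return " ".join(sentences)


fragments = [
    Fragment("resp", "people with respiratory disorders",
             "have a high risk of developing Glaumann's syndrome"),
    Fragment("resp", "people with respiratory disorders",
             "should take immediate action to quit smoking", relation="CAUSE"),
]

print(concatenate(fragments))
print(repair(fragments))
```

The first call repeats the full noun phrase in both sentences; the second yields "Therefore, they should take immediate action to quit smoking." as the second sentence. The repair is possible only because each fragment carries information beyond its surface text, namely what it refers to and how it relates to its neighbour.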
2.1 The HealthDoc model of patient education

** Master documents: Each customized brochure is produced from a {\it master document} on a particular topic. These master documents are created by a medical writer with the aid of an authoring tool. Each contains all the information, including illustrations, that might be included in any individual version, along with annotations as to the conditions under which each piece of information is relevant.

** Dimensions of customization: A master document may be customized with data about the individual patient, and the selection of content and manner of expression of that content may be determined by the patient's medical condition and their personal and cultural characteristics. Selection may occur at the level of paragraphs, sentences, phrases, or words.

** Clinical use: In clinical use, HealthDoc would have access to the on-line medical records of patients. When the clinician wishes to give a patient a particular brochure from HealthDoc, she selects it from a menu of master documents, and specifies the name of the patient. HealthDoc will then generate a version of the document appropriate to that patient (possibly asking the clinician for information to supplement that which it finds in the patient's record). The document will be attractively laid out and formatted, and may be run off on pre-printed stationery.

2.2 Goals of the present project

The creation of a complete system as just described is beyond the scope and resources of the current HealthDoc research project. The project is at present concentrating primarily on the central research problems in computational linguistics that are entailed by the development of such a system, and in particular the nature of master documents, tools for authoring them, and the generation of coherent text from them.
Both the authoring tools and the processes that refine the selections from the master document are necessarily language-dependent, so at present HealthDoc is limited to English, our working language. We hope that in the long term it will be possible to add master documents in other languages for which the necessary grammars and lexicons are available. (Unfortunately, there are few or no applicable resources for the languages---Chinese, Vietnamese, Khmer---that pose the greatest problems for the hospitals with which we are collaborating.)

2.3 Customizing patient-education material

A HealthDoc brochure may be customized with data about the individual patient, and the selection of content and manner of expression of that content may be determined by the patient's medical condition and (in later stages of the project) their personal and cultural characteristics.

** Patient data: The simplest kind of customization is inclusion of simple numerical or alphabetic data from the patient's chart---in effect, filling in the blanks in a template. This might include the name of the patient, their physician, or details of a prescription. Template-filling is straightforward, and independent of other kinds of customization. Where we speak below about customization by the creation or inclusion of pieces of text, it is to be understood that these pieces might actually be templates that are then further customized by filling with the appropriate data.

** Patient's medical condition and physical characteristics: Customization by medical condition and physical characteristics entails choosing what to say and not say in the document, in accordance with the patient's diagnosis, physical characteristics (such as age and gender), and medical history. When several medical conditions interact, the choice of what to include and exclude may be quite complex.
For example, the customization of a brochure advising a patient on the benefits and risks of hormone-replacement therapy needs to take into account a large number of interacting factors in her medical history and that of her family. It is in such cases that customizable documents will be of particular utility.

** Patient's culture, health beliefs, and other personal characteristics: Customization by patient characteristics involves the choice of both form and content. Many studies have shown that the `same' message often needs to be framed or presented in very different ways in order to be communicated most effectively and most persuasively to different people; individual and cultural differences in health beliefs, perception of and attitude to risk, and level of education are among the factors that must be considered when tailoring a health message to an individual [1,7]. Indeed, what may be persuasive to one person can actually reduce compliance in another [8]. Although HealthDoc is restricted to a single language, the later phases of the project will attempt to customize documents to account for cultural differences in health beliefs and other individual differences.

Customization or tailoring thus involves much more than the mere inclusion of data about the individual patient. For example, giving a patient a textual summary of their chart would not, by itself, be tailoring, even though the information is particular to that patient. But if the creation of the summary were to take into account how the information is best presented to that individual, or if the information in the chart were used in deciding the form or content of a different health message, then tailoring would have occurred.
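
The difference between filling in patient data and selecting content can be sketched in Python. This is an illustrative sketch only, not HealthDoc's mechanism: the `Element` class, the `customize` function, the record fields, and the placeholder sentences are all hypothetical. The assumption that a missing field excludes an element and is reported back for the clinician to supply is likewise ours.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]


@dataclass
class Element:
    template: str  # text with {slots} to be filled from the patient record
    conditions: Dict[str, Callable[[Any], bool]] = field(default_factory=dict)


def customize(elements: List[Element], record: Record) -> Tuple[str, List[str]]:
    """Select the elements whose conditions hold, then fill their templates.

    Returns the assembled text and the record fields that a condition
    needed but could not find.
    """
    selected = []
    missing = set()
    for element in elements:
        include = True
        for attribute, test in element.conditions.items():
            if attribute not in record:
                missing.add(attribute)   # unknown: exclude and report
                include = False
            elif not test(record[attribute]):
                include = False
        if include:
            selected.append(element.template.format(**record))
    return " ".join(selected), sorted(missing)


# Placeholder master document; the sentences carry no medical content.
master = [
    Element("This leaflet was prepared for {name}."),
    Element("Your physician is Dr. {physician}."),
    Element("This paragraph is included only for patients over 70.",
            {"age": lambda age: age > 70}),
    Element("This paragraph is included only for patients who smoke.",
            {"smoker": lambda smoker: smoker}),
]

record = {"name": "A. Patient", "physician": "B. Clinician", "age": 74}
text, to_ask = customize(master, record)
print(text)
print("Ask the clinician about:", to_ask)
```

The first two elements show template-filling alone: they appear for every patient and differ only in the data inserted. The last two show selection: whether they appear at all depends on the record. Neither step addresses how the selected pieces read together.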
HealthDoc is thus distinguished from `tailoring' projects such as [9], in which what is tailored to the individual patient is a clinical decision model for which an explanation is then produced; individual characteristics of the patient affect the content of the explanation only insofar as they are the inputs of the decision model. The explanation itself is created by filling in template slots with information from the patient's chart and with phrases that describe which nodes in the model proved to be important in making the decision; templates were pre-written for all possible combinations of decisions, as there were not very many.

3. The master document

3.1 The master document and generation by selection and repair

As explained above, a master document is a specification of all the information that might be included in a brochure on a particular topic, along with annotations indicating what is to be included when. We now discuss the nature of this master document and the problems of combining selections from it.

In the simplest kind of customization for content and form, a master document would just be a large set of simple blocks of text (or templates for patient data) as used by Strecher and colleagues; we saw the limitations of this approach in section 1 above. At the other end of the spectrum, the elements of the master document would be pieces of a language-independent structure in some knowledge-representation formalism, and would be selected for content but not form. These elements would then have to pass through some complete language-generation system that would decide on how to organize and express the content, given information about the form best suited to the patient's personal characteristics. This approach is elegant and flexible, but is not yet close to being possible for domains as complex as those of interest here.

Our approach is a workable compromise between these extremes.
The master document is represented neither in a knowledge-representation formalism nor as text blocks, but in an abstract text specification language that expresses not only the content of the document but also information that will assist any subsequent process of revision; this language will be described below. Selections from this document are made for both content and form, and are then automatically post-edited---``{\it repaired}''---for form, style, and coherence. These repairs take place upon the abstract representation, and are guided by the additional information that it contains.

Thus in this process of {\it generation by selection and repair} the starting point is a partially specified, pre-existing document with an overall text organization, division of propositional content into sentences, choice of words, and lexical cohesive structure. Even though the system might subsequently modify many of these aspects as it produces a customized text from the master document, we nonetheless start from a highly useful draft form, rich in linguistic and stylistic information---in effect, we observe the maxim that it is generally much easier to {\it re}-write than to write. We discuss generation by selection and repair at greater length in [10].

3.2 Text Specification Language

Text Specification Language (TSL) is a language that we have developed for the internal representation of master documents. It is an extension of the Sentence Plan Language (SPL) used by the text generation system that HealthDoc employs for realization of its final output [11]. TSL expresses the content of the document, and permits this content to be annotated as to which elements are to be selected from the master document under what circumstances. Annotations for selection may refer to the patient's medical record or to information that the clinician could supply, such as the patient's reading level or the preferred style of presentation.
For example:

    :patient-age (greater-than 70)
    :patient-blood-pressure (high)
    :patient-recent-medical-history (myocardial-infarction)
    :patient-history-reliability (good)
    :reading-level (fourth-grade)
    :formality (low)

These must be translated to queries on the medical record or to the clinician. Note that ``don't know'' is a possible answer that the author of the master document must allow for.

TSL also includes two kinds of information to guide `repairs' to the selected text: coreference links and rhetorical relations between sentences. Each object or entity referred to in any fragment of the document is represented by a pointer into a list of all objects referred to, along with the kind of reference: definite or indefinite, extensional or intensional, and so on. Thus, it will always be known if two different sentences refer to the same thing, and pronominalization can occur accordingly, as in the example in section 1. Rhetorical relations are cohesive relationships between sentences such as CAUSE, CONTRAST, ELABORATION, and so on. All such relationships between sentences in the document are recorded in the TSL, so that markers such as {\it therefore} and {\it however} can be used in the text where appropriate, again as in section 1.

For example, the following is the TSL structure for the sentence ``The condition that you have is insulin-dependent diabetes.'' Comments are indicated by a `%' sign.

    :tsl '(asc / ascription
            % Annotations for selection:
            :patient-diagnosis insulin-dependent
            :technicality all
            % Information on context:
            % "diabetes" was the focus of the preceding text
            :focus diab6
            % Content of sentence:
            :tense present
            :domain (cond / abstraction
                      :specific cond0   % coreference link to other instances of "condition"
                      :lex condition
                      :determiner the
                      :process (have / ownership
                                 :lex have-possession
                                 :tense present
                                 :domain (hearer / person)
                                 :range cond))
            :range (diab6 / abstraction
                     :specific diab4   % coreference link to other instances of "diabetes"
                     :lex diabetes
                     :determiner zero
                     :property-ascription (ins / quality
                                            :lex insulin-dependent)))))

3.3 Authoring a master document

The author of a master document would normally be a professional medical writer, who will need to understand the nature of customized and customizable texts but who is not assumed to have any special knowledge or understanding of TSL or the innards of HealthDoc. The authoring tool, presently under development, must therefore be no more difficult for the author to use than, say, the more-sophisticated features of a typical word processor. The text is therefore written in English, and translated to TSL by the authoring tool. (The English source text is also retained for use in subsequent sessions---for example, if the document is to be updated or amended.)

It is the writer's job to decide upon the basic elements of the text, the rhetorical and coreferential links between them, and the conditions under which each element should be included in the output. The elements of the text are then typed into the authoring tool in English, and are marked up by the writer with conditions for inclusion and with links for cohesion and coreference. The tool will then translate the text into TSL. This is essentially a process of parsing, but the resultant structures are (annotated) TSL expressions rather than parse trees.
Whenever an ambiguity cannot be resolved, the writer is queried in an easy-to-understand form; for example:

    When you say ``treat the patient with myoform'', did you mean
      1. the patient with myoform is treated
      2. the patient is treated with myoform

4. Automatic post-editing

After material in the TSL master document has been selected for a particular patient, textual repairs or post-editing might be needed. The sentence planner in which these mechanisms will operate is presently under development. It uses a blackboard architecture in which individual sentence-repair modules communicate and resolve their conflicts with one another. The architecture is described in greater detail in [12]. Four modules are being built in the first phase of the project: for reference, rhetorical relations, aggregation, and constituent ordering.

Coreference repairs, for example, include decisions as to when a reference should be pronominalized. If the two sentences of the example in section 1 turned up in close proximity, the italicized noun phrases would be recognized as coreferential because in their TSL form they would be labelled with the same referent symbol. The first occurrence would therefore be marked for realization as a full noun phrase, and the second would be marked for pronominalization.

5. Realization and formatting

The final specifications for the edited text, represented in SPL, are passed to the realization stage, which uses the KPML text generation system [11] to generate an appropriate surface form in English. A formatter then lays out the text attractively and adds headings and illustrations for final printing.

6. Conclusion

The HealthDoc project aims to provide a comprehensive approach to the tailoring of patient-information and health-education materials for the individual patient. The earlier results of Strecher and colleagues [2,3,4], showing the benefits of such customization, justify the importance of what we are doing for a community health-care setting.
However, the ability to create customized health communication highlights the fact that, while the need for such customization has long been recognized, there has as yet been little research on how information may be conveyed most effectively to individuals with particular characteristics in order to motivate a change in their behaviour. In the next stage of the project, identifying critical examples of variations in text by medical condition and by health beliefs will be an important task.

Acknowledgements

The HealthDoc project is supported by a grant from Technology Ontario. The project is advised by patient-education committees of Peel Memorial Hospital (Brampton, Ontario), Massachusetts General Hospital (Boston), and Sunnybrook Health Science Centre (University of Toronto). The authors are grateful to H. Dominic Covvey and Eduard Hovy for comments on earlier drafts of this paper.

References

[1] Maibach E, Parrott RL (Eds.). Designing health messages. Sage Publications, 1995.

[2] Strecher VJ, Kreuter M, Den Boer D-J, Kobrin S, Hospers HJ, Skinner CS. The effects of computer-tailored smoking cessation messages in family practice settings. Journal of Family Practice, 1994, 39(3):262-270.

[3] Campbell MK, DeVellis BM, Strecher VJ, Ammerman AS, DeVellis RF, Sandler RS. Improving dietary behavior: The effectiveness of tailored messages in primary care settings. American Journal of Public Health, 1994, 84(5):783-787.

[4] Skinner CS, Strecher VJ, Hospers HJ. Physicians' recommendations for mammography: Do tailored messages make a difference? American Journal of Public Health, 1994, 84(1):43-49.

[5] Cawsey A, Binsted K, Jones R. Personalized explanations for patient education. Proceedings of the Fifth European Workshop on Natural Language Generation; 1995 May; Leiden; 57-94.

[6] Buchanan B, Moore JD, Forsythe DE, Carenini G, Ohlsson S, Banks G. An intelligent interactive system for delivering individualized information to patients. Artificial Intelligence in Medicine, 1995, 7(2):117-154.

[7] Kreps GL, Kunimoto EN. Effective communication in multicultural health care settings. Sage Publications, 1994.

[8] Monahan JL. Thinking positively: Using positive affect when designing health messages. In [1]:81-98.

[9] Jimison HB, Fagan LM, Shachter RD, Shortliffe EH. Patient-specific explanation in models of chronic disease. Artificial Intelligence in Medicine, 1992, 4:191-205.

[10] DiMarco C, Hirst G, Wanner L, Wilkinson J. HealthDoc: Customizing patient information and health education by medical condition and personal characteristics. Proceedings of the Workshop on Artificial Intelligence in Patient Education; 1995 August; Glasgow; 59-71.

[11] Bateman JA. KPML: The KOMET-Penman multilingual linguistic resource development environment. Proceedings of the Fifth European Workshop on Natural Language Generation; 1995 May; Leiden; 219-222.

[12] Hovy EH, Wanner L. The HealthDoc sentence planner. Proceedings of the Eighth International Language Generation Workshop; 1996 June; Herstmonceux, East Sussex; 1-10.