AI/ML Approaches to Assisted Medical Writing—Part 1


Experimenting with different methods using AI.


Analysis of current landscape

Medical writing is a highly specialized field that combines the art and science of content writing with that of clinical research. It entails the creation of well-structured scientific materials, such as clinical research papers, healthcare website content, periodicals, and journals. The audience for these documents ranges from laypeople to highly skilled medical experts. Medical writers must possess a deep understanding of the subject and be able to comprehend medical concepts and terminology. In addition, they should be familiar with the applicable rules on the style and content of specific document types and possess excellent writing abilities. Writing documentation for regulatory authorities to request approval for devices, medicines, and biologics is referred to as regulatory medical writing. This covers information from clinical trials, regulatory submission materials, and post-approval documents, among other things.1 Writing documents on devices, biologics, and medicines for educational purposes, on the other hand, is known as educational medical writing. In recent years, the demand for medical writers has grown sharply as the number of drug discovery programs and clinical studies has ramped up.

The scope of our experiment is limited to regulatory medical writing involving updates or amendments to various clinical trial documents, namely the Protocol, Investigator's Brochure (IB), Informed Consent Form (ICF), Common Technical Document (CTD), and Clinical Study Report (CSR).

Challenges and need for assisted medical writing

Medical writing involves highly skilled and trained professionals. Every single aspect mentioned in a document must be reviewed by multiple peers to check for authenticity. Authoring a single paper and double-checking its contents therefore becomes an extremely cumbersome and time-consuming process.

Medical writing requires writers to read and understand numerous research papers, documents, and journals to arrive at a conclusion. A machine can scan thousands of articles in a matter of seconds, whereas humans would require days. Human disparities and biases also exist, and they invariably find their way into the article being produced.

Research question and hypothesis

AI has made significant advances in the field of text mining, processing, and generation. AI engines powering these services can comprehend the context and recommend text accordingly. The technology also proves instrumental in intuitive content writing.

A computer, if trained properly, exhibits no bias. It provides its predictions and suggestions based on its training. With advances in computer technology and the rise of AI in the fields of NLP (Natural Language Processing) and NLG (Natural Language Generation), medical writers can leverage these innovations to assist them in creating medical documentation.

Rationale of the experiment

AI is now frequently used in pharmacovigilance to extract information from ICSRs (Individual Case Safety Reports) and to process the data further.2 In these cases, the machines perform the preliminary, repetitive, and time-consuming tasks, saving skilled professionals long working hours and allowing them to focus on critical activities. There have also been attempts to identify causal relationships between drugs and adverse events by leveraging AI. The key focus of our experiment is to assist regulatory medical writers and provide them with a launchpad for seamless authoring of regulatory documents. The AI model would subsequently be trained to generate sections of a protocol, which would help validate the effort invested in using AI in the field of medical writing.


A medical writer enters the medical terms for which he or she wishes to generate a section of a new document. These keywords are combined to form a search query, which is then run against PMC (the PubMed Central repository). The search returns several articles and papers related to the search terms, after which the relevant articles are chosen and data is extracted from them section by section. The search result is generally used as a reference point for building a new regulatory document.

The sections considered in this experiment are introduction, methodology, and discussion pertaining to a protocol. The extracted data is cleaned and pre-processed before NLG and NLP algorithms are applied to it to generate the required section.

Materials, data sources, and resources

Medical data required to generate text in this experiment is sourced from the PMC. Data from the full text articles is extracted in the form of a Data Table, with one article/paper forming a row. As seen in Figure 1 below, the research paper/article sourced from PMC is transformed into a structured format which is processed further.
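The one-row-per-article layout can be sketched as follows. This is a minimal illustration that uses only the Python standard library in place of a DataFrame. The class name `ArticleRow`, the field names, and the placeholder values are assumptions made for the example; the column set follows the one described later in this article (article id, title, introduction, methodology, discussion).

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ArticleRow:
    """One PMC article flattened into a single table row (hypothetical schema)."""
    article_id: str
    title: str
    introduction: str
    methodology: str
    discussion: str


def rows_to_csv(rows):
    """Serialize the rows to CSV text: a header line, then one line per article."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(ArticleRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Placeholder content only; not taken from any real article.
example_rows = [
    ArticleRow(
        article_id="PMC0000000",
        title="Placeholder title",
        introduction="Intro text.",
        methodology="Methods text.",
        discussion="Discussion text.",
    ),
]
table_text = rows_to_csv(example_rows)
```

Keeping each section in its own column lets later steps process, for example, only the methodology text of every article.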

Methods and experimental strategy

We have proposed a framework that makes use of the techniques and algorithms of NLP and NLG to generate the articles. The proposed framework for Medical Writing Automation (MWA) uses AI and NLG to generate a summary of various research papers and articles published. The AI-generated reports include sections such as Introduction, Methodology, Discussion, alongside a few other sections such as Study Objectives and Inclusion-Exclusion criteria.

The framework takes keywords from the user as input (e.g., Diabetes, Malaria, India) and uses PMC as its database. Searching for papers on PubMed requires a precise search phrase, so that the results are relevant and unrelated items do not appear.

The model builds a proper MeSH (Medical Subject Headings) query3 and runs a search on PMC. The search returns a list of articles along with their identifiers. These articles form the base from which the new text is generated. Several approaches are used to collect data from these full-text articles. This data is stored in a DataFrame (a spreadsheet-like table) with columns such as the article id, the title of the article, the introduction, the methodology section, and the discussion section.
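How keywords might be turned into a field-tagged query can be sketched as below. This is an illustration under stated assumptions, not the query logic used in the experiment: the function names are hypothetical, the keywords are simply joined with AND, and each is tagged with the `[MeSH Terms]` search field. The URL targets the public NCBI E-utilities `esearch` endpoint with `db=pmc`; the sketch only builds the URL and makes no network request.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities search endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pmc_query(keywords, field="MeSH Terms"):
    """Join user keywords into one AND query, tagging each with a search field."""
    cleaned = [k.strip() for k in keywords if k.strip()]
    if not cleaned:
        raise ValueError("at least one keyword is required")
    return " AND ".join(f'"{k}"[{field}]' for k in cleaned)


def build_search_url(query, max_results=20):
    """Build (but do not send) an esearch request for PMC article identifiers."""
    params = {"db": "pmc", "term": query, "retmax": max_results, "retmode": "json"}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_pmc_query(["Diabetes Mellitus", "India"])
url = build_search_url(query)
```

Quoting each keyword keeps multi-word terms such as "Diabetes Mellitus" together as a single phrase.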

The raw data extracted from the articles can contain HTML elements, unnecessary punctuation, and stray Unicode characters. To address this, the data is cleaned and pre-processed. Quantities such as dates, times, weights, and volumes are converted into standard units. For instance, if an article states a time frame of 48 hours, it is converted into days.
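A minimal sketch of such a cleaning step, assuming regular expressions are sufficient for the markup involved. The function names, the choice of days as the standard time unit for years as well as hours, and the 365-day year are assumptions for illustration; they are not details confirmed by the experiment.

```python
import html
import re
import unicodedata

TAG_RE = re.compile(r"<[^>]+>")
HOURS_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*(?:hours?|hrs?)\b", re.IGNORECASE)
YEARS_RE = re.compile(r"\b(\d+(?:\.\d+)?)\s*years?\b", re.IGNORECASE)


def _format_days(days):
    """Render a day count without a trailing '.0', e.g. 2.0 -> '2 days'."""
    unit = "day" if days == 1 else "days"
    return f"{days:g} {unit}"


def clean_text(raw):
    """Strip HTML tags, decode entities, normalize Unicode, collapse whitespace."""
    text = TAG_RE.sub(" ", raw)
    text = html.unescape(text)
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def normalise_durations(text):
    """Convert durations written in hours or years into days (assumed unit)."""
    text = HOURS_RE.sub(lambda m: _format_days(float(m.group(1)) / 24), text)
    text = YEARS_RE.sub(lambda m: _format_days(float(m.group(1)) * 365), text)
    return text


raw = "<p>Patients were&nbsp;followed for 48 hours.</p>"
cleaned = normalise_durations(clean_text(raw))
```

Under these assumptions, "48 hours" becomes "2 days" and "20 years" becomes "7300 days". A conversion of this kind would be consistent with the phrase "age 7300 days or more" that appears in the generated summary quoted in the results.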

In the MWA system, the user can select the type of text generation: summarization or pure abstraction. The NLP techniques used here for summarization are extractive and abstractive summarization.

Extractive summarization takes the complete text as input and breaks it down into individual sentences. These sentences are then ranked by significance, which is determined using cosine similarity and the TextRank algorithm.4 Once the sentences have been ordered by importance, the user can keep the top-ranked sentences and discard the rest. When a user selects extractive summarization as the method for generating text in MWA, a summary of each article in the DataFrame is generated first (in the background, the Introduction, Methodology, and Discussion sections of each article are summarized individually). Once the summaries of all relevant articles are available, a summary of the entire data, i.e., a summary of all the summaries, is generated. This cascading of summaries is intended to ensure that no article is left out and that the sentences of every article are given equal consideration.
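The ranking and cascading steps can be sketched in plain Python as follows. This is a simplified illustration, not the MWA implementation: sentences are split with a naive regular expression, represented as bag-of-words counts, compared with cosine similarity, and scored with a PageRank-style iteration as in TextRank. All function names and the parameters `damping`, `iterations`, and `top_k` are assumptions.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def vectorise(sentence):
    """Bag-of-words term counts for one sentence."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine_similarity(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by a PageRank-style iteration over the similarity graph."""
    n = len(sentences)
    vectors = [vectorise(s) for s in sentences]
    sim = [
        [cosine_similarity(vectors[i], vectors[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_weight = [sum(row) for row in sim]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping
            * sum(
                sim[j][i] / out_weight[j] * scores[j]
                for j in range(n)
                if out_weight[j]
            )
            for i in range(n)
        ]
    return scores


def summarise(text, top_k=2):
    """Keep the top_k highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= top_k:
        return " ".join(sentences)
    scores = textrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:top_k]
    return " ".join(sentences[i] for i in sorted(top))


def cascade_summary(texts, top_k=2):
    """Summarize each text, then summarize the concatenated summaries."""
    per_text = [summarise(t, top_k) for t in texts]
    return summarise(" ".join(per_text), top_k)


sample = (
    "Diabetes affects insulin regulation. "
    "Insulin regulation is impaired in diabetes patients. "
    "The weather was sunny."
)
summary = summarise(sample, top_k=2)
```

In this sample, the third sentence shares no words with the other two, so it receives only the baseline score and is dropped. Because the selected sentences are copied unchanged, every sentence in the output appears verbatim in the input.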

Except for the final step, abstractive summarization operates in the same way as extractive summarization. Individual sentences are prioritized according to their value, just as they are in extractive summarization. The only difference is that abstractive summarization approaches use AI and machine learning to understand the relationships between words and sentences, and are trained to compare words and phrases and calculate their similarity.


Primary data analysis and defending research hypothesis

The data extracted from the PMC articles is in English and contains many fillers and stop words that are needed to complete a sentence yet carry little meaning. Such words can be removed when training the AI model for abstractive summarization. A few other words will also occur very frequently; these can be identified using a word count or word cloud approach. If the frequency of a word is high, it may distort the model's performance in abstractive generation, so it is better to skip these words too.
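A minimal sketch of this filtering, assuming a small hand-written stop word list and a simple frequency threshold. The list `STOP_WORDS`, the `max_share` cut-off, and the function names are illustrative assumptions; a real pipeline would likely use a fuller stop word list and a tuned threshold.

```python
import re
from collections import Counter

# Deliberately short, illustrative stop word list.
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "from", "in",
    "is", "it", "of", "on", "or", "that", "the", "to", "was", "were", "with",
})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def frequent_words(documents, max_share=0.05):
    """Return non-stop words whose share of all non-stop tokens exceeds max_share."""
    counts = Counter(
        token
        for doc in documents
        for token in tokenize(doc)
        if token not in STOP_WORDS
    )
    total = sum(counts.values())
    return {word for word, count in counts.items() if total and count / total > max_share}


def filter_tokens(text, drop):
    """Remove stop words and the corpus-specific frequent words from one text."""
    return [t for t in tokenize(text) if t not in STOP_WORDS and t not in drop]


docs = [
    "The patients with diabetes were enrolled.",
    "Diabetes patients in the study received insulin.",
]
too_frequent = frequent_words(docs, max_share=0.2)
kept = filter_tokens(docs[0], too_frequent)
```

In this toy corpus, "patients" and "diabetes" each account for a quarter of the non-stop tokens, so both are dropped along with the stop words.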

In summarization techniques, each sentence is treated as a single entity. The correlation between the words of different sentences, calculated using cosine similarity, is critical: if the paragraph to be summarized contains several similar sentences, the summarization algorithm will find it difficult to rank them. For the summarization techniques to work properly, the sentences must be cleanly separated. Long, trailing sentences are summarized less accurately. Hence, sentences should be split properly while the data is processed.

The following are the results of using the search phrase ‘Diabetes Mellitus India’ to see how the system fared. Extractive summarization and abstractive summarization were each used to create the methodology section.

Where extractive summarization was chosen as the mode of generation, the articles selected from the PMC database had the PMCIDs 7328526, 7325532, 7216981, 5210443, 5763039, 6357479, 7667845, 5240082, 6676834, 6755825, 5999405, 5471800, 5884366 and 6227383. A summary of each article was generated, followed by a combined summary of all the articles.

The generated summary was, ‘the theoretical minimum risk exposure level for BMI at age 7300 days or more was estimated to range from 20 to 25 kg/m2. Estimates of deaths, YLLs, YLDs, and DALYs attributable to each risk factor for diabetes were produced by location, age, sex, and year. For the secondary analysis, results from interaction models exploring the change in clinical parameters over time were examined and compared with new and known diagnoses of hypertension.’

For abstractive summarization, the articles selected by the system from the PMC database had the PMCIDs 7328526 and 7325532. The summary generated was, ‘Inclusion/exclusion criteria Adults with T1DM or T2DM who consulted their physicians during the 14 day-long recruitment period were included in the study. In a survey on 307 diabetes patients in Singapore, only 30.6% of patients were found to be vaccinated with influenza vaccine.’

A plagiarism check would show that the summary generated by the extractive approach receives a very high plagiarism score. This is because the approach selects the top-ranked sentences and produces the summary without changing their text. Abstractive summarization has a small edge over extractive summarization in this respect, because it makes slight alterations to words and phrases by replacing them with synonyms or matching terms.
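The reason behind the difference can be illustrated with a simple verbatim-overlap measure. This is a rough stand-in for a plagiarism checker, not the tool used in the experiment: it reports the share of word n-grams in a summary that also occur in the source texts. The function names, the n-gram length of 5, and the example sentences are assumptions.

```python
import re


def word_ngrams(text, n):
    """Set of lowercase word n-grams in the text (empty if fewer than n words)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(summary, sources, n=5):
    """Fraction of the summary's n-grams that appear verbatim in any source."""
    summary_grams = word_ngrams(summary, n)
    if not summary_grams:
        return 0.0
    source_grams = set()
    for source in sources:
        source_grams |= word_ngrams(source, n)
    return len(summary_grams & source_grams) / len(summary_grams)


# Invented example sentences, for illustration only.
source = "Adults with type 2 diabetes were included in the study during recruitment."
copied = "Adults with type 2 diabetes were included in the study."
reworded = "Adult patients who had type 2 diabetes took part in this trial."

copied_score = verbatim_overlap(copied, [source])
reworded_score = verbatim_overlap(reworded, [source])
```

A summary assembled from copied sentences scores close to 1 on such a measure, while one that rewords the same content scores lower, which mirrors the gap between the two approaches.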


Abstractive summarization techniques score over extractive summarization; however, neither is the best approach for generating accurate articles. The current limitation of the deep learning model is that it generates sentences based on the specific context it has been trained for. With advances in technologies such as transfer learning and ensemble learning, a generic model can be built to understand the context and generate a summary. In addition, there is a third method of generating text besides summarization, referred to as the pure abstraction method. We shall discuss this method in the second part of our article. Stay tuned!

Saurabh Das, Head, Research and Innovation; Niketan Panchal, Researcher; Ashutosh Pachisia, Data Scientist; Rohit Kadam, Researcher; Prashant Chaturvedi, Data Scientist; Dr. Ashish Indani, Former Head, Research and Innovation; all with TCS ADD™ Platforms