NLP-AI4Health: Second Workshop on Integrating NLP and AI for Multilingual and Patient-Centric Healthcare Communication

On 23rd December, 2025 at IJCNLP-AACL 2025

Venue - Victor Menezes Convention Centre (VMCC), IIT Bombay, Mumbai, India
(20th-24th December, 2025)

About NLP-AI4Health 2025

NLP-AI4Health 2025 is a focused workshop on how Natural Language Processing (NLP) and Artificial Intelligence (AI) can improve multilingual and patient-centered healthcare communication. Organized by CMC Vellore and IIIT-Hyderabad (LTRC), the workshop provides a platform to advance inclusive, accessible, and culturally aware language technologies for healthcare. We invite research contributions on topics such as low-resource medical NLP, simplification of clinical documents, speech-based tools for Electronic Health Records (EHRs), and ethically aligned, culturally sensitive AI systems. A key feature of this year’s workshop is a shared task on Multilingual Health Question Answering, which encourages the development of inclusive and robust QA systems across Indian languages. Held alongside IJCNLP-AACL 2025, the event will include keynote talks, paper presentations, and collaborative discussions, bringing together researchers, clinicians, developers, and policy experts to co-create practical and impactful language solutions for healthcare in diverse linguistic settings.

Objectives

The primary goal of this workshop is to explore how advances in Natural Language Processing (NLP), Computational Linguistics (CL), and Artificial Intelligence (AI) can be leveraged to address communication challenges in multilingual and resource-constrained healthcare environments.

Understand communication challenges in multilingual and low-resource healthcare settings.

Build tools to simplify and translate patient documents like consent forms and discharge summaries.

Improve speech-based systems for clinical documentation and multilingual EHR entry.

Promote collaboration between healthcare professionals and language technology researchers.

Organize a shared task on health question answering in Indian languages to support low-resource NLP research.

Call for Papers

We invite submissions of original research papers, position papers, and system demonstrations that address challenges at the intersection of language technologies and healthcare communication, particularly in multilingual and low-resource settings. Submissions may describe completed work, ongoing projects, preliminary results, or innovative ideas on topics including, but not limited to, the following:

NLP for multilingual and low-resource healthcare applications

Translation and simplification of patient documents (e.g., consent forms, summaries)

Speech and voice technologies for clinical data capture

Language model development for underrepresented languages in health domains

Adapting and fine-tuning LLMs for healthcare-specific use

Interfaces for patients with low literacy or special accessibility needs

NLP tools that support culturally sensitive patient care and education

Ethical and equitable AI use in healthcare language technologies

Real-world case studies on multilingual healthcare NLP deployments

Shared task systems, datasets, and evaluation methodologies

Workshop Timeline

First Call for Papers: July 22, 2025
Second Call for Papers: August 22, 2025
Third Call for Papers: September 22, 2025
Submission Deadline: September 29, 2025; October 6, 2025
ARR Commitment Deadline: October 27, 2025
Notification of Acceptance: November 3, 2025
Camera-ready Papers Due: November 11, 2025
Proceedings Due: December 1, 2025
Workshop: December 23, 2025

Submission Guidelines

Please use the ACL 2025 style template for your submission. The submission should be anonymized for double-blind review.

Page limits:

• 4 pages for short papers (excluding references and appendices)
• 8 pages for full papers (excluding references and appendices)

Accepted papers will be published in the ACL Anthology as part of the IJCNLP-AACL 2025 workshop proceedings.

Shared Task on Patient-Centric Question Answering

Multilingual Health Question Answering for Head and Neck Cancer and Cystic Fibrosis

Task Overview

This shared task challenges participants to build models that can generate concise summaries and answer patient-centric questions based on natural, multi-turn dialogues related to Head and Neck Cancer (HNC) and Cystic Fibrosis. The dialogues are real-world conversations between patients and healthcare providers, focusing on providing reliable, understandable information to patients and caregivers.

Sample Dataset

The NLP4Health Dataset is part of the Shared Task on Patient-Centric Question Answering. Key features include:

Organized by patient scenarios with multiple consultation instances per case
Multiple file formats including conversation transcripts, structured summaries, and QA pairs
Two participation tracks: Closed Task (provided data only) or Open Task (external resources allowed)

Data Description

Training Set:

50K validated dialogues in English, Hindi, Telugu, Tamil, Bangla, Gujarati, Kannada and Dogri on HNC and Cystic Fibrosis
Synthetically generated dialogues validated by humans for medical accuracy and cultural appropriateness
Each dialogue includes:
- A multi-turn conversation between patients and healthcare providers
- A summary capturing the main information points
- 4-5 question-answer pairs derived from the dialogue

Test Set:

5K unseen dialogues for which participants must generate summaries and answers to new questions
Gold summaries and answers validated by medical experts will be used for evaluation

Task Objectives

Participants are asked to build models with fewer than 3 billion parameters that can:

Generate informative and accurate summaries of multi-turn healthcare dialogues
Answer patient-centric questions based on the dialogue content with high factual correctness and clarity
Support cross-lingual input-output: The model should be capable of taking input dialogues in one language and generating summaries and answers in a different requested target language from the supported set
In the open task, participants may use any available external data for training; in the closed task, they must use only the provided training data
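
The size constraint above can be checked before submission. The following is a minimal sketch, not an official checker: it assumes the limit applies to the total number of weight elements, and the toy shapes are hypothetical. How tied or shared weights are counted is not stated in the task description. With a PyTorch model, the same total is commonly obtained with sum(p.numel() for p in model.parameters()).

```python
from math import prod

# "Fewer than 3 billion parameters" per the task description.
PARAM_LIMIT = 3_000_000_000


def count_parameters(shapes):
    """Sum the number of elements over all weight tensors, given their shapes."""
    return sum(prod(shape) for shape in shapes)


def within_budget(shapes, limit=PARAM_LIMIT):
    """Return True if the total parameter count is strictly below the limit."""
    return count_parameters(shapes) < limit


# Hypothetical toy model: an embedding table plus two square projection matrices.
toy_shapes = [(32_000, 512), (512, 512), (512, 512)]
print(count_parameters(toy_shapes))
print(within_budget(toy_shapes))
```

The comparison is strict because the task asks for fewer than 3B parameters, so a model with exactly 3,000,000,000 parameters would not pass this check.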

Evaluation

Models will be evaluated on:

Automatic Metrics:

ROUGE, BLEU, BERTScore for summaries
Exact Match (EM), F1 score for QA pairs
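
As an illustration of the QA metrics listed above, here is a minimal sketch of Exact Match and token-level F1 in the style commonly used for extractive QA. The normalization (lowercasing and whitespace collapsing) and whitespace tokenization are assumptions; the official evaluation scripts may normalize and tokenize differently, especially for Indian-language text.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace; an assumed, minimal normalization."""
    return " ".join(text.lower().split())


def exact_match(prediction, reference):
    """Return 1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(reference))


def token_f1(prediction, reference):
    """Harmonic mean of token precision and recall over whitespace tokens."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Placeholder strings, not taken from the dataset.
print(exact_match("Twice a day", "twice  a day"))
print(token_f1("twice a day after meals", "twice a day"))
```

Exact Match rewards only answers identical to the reference after normalization, while F1 gives partial credit for overlapping tokens, which is why the two are usually reported together.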

Human Evaluation:

Medical experts will assess factual accuracy, completeness, and helpfulness of generated summaries and answers
Human evaluation will be conducted for the question answering task on English and translated test data

Submission Guidelines

Participants should submit:

Generated summaries for all test dialogues
Answers to all test questions
Models must have fewer than 3B parameters, and participants should also provide a resource link (e.g., GitHub or Hugging Face) for the model
Submit results on Codabench via the shared task website before the deadline
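
Because a submission must cover every test dialogue and every test question, a completeness check before uploading can help. The sketch below is illustrative only: the dictionary layout, the "summary" and "answers" keys, and the identifiers are hypothetical, since the official submission format is defined by the organizers.

```python
def find_missing(test_items, submission):
    """Return dialogue IDs without a summary and (dialogue ID, question ID) pairs without an answer.

    test_items maps each dialogue ID to its list of question IDs.
    submission maps each dialogue ID to {"summary": str, "answers": {question ID: str}}.
    Both layouts are assumptions made for this sketch.
    """
    missing_summaries = []
    missing_answers = []
    for dialogue_id, question_ids in test_items.items():
        entry = submission.get(dialogue_id, {})
        # An absent or whitespace-only summary counts as missing.
        if not entry.get("summary", "").strip():
            missing_summaries.append(dialogue_id)
        answers = entry.get("answers", {})
        for question_id in question_ids:
            if not answers.get(question_id, "").strip():
                missing_answers.append((dialogue_id, question_id))
    return missing_summaries, missing_answers


# Hypothetical identifiers and placeholder text.
test_items = {"d1": ["q1", "q2"], "d2": ["q1"]}
submission = {"d1": {"summary": "<summary text>", "answers": {"q1": "<answer>"}}}
print(find_missing(test_items, submission))
```

In this example the check reports that dialogue d2 has no summary and that answers are missing for question q2 of d1 and question q1 of d2.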

Data Usage and Access

The training, validation, and test data will be released in standard JSON format containing dialogues, summaries, and QA pairs
The data is licensed for the shared task purposes only
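
To make the description above concrete, here is a minimal sketch of loading one record with a dialogue, a summary, and QA pairs. Every field name ("language", "dialogue", "speaker", "text", "summary", "qa_pairs", "question", "answer") is an assumption for illustration; the released files define the actual schema.

```python
import json
from dataclasses import dataclass


@dataclass
class QAPair:
    question: str
    answer: str


@dataclass
class DialogueRecord:
    language: str
    turns: list      # list of {"speaker": ..., "text": ...} dicts (assumed layout)
    summary: str
    qa_pairs: list   # list of QAPair


def parse_record(raw):
    """Parse one JSON-encoded record into a DialogueRecord."""
    data = json.loads(raw)
    qa_pairs = [QAPair(q["question"], q["answer"]) for q in data["qa_pairs"]]
    return DialogueRecord(data["language"], data["dialogue"], data["summary"], qa_pairs)


# Placeholder record; the angle-bracket strings stand in for real content.
sample = json.dumps({
    "language": "English",
    "dialogue": [
        {"speaker": "patient", "text": "<patient turn>"},
        {"speaker": "provider", "text": "<provider turn>"},
    ],
    "summary": "<summary text>",
    "qa_pairs": [{"question": "<question>", "answer": "<answer>"}],
})
record = parse_record(sample)
print(record.language, len(record.turns), len(record.qa_pairs))
```

Parsing into typed records makes a missing field fail early with a KeyError rather than later during training or evaluation.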

Baselines and Resources

Baseline models and evaluation scripts will be provided upon data release
Suggested toolkits and example code will be made available

Timeline

1. First Call for Participation (Registration): August 01, 2025
2. Second Call for Participation (Registration): August 28, 2025
3. Release of Sample Submission Dataset: September 04, 2025
4. Release of Training Dataset: October 03, 2025
5. Release of Test Dataset: October 07, 2025
6. System Submission Deadline: October 17, 2025
7. Result announcement: October 21, 2025
8. Final Papers Submission Deadline: October 24, 2025
9. Notification of acceptance/writing papers for shared task: November 03, 2025
10. Camera Ready Papers Due: November 11, 2025
11. Pre-recorded Video Due: November 21, 2025
12. Workshop Date: December 23, 2025

Important: Registrations received after the test data release will not be accepted.

Invited Speakers

Tanmoy Chakraborty, IIT Delhi

Dr. Parag R. Rindani, Wockhardt Hospitals

Workshop Schedule (Tentative)

Time | Session | Speaker / Moderator

8:00 A.M. - 9:00 A.M. | Registration | Overall Chair: Hannah Thomas

9:00 A.M. - 9:10 A.M. | Workshop Introduction | Parameshwari Krishnamurthy

9:10 A.M. - 9:40 A.M. | Opening Address | Balukrishna S

9:40 A.M. - 10:30 A.M. | Keynote Talk | Tanmoy Chakraborty, IIT Delhi; Chair: Asif Ekbal

10:30 A.M. - 11:00 A.M. | Coffee Break | -

11:00 A.M. - 11:45 A.M. | Workshop Paper Presentations (Session 1) | Session Chair: Sobha L (AU-KBC Research Centre)
- MOD-KG: MultiOrgan Diagnostic Knowledge Graph (Anas Anwarul Haq Khan)
- Cross-Lingual Mental Health Ontologies for Indian Languages (Sunaina Singh)
- Automated Coding of Counsellor and Client Behaviours in Motivational Interviewing Transcripts (Soliman Ali)

11:50 A.M. - 12:45 P.M. | Panel Discussion: Overcoming Language Barriers in HealthCare | Dipti M Sharma, Joy Mammen, Parag Rindani, Balukrishna S, Surabhi Goel; Moderator: Hannah Thomas

12:45 P.M. - 2:00 P.M. | Lunch | -

2:00 P.M. - 2:40 P.M. | Keynote Talk | Dr. Parag R. Rindani, Wockhardt Hospitals; Chair: Joy Mammen

2:40 P.M. - 2:50 P.M. | Shared Task Overview | Vandan Mujadia

2:50 P.M. - 3:20 P.M. | Shared Task Paper & Poster Booster | Session Moderator: Priyanka Dasari
- NLP4Health: Multilingual Clinical Dialogue Summarization and QA with mT5 and LoRA (Moutushi Roy)
- Multilingual Clinical Dialogue Summarization and Information Extraction with Qwen-1.5B LoRA (Kunwar Zaid)
- Patient-Centric Multilingual QA and Summary Generation for Head and Neck Cancer and Blood Donation (Saloni Chitte)
- MedQwen-PE: Parameter-Efficient Multilingual Patient-Centric Medical QA and Summarization (Vinay Babu Ulli)
- SAHA: Samvad AI for Healthcare Assistance (Aditya Kumar / Team)

3:25 P.M. - 3:30 P.M. | Concluding Remarks | Dipti M Sharma

3:30 P.M. - 4:00 P.M. | Coffee Break | -

Expected Outcomes

By the end of the workshop, participants will have a deeper understanding of the current challenges and opportunities in deploying language technologies in healthcare.
We expect the workshop to help build a community of interdisciplinary experts who can work together, using NLP, AI, and other language technologies, to find meaningful solutions to existing challenges in healthcare communication.
The workshop will provide potential directions for future research and collaboration.

Program Committee Members

Miguel Angel Rios Gaoana, University of Vienna
Vincent Briva Iglesias, School of Applied Language and Intercultural Studies (SALIS), Dublin City University
Sara Vecchiato, Università degli Studi di Udine
Sneha Mithun, Tata Memorial Hospital, Mumbai
Ashish Kumar Jha, Tata Memorial Hospital, Mumbai
Dilip Abraham, Christian Medical College, Vellore
Sivakumar Balasubramanian, Christian Medical College, Vellore
Sonish Sivarajkumar, School of Computing and Information, University of Pittsburgh
Asif Ekbal, Department of Computer Science and Engineering, IIT Patna
Tathagata Raha, M42 Health, Abu Dhabi
Ashwath Rao B, Manipal Institute of Technology, Manipal
Karunesh Arora, Centre for Development of Advanced Computing, Pune
Dipankar Das, Jadavpur University, Kolkata