Hi, I am a computer engineering graduate from IOE, Pulchowk Campus, with a year of experience as a research assistant at NAAMII, a reputable research organization. My primary research interests lie in developing and applying machine learning techniques to solve practical problems, focusing on semi-supervised learning, multi-modal learning, and natural language processing.
At NAAMII, I gained valuable experience developing and implementing complex machine-learning models for various applications, including natural language processing and medical imaging. I have a strong foundation in algorithms, data structures, and mathematics, enabling me to create practical solutions to challenging problems. I am a self-motivated individual with strong self-management skills, and I am comfortable working independently or as part of a team. I am passionate about producing high-quality results that have a real impact on society.
As I seek to further my knowledge and skills in machine learning, I am eager to pursue a Ph.D. program that would allow me to delve deeper into these research areas. With my research experience and passion for creating practical solutions, I am confident I can make meaningful contributions as a research assistant.
Bachelor's in Computer Engineering, 2022
Tribhuvan University, Institute of Engineering, Pulchowk Campus
High School in Physical Sciences, 2017
SOS Hermann Gmeiner School Bharatpur, Bharatpur, Nepal
Supervisor: Bishesh Khanal, Ph.D.
Supervisor: Binod Bhattarai, Ph.D.
Medical image segmentation with deep learning is an important and widely studied topic because segmentation enables quantifying the size and shape of target structures, which can help in disease diagnosis, prognosis, surgery planning, and understanding. Recent advances in foundation Vision-Language Models (VLMs) and their adaptation to segmentation tasks in natural images with Vision-Language Segmentation Models (VLSMs) have opened up a unique opportunity to build potentially powerful segmentation models for medical images. Such models could accept helpful information through language prompts as input, leverage the extensive range of other medical imaging datasets through pooled dataset training, adapt to new classes, and be robust against out-of-distribution data with human-in-the-loop prompting during inference. Although transfer learning from natural to medical images has been studied for image-only segmentation models, no studies have analyzed how joint vision-language representations transfer to medical images in segmentation problems or identified the gaps in leveraging their full potential.
We present the first benchmark study on transfer learning of VLSMs to 2D medical images, using 11 thoughtfully collected existing 2D medical image datasets of diverse modalities and 9 carefully designed types of language prompts built from 14 attributes. Our results indicate that VLSMs trained on natural image-text pairs transfer reasonably to the medical domain in zero-shot settings when prompted appropriately for non-radiology photographic modalities; when finetuned, they obtain performance comparable to conventional architectures, even in X-ray and ultrasound modalities. However, the additional benefit of language prompts during finetuning may be limited, with image features playing a more dominant role. VLSMs can better handle training on pooled datasets combining diverse modalities and are potentially more robust to domain shift than conventional segmentation models.