ML · Jan 2026
Applied Research Scientist: Medical AI Consulting
Foundation-model pipeline for gastroenterology imaging with cloud-scale experimentation
Self-Supervised Learning · PyTorch · Medical Imaging · S3 · Weights & Biases · Cloud GPU · DINOv3 · Vision Transformers
Architecture design and SSL pretraining strategy for a ViT-based foundation model on 5M+ gastrointestinal video frames for endoscopic polyp detection.
Scope (NDA-safe)
- Designed SSL pretraining strategy combining masked image modelling with DINOv3-style self-distillation.
- Built data ingestion pipeline from cloud object storage to training-ready datasets under strict GDPR compliance.
- Set up experiment tracking and reproducibility workflows across multiple model architectures.
- Defined fine-tuning protocols for classification, segmentation, detection, and severity scoring tasks.
- Managed cloud GPU resources to optimize training throughput and cost-efficiency.
- Planned deployment pathways for clinical validation and edge-oriented inference environments.
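To make the first Scope item concrete, here is a minimal, framework-free sketch of how a self-distillation term and a masked-reconstruction term might be combined into a single pretraining objective. It is not the project's implementation: the function names, the regression form of the masked term, the temperatures, the EMA momentum, and `mim_weight` are all illustrative assumptions, and a real pipeline would express the same arithmetic as batched PyTorch tensor operations.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      center: Sequence[float],
                      student_temp: float = 0.1,    # assumed value
                      teacher_temp: float = 0.04    # assumed value
                      ) -> float:
    """Cross-entropy from the centred, sharpened teacher to the student."""
    centred = [t - c for t, c in zip(teacher_logits, center)]
    teacher = softmax(centred, teacher_temp)
    student = softmax(student_logits, student_temp)
    return -sum(t * math.log(s + 1e-12) for t, s in zip(teacher, student))


def masked_reconstruction_loss(pred: Sequence[float],
                               target: Sequence[float],
                               mask: Sequence[bool]) -> float:
    """Mean squared error over masked positions only (mask[i] True = masked)."""
    errors = [(p - t) ** 2 for p, t, m in zip(pred, target, mask) if m]
    return sum(errors) / len(errors) if errors else 0.0


def combined_loss(distill: float, recon: float, mim_weight: float = 0.5) -> float:
    """Weighted sum of the two objectives; mim_weight is a hypothetical knob."""
    return distill + mim_weight * recon


def ema_update(teacher_params: Sequence[float],
               student_params: Sequence[float],
               momentum: float = 0.996) -> List[float]:
    """Move the teacher towards the student by exponential moving average."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_params, student_params)]
```

In this sketch, raising `mim_weight` shifts the balance toward local reconstruction, while lowering it favours the global self-distillation signal.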
Stack
- Python, PyTorch, Vision Transformers (ViT), DINOv3-style self-distillation.
- Weights & Biases for tracking and experiment management.
- S3-backed datasets and cloud GPU compute.
Engineering Challenges
- Processing high-volume, heterogeneous endoscopic video under medical data governance constraints.
- Designing multi-objective pretraining that balances feature self-distillation with masked image modelling.
- Reducing bottlenecks between storage, preprocessing, and multi-GPU training jobs.
- Keeping experiments reproducible across rapid model iterations and changing dataset versions.
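One way the reproducibility challenge could be approached is to derive a deterministic fingerprint from the training configuration together with a dataset manifest, so that any change to either yields a new run identifier. The sketch below is an assumption-laden illustration, not the project's workflow: `run_fingerprint`, the manifest layout (object key mapped to content hash), and the example keys are all hypothetical.

```python
import hashlib
import json


def run_fingerprint(config: dict, dataset_manifest: dict) -> str:
    """Return a short, stable ID for a (config, dataset version) pair.

    Keys are sorted before hashing, so dict ordering does not affect the ID.
    """
    payload = json.dumps(
        {"config": config, "data": dataset_manifest},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


# Hypothetical example values, for illustration only.
config = {"arch": "vit_base", "lr": 5e-4, "mim_weight": 0.5}
manifest_v1 = {"frames/part-000.tar": "a1b2", "frames/part-001.tar": "c3d4"}
manifest_v2 = {"frames/part-000.tar": "a1b2", "frames/part-001.tar": "ffff"}

run_id_v1 = run_fingerprint(config, manifest_v1)
run_id_v2 = run_fingerprint(config, manifest_v2)
```

Stored alongside each tracked run, an identifier like this would let two experiments be compared only when they share both hyperparameters and dataset version.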
Outcome
- Foundation model architecture capable of self-supervised learning on millions of unlabeled clinical frames.
- Benchmarking framework comparing performance against state-of-the-art medical foundation models.
- Reusable training pipeline scalable across future medical imaging tasks and institutions.
- Foundation for real-time, clinically relevant inference pipelines under GDPR- and EHDS-compliant data governance.