I build and evaluate multimodal AI systems.

I work on vision–language models, large-scale video understanding, and practical MLOps for deploying these systems in real products. I’m especially interested in retrieval-augmented pipelines, representation learning, and evaluation for safety and robustness.

  • Multimodal LLMs & video understanding
  • Large-scale evaluation & content safety
  • End-to-end pipelines from research to deployment

Current focus

  • Designing and evaluating video search & summarization pipelines.
  • Fine-tuning vision–language models for safety and classification.
  • Building tools to read and analyze research papers more efficiently.

About

I am a machine learning researcher working on multimodal AI and real-world deployment of vision–language systems. My recent work focuses on video search and summarization, content understanding, and evaluation pipelines that connect research models to production constraints.

Broadly, I enjoy working at the boundary between research ideas and systems that actually ship: designing models and representations, and then building the data, training, and inference infrastructure needed to make them useful in practice.

Research & Publications

I am currently organizing my publications, patents, and project notes. A more detailed, paper-style list will appear here soon.

Patents

  • Electronic device and operation method for multimodal temporal-axis fusion AI models
    U.S. Patent No. 12,299,967 (Granted May 2025) and Korean Patent No. 10-2812533 (Granted May 2025)
    Co-inventor
  • Electronic device for multi-modal temporal-axis fusion artificial intelligence models and operation method thereof
    Korean Patent No. 10-2721151 (Granted Oct 2024)
    Co-inventor
  • Apparatus and method for generating a learning model for classifying images using transfer learning
    Korean Patent No. 10-2641533 (Granted Feb 2024)
    Co-inventor
  • Method and apparatus for automatic design of artificial neural network structure based on crow search algorithm
    Korean Patent No. 10-2694148 (Granted Aug 2024)
    Co-inventor

Interests

  • Multimodal representation learning (text–image–video)
  • Evaluation and alignment of vision–language models
  • Retrieval-augmented generation and video summarization
  • Safety, brand suitability, and content classification

Selected themes

  • Building evaluation pipelines for large-scale video catalogs
  • Leveraging LLMs and VLMs for automatic labeling and curation
  • Designing long-form understanding tasks over video segments

Projects

Below are a few representative areas I’ve worked on recently. Some are research-oriented; others are closer to production ML engineering.

Large-Scale Video Understanding Pipeline

End-to-end pipeline for semantic video analysis and retrieval using multimodal embeddings, automatic speech recognition, and LLM-based content understanding. Developed novel fusion techniques for combining visual, audio, and textual features at scale.

  • Multimodal fusion
  • Video-text retrieval
  • Scalable pipelines
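
To give a flavor of the retrieval side of this kind of pipeline, here is a minimal sketch that fuses per-modality clip embeddings with weighted late fusion and ranks clips against a text query by cosine similarity. The dimensions, fusion weights, and random placeholder vectors are illustrative assumptions, not the production system.

  # Minimal late-fusion retrieval sketch. Random vectors stand in for real
  # visual, ASR-derived, and text embeddings of each clip; the fusion
  # weights and dimensions are illustrative assumptions.
  import numpy as np

  rng = np.random.default_rng(0)
  n_clips, dim = 1000, 512

  # Placeholder per-modality clip embeddings (in practice: visual encoder
  # features, ASR-transcript text embeddings, audio embeddings, ...).
  visual = rng.standard_normal((n_clips, dim))
  audio = rng.standard_normal((n_clips, dim))
  text = rng.standard_normal((n_clips, dim))

  def l2_normalize(x, axis=-1, eps=1e-8):
      return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

  # Weighted late fusion: normalize each modality, combine, re-normalize.
  weights = {"visual": 0.5, "audio": 0.2, "text": 0.3}
  fused = (weights["visual"] * l2_normalize(visual)
           + weights["audio"] * l2_normalize(audio)
           + weights["text"] * l2_normalize(text))
  fused = l2_normalize(fused)

  # A text query embedded into the same space (placeholder vector here).
  query = l2_normalize(rng.standard_normal(dim))

  # On unit vectors, cosine similarity reduces to a dot product.
  scores = fused @ query
  top_k = np.argsort(-scores)[:5]
  print("top-5 clip indices:", top_k, "scores:", scores[top_k].round(3))

At scale, the fused index would typically sit behind an approximate nearest-neighbor store rather than a dense in-memory matrix.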

Multimodal AI Safety & Classification

Fine-tuning and evaluation frameworks for vision–language models in content classification tasks. Developed interpretable multi-label classification systems with an emphasis on robustness, fairness, and explainable AI outputs.
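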

  • VLM fine-tuning
  • Explainable AI
  • Model evaluation
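
To make the classification setup concrete, below is a bare-bones sketch of training a multi-label head over frozen vision–language embeddings with independent per-label sigmoids, which keeps each category decision inspectable. The feature dimension, label count, threshold, and random tensors are stand-ins, not the deployed model.

  # Multi-label classification head over frozen VLM features (sketch only).
  # Random tensors stand in for precomputed embeddings; the label set,
  # dimensions, and threshold are illustrative assumptions.
  import torch
  import torch.nn as nn

  torch.manual_seed(0)
  n_samples, feat_dim, n_labels = 256, 768, 8

  features = torch.randn(n_samples, feat_dim)               # frozen VLM embeddings
  labels = (torch.rand(n_samples, n_labels) > 0.7).float()  # multi-hot targets

  head = nn.Linear(feat_dim, n_labels)                      # lightweight linear probe
  criterion = nn.BCEWithLogitsLoss()                        # independent per-label losses
  optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)

  for step in range(100):
      optimizer.zero_grad()
      loss = criterion(head(features), labels)
      loss.backward()
      optimizer.step()

  # At inference, sigmoid + per-label thresholds give interpretable
  # per-category decisions; thresholds can be tuned on a validation set.
  probs = torch.sigmoid(head(features))
  preds = (probs > 0.5).int()
  print("example prediction:", preds[0].tolist())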

Academic Research Acceleration Tools

AI-powered tools for efficient academic paper analysis, including automated highlighting, semantic note-taking, and LLM-assisted literature review workflows. Designed to accelerate research discovery and knowledge synthesis.

  • NLP pipelines
  • Knowledge extraction
  • Research automation
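
As a simplified illustration of the retrieval step behind such tools, the sketch below splits a paper into paragraph chunks and ranks them against a reader's question; TF-IDF and the sample sentences are placeholders for the embedding- and LLM-based components used in practice.

  # Rank paper paragraphs against a question (sketch). TF-IDF is a simple
  # stand-in for embedding/LLM-based ranking, and the sample text below is
  # purely illustrative.
  from sklearn.feature_extraction.text import TfidfVectorizer
  from sklearn.metrics.pairwise import cosine_similarity

  paper_chunks = [
      "We introduce a multimodal encoder that fuses frame and transcript features.",
      "Related work covers video retrieval and contrastive representation learning.",
      "Experiments report recall@10 on a long-form video benchmark.",
      "We discuss limitations around domain shift and annotation cost.",
  ]
  question = "How is the method evaluated, and on what benchmark?"

  vectorizer = TfidfVectorizer(stop_words="english")
  chunk_vecs = vectorizer.fit_transform(paper_chunks)
  question_vec = vectorizer.transform([question])

  scores = cosine_similarity(question_vec, chunk_vecs).ravel()
  for idx in scores.argsort()[::-1]:
      print(f"{scores[idx]:.3f}  {paper_chunks[idx]}")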

Experience (short)

A detailed CV is available on request; this is a compact snapshot of my recent work.

  • Machine Learning Researcher

    Industry · Multimodal AI & Video Understanding

    Working on video understanding, multimodal retrieval, and large-scale evaluation frameworks that connect LLMs/VLMs to real production workloads.

  • PhD in Computer Science

    Deep Learning & Representation Learning

    Research on model design and training methods with a focus on reducing human effort and deploying practical systems.

Contact

I’m open to conversations about research collaboration, postdoctoral opportunities, applied ML projects, and practical deployment of multimodal systems.

Email: ahmadmobeen24@gmail.com