Centific Global Technologies Pte. Ltd.

Research Intern, Multimodal LLM Benchmarking

Job Overview

Location

Remote Work (USA)

Job Type

Full-time

Category

Data Science

Date Posted

May 17, 2026

Full Job Description

📋 Description

  • Design and develop evaluation benchmarks for multimodal foundation models across text-image, text-audio, text-video, or cross-modal retrieval combinations, defining task formats, annotation guidelines, scoring criteria, and coverage dimensions.
  • Execute benchmarks against multimodal models, analyze performance patterns, identify failure modes, and synthesize findings into clear, actionable research summaries and recommendations.
  • Investigate and compare automated scoring approaches for multimodal outputs, including model-as-judge methods, reference-free metrics, and human alignment studies, assessing tradeoffs in reliability, validity, cost, and scalability.
  • Contribute to the collection, filtering, and quality review of multimodal evaluation datasets, including designing annotation schemes and conducting inter-rater reliability analysis.
  • Survey the state of the art in multimodal evaluation and benchmarking, identify gaps in existing benchmark coverage, and propose novel evaluation methodologies grounded in academic literature.
  • Produce high-quality internal research write-ups, benchmark datasheets, and presentation-ready summaries of findings tailored for both technical and non-technical audiences.
  • Focus on one or more primary areas: vision-language evaluation (e.g., image captioning, visual question answering, document understanding, chart reasoning), audio-speech-language benchmarking (e.g., spoken language comprehension, audio captioning), video understanding benchmarks (e.g., temporal reasoning, video QA, video-text retrieval), cross-modal consistency and robustness testing under perturbations or distribution shifts, or automated multimodal scoring via judge-model pipelines.
  • Work with multimodal models and datasets using Python, PyTorch, and Hugging Face Transformers for data processing, model inference, and quantitative analysis.
  • Apply statistical analysis to interpret benchmark results, including understanding variance, significance, and limitations of evaluation conclusions.
  • Collaborate with senior research scientists and ML engineers on frontier AI evaluation problems within an integrated ecosystem of 1.8 million domain experts and 150+ PhDs.
  • Engage with enterprise AI workflows and customer-facing research consulting as part of an applied research team focused on reducing GenAI costs and accelerating deployment.
  • Document all research activities with precision to support reproducibility, publication potential, and open-source benchmark releases.
  • Communicate complex technical findings through structured written reports and presentations to diverse internal stakeholders.

🎯 Requirements

  • Currently enrolled in an MS or PhD program in Computer Science, Machine Learning, Statistics, AI, Linguistics, or a closely related quantitative field.
  • Coursework, research projects, or hands-on experience with multimodal models, vision-language systems, or NLP, including familiarity with at least one non-text modality (image, audio, or video).
  • Exposure to model evaluation concepts such as benchmark design, metric selection, or experimental comparison through academic or internship work.
  • Solid Python skills for data processing, model inference, and quantitative analysis; working experience with PyTorch or Hugging Face Transformers.
  • Comfort with basic statistical analysis, including understanding variance, significance, and limitations of benchmark conclusions.
  • Ability to write clearly and present findings in an organized, audience-appropriate manner.

🏖️ Benefits

  • Mentorship from senior research scientists and ML engineers working on frontier AI evaluation problems.
  • Ownership of a focused, publishable research project with real-world impact on how leading AI models are evaluated.
  • Exposure to enterprise AI workflows, customer-facing research consulting, and cross-functional applied research teams.
  • Potential co-authorship on publications or open-source benchmark releases upon completion of high-quality work.
  • A competitive internship stipend of $40/hr and a flexible hybrid/remote working arrangement.

Skills & Technologies

Python
PyTorch
Junior
Remote
Degree Required

About Centific Global Technologies Pte. Ltd.

Centific is a data-centric AI services company providing data collection, annotation, and model validation solutions to enterprises and technology vendors. It operates a global crowd platform that combines human intelligence with automation to prepare, curate, and test datasets for computer vision, NLP, and generative AI applications. The company supports full AI lifecycle needs, from training data to reinforcement learning and model safety, serving industries including retail, automotive, healthcare, and technology. Headquartered in Singapore, Centific maintains delivery centers across Asia, Europe, and North America.
