Job description

Position Overview:

iMerit seeks detail-oriented and analytically minded Multimodal GenAI Evaluation Analysts to perform highly nuanced evaluations of AI system outputs across different modalities: text, image, video, and multimodal interactions. Analysts will assess the accuracy, appropriateness, quality, clarity, and cultural alignment of model outputs against complex guidelines, ensuring that results align with project standards and real-world use cases. These evaluations will directly inform the development and fine-tuning of advanced large language models (LLMs), vision models (LVMs), and multimodal AI systems.

Role Responsibilities:

● Evaluate outputs generated by LLMs across multiple modalities (text, image captions, video descriptions, and multimodal prompts).

● Assess quality against project-specific criteria such as correctness, coherence, completeness, style, cultural appropriateness, and safety.

● Identify subtle errors, hallucinations, or biases in AI responses.

● Apply domain expertise and logical reasoning to resolve ambiguous or unclear outputs.

● Provide detailed written feedback, tagging, and scoring of outputs to ensure consistency across the evaluation team.

● Escalate unclear cases and contribute to refining evaluation guidelines.

● Collaborate with Project Managers and Quality Leads to meet accuracy, reliability, and turnaround benchmarks.

Skills & Competencies:

● Strong critical reading, observational, and evaluative skills across different modalities.

● Ability to articulate nuanced judgments with precision and clarity.

● Excellent English comprehension (CEFR B2 or above); additional languages a plus.

● Familiarity with LLMs, generative AI, and multimodal systems.

● Strong attention to detail and ability to apply guidelines consistently.

● Awareness of cultural and linguistic nuances, including potential bias and harm in AI outputs.

● Comfort with evolving workflows, rapid feedback cycles, and complex quality frameworks.

Requirements:

● Bachelor's degree, diploma, or equivalent educational qualification.

● 1+ years of experience in data annotation, LLM evaluation, content moderation, or related AI/ML domains.

● Demonstrated experience working with data annotation tools and software platforms.

● Strong understanding of language and multimodal communication (instruction following in image generation, fact-checking, narrative coherence in video, etc.).

● Ability to adapt quickly to changing project directions and fast-paced work environments.

● Previous experience creating or annotating complex data specifically for Large Language Model (LLM) training.

● Prior exposure to generative AI, prompt engineering, or LLM fine-tuning workflows is a plus.

While moderation of high-harm/high-risk material is not part of this role, candidates should be aware that occasional exposure to NSFW or otherwise sensitive content may occur due to imperfections in client-provided datasets. Applicants should indicate that they are comfortable working in environments where such incidental exposure is a possibility.

What We Offer:

● Opportunities to shape the evaluation standards for next-generation multimodal AI systems.

● Innovative and supportive global working environment.

● Competitive compensation and flexible remote working arrangements.

● Continuous learning and growth in applied AI evaluation.

Please acknowledge that you agree to the selection process below:

● You will receive an iMerit platform assessment (15–30 minutes). If you complete it successfully, you’ll be invited to join the first project.

● After onboarding, once you’ve completed 10 hours of work, a quality test will be conducted.

● If you pass the quality test, you’ll continue on a 3-month project and will be invited to participate in upcoming projects.

Note:

● You will complete a quick 15–30 minute assessment. This requires downloading a browser extension, which can be removed once the assessment is completed.

● ID verification and background check are required.

● Onboarding will be completed through iMerit’s platform.

For Digital Nomads: If you are currently traveling, please let us know. This ensures any discrepancies between your current location and your work authorization location do not affect your application.

Commitment:

Minimum 20 hours per week (flexible schedule).
You may work more hours if desired.

Hourly rates:

Writer / AI Annotator
