AI Model Evaluator (LLM & Agent Systems)
Job Title: AI Model Evaluator (LLM & Agent Systems)
Employment Type: Contract (Minimum 2 weeks, with potential extension)
Location: Remote
Job Summary:
Join our customer's team as an AI Model Evaluator (LLM & Agent Systems) and play a pivotal role in shaping the future of generative AI and autonomous agents. You'll help benchmark, analyze, and assess cutting-edge AI systems in real-world scenarios, providing structured insights that drive improvements. This position is ideal for analytical professionals passionate about AI quality and real-world impact.
Key Responsibilities:
Evaluate outputs from large language models (LLMs) and autonomous agent systems against defined guidelines and rubrics
Review multi-step agent actions, including screenshots and reasoning traces, to determine accuracy and quality
Consistently apply evaluation standards, flagging edge cases and identifying recurring patterns or failure modes
Provide detailed, structured feedback to inform benchmarking, product evolution, and model refinement (see the illustrative sketch after this list)
Participate in calibration and alignment sessions to ensure consistent application of evaluation criteria
Collaborate with the team to resolve ambiguous cases and adapt to evolving evaluation scenarios
Document findings and communicate insights clearly both in writing and verbally to relevant stakeholders
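
For illustration only: structured feedback of the kind described above is often captured as a typed record per evaluated trace. The sketch below is a minimal Python example under assumed conventions; the rubric dimensions, the 1-5 scale, and every field name are hypothetical, not this team's actual tooling.

    from dataclasses import dataclass, field

    # Hypothetical rubric dimensions and 1-5 scale; real guidelines will differ.
    RUBRIC_DIMENSIONS = ("accuracy", "instruction_following", "reasoning_quality", "safety")

    @dataclass
    class AgentEvaluation:
        """One evaluator's structured verdict on a multi-step agent trace."""
        task_id: str
        scores: dict[str, int]  # rubric dimension -> score on the assumed 1-5 scale
        is_edge_case: bool = False  # flag unusual inputs for calibration review
        failure_modes: list[str] = field(default_factory=list)  # recurring patterns
        notes: str = ""  # free-text feedback for model refinement

        def __post_init__(self) -> None:
            # Every rubric dimension must be scored, and scores must stay on scale.
            for dim in RUBRIC_DIMENSIONS:
                if dim not in self.scores:
                    raise ValueError(f"missing score for rubric dimension: {dim}")
                if not 1 <= self.scores[dim] <= 5:
                    raise ValueError(f"score for {dim} must be between 1 and 5")

    # Example: flagging a trace where the agent acted on a stale screenshot.
    ev = AgentEvaluation(
        task_id="trace-0042",
        scores={"accuracy": 2, "instruction_following": 3,
                "reasoning_quality": 3, "safety": 5},
        is_edge_case=True,
        failure_modes=["acted on stale screenshot"],
        notes="Reasoning was sound, but the agent clicked a button that no longer existed.",
    )

Records like this make edge cases and recurring failure modes easy to aggregate across evaluators.
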
Required Skills and Qualifications:
Demonstrated experience with LLM evaluation, AI output analysis, QA/testing, UX research, or similar analytical roles
Strong background in AI model evaluation, benchmarking, and rubric-based scoring frameworks (a sketch of one common consistency check follows this list)
Exceptional attention to detail and sound judgment in ambiguous or edge-case scenarios
Proficiency in English (B2+ or equivalent) with excellent written and verbal communication skills
Ability to adapt quickly to evolving guidelines and work independently
Comfort with remote work and a commitment of at least 20 hours per week for the initial term
Analytical mindset with a focus on actionable, qualitative feedback
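
As one hedged example of applying evaluation criteria consistently (see the calibration and alignment sessions under Key Responsibilities), teams often quantify inter-rater agreement, and Cohen's kappa is a standard statistic for this. The implementation and sample ratings below are illustrative assumptions, not a prescribed workflow.

    from collections import Counter

    def cohens_kappa(rater_a: list[int], rater_b: list[int]) -> float:
        """Agreement between two raters, corrected for chance.

        kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement and
        p_e is the agreement expected if each rater labeled at random while
        keeping their own label frequencies.
        """
        assert rater_a and len(rater_a) == len(rater_b), "need paired, non-empty ratings"
        n = len(rater_a)
        p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        p_e = sum((freq_a[label] / n) * (freq_b[label] / n)
                  for label in set(rater_a) | set(rater_b))
        if p_e == 1.0:  # degenerate case: both raters always give the same single label
            return 1.0
        return (p_o - p_e) / (1 - p_e)

    # Two evaluators scoring the same eight traces on one 1-5 rubric dimension:
    print(round(cohens_kappa([4, 3, 5, 2, 4, 4, 1, 3],
                             [4, 3, 4, 2, 4, 5, 1, 3]), 3))  # 0.667

A kappa well below raw percent agreement usually signals that evaluators are leaning on a dominant label rather than genuinely applying the rubric the same way.
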
Preferred Qualifications:
Experience with RLHF, annotation workflows, or AI benchmarking frameworks
Familiarity with autonomous agent systems or workflow automation tools
Background in evaluating mobile apps or digital products