What types of work are available

1. Frontier evals

Human data and expert-led evaluations designed to help train and improve foundational AI models. These pipelines often involve specialized domains where expert reasoning, analysis, or review helps expand model capabilities and performance.

2. Real-world robotics

Contributors record everyday physical tasks to generate real-world datasets that help train humanoid robots and physical AI systems to understand and perform generalist actions.

3. Contextual evals

As enterprises deploy AI agents, they require evaluations tailored to their specific workflows and environments. These pipelines focus on niche, domain-specific assessments that consider business context, real use cases, and operational constraints across different industries.
