RL Environments; MLE; LLM Tasks; Difficulty Distribution; Remote Contractor; PST Overlap (≥4h); Advanced English (C1/C2)
We’re hiring RL Environments Engineers to design and build MLE/SWE environments that deliver high-quality, diverse tasks with minimal supervision. You will target a specific language model, meet a defined difficulty distribution, and deliver roughly one task every 10 hours. This is a remote contractor role requiring at least 4 hours of overlap with PST and advanced English (C1/C2).
Company: Preference Model (via XOR.AI)
Preference Model is building the next generation of training data to power the future of AI. Today's models are powerful but fail to reach their potential across diverse use cases, because many of the tasks we want them to handle fall outside their training data distribution. Preference Model creates reinforcement learning environments that encapsulate real-world use cases, enabling AI systems to practice, adapt, and learn from feedback grounded in reality. We seek to bring the real world into distribution for the models.
Our founding team previously worked on Anthropic’s data team, building the data infrastructure, tokenizers, and datasets behind the Claude model. We are partnering with leading AI labs to push AI closer to achieving its transformative potential.
The company is backed by a Tier 1 Silicon Valley VC.
What we’re looking for (must-haves)
Strong signals (nice-to-have, big plus)
Not a fit if