Remote Software Developer (LLM Evaluation) - Remote

Posted:	12/06/26
Recruiter:	Turing
Reference:	3120757027
Type:	Contract
Disciplines:	Developer
Salary:	Competitive £? - ? per year
Location:	London
Description:
Requirements:
- 3+ years of software engineering experience
- Strong expertise in building full-stack applications and deploying scalable, production-grade software using modern languages and tools
- Deep understanding of software architecture, design, development, debugging, and code quality/review assessment
- Excellent oral and written communication skills for clear, structured evaluation rationales

Responsibilities:
- Work on AI model training initiatives by curating code examples, building solutions, and correcting code in Python, JavaScript (including ReactJS), C/C++, Java, Rust, and Go
- Evaluate and refine AI-generated code to ensure it is efficient, scalable, and reliable
- Collaborate with cross-functional teams to enhance AI-driven coding solutions against industry performance benchmarks
- Build agents that can verify the quality of code and identify error patterns
- Hypothesize on steps in the software engineering cycle (prototyping, architecture design, API design, production implementation, launch, experiments, monitoring, operational maintenance) and evaluate model capabilities on them
- Design verification mechanisms that can automatically verify a solution to a software engineering task

Technologies: AI, API, C#, FastAPI, Java, JavaScript, LLM, Python, Rust

More:
- Role: Remote Software Developer (LLM Evaluation)
- Company: Turing, based in San Francisco, California; research accelerator for frontier AI labs and partner for enterprises deploying advanced AI systems
- Tech stack: Python, JavaScript, C++, C#, Java, FastAPI (and other modern languages/tools)
- Category: Python Developer / Engineer
- Location address: 548 Market Street, PMB 18282, San Francisco, United States
- Salary: 200 - 300 USD per hour
- Benefits & perks: Fully home office / remote work, flexible work time
- Project overview: Create datasets for training, benchmarking, and advancing large language models; curate code examples; provide precise solutions and corrections across multiple languages; evaluate and refine AI-generated code; collaborate with researchers and cross-functional teams to enhance enterprise-level AI-driven coding solutions
- Engagement details:
  - Commitment: flexible engagement, minimum 10 hrs/week, up to 40 hrs/week (partial PST overlap required)
  - Type: Contractor (no medical/paid leave)
  - Duration: 1 month (starting next week; potential extensions based on performance and fit)
  - Candidate locations: must be based in US, Canada, or Western European countries (e.g., UK, Netherlands, Italy, Germany)

Last updated: week 24 of 2026

Recruiting now