Reinforcement Fine Tuning Expert Needed for GRPO

Upwork

Remote

7 hours ago

No application

About

I need help running my reinforcement fine-tuning job (GRPO). The reward function and dataset are ready, but I need help configuring the training setup for maximum throughput on the machine used (4x or 8x H200). You also need to be an expert in reinforcement learning so that you can review the results and adjust the training setup, including the hyperparameters and the reward function.