Experience with large-scale distributed training and different parallelism techniques for scaling up training, such as FSDP and tensor/pipeline parallelism
Expertise in Generative AI, specifically in training foundation models, fine-tuning them, and distilling them into smaller models
What the job involves
Design and build the platform that powers large-scale machine learning model training, fine-tuning, model transformation, and evaluation workflows and use cases across the entire company
Co-design and optimize the systems and models to scale up machine learning model training and increase its cost-effectiveness
Design easy-to-use APIs and interfaces that let experienced ML practitioners and non-experts alike easily access the training platform
Application process
This job is open for no less than 7 days and will be removed when the position is filled.
Netflix is dedicated to providing the world with exceptional entertainment experiences through a vast array of genres and languages, allowing members to enjoy content anytime, anywhere.