Machine Learning Engineer – Chatbot Optimization & Performance
Upwork

Remote
• 2 hours ago
• No applications yet
About
We are looking for an experienced Machine Learning Engineer with a focus on optimizing chatbot latency and scalability. The ideal candidate will work on improving response time and computational efficiency of NLP models used in real-time conversational systems.

Responsibilities:
• Analyze existing NLP and intent classification models for performance bottlenecks.
• Optimize transformer-based models using distillation, quantization, and model pruning.
• Implement hybrid inference pipelines: lightweight models for common intents, heavier models for fallback queries.
• Deploy and monitor chatbot models in production (using Docker, FastAPI, or TensorFlow Serving).
• Use caching mechanisms to reduce repetitive inference costs.
• Collaborate with DevOps and data teams to ensure seamless integration and continuous deployment.
• Benchmark model latency across CPUs and GPUs and optimize hardware utilization.
• Implement logging, monitoring, and alerting for real-time response performance.

Qualifications:
• Bachelor’s or Master’s in Computer Science, AI, or related field.
• 2+ years of experience with NLP models (BERT, GPT, Rasa NLU, etc.).
• Strong skills in PyTorch, TensorFlow, or Hugging Face Transformers.
• Familiarity with ONNX, TensorRT, or OpenVINO for inference acceleration.
• Understanding of model compression, quantization, and knowledge distillation.
• Experience deploying models via APIs or microservices (FastAPI, Flask, etc.).
• Good understanding of real-time systems and latency metrics.




