Your Mission
As Cloud & AI Architect in the Technical Direction team under Cloud Operation, you will design, build out and ensure the operational readiness of the cloud infrastructure and AI platforms that underpin Keolis' information system. You will be the bridge between Cloud Engineering, the Data Science / ML teams and our 24 × 7 Production Operations team. Your goal: raise the availability, performance, scalability and reliability of all passenger‑facing and back‑office services while enabling safe, efficient and fast delivery of AI/ML capabilities.
You will:
- Define operational objectives and ownership for critical transport applications and AI/ML services (model latency, throughput, prediction accuracy, data freshness, drift detection).
- Build and maintain automated runbooks, self‑healing workflows and observability dashboards for both infra and model performance in Azure.
- Drive reliability and governance features directly into the CI/CD and MLOps pipelines (Azure DevOps, Azure ML, MLflow) used by more than 40 product and data teams.
- Define and run post‑incident reviews and blameless retrospectives for infra, data and model incidents; turn insights into prioritized remediation and prevention plans.
- Champion best practices for Infrastructure‑as‑Code (Terraform), model deployment strategies (canary, blue/green, shadow), feature‑flagging and rollout governance.
- Coach Data Engineers and Operations staff to transition repetitive tasks into code, automate model retraining and deployment, and reduce manual toil.
- Partner with SecOps, Data Governance and Legal to embed guardrails for privacy, security, explainability and compliance (GDPR, model risk).
- Contribute to cost‑optimisation and capacity planning across PaaS, IaaS, managed AI services and data platforms (Azure ML, Synapse, Databricks, vWAN).
- Promote responsible AI: implement monitoring for bias, drift, data lineage, model explainability and lifecycle documentation.
- Evaluate and integrate generative AI and LLM capabilities where appropriate (OpenAI / Azure OpenAI, embeddings, retrieval‑augmented generation), ensuring safety and business alignment.
Our Tech Landscape
- Azure AD, VNets, vWAN, App Service, AKS, Functions, SQL MI, Event Hub, IoT Hub
- Azure ML, Azure OpenAI / Cognitive Services, Azure Synapse, Databricks, Data Lake Storage
- Hybrid Windows & Linux workloads, O365, Defender, Intune
- Azure DevOps Repos, Pipelines, Artifacts, Boards; MLflow, Azure Pipelines for MLOps
- HashiCorp Terraform & Packer, Bicep, Ansible
- Prometheus, Grafana, Azure Monitor, Log Analytics, Kusto (KQL)
- Python (pandas, scikit‑learn, PyTorch, TensorFlow), PowerShell, Go (nice‑to‑have)
- Model governance tools, feature stores, monitoring for data/model drift
To apply:
https://keolis.contactrh.com/jobs/773/43688554