In the previous article, we saw how to launch Spark applications with the Spark Operator. In this article, we’ll see how to do the same thing, but natively with spark-submit. Let’s first explain the differences between the two ways of deploying your driver on the worker nodes.
I'm telling you of a time
That those under twenty
Cannot know 🎶
Until not so long ago, the way to run Spark on a cluster was with Spark’s own standalone cluster manager, Mesos, or YARN. In the meantime, the Kingdom of Kubernetes has risen and spread widely.
And when it comes to running Spark on Kubernetes, you now have two choices:
Use Spark’s “native” Kubernetes capabilities: Spark has been able to run on clusters managed by Kubernetes since version 2.3. Kubernetes support was still flagged as experimental until very recently, but as per SPARK-33005 Kubernetes GA Preparation, Spark on Kubernetes is now fully supported and production ready! 🎊
Use the Spark Operator, proposed and maintained by Google, which is still in beta version (and always will be).
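To make the contrast concrete, here is a minimal sketch of what each option looks like in practice. The image name, API server address, and jar path below are placeholders, not values from this article:

```shell
# Option 1: native spark-submit, pointing --master at the Kubernetes API server.
# spark-submit talks to the API server directly and creates the driver pod itself.
spark-submit \
  --master k8s://https://<api-server-host>:6443 \
  --deploy-mode cluster \
  --name spark-pi \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.executor.instances=2 \
  --conf spark.kubernetes.container.image=<your-spark-image> \
  local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar
```

With the Spark Operator, you instead declare the application as a `SparkApplication` custom resource and apply it with `kubectl`; the operator watches for these resources and runs spark-submit on your behalf:

```yaml
# Option 2: a SparkApplication manifest handled by the Spark Operator
apiVersion: "sparkoperator.k8s.io/v1beta2"
kind: SparkApplication
metadata:
  name: spark-pi
spec:
  type: Scala
  mode: cluster
  image: <your-spark-image>
  mainClass: org.apache.spark.examples.SparkPi
  mainApplicationFile: local:///opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar
  sparkVersion: "3.0.1"
```

The difference we will explore in this series boils down to who calls spark-submit: you (option 1) or the operator (option 2).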
This series of 3 articles tells the story of my experiments with both methods, and how I launch Spark applications from Python code.
“Cabin crew, arm doors and cross check”. Let’s go! ✈