Tagged "Spark"

My Journey With Spark On Kubernetes... In Python (Part 3 of 3)

We need to operate Kubernetes as part of a Python client application. So, we need to interact with the Kubernetes REST API. Luckily we do not need to implement the API calls and manage HTTP requests/responses ourselves: we can rely on the Kubernetes Python client, among other officially-supported Kubernetes client libraries for other languages such as Go, Java, .NET, JavaScript and Haskell (there are also a lot of community-maintained client libraries for many languages).

My Journey With Spark On Kubernetes... In Python (Part 2 of 3)

In the previous article, we saw how to launch Spark applications with the Spark Operator. In this article, we’ll see how to do the same thing, but natively with spark-submit. Let’s first explain the differences between the two ways of deploying your driver on the worker nodes.

My Journey With Spark On Kubernetes... In Python (Part 1 of 3)


Je vous parle d’un temps
Que les moins de vingt ans
Ne peuvent pas connaître

Until not long ago, the way to go to run Spark on a cluster was either with Spark’s own standalone cluster manager, Mesos or YARN. In the meantime, the Kingdom of Kubernetes has risen and spread widely.

And when it comes to run Spark on Kubernetes, you now have two choices:

  • Use “native” Spark’s Kubernetes capabilities: Spark can run on clusters managed by Kubernetes since Spark 2.3. Kubernetes support was still flagged as experimental until very recently, but as per SPARK-33005 Kubernetes GA Preparation, Spark on Kubernetes is now fully supported and production ready! 🎊

  • Use the Spark Operator, proposed and maintained by Google, which is still in beta version (and always will be).

This series of 3 articles tells the story of my experiments with both methods, and how I launch Spark applications from Python code.

“Cabin crew, arm doors and cross check”. Let’s go! ✈