I have seen the Config, and I will Update: Apache Spark (and its roots in Scala)

Thursday, 23 April 2026

Apache Spark (and its roots in Scala)

Apache Spark is a distributed data processing engine that sits underneath many data platforms.

It is written mainly in Scala, with some Java, and exposes APIs for Scala, Java, Python and R. The source code is in the apache/spark repository on GitHub.

A good starting point is SparkSession.scala, which defines the entry point to the Dataset and DataFrame API.
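To make that concrete, here is a minimal sketch of what creating a SparkSession looks like from the calling side. It assumes the spark-sql library is on the classpath and runs Spark in local mode; the object name and the application name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object SessionSketch {
  // Builds (or reuses) a session that runs in-process on all available cores.
  // "local[*]" is an assumption for experimenting; a cluster needs its own master URL.
  def buildSession(): SparkSession =
    SparkSession.builder()
      .appName("session-sketch") // hypothetical application name
      .master("local[*]")
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = buildSession()

    // Print the Spark version the session is running on.
    println(spark.version)

    // spark.range(n) yields a Dataset with a single "id" column holding 0 until n.
    println(spark.range(1000).count())

    spark.stop()
  }
}
```

Reading SparkSession.scala alongside a toy program like this makes it easier to follow what `builder()` and `getOrCreate()` actually do.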

One of Spark's "selling points" is "Exploratory Data Analysis (EDA) on petabyte-scale data without having to resort to downsampling" (see detailed post on downsampling).
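As a rough illustration of the difference, the sketch below computes summary statistics once over every row and once over a 1% sample. The data is synthetic (a million random numbers standing in for a real table), the names are invented, and it again assumes spark-sql on the classpath with a local master.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.rand

object EdaSketch {
  // Synthetic stand-in for a large table: `rows` rows with a random "value" column in [0, 1).
  def syntheticData(spark: SparkSession, rows: Long): DataFrame =
    spark.range(rows).withColumn("value", rand(42L))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("eda-sketch") // hypothetical application name
      .master("local[*]")
      .getOrCreate()

    val df = syntheticData(spark, 1000000L)

    // Summary statistics (count, mean, stddev, min, max) computed over every row.
    df.describe("value").show()

    // The downsampling alternative: the same statistics over roughly 1% of the rows.
    val sampled = df.sample(withReplacement = false, fraction = 0.01, seed = 7L)
    sampled.describe("value").show()

    spark.stop()
  }
}
```

On a toy dataset both calls return quickly; the point of the quoted claim is that the first form is meant to stay usable as the data grows, because Spark distributes the work across a cluster.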

A petabyte (PB) is 1,000 terabytes, or 10^15 bytes (one thousand million million bytes).
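The arithmetic is easy to check. This is a small sketch using decimal (SI) units, where each step up is a factor of 1,000; the object and constant names are my own.

```scala
object PetabyteSketch {
  // Decimal (SI) units: kilo, mega, giga, tera, each a factor of 1000.
  val BytesPerTerabyte: Long = 1000L * 1000L * 1000L * 1000L // 10^12
  val BytesPerPetabyte: Long = 1000L * BytesPerTerabyte      // 10^15

  // "One thousand million million", spelled out factor by factor.
  val ThousandMillionMillion: Long = 1000L * 1000000L * 1000000L

  def main(args: Array[String]): Unit = {
    println(BytesPerPetabyte)                           // 1000000000000000
    println(BytesPerPetabyte == ThousandMillionMillion) // true
  }
}
```

A `Long` is needed here because 10^15 is far beyond the range of a 32-bit `Int`.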
