Apache Spark is a foundational layer of many data platforms.
It is written in both Scala and Java. Read the source code here.
A good starting point is SparkSession.scala.
One of Spark's "selling points" is "Exploratory Data Analysis (EDA) on petabyte-scale data without having to resort to downsampling" (see detailed post on downsampling).
A petabyte (PB) holds 1000 terabytes (one thousand million million bytes).
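To make the unit arithmetic concrete, here is a minimal sketch in plain Python (no Spark needed). The constant and function names are my own, chosen only for illustration, and it assumes decimal (SI) units as in the definition above.

```python
# Decimal (SI) unit sizes in bytes; names are illustrative, not from any library.
TERABYTE = 10**12
PETABYTE = 10**15


def petabytes_to_terabytes(pb: float) -> float:
    """Convert a size in petabytes to terabytes using decimal units."""
    return pb * PETABYTE / TERABYTE


# "One thousand million million bytes" spelled out as a product.
assert PETABYTE == 1000 * 10**6 * 10**6

print(petabytes_to_terabytes(1))  # 1000.0
```

The same function gives a quick sense of scale for any dataset size, for example `petabytes_to_terabytes(2.5)` for a 2.5 PB dataset.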