SparkContext

To use Spark and its API, we need a SparkContext. When running Spark, you start a new Spark application by creating a SparkContext (http://spark.apache.org/docs/latest/api/python/pyspark.html#pyspark.SparkContext).
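Outside of a managed environment such as Databricks, you would typically create the context yourself. A minimal sketch, assuming a local master and a placeholder application name (both are illustrative choices, not part of the original text):

# Sketch: creating a SparkContext yourself outside Databricks.
# "local[*]" and "example-app" are placeholder values.
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local[*]").setAppName("example-app")
sc = SparkContext(conf=conf)  # only one SparkContext can be active per JVM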

When using Databricks, the SparkContext is created for you automatically as sc.

sc

Historically, Apache Spark has had two core contexts available to the user: the SparkContext and the SQLContext, made available as sc and sqlContext. These contexts expose a variety of functions and information to the user. The sqlContext provides much of the DataFrame functionality, while the sparkContext focuses more on the Apache Spark engine itself.

print(sqlContext)
print(sc)
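As a quick, hedged illustration of that split (the column names and values below are made up for the example): the sqlContext builds DataFrames, while the sparkContext works at the lower level of RDDs.

# Hypothetical example: DataFrame functionality via sqlContext,
# lower-level RDD operations via the sparkContext
df = sqlContext.createDataFrame([(1, "a"), (2, "b")], ["id", "letter"])
df.show()

rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.count())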

In Apache Spark 2.X, there is a new context - the SparkSession (https://spark.apache.org/docs/latest/api/python/pyspark.sql.html?highlight=udf#pyspark.sql.SparkSession).

We can access it via the spark variable. As the Dataset and DataFrame APIs become the new standard, the SparkSession is their new entry point.

# If you're on 2.X the spark session is made available with the variable below
spark
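If spark is not predefined (for example, outside Databricks), a SparkSession can be built explicitly; the application name below is just a placeholder. The session then serves as the entry point for DataFrame work:

# Sketch: building a SparkSession yourself when it isn't provided
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("example-app").getOrCreate()

# DataFrame operations go through the session
spark.range(5).show()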

The sparkContext is still the main entry point for the RDD API and is available as sc or spark.sparkContext.

print(spark.sparkContext)
print(sc)
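For example, a small RDD can still be created through the session's underlying context (the values here are arbitrary):

# The RDD API is reached through the SparkContext, here via the session
rdd = spark.sparkContext.parallelize(["spark", "context"])
print(rdd.collect())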
