Ways to create a SparkSession, and to set and get Spark configurations:
Creating a SparkSession can be done in several ways:
- directly using SparkSession
- SparkConf → SparkContext → SparkSession
- SparkConf → SparkSession
- SparkConf → SparkContext
Setting Spark configuration properties can also be done in several ways:
- directly on the SparkSession variable, using spark.conf.set()
- on a SparkConf variable, using conf.set()
- by passing them in the spark-submit command
You can find all of the above ways of creating a SparkSession and setting configuration properties below. The approach most widely used in real-time projects is to keep a separate file that lists all the Spark configuration properties and to create a SparkConf that sets every configuration mentioned in that file. You can find this approach described as option 2 below.
- SparkSession: SparkSession is the entry point to Spark, and creating a SparkSession instance is the first statement you write in a program. Though SparkContext used to be the entry point prior to 2.0, it has not been completely replaced by SparkSession; many features of SparkContext are still available and used in Spark 2.0 and later. You should also know that SparkSession internally creates a SparkConf and a SparkContext from the configuration provided to the SparkSession.
SparkSession also includes all the APIs available in the different contexts:
- Spark Context
- SQL Context
- Streaming Context
- Hive Context
SQL context can be created by:
scala> val sqlcontext = spark.sqlContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[1]")
  .appName("SparkByExamples.com")
  .getOrCreate()
// spark: org.apache.spark.sql.SparkSession
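Since the session wraps those older contexts, they can be reached directly from it. A minimal sketch, assuming the spark value created above:
// the SparkContext and SQLContext that the session created internally
val sc = spark.sparkContext
val sqlCtx = spark.sqlContext
println(sc.appName) // SparkByExamples.com
// Structured Streaming queries started on this session are managed via spark.streams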
1. Without using SparkConf, setting configs directly on the SparkSession
val warehouseLocation = "file:${system:user.dir}/spark-warehouse"
val spark = SparkSession
.builder()
.appName("SparkSessionZipsExample")
.config("spark.sql.warehouse.dir", warehouseLocation)
.enableHiveSupport()
.getOrCreate()
// set new runtime options
spark.conf.set("spark.sql.shuffle.partitions", 6)
spark.conf.set("spark.executor.memory", "2g")
// get all settings
val configMap: Map[String, String] = spark.conf.getAll
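Individual runtime options can be read back from the same spark.conf handle. A small sketch, assuming the session configured above (the custom key in the second call is hypothetical):
// read a runtime option back from the session
println(spark.conf.get("spark.sql.shuffle.partitions")) // 6
// a default can be supplied for keys that may not have been set (this key is just an example)
println(spark.conf.get("spark.some.custom.option", "fallback-value"))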
2. Using SparkConf
- Put the configuration in a separate file (any name, e.g. spark.conf, but the file needs to be kept in the project root) as key = value pairs.
Create a file named spark.conf and paste the below configurations:
spark.app.name = appname
spark.master = local[3]
In the code, create a method (any name works; getSparkConf is used below) that sets the configs provided in the above file, similar to loading a Properties file in Java:
import java.util.Properties
import scala.io.Source
import org.apache.spark.SparkConf
def getSparkConf(): SparkConf = {
  val conf = new SparkConf()
  val prop = new Properties()
  prop.load(Source.fromFile("spark.conf").bufferedReader())
  prop.forEach((k, v) => conf.set(k.toString, v.toString))
  conf // return the populated configuration
}
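The conf returned by this method can then be handed to the builder. A minimal usage sketch, assuming the getSparkConf method above:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder()
  .config(getSparkConf())
  .getOrCreate()
// verify that the values from the spark.conf file were picked up
println(spark.sparkContext.getConf.get("spark.app.name")) // appname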
3. Sending them through the spark-submit command
Double quotes are necessary only if the value contains a space; otherwise they are not required.
If we give configs both in the SparkConf class and in spark-submit, the SparkConf properties take the highest precedence.
Deploy-time configs can be put in spark-submit, and runtime configs can be put in SparkConf, as shown in the sketch below.
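A sketch of what such a spark-submit call can look like (the class and jar names are placeholders); note the double quotes around the --conf value that contains a space:
spark-submit \
  --master local[3] \
  --name appname \
  --conf spark.sql.shuffle.partitions=6 \
  --conf "spark.driver.extraJavaOptions=-XX:+UseG1GC -XX:+PrintGCDetails" \
  --class com.example.MyApp \
  myapp.jar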
1. SparkConf → SparkSession (setting a SparkConf and then passing it into the SparkSession)
val conf = new SparkConf()
  .setAppName("appName")
  .setMaster("local")
// (or)
val conf = new SparkConf()
conf.set("spark.app.name", "appname")
conf.set("spark.master", "local[3]")

val sparkSession = SparkSession
  .builder()
  .config(conf)
  .getOrCreate()
2. SparkConf → SparkContext → SparkSession
// If you already have SparkContext stored in `sc`
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
// Another example which builds a SparkConf, SparkContext and SparkSession
val conf = new SparkConf().setAppName("sparktest").setMaster("local[2]")
val sc = new SparkContext(conf)
val spark = SparkSession.builder.config(sc.getConf).getOrCreate()
3. SparkConf → SparkContext
val conf = new SparkConf()
.setAppName("appName")
.setMaster("local[*]")
val sc = new SparkContext(conf)
How can we get Spark configurations that are already set?
1. Get all configs
val arrayConfig = spark.sparkContext.getConf.getAll
for (conf <- arrayConfig)
  println(conf._1 + ", " + conf._2)
2. Get only a particular config
print("spark.sql.shuffle.partitions ==> " + spark.sparkContext.getConf.get("spark.sql.shuffle.partitions"))
// Displays the below value
// spark.sql.shuffle.partitions ==> 300

Feel free to drop me a mail at "msdilli1997@gmail.com" if you have any queries or are planning to make a career in data engineering. I can help you get your doubts clarified.