Apache Spark Built-in Functions

PySpark

from pyspark.sql.functions import mean
df.select(mean(df.column1)).show()

df.column1.mean()  # pandas-style call; a PySpark Column has no mean() method

All of the code can be downloaded below, and you can run it in full for free in Google Colab.

from pyspark.sql import functions
print(dir(functions))

So let's start using the Spark functions:

from pyspark.sql.functions import lower, upper, substring

If you need help with any function, you can enter:

help(substring)

If you want to display the output in upper case, you only need to select:

from pyspark.sql.functions import col
rc.select(upper(col('Primary Type'))).show(5)

from pyspark.sql.functions import min, max  # shadows Python's built-in min and max
rc.select(min(col('Date')),max(col('Date'))).show(1)

What is 3 days earlier than the oldest date and 3 days later than the most recent date?

from pyspark.sql.functions import date_sub, date_add
rc.select(date_sub(min(col('Date')),3),date_add(max(col('Date')),3)).show(1)

You can download our example here

Fellow Consulting AG
