from pyspark.sql import SparkSession
from pyspark.sql.functions import col
# Create Spark session
spark = SparkSession.builder.appName("SampleExample").getOrCreate()
# Sample data
data = [
    (1, "Alice", 5000),
    (2, "Bob", 7000),
    (3, "Charlie", 4000),
]
columns = ["id", "name", "salary"]
# Create DataFrame
df = spark.createDataFrame(data, columns)
# Apply transformation: keep only rows with salary above 4500 (Alice and Bob)
filtered_df = df.filter(col("salary") > 4500)
# Display result
filtered_df.show()
Databricks combines collaborative notebooks, distributed computing, and an optimized storage layer in a single environment for large-scale data processing. Instead of splitting data engineering, analytics, and machine learning across isolated systems, the platform brings these disciplines together so teams can experiment, validate, and deploy in one place. Because it integrates with Apache Spark, workloads scale horizontally across a cluster, while managed infrastructure reduces operational overhead. For professionals working in modern cloud ecosystems, it offers the flexibility to design ingestion pipelines, transform raw datasets into structured models, and deliver insights efficiently without maintaining complex clusters manual