Apache Spark is a data processing framework that performs processing tasks on very large data sets and distributes that work across multiple computers. It provides native bindings for the Java, Scala, Python, and R programming languages, and it supports SQL, streaming data, machine learning, and graph processing.
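As a minimal sketch of what this looks like in practice (assuming PySpark is installed locally, and using a hypothetical `people` table with made-up rows), the following builds a DataFrame and queries it with Spark SQL:

```python
from pyspark.sql import SparkSession

# Start a local Spark session; "local[*]" runs Spark in-process,
# using all available cores on this machine.
spark = (
    SparkSession.builder
    .appName("intro-example")
    .master("local[*]")
    .getOrCreate()
)

# Build a small DataFrame (hypothetical sample data) and register it
# as a temporary view so it can be queried with SQL.
df = spark.createDataFrame(
    [("Alice", 34), ("Bob", 45), ("Carol", 29)],
    ["name", "age"],
)
df.createOrReplaceTempView("people")

# Run a SQL query against the DataFrame and print the result.
spark.sql("SELECT name FROM people WHERE age > 30").show()

spark.stop()
```

The same DataFrame and SQL APIs are available from the other language bindings, so this example translates closely to Scala, Java, and R.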
Spark can run in standalone cluster mode or on a cluster manager such as Hadoop YARN or Kubernetes, and it is commonly packaged in Docker containers for those deployments.
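One consequence of this design is that the same application code can target any of these deployments by changing only the master URL. A brief sketch (the master URL values below are standard Spark forms; the hostnames and ports are placeholders):

```python
from pyspark.sql import SparkSession

# The master URL selects where the work runs:
#   "local[*]"                  -> in-process on this machine
#   "spark://host:7077"         -> a standalone Spark cluster
#   "yarn"                      -> Hadoop YARN (requires HADOOP_CONF_DIR)
#   "k8s://https://host:6443"   -> a Kubernetes cluster
spark = (
    SparkSession.builder
    .appName("deployment-example")
    .master("local[*]")  # swap this URL to change the deployment target
    .getOrCreate()
)

# Confirm which master this session is connected to.
print(spark.sparkContext.master)
spark.stop()
```

In production the master is usually supplied outside the code, via the `--master` flag of `spark-submit`, so the application itself stays deployment-agnostic.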