# star-spark

This project is heavily inspired by Airbnb's Airstream and by S2Job.

You can define a Spark job without writing a single line of Java/Scala code. All you need to do is write a job description file in JSON format.

## QuickStart

### Step 1. Write a job description file

Example: `hive_source_example.json`

```json
{
    "name": "sampleJob",
    "source": [
        {
            "name": "hiveSource",
            "inputs": [],
            "type": "hive",
            "options": {
                "sql": "select * from hive_user"
            }
        }
    ],
    "process": [
        {
            "name": "transform",
            "inputs": ["hiveSource"],
            "type": "sql",
            "options": {
                "sql": "select * from hiveSource WHERE id = 1"
            }
        }
    ],
    "sink": [
        {
            "name": "hdfs_sink",
            "inputs": ["transform"],
            "type": "hdfs",
            "options": {
                "path": "/data/test",
                "format": "parquet"
            }
        }
    ]
}
```
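The `inputs` fields chain the stages into a small DAG: `hiveSource` feeds `transform`, which feeds `hdfs_sink`. As an illustration only (this is not star-spark's actual code, and `execution_order` is a hypothetical helper), a runner could derive the execution order from those references like this:

```python
import json

# Hypothetical sketch: derive stage execution order from "inputs".
# Each stage may run only after every stage named in its "inputs" list.
JOB = json.loads("""
{
  "name": "sampleJob",
  "source": [
    {"name": "hiveSource", "inputs": [], "type": "hive",
     "options": {"sql": "select * from hive_user"}}
  ],
  "process": [
    {"name": "transform", "inputs": ["hiveSource"], "type": "sql",
     "options": {"sql": "select * from hiveSource WHERE id = 1"}}
  ],
  "sink": [
    {"name": "hdfs_sink", "inputs": ["transform"], "type": "hdfs",
     "options": {"path": "/data/test", "format": "parquet"}}
  ]
}
""")

def execution_order(job):
    """Topologically sort stages so producers run before consumers."""
    remaining = job["source"] + job["process"] + job["sink"]
    done, order = set(), []
    while remaining:
        # Stages whose inputs have all been produced are ready to run.
        ready = [s for s in remaining if all(i in done for i in s["inputs"])]
        if not ready:
            raise ValueError("cycle or undefined input in job description")
        for stage in ready:
            order.append(stage["name"])
            done.add(stage["name"])
        remaining = [s for s in remaining if s["name"] not in done]
    return order

print(execution_order(JOB))  # → ['hiveSource', 'transform', 'hdfs_sink']
```

The sort also catches mistakes early: a typo in an `inputs` entry, or two stages that reference each other, raises an error before anything is submitted to Spark.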

### Step 2. Run it with Spark

```shell
spark-submit --class com.easternallstars.star.spark.job.JobLauncher --master yarn --deploy-mode client star-spark.jar hive_source_example.json
```

Have a cup of coffee and wait until the job is over.

You can also upload the job description file and the jar to HDFS, and let Oozie schedule the Spark job.
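For the Oozie route, a minimal workflow sketch using Oozie's Spark action might look like the following. The HDFS paths, application name, and schema versions here are placeholders, not values shipped with star-spark; adjust them to your cluster.

```xml
<workflow-app name="star-spark-sample" xmlns="uri:oozie:workflow:0.5">
    <start to="run-star-spark"/>
    <action name="run-star-spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>client</mode>
            <name>sampleJob</name>
            <class>com.easternallstars.star.spark.job.JobLauncher</class>
            <!-- Placeholder HDFS locations for the uploaded jar and job file -->
            <jar>${nameNode}/apps/star-spark/star-spark.jar</jar>
            <arg>${nameNode}/apps/star-spark/hive_source_example.json</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>star-spark job failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Pairing this workflow with an Oozie coordinator then gives you recurring (e.g. daily) runs of the same job description.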