# star-spark

This project is heavily inspired by Airbnb's Airstream and by S2Job.

You can define a Spark job without writing a single line of Java/Scala code. All you need to do is write a job description file in JSON format.

## QuickStart

### Step 1. Write a job description file

Example: `hive_source_example.json`

```json
{
    "name": "sampleJob",
    "source": [
        {
            "name": "hiveSource",
            "inputs": [],
            "type": "hive",
            "options": {
                "sql": "select * from hive_user"
            }
        }
    ],
    "process": [
        {
            "name": "transform",
            "inputs": ["hiveSource"],
            "type": "sql",
            "options": {
                "sql": "select * from hiveSource WHERE id = 1"
            }
        }
    ],
    "sink": [
        {
            "name": "hdfs_sink",
            "inputs": ["transform"],
            "type": "hdfs",
            "options": {
                "path": "/data/test",
                "format": "parquet"
            }
        }
    ]
}
```
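The `inputs` fields chain the stages into a small DAG: `hiveSource` feeds `transform`, which feeds `hdfs_sink`. As an illustration only (this is not star-spark's actual code, and `execution_order` is a hypothetical helper), a runner could derive the execution order from those references like this:

```python
import json

# Hypothetical sketch: derive stage execution order from "inputs".
# Each stage may run only after every stage named in its "inputs" list.
JOB = json.loads("""
{
  "name": "sampleJob",
  "source": [
    {"name": "hiveSource", "inputs": [], "type": "hive",
     "options": {"sql": "select * from hive_user"}}
  ],
  "process": [
    {"name": "transform", "inputs": ["hiveSource"], "type": "sql",
     "options": {"sql": "select * from hiveSource WHERE id = 1"}}
  ],
  "sink": [
    {"name": "hdfs_sink", "inputs": ["transform"], "type": "hdfs",
     "options": {"path": "/data/test", "format": "parquet"}}
  ]
}
""")

def execution_order(job):
    """Topologically sort stages so producers run before consumers."""
    remaining = job["source"] + job["process"] + job["sink"]
    done, order = set(), []
    while remaining:
        # Stages whose inputs have all been produced are ready to run.
        ready = [s for s in remaining if all(i in done for i in s["inputs"])]
        if not ready:
            raise ValueError("cycle or undefined input in job description")
        for stage in ready:
            order.append(stage["name"])
            done.add(stage["name"])
        remaining = [s for s in remaining if s["name"] not in done]
    return order

print(execution_order(JOB))  # → ['hiveSource', 'transform', 'hdfs_sink']
```

The sort also catches mistakes early: a typo in an `inputs` entry, or two stages that reference each other, raises an error before anything is submitted to Spark.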

### Step 2. Run it with Spark

```shell
spark-submit --class com.easternallstars.star.spark.job.JobLauncher --master yarn --deploy-mode client star-spark.jar hive_source_example.json
```

Have a cup of coffee and wait until the job is over.

You can also upload the job description file and the jar to HDFS, and let Oozie schedule the Spark job.
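For the Oozie route, a minimal workflow sketch using Oozie's Spark action might look like the following. The HDFS paths, application name, and schema versions here are placeholders, not values shipped with star-spark; adjust them to your cluster.

```xml
<workflow-app name="star-spark-sample" xmlns="uri:oozie:workflow:0.5">
    <start to="run-star-spark"/>
    <action name="run-star-spark">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <master>yarn</master>
            <mode>client</mode>
            <name>sampleJob</name>
            <class>com.easternallstars.star.spark.job.JobLauncher</class>
            <!-- Placeholder HDFS locations for the uploaded jar and job file -->
            <jar>${nameNode}/apps/star-spark/star-spark.jar</jar>
            <arg>${nameNode}/apps/star-spark/hive_source_example.json</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>star-spark job failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Pairing this workflow with an Oozie coordinator then gives you recurring (e.g. daily) runs of the same job description.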