Scala library and abstractions for AWS DataPipeline

hoangelos/hyperion

 
 

Hyperion

In Starcraft, the Hyperion is a Behemoth-class battlecruiser. During the Second Great War, Raynor's Raiders made strategic decisions on the Hyperion's bridge -- the battlecruiser's command center.

A Scala library and set of abstractions for AWS Data Pipeline.

Problem Statement

This project aims to solve the following problem:

  1. Make it easy to define an AWS DataPipeline using a clear, fluent Scala DSL
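
To make the idea of a fluent DSL concrete, here is a minimal, self-contained toy sketch. It is not Hyperion's API: the `Activity` case class, its `after` method, and `ToyPipeline.render` are hypothetical names invented for illustration. It only shows the general pattern of chaining immutable builders and rendering them into the `objects` / `{"ref": ...}` shape that AWS Data Pipeline definition files use.

```scala
// Toy illustration only: these types are hypothetical and are NOT Hyperion's API.
final case class Activity(
    id: String,
    command: String,
    dependsOn: Seq[String] = Seq.empty
) {
  // Fluent step: returns a modified copy instead of mutating state.
  def after(others: Activity*): Activity =
    copy(dependsOn = dependsOn ++ others.map(_.id))

  // Render one pipeline object. Dependencies are emitted as {"ref": "<id>"} entries.
  // No JSON escaping is done here, so ids and commands must not contain quotes.
  def toJson: String = {
    val deps =
      if (dependsOn.isEmpty) ""
      else dependsOn.map(d => s"""{"ref":"$d"}""").mkString(""","dependsOn":[""", ",", "]")
    s"""{"id":"$id","name":"$id","type":"ShellCommandActivity","command":"$command"$deps}"""
  }
}

object ToyPipeline {
  // Wrap all activities in the top-level "objects" array.
  def render(activities: Seq[Activity]): String =
    activities.map(_.toJson).mkString("""{"objects":[""", ",", "]}")

  def main(args: Array[String]): Unit = {
    val extract = Activity("extract", "echo extract")
    val load    = Activity("load", "echo load").after(extract)
    println(render(Seq(extract, load)))
  }
}
```

The real library covers far more (schedules, resources, typed activities); refer to ExampleSpark and the Scaladoc for the actual DSL.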

Configuration

Add the Sonatype.org Releases repo as a resolver in your build.sbt or Build.scala as appropriate.

resolvers += "Sonatype.org Releases" at "https://oss.sonatype.org/content/repositories/releases/"

Add hyperion as a dependency in your build.sbt or Build.scala as appropriate.

libraryDependencies ++= Seq(
  // Other dependencies ...
  "com.krux" %% "hyperion" % "3.5.1"
)

Scala Versions

This project is compiled, tested, and published for the following Scala versions:

  1. 2.10.6
  2. 2.11.7
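
Because the dependency is declared with `%%`, sbt appends your project's Scala binary version to the artifact name, so your build needs to use one of the versions listed above. A minimal build.sbt sketch, assuming you want 2.11.7 as the default and optionally cross-build your own pipeline project against both:

```scala
// build.sbt (sketch): use a Scala version that Hyperion is published for.
scalaVersion := "2.11.7"

// Optional assumption: cross-build your own project for both published versions,
// e.g. by running `sbt +compile`.
crossScalaVersions := Seq("2.10.6", "2.11.7")
```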

Usage

Creating a pipeline

To create a new pipeline, create a Scala class in com.krux.datapipeline.pipelines. Look at ExampleSpark for an example pipeline.

Manually uploading

To generate a JSON file describing the pipeline, ensure you have created the assembly:

$ sbt assembly

Then, run hyperion with the class name (specify the external jar location if it's not in the classpath):

$ ./hyperion [-jar your-jar-implementing-pipelines.jar] your.pipelines.ThePipeline generate > ThePipeline.json

Then go to the AWS Data Pipeline Management Console, click Create new pipeline, enter the class name for Name, click Import a definition, and select Load local file. Finally, click Activate.

Automatically uploading

To create a pipeline automatically, ensure you have created the assembly:

$ sbt assembly

Then, run hyperion with create and the class name:

$ ./hyperion [-jar your-jar-implementing-pipelines.jar] your.pipelines.ThePipeline create

This will use the DataPipeline API to create the pipeline and put the pipeline definition.

Activating a pipeline

You can activate a pipeline in three ways: in the Data Pipeline Management Console, by passing the --activate option to the create command, or by running the activate command with the pipeline ID:

$ ./hyperion activate df-1234567890

Scaladoc API

The Scaladoc API for this project can be found here.

License

Hyperion is licensed under APL 2.0.

Note

Due to a bug in AWS Data Pipeline, all schemas involved in a data pipeline need to be available in the default search_path.

For more details: https://forums.aws.amazon.com/thread.jspa?threadID=166340
