Example project showing how to use Microsoft DocumentDB Input/Output formats with Apache Spark.
In this example we write a few thousand documents to a remote DocumentDB instance using Hadoop output format classes, read them back using the corresponding input format classes, and verify that the written data is correct.
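For orientation, the snippet below is a minimal Spark sketch of the read-back step. It assumes the DocumentDB Hadoop connector exposes DocumentDBInputFormat and DocumentDBWritable (keyed by LongWritable) and the configuration key names shown in the comments; the classes and keys actually used by SaveToDocumentDbTest may differ.

    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.io.LongWritable
    import org.apache.spark.{SparkConf, SparkContext}
    import com.microsoft.azure.documentdb.hadoop.{DocumentDBInputFormat, DocumentDBWritable}

    object ReadBackSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("documentdb-readback").setMaster("local[*]"))

        // Hadoop configuration for the DocumentDB connector.
        // NOTE: the key names below are assumptions about the connector's
        // configuration constants, not values confirmed by this project.
        val conf = new Configuration()
        conf.set("DocumentDB.endpoint", sys.props("azure.documentdb.host"))
        conf.set("DocumentDB.key", sys.props("azure.documentdb.key"))
        conf.set("DocumentDB.db", sys.props("azure.documentdb.dbname"))
        conf.set("DocumentDB.inputCollections", sys.props("azure.documentdb.collection.output"))

        // Read the collection back as (id, document) pairs and count the documents.
        val docs = sc.newAPIHadoopRDD(conf, classOf[DocumentDBInputFormat],
          classOf[LongWritable], classOf[DocumentDBWritable])
        println(s"Documents read back: ${docs.count()}")

        sc.stop()
      }
    }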
Prerequisites:
- Maven 3.3.x
- Scala 2.10.4 installed and available in the PATH.
- A DocumentDB instance deployed in Microsoft Azure.
Build the project as:
mvn clean package
Run the example project as:
java -cp target/com.keedio.azure.example-1.0-SNAPSHOT.jar -Dazure.documentdb.host="https://<personal_documentdb_domain>.documents.azure.com:443/" -Dazure.documentdb.key="<documentdb_primary_key>" -Dazure.documentdb.dbname=<example_documentdb_databasename> -Dazure.documentdb.collection.output="<test_output_collection>" com.keedio.azure.documentdb.example.SaveToDocumentDbTest
Property explanation:
azure.documentdb.host
: the DocumentDB instance URI as provided in the Azure portal.

azure.documentdb.key
: the DocumentDB instance primary key, as provided in the Azure portal.

azure.documentdb.dbname
: the name of the database created in the DocumentDB instance.

azure.documentdb.collection.output
: the output collection to write the example data to.
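These -D flags arrive in the example as plain JVM system properties. As a rough illustration (the object name below is hypothetical and not part of this project), they can be read and validated before any Spark work starts:

    object PropertyCheckSketch {
      // Required system properties, matching the -D flags on the java command line above.
      private val required = Seq(
        "azure.documentdb.host",
        "azure.documentdb.key",
        "azure.documentdb.dbname",
        "azure.documentdb.collection.output")

      def main(args: Array[String]): Unit = {
        // A property counts as missing if it is absent or blank.
        val missing = required.filter(name => sys.props.get(name).forall(_.trim.isEmpty))
        require(missing.isEmpty, s"Missing required system properties: ${missing.mkString(", ")}")
        required.foreach(name => println(s"$name = ${sys.props(name)}"))
      }
    }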