
Commit

Formatted ReadMe files
Anand Chandak committed May 21, 2021
1 parent dbba461 commit a6ff6ff
Showing 6 changed files with 161 additions and 61 deletions.
59 changes: 45 additions & 14 deletions CONTRIBUTING.md
@@ -1,24 +1,55 @@
# Contributing to oci-dataflow-samples
# Contributing to this repository

*Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.*
We welcome your contributions! There are multiple ways to contribute.

Pull requests can be made under
[The Oracle Contributor Agreement](https://www.oracle.com/technetwork/community/oca-486395.html)
(OCA).
## Opening issues

For pull requests to be accepted, the bottom of
your commit message must have the following line using your name and
e-mail address as it appears in the OCA Signatories list.
For bugs or enhancement requests, please file a GitHub issue unless it's
security related. When filing a bug remember that the better written the bug is,
the more likely it is to be fixed. If you think you've found a security
vulnerability, do not raise a GitHub issue and follow the instructions in our
[security policy](./SECURITY.md).

```
## Contributing code

We welcome your code contributions. Before submitting code via a pull request,
you will need to have signed the [Oracle Contributor Agreement][OCA] (OCA) and
your commits need to include the following line using the name and e-mail
address you used to sign the OCA:

```text
Signed-off-by: Your Name <[email protected]>
```

This can be automatically added to pull requests by committing with:
This can be automatically added to pull requests by committing with `--sign-off`
or `-s`, e.g.

```
```text
git commit --signoff
````
```

Only pull requests from committers that can be verified as having signed the OCA
can be accepted.

## Pull request process

1. Ensure there is an issue created to track and discuss the fix or enhancement
you intend to submit.
1. Fork this repository
1. Create a branch in your fork to implement the changes. We recommend using
the issue number as part of your branch name, e.g. `1234-fixes`
1. Ensure that any documentation is updated to reflect your change.
1. Ensure that any samples are updated if the base image has been changed.
1. Submit the pull request. *Do not leave the pull request blank*. Explain exactly
what your changes are meant to do and provide simple steps on how to validate
your changes. Ensure that you reference the issue you created as well.
1. We will assign the pull request to 2-3 people for review before it is merged.

## Code of conduct

Follow the [Golden Rule](https://en.wikipedia.org/wiki/Golden_Rule). If you'd
like more specific guidelines, see the [Contributor Covenant Code of Conduct][COC].

Only pull requests from committers that can be verified as having
signed the OCA can be accepted.
[OCA]: https://oca.opensource.oracle.com
[COC]: https://www.contributor-covenant.org/version/1/4/code-of-conduct/
48 changes: 35 additions & 13 deletions README.md
@@ -1,37 +1,59 @@
# Oracle Cloud Infrastructure Data Flow Samples

This repository provides examples demonstrating how to use Oracle Cloud Infrastructure Data Flow, a service that lets you run any Apache Spark Application at any scale with no infrastructure to deploy or manage.

This repository provides examples demonstrating how to use Oracle Cloud Infrastructure Data Flow.
## What is Oracle Cloud Infrastructure Data Flow

## Setup
* [Quick start](https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm)
Data Flow is a cloud-based serverless platform with a rich user interface. It allows Spark developers and data scientists to create, edit, and run Spark jobs at any scale without the need for clusters, an operations team, or highly specialized Spark knowledge. Being serverless means there is no infrastructure for you to deploy or manage. It is entirely driven by REST APIs, giving you easy integration with applications or workflows. You can:

* Connect to Apache Spark data sources.

## How To
| Description | Python |
|------------------------------------------------------|:------:|
| CSV to Parquet |[sample](./python/csv_to_parquet)|
| Load to ADW |[sample](./python/loadadw)|
* Create reusable Apache Spark applications.

* Launch Apache Spark jobs in seconds.

For step-by-step instructions, see the README.txt files included with
* Manage all Apache Spark applications from a single platform.

* Process data in the Cloud or on-premises in your data center.

* Create Big Data building blocks that you can easily assemble into advanced Big Data applications.
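
Because the service is driven by REST APIs, you can also work with it programmatically. The snippet below is a minimal sketch using the OCI SDK for Python, assuming the `oci` package is installed and a default config profile exists; the compartment OCID is a placeholder.

```python
import oci

# Assumes ~/.oci/config contains a valid [DEFAULT] profile.
config = oci.config.from_file()
client = oci.data_flow.DataFlowClient(config)

# List the Data Flow applications in a compartment (placeholder OCID).
response = client.list_applications(
    compartment_id="ocid1.compartment.oc1..<unique_id>")
for app in response.data:
    print(app.display_name, app.id)
```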

## Before you Begin

* You must have set up your tenancy and be able to access Data Flow.

* Set up your tenancy: Before Data Flow can run, you must grant permissions that allow effective log capture and run management. See the [Set Up Administration](https://docs.oracle.com/iaas/data-flow/using/dfs_getting_started.htm#set_up_admin) section of the Data Flow Service Guide and follow the instructions there.
* Access Data Flow: See [Access Data Flow](https://docs.oracle.com/en-us/iaas/data-flow/data-flow-tutorial/getting-started/dfs_tut_get_started.htm#access_ui) for how to reach the service.

## Samples

| Example | Description | Python |
|-------------------|:-----------:|:------:|
| CSV to Parquet |This application shows how to use PySpark to convert CSV data stored in OCI Object Store to Apache Parquet format, which is then written back to Object Store. |[sample](./python/csv_to_parquet)|
| Load to ADW |This application shows how to read a file from OCI Object Store, perform some transformations, and write the results to an Autonomous Data Warehouse instance. |[sample](./python/loadadw)|

For step-by-step instructions, see the README files included with
each sample.

## Running the Samples:
## Running the Samples

These samples show how to use the OCI Data Flow service and are meant
to be deployed to and run from Oracle Cloud. You can optionally test
these applications locally before you deploy them. To test these
applications locally, Apache Spark needs to be installed.
these applications locally before you deploy them. When they are ready, you can deploy them to Data Flow without reconfiguring them, making code changes, or applying deployment profiles. To test these applications locally, Apache Spark must be installed. See [Set up locally](https://docs.oracle.com/en-us/iaas/data-flow/data-flow-tutorial/develop-apps-locally/front.htm) for the prerequisites to run an application locally.

## Install Spark

To install Spark, visit [spark.apache.org](https://spark.apache.org/docs/latest/api/python/getting_started/index.html)
and pick the installation path that best suits your environment.
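
Once Spark is installed, a quick sanity check is to start a local PySpark session. This sketch assumes the `pyspark` package is available on your machine and is not specific to any one sample.

```python
from pyspark.sql import SparkSession

# Start a local Spark session to confirm the installation works.
spark = SparkSession.builder \
    .appName("local-smoke-test") \
    .master("local[*]") \
    .getOrCreate()

print(spark.version)
spark.stop()
```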


## Documentation

You can find the online documentation for OCI Data Flow at [docs.oracle.com](https://docs.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm).

## Security

See [Security](./SECURITY.md)

## Contributing

See [CONTRIBUTING](./CONTRIBUTING.md)
39 changes: 39 additions & 0 deletions SECURITY.md
@@ -0,0 +1,39 @@
# Reporting security vulnerabilities

Oracle values the independent security research community and believes that
responsible disclosure of security vulnerabilities helps us ensure the security
and privacy of all our users.

Please do NOT raise a GitHub Issue to report a security vulnerability. If you
believe you have found a security vulnerability, please submit a report to
<mailto:[email protected]> preferably with a proof of concept. Please review
some additional information on [how to report security vulnerabilities to Oracle][1].
We encourage people who contact Oracle Security to use email encryption using
[our encryption key][2].

We ask that you do not use other channels or contact the project maintainers
directly.

Non-vulnerability related security issues including ideas for new or improved
security features are welcome on GitHub Issues.

## Security updates, alerts and bulletins

Security updates will be released on a regular cadence. Many of our projects
will typically release security fixes in conjunction with the
[Oracle Critical Patch Update][3] program. Security updates are released on the
Tuesday closest to the 17th day of January, April, July and October. A pre-release
announcement will be published on the Thursday preceding each release. Additional
information, including past advisories, is available on our [security alerts][3]
page.

## Security-related information

We will provide security related information such as a threat model, considerations
for secure use, or any known security issues in our documentation. Please note
that labs and sample code are intended to demonstrate a concept and may not be
sufficiently hardened for production use.

[1]: https://www.oracle.com/corporate/security-practices/assurance/vulnerability/reporting.html
[2]: https://www.oracle.com/security-alerts/encryptionkey.html
[3]: https://www.oracle.com/security-alerts/
27 changes: 15 additions & 12 deletions python/csv_to_parquet/README.md
@@ -1,18 +1,19 @@
# Convert CSV data to Parquet.
Sample to convert CSV data to Parquet.
# Convert CSV data to Parquet

The most common first step in data processing applications is to take data from some source and get it into a format that is suitable for reporting and other forms of analytics. In a database, you would load a flat file into the database and create indexes. In Spark, your first step is usually to clean and convert data from a text format into Parquet format. Parquet is an optimized binary format supporting efficient reads, making it ideal for reporting and analytics.

## Prerequisites
Before you begin:
![Convert CSV Data to Parquet](./images/csv_to_parquet.png)

* A - Ensure your tenant is configured according to the instructions [here](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#set_up_admin)
* B - Know your object store namespace.
* C - Know the OCID of a compartment where you want to load your data and create applications.
* D - (Optional, strongly recommended): Install Spark to test your code locally before deploying.
## Prerequisites

Before you begin:

* Ensure your tenant is configured according to the [set up admin](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#set_up_admin) instructions.
* Know your object store namespace.
* Know the OCID of a compartment where you want to load your data and create applications.
* (Optional, strongly recommended): Install Spark to test your code locally before deploying.

## Instructions:
## Instructions

1. Upload a sample CSV file to object store
2. Customize csv_to_parquet.py with the OCI path to your CSV data. The format is ```oci://<bucket>@<namespace>/path```
@@ -26,11 +27,11 @@ Before you begin:
7. Create a Python Data Flow application pointing to ```csv_to_parquet.py```
7a. Refer [here](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow_library.htm#create_pyspark_app)
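
The steps above customize ```csv_to_parquet.py``` with OCI Object Storage paths of the form ```oci://<bucket>@<namespace>/path```. As a rough illustration only (the actual sample may differ), the core conversion in PySpark looks something like this, with placeholder bucket and namespace values:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("csv_to_parquet").getOrCreate()

# Placeholder paths; replace <bucket> and <namespace> with your own values.
INPUT_PATH = "oci://<bucket>@<namespace>/input.csv"
OUTPUT_PATH = "oci://<bucket>@<namespace>/output_parquet"

# Read the CSV from Object Storage and write it back as Parquet.
df = spark.read.option("header", "true").csv(INPUT_PATH)
df.write.mode("overwrite").parquet(OUTPUT_PATH)
```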


## To use OCI CLI to run the PySpark Application

Create a bucket. Alternatively, you can reuse an existing bucket.
```

```sh
oci os bucket create --name <bucket> --compartment-id <compartment_ocid>
oci os object put --bucket-name <bucket> --file csv_to_parquet.py
oci data-flow application create \
@@ -43,8 +44,10 @@ oci data-flow application create \
--file-uri oci://<bucket>@<namespace>/csv_to_parquet.py \
--language Python
```

Make note of the Application ID produced.
```

```sh
oci data-flow run create \
--compartment-id <compartment_ocid> \
--application-id <application_ocid> \
Binary file added python/csv_to_parquet/images/csv_to_parquet.png
49 changes: 27 additions & 22 deletions python/loadadw/README.md
@@ -1,7 +1,9 @@
# Overview

This example shows you how to use OCI Data Flow to process data in OCI Object Store and save the results to Oracle ADW or ATP.

## Prerequisites

Before you begin:

1. Ensure your tenant is configured for Data Flow by following [instructions](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_getting_started.htm#set_up_admin)
@@ -13,49 +15,52 @@ Before you begin:
* Extract the driver into a directory called ojdbc.
6. (Optional, strongly recommended): Install Spark to test your code locally before deploying to Data Flow.

## Load Required Data:
## Load Required Data

Upload a sample CSV file to OCI object store.

## Application Setup:
## Application Setup

Customize ```loadadw.py``` with:
* Set INPUT_PATH to the OCI path of your CSV data.
* Set PASSWORD_SECRET_OCID to the OCID of the secret created during Required Setup.
* Set TARGET_TABLE to the table in ADW where data is to be written.
* Set TNSNAME to a TNS name valid for the database.
* Set USER to the user who generated the wallet file.
* Set WALLET_PATH to the path on object store for the wallet.

* Set INPUT_PATH to the OCI path of your CSV data.
* Set PASSWORD_SECRET_OCID to the OCID of the secret created during Required Setup.
* Set TARGET_TABLE to the table in ADW where data is to be written.
* Set TNSNAME to a TNS name valid for the database.
* Set USER to the user who generated the wallet file.
* Set WALLET_PATH to the path on object store for the wallet.
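
As an illustration of the settings above, the top of ```loadadw.py``` might contain a configuration block along these lines; all values shown are placeholders, and the real sample may organize them differently:

```python
# Placeholder values; substitute your own before running.
INPUT_PATH = "oci://<bucket>@<namespace>/input.csv"            # OCI path of your CSV data
PASSWORD_SECRET_OCID = "ocid1.vaultsecret.oc1..<unique_id>"    # secret created during Required Setup
TARGET_TABLE = "PROCESSED_DATA"                                # ADW table to write to
TNSNAME = "<dbname>_high"                                      # TNS name valid for the database
USER = "ADMIN"                                                 # user who generated the wallet file
WALLET_PATH = "oci://<bucket>@<namespace>/Wallet_<dbname>.zip" # Object Storage path of the wallet
```

In the sample, values like these feed the Spark connection that writes the transformed data to ADW.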

Test the Application Locally (recommended):
You can test the application locally using spark-submit:

```
```bash
spark-submit --jars ojdbc/ojdbc8.jar,ojdbc/ucp.jar,ojdbc/oraclepki.jar,ojdbc/osdt_cert.jar,ojdbc/osdt_core.jar loadadw.py
```

## Packaging your Application:
## Packaging your Application

1. Create the Data Flow Dependencies Archive as follows:
```
* Create the Data Flow Dependencies Archive as follows:

```bash
docker pull phx.ocir.io/oracle/dataflow/dependency-packager:latest
docker run --rm -v $(pwd):/opt/dataflow -it phx.ocir.io/oracle/dataflow/dependency-packager:latest
```
2. Confirm you have a file named **archive.zip** with the Oracle JDBC driver in it.
```

## Deploy and Run the Application:
* Confirm you have a file named **archive.zip** with the Oracle JDBC driver in it.

1. Copy loadadw.py to object store.
2. Copy archive.zip to object store.
3. Create a Data Flow Python application. Be sure to include archive.zip as the dependency archive.
* Refer [here](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow_library.htm#create_pyspark_app) for more information.
4. Run the application.
## Deploy and Run the Application

* Copy loadadw.py to object store.
* Copy archive.zip to object store.
* Create a Data Flow Python application. Be sure to include archive.zip as the dependency archive.
* Refer [here](https://docs.cloud.oracle.com/en-us/iaas/data-flow/using/dfs_data_flow_library.htm#create_pyspark_app) for more information.
* Run the application.

# Deploy and Run the Application using OCI Cloud Shell or OCI CLI
## Run the Application using OCI Cloud Shell or OCI CLI

Create a bucket. Alternatively, you can reuse an existing bucket.
```

```sh
oci os object put --bucket-name <bucket> --file loadadw.py
oci os object put --bucket-name <bucket> --file archive.zip
oci data-flow application create \
