
Commit

Merge pull request #7 from rghosh9/main
Generate Regulatory documents for Compliance submission Documents with OCI Generative AI
anacoman11 authored Aug 21, 2024
2 parents 0496c09 + 9a6d22c commit eb99ee8
Showing 289 changed files with 931 additions and 35,752 deletions.
@@ -0,0 +1,155 @@
# OCI Data Science setup

## Introduction

In this lab, you will set up the OCI Data Science environment needed to develop code, make API calls, and customize and automate the generation of compliance documents.

Estimated Lab Time: -- 10 minutes

### Oracle Cloud Data Science

OCI Data Science is a managed, JupyterLab-based Python notebook environment for developing and deploying machine learning and AI models, including Generative AI applications. The service supports both CPU and GPU infrastructure and can access OCI lakehouse storage and services such as Object Storage, Autonomous Database, Data Flow, and Data Catalog.

### Objectives

In this lab, you will:

* Deploy a pre-built LangChain-based conda environment
* Download and install the required pip libraries
* Install the Compliance Document Generation notebooks
* Test connectivity with OCI OpenSearch services
* Configure OCI CLI connectivity to OCI Object Storage
* Test connectivity to OCI Generative AI services

### Prerequisites

This lab assumes you have:

* An Oracle Cloud account with admin privileges in the Chicago region
* A running Data Science notebook session
* A running OCI OpenSearch service

## Task 1: Deploy a pre-built LangChain conda environment

1. From the Launcher (File --> New Launcher if needed), click **Environment Explorer** to view the list of conda environments.
![Install pre-built conda](images/lab3-ds-cnd-1.png)

2. Filter the conda environments to show those that contain the LangChain libraries, and select the one marked below.
![Install pre-built conda](images/lab3-ds-cnd-2.png)

3. Copy the install command shown below to run in a terminal session.
![Install pre-built conda](images/lab3-ds-cnd-2-1.png)

4. Open a terminal session from the Launcher, as shown.
![Install pre-built conda](images/lab3-ds-cnd-3.png)

5. Paste and run the ***odsc conda install -s pytorch21_p39_gpu_v1*** command, as shown. Installing the conda environment may take a few minutes; make sure the installation completes successfully, as shown.
![Install pre-built conda](images/lab3-ds-cnd-4.png)

## Task 2: Download and install required pip libraries

1. Navigate to the */home/datascience/conda* directory. You will create and run all notebooks for the workshop in this directory.
![Install pip libraries](images/lab3-ds-note-1.png)

2. Create a new notebook
![Install pip libraries](images/lab3-ds-note-2.png)

3. Change the kernel to the installed conda environment
![Install pip libraries](images/lab3-ds-note-3.png)

4. Copy the following commands into a notebook cell and press *Shift+Enter* to run the cell and install the pip libraries.

```bash
!pip install langchain
!pip install langchain_community
!pip install opensearch-py
!pip install sentence-transformers
!pip install tabulate
!pip install pypdf
!pip install fillpdf
```

![Install pip libraries](images/lab3-ds-note-4.png)

NOTE: Some of these libraries may already be installed in the environment; if so, you can skip them. pip may also report incompatibilities with other libraries in the pre-built conda environment; you can ignore these warnings. To skip a library, comment out its line, as shown below.

![Install pip libraries](images/lab3-ds-note-5.png)
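To decide which lines to comment out, you can check what the pre-built conda environment already provides. The following is a minimal sketch using only Python's standard `importlib.metadata`; the package list mirrors the pip commands above, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names taken from the pip install commands above
REQUIRED = [
    "langchain",
    "langchain_community",
    "opensearch-py",
    "sentence-transformers",
    "tabulate",
    "pypdf",
    "fillpdf",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:25} {version or 'not installed'}")
```

Any package that already reports a version is a candidate for commenting out in the install cell.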

## Task 3: Install Workshop Compliance Document Generation code

1. Download the [LAB-3 Conda zip](https://orasenatdpltintegration03.objectstorage.us-chicago-1.oci.customer-oci.com/p/SfhRh7OEvLj9yR0hAIM3BwT7bCpi3jALfP6NqoCODU7mFe51nl1PeBPWcJj2El9K/n/orasenatdpltintegration03/b/clinical-trials/o/conda.zip) and upload it to the home directory */home/datascience* in the notebook session, as shown below. Alternatively, download it directly from a Data Science terminal session with *wget <download link>*.
![Install lab notebooks](images/lab3-ds-note-6.png)

2. Open a terminal session and run *unzip conda.zip* in */home/datascience*, as shown below.
![Install lab notebooks](images/lab3-ds-note-7.png)
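If *unzip* is unavailable, or you prefer to extract from a notebook cell, a minimal standard-library sketch with `zipfile` is shown below. The helper name `extract_workshop_zip` is hypothetical, and the path assumes the zip was uploaded to */home/datascience* as in step 1.

```python
import zipfile
from pathlib import Path


def extract_workshop_zip(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted top-level entries."""
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(Path(dest_dir))
        # First path component of each member, e.g. "conda" for "conda/scripts/x.sh"
        top_level = {name.split("/", 1)[0] for name in archive.namelist()}
    return sorted(top_level)


if __name__ == "__main__":
    home = Path("/home/datascience")  # assumption: notebook session home directory
    zip_file = home / "conda.zip"
    if zip_file.exists():
        print(extract_workshop_zip(zip_file, home))
    else:
        print(f"{zip_file} not found; upload or download it first")
```

Only extract archives you trust this way, since `extractall` writes every member to disk.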

## Task 4: Test connectivity with OCI Opensearch services

1. Copy the OpenSearch API endpoint URL from the OCI console.
![Test OpenSearch Access](images/lab3-ds-os-1.png)

2. In a Data Science terminal window, change to the scripts directory with *cd /home/datascience/conda/scripts* and run the following command. A successful connection displays a JSON response like the one shown below.

```bash
# -k skips TLS certificate verification; -u sends basic-auth credentials
curl -k -u <os_userid>:<os_password> <os_api_endpoint>:9200
```

![Test Opensearch Access](images/lab2-ds-os-2.png)
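The same check can also be run from a notebook cell. The sketch below reproduces what the curl command does (HTTP basic auth against the cluster root, with certificate verification disabled like `-k`) using only the Python standard library. The function names are hypothetical, and the `https://` scheme on the endpoint is an assumption; disabling verification is only appropriate for a lab connectivity test like this one.

```python
import base64
import json
import ssl
import urllib.request


def build_request(endpoint, user, password):
    """Build a GET request for the OpenSearch root URL with HTTP basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        endpoint.rstrip("/") + "/",
        headers={"Authorization": f"Basic {token}"},
    )


def check_opensearch(endpoint, user, password, timeout=10):
    """Return the cluster info JSON; like curl -k, TLS verification is disabled."""
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    request = build_request(endpoint, user, password)
    with urllib.request.urlopen(request, context=context, timeout=timeout) as resp:
        return json.load(resp)


# Example with the placeholders from the curl command:
# info = check_opensearch("https://<os_api_endpoint>:9200", "<os_userid>", "<os_password>")
# print(info["version"]["number"])
```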

## Task 5: Configure OCI CLI Connectivity to Object Storage and Generative AI

1. Get your user OCID and tenancy OCID from the console, as shown below.
![Get user and tenancy OCIDs](images/lab3-ds-cli-1.png)
![Get user and tenancy OCIDs](images/lab3-ds-cli-2.png)
![Get user and tenancy OCIDs](images/lab3-ds-cli-3.png)

2. Open a terminal window and run *oci os ns get*. Because the CLI is not configured yet, it prompts you to create a config file. Enter the values as follows, where *Enter* means press Enter to accept the default:

```text
Do you want to create a new config file ? Y
Create logging through a browser? n
Location of your config: Enter
Enter user OCID : <copied from console in previous step>
Enter Tenancy OCID : <copied from console in previous step>
Region by index or name : us-chicago-1
Do you want to generate a new RSA key pair? Y
Enter directory for keys created : Enter
Enter name of your key : Enter
Enter passphrase: N/A
Re-enter passphrase : N/A
```

![Configure OCI CLI](images/lab3-ds-cli-4.png)
![Configure OCI CLI](images/lab3-ds-cli-5.png)

3. Move and download your generated public key PEM file, as shown below.
![Download public key](images/lab3-ds-cli-7.png)

4. Upload the downloaded public key as an API key in the OCI Console.
![Upload API key](images/lab3-ds-cli-8.png)
![Upload API key](images/lab3-ds-cli-9.png)
![Upload API key](images/lab3-ds-cli-10.png)

5. Test OCI CLI access from the Data Science notebook session.
![Test OCI CLI access](images/lab3-ds-cli-11.png)
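The CLI setup in step 2 writes a profile to the OCI config file (by default *~/.oci/config*). As a quick sanity check before testing access, a sketch like the following can confirm that the DEFAULT profile contains the entries an API-key profile normally has. It only inspects the file and does not validate the key against your tenancy; the helper name `missing_oci_keys` is hypothetical. If the OCI Python SDK is installed, `oci.config.from_file()` followed by `oci.config.validate_config()` performs a fuller check.

```python
import configparser
from pathlib import Path

# Entries an OCI CLI/SDK API-key profile normally contains
REQUIRED_KEYS = ("user", "fingerprint", "key_file", "tenancy", "region")


def missing_oci_keys(config_text, profile="DEFAULT"):
    """Return the required keys that are missing or empty in `profile`."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    if profile == "DEFAULT":
        section = parser.defaults()
    elif parser.has_section(profile):
        section = parser[profile]  # configparser falls back to [DEFAULT] values
    else:
        return list(REQUIRED_KEYS)
    return [key for key in REQUIRED_KEYS if not section.get(key)]


if __name__ == "__main__":
    config_path = Path.home() / ".oci" / "config"
    if config_path.exists():
        missing = missing_oci_keys(config_path.read_text())
        print("Missing keys:", ", ".join(missing) if missing else "none")
    else:
        print(f"No OCI config file at {config_path}")
```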

## Task 6: Test connectivity to OCI Generative AI services

1. Open the Generative AI Playground's Generation interface to test the API code. Note that the *Command R* chat interface is not yet enabled for OCI API access and is not required for this workshop.
![Test Generative AI access](images/lab3-ds-gai-1.png)

2. Enter a prompt, generate a response, click the *View Code* button, and select *Python* as the language.
![Test Generative AI access](images/lab3-ds-gai-2.png)

3. Copy the generated code into a notebook cell and run it. You should see output similar to the following.
![Test Generative AI access](images/lab3-ds-gai-3.png)

## Learn More

* [Generative AI made easy with OCI Data Science](https://www.oracle.com/artificial-intelligence/generative-ai/generative-ai-service/)
* [OCI Data Science AI samples GitHub repository](https://github.com/oracle-samples/oci-data-science-ai-samples/tree/main/ai-quick-actions)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI AI and Gen AI Center of Excellence
* **Last Updated By/Date** - Aug 2024
@@ -0,0 +1,4 @@
rm -Rf ~/conda/data
rm -Rf ~/conda/scripts
rm ~/conda/notebooks/demo*.ipynb
rm -f conda.zip
Binary file not shown.
@@ -0,0 +1,67 @@
# OCI Generative AI Playground and Trial Generation

## Introduction

OCI Generative AI is a fully managed Oracle Cloud Infrastructure service that provides a set of state-of-the-art, customizable large language models (LLMs) covering a wide range of use cases, including chat, text generation, summarization, and text embeddings. Use the Generative AI playground to try out the ready-to-use pretrained models. The service also supports creating fine-tuned custom models based on your own data in a secure, dedicated cluster environment.

Estimated Lab Time: -- 5 minutes

### OCI Generative AI Playground

The OCI Generative AI Playground lets you chat, generate and summarize content, and view numerical vector embeddings for text. The service is REST API enabled for programmatic access and lets you tune query outputs with parameters such as temperature and maximum output length. It currently offers Cohere Command R+ and Meta Llama models. In this workshop, all the relevant clinical-trial data was generated with Cohere Command R+ in the playground.

### Objectives

In this lab, you will:

* Learn how to use OCI Generative AI Playground interactively
* Prompt and generate a few clinical trial documents yourself
* Verify the generated document has no personal information
* Prompt to try out some summarization examples on your text
* View the generated API code (Python) and become familiar with it

### Prerequisites (Optional)

This lab assumes you have:

* An Oracle Cloud account in the Chicago or Frankfurt region
* The required policies for OCI Generative AI in place

## Task 1: Accessing OCI Generative AI Playground

In this section, you will become familiar with the OCI Generative AI Playground console.

1. Log in to your Oracle Cloud tenancy and change your region to US Midwest (Chicago).
![Connect to US-Midwest Chicago Tenancy](images/lab-11.png)

2. From the navigation (hamburger) menu in the top-left corner, select Analytics & AI --> AI Services --> Generative AI.
![Connect to OCI Gen AI](images/lab-12.png)

## Task 2: Generate a clinical trial in OCI Generative AI Playground

1. Navigate to Generative AI -> Overview -> Playground -> Chat and run the example "Generate a job description" with the cohere-command-r-16k model.
![Test OCI Gen AI Example](images/lab-13.png)

2. Paste the following text into the chat window: "Generate a clinical trial report on drug evaluation on Advanced Non-Small Cell Lung Cancer". Adjust the **Maximum output settings** and **Temperature** values, then click **Submit** to generate a sample clinical trial report.
![Generate trial document](images/lab-14.png)

3. Note that personally identifiable information (PII) in the generated report is redacted and replaced with placeholder values.
![PII Redaction](images/lab-15.png)
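The **Temperature** setting in step 2 controls how varied the generated report is. Conceptually, LLM samplers commonly divide the model's next-token scores by the temperature before applying softmax; the sketch below illustrates that standard formulation with made-up scores. It is not the service's internal implementation, only an illustration of why lower values give more deterministic output and higher values more varied output.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, dividing by temperature first."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.1]  # hypothetical next-token scores
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability lands on the top-scoring token; at high temperature the distribution flattens, so less likely tokens are sampled more often.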

## Task 3: Generate a summary and view generated code

1. Copy the generated clinical trial report into Playground -> Summarization and generate a summary.
![Summary](images/lab-16.png)

2. Click the **View code** button to see the generated code.
![Generated python code](images/lab-17.png)

## Learn More

* [OCI Generative AI](https://www.oracle.com/artificial-intelligence/generative-ai/generative-ai-service/)
* [Realize business value by transforming data into action with Generative AI](https://blogs.oracle.com/ai-and-datascience/post/generative-ai-use-cases/)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI AI and GenAI Center of Excellence
* **Last Updated By/Date** - Aug, 2024
@@ -0,0 +1,93 @@
# Generate Compliance submission documents

## Introduction

In this lab, you will generate a sample compliance submission form that summarizes the clinical-trial data sections of a sample document into a pre-canned PDF template. Prompting and formatting techniques guide the OCI Generative AI large language model to produce condensed section-by-section summaries. The process can be extended to summarize across multiple retrieved chunks of data with LangChain and vector search.

Estimated Lab Time: -- 10 minutes

### LangChain output parsers and OCI Generative AI

[LangChain](https://python.langchain.com/v0.2/docs/introduction/) is a Python library that simplifies the development, productionization, and deployment of applications powered by large language models (LLMs). It is a framework for chaining LLMs, agents, and retrieval strategies into an end-to-end application, and chains can be defined declaratively for convenience. LangChain output parsers provide format instructions and parsing logic that turn an LLM's text response into structured data; this lab uses them to extract section summaries into a templated format.

### Objectives

In this lab, you will:

* Load clinical-trial PDF documents and embeddings into an OpenSearch index
* Use a LangChain output parser to produce section summaries for a document template
* Generate a sample compliance submission form with the OCI Generative AI LLM

### Prerequisites

This lab assumes you have:

* Working knowledge of Python and Notebooks
* Working knowledge of OCI Data science and conda packs
* Some knowledge of the LangChain framework (helpful but not required)

## Task 1: Load clinical-trials documents and metadata

1. Collect the following information in a text file or script:

   * Compartment OCID for the *clinical-trials* compartment (search for compartments in the OCI console, click your compartment, and copy the OCID)
   * OpenSearch username: the username entered while provisioning the OpenSearch cluster (for example, *osmaster*)
   * OpenSearch password: the password entered while provisioning the OpenSearch cluster
   * OpenSearch API endpoint private IP from the OCI OpenSearch service console

2. Double-click to open the notebook *demo-generate-document.ipynb*. Run the cells one at a time from the top, using *Shift+Enter* or the play button in the toolbar.

3. Substitute the values you collected in step 1 into the definitions in the cell, as shown below.
![Substitute connection definitions](images/lab5-note-os-1.png)

4. Load all PDF documents into a pandas DataFrame with LangChain's *PyPDFDirectoryLoader*.

5. Generate embeddings for each document's *page_content* and metadata using OCI Generative AI.

6. Check OpenSearch client connectivity. The cell should print *OpenSearch([{'host': 'hostname', 'port': 9200}])*, with your endpoint in place of *hostname*.

7. Load both the text and the embeddings into the *idx_oci_genai_clinical_trials* index.

8. Paste the title retrieved in the previous lab's notebook (*demo-vector-search-ext.ipynb*) to query against the *page_content* embeddings.

9. Report the file metadata and score of the retrieved documents.
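Steps 7 to 9 rely on an OpenSearch index with a k-NN vector field. The notebook's exact schema is not shown here, so the following is a hypothetical sketch of the request bodies such an index and query typically use with the OpenSearch k-NN plugin. The field names and the 1024-dimension size are assumptions; the dimension must match whatever embedding model produces the vectors.

```python
import json

# Hypothetical field names; the notebook's actual index schema may differ.
EMBEDDING_DIM = 1024  # assumption: must equal the embedding model's output size


def knn_index_body(dim=EMBEDDING_DIM):
    """Index settings and mappings for text plus a k-NN vector field."""
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "page_content": {"type": "text"},
                "source": {"type": "keyword"},
                "page_content_embedding": {"type": "knn_vector", "dimension": dim},
            }
        },
    }


def knn_query_body(query_vector, k=3):
    """Approximate k-NN query returning the k nearest documents."""
    return {
        "size": k,
        "query": {
            "knn": {
                "page_content_embedding": {"vector": list(query_vector), "k": k}
            }
        },
        "_source": {"excludes": ["page_content_embedding"]},  # skip large vectors
    }


# With opensearch-py, these bodies would be used roughly as:
# client.indices.create(index="idx_oci_genai_clinical_trials", body=knn_index_body())
# hits = client.search(index="idx_oci_genai_clinical_trials",
#                      body=knn_query_body(vec))["hits"]["hits"]
# for hit in hits:
#     print(hit["_score"], hit["_source"]["source"])
print(json.dumps(knn_query_body([0.1, 0.2, 0.3], k=2), indent=2))
```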

## Task 2: Generate Compliance document form

1. Select the top document retrieved by the query search above.

2. Run the remaining cells to generate the compliance form for the trial.

3. View the generated compliance form in the */home/datascience/conda/data/outputs* directory.
![Generated compliance form](images/lab5-comp-doc.png)

These cells involve:

1. Defining a Pydantic *BaseModel* class called *TrialInfo* that structures the document sections and their description instructions. These descriptions become format instructions that are passed to the OCI Generative AI LLM at runtime.
2. Designing *TrialInfo* as a superset of the section headers across all clinical-trial documents.
3. Defining a LangChain Pydantic output parser object and generating its format instructions.
4. Defining a chat prompt template that instructs the model to follow the format instructions.
5. Using the OCI Generative AI chat LLM to summarize each section according to the format instructions.
6. Creating a dictionary of field values based on a pre-built PDF form template.
7. Filling the template with a PDF form filler to generate the compliance form document.
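A Pydantic output parser essentially does two things: it turns a schema into format instructions that are appended to the prompt, and it parses the model's JSON reply back into that schema. The sketch below reproduces that idea with only the standard library (a dataclass standing in for Pydantic), so it runs anywhere; it is not LangChain's implementation. The *TrialInfo* fields here are hypothetical, since the notebook defines its own sections, and the form field names in *FORM_FIELDS* are invented for illustration.

```python
import json
import re
from dataclasses import dataclass, field, fields


# Hypothetical sections; metadata["description"] plays the role of Field(description=...)
@dataclass
class TrialInfo:
    title: str = field(metadata={"description": "Official title of the clinical trial"})
    objective: str = field(metadata={"description": "Primary objective in one or two sentences"})
    results: str = field(metadata={"description": "Condensed summary of the key results"})


def format_instructions(schema):
    """Describe the expected JSON object, one key per schema field."""
    lines = [f'  "{fld.name}": "<{fld.metadata["description"]}>"' for fld in fields(schema)]
    return "Return only a JSON object with these keys:\n{\n" + ",\n".join(lines) + "\n}"


def parse_reply(text, schema):
    """Parse the span from the first '{' to the last '}' into a schema instance."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in the model output")
    data = json.loads(match.group(0))
    return schema(**{fld.name: str(data[fld.name]) for fld in fields(schema)})


# Hypothetical mapping from schema fields to PDF form field names
FORM_FIELDS = {"title": "TrialTitle", "objective": "PrimaryObjective", "results": "ResultsSummary"}


def to_form_data(info):
    """Build the {form field name: value} dictionary a PDF form filler consumes."""
    return {form_name: getattr(info, attr) for attr, form_name in FORM_FIELDS.items()}


if __name__ == "__main__":
    print(format_instructions(TrialInfo))
    reply = 'Summary follows.\n{"title": "Example trial", "objective": "Example objective", "results": "Example results"}'
    print(to_form_data(parse_reply(reply, TrialInfo)))
```

With *fillpdf*, a dictionary like this would typically be passed to `fillpdfs.write_fillable_pdf(template_path, output_path, data)`, and `fillpdfs.get_form_fields(template_path)` lists the template's real field names; check the fillpdf documentation for your installed version.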

## Task 3: Other ways to customize this notebook

You can customize this notebook in several ways:

1. Use LangChain text splitters to chunk documents, then embed the chunks and load them into an index.
2. Perform embedding search on the chunked-document index.
3. Compare chunked retrievals with full-document retrievals and evaluate their scores.
4. Use a different template, or use multiple clinical-trial templates by disease.
5. Use other prompting techniques with different format specifications.
6. Use a more capable PDF form filler.
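For the first three ideas, the core mechanics can be sketched without any libraries. The code below uses fixed-size character windows with overlap, a simplified stand-in for LangChain's text splitters such as *RecursiveCharacterTextSplitter* (which prefer to split on separators first), and cosine similarity to pick the best-matching chunk. In practice the vectors would come from OCI Generative AI embeddings; the tiny vectors below are made up. OpenSearch converts distances into its own scores, so compare chunked and full-document scores computed the same way.

```python
import math


def chunk_text(text, chunk_size=500, overlap=50):
    """Split text into fixed-size character windows; neighbors share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Start a new window only while it adds characters beyond the previous window
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_chunk(query_vec, chunk_vecs):
    """Return (index, score) of the chunk most similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in chunk_vecs]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    chunks = chunk_text("Sample clinical-trial text. " * 100, chunk_size=200, overlap=20)
    print(len(chunks), "chunks")
    # Made-up 3-d vectors standing in for real embeddings
    query = [0.9, 0.1, 0.0]
    chunk_vectors = [[0.1, 0.9, 0.0], [0.8, 0.2, 0.1], [0.0, 0.1, 0.9]]
    full_doc_vector = [0.5, 0.5, 0.3]
    idx, score = best_chunk(query, chunk_vectors)
    print("best chunk:", idx, round(score, 3),
          "| full document:", round(cosine_similarity(query, full_doc_vector), 3))
```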

## Learn More

* [Deploy Langchain applications as OCI Model Deployments](https://blogs.oracle.com/ai-and-datascience/post/deploy-langchain-application-as-model-deployment)
* [OCI AI Quick actions](https://docs.oracle.com/en-us/iaas/data-science/using/ai-quick-actions.htm)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI AI and Gen AI Center of Excellence
* **Last Updated By/Date** - Aug 2024
@@ -0,0 +1,44 @@
# Getting started

OCI Generative AI and its associated tools provide a scalable, flexible automation framework for compliance form generation and related use cases. Although the form in this workshop does not reflect an actual regulatory authority form, the principles outlined here can be applied effectively to that purpose.

## Objectives

In this workshop, you will learn how to:

* Use OCI Generative AI, OCI Data Science, and vector embedding processes
* Embed and load data into a vector store such as OCI OpenSearch
* Implement a Retrieval Augmented Generation (RAG) interface
* Prompt effectively to generate query outputs from the OCI Generative AI LLM
* Use a pre-canned template to generate a compliance form

## Prerequisites

This lab assumes you have:

* Basic familiarity with Generative AI concepts, RAG and Industry
* Some familiarity with OCI Generative AI Services and Tool sets
* Familiarity with Python programming language.
* Basic understanding of large language models (LLM)
* Some familiarity with the OCI OpenSearch service
* Some familiarity with the open source LangChain framework
* Familiarity with clinical trial and compliance submission process would be helpful but not required

## Downloads

All downloads for this workshop are packaged in a zip file that you download as part of LAB-3, Developing with OCI Data Science.

## Provision Oracle cloud tenancy and login

Use the LiveLabs link below to provision a cloud tenancy and test your login:
[provision new cloud tenancy account](https://github.com/oracle-livelabs/common/blob/main/labs/cloud-login/event-register-free-tier-account.md)

## Learn More

* [Oracle Generative AI Capabilities](https://www.oracle.com/artificial-intelligence/generative-ai/)
* [Oracle Clinical Digital Assistant](https://www.oracle.com/health/clinical-suite/clinical-digital-assistant/)

## Acknowledgements

* **Author** - Rajib Ghosh, Master Principal Cloud Architect, OCI GenAI Center of Excellence
* **Last Updated By/Date** - Aug, 2024