updated datalake doc with portal analytics
manikandansubramanian committed Nov 19, 2024
1 parent f38ebcf commit 0efa045
Showing 3 changed files with 125 additions and 5 deletions.
130 changes: 125 additions & 5 deletions docs/datalake/epilot-datalake.mdx
@@ -4,7 +4,7 @@ sidebar_position: 1

# Epilot Datalake

Welcome to the documentation for our Data Lake feature, which serves as the centralized repository for real-time event streams of entity operations, snapshots of workflow executions and journey analytics data. This feature empowers users to access and analyze essential data generated by the epilot portal, including changes to entities such as orders, opportunities, contacts, accounts, products, interactions from user journeys and more.
Welcome to the documentation for our Data Lake feature, which serves as the centralized repository for real-time event streams of entity operations, snapshots of workflow executions, journey analytics and portal analytics data. This feature empowers users to access and analyze essential data generated by the epilot portal, including changes to entities such as orders, opportunities, contacts, accounts, products, interactions from user journeys, customer portal, installer portal and more.

Our Data Lake is integrated with ClickHouse for data warehousing, enabling users to leverage Business Intelligence (BI) tools and create insightful reports. This documentation will guide you through the key components of the Data Lake feature, including data schemas, usage, and credential management.

@@ -82,11 +82,11 @@ The schema for journey sessions is as follows:
{
"id": "string", // Unique identifier for the session (UUID)
"org_id": "string", // Unique identifier for the organization associated with the session (UUID)
"session_id": "string", //
"journey_id": "string", // Unique identifier for the user journey to which the session belongs (UUID)
"type": "datetime", // Timestamp indicating when the session started
"start_time": "datetime", // Timestamp indicating when the session started
"details": "string", // Additional details about the session, in JSON format
"created_at": "datetime" // Timestamp indicating when the session ended
"end_time": "datetime", // Timestamp indicating when the session ended
"last_updated_at": "datetime" // Timestamp indicating when the session was last updated
}
```

@@ -102,14 +102,15 @@ Fields of interest in this schema include:
- `languagePreference`: The preferred language set in the user's browser (e.g., en-US, fr-FR).
- `ip`: The IP address of the user's device when the event was recorded.
- `embeddedIn`: The website or platform on which the journey is embedded.
- `isLauncherJourney`: Whether the journey for which the session was created is a launcher journey.


The schema for journey events is as follows:

```json
{
"id": "string", // Unique identifier for the event (UUID)
"org_id": "string", // Unique identifier for the organization associated with the event (UUID)
"org_id": "string", // Identifier of the organization associated with the event
"session_id": "string", // Unique identifier for the session in which the event occurred (UUID)
"journey_id": "string", // Unique identifier for the user journey to which the event belongs (UUID)
"type": "string", // Type of event (e.g., "step_navigation", "journey_submit", "journey_exit")
@@ -137,6 +138,105 @@ This table stores the above attributes related to journey events.

![Journey analytics db schema](/img/datalake/journey-analytics-db-schema.png)

### 4. Portal Analytics

This dataset highlights user sessions generated during each portal login and details the interactions that occur within these sessions. It captures key metrics and events to provide insights into user behavior and engagement patterns.

The schema for portal sessions is as follows:

```json
{
"id": "string", // Unique identifier for the session (UUID)
"org_id": "string", // Identifier of the organization associated with the event
"app_name": "string", // Name of the app for which the session was created (customer portal, installer portal etc.)
"start_time": "datetime", // Timestamp indicating when the session started
"details": "string", // Additional details about the session, in JSON format
"end_time": "datetime", // Timestamp indicating when the session ended
"last_updated_at": "datetime" // Timestamp indicating when the session was last updated
}
```

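As a rough illustration of how `start_time` and `end_time` can be combined, the following ClickHouse query sketches a session-duration calculation. It assumes `end_time` is populated once a session has ended, that `{org_id}` is replaced with your organization ID in the portal sessions table name, and that the `duration_seconds` alias is purely illustrative.

```sql
-- Sketch: duration of each portal session in seconds.
-- Assumption: end_time is set for finished sessions.
-- Replace {org_id} with your organization ID.
select
    id,
    app_name,
    start_time,
    end_time,
    dateDiff('second', start_time, end_time) as duration_seconds
from
    {org_id}_portal_sessions_final
where
    end_time >= start_time
order by
    start_time desc
limit 100
```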
Fields of interest in this schema include:

**details**: This field may contain basic details about the user's browser session:
- `deviceType`: The type of device used by the user (e.g., mobile, desktop).
- `osType`: The operating system of the device (e.g., iOS, Android, Windows).
- `browserTypeAndVersion`: The browser name and version used by the user (e.g., Chrome 92, Firefox 89).
- `screenResolution`: The screen resolution of the user's device (e.g., 1920x1080).
- `viewportSize`: The size of the viewport in the browser (i.e., the visible area of the web page).
- `colorDepth`: The color depth of the user's display (e.g., 24).
- `languagePreference`: The preferred language set in the user's browser (e.g., en-US, fr-FR).
- `ip`: The IP address of the user's device when the event was recorded.
- `email`: The email ID of the logged in user.
- `domainName`: The app domain in which the session was created.
- `referrer`: The site from which the user was referred or directed.

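Because `details` is stored as a JSON string, individual keys can be pulled out at query time. The sketch below uses ClickHouse's `JSONExtractString` function to break sessions down by device and operating system. It assumes the keys listed above are present in `details`; when a key is missing, `JSONExtractString` returns an empty string. The aliases `device_type`, `os_type` and `sessions` are illustrative.

```sql
-- Sketch: portal sessions grouped by device type and operating system.
-- Assumption: details is a JSON object containing the keys listed above.
-- Replace {org_id} with your organization ID.
select
    JSONExtractString(details, 'deviceType') as device_type,
    JSONExtractString(details, 'osType') as os_type,
    count() as sessions
from
    {org_id}_portal_sessions_final
group by
    device_type,
    os_type
order by
    sessions desc
```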
The schema for portal events is as follows:

```json
{
"id": "string", // Unique identifier for the event (UUID)
"org_id": "string", // Identifier of the organization associated with the event
"session_id": "string", // Unique identifier for the session in which the event occurred (UUID)
"app_name": "string", // Name of the app for which the event was created (customer portal, installer portal etc.)
"type": "string", // Type of event (e.g., "user_logged_in", "page_navigation", etc.)
"details": "string", // Additional details about the event, in JSON format
"created_at": "datetime" // Timestamp indicating when the event was recorded
}
```

Fields of interest in this schema include:

**type**: This field contains the type of user interaction or event that occurred in the session:
- `user_logged_in`: User successfully logged into the portal.
- `user_registered`: A new user successfully registered.
- `add_contract_initiated`: User started adding a new contract.
- `additional_info_update_initiated`: User began updating additional information ("additional info" is an umbrella term for the additional or custom information sections that an organization configures for its portal).
- `additional_info_updated`: User successfully updated additional information.
- `additional_info_viewed`: User viewed additional information details.
- `all_documents_downloaded`: User downloaded all available documents in a section.
- `contract_added`: A new contract was successfully added.
- `contract_due_date_changed`: User updated the contract’s due date.
- `contract_payment_rate_changed`: User changed the payment rate for a contract.
- `contracts_listing_viewed`: User viewed the list of all contracts.
- `contract_viewed`: User opened and viewed a specific contract.
- `document_deleted`: A document was removed by the user.
- `document_downloaded`: User downloaded a specific document.
- `document_uploaded`: A document was uploaded by the user.
- `documents_listing_viewed`: User viewed a list of uploaded documents.
- `external_website_opened`: User navigated to an external website.
- `journey_opened`: User initiated a new journey workflow.
- `journey_closed`: User exited a specific journey workflow.
- `meter_reading_submission_initiated`: User started submitting a meter reading.
- `meter_reading_submitted`: A meter reading was successfully submitted.
- `meters_listing_viewed`: User viewed the list of available meters.
- `meter_viewed`: User opened details for a specific meter.
- `meter_widget_viewed`: User viewed a widget related to meters.
- `opportunities_listing_viewed`: User browsed the list of opportunities.
- `opportunity_viewed`: User opened and viewed an opportunity’s details.
- `order_accepted`: User accepted a specific order.
- `order_refused`: User declined a specific order.
- `order_viewed`: User viewed details of a particular order.
- `orders_listing_viewed`: User accessed the list of available orders.
- `page_navigation`: User navigated between application pages.
- `password_changed`: User successfully updated their account password.
- `payments_listing_viewed`: User browsed the list of all payments.
- `product_viewed`: User viewed details of a specific product.
- `teaser_opened`: User clicked and opened a teaser.
- `user_account_deleted`: User’s account was removed or deleted.
- `user_email_changed`: User updated their email address.
- `user_logged_out`: User logged out from the application.

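To see which of these event types actually occur in your data, a simple aggregation over the portal events table can help. This is a minimal sketch: `{org_id}` is a placeholder for your organization ID, and the date filter and the `events` alias are illustrative.

```sql
-- Sketch: number of portal events per event type since a given date.
-- Replace {org_id} with your organization ID and adjust the date as needed.
select
    type,
    count() as events
from
    {org_id}_portal_events_final
where
    created_at > '2024-08-01 00:00:00'
group by
    type
order by
    events desc
```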
**details**: This field contains additional details specific to the event type, for example the source and destination page URLs for `page_navigation`, entity details for contracts, orders, meter readings, documents and journeys, external links opened, and custom information updates.

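The key names inside an event's `details` vary by event type and are not listed here, so a practical first step is to inspect the raw JSON for one event type. The sketch below does this for `page_navigation` events; `{org_id}` is a placeholder for your organization ID.

```sql
-- Sketch: inspect the raw details JSON of recent page_navigation events.
-- Replace {org_id} with your organization ID.
select
    created_at,
    session_id,
    details
from
    {org_id}_portal_events_final
where
    type = 'page_navigation'
order by
    created_at desc
limit 20
```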
- **{org_id}_portal_sessions_final**
This table stores the above attributes related to portal sessions.

- **{org_id}_portal_events_final**
This table stores the above attributes related to portal events.

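The two tables can be combined to relate events to the session they occurred in. The following sketch assumes that `session_id` in the events table refers to `id` in the sessions table, which the schema descriptions suggest but do not state explicitly. The table aliases `e` and `s` and the column aliases are illustrative.

```sql
-- Sketch: event counts per app, device type and event type.
-- Assumption: events.session_id references sessions.id.
-- Replace {org_id} with your organization ID.
select
    s.app_name,
    JSONExtractString(s.details, 'deviceType') as device_type,
    e.type,
    count() as events
from
    {org_id}_portal_events_final as e
inner join
    {org_id}_portal_sessions_final as s
    on e.session_id = s.id
group by
    s.app_name,
    device_type,
    e.type
order by
    events desc
```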
![Portal analytics db schema](/img/datalake/portal-analytics-db-schema.png)

## Setting up Datalake

### Generate datalake credentials
@@ -204,6 +304,26 @@ This SQL query retrieves the journey sessions created over time for a specific j

![Datalake page](/img/datalake/journey_analytics_query_result.png)

**Example 3: Reporting Portal Sessions Created Over Time**
Suppose you need to create a report showing portal sessions created over time for a specific portal, such as the customer portal or the installer portal. You can use SQL to accomplish this task:

``` sql
select
app_name,
start_time,
details
from
{org_id}_portal_sessions_final
where
app_name = '{your_app_name}'
and
start_time > '2024-08-01 00:00:00'
```

This SQL query retrieves the portal sessions created over time for a specific portal.

![Datalake page](/img/datalake/portal_analytics_query_result.png)

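For a chart in a BI tool, it is often more convenient to aggregate the sessions per day than to list them individually. The variant below is a sketch of that idea using ClickHouse's `toDate` function; the `day` and `sessions` aliases are illustrative, and `{org_id}` and `{your_app_name}` are placeholders to be replaced with your own values.

```sql
-- Sketch: number of portal sessions per day for one portal.
-- Replace {org_id} and {your_app_name} with your own values.
select
    toDate(start_time) as day,
    count() as sessions
from
    {org_id}_portal_sessions_final
where
    app_name = '{your_app_name}'
    and start_time > '2024-08-01 00:00:00'
group by
    day
order by
    day
```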
You can use any SQL client to connect to the Clickhouse Data Warehouse (DWH) using the credentials provided. For more detailed information, please refer to [this link](https://clickhouse.com/docs/en/integrations/datagrip).

