Current Situation is a company that analyzes data on the current job market in light of recent advances in Artificial Intelligence. To power its job placement and talent matching pool, it is seeking a solution that integrates with its data stored in an on-premises Microsoft SQL Server database.
- Spark SQL - For data transformations.
- Databricks - For data transformations and warehousing environment.
- Azure Blob Storage - Raw data storage.
- Azure Data Factory - ETL.
- Microsoft SQL Server - Operational database for transactional systems.
- Data Integration - Azure Data Factory pipeline development.
- Big Data Processing - Spark SQL for data transformations.
- ETL/ELT Processes - Understanding and implementing Extract, Transform, Load (ETL) or Extract, Load, Transform (ELT) workflows.
- Data Visualization - Power BI report creation and dashboard design.
- SQL - T-SQL (for MS SQL Server) and Spark SQL.
1. MS SQL Server - Operational database for transactional systems
Justification
- Robust relational database system for OLTP workloads
- Strong integration with Microsoft ecosystem
2. Azure Data Factory - Orchestration and data movement
Justification
- Managed ETL service in Azure
- Supports various data sources and destinations
3. Azure Blob Storage - Data lake for raw and processed data
Justification
- Cost-effective storage for large volumes of unstructured data
- Integrates well with other Azure services
4. Databricks - Big data processing and advanced analytics
Justification
- Managed Spark environment
- Collaborative notebook interface
- Supports machine learning workflows
5. Spark SQL - Data transformation and analysis
Justification
- SQL interface for Spark, familiar to SQL developers
- Distributed processing for large-scale data
6. Power BI - User-friendly interface for creating reports and dashboards
Justification
- Strong integration with Azure and Microsoft products
- Supports both self-service and enterprise BI
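As a rough illustration of how Blob Storage, Databricks, and Spark SQL fit together, the sketch below reads a raw CSV from Blob Storage, registers it as a temporary view, and aggregates it with Spark SQL. The column names (`job_title`, `salary`), the view name `raw_jobs`, and the helper functions are hypothetical and not taken from the project. It also assumes the Databricks cluster is already configured with credentials for the storage account.

```python
def wasbs_path(account: str, container: str, blob: str) -> str:
    """Build a wasbs:// URI for a blob in Azure Blob Storage."""
    return f"wasbs://{container}@{account}.blob.core.windows.net/{blob.lstrip('/')}"


def summary_query(view: str) -> str:
    """Spark SQL that counts postings and averages salary per job title.

    The columns job_title and salary are assumed for illustration only.
    """
    return (
        "SELECT job_title, COUNT(*) AS postings, "
        "AVG(CAST(salary AS DOUBLE)) AS avg_salary "
        f"FROM {view} GROUP BY job_title ORDER BY postings DESC"
    )


def transform(spark, source_path: str, view: str = "raw_jobs"):
    """Register the raw CSV as a temp view and return the aggregated DataFrame.

    `spark` is the SparkSession that Databricks notebooks provide by default.
    """
    raw = spark.read.option("header", True).csv(source_path)
    raw.createOrReplaceTempView(view)
    return spark.sql(summary_query(view))
```

In a Databricks notebook this could be called as `transform(spark, wasbs_path("<account>", "<container>", "jobs.csv"))`, with the placeholders replaced by your own storage account, container, and file name.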
- Creating a storage account.
- Creating a container.
- Loading the data from MS SQL Server to Azure Blob Storage using Azure Data Factory.
- Data transformation in Databricks.
- Visualization in Power BI.
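The load step above (MS SQL Server to Azure Blob Storage) is typically a single Copy activity in an Azure Data Factory pipeline. The sketch below builds such a pipeline definition as a Python dictionary and prints it as JSON, which is roughly the shape you would see in the pipeline's code view. The dataset names, pipeline name, and source query are hypothetical placeholders, and the sink assumes the data lands as CSV. Because the source database is on-premises, a self-hosted integration runtime would normally be needed on the SQL Server linked service; that setup is not shown here.

```python
import json


def copy_activity(name: str, source_dataset: str, sink_dataset: str, query: str) -> dict:
    """Describe a Copy activity from a SQL Server dataset to a delimited-text blob dataset."""
    return {
        "name": name,
        "type": "Copy",
        "inputs": [{"referenceName": source_dataset, "type": "DatasetReference"}],
        "outputs": [{"referenceName": sink_dataset, "type": "DatasetReference"}],
        "typeProperties": {
            "source": {"type": "SqlServerSource", "sqlReaderQuery": query},
            "sink": {
                "type": "DelimitedTextSink",
                "storeSettings": {"type": "AzureBlobStorageWriteSettings"},
                "formatSettings": {
                    "type": "DelimitedTextWriteSettings",
                    "fileExtension": ".csv",
                },
            },
        },
    }


def pipeline(name: str, activities: list) -> dict:
    """Wrap activities in the top-level pipeline structure."""
    return {"name": name, "properties": {"activities": activities}}


# Hypothetical names: replace with the datasets defined in your own data factory.
definition = pipeline(
    "CopySqlServerToBlob",
    [copy_activity("CopyJobsTable", "SqlServerJobsTable", "BlobJobsCsv", "SELECT * FROM dbo.Jobs")],
)
print(json.dumps(definition, indent=2))
```

Generating the definition in code keeps the pipeline reviewable in version control; the same JSON can also be authored directly in the Data Factory UI.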
Read the full implementation steps here
Contributions to improve the project are welcome!