RAPIDS on Microsoft Fabric #527

Open
jacobtomlinson opened this issue Feb 13, 2025 · 0 comments

Microsoft Fabric is a SaaS Data Analytics platform from Microsoft. It allows you to work with Data Warehouses and perform Data Engineering and Data Science tasks in a managed platform. The Fabric product is more closely related to business-focused data analysis tools like Power BI than to Azure Cloud Data Science platforms like Azure ML.

RAPIDS Alignment

Microsoft Fabric has two services which could leverage RAPIDS.

Spark

Fabric has a close history with Microsoft Synapse, which is a managed Spark platform. The Spark RAPIDS team has a documentation page on using Synapse; however, according to the Microsoft documentation, GPU support on Synapse has been deprecated.

It looks like Synapse is being overhauled as part of Fabric, so perhaps this support may return at some point in the future.

Python Notebooks

Microsoft Fabric also has a managed Python Notebooks service. This allows users to execute arbitrary Python code within a Fabric environment and comes with standard PyData libraries like pandas and scikit-learn out of the box. This would be a good candidate for accelerating via RAPIDS libraries like cudf and cuml and could support zero-code-change acceleration with cudf.pandas. It's also likely libraries like polars are being used in these environments, which support GPU acceleration via RAPIDS today.
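To illustrate what zero-code-change acceleration would look like if GPUs were available, here is a minimal sketch. It assumes a RAPIDS install and an NVIDIA GPU, neither of which Fabric offers today, so it falls back to plain pandas when cudf cannot be loaded. The helper name enable_gpu_pandas is hypothetical; cudf.pandas.install() is the documented entry point and has to run before pandas is imported.

```python
def enable_gpu_pandas() -> bool:
    """Try to switch on cudf.pandas; return True if it was installed."""
    try:
        # Assumption: RAPIDS cudf is installed and a GPU is attached.
        import cudf.pandas

        cudf.pandas.install()
        return True
    except Exception:
        # cudf is missing, or it could not initialise without a GPU driver.
        return False


# Must be called before the first pandas import to take effect.
accelerated = enable_gpu_pandas()

import pandas as pd  # unchanged user code starts here

df = pd.DataFrame({"key": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})
totals = df.groupby("key")["value"].sum()

print("GPU accelerated:", accelerated)
print(totals.to_dict())
```

In a notebook the same effect is usually achieved with the `%load_ext cudf.pandas` magic as the first cell, which leaves the rest of the notebook untouched.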

There is currently no way to configure the hardware of the underlying VM, and therefore no way to attach a GPU, so it isn't possible to install and leverage the RAPIDS libraries in this environment. For more information see #503.
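One way to confirm whether a notebook session has a GPU is to look for the NVIDIA driver tooling. The sketch below assumes that nvidia-smi would be on the PATH if a GPU were attached, which may not hold for every image; visible_gpus is a hypothetical helper name, not part of any Fabric or RAPIDS API.

```python
import shutil
import subprocess


def visible_gpus() -> list[str]:
    """Return the names of NVIDIA GPUs reported by nvidia-smi, or [] if none."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        # No NVIDIA driver tooling on PATH, so assume no usable GPU.
        return []
    try:
        result = subprocess.run(
            [smi, "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            check=True,
            timeout=30,
        )
    except (subprocess.SubprocessError, OSError):
        # The tool exists but could not talk to a driver or device.
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = visible_gpus()
print("GPUs found:", gpus if gpus else "none")
```

An empty list here would be consistent with the limitation described above, and the same check could be rerun if GPU-backed environments are introduced later.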

Other distributed frameworks

Given that it is possible to launch Spark clusters, it should be feasible to run other distributed frameworks like Dask, Ray or Legate on this infrastructure. However, we would need GPU hardware support before we could explore this further.

Next steps

In order to enable RAPIDS usage on Microsoft Fabric we first need GPUs to be made available in both the Spark and Python Notebooks environments.

Support for Spark RAPIDS on Microsoft Fabric Spark clusters will be explored by the Spark RAPIDS team if GPU hardware becomes available.

If and when GPUs become available in the Python Notebooks environment, the RAPIDS Cloud Deployment Team can investigate best-practice methods for installing RAPIDS libraries into those environments.

We can then build out workflow examples showing how to read data and perform GPU accelerated Data Analytics in Microsoft Fabric.
