
Microsoft.Azure.Databricks.Client library on NuGet doesn't expose '/fs/files' API #235

Open
VivendoByte opened this issue Nov 5, 2024 · 6 comments

@VivendoByte

Hello everybody!

I'm using this library from NuGet:
https://www.nuget.org/packages/Microsoft.Azure.Databricks.Client/
because I need to connect to, and ingest data into, a Databricks service hosted on Azure.

In my particular case, I need to upload a JSON file into a volume.
According to this documentation:
https://docs.databricks.com/api/workspace/files/upload
I need to call this endpoint with the PUT method:
/api/2.0/fs/files{file_path}

It seems that this endpoint is not exposed in the latest version of Microsoft.Azure.Databricks.Client (currently 2.6.0).
Am I wrong?
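
For reference, a minimal sketch of calling the documented endpoint directly with HttpClient, outside the library. The workspace URL, token variable, file name, and volume path below are placeholders, not values from this thread:

using System;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

class FilesApiUploadSketch
{
    static async Task Main()
    {
        // Placeholders; substitute your own workspace URL, token, and volume path.
        var workspaceUrl = "https://adb-1234567890123456.7.azuredatabricks.net";
        var token = Environment.GetEnvironmentVariable("DATABRICKS_TOKEN");
        var volumePath = "/Volumes/my_catalog/my_schema/my_volume/data.json";

        using var http = new HttpClient { BaseAddress = new Uri(workspaceUrl) };
        http.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);

        // Per the linked docs, the request body is the raw file contents and
        // overwrite is passed as a query parameter; success is an empty 204 response.
        await using var file = File.OpenRead("data.json");
        using var content = new StreamContent(file);
        content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");

        var response = await http.PutAsync($"/api/2.0/fs/files{volumePath}?overwrite=true", content);
        response.EnsureSuccessStatusCode();
    }
}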

@VivendoByte
Author

For now, and just to run a test on my local machine, I downloaded the source code from GitHub and added the required new method to the IDbfsApi interface plus the DbfsApiClient class implementation. Using the endpoint specified in the documentation, I'm able to upload a file into the correct volume on Databricks.
It seems strange to me that Microsoft.Azure.Databricks.Client on NuGet doesn't support this kind of operation.
Can anyone help me?

@memoryz
Contributor

memoryz commented Nov 6, 2024

Can you use DbfsApiClient.Upload to upload the file? Azure Databricks supports the dbfs: path format for volume paths:
dbfs:/Volumes/<catalog_identifier>/<schema_identifier>/<volume_identifier>/<path>/<file_name>
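
A minimal usage sketch of this suggestion, assuming an Upload(path, overwrite, stream) method on the Dbfs client; verify the exact signature against your library version. The workspace URL and paths are placeholders:

using System;
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.Databricks.Client;

class DbfsVolumeUploadSketch
{
    static async Task Main()
    {
        // Placeholders; not values from this thread.
        using var client = DatabricksClient.CreateClient(
            "https://adb-1234567890123456.7.azuredatabricks.net",
            Environment.GetEnvironmentVariable("DATABRICKS_TOKEN"));

        await using var file = File.OpenRead("data.json");

        // Volume path in the dbfs: format described above.
        await client.Dbfs.Upload(
            "dbfs:/Volumes/my_catalog/my_schema/my_volume/data.json",
            true,
            file);
    }
}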

@VivendoByte
Author

Thanks for your reply @memoryz. I tried the DbfsApiClient.Upload method, but it doesn't work. I get this error:

[screenshot of the error message]

I'm using this format to specify the remote filename:
var remoteFilename = "dbfs:/Volumes/<catalog_name>/<schema_name>/<volume_name>/" + fi.Name;

The Upload method internally uses the Create method, which builds and calls the endpoint $"{ApiVersion}/dbfs/create".
But I get the error above.

Just to run a test, I implemented a Create2 method:

public async Task<long> Create2(string path, bool overwrite, CancellationToken cancellationToken = default)
{
    // Same shape as DbfsApiClient.Create, but targeting the Files API
    // endpoint (PUT /api/2.0/fs/files{file_path}) instead of /dbfs/create.
    var request = new { path, overwrite };
    var endpoint = $"/api/{ApiVersion}/fs/files{path}";
    var response = await HttpPut<dynamic, FileHandle>(this.HttpClient, endpoint, request, cancellationToken).ConfigureAwait(false);
    return response.Handle;
}

which uses the PUT method on a different endpoint, following the documentation here: https://docs.databricks.com/api/workspace/files/upload.
Using this, I'm able to upload a JSON file into my volume.
Where is my mistake?
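
For comparison with Create2 above: the linked docs describe the upload body as the raw file contents, with overwrite as a query parameter and an empty 204 response (no handle), so a version aligned with that might look like the sketch below. UploadToVolume is a hypothetical name, not part of the library:

// Hypothetical method shape inside DbfsApiClient; assumes the documented
// Files API semantics rather than the DBFS handle-based create/addBlock/close flow.
// (Needs System.Net.Http and System.Net.Http.Headers.)
public async Task UploadToVolume(string path, bool overwrite, Stream contents, CancellationToken cancellationToken = default)
{
    var endpoint = $"/api/2.0/fs/files{path}?overwrite={(overwrite ? "true" : "false")}";
    using var content = new StreamContent(contents);
    content.Headers.ContentType = new MediaTypeHeaderValue("application/octet-stream");
    var response = await this.HttpClient.PutAsync(endpoint, content, cancellationToken).ConfigureAwait(false);
    response.EnsureSuccessStatusCode(); // 204 No Content on success; nothing to return
}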

@memoryz
Contributor

memoryz commented Nov 6, 2024

I see. Maybe the DBFS API doesn't support volumes, given that the volumes feature was released much later than DBFS. I'll see if I can set up an environment with catalog enabled and give it a try.

@VivendoByte
Author

I'm trying to push a new branch to this repo with my temporary fix for this issue, but it seems I don't have the required permissions. :-)

@memoryz
Contributor

memoryz commented Nov 7, 2024

Can you fork the repo and send a PR from your fork?
