Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WASB support #161

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

isaacabraham
Copy link
Contributor

This should close #158 and puts in place the groundwork for a resolution to #65.

This PR is probably not quite ready for accepting but I'd still like some review on this if possible.

@krontogiannis
Copy link
Member

krontogiannis commented Jun 3, 2017

  • Is this going to be breaking change?
  • I'd also like to see some tests exercising the new (+new vs old) behaviour.
  • Can (+ does it make sense) this be implemented as a wrapper/extension methods instead of changing the core store impl? Maybe in another package/repo MBrace.Azure.Extensions
  • You will also need to make sure that the runtime itself doesn't use (or is not affected in any disruptive way) the affected store interfaces for it's own files (e.g. uploaded assemblies, results of computations, etc).

I saw a comment on a related issue:

Then, when you create a CloudFileInfo, if you forget to put a leading / in the path, it inserts the /mbraceuserdata folder at the front. I have no idea what the reasoning behind this is (or the implications of removing it). If you put a / at the front, this doesn't happen at all.

'Processes' start in a specific directory called /mbraceuserdata. That is the current working directory. If you put a leading / then your path becomes an absolute path. If you don't then it's relative to the current working directory.

The same way that you start your shell and

$ pwd
/Users/username # this is your current directory
$ touch foo.txt     # /Users/username/foo.txt
$ touch /foo.txt    # /foo.txt

@isaacabraham
Copy link
Contributor Author

isaacabraham commented Jun 4, 2017

@krontogiannis Thanks for feedback :-)

  1. Yes, it's a breaking change in MBrace.Azure because it affects the way that paths are used. foo/bar.txt would be invalid, as would /foo/bar.txt. Instead, you must specify a container using WASB notation e.g. [email protected]. There are a few reasons I want to go down this route, but the main one is that the current implement is somewhat inconsistent and there's "magic" going on: -
  • If you forget to put a leading slash, it goes to a completely different location i.e. /foo/bar maps to a container foo and a file bar, but foo/bar goes to a container called $root and a file called foo/bar.
  • There's the "magic" mbraceuserdata container as well. How this is used is again a little magic and unclear for the user (and there's CloudFile returns different path to what is uploaded #158 as a bug related to this as well).
  1. This has no impact (currently) on MBrace.Core - it's just on MBrace.Azure. Clearly the build currently doesn't work because there are some tests that need to be updated, but currently the only changes are in MBrace.Azure.

  2. I'm not sure how this could be implemented as an extension.

  3. Yes, this needs to look at the runtime stuff (I think that's why the build currently fails).

Again, this PR isn't ready for consumption but I'm grateful for your time to tell me some points to look at - I'll continue looking at this next week.

@isaacabraham
Copy link
Contributor Author

I've had a think about this over the weekend, and I'm wavering on this. The main problem I have with the current implementation is that it's easy to accidentally start working with files in another location than what you thought, simply by forgetting the leading slash. The idea of the default directory only further serves to confuse matters.

Perhaps there's a way to keep the current behaviour and optionally also allow WASB paths.

@dsyme
Copy link
Contributor

dsyme commented Jun 5, 2017

I think this is great feature work, though I agree tests are needed

Is this going to be breaking change?

This wouldn't concern me - growing functionality and usage is most important at this point

Can (+ does it make sense) this be implemented as a wrapper/extension methods instead of changing the core store impl? Maybe in another package/repo MBrace.Azure.Extensions

For MBrace.Azure I'd like to see WASB be the default, per this comment from @eiriktsarpalis:

I think that the WASB approach would be great to adopt as the default uri scheme in MBrace.Azure, since it allows support for multiple accounts and it trims down verbosity by allowing relative paths. The current "container/file.txt" uri scheme is problematic, since it is ambiguous; this could simply be the name of a blob and not a full path.

@isaacabraham
Copy link
Contributor Author

isaacabraham commented Jun 8, 2017

I've made some changes (although there's still some tests I need to write + feedback) so it's essentially backwards compatible now (hence why it's green).

The table below should show the different permutations. Default Container means the contextual "default container" of the blob store - this might be something like mbraceassemblies or mbraceuserdata depending on the case. Unfortunately, without making more drastic changes to MBrace.Core, we can't remove this "implicit" path - I understand why it's there, but it probably shouldn't have been used for user operations, rather only for system-generated ops (e.g. uploading vagabond and assembly data). In my view, we would recommend the last two rows as the "recommended" way of using the blob store in future. The others are kind of "magic" - you have to know about this default folder, you can easily forget a leading slash etc., and you can also come unstuck if for some reason you get a blob store without a default folder and start writing to the $root container.

Client Path Default Container Old Blob Path New Blob Path
foo/bar.txt vagabond vagabond@foo/bar.txt vagabond@foo/bar.txt
foo/bar.txt None $root@foo/bar.txt WASB Exception
/foo/bar.txt vagabond [email protected] vagabond@foo/bar.txt
/foo/bar.txt None [email protected] WASB Exception
bar.txt vagabond [email protected] [email protected]
bar.txt None [email protected] WASB Exception
blah@foo/bar.txt vagabond not supported blah@foo/bar.txt
blah@foo/bar.txt None not supported blah@foo/bar.txt

This applies for both CloudFile and CloudDirectory operations.

There's one hack I've had to put in currently which revolves around container names when creating a dictionary.

Also note that this isn't the full WASB spec (which also allows you to include an account as well) but it puts us in a much better place for if we address the ability to connect to multiple storage accounts from an MBrace cluster.

Feedback welcome.

@dsyme
Copy link
Contributor

dsyme commented Jun 8, 2017

Unfortunately, without making more drastic changes to MBrace.Core, we can't remove this "implicit" path - I understand why it's there, but it probably shouldn't have been used for user operations, rather only for system-generated ops (e.g. uploading vagabond and assembly data).

I very much agree in retrospect. I recommend we just go ahead with that change?

@isaacabraham
Copy link
Contributor Author

Which change - this one or also change MBrace.Core to not put in a "default" directory for user operations?

@dsyme
Copy link
Contributor

dsyme commented Jun 8, 2017

... also change MBrace.Core to not put in a "default" directory for user operations?

this

@isaacabraham
Copy link
Contributor Author

isaacabraham commented Jun 9, 2017

I've discovered that the default user directory is a feature that's localised to MBrace.Azure, and it was a relatively painless removal. This means that the above matrix can be simplified when working with user data as follows: -

Client Path Old Blob Path New Blob Path
foo/bar.txt $root@foo/bar.txt WASB Exception
/foo/bar.txt [email protected] WASB Exception
bar.txt [email protected] WASB Exception
blah@foo/bar.txt not supported blah@foo/bar.txt

In other words - this would now be a breaking change, but FWIW I think it's a lot less open to misinterpretation and acknowledges that trying to map conventional file paths to blobs is a somewhat leaky abstraction.

@isaacabraham
Copy link
Contributor Author

isaacabraham commented Jun 9, 2017

Basic testing being done on: -

  • Basic serialization (checking that assemblies + .vgb files can be accessed by mbrace itself)
  • CloudFile and CloudDirectory
  • CloudFlow
  • CloudValue
  • CloudDictionary

I'm not sure what else is needed.

@dsyme
Copy link
Contributor

dsyme commented Jun 9, 2017

@isaacabraham Do tests pass? Remember CI testing is not enabled on this repo, so we currently have to run tests manually

Copy link
Contributor

@dsyme dsyme left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor nits

My main concern is that no tests have been added or adjusted


let ensureRooted (path : string) =
if isPathRooted path then path else raise <| FormatException(sprintf "Invalid path %A. Paths should start with '/' or '\\'." path)
let splitPath =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a function declaration for a function :)

let splitPath (path:string) = ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doh :-) There was some code happening previously between the name and the argument.

@@ -213,37 +210,42 @@ type BlobStore private (account : AzureStorageAccount, defaultContainer : string
member this.Name = "MBrace.Azure.Store.BlobStore"
member this.Id : string = account.CloudStorageAccount.BlobStorageUri.PrimaryUri.ToString()

member this.DefaultDirectory = defaultContainer
member this.DefaultDirectory = defaultContainer |> defaultArg <| ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't use x |> y <| z. A good rule is never to have both |> and <| on the same line. (A better rule is almost never to use <| in code you want to be readable by others)

@isaacabraham
Copy link
Contributor Author

I need to figure out how to run the tests :-) I tried build.cmd a week or so ago but it took absolutely ages to run (and failed half way through).

Once that's working I'll add more tests.

@dsyme
Copy link
Contributor

dsyme commented Jun 9, 2017

I need to figure out how to run the tests :-) I tried build.cmd a week or so ago but it took absolutely ages to run (and failed half way through). Once that's working I'll add more tests.

@isaacabraham Getting a subset of the tests running on CI under simulated Azure storage would be absolutely fantastic progress in the long term. It's going to be hard to get contributions without that

The other thing I think we really need to help enable contributions is a way to co-develop FsPickler+Vagabond+MBrace.Core+MBrace.Azure/AWS+StarterKit (or + one's own data scripting code) in one smooth build+test+deploy+use inner-dev workflow. In some ways I regret how the repos have been split into multiple github project, as propagating a fix from FsPickler or Mono.Cecil through to actually testing it in your own data scripting code is incredibly painful currently requiring updating about 15 nuget packages.

The future of MBrace surely has to be as a go-to solution for F#-centric data scripters who have Spark-like needs and want to have control over a big data scripting stack that they can easily comprehend and contribute to. This means it must be possible to iterate from an idea "MBrace would be better if it just had XYZ" to refining, using and contributing that idea rapidly.

@isaacabraham
Copy link
Contributor Author

@dsyme agree 100% - i'm feeling that same pain now (and is exactly why I'm shying away from changes to MBrace.Core here). One of the things I looked at a while ago was dumping reliance on service bus and moving entirely onto storage queues - this should mean that we could just use the storage emulator entirely for MBrace.Azure in a local context for all three components (storage = blobs, state = tables, messaging = queues).

@krontogiannis krontogiannis removed their request for review April 10, 2024 08:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CloudFile returns different path to what is uploaded
3 participants