
Add bulk create by copy operation for files #4483

Merged Dec 18, 2023 (21 commits)


dantb
Contributor

@dantb dantb commented Nov 8, 2023

Fixes #4400 with a bulk file copy operation.

This PR unfortunately got massive; I'd be happy to pair with someone to go through anything that isn't clear.

Docs still to come

Contributor

@shinyhappydan shinyhappydan left a comment

Regarding the permissions: I would imagine you should only need files/read on the source and files/write on the destination
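
The permission split suggested above could look roughly like the following. This is a minimal sketch, not the code in this PR: `ProjectRef`, `Permission`, `Acls` and `authorizeCopy` are hypothetical names introduced for illustration, and only the `files/read` and `files/write` permission strings come from the comment.

```scala
// Hypothetical model: the names below are illustrative, not the Nexus API.
final case class ProjectRef(org: String, project: String)
final case class Permission(value: String)

object Permissions {
  val FilesRead: Permission  = Permission("files/read")
  val FilesWrite: Permission = Permission("files/write")
}

// Permissions held by a caller, keyed by project.
final case class Acls(grants: Map[ProjectRef, Set[Permission]]) {
  def has(project: ProjectRef, permission: Permission): Boolean =
    grants.get(project).exists(_.contains(permission))
}

object CopyAuthorization {
  // Left carries a reason when either side lacks the required permission:
  // read on the source project, write on the destination project.
  def authorizeCopy(acls: Acls, source: ProjectRef, dest: ProjectRef): Either[String, Unit] =
    if (!acls.has(source, Permissions.FilesRead)) Left(s"missing files/read on $source")
    else if (!acls.has(dest, Permissions.FilesWrite)) Left(s"missing files/write on $dest")
    else Right(())
}
```

With this shape, a caller who can only write to the destination is rejected before any copy starts, and the source project never needs to grant write access.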

@dantb dantb marked this pull request as ready for review November 14, 2023 12:51
@dantb dantb changed the title from "WIP - Support copying files to another project" to "Add copy operation to files" Nov 14, 2023
@dantb dantb changed the title from "Add copy operation to files" to "Add file create by copy operation" Nov 14, 2023
@dantb dantb changed the title from "Add file create by copy operation" to "Add create by copy operation for files" Nov 14, 2023
"destinationFilename": "{destinationFilename}",
"sourceProjectRef": "{sourceOrg}/{sourceProj}",
"sourceFileId": "{sourceFileId}",
"sourceTag": "{sourceTagName}",
Contributor

In some other places, we just pass the id suffixed by the rev/tag (example: multi-fetch)

destFilesAttributes.traverse { destFileAttributes =>
for {
iri <- generateId(pc)
command = CreateFile(iri, dest.project, destStorageRef, destStorageTpe, destFileAttributes, c.subject, dest.tag)
Contributor

It would be interesting to keep track that this file is a copy

Contributor Author

Good shout, Cristina mentioned wanting that. This is already massive, so I'm thinking of doing it in a follow-up?

@dantb dantb changed the title from "Add create by copy operation for files" to "Add bulk create by copy operation for files" Dec 13, 2023
@dantb dantb marked this pull request as ready for review December 14, 2023 09:14
/**
* Rejection returned when a storage cannot copy a file
*/
sealed abstract class CopyFileRejection(loggedDetails: String) extends StorageFileRejection(loggedDetails)
Contributor

Could we have it only on the FileRejection side?

Contributor Author

@dantb dantb Dec 18, 2023

Ahh, as in ditch the existing pattern, since it just adds more error types than are necessary? I guess this came from Monix typed errors in the past.

import io.circe.syntax._
import io.circe.{Encoder, Json, JsonObject}

final case class BulkOperationResults[A](results: Seq[A])
Contributor

In the end, it is not something that is restricted to bulk operations, so we could have a more generic name

Contributor

I would go the opposite direction and call it CopyFilesResult 😆

Contributor Author

Is it not? The only purpose of this type was to encode the bulk-operation context in a contained way

Contributor

@imsdu imsdu Dec 18, 2023

If it is a CopyFileResult, it should go in the storage plugin (and the related context too)

Ref.of[IO, Option[CopyOperationFailed]](None).flatMap { errorRef =>
files
.parTraverse { case c @ CopyBetween(source, dest) =>
- copySingle(source, dest).onError(_ => errorRef.set(Some(CopyOperationFailed(c))))
+ copySingle(source, dest).onError(e => errorRef.set(Some(CopyOperationFailed(c, e))))
Contributor

Why does the ref get used here rather than modifying the error?

Contributor Author

Don't think I understand the question 🤔

Contributor

Why are we setting a ref when we could just return the error?

Contributor Author

I'll refactor this afterwards
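
One possible shape for that refactor is to wrap the failure where it happens and return it, instead of recording it in a shared `Ref`. The sketch below is an illustration under stated assumptions, not the follow-up that was actually made: `CopyBetween` and `CopyOperationFailed` are simplified stand-ins for the types in the diff, `copyAll` is a hypothetical name, `Either` replaces `IO` so the example runs without cats-effect, and the copies run sequentially rather than with `parTraverse`. With cats-effect, something like `adaptError` on each `copySingle` call would presumably play the same role.

```scala
// Simplified stand-ins for the types in the diff; Either replaces IO so the
// sketch runs with the standard library only.
final case class CopyBetween(source: String, dest: String)
final case class CopyOperationFailed(failingCopy: CopyBetween, cause: Throwable)
    extends Exception(s"copy failed for $failingCopy", cause)

object CopyAll {
  // Runs each copy in order and stops at the first failure. The failure is
  // returned already wrapped with the offending CopyBetween, so no shared
  // mutable reference is needed to remember which copy went wrong.
  def copyAll(
      files: List[CopyBetween],
      copySingle: CopyBetween => Either[Throwable, Unit]
  ): Either[CopyOperationFailed, Unit] =
    files.foldLeft[Either[CopyOperationFailed, Unit]](Right(())) { (acc, c) =>
      acc.flatMap(_ => copySingle(c).left.map(e => CopyOperationFailed(c, e)))
    }
}
```

The caller then pattern matches on the returned `CopyOperationFailed` to decide what to roll back, rather than reading a `Ref` after the fact.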

Comment on lines +45 to +55
val events = ListBuffer.empty[Event]
val (sourceFileRes, sourceStorage) = genFileResourceAndStorage(sourceFileId, sourceProj.context, diskVal)
val (user, aclCheck) = userAuthorizedOnProjectStorage(sourceStorage.value)

val batchCopy = mkBatchCopy(
fetchFile = stubbedFetchFile(sourceFileRes, events),
fetchStorage = stubbedFetchStorage(sourceStorage, events),
aclCheck = aclCheck,
stats = stubbedStorageStats(storageStatEntry, events),
diskCopy = stubbedDiskCopy(stubbedFileAttr, events)
)
Contributor

This is a big setup

val destStorage: DiskStorage = genDiskStorage()

batchCopy.copyFiles(source, destStorage)(caller(user)).map { obtained =>
val obtainedEvents = events.toList
Contributor

I wonder if an abstraction around the events list might be a bit clearer

Contributor Author

There is actually a really nice pure FP abstraction for this kind of testing (no mutable state), but it relies on tagless final / monad transformers, so it's not a good fit here. I'll think about how to make it clearer tho 👍

Contributor

I don't think the mutable state is the problem, it just needs to read better. I was thinking just a tiny class / enclosure
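
A tiny class of the kind described might look like this. It is only a sketch: `EventRecorder`, `record` and `recorded` are hypothetical names, not something from the PR, and the mutable `ListBuffer` is kept on purpose but hidden behind two methods.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper; the name and methods are illustrative only.
final class EventRecorder[E] {
  private val buffer = ListBuffer.empty[E]

  // Appends an event; synchronized in case stubs run on several threads.
  def record(event: E): Unit = synchronized { buffer += event; () }

  // Immutable snapshot of the events in the order they were recorded.
  def recorded: List[E] = synchronized(buffer.toList)
}
```

Stubs would then call `recorder.record(...)` instead of appending to a shared buffer, and the assertion at the end of the test would compare `recorder.recorded` with the expected list.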

Comment on lines 29 to 56
test("batch copying should fetch storage, perform copy and evaluate create file commands") {
val events = ListBuffer.empty[Event]
val destProj: Project = genProject()
val (destStorageRef, destStorage) = (genRevision(), genStorage(destProj.ref, diskVal))
val fetchFileStorage = mockFetchFileStorage(destStorageRef, destStorage.storage, events)
val stubbedDestAttributes = genAttributes()
val batchCopy = BatchCopyMock.withStubbedCopyFiles(events, stubbedDestAttributes)
val destFileUUId = UUID.randomUUID() // Not testing UUID generation, same for all of them

val batchFiles: BatchFiles = mkBatchFiles(events, destProj, destFileUUId, fetchFileStorage, batchCopy)
implicit val c: Caller = Caller(genUser(), Set())
val (source, destination) = (genCopyFileSource(), genCopyFileDestination(destProj.ref, destStorage.storage))
val obtained = batchFiles.copyFiles(source, destination).accepted

val expectedFileIri = destProj.base.iri / destFileUUId.toString
val expectedCmds = stubbedDestAttributes.map(
CreateFile(expectedFileIri, destProj.ref, destStorageRef, destStorage.value.tpe, _, c.subject, destination.tag)
)

// resources returned are based on file command evaluation
assertEquals(obtained, expectedCmds.map(genFileResourceFromCmd))

val expectedActiveStorageFetched = ActiveStorageFetched(destination.storage, destProj.ref, destProj.context, c)
val expectedBatchCopyCalled = BatchCopyCalled(source, destStorage.storage, c)
val expectedCommandsEvaluated = expectedCmds.toList.map(FileCommandEvaluated)
val expectedEvents = List(expectedActiveStorageFetched, expectedBatchCopyCalled) ++ expectedCommandsEvaluated
assertEquals(events.toList, expectedEvents)
}
Contributor

Not sure what to suggest but this feels like a really big test

for {
iri <- generateId(pc)
command =
CreateFile(iri, dest.project, destStorageRef, destStorageTpe, destFileAttributes, c.subject, dest.tag)
Contributor

Not sure if we want to preserve the original tag.
It would be nice in some way to know that this file has been copied from another

@@ -8,7 +8,7 @@ import ch.epfl.bluebrain.nexus.tests.iam.types.Permission
import io.circe.Json
import org.scalatest.Assertion

-class DiskStorageSpec extends StorageSpec {
+class DiskStorageSpec extends StorageSpec with CopyFilesSpec {
Contributor

@imsdu imsdu Dec 18, 2023

Could there be other test classes dedicated to the batch operation?
Those are already complicated, so it may be better to split them and keep only the creation of storages in common between them?

@dantb dantb merged commit 50de0ec into BlueBrain:master Dec 18, 2023
8 checks passed
@dantb dantb deleted the copy-files branch December 18, 2023 17:25
Development

Successfully merging this pull request may close these issues.

Add a copy operation to files
4 participants