Hey, thanks for the library, it's great.

As we've discussed earlier, I decided to build my own S3 integration rather than using your library. This issue explains the problem I ran into and why I chose not to use your library.
We've got a service that handles file uploads from a client. It gets the stream of data from the client and sends it directly to an object in S3. We recently introduced parallel uploads in the client and saw some reliability issues in our service. Instead of increasing the size of the machine, I decided to look into whether we could stream the bytes from the client through the service all the way into the S3 bucket. We did it, but it was not without issues.
The original implementation did something along the lines of yours: load the bytes into memory and then send that Array[Byte] to S3.
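For reference, that buffered approach looks roughly like the sketch below (not our actual code; the file path, bucket, and key are placeholders). Because the whole object sits in memory as an Array[Byte], the request body is replayable and the SDK can retry it on its own:

// Rough sketch of the buffer-everything approach (placeholder names only).
import cats.effect.IO
import fs2.io.file.{Files, Path}
import software.amazon.awssdk.core.async.AsyncRequestBody
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.PutObjectRequest

def uploadInMemory(s3: S3AsyncClient, p: Path, bucket: String, key: String): IO[Unit] =
  for {
    // Load the entire file into memory first...
    bytes <- Files[IO].readAll(p).compile.to(Array)
    req = PutObjectRequest.builder().bucket(bucket).key(key).build()
    // ...then hand the SDK a replayable in-memory body it can retry by itself.
    _ <- IO.fromCompletableFuture(IO.delay(s3.putObject(req, AsyncRequestBody.fromBytes(bytes))))
  } yield ()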
There are two downsides to streaming directly into S3:
1. you have to know the size ahead of time, so it does not work for unbounded streams of bytes
2. the implementation we came up with can't be automatically retried (as opposed to the original one, and yours)
For 1, we have files written to disk on the client side, so we know the size.
For 2, we decided to return a 503 from the service if the upload to S3 fails, so the client can retry (there's a rough sketch of this after the implementation below). This is heavier than retrying locally in the service, but it happens only occasionally, so we're okay with that.
Here is a full implementation:
//> using lib "software.amazon.awssdk:s3:2.22.1"
//> using lib "org.typelevel::toolkit:0.1.20"

import cats.effect.IO
import cats.effect.IOApp
import cats.effect.kernel.Resource
import fs2.io.file.Files
import fs2.io.file.Path
import software.amazon.awssdk.core.async.AsyncRequestBody
import software.amazon.awssdk.core.exception.NonRetryableException
import software.amazon.awssdk.regions.Region
import software.amazon.awssdk.services.s3.S3AsyncClient
import software.amazon.awssdk.services.s3.model.PutObjectRequest

import java.io.OutputStream

final case class ToUpload(
    bucket: String,
    nameInBucket: String,
    size: Long,
    data: fs2.Stream[IO, Byte]
)

/** Downsides:
  *   - you have to know the file size in advance
  *   - using `forBlockingOutputStream` implies you can't rely on automatic S3
  *     client retries
  *   - suboptimal for large files as a failure means you need to re-upload
  *     everything
  */
object Main extends IOApp.Simple {

  val path = Path("./s3-upload.scala")

  val run = upload(path, "bucket", "pathInBucket")

  def upload(p: Path, bucket: String, nameInBucket: String): IO[Unit] =
    prepareS3().use { s3 =>
      for {
        toUpload <- prepare(p, bucket, nameInBucket)
        _ <- streamToS3(s3, toUpload)
      } yield ()
    }

  /** This will throw a [[NonRetryableException]] if the upload to S3 fails. */
  def streamToS3(s3: S3AsyncClient, toUpload: ToUpload): IO[Unit] = {
    val putOb = PutObjectRequest
      .builder()
      .bucket(toUpload.bucket)
      .key(toUpload.nameInBucket)
      .build()

    IO.delay(AsyncRequestBody.forBlockingOutputStream(toUpload.size)).flatMap { arb =>
      val out = IO.blocking(arb.outputStream()).map(x => (x: OutputStream))
      val pipe = fs2.io.writeOutputStream(out, closeAfterUse = true)
      // Write the incoming bytes into the request body's OutputStream in the background.
      val writeRes = toUpload.data.through(pipe).compile.drain.background
      writeRes.use { outcome =>
        val sendReq = IO
          .fromCompletableFuture(IO.delay(s3.putObject(putOb, arb)))
          .void
        // Send the request, then surface any failure from the background write.
        sendReq *> outcome.flatMap(_.embedError)
      }
    }
  }

  def prepareS3(): Resource[IO, S3AsyncClient] = {
    Resource.fromAutoCloseable(
      IO.blocking(
        S3AsyncClient.builder().region(Region.US_EAST_1).build()
      )
    )
  }

  def prepare(p: Path, bucket: String, nameInBucket: String): IO[ToUpload] = {
    Files[IO].size(p).map { size =>
      ToUpload(bucket, nameInBucket, size, Files[IO].readAll(p))
    }
  }
}
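To show how point 2 plays out on the service side, here is a rough sketch of how the failed upload could be mapped to a 503. This is illustrative only: it assumes http4s-dsl and an http4s server module on the classpath (not pulled in by the snippet above), and the route path and error handling are simplified:

// Illustrative only: maps a failed S3 upload to a 503 so the client retries.
import cats.effect.IO
import org.http4s.HttpRoutes
import org.http4s.dsl.io._
import software.amazon.awssdk.services.s3.S3AsyncClient

def uploadRoutes(s3: S3AsyncClient): HttpRoutes[IO] = HttpRoutes.of[IO] {
  case req @ POST -> Root / "upload" / bucket / name =>
    req.contentLength match {
      // Downside 1: we need the size up front, so a Content-Length is required.
      case None => BadRequest("Content-Length is required")
      case Some(size) =>
        // Stream the request body straight through to S3.
        val toUpload = ToUpload(bucket, name, size, req.body)
        Main.streamToS3(s3, toUpload).flatMap(_ => Ok()).handleErrorWith { _ =>
          // Downside 2: no local retry, so ask the client to try again.
          ServiceUnavailable("Upload to S3 failed, please retry.")
        }
    }
}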
I've created this so that you can play with it yourself and see if this is something you'd like to see in your library.