Initial API #2

Open
wants to merge 12 commits into
base: MOODLE_402_STABLE
10 changes: 10 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,10 @@
# .github/workflows/ci.yml
name: ci

on: [push, pull_request]

jobs:
  ci:
    uses: catalyst/catalyst-moodle-workflows/.github/workflows/ci.yml@main
    with:
      disable_phpunit: true # There are no phpunit tests, and this breaks the Moodle CI if phpunit runs and there are no tests.
8 changes: 2 additions & 6 deletions README.md
@@ -1,6 +1,3 @@
> [!CAUTION]
> This plugin is under development and is currently not ready for general use.

# Azure Blob Storage SDK - Moodle Plugin

A moodle plugin with functions to interact with the Microsoft Azure Blob Storage service.
@@ -16,7 +13,7 @@ This is mainly used as a dependency when using Azure storage with tool_objectfs,

| Branch | Version support | PHP Version |
| ---------------- | --------------- | ------------ |
| MOODLE_44_STABLE | 4.4 + | 8.1.0+ |
| MOODLE_402_STABLE | 4.2 + | 8.0.0+ |

## Installation

@@ -30,7 +27,7 @@ git clone https://github.com/catalyst/moodle-local_azureblobstorage local/azureb
## How to use
There are two usage options:
1. Call the API functions directly e.g. `api::put_blob`
2. Register the stream wrapper, which allows you to use PHP's built in file methods to move files e.g. `copy('/file.txt', 'blob://container/file')`
2. Register the stream wrapper, which allows you to use PHP's built in file methods to move files e.g. `copy('/file.txt', 'azure://container/file')`
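
For option 1, the `$md5` argument of `put_blob_async` (added in `classes/api.php` in this pull request) is documented as the binary MD5 of the contents, not the hex string. A minimal sketch of preparing it is shown below. The helper name `binary_md5` is hypothetical and not part of the plugin, and the commented-out call at the end assumes the plugin and Guzzle are loaded.

```php
<?php
// Hypothetical helper (not part of the plugin): returns the raw 16-byte MD5 digest
// that the $md5 parameter of put_blob_async() is documented to expect.
function binary_md5(string $contents): string {
    // Passing true asks md5() for raw bytes instead of a 32-character hex string.
    return md5($contents, true);
}

$contents = 'hello world';
$md5 = binary_md5($contents);

// The raw digest is the same value as hex2bin() applied to the hex digest.
var_dump($md5 === hex2bin(md5($contents))); // bool(true)
var_dump(strlen($md5)); // int(16)

// With the plugin and Guzzle available, the call would look roughly like this
// (untested sketch; $account, $container and $sastoken are placeholders):
// $api = new \local_azureblobstorage\api($account, $container, $sastoken);
// $stream = \GuzzleHttp\Psr7\Utils::streamFor($contents);
// $api->put_blob_async('file.txt', $stream, $md5)->wait();
```

The API base64-encodes this value itself when building the `content-md5` header, so callers should pass the raw bytes.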

# Crafted by Catalyst IT

@@ -52,4 +49,3 @@ If you would like commercial support or would like to sponsor additional improve
to this plugin please contact us:

https://www.catalyst-au.net/contact-us

344 changes: 344 additions & 0 deletions classes/api.php
@@ -0,0 +1,344 @@
<?php
// This file is part of Moodle - http://moodle.org/
//
// Moodle is free software: you can redistribute it and/or modify
// it under the terms of the GNU General Public License as published by
// the Free Software Foundation, either version 3 of the License, or
// (at your option) any later version.
//
// Moodle is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
//
// You should have received a copy of the GNU General Public License
// along with Moodle. If not, see <http://www.gnu.org/licenses/>.

namespace local_azureblobstorage;

use GuzzleHttp\Client;
use GuzzleHttp\Promise\Promise;
use GuzzleHttp\Promise\PromiseInterface;
use GuzzleHttp\Promise\Utils;
use GuzzleHttp\Psr7\Request;
use Psr\Http\Message\StreamInterface;
use coding_exception;
use GuzzleHttp\Exception\RequestException;

/**
* Azure blob storage API.
*
* This class is intended to generically implement basic blob storage operations (get, put, delete, etc.)
* which can then be referenced in other plugins.
*
* @package local_azureblobstorage
* @author Matthew Hilton <[email protected]>
* @copyright 2024 Catalyst IT
* @license http://www.gnu.org/copyleft/gpl.html GNU GPL v3 or later
*/
class api {
/**
* @var Client Guzzle HTTP client for making requests
*/
private Client $client;

/**
* @var int Size threshold in bytes above which blobs are uploaded using multipart upload.
*/
const MULTIPART_THRESHOLD = 32 * 1024 * 1024; // 32MB.

/**
* @var int Number of bytes per multipart block.
*
* As of API version 2019-12-12, the maximum block size is 4000 MiB.
* @see https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs
*/
const MULTIPART_BLOCK_SIZE = 32 * 1024 * 1024; // 32MB.

/**
* @var int Maximum number of blocks allowed. This is set by Azure.
* @see https://learn.microsoft.com/en-us/rest/api/storageservices/understanding-block-blobs--append-blobs--and-page-blobs#about-block-blobs
*/
const MAX_NUMBER_BLOCKS = 50000;

/**
* @var int Maximum blob size in bytes (despite the constant's name). This is set by Azure.
* @see https://learn.microsoft.com/en-us/azure/storage/blobs/scalability-targets
*/
const MAX_BLOCK_SIZE = 50000 * 4000 * 1024 * 1024; // 50,000 blocks x 4000 MiB per block, approx. 190.7 TiB.

/**
* @var string the default content type if none is given.
*/
const DEFAULT_CONTENT_TYPE = 'application/octet-stream';

/**
* Create an API instance.
* @param string $account Azure storage account name
* @param string $container Azure storage container name (inside the given storage account).
* @param string $sastoken SAS (shared access signature) token for authentication.
* @param bool $redactsastoken Whether to redact the SAS token from error messages to avoid accidental leakage.
*/
public function __construct(
/** @var string Azure storage account name */
public string $account,
/** @var string Azure storage container name */
public string $container,
/** @var string SAS token for authentication */
public string $sastoken,
/** @var bool Whether to redact the SAS token from error messages to avoid accidental leakage */
public bool $redactsastoken = true
) {
$this->client = new Client();
}

/**
* URL for blob
* @param string $blobkey key of blob
* @return string
*/
private function build_blob_url(string $blobkey): string {
return 'https://' . $this->account . '.blob.core.windows.net/' . $this->container . '/' . $blobkey . '?' . $this->sastoken;
}

/**
* Blob block URL. Blocks are 'pieces' of a blob.
* @param string $blobkey key of blob
* @param string $blockid id of block. Note that within a blob every block ID must have the same length and be base64 encoded.
* @see https://learn.microsoft.com/en-us/rest/api/storageservices/put-block
* @return string
*/
private function build_blob_block_url(string $blobkey, string $blockid): string {
return $this->build_blob_url($blobkey) . '&comp=block&blockid=' . $blockid;
}

/**
* Builds the block list URL. A block list is a list of the blocks that make up a blob.
* @param string $blobkey key of blob
* @return string
*/
private function build_blocklist_url(string $blobkey): string {
return $this->build_blob_url($blobkey) . '&comp=blocklist';
}

/**
* Build blob properties URL.
* @param string $blobkey key of blob
* @return string
*/
private function build_blob_properties_url(string $blobkey): string {
return $this->build_blob_url($blobkey) . '&comp=properties';
}

/**
* Get blob.
* @param string $key blob key
* @return PromiseInterface Promise that resolves a ResponseInterface value where the body is a stream of the blob contents.
*/
public function get_blob_async(string $key): PromiseInterface {
// Enable streaming response, useful for large files e.g. videos.
return $this->client->getAsync($this->build_blob_url($key), ['stream' => true])
->then(null, $this->clean_exception_sas_if_needed());
}

/**
* Get blob properties.
* @param string $key blob key
* @return PromiseInterface Promise that resolves a ResponseInterface value where the properties are in the response headers.
*/
public function get_blob_properties_async(string $key): PromiseInterface {
return $this->client->headAsync($this->build_blob_url($key))->then(null, $this->clean_exception_sas_if_needed());
}

/**
* Deletes a given blob
* @param string $key blob key
* @return PromiseInterface Promise that resolves once the delete request succeeds.
*/
public function delete_blob_async(string $key): PromiseInterface {
return $this->client->deleteAsync($this->build_blob_url($key))->then(null, $this->clean_exception_sas_if_needed());
}

/**
* Put (create/update) blob.
* Note depending on the size of the stream, it may be uploaded via single or multipart upload.
*
* @param string $key blob key
* @param StreamInterface $contentstream the blob contents as a stream
* @param string $md5 binary md5 hash of file contents. You likely need to call hex2bin before passing in here.
* @param string $contenttype Content type to set for the file.
* @return PromiseInterface Promise that resolves a ResponseInterface value.
*/
public function put_blob_async(string $key, StreamInterface $contentstream, string $md5,
string $contenttype = self::DEFAULT_CONTENT_TYPE): PromiseInterface {
if ($this->should_stream_upload_multipart($contentstream)) {
return $this->put_blob_multipart_async($key, $contentstream, $md5, $contenttype);
} else {
return $this->put_blob_single_async($key, $contentstream, $md5, $contenttype);
}
}

/**
* Puts a blob using single upload. Suitable for small blobs.
*
* @param string $key blob key
* @param StreamInterface $contentstream the blob contents as a stream
* @param string $md5 binary md5 hash of file contents. You likely need to call hex2bin before passing in here.
* @param string $contenttype Content type to set for the file.
* @return PromiseInterface Promise that resolves a ResponseInterface value.
*/
public function put_blob_single_async(string $key, StreamInterface $contentstream, string $md5,
string $contenttype = self::DEFAULT_CONTENT_TYPE): PromiseInterface {
return $this->client->putAsync(
$this->build_blob_url($key),
[
'headers' => [
'x-ms-blob-type' => 'BlockBlob',
'x-ms-blob-content-type' => $contenttype,
'content-md5' => base64_encode($md5),
],
'body' => $contentstream,
]
)->then(null, $this->clean_exception_sas_if_needed());
}

/**
* Puts a blob using multipart/block upload. Suitable for large blobs.
* This is done by splitting the blob into multiple blocks, and then combining them using a BlockList on the Azure side
* before finally setting the final md5 by setting the blob properties.
*
* @param string $key blob key
* @param StreamInterface $contentstream the blob contents as a stream
* @param string $md5 binary md5 hash of file contents. You likely need to call hex2bin before passing in here.
* @param string $contenttype Content type to set for the file.
* @return PromiseInterface Promise that resolves when complete. Note the response is NOT available here,
* because this operation involves many separate requests.
*/
public function put_blob_multipart_async(string $key, StreamInterface $contentstream, string $md5,
string $contenttype = self::DEFAULT_CONTENT_TYPE): PromiseInterface {
// We make multiple calls to the Azure API to do multipart uploads, so wrap the entire thing
// into a single promise.
$entirepromise = new Promise(function() use (&$entirepromise, $key, $contentstream, $md5, $contenttype) {
// Split into blocks.
$counter = 0;
$blockids = [];
$promises = [];

while (true) {
$content = $contentstream->read(self::MULTIPART_BLOCK_SIZE);

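// Finished reading, nothing more to upload. Compare strictly, because empty('0') is true
// and would silently drop a final block consisting of the single byte '0'.
if ($content === '') {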
break;
}

// Each block has its own md5 specific to itself.
$blockmd5 = base64_encode(md5($content, true));

// The block ID must be the same length regardless of the counter value.
// So pad them with zeros.
$blockid = base64_encode(
str_pad($counter++, 6, '0', STR_PAD_LEFT)
);

$request = new Request('PUT', $this->build_blob_block_url($key, $blockid), ['content-md5' => $blockmd5], $content);
$promises[] = $this->client->sendAsync($request)->then(null, $this->clean_exception_sas_if_needed());
$blockids[] = $blockid;
}

if (count($blockids) > self::MAX_NUMBER_BLOCKS) {
throw new coding_exception("Max number of blocks reached, block size too small?");
}

// Will throw exception if any fail - if any fail we want to abort early.
Utils::unwrap($promises);

// Commit the blocks together into a single blob.
$body = $this->make_block_list_xml($blockids);
$bodymd5 = base64_encode(hex2bin(md5($body)));
$request = new Request('PUT', $this->build_blocklist_url($key),
['Content-Type' => 'application/xml', 'content-md5' => $bodymd5], $body);
$this->client->sendAsync($request)->then(null, $this->clean_exception_sas_if_needed())->wait();

We are only using sync operations in general. Does it make sense to use the sync API instead?

Collaborator Author

@matthewhilton matthewhilton Nov 7, 2024

We need the async API so we can use ->then to catch the exception and handle it. With the sync API I don't think this is possible without a standard try/catch, which is a lot messier.


// Now it is combined, set the md5 and content type on the completed blob.
$request = new Request('PUT', $this->build_blob_properties_url($key), [
'x-ms-blob-content-md5' => base64_encode($md5),
'x-ms-blob-content-type' => $contenttype,
]);
$this->client->sendAsync($request)->then(null, $this->clean_exception_sas_if_needed())->wait();

// Done, resolve the entire promise.
$entirepromise->resolve('fulfilled');
});

return $entirepromise;
}

/**
* If the stream should upload using multipart upload.
* @param StreamInterface $stream
* @return bool
*/
private function should_stream_upload_multipart(StreamInterface $stream): bool {
return $stream->getSize() > self::MULTIPART_THRESHOLD;
}

/**
* Generates a blocklist XML.
* @see https://learn.microsoft.com/en-us/rest/api/storageservices/put-block-list#request-body
* @param array $blockidlist list of block ids.
* @return string blocklist xml string.
*/
private function make_block_list_xml(array $blockidlist): string {
// We use 'Latest' since we don't care about committing different
// blob block versions - we always want the latest.
$string = "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<BlockList>";
foreach ($blockidlist as $blockid) {
$string .= "\n<Latest>" . $blockid . '</Latest>';
}
$string .= "\n</BlockList>";
return $string;
}

/**
* Returns a request exception handling function that redacts the SAS token from error messages if needed.
* @return callable
*/
private function clean_exception_sas_if_needed(): callable {

Can we just call this clean_exception_sas_token? It seems a bit cleaner, and I don't think the if_needed is required.

Collaborator Author

@matthewhilton matthewhilton Nov 7, 2024

I mean, it doesn't clean the exception SAS 100% of the time, only if the flag/setting is enabled. So technically it is if_needed. While a bit verbose, I think it's good as it conveys that it doesn't do it 100% of the time.
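
To illustrate the behaviour under discussion, here is a minimal standalone sketch of a handler that only redacts when a flag is set. It uses plain `RuntimeException` instead of Guzzle's `RequestException` so that it runs without dependencies. The function name `make_redacting_handler` and the example URL are hypothetical, not part of this pull request.

```php
<?php
// Hypothetical stand-in for the handler in this PR, built on RuntimeException
// rather than Guzzle's RequestException so the sketch has no dependencies.
function make_redacting_handler(string $secret, bool $redact): callable {
    return function (RuntimeException $ex) use ($secret, $redact) {
        if ($redact) {
            // Rethrow a copy whose message no longer contains the secret,
            // keeping the original exception as the "previous" one.
            $message = str_replace($secret, '[SAS TOKEN REDACTED]', $ex->getMessage());
            throw new RuntimeException($message, $ex->getCode(), $ex);
        }
        // Flag disabled: pass the original exception through untouched.
        throw $ex;
    };
}

$error = new RuntimeException('GET https://example.invalid/blob?sig=secret123 failed');

foreach ([true, false] as $redact) {
    try {
        make_redacting_handler('sig=secret123', $redact)($error);
    } catch (RuntimeException $caught) {
        echo ($redact ? 'on:  ' : 'off: ') . $caught->getMessage() . PHP_EOL;
    }
}
// on:  GET https://example.invalid/blob?[SAS TOKEN REDACTED] failed
// off: GET https://example.invalid/blob?sig=secret123 failed
```

With the flag off the very same exception object is rethrown, which is the case the `if_needed` suffix is meant to signal.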

return function(RequestException $ex) {
if ($this->redactsastoken) {
$newmsg = str_replace($this->sastoken, '[SAS TOKEN REDACTED]', $ex->getMessage());
$exceptiontype = get_class($ex);
throw new $exceptiontype($newmsg, $ex->getRequest(), $ex->getResponse(), $ex, $ex->getHandlerContext());
}
throw $ex;
};
}

/**
* Returns the unix timestamp when the SAS token expires.
* @return int|null unix timestamp, or null if unable to parse.
*/
public function get_token_expiry_time(): ?int {
// Parse the sas token (it just uses url parameter encoding).
$parts = [];
parse_str($this->sastoken, $parts);

// Get the 'se' part (signed expiry).
if (!isset($parts['se'])) {
// Malformed token with no expiry field, so there is nothing to parse.
return null;
}

// Parse timestamp string into unix timestamp int.
$expirystr = $parts['se'];
$parsed = strtotime($expirystr);

if ($parsed === false) {
// Failed to parse string time.
return null;
}

return $parsed;
}
}
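
The expiry lookup in `get_token_expiry_time` relies on a SAS token being a URL-encoded query string whose `se` field holds the signed expiry time. A minimal standalone sketch of the same steps follows. The function name `sas_expiry` is hypothetical and the token is made up for illustration, not a real SAS token.

```php
<?php
// Hypothetical standalone version of the expiry lookup.
function sas_expiry(string $sastoken): ?int {
    // A SAS token uses URL query-string encoding, so parse_str() can split it
    // and URL-decode the values.
    $parts = [];
    parse_str($sastoken, $parts);

    // 'se' is the signed expiry field; without it there is nothing to parse.
    if (!isset($parts['se'])) {
        return null;
    }

    // strtotime() returns false for strings it cannot interpret.
    $parsed = strtotime($parts['se']);
    return $parsed === false ? null : $parsed;
}

// Made-up token for illustration only.
$token = 'sv=2022-11-02&se=2030-01-01T00%3A00%3A00Z&sp=r&sig=abc';

var_dump(sas_expiry($token) === gmmktime(0, 0, 0, 1, 1, 2030)); // bool(true)
var_dump(sas_expiry('sp=r&sig=abc')); // NULL
```

A caller could compare the returned timestamp with `time()` to warn before the token expires, treating `null` as an unusable token.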