cached storage #24

Merged
merged 3 commits into from
Nov 8, 2024
2 changes: 2 additions & 0 deletions .github/FUNDING.yml
@@ -0,0 +1,2 @@
github: [aloneguid]
custom: ['https://www.buymeacoffee.com/alonecoffee']
4 changes: 2 additions & 2 deletions .github/workflows/lib.yml
@@ -1,9 +1,9 @@
name: lib

env:
- VERSION: 2.0.2
+ VERSION: 2.1.0
ASM_VERSION: 2.0.0
- VERSION_SUFFIX: '-pre.4'
+ VERSION_SUFFIX: '-pre.1'
# VERSION_SUFFIX: ''

on:
21 changes: 16 additions & 5 deletions docs/README.md
@@ -190,6 +190,19 @@ Interactive authentication with user credentials, and managed identities are not

Azure emulator is supported, just use `AzureBlobStorageWithLocalEmulator()` method to connect to it.

### Exotic providers

#### Local disk cache

This storage wraps another storage and caches file content on local disk. Example:

```csharp
IFileStorage storage = Files.Of.AzureBlobStorage(accountName, sharedKey);
IFileStorage cachedStorage = Files.Of.LocalDiskCacheStorage(storage);
```

When using `cachedStorage`, operations are forwarded to `storage` as is, except for `OpenRead`, which downloads the content to local disk and opens a stream to the local file.
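
A slightly fuller sketch of the read path, assuming the in-memory backend (`Files.Of.InternalMemory()`, as used in this PR's integration test) stands in for a remote store. The file name `notes.txt` is made up for illustration. The write goes through the parent rather than the wrapper, because in this PR the wrapper's `OpenWrite` throws `NotImplementedException`.

```csharp
using System;
using System.Threading.Tasks;
using Stowage;

// Assumption: in-memory storage stands in for a remote backend such as Azure Blob Storage.
IFileStorage storage = Files.Of.InternalMemory();
ICachedStorage cachedStorage = Files.Of.LocalDiskCacheStorage(storage);

// Write via the parent; the caching wrapper in this PR does not implement OpenWrite.
await storage.WriteText("notes.txt", "hello");

// Read through the wrapper, as the integration test in this PR does.
string? first = await cachedStorage.ReadText("notes.txt");

// A repeated read still asks the parent for metadata before the content is returned.
string? second = await cachedStorage.ReadText("notes.txt");

Console.WriteLine(first == second);
```

Because the parent is consulted on every read, the cache saves content transfer rather than round trips.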

## 🦓 Connection Strings

You can also use connection strings, which are useful when the implementation type is unknown beforehand, should be configurable, or you just don't want to write the implementation factory yourself. To create a storage using a connection string, use the following method:
@@ -257,17 +270,15 @@ I'm a strong advocate of simplicity and not going to repeat the mistake of turni

## ❔ Who?

- Used by:
- [databricks-sql-cli](https://github.com/aloneguid/databricks-sql-cli) - Unofficial Databricks SQL management console.
- [Pocket Bricks](https://www.aloneguid.uk/projects/pocketbricks/) - Databricks client for Android.
- [Stowage Explorer](https://github.com/DaneVinson/StowageExplorer) - experimental explorer project?
- Featured in [The .NET MAUI Podcast, episode 98](https://www.dotnetmauipodcast.com/98).
- Blog post [Exploring Stowage](https://developingdane.com/exploring-stowage/).

Raise a PR to appear here.

## Related Projects

- [R**CLONE**](https://rclone.org/) - cross-platform open-source cloud sync tool.
- [Storage.Net](https://github.com/aloneguid/storage) - the roots of this project.
- [Storage.Net](https://www.aloneguid.uk/projects/storagenet/) - the roots of this project.

## 💰 Contributing

17 changes: 15 additions & 2 deletions docs/release-history.md
@@ -1,10 +1,23 @@
## 2.0.2
## 2.1.0

### New

- Implemented cached file storage.

### Improved

- Placing entries in memory cache also remembers creation and modification times.
- Placing entries in memory cache calculates MD5.

## 2.0.2

### New

- Connection string for disk (`disk://`) can be passed without path, in which case the instance is created against entire disk.
- Added overload to connect to azure storage using SAS token by @stevehansen in #22

## Improvements
### Improvements

- AWS S3 connection string can now be parameterless, in which case it will default to using default profile from AWS CLI configuration.
- Documentation on connection strings (or most of it) can be found in the readme now.

46 changes: 46 additions & 0 deletions src/Stowage.Test/Integration/Impl/LocalCacheStorageTest.cs
@@ -0,0 +1,46 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using Xunit;

namespace Stowage.Test.Integration.Impl {

[Trait("Category", "Integration")]
public class LocalCacheStorageTest {
private readonly IFileStorage _parent;
private readonly ICachedStorage _storage;

public LocalCacheStorageTest() {
_parent = Files.Of.InternalMemory();
_storage = Files.Of.LocalDiskCacheStorage(_parent);
}

[Fact]
public async Task ReadCached() {
await _parent.WriteText(nameof(ReadCached), "test");

string? contentBeforeRm = await _storage.ReadText(nameof(ReadCached));
Assert.Equal("test", contentBeforeRm);
}

[Fact]
public async Task DeletionWillNotReadCached() {

string path = new IOPath(nameof(DeletionWillNotReadCached));

await _parent.WriteText(path, "test");

// state before
Assert.True(await _storage.Exists(path));
Assert.Equal("test", await _storage.ReadText(path));

// delete entry in parent backend
await _parent.Rm(path);

// check entry is not in caching backend
Assert.False(await _storage.Exists(path));
}
}
}
25 changes: 25 additions & 0 deletions src/Stowage/Files.cs
@@ -284,6 +284,31 @@ public static IFileStorage DatabricksDbfsFromLocalProfile(this IFilesFactory _,
return new DatabricksRestClient(profileName);
}

/// <summary>
/// Creates a wrapper around a storage that caches the content on local disk.
/// </summary>
/// <param name="_"></param>
/// <param name="parent">Storage interface to create caching around.</param>
/// <param name="cacheDir">Caching directory. When not specified, a 'stowage.cache' subdirectory is created in your OS's temp directory.</param>
/// <param name="maxAge">Maximum period to keep files in the cache. Defaults to one hour.</param>
/// <param name="clearOnDispose">Whether to scan the cache and evict expired entries when the storage is disposed.</param>
/// <returns></returns>
public static ICachedStorage LocalDiskCacheStorage(this IFilesFactory _, IFileStorage parent,
string? cacheDir = null,
TimeSpan? maxAge = null,
bool clearOnDispose = false) {

string diskDir = cacheDir ?? Path.Combine(Path.GetTempPath(), "stowage.cache");

var diskBackend = new LocalDiskFileStorage(diskDir);

return new CachedFileContentStorage(parent,
diskBackend,
maxAge ?? TimeSpan.FromHours(1),
clearOnDispose,
true);
}

/// <summary>
/// Sets the logger for the library
/// </summary>
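
A minimal sketch of calling the new factory method with its optional parameters, using only the signature shown above. The cache directory name `my-app.cache` and the ten-minute retention are hypothetical values picked for the example, and the in-memory parent is a stand-in for any `IFileStorage`.

```csharp
using System;
using System.IO;
using Stowage;

// Assumption: any IFileStorage can be the parent; in-memory keeps the sketch local.
IFileStorage parent = Files.Of.InternalMemory();

// Hypothetical cache location, chosen only for this example.
string cacheDir = Path.Combine(Path.GetTempPath(), "my-app.cache");

ICachedStorage cached = Files.Of.LocalDiskCacheStorage(
    parent,
    cacheDir: cacheDir,                    // defaults to <temp>/stowage.cache when null
    maxAge: TimeSpan.FromMinutes(10),      // defaults to one hour when null
    clearOnDispose: true);                 // request a cleanup pass on Dispose

Console.WriteLine(cached is IFileStorage);
```

Per the `Dispose` implementation in this PR, disposing the returned wrapper also disposes the parent storage, so the parent should not be reused afterwards.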
28 changes: 28 additions & 0 deletions src/Stowage/ICachedStorage.cs
@@ -0,0 +1,28 @@
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace Stowage {
/// <summary>
/// Storage with caching capabilities.
/// </summary>
public interface ICachedStorage : IFileStorage {
/// <summary>
/// Invalidate a path in the cache.
/// </summary>
/// <param name="path"></param>
/// <param name="cancellationToken"></param>
/// <returns>True if path was successfully invalidated, false if the entry did not exist so there was nothing to do.</returns>
Task<bool> Invaliadate(IOPath path, CancellationToken cancellationToken);

/// <summary>
/// Clears all cache entries
/// </summary>
/// <param name="cancellationToken"></param>
/// <returns></returns>
Task Clear(CancellationToken cancellationToken);
}
}
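
A short usage sketch for the two cache-management methods, assuming an in-memory parent and a made-up file name `report.csv`. Note that the method is spelled `Invaliadate` in this PR, and that neither method declares a default value for its `CancellationToken`, so one has to be passed explicitly.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Stowage;

IFileStorage parent = Files.Of.InternalMemory();
ICachedStorage cached = Files.Of.LocalDiskCacheStorage(parent);

// Write via the parent, then read through the wrapper so there is something to invalidate.
await parent.WriteText("report.csv", "a,b,c");
await cached.ReadText("report.csv");

// Removes every cached version of one path; the result says whether anything was removed.
bool removed = await cached.Invaliadate("report.csv", CancellationToken.None);
Console.WriteLine($"had cached copies: {removed}");

// Removes all cached entries.
await cached.Clear(CancellationToken.None);
```

Neither call touches the parent storage, so a later read simply repopulates the cache.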
7 changes: 7 additions & 0 deletions src/Stowage/IFileStorage.cs
@@ -98,6 +98,13 @@ Task<IReadOnlyCollection<IOEntry>> Ls(
/// <returns>Object metadata, or null if object does not exist.</returns>
Task<IOEntry?> Stat(IOPath path, CancellationToken cancellationToken = default);

/// <summary>
/// Renames object
/// </summary>
/// <param name="name"></param>
/// <param name="newName"></param>
/// <param name="cancellationToken"></param>
/// <returns></returns>
Task Ren(IOPath name, IOPath newName, CancellationToken cancellationToken = default);

// todo:
129 changes: 129 additions & 0 deletions src/Stowage/Impl/CachedFileContentStorage.cs
@@ -0,0 +1,129 @@
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

namespace Stowage.Impl {
/// <summary>
/// Storage with a backing cache for content reading. Only reads (<see cref="OpenRead(IOPath, CancellationToken)"/>) are cached
/// and nothing else. This means that the parent storage is always the source of truth, and will always be invoked, even
/// on a read operation, in order to check whether the local copy is still valid.
/// </summary>
class CachedFileContentStorage : PolyfilledFileStorage, ICachedStorage {

private readonly IFileStorage _parent;
private readonly IFileStorage _cachingBackend;
private readonly TimeSpan _maxAge;
private readonly bool _cleanupOnDispose;
private readonly bool _disposeCachingBackend;

public CachedFileContentStorage(IFileStorage parent, IFileStorage cachingBackend,
TimeSpan maxAge,
bool cleanupOnDispose,
bool disposeCachingBackend) {
_parent = parent;
_cachingBackend = cachingBackend;
_maxAge = maxAge;
_cleanupOnDispose = cleanupOnDispose;
_disposeCachingBackend = disposeCachingBackend;
}

async Task Cleanup(TimeSpan maxAge, CancellationToken cancellationToken) {
IReadOnlyCollection<IOEntry> allEntries = await _cachingBackend.Ls(null, true, cancellationToken);
foreach(IOEntry entry in allEntries) {
if(entry.LastModificationTime == null || DateTime.UtcNow - entry.LastModificationTime.Value >= maxAge) {
await _cachingBackend.Rm(entry.Path, cancellationToken);
}
}
}

private static IOPath GetCachingPath(IOEntry entry) {
if(entry.LastModificationTime == null)
throw new ArgumentException($"Entry needs to have a {nameof(IOEntry.LastModificationTime)} in order to determine cache validity", nameof(entry));
return new IOPath(entry.Path, entry.LastModificationTime.Value.Ticks.ToString());
}

/// <summary>
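/// Optionally starts a cache cleanup, then disposes the parent storage and, if configured, the caching backend.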
/// </summary>
public override void Dispose() {

if(_cleanupOnDispose) {
Cleanup(_maxAge, CancellationToken.None).Forget();
}

base.Dispose();

_parent.Dispose();

if(_disposeCachingBackend) {
_cachingBackend.Dispose();
}
}

public override async Task<Stream?> OpenRead(IOPath path, CancellationToken cancellationToken = default) {

IOEntry? entryNow = await _parent.Stat(path, cancellationToken);
if(entryNow == null) {
// entry does not exist
return null;
}

if(entryNow.LastModificationTime == null)
throw new ArgumentException($"Entry needs to have a {nameof(IOEntry.LastModificationTime)} in order to determine cache validity", nameof(entryNow));

TimeSpan age = DateTime.UtcNow - entryNow.LastModificationTime.Value;
IOPath cachePath = GetCachingPath(entryNow);

if(age >= _maxAge) {
await _cachingBackend.Rm(cachePath, cancellationToken);
}

// create cached entry
if(!await _cachingBackend.Exists(cachePath, cancellationToken)) {
using Stream? src = await _parent.OpenRead(path, cancellationToken);
if(src == null) {
return null;
}

// copy file to caching backend
using Stream dest = await _cachingBackend.OpenWrite(cachePath, cancellationToken);
await src.CopyToAsync(dest, cancellationToken);
}

return await _cachingBackend.OpenRead(cachePath, cancellationToken);
}

public override Task<IReadOnlyCollection<IOEntry>> Ls(IOPath? path = null, bool recurse = false, CancellationToken cancellationToken = default) =>
_parent.Ls(path, recurse, cancellationToken);

public override Task<Stream> OpenWrite(IOPath path, CancellationToken cancellationToken = default) => throw new NotImplementedException();

public override async Task Rm(IOPath path, CancellationToken cancellationToken = default) {
await _parent.Rm(path, cancellationToken);
await Invaliadate(path, cancellationToken);
}

public override Task<IOEntry?> Stat(IOPath path, CancellationToken cancellationToken = default) =>
_parent.Stat(path, cancellationToken);

public async Task<bool> Invaliadate(IOPath path, CancellationToken cancellationToken) {
bool dirty = false;

// we are going to list all the versions of the file and remove them
IReadOnlyCollection<IOEntry> entries = await _cachingBackend.Ls(path + IOPath.PathSeparatorString, false, cancellationToken);
foreach(IOEntry entry in entries) {
await _cachingBackend.Rm(entry.Path, cancellationToken);
dirty = true;
if(cancellationToken.IsCancellationRequested)
break;
}

return dirty;
}

public Task Clear(CancellationToken cancellationToken) =>
Cleanup(TimeSpan.Zero, cancellationToken);
}
}
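
The cache key scheme in `GetCachingPath` and the age check in `OpenRead` can be illustrated without Stowage. The sketch below is a simplified model, not the library code: `CachePathFor` and `IsStale` are hypothetical helper names, plain strings replace `IOPath`, and the timestamps are invented.

```csharp
using System;

// Models GetCachingPath: one cache entry per (path, modification time) pair.
static string CachePathFor(string path, DateTime lastModifiedUtc) =>
    $"{path.TrimEnd('/')}/{lastModifiedUtc.Ticks}";

// Models the check in OpenRead: age is measured from the parent's modification time.
static bool IsStale(DateTime lastModifiedUtc, DateTime nowUtc, TimeSpan maxAge) =>
    nowUtc - lastModifiedUtc >= maxAge;

var v1 = new DateTime(2024, 11, 8, 10, 0, 0, DateTimeKind.Utc);
var v2 = v1.AddMinutes(5);   // the parent file was rewritten five minutes later

string key1 = CachePathFor("/data/report.csv", v1);
string key2 = CachePathFor("/data/report.csv", v2);

Console.WriteLine(key1);
Console.WriteLine(key2);
Console.WriteLine(key1 == key2);   // False: a modified file maps to a new cache entry

Console.WriteLine(IsStale(v1, v1.AddMinutes(59), TimeSpan.FromHours(1)));
Console.WriteLine(IsStale(v1, v1.AddHours(1), TimeSpan.FromHours(1)));
```

Keying on the modification time means a changed file never hits an outdated entry; older versions remain on disk until `Invaliadate`, `Clear`, or a cleanup pass removes them.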
3 changes: 3 additions & 0 deletions src/Stowage/Impl/InMemoryFileStorage.cs
@@ -47,7 +47,7 @@

private InMemoryFileStorage() { }

public override Task<IReadOnlyCollection<IOEntry>> Ls(IOPath path, bool recurse = false, CancellationToken cancellationToken = default) {

// GitHub Actions build warning (line 50): Nullability of type of parameter 'path' doesn't match overridden member (possibly because of nullability attributes).
if(path == null)
path = IOPath.Root;

@@ -138,6 +138,9 @@
}

tag.entry = path;
tag.entry.CreatedTime = DateTime.UtcNow;
tag.entry.LastModificationTime = tag.entry.CreatedTime;
tag.entry.MD5 = sourceStream.ToByteArray().MD5().ToHexString();
tag.data = sourceStream;

_pathToTag[path] = tag;
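
The MD5 value stored on the entry comes from the library's own `ToByteArray().MD5().ToHexString()` extension chain. A rough equivalent using only the .NET base class library (.NET 5 or later) is sketched below; `Md5Hex` is a hypothetical helper, and lowercasing the result is an assumption, since the diff does not show which letter case the library's `ToHexString` produces.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hashes the bytes with MD5 and renders the digest as a hex string.
// Convert.ToHexString returns uppercase; lowercase here is an assumption.
static string Md5Hex(byte[] data) =>
    Convert.ToHexString(MD5.HashData(data)).ToLowerInvariant();

byte[] content = Encoding.UTF8.GetBytes("test");
string hash = Md5Hex(content);

Console.WriteLine(hash);
```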