* fix disposing the DbContext created by `dbContextDefaultFactory` instead of the `Func<Owned<>>` factory itself

@ `DoWork()`
+ primary ctor param `replyContentImageSaver` to re-insert `ImageInReply` entities with image URL filenames from newly re-extracted reply contents
@ SimplifyImagesInAllReplyContentsWorker.cs
@ crawler

* use the async `DbContext.SaveChangesAsync()` instead of the synchronous `SaveChanges()`
* add params for captured variables `processedEntityCount` & `process` to fix ReSharper inspection `AccessToModifiedClosure`
* renamed from `SaveAndLog()`
@ `SaveThenLog()`
+ param `writingEntitiesAction` for `SaveThenLog()` & move param `writingEntityEntryAction` before it
@ `Transform()`
@ TransformEntityWorker.cs
@ shared
@ c#
n0099 committed Jun 10, 2024
1 parent 0362053 commit efb62cb
Showing 2 changed files with 28 additions and 16 deletions.
26 changes: 18 additions & 8 deletions c#/crawler/src/Worker/SimplifyImagesInAllReplyContentsWorker.cs
@@ -3,29 +3,39 @@ namespace tbm.Crawler.Worker;
 public class SimplifyImagesInAllReplyContentsWorker(
     ILogger<SimplifyImagesInAllReplyContentsWorker> logger,
     Func<Owned<CrawlerDbContext.NewDefault>> dbContextDefaultFactory,
-    Func<Owned<CrawlerDbContext.New>> dbContextFactory)
+    Func<Owned<CrawlerDbContext.New>> dbContextFactory,
+    ReplyContentImageSaver replyContentImageSaver)
     : TransformEntityWorker<CrawlerDbContext, ReplyContent, ReplyContent, Pid>(logger)
 {
     protected override async Task DoWork(CancellationToken stoppingToken)
     {
-        await using var db = dbContextDefaultFactory().Value();
+        await using var dbDefaultFactory = dbContextDefaultFactory();
+        var db = dbDefaultFactory.Value();
         foreach (var fid in from e in db.Forums select e.Fid)
         {
+            await using var dbFactory = dbContextFactory();
             await Transform(
-                () => dbContextFactory().Value(fid),
+                () => dbFactory.Value(fid),
                 saveByNthEntityCount: 10000,
-                writingEntityEntry =>
-                {
-                    var p = writingEntityEntry.Property(e => e.ProtoBufBytes);
-                    p.IsModified = !ByteArrayEqualityComparer.Instance.Equals(p.OriginalValue, p.CurrentValue);
-                },
                 readingEntity => readingEntity.Pid,
                 readingEntity =>
                 {
                     var protoBuf = Reply.Parser.ParseFrom(readingEntity.ProtoBufBytes);
                     ReplyParser.SimplifyImagesInReplyContent(logger, ref protoBuf);
                     return new() {Pid = readingEntity.Pid, ProtoBufBytes = protoBuf.ToByteArray()};
                 },
+                writingEntityEntry =>
+                {
+                    var p = writingEntityEntry.Property(e => e.ProtoBufBytes);
+                    p.IsModified = !ByteArrayEqualityComparer.Instance.Equals(p.OriginalValue, p.CurrentValue);
+                },
+                (writingDb, writingEntities) => replyContentImageSaver
+                    .Save(writingDb, writingEntities.Select(e => new ReplyPost
+                    {
+                        Pid = e.Pid,
+                        Content = null!,
+                        ContentsProtoBuf = Reply.Parser.ParseFrom(e.ProtoBufBytes).Content
+                    })),
                 stoppingToken);
         }
     }
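The disposal fix in `DoWork()` can be illustrated with a minimal sketch. `FakeOwned` and `FakeDbContext` below are hypothetical stand-ins, not Autofac's `Owned<T>` or EF Core's `DbContext`; the sketch only assumes that disposing the owning scope also disposes whatever its `Value` delegate created, which appears to be the behaviour this commit relies on.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for a DbContext; it only records whether it was disposed.
public sealed class FakeDbContext : IDisposable
{
    public bool IsDisposed { get; private set; }
    public void Dispose() => IsDisposed = true;
}

// Hypothetical stand-in for an owned scope: Value is a factory delegate,
// and disposing the scope disposes every context that delegate created.
public sealed class FakeOwned : IDisposable
{
    private readonly List<FakeDbContext> _created = new();
    public bool IsDisposed { get; private set; }
    public Func<FakeDbContext> Value { get; }

    public FakeOwned() => Value = () =>
    {
        var db = new FakeDbContext();
        _created.Add(db);
        return db;
    };

    public void Dispose()
    {
        foreach (var db in _created) db.Dispose();
        IsDisposed = true;
    }
}

public static class OwnedDisposalDemo
{
    // Old shape: only the context is disposed; the scope that produced it stays alive.
    public static (bool ScopeDisposed, bool DbDisposed) DisposeContextOnly()
    {
        var owned = new FakeOwned();
        var db = owned.Value();
        db.Dispose();
        return (owned.IsDisposed, db.IsDisposed);
    }

    // New shape: the scope is disposed, which also disposes the context it created.
    public static (bool ScopeDisposed, bool DbDisposed) DisposeOwnedScope()
    {
        var owned = new FakeOwned();
        var db = owned.Value();
        owned.Dispose();
        return (owned.IsDisposed, db.IsDisposed);
    }
}
```

`DisposeContextOnly()` leaves the scope undisposed, while `DisposeOwnedScope()` releases both. That mirrors the move from `await using var db = dbContextDefaultFactory().Value();` to holding the `Owned<>` itself in an `await using` and reading the context from it.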
18 changes: 10 additions & 8 deletions c#/shared/src/TransformEntityWorker.cs
@@ -28,9 +28,10 @@ public abstract class TransformEntityWorker<TDbContext, TReadingEntity, TWriting
     protected async Task Transform(
         Func<TDbContext> dbContextFactory,
         int saveByNthEntityCount,
-        Action<EntityEntry<TWritingEntity>> writingEntityEntryAction,
         Func<TReadingEntity, TExceptionId> readingEntityExceptionIdSelector,
         Func<TReadingEntity, TWritingEntity> entityTransformer,
+        Action<EntityEntry<TWritingEntity>> writingEntityEntryAction,
+        Action<TDbContext, IEnumerable<TWritingEntity>> writingEntitiesAction,
         CancellationToken stoppingToken = default)
     {
         var processedEntityCount = 0;
@@ -45,26 +46,27 @@ protected async Task Transform(
             from e in readingDb.Set<TReadingEntity>().AsNoTracking() select e;
         var writingEntities = new List<TWritingEntity>();

-        void SaveAndLog()
+        async Task SaveThenLog(int processedCount, Process currentProcess)
         {
-            writingDb.Set<TWritingEntity>().UpdateRange(writingEntities);
+            writingDb.Set<TWritingEntity>().AttachRange(writingEntities);
             writingDb.ChangeTracker.Entries<TWritingEntity>().ForEach(writingEntityEntryAction);
-            var updatedEntityCount = writingDb.SaveChanges();
+            writingEntitiesAction(writingDb, writingEntities);
+            var updatedEntityCount = await writingDb.SaveChangesAsync(stoppingToken);
             writingEntities.Clear();
             writingDb.ChangeTracker.Clear();

             logger.LogTrace("processedEntityCount:{} updatedEntityCount:{} elapsed:{}ms processMemory:{}MiB exceptions:{}",
-                processedEntityCount, updatedEntityCount,
+                processedCount, updatedEntityCount,
                 stopwatch.ElapsedMilliseconds,
-                process.PrivateMemorySize64 / 1024 / 1024,
+                currentProcess.PrivateMemorySize64 / 1024 / 1024,

Check failure on line 61 in c#/shared/src/TransformEntityWorker.cs (GitHub Actions / build: crawler, imagePipeline, shared): The operands in the divisive expression currentProcess.PrivateMemorySize64 / 1024 / 1024 are both integers and result in an implicit rounding. (https://github.com/Vannevelj/SharpSource/blob/master/docs/SS003-DivideIntegerByInteger.md) The same check is also reported for the sub-expression currentProcess.PrivateMemorySize64 / 1024.

                 JsonSerializer.Serialize(exceptions, JsonSerializerOptions));
             stopwatch.Restart();
         }

         foreach (var readingEntity in readingEntities)
         {
             processedEntityCount++;
-            if (processedEntityCount % saveByNthEntityCount == 0) SaveAndLog();
+            if (processedEntityCount % saveByNthEntityCount == 0) await SaveThenLog(processedEntityCount, process);
             if (stoppingToken.IsCancellationRequested) break;
             try
             {
@@ -85,6 +87,6 @@ void SaveAndLog()
             }
         }

-        SaveAndLog();
+        await SaveThenLog(processedEntityCount, process);
     }
 }
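The `AccessToModifiedClosure` inspection named in the commit message concerns closures that read a variable which is modified after the closure is created. The sketch below is a generic illustration with hypothetical names (`ClosureDemo`, `MakeReader`), not code from this repository. In the commit itself `SaveThenLog()` is awaited right where it is called, so passing `processedEntityCount` and `process` as parameters presumably silences the inspection rather than changing what gets logged.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureDemo
{
    // Each delegate captures the variable `count` itself,
    // so it reads whatever value `count` holds when the delegate finally runs.
    public static List<int> CapturedVariable()
    {
        var count = 0;
        var readers = new List<Func<int>>();
        for (var i = 0; i < 3; i++)
        {
            count++;
            readers.Add(() => count);
        }
        return readers.ConvertAll(read => read()); // 3, 3, 3
    }

    // Passing the value as a parameter copies it at call time,
    // so later changes to `count` are not visible to the delegate.
    public static List<int> PassedAsParameter()
    {
        static Func<int> MakeReader(int snapshot) => () => snapshot;

        var count = 0;
        var readers = new List<Func<int>>();
        for (var i = 0; i < 3; i++)
        {
            count++;
            readers.Add(MakeReader(count));
        }
        return readers.ConvertAll(read => read()); // 1, 2, 3
    }
}
```

The parameterised form makes the value a function sees explicit at the call site, which is the shape `SaveThenLog(int processedCount, Process currentProcess)` takes after this change.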

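The check failure reported on `currentProcess.PrivateMemorySize64 / 1024 / 1024` is about integer division dropping the fractional part. A minimal sketch of the difference follows; `MemorySizeDemo` and its methods are hypothetical names, and whether a floating-point divisor is the fix the analyzer expects in this project is an assumption.

```csharp
public static class MemorySizeDemo
{
    // Both operands are integers at every step, so the fractional part is discarded.
    public static long ToMiBTruncated(long bytes) => bytes / 1024 / 1024;

    // A double divisor makes the division floating-point and keeps the fraction.
    public static double ToMiB(long bytes) => bytes / 1024d / 1024d;
}
```

For 1,572,864 bytes (1.5 MiB), `ToMiBTruncated` returns 1 and `ToMiB` returns 1.5. For a trace log in whole MiB the truncation may well be intended, in which case making the rounding explicit is a matter of style rather than correctness.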