Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add instrumentation #1001

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Add instrumentation #1001

wants to merge 14 commits into from

Conversation

dleviminzi
Copy link
Collaborator

@dleviminzi dleviminzi commented Feb 28, 2025

Resolve BE-2377

@dleviminzi dleviminzi marked this pull request as draft February 28, 2025 22:35
@dleviminzi dleviminzi changed the title Dlm/add instrumentation Add instrumentation Mar 3, 2025
@dleviminzi dleviminzi marked this pull request as ready for review March 3, 2025 19:06
conn, err := network.ConnectToHost(rb.ctx, address, timeout, rb.tailscale, rb.tsConfig)
if err != nil {
return nil, err
}
// Use goroutine to avoid potential blocking from rate limiter's internal mutex
go metrics.RecordDialTime(time.Since(start))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit confused on this, why would it block? the rate limiter doesn't just return if its not "allowed". I could see that other weird issues

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a mutex inside the rate limiter to acquire the count. Its likely a small and quick acquisition, but it could still block

@@ -68,7 +68,7 @@ type Gateway struct {
WorkerPoolRepo repository.WorkerPoolRepository
EventRepo repository.EventRepository
Tailscale *network.Tailscale
metricsRepo repository.MetricsRepository
metricsRepo repository.UsageRepository
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be called usageRepo here?

@@ -66,7 +66,7 @@ func NewSchedulerForTest() (*Scheduler, error) {
workerPoolManager: workerPoolManager,
requestBacklog: requestBacklog,
containerRepo: containerRepo,
schedulerMetrics: schedulerMetrics,
schedulerUsage: schedulerMetrics,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

names should match

}
}
log.Info().Str("local_cache_path", localCachePath).Msg("local cache path")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this log anymore? this was for debugging right

@@ -293,7 +300,14 @@ func (c *ImageClient) BuildAndArchiveImage(ctx context.Context, outputLogger *sl
if err != nil {
return err
}
ociImageInfo, err := os.Stat(ociPath)
if err != nil {
log.Warn().Err(err).Msg("unable to inspect image size")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this fails, won't the following code not work? float64(ociImageInfo.Size()) / 1024 / 1024 or will it just report 0MB?

log.Warn().Err(err).Msg("unable to inspect image size")
}
bundleMB := float64(bundleInfo.Size()) / 1024 / 1024
metrics.RecordImageUnpackSpeed(bundleMB, time.Since(startTime))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above


outputLogger.Info("Unpacking image...\n")
tmpBundlePath := filepath.Join(baseTmpBundlePath, request.ImageId)
err = c.unpack(ctx, baseImage.Repo, baseImage.Tag, tmpBundlePath)
err = c.unpack(ctx, baseImage.Repo, baseImage.Tag, tmpBundlePath, imageMB)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe call this imageSizeMB

}

func (c *ImageClient) unpack(ctx context.Context, baseImageName string, baseImageTag string, bundlePath string) error {
func (c *ImageClient) unpack(ctx context.Context, baseImageName string, baseImageTag string, bundlePath string, imageMB float64) error {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above. although its kinda weird that we should even have to pass this in here, feels wrong.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe would push this out

@@ -56,7 +56,7 @@ type Worker struct {
containerLock sync.Mutex
containerWg sync.WaitGroup
containerLogger *ContainerLogger
workerMetrics *WorkerMetrics
workerMetrics *WorkerUsage
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

workerUsageMetrics? or workerUsage? I think in general all of these places where we have the repo called usage etc, it might still make sense to call it UsageMetrics. Usage alone isn't really clear imo

@dleviminzi dleviminzi requested a review from luke-lombardi March 4, 2025 19:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants