Explicitly casting user ids to string
This improves query performance. We used user_ids as array keys
in the implementation, and PHP converted them to integers.
When an integer is passed as a query parameter for a string column,
the index is not used and query performance degrades drastically.

We need to explicitly cast these external IDs to strings
so that the index is used correctly.

remp/remp#1088
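
The key conversion described above can be reproduced in isolation. The following is a minimal sketch, not code from the commit: the `$totalPageviews` values are made-up sample data, and only `array_keys`, `array_map` and `strval` are taken from the actual change.

```php
<?php
// Hypothetical sample data: external user IDs (strings) mapped to pageview counts.
$totalPageviews = [
    '101' => 12,
    '202' => 7,
    'abc' => 3,
];

// PHP stores decimal-integer-looking string keys as integers,
// so the IDs come back as int 101 and int 202.
$keys = array_keys($totalPageviews);
var_dump($keys === [101, 202, 'abc']); // bool(true)

// Casting back to strings restores the type expected by a string column.
$forItems = array_map('strval', $keys);
var_dump($forItems === ['101', '202', 'abc']); // bool(true)
```

Binding the values from `$forItems` rather than `$keys` keeps the comparison string-to-string, which is what lets the database use the index on the string column.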
rootpd committed Feb 21, 2022
1 parent 22c10ba commit b23e616
Showing 2 changed files with 19 additions and 7 deletions.
23 changes: 17 additions & 6 deletions Beam/app/Console/Commands/ComputeAuthorsSegments.php
@@ -45,6 +45,7 @@ class ComputeAuthorsSegments extends Command
 
     public function handle()
     {
+        ini_set('memory_limit', -1);
         // Using Cursor on large number of results causing memory issues
         // https://github.com/laravel/framework/issues/14919
         DB::connection()->getPdo()->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
@@ -253,15 +254,22 @@ private function aggregatedPageviewsFor($groupParameter)
 
     private function groupDataFor($groupParameter)
     {
-        $this->line("Computing total pageviews for parameter '$groupParameter'");
+        $this->getOutput()->write("Computing total pageviews for parameter '$groupParameter': ");
         $totalPageviews = $this->aggregatedPageviewsFor($groupParameter);
-        $this->line("Done");
+        $this->line(count($totalPageviews));
 
         $segments = [];
-        $this->line("Computing segment items for parameter '$groupParameter'");
 
-        foreach (array_chunk($totalPageviews, 500, true) as $totalPageviewsChunk) {
-            $forItems = array_keys($totalPageviewsChunk);
+        $total = count($totalPageviews);
+        $processed = 0;
+        $step = 500;
+
+        $bar = $this->output->createProgressBar(count($totalPageviews));
+        $bar->setFormat('%message%: %current%/%max% [%bar%] %percent:1s%% %elapsed:6s%/%estimated:-6s% %memory:6s%');
+        $bar->setMessage("Computing segment items for parameter '$groupParameter'");
+
+        foreach (array_chunk($totalPageviews, $step, true) as $totalPageviewsChunk) {
+            $forItems = array_map('strval', array_keys($totalPageviewsChunk));
 
             $queryItems = DB::table(ArticleAggregatedView::getTableName())->select(
                 $groupParameter,
@@ -289,9 +297,12 @@ private function groupDataFor($groupParameter)
                     $segments[$item->author_id][] = $item->$groupParameter;
                 }
             }
+
+            $processed += $step;
+            $bar->setProgress($processed);
         }
 
-        $this->line("Done");
+        $bar->finish();
 
         return $segments;
     }
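
The chunked loop in `groupDataFor` can be sketched outside of Laravel as follows. This is an illustrative sketch under stated assumptions: `processInChunks` and the `$handleChunk` callback are hypothetical names standing in for the database query, and the sample data is made up. Unlike the commit, which advances the counter by `$step`, this sketch advances it by the actual chunk size so that a shorter final chunk does not push the counter past the total.

```php
<?php
/**
 * Hypothetical helper: walks an ID => pageviews map in chunks,
 * passes the string-cast IDs of each chunk to $handleChunk,
 * and returns the number of items processed.
 */
function processInChunks(array $totalPageviews, int $step, callable $handleChunk): int
{
    $processed = 0;

    // The third argument (preserve_keys = true) keeps the IDs as array keys.
    foreach (array_chunk($totalPageviews, $step, true) as $chunk) {
        // Keys that look like integers were converted by PHP; cast them back.
        $forItems = array_map('strval', array_keys($chunk));
        $handleChunk($forItems);

        // Count the real chunk size; the last chunk may be shorter than $step.
        $processed += count($chunk);
    }

    return $processed;
}

// Usage with made-up data: five IDs processed in chunks of two.
$seen = [];
$processed = processInChunks(
    ['1' => 5, '2' => 3, '3' => 9, '4' => 1, '5' => 2],
    2,
    function (array $ids) use (&$seen) {
        $seen[] = $ids;
    }
);
```

Here `$seen` ends up holding three chunks of string IDs and `$processed` equals the number of input items.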
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -11,7 +11,8 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/) and this p
 - Fixed retrieval of browser_id in `conversions:aggregate-events` command which leads to more thorough definition of user's conversion path. remp/remp#1049
   - Previously some events (mainly pageviews) could have been not matched correctly and missing in the aggregated data.
 - Fixed occasional incorrect page_progress parameter being tracked causing progress update not to be tracked at all.
-  - Due to JS floating points being JS floating points sometimes the page_progress was >1 which server refused to accept.
+  - Due to JS floating points being JS floating points sometimes the page_progress was >1 which server refused to accept.
+- Fixed issues with very slow author/section segment recalculation for instances with bigger amount of data. remp/remp#1088
 
 ## [0.30.0] - 2022-02-10
 