Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize db queries on the front page #420

Merged
merged 7 commits into from
Jun 20, 2018

Conversation

BatedUrGonnaDie
Copy link
Collaborator

The following is the base line setup: 7 pbs in the table, 2 of which have rivals attached to the category. There are 3 followed runs, 2 of those have time differentials to show.

With this pr, I was able to shave ~200ms off the load time in dev env (~750ms -> 550ms), and bring the query count way down (42 uncached, 50 total -> 16 uncached, 29 total).

Let me know if you have any questions about my decisions in any parts of the code. You could technically reduce more calls if you loaded the runs at the same time as loading the rival for the main table, but this seemed like a great start.

Gather all rivals with category id, and search through that instead of searching for a rival every single run iteration, find all pbs in one query using unions, use any? instead of present to only load pb query once, get all user runs in one go, instead of checking every iteration, eager load run game and category, use pb_for and check if present to reduce another query if there is a pb
@@ -24,7 +24,7 @@
- if cols.include?(:uploaded)
th.align-right = th_sorter('Uploaded', 'created_at')
tbody
- runs.includes(:user).each do |run|
- runs.includes(:user, :game, :category).each do |run|
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When ctrl+f'ing the project, every single call to the partial had the name col, so it seemed fine to just include these always.

Copy link
Owner

@glacials glacials left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this is sick, nice work. wixSanic

Just a few issues and questions

@@ -65,6 +65,7 @@ gem 'cancancan'
gem 'doorkeeper'

# db
gem 'active_record_union'
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow nice, hadn't heard of this before. This seems sick, and can definitely clean up a lot.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the other spot I noticed that had something similar was the game search, where we find game IDs from both games and game aliases. It seemed useful for the long term for some more advanced things.

@@ -4,7 +4,7 @@ class User < ApplicationRecord
include RivalUser
include TwitchUser

has_many :runs
has_many :runs, dependent: :destroy
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We give users the option to keep their runs when deleting:

screenshot 2018-06-19 10 02 06

The code that handles this does nothing to the runs if the user chooses to anonymize them, which seems wrong now that I'm looking at it. So should probably be

  • :nullify if we want to keep that code as-is, or
  • :restrict_with_exception if we want to fix that code to disown all runs, since in that situation, trying to destroy a user with any runs would be a bug since they all should have already been destroyed or disowned.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll revert this one, and make an issue for this. It should probably be restricted with exception though since nullify doesn't make sense if the user wants all runs deleted, so the code that handles this should raise an error if some are left.

has_many :runs, through: :categories
has_many :users, -> { distinct }, through: :runs
has_many :runners, -> { distinct }, through: :runs, class_name: 'User'
has_many :aliases, class_name: 'GameAlias'
has_many :aliases, class_name: 'GameAlias', dependent: :destroy
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I actually can't think of any current or future use case for destroying a game other than the existing merge flow, and in that case any aliases and categories should have already been ported over to the new game. It sounds a bit weird in theory, but realistically I'd consider the destroying of a game that still has aliases or categories to be a bug, so maybe these should be :restrict_with_exception.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This broke the tests, and since it was only supposed to be a rubocop fix, I'll leave this for another time.

@@ -39,9 +39,9 @@ def self.search(term)

# avatar returns the user's avatar, or a default avatar if the user has not set one. It cannot return nil.
def avatar
return "https://splits.io/logo.svg" if read_attribute(:avatar).nil?
return 'https://splits.io/logo.svg' if self[:avatar].nil?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like self[:avatar] will error if nil, see: rubocop/rails-style-guide#155

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rubocop made it sound identical and I was trying to squash a warning, so I'll just revert this.

runs.where(
id: categories.map { |category| pb_for(category).id10 } | runs.where(category: nil).pluck(:id)
)
Run.where(user: self).select('DISTINCT ON (category_id) *').order('category_id, realtime_duration_ms ASC')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for using Run.where(user: self) here over runs?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, when I started testing stuff I was working with joins which needs to be done on User, and then got stuck in that mindset. I'll take a look and see if this and the one below are equivalent.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doing it like you said, i was able to add an extra where clause to make it so there will not be any duplicates, and then use union_all which doesn't have a de-duplication step which has a bit better performance.

id: categories.map { |category| pb_for(category).id10 } | runs.where(category: nil).pluck(:id)
)
Run.where(user: self).select('DISTINCT ON (category_id) *').order('category_id, realtime_duration_ms ASC')
.union(Run.where(user: self, category: nil))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question as above, also there's the runs.by_category(nil) scope available if that is equivalent to this

@@ -76,6 +75,11 @@ def to_s
twitch_display_name || name || 'somebody'
end

def rivals
User.joins('INNER JOIN rivalries ON rivalries.to_user_id = users.id AND rivalries.from_user_id = ' + id.to_s)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of string concatenation, use SQL interpolation like

User.joins('INNER JOIN ... AND rivalries.from_user_id = ?', id)

which will sanitize inputs to prevent SQL injection. It's just IDs in this case, but it's still a good practice (and who knows whether there might be a separate attack in the future to inject or set custom IDs).

@@ -76,6 +75,11 @@ def to_s
twitch_display_name || name || 'somebody'
end

def rivals
User.joins('INNER JOIN rivalries ON rivalries.to_user_id = users.id AND rivalries.from_user_id = ' + id.to_s)
.select('users.*, rivalries.category_id')
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this select?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To have access to category_id in the returned relation object. Since rivalries aren't global like twitch follows, they only make sense within the context of a category. Otherwise we would get back a list of users and have no way to tell which user goes to which run. This way we get back a list of all users (which might have more than one of the same user), and then be able to iterate on them and figure out what category to then look up easily.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I'm confused. I understand the need to correlate a User with a Category, but doesn't this User#rivals function return an array of ActiveRecord Users? But later I see you calling both .category_id (a Rivalry method) and .pb_for (a User method) on them. What type of object are these?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns an array of Users that have the category_id that they are a rival in embeded in them. That is why we do the weird select statement .select('users.*, rivalries.category_id'). It selects all fields from the user, and the category_id field from the rivalries table and merges it into one record.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoa. I had no idea that was a thing Rails could do. It's like a... frankenstein object. Not sure how I feel about that, lol. Is it possible / still efficient to pass around Rivalrys as an alternative?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this pr, it never loads any of the Rivalry objects into memory, it just loads your rival user frankenstein objects. It could probably have user.rivals return a tuple of the User and Rival objects, with only one more db request (one to actually retrieve all rivals), and then zip them with the users (just remove the select statement from current code). I'm not sure how clean that would be though. Up to you on what you think is best.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, nevermind I'm just retarded. Adds one sql query but is much cleaner.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol ok nice

@@ -43,8 +43,8 @@
- if cols.include?(:rival)
td.align-right
small
- rival = run.user.rival_for(run.category)
- if rival.present? && rival.runs?(run.category)
- rival = run.user.rivals.detect { |r| r.category_id == run.category_id }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this better because it caches the run.user.rivals query for use in all table rows?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it caches the rivals call for every loop. I'm not sure if it caches the array that it turns it into though, so if you wanted maximum performance you could call rivals once at the top, and then use that local variable for each loop iteration.

This still requires an extra call to get the rivals run, just like before, but we eliminate that first call for every run.

use union_all now since we know there will not be any duplicates, change dependents as well
Copy link
Owner

@glacials glacials left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SeemsGood

@BatedUrGonnaDie BatedUrGonnaDie merged commit cadb558 into glacials:master Jun 20, 2018
@BatedUrGonnaDie BatedUrGonnaDie deleted the optimize-front-page branch June 20, 2018 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants