Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add new Rails/IndexNames cop #1380

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

corsonknowles
Copy link

@corsonknowles corsonknowles commented Oct 22, 2024

This cop enforces the standard index naming provided by Rails, allowing developers to enjoy a standardized schema.

A bonus impact of following this convention is that fully duplicated indexes are prevented by default.

Custom index names were once a required workaround for default names that were too long for various database implementations. That could happen because of long table names, long field names, or long lists of compound index keys.

However, that problem has been fully resolved. The current automated naming approach will deterministically abbreviate indexes that would have been too long before.

Background:


Before submitting the PR make sure the following are checked:

  • The PR relates to only one subject with a clear title and description in grammatically correct, complete sentences.
  • Wrote good commit messages.
  • Commit message starts with [Fix #issue-number] (if the related issue exists).
  • Feature branch is up-to-date with master (if not - rebase it).
  • Squashed related commits together.
  • Added tests.
  • Ran bundle exec rake default. It executes all tests and runs RuboCop on its own code.
  • Added an entry (file) to the changelog folder named {change_type}_{change_description}.md if the new code introduces user-observable changes. See changelog entry format for details.
  • If this is a new cop, consider making a corresponding update to the Rails Style Guide.

@Earlopain
Copy link
Contributor

Earlopain commented Oct 23, 2024

Hi, thanks for the PR! I don't think I've ever bothered with manually naming my indicies but can you just generalise this?

Generally, I would say let users name their indicies as they wish. Indicies don't have to be only over a column, they can have query constraints, modify the column (lower(column_name)), use different index strategies (gin, gist, etc.). Per the PR you linked, it doesn't look like rails will respect any of these options in the index name.


I have some general feedback for your PR. I would wait until more feedback until adressing this:

  • This cop should not run over the whole codebase. Limit it only to migrations, same as ThreeStateBooleanColumn and similar cops.
  • If only in Rails 7.1 the index is safe to create, this should not run over migrations generated with older rails.
  • It might be a good idea to add config StartAfter. It's a bit awkward right now, there are a few PRs wanting to do this but none have been merged so there is no common module yet. Add new Rails/DateAndTimeColumnName cop #335 Add new Rails/ForeignKeyName cop #277 Add new Rails/DateAndTimeColumnName cop #335
  • You can also add indicies with add_index, these should be an offense as well.
  • adoc is automatically generated. Please drop these changes.

@corsonknowles
Copy link
Author

Thanks for the great comments @Earlopain with pointers as well!

All addressed, and I added specs to confirm no offenses both for earlier Rails versions and earlier Migrations versions.

@Earlopain
Copy link
Contributor

Earlopain commented Oct 23, 2024

That was quick, cool. One additional thing I forgot about: autocorrect will be unsafe since it changes index names which may be referenced later. It might be best to just not support autocorrect. (also remove_index exists).

In particular I'm aware of this pattern that good_job does, where it first checks if a index exists before creating it. I'm not entirely sure why, seems to only matter when a migraiton is run more than once in the same direction, but it's pobably valid in some way: https://github.com/bensheldon/good_job/blob/956e872d321df21738c599aa1a86a8514e05a7ef/demo/db/migrate/20240518175058_create_good_job_process_lock_indexes.rb

I'd also see this as one counter-argument for introducing such a cop. I would be pretty uncomfortable referecing the auto-generated index name without being explicit about it somewhere.

@corsonknowles
Copy link
Author

Okay, @Earlopain, much to my surprise it does seem rather possible to generalize and accommodate valid reasons for a custom name that could lead to duplication of indexes with the same column names but different handling or properties (e.g. fulltext as another example where you may want both a fulltext and non fulltext index on the same columns).

I've added an allowlist, and enumerated in a comment the current keys that I think don't justify a custom name as well, for scrutiny.

Comments and improvements welcome!

MSG = 'Avoid specifying a custom name for common indexes. Let Rails handle the index name automatically.'
RESTRICT_ON_SEND = %i[index add_index].freeze
# Keys that do not justify a duplicative index or a custom name:
# unique, length, order, opclass, nulls_not_distinct
Copy link
Author

@corsonknowles corsonknowles Oct 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My reasoning is that even if we were rolling out a replacement index adding one of these properties, the rubocop: disable statement would serve as a valid reminder to come back and remove the superseded index.

If I misunderstood one of these and there exists a good reason to keep 2 indexes (both with and without this property) on identical columns for the long term, please let me know and I will update accordingly by moving it to the allowlist.

I think opclass is the most debatable, happy to go either way -- since using is included as a valid reason, I didn't think opclass needed to be on its own.

I don't want this allow list to distract from the main issue, as the vast majority of indexes are created without any special parameters beyond :unique and :length

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bensheldon/good_job#1213 introduces the same index with different ordering for good_job so there definitly can be use-cases where it is appropriate. But it of course depends on each use-case. Personally I'm not qualified enough to say it one way or the other (except where I observed it like above). There may be valid use-cases for the other options as well

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I'm going to move order and opclass into the allow list to make this rollout smoother.

I think we can make a strong case that no one needs the same keys to be both a unique and a non-unique index at the same time in the long term. Similarly, we should not need the same columns to have indexes with different lengths specified. Adding nulls_not_distinct also looks like it would result in duplication only during a migration, as it is strictly stricter than the normal unique index . With this option, NULL is checked for uniqueness just like any other field.

@corsonknowles
Copy link
Author

corsonknowles commented Oct 23, 2024

Thanks @Earlopain, I'm going to mark this unsafe and come up with some relevant description. It's never safe to edit checked in or run migrations. I wonder if I can explain that general principle concisely and with enough detail or perhaps I can find a Rails doc link that's relevant.

Update: Found the doc: https://guides.rubyonrails.org/active_record_migrations.html#changing-existing-migrations

This cop enforces the standard index naming provided by Rails, allowing developers to enjoy a standardized schema.

A bonus impact of following this convention is that fully duplicated indexes are prevented by default.

Custom index names were once a required workaround for default names that were too long for various database implementations. That could happen because of long table names, long field names, or long lists of compound index keys.

However, that problem has been fully resolved. The current automated naming approach will deterministically abbreviate indexes that would have been too long before.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants