Skip to content
Andrew Geweke edited this page Nov 2, 2013 · 5 revisions

Migrating tables under low_card_tables works almost identically to without low_card_tables. (Obviously, the table structure is different, but migrations themselves change almost not at all.)

Key Hint: Keep any VARCHAR fields (or other such fields) in your low-card table limited to relatively short lengths — e.g., :limit => 20 or whatever. Many databases impose a restriction on the maximum width of an index key (in MySQL InnoDB, it's 767 bytes); if you simply say t.string :foo in your migrations, this creates a VARCHAR(255), and it only takes three of these to exceed that limit. Generally speaking, this isn't much of an issue, since you're storing only strings with programmatic meaning in a low-card table, anyway.

There are a couple of key exceptions:

  • Low-card tables require a unique index across all attribute columns (and in fact the low_card_tables gem tests for this index, and will blow up and refuse to run if it's not present), and so there's support for automatically creating and maintaining this index.
  • Removing columns from a low-card table is slightly more complex, exactly because of the requirement that there always be a unique index and thus exactly one row in the low-card table for each unique combination of attribute values. There's automatic support for this, too.

The Unique Index

As mentioned previously, there must always be a unique index on all columns in a low-card table. Fortunately, low_card_tables will create and maintain this index for you automatically — all it has to do is know that a table is a low-card table. It can do this in one of two ways:

  • Explicitly: if you add :low_card => true to the options Hash of migration statements like create_table, add_column, remove_column, or change_table, it will know it's a low-card table.
  • Implicitly: if there is a model defined for a given table, and it declares is_low_card_table, then low_card_tables knows that it's a low-card table.

(Hint: it never hurts to be too explicit. ;)

Either way, when columns are changed on the table in question, low_card_tables will first drop the unique index, then run the migration code, then recreate the unique index automatically. low_card_tables always names the index something like index_user_statuses_lc_on_all; it will truncate the embedded table name as necessary to maintain an index name that's not too long for your database system. However, the index can be called anything at all — the code simply verifies that there is a unique index over all columns in the low-card table, not that it is named anything in particular.

(See also Options — the unique index will not cover, and is not required to cover, columns like created_at or updated_at that are not conceptually attributes on the low-card table.)

Explicit Index Support

If you need to perform several migration operations at once, or run any code at all, and want the unique index dropped and created only once, simply do the following in a migration:

change_low_card_table(:user_statuses) do
  ...any code at all...
end

The index will be dropped before executing the block and added after it, and automatic dropping/adding of the index disabled for the duration of the block.

Adding Columns

Adding columns to a live system is always slightly tricky, but it is no more tricky using low_card_tables than it would be without it:

  • When you add a new column, currently-running production code will not know about it. Thus, you must always add new columns as nullable (no :null => false in your migration) or with a default (:default => ...) — otherwise, any production code that tries to insert into the table will fail, since it won't know about the new column or what it should put there.
  • It is perfectly safe to add a nullable new column to a low-card table and then run an UPDATE statement (e.g., UserStatus.update_all(...)) to fill it in — low_card_tables will pick up the new values the next time it flushes its cache. (See Caching for more information.)
  • If you need to add a new column whose values may be different for different referring rows — for example, imagine you decide to add a key_customer column that's computed based on that customer's history — then you simply have to do the same thing you would without low_card_tables. Add the column and either make it nullable or give it a default, and then bulk-update referring rows as necessary. See Bulk Operations for more details about how to update in bulk efficiently.

Removing Columns

Removing columns from a low-card table is interesting exactly because of the unique-index requirement: when you drop a column, you will almost certainly end up with duplicate rows, because there were previously rows that differed only in the value present in the now-removed column. The right thing to do is to collapse these rows into one (pick a winner and delete the other rows)...and then go look at every column in every table that refers to the low-card table, find all values that point to 'loser' rows, and update them to point to the remaining 'winner' rows. Fortunately, this happens automatically for you — you can just use remove_column and forget about it.

So why are we writing this?

Because sometimes updating referring tables is a big task, as there might be millions (or billions) of rows, and you might want some control over this task. As such, there's additional support in low_card_tables for controlling how you want to handle this:

remove_column :foo, :low_card_collapse_rows => false

If you do this, low_card_tables will not delete any rows or update any referring tables; you will be left with duplicate rows in your low-card table, and no unique index. (Note that this will prevent low_card_tables from working until you fix it.) You're on your own to do this yourself, in whatever way you think is best; see API for methods that are useful, including low_card_collapse_rows_and_update_referrers! on the low-card model class.

remove_column :foo, :low_card_update_referring_models => false

If you do this, low_card_tables will delete duplicate rows, picking a winner, but will not update any referring models. This means you're on your own to update those models properly — or else, you'll have dangling foreign keys. This is very dangerous, because you'll have no way of knowing what the dangling foreign keys used to point to; use it only in exceptional circumstances. Usually, you're better off using :low_card_update_referring_models => false, then collapsing the rows with an explicit call to low_card_collapse_rows_and_update_referrers!(:low_card_update_referring_models => false), which will return you a Hash mapping winners to an Array of losers; you can then update data properly.

remove_column :foo, :low_card_referrers => [ User, Admin ]

This actually simply forces low_card_tables to look at User and Admin when it looks for referring models that it needs to update, in addition to the models it picks up automatically. This can be used in case there are models it doesn't automatically pick up (which, theoretically, should never happen).

Transactionality

All updates in question, including the remove_column itself, are automatically wrapped in a transaction that runs across the low-card model and any referring models. However, most databases don't support transactional DDL; as a result, running code will 'see' an inconsistent state, where the column has been removed but rows haven't been updated, and this can be dangerous.

Fixing this is relatively straightforward:

  1. First, modify your low-card model to exclude the column you're going to remove: is_low_card_table :exclude_column_names => [ :foo ].
  2. Run an empty migration over that model — change_low_card_table :user_statuses { }. This will cause the migration code to remove the unique index and then re-create it without the column in question.
  3. Deploy this code with the excluded column to production. Your entire system will now completely ignore the missing column.
  4. Use remove_column to remove the column.

The Collapsing Update Scheme

There are different schemes for updating referring models. By default, the code will update the referring models in chunks of 10,000 rows at a time, but wrapped in a transaction so that it appears to all happen at once to running code. However, this might not be what you want. So, on a referring model:

class User < ActiveRecord::Base
  has_low_card_table :status
  low_card_value_collapsing_update_scheme lambda { |map| ... }
end

Now, instead of any automatic updating, the lambda specified will get called with a Hash mapping model objects that are remaining in the table to the corresponding model objects that have been deleted; you can use this to update rows however you want.

class User < ActiveRecord::Base
  has_low_card_table :status
  low_card_value_collapsing_update_scheme :none
end

You're on your own — no automatic updating will happen at all.

class User < ActiveRecord::Base
  has_low_card_table :status
  low_card_value_collapsing_update_scheme 500
end

low_card_tables will update this table in chunks of 500 rows, instead of 10,000.

Clone this wiki locally