Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add uniq on datacenters in #host_to_folder #17422

Conversation

NickLaMuro
Copy link
Member

@NickLaMuro NickLaMuro commented May 14, 2018

The more stripped down version of this method, prior to the change, is as follows:

def host_to_folder(src)
  sources = src[:host].nil? ? allowed_hosts_obj : [src[:host]]
  datacenters = sources.collect { |h|
    find_datacenter_for_ci(h)
  end.compact
  datacenters.each_with_object({}) do |dc, folders|
    folders.merge!(get_ems_folders(dc))
  end
end

When setting the datacenters local variable in that method, the find_datacenter_for_ci method basically looks up the datacenter for the matching Host object in the xml tree that has most likely already been generated. In most/all cases, though, their will be more than one host per datacenter, and that wasn't taken into account for this method since the next block blindly takes the datacenters it found previously and merges the folder structure it gets from get_ems_folders for each datacenter.

By adding a .uniq prior to the assignment of datacenters, this saves a significant number of unnecessary CPU cycles and object allocations to fetch the same data and apply it to the folders variable, without changing the end result.

Note: Ideally we would want a better algorithm for finding the datacenters for a collection of hosts, one that allowed for ejecting early from find_datacenter_for_ci when we knew an existing datacenter for a given host had already been found. But the time taken to get the datacenters from the hosts is far less in CPU time than the time taken to call #get_ems_folders per datacenter object and merge the hash with the existing results.

Benchmarks

The benchmark script used here basically runs the following:

ManageIQ::Providers::Vmware::InfraManager::ProvisionWorkflow.new.init_from_dialog

And is targeting a fairly large EMS, with about 600 hosts.

ms queries rows
before 15023 1961 48395
after 6068 1961 48395
Before                                                     After
------                                                     -----

Total allocated: 2069091751 bytes (23226536 objects)   |   Total allocated: 376117422 bytes (4221185 objects)
                                                       |   
allocated objects by gem                               |   allocated objects by gem
-----------------------------------                    |   -----------------------------------
   9665227  pending                                    |      2512007  activerecord-5.0.7
   5561668  manageiq/app  <<<<<<<<<<                   |       418737  ancestry-2.2.2
   3513258  manageiq/lib  <<<<<<<<<<                   |       278793  activemodel-5.0.7
   2512007  activerecord-5.0.7  <<<<<<<<<<             |       207491  manageiq/app
    861270  activesupport-5.0.7  <<<<<<<<<<            |       190717  pending
    418737  ancestry-2.2.2                             |       178419  ruby-2.3.3/lib
    278793  activemodel-5.0.7  <<<<<<<<<<              |       165577  arel-7.1.4
    178419  ruby-2.3.3/lib  <<<<<<<<<<                 |       165168  activesupport-5.0.7
    165577  arel-7.1.4                                 |        52875  manageiq-providers-vmware-e2e2b8029538
     52875  manageiq-providers-vmware-0be2f13a0dc9     |        32696  manageiq/lib
     14424  fast_gettext-1.2.0                         |        14424  fast_gettext-1.2.0
       ...                                             |          ...

Note: The benchmarks for this change do NOT include the changes from other PRs in the links below. Benchmarks of all changes can be found here.

Links

The more stripped down version of this method, prior to the change, is
as follows:

    def host_to_folder(src)
      sources = src[:host].nil? ? allowed_hosts_obj : [src[:host]]
      datacenters = sources.collect { |h|
        find_datacenter_for_ci(h)
      end.compact
      datacenters.each_with_object({}) do |dc, folders|
        folders.merge!(get_ems_folders(dc))
      end
    end

The `find_datacenter_for_ci` method basically looks up the datacenter
for the matching host object in the xml tree that has been generated.
In most (all) cases though, their will be more than one host per
datacenter, and that wasn't taken into account for this method since the
next block blindly takes those datacenters and merges the folder
structure it gets from `get_ems_folders` for each datacenter.

By adding a `.uniq` prior to the assignment of datacenters, this saves a
significant number of unnecessary CPU cycles and object allocations to
fetch the same data and apply it to the `folders` variable, without
changing the end result.

Note:  Ideally we would want a better algorithm for finding the
datacenters for a collection of hosts, one that allowed for ejecting
early from find_datacenter_for_ci when we knew an existing datacenter
for a given host had already been found.  But the time taken to get the
datacenters from the hosts is far less in CPU time than the time taken
to call `#get_ems_folders` per datacenter object and merge the hash with
the existing results.
@miq-bot
Copy link
Member

miq-bot commented May 14, 2018

Checked commit NickLaMuro@1bc8c3f with ruby 2.3.3, rubocop 0.52.1, haml-lint 0.20.0, and yamllint 1.10.0
1 file checked, 0 offenses detected
Everything looks fine. ⭐

Copy link
Contributor

@syncrou syncrou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NickLaMuro - LGTM

@kbrock kbrock self-assigned this May 18, 2018
@kbrock kbrock added this to the Sprint 86 Ending May 21, 2018 milestone May 18, 2018
@kbrock kbrock merged commit 9b33e0b into ManageIQ:master May 18, 2018
@NickLaMuro
Copy link
Member Author

@miq-bot add_label fine/yes, gaprindashvili/yes

@NickLaMuro
Copy link
Member Author

@miq-bot add_label performance, provisioning

simaishi pushed a commit that referenced this pull request Jun 21, 2018
…ders_for_datacenters

Add uniq on datacenters in #host_to_folder
(cherry picked from commit 9b33e0b)

https://bugzilla.redhat.com/show_bug.cgi?id=1593798
@simaishi
Copy link
Contributor

Fine backport details:

$ git log -1
commit 347a98c00148a05acfeabf480285d49cddb68f26
Author: Keenan Brock <[email protected]>
Date:   Fri May 18 12:12:14 2018 -0400

    Merge pull request #17422 from NickLaMuro/avoid_duplicate_get_ems_folders_for_datacenters
    
    Add uniq on datacenters in #host_to_folder
    (cherry picked from commit 9b33e0b3919266f70096b3ccacfab0657c1839ab)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1593798

simaishi pushed a commit that referenced this pull request Jun 21, 2018
…ders_for_datacenters

Add uniq on datacenters in #host_to_folder
(cherry picked from commit 9b33e0b)

https://bugzilla.redhat.com/show_bug.cgi?id=1593797
@simaishi
Copy link
Contributor

Gaprindashvili backport details:

$ git log -1
commit 38b974b124cd1bfdb106c5eb88b384a8f66ae21d
Author: Keenan Brock <[email protected]>
Date:   Fri May 18 12:12:14 2018 -0400

    Merge pull request #17422 from NickLaMuro/avoid_duplicate_get_ems_folders_for_datacenters
    
    Add uniq on datacenters in #host_to_folder
    (cherry picked from commit 9b33e0b3919266f70096b3ccacfab0657c1839ab)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1593797

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants