Split semian operations into more cohesive translation units #101

Closed
wants to merge 4 commits into from

@dalehamel dalehamel commented Feb 12, 2017

Note: This is an omnibus PR, broken down into smaller, more reviewable chunks. See below.

What

semian.c has been broken up into several files, each with a more cohesive purpose.

ext
└── semian
    ├── extconf.rb
    ├── resource.c
    ├── resource.h
    ├── semian.c
    ├── semian.h
    ├── sysv_semaphores.c
    ├── sysv_semaphores.h
    ├── tickets.c
    ├── tickets.h
    └── types.h
  • semian.c defines the C/Ruby bridge function Init_semian, and nothing else. semian.h acts as the global header include file for other files to avoid complex inclusion logic.
  • resource.h declares the semian_resource method prototypes, to be used by semian.c and exported into ruby. resource.c defines these, and some additional static (private) functions to assist in this.
  • tickets.h declares the prototypes for ticket management, and tickets.c defines these functions. Note that these functions have been refactored to allow for a quota based allocation scheme to be more easily implemented in a following pull request.
  • types.h declares and defines special types needed for semian operations.
  • sysv_semaphores.h declares the function prototypes for operations on our SysV semaphore set, and implements as inline functions any that are used frequently (that is, not just during initialization). The remaining functions are defined in sysv_semaphores.c. sysv_semaphores.h also defines macros for naming our semaphore set and for keeping the enum in sync with a string representation. This makes it simple to print which semaphore an operation failed on while debugging.
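
One common way to keep an enum and its string names in sync is an "X-macro" list, sketched below. This is a guess at the general technique, not the PR's actual macros: FOREACH_SEMINDEX, GENERATE_ENUM, GENERATE_STRING, semindex_name, semindex_from_name and the SI_SEM_LOCK entry are hypothetical names. Only SI_SEM_TICKETS, SI_SEM_CONFIGURED_TICKETS, SI_NUM_SEMAPHORES and SEMINDEX_STRING appear in the diff. In the PR the string array is defined in a C file and declared extern in the header; it is static here only so the sketch fits in one file.

```c
#include <assert.h>
#include <string.h>

/* Single source of truth for the semaphore indices. SI_SEM_LOCK is a
   hypothetical third entry added for illustration. */
#define FOREACH_SEMINDEX(X)      \
    X(SI_SEM_TICKETS)            \
    X(SI_SEM_CONFIGURED_TICKETS) \
    X(SI_SEM_LOCK)

#define GENERATE_ENUM(name)   name,
#define GENERATE_STRING(name) #name,

/* Expands to one enumerator per list entry; the trailing enumerator is
   therefore always equal to the number of semaphores in the set. */
typedef enum {
    FOREACH_SEMINDEX(GENERATE_ENUM)
    SI_NUM_SEMAPHORES
} semian_sem_index;

/* Expands to one string literal per list entry, in the same order. */
static const char *const SEMINDEX_STRING[] = {
    FOREACH_SEMINDEX(GENERATE_STRING)
};

/* Name of a semaphore index, for debug and error messages. */
static const char *
semindex_name(int index)
{
    if (index < 0 || index >= SI_NUM_SEMAPHORES) return "(unknown)";
    return SEMINDEX_STRING[index];
}

/* Reverse lookup; returns -1 when the name is not in the list. */
static int
semindex_from_name(const char *name)
{
    assert(name != NULL);
    for (int i = 0; i < SI_NUM_SEMAPHORES; i++) {
        if (strcmp(SEMINDEX_STRING[i], name) == 0) return i;
    }
    return -1;
}
```

Adding a semaphore then means adding one line to the list, and both the enum and the string table follow.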

I have also fixed travis CI for ruby 2.3.1 by adding a line to update rubygems, resulting in bundler completing successfully.

Note that while there are quite a lot of additions / deletions, very little logic has changed. Where it has, it has been pulled out into separate functions, and the control flow should remain identical. CI still passes, which suggests the behavior is unchanged.

Per C conventions, header files have been used almost entirely for function declarations, where the corresponding C files provide definitions for those prototypes. This rule is only broken for inline functions, as described below. Pre-processor macros have also been relegated to the header files, per C conventions.

The biggest change here is that where all functions were previously declared static, we now have 3 types of functions:

  • static - used for "private" functions that need not be exposed to the header file.
  • non-static - used for "public" functions that are declared in a header file, and implemented in a c file. These are functions intended to be used by other files.
  • static inline - used for functions implemented in header files: simple, frequently called helpers with many different callers. Inlining is an optimization choice that should let the compiler treat them about as efficiently as a macro. This may produce larger object code, but it should avoid function call and return overhead and so shave off at least a few machine instructions. This likely doesn't matter much, and the scoping of these functions hasn't really changed from the previous "one big file". This is why only relatively simple functions that get (or could be) called a lot have been inlined.
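
The three kinds of function can be sketched in a single file as follows. None of these names or the arithmetic come from the PR: ticket_delta, clamp_non_negative and adjust_available_tickets are hypothetical, and the comments only mark where each piece would live in a header / C file pair.

```c
#include <assert.h>

/* ---- would live in a header, e.g. tickets.h ---- */

/* Public prototype: declared in the header, defined in the C file. */
int adjust_available_tickets(int available, int old_configured, int new_configured);

/* static inline: a small, frequently called helper defined in the header.
   Every translation unit that includes the header gets its own copy, so
   the linker never sees a duplicate definition. */
static inline int
ticket_delta(int old_configured, int new_configured)
{
    return new_configured - old_configured;
}

/* ---- would live in the C file, e.g. tickets.c ---- */

/* static: private to this translation unit, not visible to other files. */
static int
clamp_non_negative(int value)
{
    return value < 0 ? 0 : value;
}

/* non-static: the public definition matching the prototype above. */
int
adjust_available_tickets(int available, int old_configured, int new_configured)
{
    assert(old_configured >= 0 && new_configured >= 0);
    return clamp_non_negative(available + ticket_delta(old_configured, new_configured));
}
```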

I have also adopted the convention that function prototypes should be documented within their header file. If a function is static, it may be documented above the definition. Each header file also offers a brief explanation of its own purpose, which applies to the corresponding C file.

The only actual logic changes here are:

  • the generate_key function has been modified to append the semaphore set cardinality to the key. This is for compatibility purposes as we add additional semaphores to the semaphore set.
  • argument checking in resource_initialize is being performed by helper functions that will check the arguments provided and cast them appropriately. These checks are the same as they were before, only now the casting is happening at the same time. This was done mostly to make this function smaller and more cohesive.

Why

Why do this enormous refactor?

I found it difficult to reason about semian in a single large source file. Though 500 lines isn't large by most standards, the lack of forward declarations forced the code to be organized in a particular way that was somewhat unintuitive.

Moreover, there were clearly a few distinct concerns that were being muddled together:

  • Semian resource operations (as exposed to ruby)
  • Semian ticket management strategies (more important as we add a second one)
  • Semian semaphore operations on sysV primitives

By separating these out into more cohesive translation units, I find it much easier to reason about these separate concerns independently.

As a side effect, I also chose to optimize a few simple functions as inlines. I doubt this will have a significant performance improvement, but saving a few instructions here or there was never a bad thing.

The biggest motivation for this was that I found it difficult to reason about how tickets were being configured and managed. By abstracting this and cleaning it up, I think it's become very obvious, and the implementation of the ticket quota strategy (and, potentially, additional future strategies) should be easy to drop in.

How

This omnibus PR is being broken out into smaller, more reviewable PRs

Note: Many of these PRs won't work on their own, as they depend on other aspects of the restructuring. For instance, the first 2 PRs are fine to merge independently, but subsequent PRs depend on prior ones. This is not yet an exhaustive list.

Ready for review:

Pending dependent PR merges:

@@ -6,3 +6,5 @@
/html/
Gemfile.lock
vendor/
*.swp
Member Author

sorry, i'm a vim junky.

@@ -3,6 +3,7 @@ language: ruby
sudo: true

before_install:
- gem update --system
Member Author

This fixes CI which was failing on installing native extensions for rainbows

@@ -0,0 +1,12 @@
#include <semian_resource_alloc.h>

const rb_data_type_t
Member Author

This could not be inlined because it's not a function, and defining a variable in a header file would have resulted in multiple definitions of the symbol. That is the only reason this is in its own C file.

}

void
configure_tickets(int sem_id, int tickets, int should_initialize)
Member Author

This function has been split apart into multiple functions to simplify the implementation of additional ticket strategies (quota).

semian.gemspec Outdated
@@ -12,7 +12,7 @@ Gem::Specification.new do |s|
across process boundaries with SysV semaphores.
DOC
s.homepage = 'https://github.com/shopify/semian'
s.authors = ['Scott Francis', 'Simon Eskildsen']
s.authors = ['Scott Francis', 'Simon Eskildsen', 'Dale Hamel']
Member Author

A bit gratuitous maybe ;)

@@ -0,0 +1,69 @@
#include <semset.h>

const char *SEMINDEX_STRING[] = {
Member Author

This must be defined within a C file to prevent multiple definitions. This is why this is declared as an extern in the corresponding header file.

// It is necessary for the cardinatily of the semaphore set to be part of the key
// or else sem_get will complain that we have requested an incorrect number of sems
// for the desired key, and have changed the number of semaphores for a given key
sprintf(semset_size_key, "_NUM_SEMS_%d", SI_NUM_SEMAPHORES);
Member Author

This is a bit of a hack, adding the cardinality of the semaphore set to the key before hashing it.

This allows for an upgrade path, increasing the number of semaphores.

Note that we never actually clean up the semaphore set of the old cardinality, but we could manually if we chose to. Rebooting would also release them.
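
A rough sketch of the idea, assuming nothing about the real generate_key beyond the "_NUM_SEMS_%d" suffix shown in the diff: the cardinality is appended to the name before hashing, so a set with a different number of semaphores gets a different key. The FNV-1a hash is a stand-in chosen to keep the example dependency-free, and sketch_fnv1a / sketch_generate_key are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 32-bit FNV-1a, used here only as a placeholder hash. */
static uint32_t
sketch_fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    for (; *s != '\0'; s++) {
        h ^= (unsigned char)*s;
        h *= 16777619u;
    }
    return h;
}

/* Derive a key from the resource name plus the semaphore set cardinality. */
static uint32_t
sketch_generate_key(const char *name, int num_semaphores)
{
    char buf[256];
    int written = snprintf(buf, sizeof buf, "%s_NUM_SEMS_%d", name, num_semaphores);

    /* The sketch simply refuses names that do not fit in the buffer. */
    assert(written > 0 && (size_t)written < sizeof buf);
    return sketch_fnv1a(buf);
}
```

With this scheme, a process built with the old cardinality and one built with the new cardinality end up on separate semaphore sets rather than disagreeing about the size of a shared one.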

@dalehamel
Member Author

Note that I have another WIP branch where I have started implementing the quota logic, but to make that upcoming PR easier to review I decided to pull this refactor out to be reviewed separately

@jacobbednarz

For what it's worth, I'll probably be dragging our feet a little on releasing this change to our production environment. A couple of things make me cautious:

  • This is a big PR to review and understand the moving parts. I'm a fan of small incremental changes because if something breaks I'll have a fair idea on where the wheels started to fall off.
  • The changes have all been done in a single commit, making it difficult for me to follow your thought process and to debug problems that I could hit.

Don't get me wrong, I ❤️ the intention and thought behind this and applaud the effort - I might just sit back and watch someone else test this one in production first :)

@dalehamel
Member Author

dalehamel commented Feb 12, 2017 via email

{
semian_resource_t *res = NULL;
TypedData_Get_Struct(self, semian_resource_t, &semian_resource_type, res);
if (perform_semop(res->sem_id, SI_SEM_TICKETS, 1, SEM_UNDO, NULL) == -1) {
Member Author

Note that in the original C this appeared to be using the magic number 0 to refer to the first semaphore in the set, which happens to be the ticket semaphore

semian_resource_t *res = NULL;

TypedData_Get_Struct(self, semian_resource_t, &semian_resource_type, res);
if (semctl(res->sem_id, SI_NUM_SEMAPHORES, IPC_RMID) == -1) {
Member Author

Note that in the original C this was also using the index 0. The index seems to be a required arg, but the call doesn't care what it is.

I think that using an invalid semaphore index is a better practice for this purpose, and this one is..
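
A small self-contained check of that behavior, assuming a Linux system with SysV IPC available (the Linux semctl man page states that semnum is ignored for IPC_RMID). create_and_remove_set is a hypothetical helper, not part of the PR.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Create a private semaphore set, then remove it while passing an
   out-of-range index as semnum. Returns 0 if the removal succeeded and
   the set is really gone, -1 otherwise. */
static int
create_and_remove_set(int num_sems)
{
    assert(num_sems > 0);

    int sem_id = semget(IPC_PRIVATE, num_sems, IPC_CREAT | 0600);
    if (sem_id == -1) return -1;

    /* num_sems is one past the last valid index (0 .. num_sems - 1). */
    if (semctl(sem_id, num_sems, IPC_RMID) == -1) return -1;

    /* The identifier no longer refers to a set, so this call must fail. */
    return semctl(sem_id, 0, GETVAL) == -1 ? 0 : -1;
}
```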

raise_semian_syscall_error("semget()", errno);
}

set_semaphore_permissions(res->sem_id, c_permissions);
Member Author

Note that in the original C, this was coercing a Long to a fixed int.


/* it's possible that we haven't actually initialized the
semaphore structure yet - wait a bit in that case */
if (get_sem_val(sem_id, SI_SEM_CONFIGURED_TICKETS) == 0) {
Member Author

This should probably be abstracted into a generic spinlock inline, to poll arbitrary sems for arbitrary values
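
One possible shape for such a helper, as a sketch only: it polls a value through a callback so that the same loop could wrap get_sem_val for any semaphore index. The names poll_while_value, read_value_fn and read_counter are hypothetical, and read_counter is just a stand-in for a real semaphore read so the example runs without a semaphore set.

```c
#include <assert.h>
#include <time.h>

/* Reads the current value of whatever is being polled. */
typedef int (*read_value_fn)(void *ctx);

/* Poll while the value equals `pending`, sleeping `sleep_ns` nanoseconds
   between attempts. Returns the first value that differs from `pending`,
   or -1 after `max_attempts` reads. A reader that can itself return -1
   would need a separate out-parameter to tell the two cases apart. */
static int
poll_while_value(read_value_fn read_value, void *ctx, int pending,
                 int max_attempts, long sleep_ns)
{
    struct timespec delay;

    assert(read_value != NULL && max_attempts > 0 && sleep_ns >= 0);
    delay.tv_sec = sleep_ns / 1000000000L;
    delay.tv_nsec = sleep_ns % 1000000000L;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        int value = read_value(ctx);
        if (value != pending) return value;
        nanosleep(&delay, NULL);
    }
    return -1;
}

/* Stand-in reader: reports 0 for the first two calls, then 7. */
static int
read_counter(void *ctx)
{
    int *calls = ctx;
    *calls += 1;
    return *calls < 3 ? 0 : 7;
}
```

Waiting for a semaphore to reach a specific value instead of leaving one is the same loop with the comparison inverted.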

@jacobbednarz

@dalehamel Legendary effort! If you break these up, I'm 👍 to canarying a bunch of this in our production environment once it's ready.

@dalehamel
Member Author

@jacobbednarz These should be ready for review:

Note that pretty much every one of these PRs is going to depend on the previous one (I can't think of any sane way to do this otherwise), so I can't put up additional PRs until the prior ones are reviewed and merged.

@dalehamel
Member Author

obsoletes #67

@dalehamel
Member Author

Closing this, it's going to be done over a series of smaller PRs and this is just confusing people

@dalehamel dalehamel closed this Feb 14, 2017
@epk epk deleted the multi-file-refactor branch June 26, 2019 01:54