Split semian operations into more cohesive translation units #101

dalehamel · 2017-02-12T15:19:46Z

Note: This is an omnibus PR, broken down into smaller, more reviewable chunks. See below.

What

semian.c has been broken up into several files, each with a more cohesive purpose.

ext
└── semian
    ├── extconf.rb
    ├── resource.c
    ├── resource.h
    ├── semian.c
    ├── semian.h
    ├── sysv_semaphores.c
    ├── sysv_semaphores.h
    ├── tickets.c
    ├── tickets.h
    └── types.h

semian.c defines the C/Ruby bridge function Init_semian, and nothing else. semian.h acts as the global header include file for other files to avoid complex inclusion logic.
resource.h declares the semian_resource method prototypes, to be used by semian.c and exported into ruby. resource.c defines these, and some additional static (private) functions to assist in this.
tickets.h declares the prototypes for ticket management, and tickets.c defines these functions. Note that these functions have been refactored to allow for a quota based allocation scheme to be more easily implemented in a following pull request.
types.h declares and defines special types needed for semian operations.
sysv_semaphores.h declares the function prototypes for operations on our SysV semaphore set, and implements the frequently used ones as inline functions. The remaining functions, which are used only during initialization or otherwise infrequently, are defined in sysv_semaphores.c. sysv_semaphores.h also defines macros for naming our semaphore set and for keeping the enum in sync with a string representation. This simplifies verbosely printing which semaphore an operation failed on while debugging.
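
To illustrate the enum / string sync idea, here is a minimal sketch using a single list macro that is expanded twice. The macro names (FOREACH_SEMINDEX, GENERATE_ENUM, GENERATE_STRING) and the enum name are assumptions made for this example, and the real semaphore set may contain more entries than the two shown here.

```c
#include <assert.h>
#include <string.h>

/* Single source of truth for the semaphores in the set.
   The macro names are hypothetical; only the SI_* identifiers and
   SEMINDEX_STRING appear in this PR. */
#define FOREACH_SEMINDEX(SEM)    \
  SEM(SI_SEM_TICKETS)            \
  SEM(SI_SEM_CONFIGURED_TICKETS) \
  SEM(SI_NUM_SEMAPHORES)

#define GENERATE_ENUM(name) name,
#define GENERATE_STRING(name) #name,

/* Expands to: SI_SEM_TICKETS, SI_SEM_CONFIGURED_TICKETS, SI_NUM_SEMAPHORES, */
enum semaphore_index { FOREACH_SEMINDEX(GENERATE_ENUM) };

/* Expands to the same names as string literals, in the same order.
   In a multi-file layout this definition would live in one C file, with an
   extern declaration in the header. */
const char *SEMINDEX_STRING[] = { FOREACH_SEMINDEX(GENERATE_STRING) };

/* Usage: look up a printable name for a semaphore index. */
void
semindex_demo(void)
{
  assert(strcmp(SEMINDEX_STRING[SI_SEM_TICKETS], "SI_SEM_TICKETS") == 0);
  assert(SI_NUM_SEMAPHORES == 2); /* the last entry doubles as the count */
}
```

Adding a semaphore then only requires adding one line to the list macro, and the enum and the string table cannot drift apart.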

I have also fixed travis CI for ruby 2.3.1 by adding a line to update rubygems, resulting in bundler completing successfully.

Note that while there are quite a lot of additions / deletions, very little logic has changed. Where it has, it has been pulled out into separate functions, and the control flow should remain identical. CI still passes, which suggests the behavior is unchanged.

Per C conventions, header files have been used almost entirely for function declarations, where the corresponding C files provide definitions for those prototypes. This rule is only broken for inline functions, as described below. Pre-processor macros have also been relegated to the header files, per C conventions.

The biggest change here is that where all functions were previously declared static, we now have 3 types of functions:

static - used for "private" functions that need not be exposed to the header file.
non-static - used for "public" functions that are declared in a header file, and implemented in a c file. These are functions intended to be used by other files.
static inline functions - these are functions implemented in header files, particularly high-use functions that will be called many times by many different callers and that have relatively simple logic. Making them inline is an optimization choice that should allow the compiler to treat them as efficiently as if they were macros. This may result in larger object code, but should avoid function call and return overhead, which should at least slightly reduce the number of instructions executed. While this likely doesn't matter much, their scoping hasn't really changed from the previous "one big file". This is why only relatively simple functions that get (or could be) called a lot have been inlined.
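
As a rough illustration of the three kinds of functions, here is a sketch that simulates a header and its C file in a single listing. The function names are hypothetical and are not taken from the semian sources.

```c
#include <assert.h>

/* ---- example.h (hypothetical header) ---- */

/* "Public": declared here, defined in example.c, usable from other files. */
int clamp_ticket_count(int requested, int max);

/* static inline: defined in the header itself. Every file that includes the
   header gets its own copy, which the compiler is free to inline. */
static inline int
ticket_count_is_valid(int tickets)
{
  return tickets >= 0;
}

/* ---- example.c (hypothetical source file) ---- */

/* "Private": static, so not visible outside this translation unit. */
static int
min_int(int a, int b)
{
  return a < b ? a : b;
}

int
clamp_ticket_count(int requested, int max)
{
  if (!ticket_count_is_valid(requested)) {
    return 0;
  }
  return min_int(requested, max);
}

/* Usage: callers only ever see the prototypes from the header. */
void
function_kinds_demo(void)
{
  assert(clamp_ticket_count(5, 3) == 3);
  assert(clamp_ticket_count(-1, 3) == 0);
}
```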

I have also adopted the convention that function prototypes should be documented within their header file. If a function is static, it may be documented above the definition. Each header file also offers a brief explanation of its own purpose, which applies to the corresponding C file.

The only actual logic changes here are:

the generate_key function has been modified to append the semaphore set cardinality to the key. This is for compatibility purposes as we add additional semaphores to the semaphore set.
argument checking in resource_initialize is being performed by helper functions that will check the arguments provided and cast them appropriately. These checks are the same as they were before, only now the casting is happening at the same time. This was done mostly to make this function smaller and more cohesive.

Why

Why do this enormous refactor?

I found it difficult to reason about semian in a single large source file. Though 500 lines isn't large by most standards, the lack of forward declarations forced the code to be organized in a particular way that was somewhat unintuitive.

Moreover, there were clearly a few distinct concerns that were being muddled together:

Semian resource operations (as exposed to ruby)
Semian ticket management strategies (more important as we add a second one)
Semian semaphore operations on sysV primitives

By separating these out into more cohesive translation units, I find it much easier to reason about these separate concerns independently.

As a side effect, I also chose to optimize a few simple functions as inlines. I doubt this will have a significant performance improvement, but saving a few instructions here or there was never a bad thing.

The biggest motivation for this was that I found it difficult to reason about how tickets were being configured and managed. By abstracting this and cleaning it up, I think it's become very obvious, and the implementation of the ticket quota strategy (and, potentially, additional future strategies) should be easy to drop in.

How

This omnibus PR is being broken out into smaller, more reviewable PRs

Note: Many of these PRs won't work on their own, as they depend on other aspects of the restructuring. For instance, the first 2 PRs are fine to merge independently, but subsequent PRs depend on prior ones. This is not yet an exhaustive list.

Ready for review:

Pending dependent PR merges:

Additional minor changes to bring in line with this PR.
Split semian resource functions out
Semian sysv_semaphore operations
Semian ticket management

dalehamel · 2017-02-12T15:20:44Z

.gitignore

@@ -6,3 +6,5 @@
 /html/
 Gemfile.lock
 vendor/
+*.swp


sorry, i'm a vim junky.

dalehamel · 2017-02-12T15:21:41Z

.travis.yml

@@ -3,6 +3,7 @@ language: ruby
 sudo: true

 before_install:
+  - gem update --system


This fixes CI which was failing on installing native extensions for rainbows

dalehamel · 2017-02-12T15:22:56Z

ext/semian/semian_resource_alloc.c

@@ -0,0 +1,12 @@
+#include <semian_resource_alloc.h>
+
+const rb_data_type_t


This could not be inlined because it's not a function, and defining a variable in a header file would have resulted in multiple definitions of the symbol. That is the only reason this is in its own C file.
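
A minimal sketch of the declaration / definition split, using a plain struct as a stand-in for rb_data_type_t so that it compiles without the Ruby headers. The type and variable names here are made up for illustration.

```c
#include <assert.h>
#include <string.h>

/* ---- header (included by many C files): declaration only ---- */

/* Plain-C stand-in for rb_data_type_t; the fields are hypothetical. */
typedef struct {
  const char *name;
  int version;
} example_data_type_t;

/* extern declares the object without defining it, so including the header
   from several translation units does not create duplicate symbols. */
extern const example_data_type_t example_resource_type;

/* Inline helpers in the header can still refer to the object. */
static inline const char *
example_resource_type_name(void)
{
  return example_resource_type.name;
}

/* ---- exactly one C file: the single definition ---- */

const example_data_type_t example_resource_type = {
  "example_resource",
  1,
};

/* Usage: every includer sees the same single object. */
void
extern_demo(void)
{
  assert(strcmp(example_resource_type_name(), "example_resource") == 0);
}
```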

dalehamel · 2017-02-12T15:24:57Z

ext/semian/semian_tickets.c

+}
+
+void
+configure_tickets(int sem_id, int tickets, int should_initialize)


This function has been split apart into multiple functions to simplify the implementation of additional ticket strategies (quota).

dalehamel · 2017-02-12T15:25:40Z

semian.gemspec

@@ -12,7 +12,7 @@ Gem::Specification.new do |s|
    across process boundaries with SysV semaphores.
  DOC
  s.homepage = 'https://github.com/shopify/semian'
-  s.authors = ['Scott Francis', 'Simon Eskildsen']
+  s.authors = ['Scott Francis', 'Simon Eskildsen', 'Dale Hamel']


A bit gratuitous maybe ;)

dalehamel · 2017-02-12T15:29:06Z

ext/semian/semset.c

@@ -0,0 +1,69 @@
+#include <semset.h>
+
+const char *SEMINDEX_STRING[] = {


This must be defined within a C file to prevent multiple definitions. This is why this is declared as an extern in the corresponding header file.

dalehamel · 2017-02-12T15:33:16Z

ext/semian/semset.c

+  // It is necessary for the cardinatily of the semaphore set to be part of the key
+  // or else sem_get will complain that we have requested an incorrect number of sems
+  // for the desired key, and have changed the number of semaphores for a given key
+  sprintf(semset_size_key, "_NUM_SEMS_%d", SI_NUM_SEMAPHORES);


This is a bit of a hack, adding the cardinality of the semaphore set to the key before hashing it.

This allows for an upgrade path, increasing the number of semaphores.

Note that we never actually clean up the semaphore set of the old cardinality, but we could manually if we chose to. Rebooting would also release them.
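
For illustration, a sketch of how the cardinality changes the resulting key. It assumes FNV-1a as a stand-in hash, which is not necessarily what generate_key uses, and the function name and signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/ipc.h>

/* FNV-1a, used here only as a simple stand-in for the real hash. */
static uint32_t
fnv1a_hash(const char *s)
{
  uint32_t h = 2166136261u;
  for (; *s != '\0'; s++) {
    h ^= (unsigned char)*s;
    h *= 16777619u;
  }
  return h;
}

/* Hypothetical sketch: append the semaphore set cardinality to the resource
   name, then hash the combined string into a key_t.
   Returns 0 on success, -1 if the combined name does not fit the buffer. */
int
generate_key_sketch(const char *name, int num_sems, key_t *out)
{
  char buf[256];
  int n = snprintf(buf, sizeof(buf), "%s_NUM_SEMS_%d", name, num_sems);
  if (n < 0 || (size_t)n >= sizeof(buf)) {
    return -1;
  }
  *out = (key_t)fnv1a_hash(buf);
  return 0;
}

/* Usage: the same name with a different cardinality maps to a different key,
   so a new release never asks semget for an existing set of the wrong size. */
void
generate_key_demo(void)
{
  key_t old_key, new_key;
  assert(generate_key_sketch("mysql_shard_0", 3, &old_key) == 0);
  assert(generate_key_sketch("mysql_shard_0", 4, &new_key) == 0);
  assert(old_key != new_key);
}
```

The trade-off is the one noted above: processes running the old and new versions use separate semaphore sets during an upgrade, and the old set is left behind until it is removed manually or the machine reboots.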

dalehamel · 2017-02-12T17:17:05Z

Note that I have another WIP branch where I have started implementing the quota logic, but to make that upcoming PR easier to review I decided to pull this refactor out to be reviewed separately

jacobbednarz · 2017-02-12T23:19:23Z

For what it's worth, this change is one that we'll probably be dragging our feet a little bit on releasing to our production environment, because a couple of things are giving me some caution:

This is a big PR to review and understand the moving parts. I'm a fan of small incremental changes because if something breaks I'll have a fair idea on where the wheels started to fall off.
The changes have all been done in a single commit, making it difficult for me to follow along with your thought process and to debug any issues that I could hit.

Don't get me wrong, I ❤️ the intention and thought behind this and applaud the effort - I might just sit back and watch someone else test this one in production first :)

dalehamel · 2017-02-12T23:25:36Z

I had considered breaking this up even further, perhaps building up just a couple of the files at a time. I'll resubmit this as separate PRs to facilitate the review process, but leave this larger omnibus one open to track.


dalehamel · 2017-02-12T20:55:48Z

ext/semian/semian_resource.c

+{
+  semian_resource_t *res = NULL;
+  TypedData_Get_Struct(self, semian_resource_t, &semian_resource_type, res);
+  if (perform_semop(res->sem_id, SI_SEM_TICKETS, 1, SEM_UNDO, NULL) == -1) {


Note that in the original C this appeared to be using the magic number 0 to refer to the first semaphore in the set, which happens to be the ticket semaphore

dalehamel · 2017-02-12T20:57:44Z

ext/semian/semian_resource.c

+  semian_resource_t *res = NULL;
+
+  TypedData_Get_Struct(self, semian_resource_t, &semian_resource_type, res);
+  if (semctl(res->sem_id, SI_NUM_SEMAPHORES, IPC_RMID) == -1) {


Note that in the original C this was also using the index 0. The index is a required argument to semctl, but it is ignored for IPC_RMID.

I think that using an invalid semaphore index is a better practice for this purpose, and this one is.

dalehamel · 2017-02-12T20:59:02Z

ext/semian/semian_resource.c

+    raise_semian_syscall_error("semget()", errno);
+  }
+
+  set_semaphore_permissions(res->sem_id, c_permissions);


Note that in the original C, this was coercing a Long to a fixed int.

dalehamel · 2017-02-12T21:03:58Z

ext/semian/semian_tickets.c

+
+  /* it's possible that we haven't actually initialized the
+     semaphore structure yet - wait a bit in that case */
+  if (get_sem_val(sem_id, SI_SEM_CONFIGURED_TICKETS) == 0) {


This should probably be abstracted into a generic spinlock inline, to poll arbitrary sems for arbitrary values
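
A rough sketch of what such a generic polling helper could look like. The name, signature, and the reader callback are all assumptions made for this example; in semian the reader would presumably be something like get_sem_val, and the fake reader below exists only so the sketch runs without a real semaphore set.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Hypothetical reader callback: returns the current value of one semaphore. */
typedef int (*sem_reader_fn)(int sem_id, int sem_index);

/* Poll until the semaphore no longer holds `unwanted`, sleeping between tries.
   Returns 0 once the value has changed, -1 if max_attempts was exhausted. */
static inline int
wait_while_sem_equals(sem_reader_fn read_value, int sem_id, int sem_index,
                      int unwanted, int max_attempts, long interval_ns)
{
  struct timespec ts;
  ts.tv_sec = interval_ns / 1000000000L;
  ts.tv_nsec = interval_ns % 1000000000L;

  for (int i = 0; i < max_attempts; i++) {
    if (read_value(sem_id, sem_index) != unwanted) {
      return 0;
    }
    nanosleep(&ts, NULL);
  }
  return -1;
}

/* Fake reader for demonstration: reports 0 twice, then 5. */
static int fake_calls = 0;

static int
fake_read_value(int sem_id, int sem_index)
{
  (void)sem_id;
  (void)sem_index;
  fake_calls++;
  return fake_calls < 3 ? 0 : 5;
}

/* Usage: wait for an "uninitialized" value of 0 to go away. */
void
poll_demo(void)
{
  fake_calls = 0;
  assert(wait_while_sem_equals(fake_read_value, 0, 0, 0, 10, 1000L) == 0);
  assert(fake_calls == 3);
}
```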

jacobbednarz · 2017-02-12T23:48:02Z

@dalehamel Legendary effort! If you break these up, I'm 👍 to canarying a bunch of this in our production environment once it's ready.

dalehamel · 2017-02-13T00:21:19Z

@jacobbednarz These should be ready for review:

Note that pretty much every one of these PRs is going to depend on the previous one (I can't think of any sane way to do this otherwise), so I can't put up additional PRs until the prior ones are reviewed and merged.

dalehamel · 2017-02-13T21:11:48Z

obsoletes #67

dalehamel · 2017-02-14T21:10:27Z

Closing this, it's going to be done over a series of smaller PRs and this is just confusing people

dalehamel requested review from sirupsen and csfrancis February 12, 2017 15:20

dalehamel force-pushed the multi-file-refactor branch from 5665747 to 3dd0a91 Compare February 12, 2017 15:27

Split semian operations into more cohesive translation units

0bc6b8b

dalehamel force-pushed the multi-file-refactor branch from 3dd0a91 to 0bc6b8b Compare February 12, 2017 15:30

dalehamel mentioned this pull request Feb 12, 2017

Refactor semian.c includes and types into header files #102

Merged

dalehamel added 3 commits February 13, 2017 13:48

Rename files

0de8a01

Cleanup

341d9cf

wip

734f626

dalehamel closed this Feb 14, 2017

epk deleted the multi-file-refactor branch June 26, 2019 01:54
