Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/course demo lifespan #120

Open
wants to merge 7 commits into
base: master
Choose a base branch
from
Open

Conversation

tbbrown
Copy link
Contributor

@tbbrown tbbrown commented Jun 19, 2019

Changes to Examples/Scripts/PreFreeSurferPipelineBatch.sh to support processing LifeSpan data while leaving HCP-YA data as the default.

@tbbrown tbbrown requested review from glasserm and mharms June 19, 2019 22:35
@glasserm
Copy link
Contributor

I guess the batch script is a lot longer now. I guess I'm not sure that having 4 different ways of specifying the same argument is a good idea. Also, the stuff that the user modifies is all over the script. I prefer for all the user modifiable variables to be in one place.

@coalsont
Copy link
Member

I agree that user-editable stuff would be best to keep in as few sections as possible, and closer to the top of the script is better (but not if it requires additional code to be able to, say, move something outside a loop when it uses the loop variable).

There is a better way to take multiple flags to do the same thing:

--subject=* | --Session=* | --session=*)

On a side note, I have been working on a new set of functions to make this kind of argument parsing (and associated help info) simpler (but that shouldn't hold up pull requests using the existing conventions).

@mharms
Copy link
Contributor

mharms commented Jun 20, 2019

Based on a quick look, this does appear to have changed more than I was anticipating. It looks like the batch script is itself being turned into another script that would be called with an extensive set of arguments, rather than users just directly editing parameters in the script itself. That wasn't necessarily the direction that I was suggesting we go. @tbbrown Can you give us a little more context as to why you went this route?

@glasserm
Copy link
Contributor

In regards to Tim C.'s functions, this would be very useful if it could simplify the I/O of these scripts. I've found the Tim B. style of I/O quite cumbersome to write and so I just don't use it for new code. If a lot of that could be in external functions that might make things much easier to use.

@coalsont
Copy link
Member

As a preview, here is some example code that uses the new parameter functions I wrote (the example will be at the top of the new shlib file):

    opts_AddOptional '--foo' 'myfoo' 'my foo' "give me a value, and i'll store it in myfoo" 'defaultfoo' '--oldoptionname' '--evenoldername'
    opts_AddMandatory '--bar' 'mybar' 'your bar' "a required value, and this description is really long, but the opts_ShowArguments function automatically splits at spaces, or hyphenates if there aren't enough spaces"
    opts_ParseArguments "$@"
    
    #display the parsed/default values
    opts_ShowValues
    
    #processing code goes here

The usage function manually prints a few lines explaining the overall purpose of the script, then calls opts_ShowArguments, which is built from that extra info in each opts_Add* line.

I have put the functions through a little testing, and wrote them with mac's ancient bash in mind (requiring a creative hack to avoid readarray and an obscure property of printf -v).

@tbbrown
Copy link
Contributor Author

tbbrown commented Jun 20, 2019

  • The goal was for every single command line option is optional. None of them actually have to be specified. If none are specified, then this script is backward compatible with the previous version of the script (runs things in HCP-YA style, tries to submit jobs instead of running them locally, etc.)
  • Commonly, the only command line options that would be specified are --runlocal and perhaps --lifespan.
  • The location in the code for the user to edit things is right at the top after the header comments where the DEFAULT_STUDY_FOLDER, DEFAULT_SESSION_LIST, and DEFAULT_ENVIRONMENT_SCRIPT are set. After editing those, a user could invoke this script with no options and it would work just as the previous version did. The user can add --runlocal and/or --lifespan.
  • Changing to the "or" ( | ) notation for making several options do the exact same thing is a good idea.
  • Using a set of functions that Tim C creates for specifying options would be great, but I have to have this is working shape for the first practical in the course soon (as in last week soon). I have to write the materials that correspond to this last week.
  • When specifying what "subjects" to run in the HCP-YA context, it really is specification of a subject ID since that is the way the Study Folder is organized (with subject ID sub-directories). When specifying what "sessions" to run in the LifeSpan context, it really is a specification of a Session ID since that is the way the Study Folder is organized (with session ID sub-directories). Thus having allowing for --session= and --subject= to mean the same thing to this script.
  • I can take out duplicate options like --StudyFolder=, --study=, and --working-dir=, and support of things like --session= and --Session=, but those were just intended to be a convenience for the user. They are trivial to add (especially if I adopt Tim C's use of the "or" ( | ) notation in the case statement. But if supporting this is generally disliked, I'll remove it.
  • I started just trying to have a simple variable in the script that indicated HCP-YA style vs LifeSpan style. Then it started devolving into using that variable over and over in numerous "if" statements for setting the variables that end up being used in the actual PreFreeSurferPipeline.sh invocation command. So if that variable is set to indicate lifespan, then set T1wInputImages this way, otherwise set T1wInputImages this other way. Then if it is set to indicate lifespan, then set T1wSampleSpacing this way, otherwise set T1wSampleSpacing some other way. And so on. It seemed that it would be easier to just have one check of that variable (${LifeSpanStyle}) and then set all the variables for running LifeSpan or HCP-YA in 2 large blocks of code.

…lt global variables for run_local and lifespan style options. Added some comments.
@tbbrown
Copy link
Contributor Author

tbbrown commented Jun 20, 2019

I've push changes to:

  • Add a DEFAULT_RUN_LOCAL variable
  • Add a DEFAULT_LIFESPAN_STYLE variable
  • Add comments to more clearly note where changes would be made by a user wishing to invoke this script without any command line options
  • Combine equivalent cases in the case statement in the get_options function
  • Added comments trying to clarify that one large block of code is for setting parameter values and running processing on LifeSpan style data and another large block of code is for setting parameter values and running processing on HCP-YA style data.

@mharms
Copy link
Contributor

mharms commented Jun 20, 2019

I'm afraid that this has turned into something much more complicated than I was envisioning when I proposed that we do something along the lines of just the following:

## Comments describing variable
varA=XXXXXXXX   # For HCP-YA data
#varA=XXXXXXXX  # For HCP-D/A data

What @tbbrown has drafted is sophisticated, but importantly, if we go this route, we are fundamentally changing the role/purpose of the Example scripts. Presently, the Example scripts are intended to serve as a didactic introduction to each sub-pipeline, in which users were expected to (1) review and engage with the Example script, and (2) frequently need to make changes appropriate for the parameters used in their particular study, guided by a fairly extensive set of comments explaining those parameters.

Naturally, it makes sense for us to preset the values to something appropriate for HCP-YA or HCP-D/A. But, for me at least, it is dangerous to switch to a mode in which users are simply launching the Example scripts with or without a --lifespan flag, and a whole host of parameters get set automatically in a black box fashion (to the user), without any review by the user.

While this may be convenient for users for studies that literally acquired their data with the exact HCP-YA or HCP-D/A protocols, or are trying to process (by themselves) actual HCP-YA or HCP-D/A data, it does not well support users that are trying to use the Example scripts to process their own data, collected with their own modified protocols (possibly collected on GE or Philips scanners).

For users in that latter situation, the splitting of the default parameter definitions into completely separate LifeSpanStyle and HCP-YA style sections is going to be confusing (e.g., Which section am I in? Which set of parameters should I change?)

And even for users for whom executing the script in simply LifeSpanStyle or HCP-YA mode would be correct and appropriate, the increased complexity of the script will I think diminish its ability to serve in a didactic role.

Also, there is the practical matter that extending this type of Example script organization to all the other sub-pipelines is going to entail a lot more work than I was envisioning.

So, initially at least, I'm not a proponent of this particular implementation. That said, perhaps I'm missing some complexities that come to light if you try to implement my "simple" approach proposed above, which will inevitably drive the solution to this sort of more complicated implementation?

@coalsont
Copy link
Member

It doesn't seem like a good idea to have two copies/versions of the error checking and job launching code, so if those can be moved outside of the conditional, that would be good.

There are indeed a lot of parameters, and a lot of them change from YA to lifespan, so it would be painful to uncomment a large number of lines (and a user might miss one or two, causing weird hard-to-locate issues). However, I agree that having two sections that look inviting to edit is probably confusing. Maybe the YA should be the defaults, and documented for editing (and not inside a conditional), and --lifespan can just trigger a conditional that overrides the defaults, with no per-variable comments, and maybe only a comment like "this is the setup for lifespan processing, please edit the defaults above if you are using your own sequences".

@mharms
Copy link
Contributor

mharms commented Jun 20, 2019

While, in general, I'm not a fan of duplicating content into separate scripts (because then edits to one in the future have to propagated to both), perhaps this is a situation where we'd be better off starting a separate "Lifespan" directory, and putting lifespan specific example scripts into that? (Similar to how we have an Examples/Scripts/7T directory...)

@coalsont
Copy link
Member

Separate example launch scripts with settings for YA and lifespan does seem like a possibility worth exploring, given the large number of differences in parameters. The user is expected to copy one or the other and then edit it, so as long as we can get them past the confusion of "which file should I copy", having only one section with defaults to edit in the chosen file should reduce any later confusion.

@mharms
Copy link
Contributor

mharms commented Jun 20, 2019

And we could duplicate all the variable related comments in both scripts, to make it easier to keep the scripts otherwise aligned, except where absolutely necessary to add info specific to only HCP-YA or Lifespan. Along those lines, I'm not sure if referring to the sidecar json files with parameter values is appropriate for the Example scripts.

@tbbrown
Copy link
Contributor Author

tbbrown commented Jun 20, 2019

My first cut at this was indeed to create a separate PreFreeSurferPipelineBatch.sh that was specific for LifeSpan data. I was either going to keep it in the same directory and name it something like LSPreFreeSurferPipelineBatch.sh or put it in a new sub-directory. I actually started writing that file. Then I rejected that idea because I figured there would be a lot of objections to maintaining 2 different versions of each of the example batch scripts (one for HCP-YA and one for LifeSpan). Thus the combination of functionalities in one version of the file.

The requirement to have something that could successfully be run with a relatively minimal set of changes during the course (and also potentially for users beyond the course) made me reject the idea of just having every single parameter variable set with a bit of code that looks like:

<variable>=<HCP-YA appropriate value> # comment out this line for LifeSpan style
#<variable>=<LifeSpan appropriate value> # uncomment this line for LifeSpan style

As @coalsont points out, it would be painful to have all participants have to change every one of these variable settings to get things working. Particularly for an exercise during the course. Additionally, not all of these variables are just set to simple values. Look at how the T1wInputImages value is determined in the HCP-YA case for example. It is not just a simple assignment of a value. There's a little code to execute to figure out how many T1w's there are and then build an appropriate string. Similarly, look at how SpinEchoPhaseEncodePositive is set in the LifeSpan case. There's a little bit of code to execute there too to first get a list of the positive SpinEchoFieldMaps available and then pick the first one.

Matt's requirement is that these batch scripts behave, by default, in a way that is HCP-YA compatible. So all the parameter values must be set, by default, to the values that are appropriate for HCP-YA. That means that having a user modify the script to run LifeSpan data requires that they modify the settings of quite a lot of variable values. That's painful and error prone.

As for pulling the job launching code out of the conditional, I assume you mean the code that builds the actual PreFreeSurferCmd using the various variables set above. It certainly makes sense to do that, but it isn't quite as simple as just pulling the building of the command out of the conditional because the two versions of the command use different command line options for invoking the PreFreeSurferPipeline.sh script. Notice that the HCP-YA version of the command specifies --fmapmag=, fmapphase=, and fmapgeneralelectric= values that are not used at all in the LifeSpan version of the command. These may be the only differences and we could probably specify NONE values for them in the LifeSpan case, but in the interest of time, I decided to just keep the command building and command invocation inside the conditional. Notice that I did pull the setting of PRINTCOM, QUEUE, and queuing_command up out of the conditional (and also all the way up out of the loop) because those value are not set differently for LifeSpan versus HCP-YA and are not set differently based on the iteration through the loop.

@mharms, I think referring to the JSON sidecar files in these example batch scripts where parameters are set is exactly what should happen. My thought here is that when we set the T1wSampleSpacing parameter value in the LifeSpan block to "0.000002100", the first questions that I would think would come to a user's mind when seeing that is, "So how in the world do you know what that value should be set to? Where did you come up with "0.000002100"?

The whole point of the JSON sidecar files is that they are where such "meta data" about the image files are stored. So pointing out that you get these values from the sidecar files and even giving the user some information about what sidecar file to look in and what named values to look for in the sidecar file is a real value to the users. Telling the user that the value set for the SEEchoSpacing parameter value comes from the sidecar file that corresponds to the SpinEchoFieldMap and not the sidecar file that corresponds to the actual T1w image, and that the required value is in the "EffectiveEchoSpacing" named value in said sidecar file seems to me to be a very helpful thing.

If the consensus is to leave the original PreFreeSurferPipelineBatch.sh alone and create a new PreFreeSurferPipelineBatch.sh that is specific to LifeSpan, then I'm fine with that. I assume based on comments so far that putting that LifeSpan specific version in a new sub-directory is preferred over giving it a different name in the same directory. Is that true?

As @coalsont points out this leads only to the issue of helping users answer the "which file should I copy?" question and removes the potentially more confusing issue of "These 2 blocks of code look very much the same, and I can't tell which one I should be changing."

Is the plan to create a separate example batch script for LifeSpan acceptable to everyone?

@coalsont
Copy link
Member

I can live with a separate example file for lifespan when there are a large number of parameters to change (and in the interest of time). I don't know if that generalizes to duplicating any other example scripts (and when there is more time to think about it).

As for building the command to run, yes, that is what I meant, but you can instead do a "case" in the middle of building the string to execute, and for TOPUP add one set of arguments, and for FIELDMAP a different set. This means users wouldn't potentially have to edit the code building the command to run, only the settings, and the command building is generic enough to adapt to any sane set of settings.

…LifeSpan/PreFreeSurferPipelineBatch.LifeSpan.sh
@coalsont
Copy link
Member

Also, I think it may be beneficial to reference JSON sidecars, especially if that is the definitive source of those values for some datasets. I don't follow why @mharms would want that removed.

At some point, some effort should probably be put into having both the YA numbers and the potential JSON sidecar source of each applicable setting in the comments documenting them (it seems like the two versions of the example script deserve to be identical except for the settings). But, this is probably a topic for some other time.

@tbbrown
Copy link
Contributor Author

tbbrown commented Jun 20, 2019

I've reverted the PreFreeSurferPipelineBatch.sh file back to exactly how it was before I started this and created an Examples/Scripts/LifeSpan/PreFreeSurferPipelineBatch.LifeSpan.sh file. The PreFreeSurferPipelineBatch.LifeSpan.sh file is simplified some since only one version of each of the possible command line options is supported. Of course there is now no need for conditional based on whether one is running LifeSpan or HCP-YA data. So there is now no need to add any conditional inclusion of parameters to the command that invokes the PreFreeSurferPipeline.sh script.

I am about to push this code over to the course machine so I can test it there in the context of the first practical session.

@coalsont
Copy link
Member

As for "--Session" and "--session", etc, I am not a fan of supporting mixed capitalization of option flags (I would go all lower case). Other than compatibility concerns, I think one spelling/capitalization of each different word choice of the option is probably the right balance (so "--subject" and "--session" are okay, but adding "--Session" or "--session-id" looks too silly to me). Make the user always use lowercase, and they will stop making that mistake soon enough - support capitals in some scripts, and they will keep trying capitals in others.

@tbbrown
Copy link
Contributor Author

tbbrown commented Jun 21, 2019

The change to just supporting the lowercase version (and only one version) of the command line options has been made in the PreFreeSurferPipelineBatch.LifeSpan.sh version of the file.

@glasserm
Copy link
Contributor

I make a new copy of the ".bat" or "batch" script for each study and change the variables to be appropriate for that study. These scripts are indeed only supposed to be examples for the users to copy and modify appropriately based on the documentation contained within them. It would be nice to read parameters out of a json side car for some of the more esoteric arguments as an alternative to having to look them up and specify them.

@mharms
Copy link
Contributor

mharms commented Jun 21, 2019

Regarding the jsons: I'm ok with us referring to them in some fashion, but we need to keep in mind that the generation of the sidecar json's (from dcm2niix) is not part of the HCPpipelines per se. So, while they are part of our specific Lifespan "unproc" packages, we can't refer to them in the comments as if their existence is a given. Someone converting their DICOMs to NIFTI via a different mechanism, or converting GE or Philips data (for which all the same json fields will not necessarily be available) may have no clue what we are talking about regarding "jsons" if they aren't explained in some manner. But rather than needing to include this explanation in every single example batch script going forward, I'd suggest that we create a README with the details.

Also, I'd very much like to keep all the comments regarding the parameters in sync between the two versions. That means that the HCP-YA version should incorporate the same comments regarding where to find the values in the sidecar jsons if they exist (they don't exist in our HCP-YA "unproc" packages, but if we re-converted that data to NIFTI with a modern version of dcm2niix they would). And the Lifespan version should continue to include any comments about where to find to the values in Siemens (or GE/Philips) DICOMs, for the situation where the sidecar json doesn't exist. i.e., the difference between the two sets of example scripts will literally boil down to different default settings, and perhaps some slightly different code for selecting/prepping certain values.

Also, I very much support @coalsont's point that in terms of building the command to run, it needs to be flexible enough to work if users modify the parameter values (e.g., change to a different type of field map correction). Other than the two different example scripts having a different set of defaults, for users that have their own customized protocol and need to modify the parameter values, they should be able to start from either example script, and get to where they need to be. i.e., the HCP-YA version needs to continue to support using SE fields, and the Lifespan version needs to continue to support using gradient-echo field maps, and in neither case should users ever have to muck with changing the arguments provided to the actual invocation of the pipeline.

Last, if we've settled on an approach, I was hoping that we could get this done for all 3 structural-related example scripts, and both fMRIVolume and fMRISurface (i.e., all the scripts for which we currently know what the variables will look like for the Lifespan data).

@glasserm
Copy link
Contributor

For jsons perhaps there could be optional arguments to specify them?

@mharms
Copy link
Contributor

mharms commented Jun 21, 2019

That would be a nice feature, but would be a non-trivial addition to the example scripts. Plus, that sort of automated parsing of the jsons and retrieval of parameters will be something that the containerized version handles.

@tbbrown
Copy link
Contributor Author

tbbrown commented Jun 22, 2019

I conditionally added the --fmapmag=, --fmapphase=, and --fmapgeneralelectric= options to the build of the PreFreeSurferCmd in PreFreeSurferPipelineBatch.LifeSpan.sh. I expect that there is more work to do to make the building of the PreFreeSurferCmd completely generic. That work will have to wait until after other things for the course are done.

I also changed the comments referring to the JSON files along the lines suggested by @mharms.

We've already written Python code to extract values from the JSON files, but adding use of it to these examples would seem to be unnecessary given that this is all being transitioned into the QuNex container.

Writing similar example scripts for the rest of the structural preprocessing and functional preprocessing etc. will have to wait until other things are done for the course.

This arrangement of the example batch scripts (having separate examples on a per-project basis) calls into question whether there is any value/need for supporting any command line options in these example batch scripts at all.

@coalsont
Copy link
Member

I often run locally when editing and testing the pipelines, though I have typically edited the default rather than using the --runlocal option.

My new options code doesn't currently accept repeatable options (like --subject in this script, which would need to be turned into a single option with a delimiter-separated list to be supported with my current functions - a delimiter-separated list would be shorter than repeating the option), nor parameters without an = (like --runlocal in this script, which could be made into --runlocal=TRUE, which would also allow an edited default of true to be overridden with it - "--help" is an exception which is hardcoded into my parsing function currently). Thoughts on whether I should add support for these cases?

@glasserm
Copy link
Contributor

Usually subjects are delimited by spaces in the Batch file.

@coalsont
Copy link
Member

If you use quotes, you can do that on the command line, too. Or, the option can set a different variable and you can test that for nonempty.

@mharms
Copy link
Contributor

mharms commented Jun 27, 2019

The PreFreeSurferPipelineBatch.LifeSpan.sh example script still differs too much at a gross organizational level from its HCP-YA equivalent. e.g., the order in which variables are set is very different between the two. And many (all?) of the potentially helpful comments in the HCP-YA version about how to set parameters from the DICOM fields have been lost in the Lifespan version. Maintaining both of these, given their lack of parallel organization, is going to be too much work.

@coalsont
Copy link
Member

I expect once the course is done and things have calmed down, and the command-building code is fully generic, we can copy the missing comments into the generic version, replace the second version with a copy of the first, and change its settings to match lifespan. From then on, it would be two copies of the same file, with only the settings changed.

@tbbrown
Copy link
Contributor Author

tbbrown commented Jun 28, 2019

I expect that if I had started this set of changes along the lines of copying the original example to create a version for LifeSpan, then it would have ended up just as @coalsont describes. The original goals I had were to try to somehow keep things all in one example file, have that example file work for HCP-YA data by default, have a very minimal set of code changes for course participants (or others) to make to allow it to work on LifeSpan data (I could not have them changing each and every parameter setting), have it done by the time I had to submit my materials for the course, and have it done in time to be on the course master machine before cloning.

The modifications to what used to be the get_batch_options function and is now just the get_options function, were done only for convenience. To me they are quick and easy changes that were originally intended to make it so that the script could be used for various projects without actually having to change any code if one didn't want to. Those changes have been largely removed, and no command line options need be specified at all to use the script.

Given that we are now going along the path of having a separate example for each project, such things as:

  • parallel organization
  • setting of variables all in the same order
  • duplicated comments in each version of the example
  • complete removal of the get_options function

all make perfect sense.

I might be able to get some of that done next week, but that will probably not be in time for inclusion on the course machine and may require some minor modifications to the materials for the first practical that just cannot be made at this point.

@mharms
Copy link
Contributor

mharms commented Jun 28, 2019

At this point, I don't really care much about what ends up on the course machine as long as it works, but hope that we can get this all tidied up before you start your new job.

@mharms
Copy link
Contributor

mharms commented May 16, 2023

@coalsont : Should we close this PR?

@coalsont
Copy link
Member

This lifespan example looks slightly more capable than the current example, in that you can change between fieldmap types without editing the command section. But, it would be the only lifespan example.

@coalsont
Copy link
Member

Actually, the current one seems to handle it by passing in NONE for the unused options (while the lifespan actually avoids specifying the flags), so maybe it doesn't matter in that respect (though I have never liked the NONE convention).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants