HISTORY

0.213   Critical locking overhaul resolves vrpipe-server not functioning
        correctly.
        Fixed vrpipe-output so that it works again.
        Fixed all vrpipe-* scripts and processes to not spawn extraneous redis
        servers; now only vrpipe-server will spawn 1 as needed.
        bcftools_cnv step fixed to get true sample name from vcf.
        bcf_to_vcf step improved so that $bcftools can be replaced by
        bcftools_exe in post_calling_vcftools option.
        bam_import_from_irods_and_index step fixed so that it can be installed.
0.212   Critical speed fix for steps that take datasource input with multiple
        types and many files.
        cram_index and bam_index steps now die if they can't delete an out-of-
        date index file.
        Overhauled internal locking mechanism should result in a slightly faster
        system with fewer, shorter-lived stalls.
        (Sanger-specific:) irods warehouse datasource and npg_cram_stats_parser
        step now default to getting F0xB00 stats files instead of F0x900. Also,
        the correct studies according to warehouse are related to the input
        file, ignoring study information stored in irods metadata. irods
        datasource no longer falls over if given a query file with blank lines.
        GATK steps can now use a gatk_key to turn off phoning home, enabled by
        default if environment variable GATK_KEY is set.
        New gatk_genotype_gvcfs pipeline which can do the calling to chrom vcf
        in one setup (using both datasource and step chunking).
        New pipelines: bam_calling_with_gatk_haplotype_caller,
        bam_calling_with_mpileup_via_bcf, bam_import_from_irods_and_index,
        bam_indel_realignment_and_bqsr, bam_merge_and_mark_duplicates,
        bam_merge_and_streaming_mark_duplicates,
        gvcf_calling_one_combine_with_gatk_haplotype_caller,
        gvcf_calling_two_combine_with_gatk_haplotype_caller,
        gvcf_calling_with_gatk_haplotype_caller,
        sample_bam_remapping_with_bwa_mem
        Deleted pipelines (some of them renamed to new ones above):
        sample_gvcf_via_bcf_with_mpileup,
        sample_gvcf_with_gatk_haplotype_caller,
        sample_indel_realignment_and_bqsr,
        sample_irods_import_and_mark_duplicates,
        sample_irods_import_and_streaming_mark_duplicates
0.211   bam_improvement_gatk_v2 pipeline fixed to work again.
        irods_sample_import pipeline renamed as
        sample_irods_import_and_mark_duplicates.
        bam_reheader step fixed to work when reads metadata on input file is
        wrong.
        bcftools_cnv step fixed to always remove invalid plot<-sample
        relationships.
0.210   Various bam-outputting steps are now more tolerant of input file
        reads metadata being wrong.
        bcftools_cnv step fixed to properly relate all plots to the sample node.
        New sample_indel_realignment_and_bqsr pipeline.
        Critical speed fix for vrtrack_update_mapstats step.
        Fixed polysomy step so the output files have the correct sample name
        metadata.
0.209   Minor fix for vcf_merge_different_samples_control_aware step when the
        samples have no control.
        Fix for the polysomy step when the vcf has duplicate samples.
        Fixed step post processing when an output file needs permission changes
        and wasn't output by anything.
0.208   Critical speed fix for graph_filter queries in datasources (allowing
        those datasources to work now).
        Further critical speed fixes for processing all pipelines, allowing them
        to proceed.
0.207   Critical speed fix for processing all pipelines (especially evident when
        unstalling a stalled/started-from-scratch setup).
        vrpipe-setup fixed to allow creating new setups again.
        bwa_mem_to_bam fixed to solve conflict issues with samtools sort.
        Various steps in the polysomy_cnv_caller pipeline have been fixed to
        work as intended.
0.206   Proper fix for v0.205.
0.205   Fix for hipsci QC to handle _CTRL appended to sample names in files.
0.204   CNV and Pluritest-related steps now store their results properly in the
        graph database against sample and donor nodes.
        vrpipe-setup now allows reactivate/deactivate of multiple setups.
        New sample_gvcf_via_bcf_with_mpileup pipeline.
        New sample_gvcf_with_gatk_haplotype_caller pipeline.
        vrtrack qc website donor view now allows setting donor sample qc status.
0.203   Fixed vrpipe-setup to handle new default options in the irods
        datasource correctly.
0.202   npg_cram_stats_parser step now stores md5_of_ref_seq_md5s.
0.201   irods datasource methods fixed to all work with the new irods protocol
        system.
0.200   delete_inputs step behaviour now checks that input files are not needed
        by another setup before deleting.
        New pipeline for calling with fermikit.
        Bwa mem steps can now take sequence dict file as input.
        Steps can get chunks across the genome with the new
        StepGenomeChunkingRole.
        npg_cram_stats_parser step now compares cram header to expectations,
        with result stored in new Header_Mistakes graph node.
0.199   This is a big change from the previous version. Re-read README.md for
        installation changes and study IMPORTANT_NOTES if upgrading.
        Newer steps and pipelines, and some vrpipe internal and website
        features, use the Neo4J graph database to store information. Neo4J may
        eventually replace the use of MySQL/PostgreSQL.
        Issues with running out of database connections have been (mostly)
        resolved, so you can increase or get rid of --max_submissions arg to
        vrpipe-server.
        Now multiple setup handlers are run on the farm, each handling a few
        setups, to spread the load and help avoid some setups never being
        updated because a single setup handler never got around to dealing with
        them before being killed.
        The location of output files has been improved to better avoid different
        PipelineSetups with the same datasource and pipeline writing to the
        same paths.
        New supported filetypes hts, aln and var allow steps to work with
        bams, crams, vcf and bcf files easily. num_records() now only considers
        primary alignments.
        Copying files using VRPipe API now preserves mode and ownership.
        New file "protocol" system, allowing things like open() to work on
        files stored on remote filesystems such as irods without downloading
        them to local disc first.
        The irods datasource no longer requires the local_root_dir option,
        making use of the protocol system to represent the files as their
        location in irods instead of a non-existent local file. The
        *_with_warehouse_metadata methods also now look for qc files associated
        with queried bam/cram files.
        The vrpipe_with_genome_chunking datasource now works with all its
        methods.
        The irods and vrpipe datasources can now use a graph_filter, to filter
        based on properties of nodes related to the file of interest.
        New web front-end (vastly improved status) and beginnings of a QC
        sub-site. All pages are now served securely with https.
        vrpipe-rm can take a fofn.
        vrpipe-permissions can now change permissions on input files by
        selecting step 0.
        vrpipe-fileinfo can now work from a fofn, and has options to suppress
        the header and get info on setup input files by selecting step 0.
        Various minor bug fixes and speed improvements, especially for steps
        that deal with thousands of files. Unsticking stalled setups should now
        also be many times faster on multi-processor systems.
        New pipelines: sequencing_qc_from_irods*, bam_2015_vrtrack_processing*,
        irods_sample_import, gatk_combine_gvcfs, gatk_genotype_gvcfs,
        chipseq_qc_and_peak_calling.
        Cram-related steps now use samtools (1.2+) instead of cramtools.jar.
        Added support for picard 1.128+ to picard steps, which has a new command
        line format.
        New irods_convert_cram_to_bam option for irods_get_files_by_basename
        step.
        bamtofastq now done with biobambam bamtofastq.
0.198   Fixes the vcf_merge_different_samples* steps to cope with hundreds of
        input VCFs (requires latest bcftools).
        vrpipe-submissions has new --partial_reset option which is useful when
        a step produces many submissions per data element, some of which worked,
        but the failures need a new command line; --partial_reset will only
        delete the failed submissions instead of also restarting the ones that
        worked (as --full_reset does). If upgrading, read IMPORTANT_NOTES.
0.197   Fixed inability to cope with file paths with commas in them.
        New polysomy_cnv_caller pipeline and related steps.
0.196   New STAR pipeline and steps.
        verify_bamid step now actually works.
0.195   vrtrack_populate_from_vrpipe_metadata step now sets npg_qc_status;
        sample name now always comes from the sample metadata; public_name
        metadata is now stored in individual alias. The irods datasource also
        no longer ignores it when manual_qc metadata changes.
        bcf_to_vcf step has new vcf_sample_from_metadata option which reheaders
        the output vcf; bcftools_gtcheck step expected_sample_from_metadata_key
        option now takes multiple values.
0.194   mpileup_vcf/bcf_to_vcf step has new vcf_sample_from_metadata option
        which reheaders the output vcf; bcftools_gtcheck step
        expected_sample_from_metadata_key option now takes multiple values.
0.193   sequenom_csv_to_vcf step now handles fluidigm csvs as well; now produces
        calculated_gender metadata instead of sequenom_gender, and takes
        plex_storage_dir option instead of sequenom_plex_storage_dir.
        vrpipe-fileinfo and vrpipe-output --step option now takes the format
        <step_name_or_number>:<kind_of_output> - a colon separates them instead
        of a pipe, making it easier to type on the command line and matching the
        syntax used elsewhere. vrpipe-submissions --step option can now take
        a step number instead of just a step name.
        Fixed bug in vcf_merge_different_samples_control_aware step to correctly
        handle single-sample merges.
        The vrpipe datasource methods gain a new include_in_all_elements option
        that allows you to include the output of one or more setups in every
        element of a normally configured vrpipe datasource. This can be useful
        for e.g. a genotype checking pipeline where the normal input is VCF files
        from setup X, each element being a single VCF file. But then you also
        want to use a genotypes bcf file that is generated by setup Y, and that
        single bcf should be included with each different VCF. In this case your
        datasource source option would be X, and the include_in_all_elements
        option would be Y.
0.192   Fixed split_genome_studio_genotype_files step to work when there are
        multiple analysis files.
        Better parsing of the imeta qu output in the irods datasource now allows
        grep -v pipes to work reliably in the source.
0.191   Critical fix to restore functionality to the setups handler, lost in
        0.190 - it now monitors setups again to keep them unstalled and look out
        for new data elements.
0.190   sequenom_import_from_irods_and_covert_to_vcf pipeline now indexes the
        vcf file it produces.
        irods datasource improvements: all_with_warehouse_metadata method now
        has a 'required_metadata' option to ignore files lacking those keys. It
        also now retries irods commands multiple times on failure. All analysis
        files (Sanger-specific) are now noted.
        An issue exists where if multiple setups claim to have made the same
        file, VRPipe will no longer automatically restart a step that produced
        that file. This most often comes up for VCF index files. Some recent
        VCF-producing and consuming pipelines have been altered to remove
        the VCF-indexing step from the consuming pipelines, and the producing
        pipeline now always indexes. See IMPORTANT_NOTES if upgrading. VRPipe
        itself also now checks if a file is in a setup's unique directory and
        allows automatic step restarts in that case.
        VRPipe::File has a new merge_metadata() method for combining metadata
        from multiple files.
        Critical fix for the bam_improvement_no_recal pipeline, which was
        deleting a required file if the cleanup option was on.
        When VRPipe datasources automatically start over dataelements because
        file metadata changed, the details of the metadata change are now logged
        (see vrpipe-logs).
        vrtrack_populate_from_vrpipe_metadata step was improved to store bam
        lane metadata in the way expected by most bam pipelines and to store
        control metadata for stem cell data.
        The pluritest-related pipelines and steps received improvements and
        fixes.
        vrpipe-setup --based_on now lets you unset a value by answering ''.
        The setups handler now catches and emails about datasource errors in new
        setups.
0.189   vrtrack_auto_qc_with_genotype_checking pipeline fixed and improved to
        now take genotypes bcf as an input.
0.188   Updates to various bcftools-using steps to cope with both v0 and v1 of
        bcftools automatically, and vcf merging is better at estimating memory
        required.
        vrpipe-rm is now no longer silent about failed or skipped deletes.
        vcf_merge_different_samples_control_aware step now combines all
        metadata from its inputs instead of only keeping the intersection.
        Tweaks and improvements to some Sanger-specific hipsci-related
        pipelines.
        New vrtrack_auto_qc_with_genotype_checking pipeline that uses latest
        bcftools for genotype checking.
0.187   vrpipe-disk_usage now provides a complete report on all usage across all
        disks.
        Critical fixes and improvements for the htscmd_genotype_analysis step,
        which now takes different options and calculates gt_status correctly.
        pluritest_plot_gene_expression step fixed to not break due to an
        undefined environment variable.
        Critical fix for ARRAY-ref-related errors when trying to add metadata to
        files.
        Overhaul of genome-studio and loh-calling pipelines to work with the
        irods datasource.
0.186   Fix to stop restarting steps that by design do not create their output
        files, when the output files do not have all required metadata.
0.185   Fixed bug in VRPipe::File->openw to prevent deep recursion when lacking
        permission to write.
        VRTrack datasource analysis_genome_studio method now allows grouping on
        analysis_uuid.
        irods datasource now sets the type of files to their type in irods.
        irods_analysis_files_download step now outputs analysis and input files
        under different output keys. It also adds irods_local_storage_dir
        metadata to say where the irods_analysis_files were downloaded to.
        vrtrack_populate_from_vrpipe_metadata step now sets library ssid and
        tag sequence.
        pluritest_annotation_profile_files step now (only) works with idat files
        from the irods datasource.
0.184   Sanger-specific improvements to the irods datasource, and a new step and
        pipeline to update a VRTrack database and import the files based on the
        irods datasource.
0.183   Fixed the new bam_improvement_gatk_v2 pipeline.
        StepCmdSummary version strings can now be up to 64 characters long.
        Full support for setting and getting multiple values per metadata key
        has been added to Files.
        The irods datasource now stores multiple values for the same metadata
        key, and the all_with_warehouse_metadata method now gets all the
        Sanger-specific metadata we'd need for complete tracking.
0.182   New vcf_merge_different_samples_control_aware step.
        New vcf_merge_different_samples_to_indexed_bcf and hipsci_loh_caller
        steps and pipelines.
        New bam_improvement_gatk_v2 pipeline for doing the recalibration part
        of the improvement pipeline with GATK v2 or higher (v3 needed for
        working with bams produced by bwa mem).
        iRODs datasource now has the Sanger-specific warehouse code split out
        in to a new all_with_warehouse_metadata method.
        htscmd_genotype_analysis now confirms the genotype when the expected
        sample's score has a ratio of 1.000 to the highest score.
        bwa_mem_fastq and bam_merge_lane_splits steps now pass through sample_id
        metadata.
0.181   Critical fix for infinite restart bug affecting setups for pipelines
        with temp files where the user set a unix_group.
        pluritest_annotation_profile_files step now does case insensitive
        matching on its regex options and only considers annotation files with
        9 columns.
0.180   Fixed resolve bug in vrpipe-fileinfo.
        htscmd_gtcheck step now has an expected_sample_from_metadata_key option.
        split_genome_studio_genotype_files step fixed to get correct samples.
        penncnv_detect_cnv step now has a perl_for_penncnv_exe option.
0.179   The unix_group option during vrpipe-setup no longer results in file
        permissions changing for files produced by other setups.
        Added r_libs option to pluritest_plot_gene_expression step to allow it
        to be used with a specific R install.
        vcf_merge_different_samples step now works with the latest bcftools.
        genome_studio_import_from_irods pipeline replaced with
        genome_studio_import_from_irods_and_convert_to_vcf pipeline featuring
        new illumina_coreexome_manifest_to_map step and updated
        genome_studio_fcr_to_vcf step that both use fcr-to-vcf script from the
        vr-codebase git repository.
        Steps in VRPipe can now specify that they take as input arbitrary file
        types, useful for when your step takes multiple different text formats
        and you need to distinguish between them or take these from the
        datasource.
        The irods step got a search_by_metadata() method.
        The sequenom_csv_to_vcf step now takes a sequenom_plex_storage_dir and
        reference_name option and gets the correct plex manifest from irods.
        The irods datasource now queries the Sanger-specific 'warehouse'
        database and adds public_name and donor_id metadata to files.
0.178   Front-end vrpipe-* scripts now strip control codes the user may enter
        while typing.
        Fix for the vcf_merge step so that it works when the post_merge_vcftools
        option includes a .gz path.
0.177   vrpipe-handler for setups can sometimes need more memory, so its
        requirements have been increased to 2900MB.
        vrpipe-permissions now has a --filter option to pick files to alter
        based on their metadata.
        vrtrack datasource now checks individual name and alias, and prefers to
        use alias as the individual.
        New Sanger-specific sequenom import pipeline.
0.176   Hotfix so that the vrtrack datasource ignores when a lane changes to
        0 reads.
0.175   Hotfix so that the vrtrack datasource notices and updates when a lane
        changes its number of reads.
0.174   Hotfix so that the LSF scheduler can correctly choose queues that have
        no time limit defined.
0.173   LSF scheduler now has an improved method of picking the queue to submit
        jobs to.
        Jobs with cmd lines that have multiple things piped together will now
        be considered failed if any cmd in the set of pipes exits non-0, instead
        of only when the last cmd exits non-0.
        Fix for the bam parser, allowing tests to pass on modern systems.
0.172   Fixed setup stalling introduced in previous version.
        Submission failure emails now include the setup name.
        vrpipe-server now always spawns a setup handler in production, so that
        new work for setups can be discovered when all setups were complete
        when the server was last stopped.
0.171   vrpipe-fileinfo --path mode now always shows details on that path, even
        if it has been deleted (no need to specify --include_removed).
        Some bam processing steps now transfer over individual, project and
        species metadata from their input to their output files.
        Triggering setups (working out what work they have to do next) no longer
        occurs while that setup's datasource is being updated, avoiding some
        wasted effort.
        vrpipe-setup now allows changing the datasource in some (safe)
        circumstances.
        vrpipe-elements --start_from_scratch now works more reliably.
        vrtrack datasource now notices changes to sample names.
0.170   Fix for "Could not continue submission management for farm" issue.
0.169   Critical fix for the "Magic number checking on storable string failed"
        error affecting DataSource updates and vrpipe-server.
0.168   Critical fix to stop VRPipe overloading the system's network
        connections; associated large speed increase.
        Critical fix for ec2 scheduler so that it works again.
        Avoidance of possible DataElement "corruption".
        Possible fix for strange DataSource update behaviour - they should now
        update properly in a single attempt.
        vrpipe-create_step now asks how many CPUs a cmd uses.
        vrpipe-file_info works properly on VRPipe-created symlinks again, and
        --dataelement option works on --paths again.
        PipelineSetup names can now be 128 characters long instead of just 64.
0.167   Critical fix for correctly retrying submissions that failed due to
        using too much memory.
0.166   Critical fix for adding file metadata correctly without losing
        existing metadata.
        Critical fix for steps that accept multiple file types.
        Improved reliability of vrpipe-handler processes.
0.165   Critical fix for updating DataSources.
0.164   archive_files step improved so that it no longer generates submissions
        that are guaranteed to fail.
        Massive improvement to speed and reliability for steps that deal with
        1000s of input files.
        vrpipe-rm --filter option renamed --search_by_metadata for consistency
        with vrpipe-fileinfo; this option now also works properly and more
        quickly.
        Critical fix for sge_ec2 scheduler so that everything does not stall out
        when SGE has a dead host.
        bwa_mem_fastq step fixed to record correct reads metadata.
0.163   Improvements and bugfixes for the ec2 and sge_ec2 schedulers, resulting
        in more reliable instance launching and termination. EC2 spot requests
        are now supported.
        Increased the likelihood of generating single large job arrays instead
        of multiple smaller ones, for greater scheduling efficiency.
        When vrpipe-server encounters problems, the admin is no longer emailed
        the same error more than once per hour.
        The default user for new setups is now the admin, not the fake user
        'vrpipe'.
        The fastq_metadata step no longer claims to output a file, avoiding
        possible problems when resetting the step.
        GATK steps that take bam files now correctly specify bai files as
        required inputs to avoid restart-related failures.
        Possible fix for vrpipe-handlers getting stuck after temporary loss of
        db connection.
        Fixed copy and move operations to discs that were nearly empty.
        vrpipe-setup now allows specifying which unix group output files will be
        chowned as, and if vrpipe-setup was run as root the files will also be
        owned by the user that created the setup. --extra_options can now also
        be used when creating a new setup, allowing the overriding of memory and
        time for certain steps.
        vrpipe-fileinfo --path, --setup and --search_by_metadata options now
        work additively instead of independently.
        The vrpipe datasource filter option now allows multiple comma separated
        filters to be specified.
        vrpipe-status has a new --show_steps option to show what steps a
        pipeline has, letting you work out which step outputs your files of
        interest.
0.162   Fixes for the ec2-related schedulers. README file reformatted as
        README.md. The VRPipe wiki on github now contains detailed installation
        and usage guides.
0.161   bam_to_fastq step now supports merged bam files containing multiple
        lanes.
0.160   Large speed improvement and fixes for steps when they are fed thousands
        of input files.
0.159   Fixes for compatibility with MySQL 5.5, which is now the recommended
        version to use; see updated README/IMPORTANT_NOTES for recommended
        configuration settings.
0.158   Critical fix to ensure Setups do not get reset when the version of your
        installed Storable module changes. See IMPORTANT_NOTES if upgrading.
        Various fixes related to moving VRPipe-tracked files from one disc to
        another.
        Fixed filetype checking for bams, vcfs and bcfs.
        Fix for the LSF scheduler so that if your LSF administrator has set
        LSF_UNIT_FOR_LIMITS to something other than KB, VRPipe no longer fails
        to submit jobs.
        Fix for the bam_index step so that it will reindex bams if they have a
        newer timestamp than their .bai files.
        Fixed the setups handler so that it correctly watches the source file
        of file-based datasources, updating them when the file is altered.
        Emails from "VRPipe Server" now have an address of the configured admin
        instead of a fake address.
        New sge_ec2 scheduler, providing SunGridEngine load balancing on
        Amazon EC2.
        New genome_studio_import_genotype_files pipeline.
        vrpipe-fileinfo can now be used to search for files that have certain
        metadata, and is also capable of reporting on files that were inputs
        to setups, not just setup output files.
        vrpipe-setup can now be used to change the output root of an existing
        setup (this does not move any files; it just determines where new
        output files would go).
0.157   Hotfix for the bamcheck step, so that it can cope with bams that have
        no mapped reads.
0.156   Critical fix for steps that generate hundreds of output files - now
        setups that use these steps will not stall with no submissions created.
0.155   Critical fixes and improvements for vrpipe-mv: in --setup mode it now
        finds all the files for a setup and warns you if it can't move them
        (because they are outside of the output_root). It also gains a new mode
        of operation that lets you easily move all VRPipe output files in one
        area to another, useful when moving the entire contents of a disc.
        vrpipe-permissions fixed so that it can change permissions on files that
        were moved or became apparently non-existent because a user without
        permission to see the files used VRPipe to try and look at them.
        vrpipe-fileinfo fixed so that it can more reliably show information such
        as --setup_info on all files, instead of just giving up and saying
        'unknown'.
0.154   New SGE scheduler for use with clusters running SunGridEngine.
        vrpipe-fileinfo --setup_info now outputs a vrpipe datasource compatible
        string of id[step_num:output_key] instead of duplicating the output of
        vrpipe-status. --display tab is also now the default display mode. It
        can now report what made a given --path even when multiple setups
        created it.
0.153   The local scheduler was overhauled and is now much faster and more
        reliable - it should actually work now. Using SQLite also works now,
        even in combination with the local scheduler. There is an experimental
        new ec2 scheduler for using VRPipe in Amazon's cloud (see README_EC2).
        Fixed edge case where Submissions with the exact same time requirements
        as the maximum amount of time allowed in the queue they were submitted
        to would never run.
        New bam_htscmd_genotype_checking and bam_lanelet_gt_check pipelines.
0.152   Critical fix for database connection exhaustion; critical fix for
        logging.
        Improved stall reduction. Improved database update consistency: fixed
        remaining reliability issues.
        vrpipe-fileinfo gains new options --dataelement and --input_file.
        vrpipe-elements --element option now takes a dataelement id, and gains a
        --elementstate option that takes a dataelementstate id.
0.151   Critical fix for Submission failures due to mysterious SIGINTs.
        Critical fix for database update inconsistencies: the pipeline system
        itself should no longer be the cause of pipelines failing or entering
        into strange states.
        Fixed vrpipe-status --deactivated option to work again.
        PipelineSetups will no longer remain stalled (not moving to the next
        step) for too long (or forever), and fewer stalls should happen.
        New vrpipe-logs script to investigate what happened to a setup.
        The trigger process was sped up, in some cases by orders of magnitude.
        archive_files step no longer re-archives files that are already in the
        archive pool.
        Further reduction to database connection exhaustion (though the problem
        remains).
0.150   Further fixes for database connection exhaustion. Fixed step limit
        system to work again.
0.149   Possible fix for database connection exhaustion.
0.148   Further improved database update consistency. Improved speed.
        vrpipe-status gains new --global_summary option.
0.147   Critical fix to avoid using up all database connections.
0.146   Improvement to database update consistency (should now avoid strange
        failures where input files got deleted).
        New bam_mapping_with_bwa_via_fastq_no_namesort pipeline to supersede
        the other bam_mapping_with_bwa* pipelines.
        Improved speed and efficiency for getting jobs that need to be run
        submitted to the scheduler.
0.145   Critical speed fix for slow DataSource updates. Fix for some reset
        situations. Improved database consistency. LocalScheduler now functions
        correctly (though the server itself is suffering from a crash issue on
        certain low-cpu machines). Be sure to read IMPORTANT_NOTES.
0.144   Corrected database schema version number.
0.143   Critical (partial) fix for database inconsistency issues that have led
        to random issues such as missing output files, jobs pending forever,
        submissions restarting forever etc.
        Scheduler job arrays are now used for better scheduling efficiency.
        sequence_index datasource gains new method sample_fastqs.
        vrpipe-file_info now shows resolved file paths by default, and can now
        show files that have been deleted.
        New PipelineSetupLog class that records all major events that occur for
        each PipelineSetup.
        vrpipe-elements can now be used to temporarily withdraw elements, and can
        also now show input paths and the output root.
        vrpipe-setup gains a 'touch' option to force the refresh of a
        datasource.
        vrpipe-output now describes all output files, not just those that still
        exist. You can also now --force_overwrite existing symlinks.
        vrpipe-submissions now reports the host that the jobs are running on.
        bam_to_fastq step reimplemented to use the bam2fastq exe.
0.142   Critical fix for File copy() and move() methods so that interrupted
        attempts are no longer considered successful on a retry. bam_metadata
        step now has a store_original_pg_chain option, which when turned off
        allows it to be used on bams produced by VRPipe in a single-step
        pipeline.
0.141   Critical fix for bug in PipelineSetup introduced in 0.140. Setup handler
        fixed to avoid unnecessary full triggers. Submission handlers can
        sometimes get stuck doing nothing when they are no longer needed; the
        server now kills them off where possible.
0.140   Worked around an issue that could result in only a few of a setup's
        incomplete dataelements being worked on. bamcheck parser updated for
        compatibility with latest version of bamcheck.
        The archive_files pipeline no longer advertises that it outputs any
        files, to avoid rare risk of file loss due to resets at the wrong
        moment.
        Altered many steps and pipelines so that the requirement of bam index
        files is properly known by the system, allowing automatic resets to
        work properly.
0.139   Fixes error introduced in 0.138. bam_index step now checks if the index
        already exists and skips actually indexing if it does.
0.138   Another fix for submissions that could pend forever.
0.137   Fixed some areas where database connections were left open for a long
        time idling, wasting them. Improved (and reduced extraneous) error
        messages when post-processing steps. Further fix for submissions that
        could pend forever.
0.136   Reduces database contention whilst selecting submissions to run, which
        should increase speed and help avoid running out of database
        connections.
0.135   Critical fix to avoid 0.133's fix from locking up the database for too
        long (and to unstick even more pending submissions).
0.134   Fixed bug that meant submissions were queued to run and reported on in
        vrpipe-status even if they were for dataelements that were withdrawn,
        which was very wasteful and confusing.
0.133   Critical fix for submissions that pended forever.
0.132   Emergency change in behaviour when the log file can not be locked:
        instead of emailing the admin, the log file is simply deleted. This is
        to prevent thousands of email messages being sent and also should allow
        subsequent log messages to be written to disc successfully.
0.131   Fixed VRPipe's internal confirmation of the existence of step output
        files so that it no longer restarts jobs if the output file had
        previously been moved elsewhere and deleted.
0.130   Critical fix for vrpipe datasources to prevent invalid dataelement
        generation.
0.129   VRPipe can now cope better if job stdout/err files are deleted by some
        external process. A linux-specific work-around has been implemented to
        get correct memory usage stats on systems where Proc::ProcessTable gives
        invalid results.
0.128   Critical speed improvement for pipelines with one or more "block and
        skip" steps. Improved step parsing error messages.
0.127   In concert with the latest version of bamcheck, the bam import and
        improvement pipelines can now cope with the use of bamcheck -f/F.
0.126   Fixed spurious error messages when a setup changes to having 0 elements.
        Improved error messages for missing input/output files. vrtrack_auto_qc
        step updated to work with latest bamcheck.
0.125   Fixed step limit handling that got broken in 0.124.
0.124   Critical fixes to try and lower database load and minimise wasted jobs
        submitted to the job scheduler.
0.123   New elements for a setup now get triggered soon after the datasource
        changes, instead of the next time the setup has 0 unfinished
        submissions, resulting in a large speed-up potential.
        Some critical database queries have potentially been optimised to
        hopefully avoid overloading the database when many jobs are running at
        once.
        Fixes for the bam_spatial_filter and rna_seq_map_gsnap pipelines.
0.122   Fixed repeated emailing of the same setup problem.
0.121   Potential fix for complex vrpipe datasources that fail to update as
        soon as they should. bam_spatial_filter pipeline fixes. Reduced chances
        of a setup not doing work when it should be doing work. Submission
        handlers that are no longer needed are now killed as soon as possible,
        providing a large efficiency improvement.
0.120   Potential fix to stop vrpipe-server silently stalling. Errors
        encountered during a Step post_process are now emailed out. New
        bam_spatial_filter pipeline.
0.119   Critical fix for changes in previous version. Critical fix for
        bam_split_by_region step, so it works with bams with arbitrary
        split_sequence metadata. vrpipe-status stall messaging improved
        slightly.
0.118   Critical fix to prevent steps that fail due to not creating any output
        files from being considered completed OK.
0.117   Fixes to gmap/gsnap step/pipeline.
0.116   Improved indication of possible stalled setups, and general fixes to
        vrpipe-server.
0.115   Critical fix for new datasources.
0.114   Critical fix for the setups handler: now setups actually get updated as
        their datasources change.
0.113   Improved memory reservation. vrtrack datasource now simplified by
        removing extraneous confusing options.
0.112   Critical bug fixes to 0.111, allowing it to work in production.
0.111   Radical overhaul to how VRPipe actually gets command lines run - should
        result in much greater speed and efficiency, with no more sporadic
        periods where nothing seems to be happening when there is work to be
        done. Be sure to read through IMPORTANT_NOTES if upgrading.
        New frontend scripts: vrpipe-create_step, vrpipe-create_pipeline and
        vrpipe-disk_usage.
        New VRPipe module for easy VRPipe perl 1-liners.
        Users are now emailed when their setups complete or run into problems.
0.110   Fixes to vrpipe-output to show input paths properly and allow # in
        output basenames.
0.109   Additional critical fixes for convex_plots and fastq_split steps.
0.108   Critical fixes for convex_plots and fastq_split steps.
0.107   New vrpipe-mv tool, useful for moving the output root directory of a
        setup to a new location. vrpipe-fileinfo now reports on files produced
        by dataelements that have not yet finished the pipeline.
        GATK2-specific steps now default to finding the GATK2 jar file in a
        GATK2 environment variable.
        New convex plot generation pipeline.
        Fixes for the grouping methods of some datasources, so that they do not
        create empty, useless dataelements.
0.106   Improved handling of paths entered on the command-line: leading and
        trailing whitespace is now stripped. New pipelines and steps for
        carrying out tasks with GATK v2. bwa_index step updated with support for
        the latest version of BWA. fastq_merge_and_index step speeded up.
        vrpipe-status gains a --defunct option to report on bad setups that
        are candidates for deletion.
        vrpipe-setup gains a --cleanup option to delete output files made for
        now withdrawn dataelements, recovering wasted disk space for completed
        projects.
0.105   Fix for vrpipe-status so that it no longer produces wild submission
        state numbers while reporting on a setup with a large number of
        currently changing submissions. Fix for the fastq_merge_and_index step
        so that it no longer uses up all database connections.
0.104   No changes to the code were made in this version. This release only
        corrects the upgrading instructions in IMPORTANT_NOTES for v0.103. If
        you already upgraded to 0.103, IMPORTANT_NOTES also contains advice on
        how to fix issues that may have arisen.
0.103   WARNING: if upgrading, follow the instructions in IMPORTANT_NOTES before
        installing this version.
        Changes were made to how DataElements store input file paths in the
        database, now allowing a virtually unlimited number of paths, needed
        for some kinds of Step. This also fixes cases of extraneous DataElement
        withdrawal and needless repetition of work.
        A fix was made to make VRPipe fully functional on recent stock Ubuntu
        installs.
        A fix was made so that Jobs get properly killed when necessary.
        vrpipe-fileinfo and vrpipe-output gained a --include_withdrawn option.
        General improvements were made to SNP calling pipelines.
0.102   SGA-related improvements, including the ability to create and call on
        bam chunks.
0.101   vrtrack datasource fixed so that it does not create elements in
        group_by_metadata mode if any member of the group has no files.
0.100   New auto_qc_min_ins_to_del_ratio option for AutoQC pipeline. When Job
        stdout/err is archived, it is now limited to first and last 500 lines
        to avoid storing massive files. See IMPORTANT_NOTES if you want to clean
        up any large files you've already created.
0.99    Critical fix for memory leaks and slow-down problems when dealing with
        pipeline steps that have 10s of input files and produce 100s of
        duplicate submissions - like the new SNP calling pipeline(s). See
        IMPORTANT_NOTES if upgrading.
0.98    Fix for fastqc_quality_report and cufflinks steps to make them work in
        production.
0.97    Fix for vrpipe group_all method to wait until all elements are complete.
        Fix for the chunking DataSource variants, so their methods are shown
        when running vrpipe-setup. Fix for the retroseq_call step, for
        compatibility with the latest version of retroseq.
0.96    Fix for Requirements, allowing reservation of over 999 hrs. See
        IMPORTANT_NOTES if upgrading. The use of environment variables has
        been cleaned up and clarified; see the updated README.
0.95    This version overhauls a number of SNP-calling-related pipelines and
        steps. If upgrading, be sure to read IMPORTANT_NOTES. The changes are
        primarily concerned with moving the choice to do 'chunked' calling (as
        opposed to calling across the whole genome at once) to the DataSource,
        instead of having 2 separate pipelines, one for whole-genome, one for
        genome chunks. The change improves efficiency and makes it easier to
        restart/trouble-shoot failed chunks.
        Deleted pipelines:
            gatk_genotype (renamed snp_calling_gatk_unified_genotyper)
            gatk_variant_calling_and_filter_vcf (renamed snp_calling_gatk_unified_genotyper_and_filter_vcf)
            mpileup_with_leftaln
            snp_calling_chunked_mpileup_bcf (replaced by snp_calling_mpileup_via_bcf.pm + genome chunking)
            snp_calling_chunked_mpileup_vcf (replaced by snp_calling_mpileup.pm + genome chunking)
            snp_calling_gatk_vcf (replaced by snp_calling_gatk_unified_genotyper_and_filter_vcf + vqsr_for_snps)
            snp_calling_mpileup_vcf (renamed snp_calling_mpileup)
            snp_calling_mpileup_bcf (renamed snp_calling_mpileup_via_bcf)
            vcf_chunked_vep_annotate (renamed vcf_split_and_vep_annotate)
        Modified pipelines:
            vcf_filter_merge_and_vep_annotate (vcf_index step added throughout)
            vcf_vep_annotate (vcf_index step added at the end)
        New pipelines:
            vcf_concat.pm
            merge_vcfs_to_site_list_and_recall_from_bcf
            snp_calling_mpileup
            snp_calling_gatk_unified_genotyper
            snp_calling_mpileup_from_bcf
            snp_calling_mpileup_via_bcf
            vcf_split_and_vep_annotate
            vqsr_for_snps
            snp_calling_gatk_unified_genotyper_and_filter_vcf
0.94    Critical fix for getting interface_port when it is set as an environment
        variable. See also the recent 0.93 changes.
0.93    This version features the beginning of a more radical overhaul,
        introducing vrpipe-server, which is a trivial-memory, trivial-cpu
        daemon process that will eventually run the whole system, discovering
        and dispatching jobs. In this version it only serves the frontends (both
        cmd-line and web, and currently only for vrpipe-status) and runs the
        local scheduler (if you have that configured). vrpipe-server will be
        started automatically when needed. When it starts it gives you the
        website address you can visit. This is actually faster than using the
        equivalent cmd-line tool. Please see IMPORTANT_NOTES if upgrading.
        
        The fofn, fofn_with_metadata and vrpipe DataSources now have a new
        method 'group_all', which is useful for 'merge' type pipelines.
0.92    vrpipe-setup script can now --reset or --delete existing
        PipelineSetups, to completely wipe out all progress on them. Fixed
        critical bug in the sequence_index DataSource.
0.91    When VRPipe counts the number of records in a bam, it now uses the
        samtools executable in the $SAMTOOLS directory, not the first one in the
        $PATH.
0.90    Upgraded irods step to make it compatible with latest version of
        ichksum (critical fix for any irods-related pipeline).
0.89    Fixed critical bug in Submission reserved memory calculation that could
        prevent certain pipelines from proceeding when a step ran out of memory.
0.88    bam_to_fastq step can now take an option to allow it to not care about
        keeping the forward and reverse fastq files "in-sync", which helps with
        the SGA-related steps.
0.87    vrpipe-db_upgrade no longer takes --from and --to options, but instead
        will correctly upgrade the database from its current version to the
        latest, avoiding user-error. Bug-fix for vrpipe-elements so that -f
        works again.
0.86    Minor bug fix for getting file md5s. When the system has created
        symlinks (eg. vrpipe-output was used) and the source file of a symlink
        is moved, all symlinks are automatically corrected. The LSF scheduler
        now implements the cpus requirement. SGA steps bug fixed and now have
        support for gzipped fastq files.
0.85    bam_add_readgroup step now allows you to choose what metadata key SM is
        set from, eg. you could have it come from individual instead of sample.
0.84    Critical fix for the VRTrack datasource, so that it copes with metadata
        changes better.
0.83    Further critical fixes to the new PerlTidy module. All Perl code has now
        been tidied.
0.82    Critical fix to the new PerlTidy module so that it does not break
        classes when tidying. Critical fix for VRPipe::File to revert to
        previous behaviour of keeping file metadata even after file deletion.
0.81    This version features major changes to how the underlying system
        interacts with the database, which results in greatly improved speed
        (orders of magnitude in certain critical areas) and bounded memory
        usage. End-users of the front-end scripts like vrpipe-status are not
        really affected by the changes, but developers who have written their
        own VRPipe scripts, Steps or Pipelines should be aware of the following:
        
        Persistent methods (see updated POD of VRPipe::Persistent for details):
        create() replaces what get() used to do: get an instance of a Persistent
        object from the database, creating or updating it if necessary. get()
        has been changed to only retrieve and update - it no longer creates but
        throws if the row isn't already in the database. get() should still be
        used whenever possible, especially in end-user scripts.
        Persistent instances no longer access the database every time you
        call one of their methods to retrieve a column value. This means if you
        get an instance, then change column values in a different process, your
        instance will have out-of-date values. You can use new method
        reselect_values_from_db() to update your instance.
        There are new methods search(), get_column_values() (also with
        *_paged() variants) and search_rs() for fast retrieval of many rows.
        New method bulk_create_or_update() lets you create many rows quickly.
        New method dump() can be used when debugging, letting you Dumper a
        Persistent instance without outputting tons of irrelevant information to
        the screen.
        New method do_transaction() can be used for doing a series of operations
        in a transaction.
        
        DataSource authors:
        Source methods are now called safely: it is guaranteed that only a
        single process calls them at any one time. They no longer directly
        create or return DataElements themselves, but should call the
        _create_elements() method instead.
        
        If you wish to contribute code with a pull request, please read the new
        DEVELOPERS file. It explains details of how to use our custom perltidy
        setup.
0.80    New bam_improvement_no_recal pipeline. New SGA-related pipelines. The
        vrtrack_auto_qc step now stores test results in the new VRTrack AutoQC
        table (instead of in a text file), and so requires VRTrack schema 20,
        which is found in the vr-codebase git repository version 0.04 or higher.
0.79    Another critical bug fix for dcc_metadata step. Bug fix for bam_reheader
        step. vrtrack_auto_qc step now has an extra metric.
0.78    Critical bug fix for dcc_metadata step. Bug fix for when a step input
        is a symlink and the pointed-to file has been deleted without VRPipe's
        knowledge.
0.77    When using SQLite as the database it may now lock up less, though the
        local scheduler remains incompatible with it. Fixes to metadata stored
        when merging bams, important for pipelines using bam_reheader step.
0.76    Another schema upgrade to add a missing index. See IMPORTANT_NOTES if
        upgrading.
0.75    Schema upgrade to provide better database indexes. See IMPORTANT_NOTES
        if upgrading.
0.74    Schema upgrade to allow the stats of multi-week-long running steps to
        be stored.
0.73    (re-)Added support for sqlite, though it is only really suitable for
        parsing as it may lock up if used for running pipelines. Fixed bug in
        LSF stdout parser.
0.72    Small fixes/improvements to steps fastq_split, bin2hapmap_sites and
        bam_name_sort.
0.71    New bam_improvement_and_update_vrtrack_no_recal pipeline.
0.70    New improvement pipeline that works with older versions of GATK. When
        the QC step updates the VRTrack database, it no longer overwrites a
        manually applied qc_status.
0.69    New vrtrack_qc_graphs_and_auto_qc pipeline, suitable for rerunning QC
        on already imported or improved bams. Copyright and license information
        is now present on all source code files.
0.68    Critical fix for sequence_index datasource, so that it does not reset
        elements just because their center_name changed case.
0.67    New single-step bam indexing pipeline. New Conifer pipeline. New
        retroseq pipeline. Improved breakdancer pipeline. The vrpipe datasource
        now has an option to filter after grouping. The vrtrack_auto_qc pipeline
        now always fails a lane if the NPG status was failed. When a submission
        fails and is retried, the stdout/err of previous attempts is now
        accessible, eg. with vrpipe-submissions.
0.66    vrpipe-setup can now be used to change pipeline behaviours.
0.65    New breakdancer pipeline, single-step bam splitting pipeline, and the
        vrpipe datasource now applies the filter after grouping, requiring only
        1 file in the group to match the filter.
0.64    Fix for bam_reheader, affecting 1000 genomes pipelines.
0.63    Fix for rare bug in fastq_split which prevented it from working with
        certain input.
0.62    Critical fix for new queue switching code.
0.61    Now, if a job is running in a time-limited queue, and the limit is
        approaching, the job will be switched to a queue with a longer time
        limit.
0.60    Fix for vrpipe-setup to make it compatible with the new vrtrack_auto_qc
        pipeline.
0.59    New vrtrack_auto_qc pipeline. New (alternate) SNP pipeline. New
        vrpipe-permissions script.
0.58    Various fixes to enable initial install and testing for new users using
        latest CPAN modules.
0.57    vrpipe-fileinfo can now tell you how a file was generated.
0.56    New gatk_variant_calling_and_filter_vcf pipeline.
0.55    Further merge pipeline fixes. New bam realignment around discovered
        indels pipeline.
0.54    Further fix for new merge pipeline.
0.53    Fixed issues with bam merging pipelines, and renamed them all.
0.52    New fofn_with_metadata DataSource - useful for inputting external bams
        into pipelines. VRTrack-related steps now have
        deadlock-failure-avoidance.
0.51    VRTrack DataSource now has an option to group_by_metadata.
0.50    New merge_bams pipelines, to do "merge across". VRTrack datasource now
        allows filtering on more status types, and can get VRPipe improved bams.
0.49    Critical bug fix in bam_to_fastq step.
0.48    Tweaks and fixes to finalise new bam_genotype_checking pipeline.
0.47    Minor tweaks to finalise yet-unused pipelines.
0.46    New versions of merge lanes and stampy mapping pipelines with extra
        features.
0.45    Critical speed fix for vrtrack datasource. Library merge pipelines now
        index the resulting bams.
0.44    Fix for plot_bamcheck step, letting it work when there is no insert
        size.
0.43    Efficiency fix for vrtrack datasource.
0.42    Critical fix for vrtrack datasource, so that it now updates file
        metadata when vrtrack metadata changes.
0.41    vrtrack_update_improved step now sets lane reads and bases.
0.40    Critical fix for vrtrack_update_mapstats step, letting it work without
        exome_targets_file.
0.39    vrpipe DataSource behaviour changed, so that a child pipeline that
        deletes inputs won't mess up a parent that still needs those files.
        Overhauled the genotype checking pipeline and steps.
0.38    Fix for gatk_target_interval_creator step, increasing its default memory
        reservation.
0.37    Overhaul of qc graphs & stats-related steps and pipelines so that now
        wgs and exome projects all use the same pipeline, with a single bamcheck
        call. bam_to_fastq step fixed so that it runs in constant <500MB and
        copes with bams that miss reads.
0.36    Critical fixes to the underlying system to ensure job submission doesn't
        stall out forever, to handle limits on steps better, and to avoid issues
        when there are multiple submissions for the same job. Also a fix for
        java to increase likelihood of jvm starting up.
0.35    vrpipe-status script improved to give a better overview of what the
        pipeline is doing, with warnings about pipeline stalls. bam_to_fastq
        step reimplemented, should now be much better.
0.34    Critical speed fix for the VRTrack datasource. Fixes for the
        archive_files pipeline and the vrtrack_update_mapstats step.
0.33    Optimised bam_import_from_irods_and_vrtrack_qc_wgs pipeline. Memory and
        time reserved for jobs is now less likely to be insufficient.
0.32    Fixes for bam_mapping_with_bwa_via_fastq and bam_reheader step.
        Efficiency improvement in how step max_simultaneous is handled.
0.31    Database independence now properly implemented. New separate bam
        improvement pipeline, remapping bams via fastq pipeline, and some
        Sanger-specific pipelines added.
0.30    Fixes related to archive_files pipeline.
0.29    New archive_files pipeline.
0.28    Really fix java-using steps so they get the memory they need.
0.27    Outputs of near-identical PipelineSetups will now never risk overwriting
        themselves. Java-using steps get better recommended memory. New
        IMPORTANT_NOTES file - you must read this!
0.26    Critical performance fix for StepStats.
0.25    New StepStats system for quick access to memory/time used stats.
0.24    Critical fix for mapping pipeline.
0.23    New Stampy mapping pipeline. Fixes for SNP and DCC pipelines.
0.22    Critical fix for input files that are relative symlinks.
0.21    SNP discovery pipeline(s) now firming up; fixes for merging pipelines.
0.20    Improved handling of limits, so that a good amount of jobs are always
        running.
0.19    Various fixes to 1000 genomes-related pipelines.
0.18    Fix to allow sqlite to be used in production.
0.17    Install process for new external users should now work/be easy.
0.16    New merging pipelines and associated vrpipe datasource (for chaining
        different pipelines together). Critical bug fixes that allow changes in
        datasources to trigger restarts for the changed elements.
0.15    Front-end for creating PipelineSetups; improvements to smalt mapping so
        we can map 454 data in 1000 genomes.
0.14    More front-end scripts added. Sequence index datasource now starts
        changed elements over from scratch, so we can now change the source file
        safely.
0.13    Various fixes for pipelines. Memory leak issues fixed. Various front-end
        scripts added.
0.12    Fixes for bam_mapping_with_bwa. New VCF annotation-related steps and
        pipelines. Triggering pipelines in Manager has been optimised slightly.
0.11    Fixes for bam_mapping_with_bwa. New smalt mapping pipeline for handling
        454 sequence data.
0.10    Bam Improvement steps now fully implemented. New bam_mapping_with_bwa
        pipeline.
0.09    Scheduler independence: local can now be used for testing.
0.08    Submission retries now add time where necessary.
0.07    Fixed critical bug in mapping pipeline; should now work properly.
0.06    Myriad performance and stability improvements necessary to get the
        mapping pipeline running smoothly.
0.05    Critical performance fix for dealing with large datasources.
0.04    Critical performance fix for checking bam file type.
0.03    0.02 only worked on test dataset; this should be the first version to
        work on real data, following important schema changes and Step fixes.
0.02    Most interesting features not yet implemented, but this is the first
        working version, needed to do the 1000genomes phase2 (re)mapping.
0.01    No real files; just starting up repository.