-
Notifications
You must be signed in to change notification settings - Fork 6
/
Copy pathChanges
365 lines (280 loc) · 16.3 KB
/
Changes
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
Revision history for Transposome
Version Date Location
0.12.1 11/08/17 Vancouver, BC
- Fix for #40 with reads not being found in cluster merging step. This was due to how different
Casava versions are handled. The original routine was correct but an issue lingered due to
this not always being invoked.
- The above fixes an secondary issue that was causing the annotated repeat fraction to be reported
as being lower than in reality, though this has been resolved.
- Add YES/NO boolean for setting in-memory analysis in configuration. This should be more intuitive
than the 1/0 form (which will still work).
- Remove the 'each' hash loop syntax from all classes. This is known to cause some issues, especially
when adding/removing keys so we changed all methods to use 'keys' instead.
- Label repeat summaries in log as either theoretical or biological if they are from just clustering or
clustering + annotated repeats, respectively.
0.12.0 10/31/17 Vancouver, BC
- Refactor the complete analysis procedure to run individual analysis steps sequentially.
- Add Transposome::Analysis::Pipeline class for controlling analysis steps rather than having
the code in one script.
- Add Transposome::Log class for getting/setting logging properties. This was previously all
rolled into one script and now the methods and code are shared across numerous classes.
- Add execution scripts for running analysis steps instead of controlling the program flow from
a single command script. This greatly reduces the memory usage of the full analysis, it improves
the efficiency for large data sets, and delivers the same biological results.
- Remove redundant test script ('15-transposome_app.t') for running the full pipeline. The individual
steps were already tested thoroughly so this set of tests became completely redundant. The full
pipeline is also tested in the '14-analysis_steps.t' script.
- Fix bug with Illumina sequence format not being parsed correctly when the command line analysis
was done.
0.11.4 10/31/17 Vancouver, BC
- Update typemap to include 'Unknown_TIR' type.
0.11.3 08/09/17 Vancouver, BC
- Change family mapping procedure in Transposome::Annotation::Mapping class to be a single abstract
regex insted of trying to conditionally match names for each transposon type. This method captures
far more annotated families, so it is more reflective of the TE diversity in a sample.
- Add 'TRIM' and 'LARD' LTR retrotransposon types to type map in Transposome::Annotation::Typemap
class.
- Reverse sort annotation summary file to show largest families (by genome abundance) first.
This was the old behaviour but it appears to have been reverted to an ascending sort.
0.11.2 06/05/17 Vancouver, BC
- Bug fix for transforming sequence IDs that appeared to be Illumina but are not (fixes #37).
- Refactor SeqIO superclass to have ID setting method inherited by subclasses instead of repeating
the code in each subclass.
- Refactor SeqFactory class to take a hash of options/arguments instead of repeating code to set
class attributes based on the sequence format.
- Add 'seqtype' class attribute so that Illumina data can be safely transformed into a
single ID with no spaces to work better with Transposome and external tools like BLAST.
0.11.1 05/31/17 Vancouver, BC
- Add method to return the common superfamily name based on the commonly used 3-letter
codes for transposons.
- Add tests for annotation methods that do mapping of family and superfamily
names from IDs.
0.11.0 04/01/17 Vancouver, BC
- Add methods to annotate 'autonomous replication sequence' types in wheat,
fruit fly, and other species.
- Modify annotation summary methods to include sequence types other than
transposable elements, such as: pseudogenes, integrated virus, and autonomous
replication sequences.
- Change summary annotation format to incorporate sequence types other than
transposons. We added a column for the 'Order' since the 'Superfamily' and 'Family' may not be
applicable or defined.
0.10.1 03/07/17 Vancouver, BC
- Bug fix for directory structure of archived cluster FASTA files and
annotations. The results were being extracted to the results directory
under a directory of the same name, so the results were in a useless
subdirectory. Now the results are simply archived/compressed as is.
0.10.0 08/31/16 Vancouver, BC
- Reduce fields in summary annotation file, and expand column headdings,
to simply interpretation of the results.
0.09.9 03/10/16 Vancouver, BC
- Add handler for exceptions with BLAST threads so all children exit correctly.
- Fix variable naming issue with logging in Transposome::Run::Blast class.
0.09.8 08/06/15 Vancouver, BC
- Allow program to run if not under some shell.
- Do not rely on environment variables for finding Perl and building package
or running tests.
0.09.7 07/10/15 Vancouver, BC
- Fix type constraint for allowing numbers to be passed in with underscores.
0.09.6 04/09/15 Vancouver, BC
- Report cluster number after clustering process so it is easy to run the standalone
annotation step.
- Fix empty clusters (i.e, singletons) being written to the cluster file.
- Remove use of DBM-Deep, which was contributing to two bugs (a memory leak and an issue
with creating large DBM files).
- Rework methods in PairFinder and Cluster classes to be more memory
efficient. This includes using a new indexing method with SQLite. A number of methods now use
autovivication to reduce the number of hash lookups and simplify the code.
- Fix type constaints on the format of numbers accepted by the Transposome::Run::Blast class.
- Rework tests for clustering methods to better handle behavior of indexing vs in-memory analysis.
0.09.5 04/09/15 Vancouver, BC
- Fix bug in annotation summary file not reporting family correctly (top hit for each
cluster was being reported).
0.09.4 04/08/15 Vancouver, BC
- Fix bug in logging some methods twice.
- Fix bug in reporting incorrect number of total reads to annotation summary (introduced in v0.09.3).
- Fix bug in configuration tests with running the same tests unintentionally.
- Remove cluster membership file after clustering because it is really of no
use since the clusters and FASTA files are provided.
- Remove merge_threshold option in configuration to simplify usage.
0.09.3 03/30/15 Vancouver, BC
- Streamline logging methods by not using a MooseX role (now using Log::Any).
This also allows users to plug in any logger they want for their own
applications and it doesn't require constant checking if Log4perl is initialized.
- Remove Method::Signatures and type checking on method calls. This module is very
fast and adds readability to the code, but it has almost as many deps as Moose.
Type checking has some performance penalty, so the method calls should be faster
now.
0.09.2 03/17/15 Vancouver, BC
- Rework annotation methods to not walk the entire repeat taxonomy. The methods
are now in four separate roles and are consumed in one interface role.
- The summary and annotation to family mapping methods are more accurate now by
considering all TE families, and are more efficient (3X faster) by not applying
redundant methods to the summary calculations.
- Bug fix for some versions of BLAST joining sequence identifiers, which causes
the annotation mapping step to not work properly.
0.09.1 03/06/15 Vancouver, BC
- Remove 'sequence_num', 'cpu' and 'blast_evalue' options from the configuration
file to simplify usage (this addresses #18). Many tests were added (~30) to ensure
backwards compatibility with previous configuration files, so nothing should break
with these changes.
.
0.09.0 03/04/15 Vancouver, BC
- Modify annotation and cluster methods to take and return a hash of arguments
instead of long lists of arguments (#15). This change breaks the API, indicated by
the major version change (instead of minor version change).
0.08.7 02/23/15 Vancouver, BC
- Add check if any clusters were produced before doing the annotation step.
- Add separate command line options for each stage of the analysis. The analysis options
are: all, blast, findpairs, cluster, and annotation.
- Change the name of the most important field in the annotation summary file (the sixth column)
from "GenomePerc" to "GenomeFrac" to reflect that this is a fraction, not a percent.
This is correctly reported in the log but the column name in the summary file header is misleading.
0.08.6 02/12/15 Vancouver, BC
- Remove sequence index file when analysis is not done in memory.
- Write index file in the results directory unless the method is called from a script
or the CLI, in which case the index file is written to the working directory.
- Add more information to the error message when no clusters were found. Specifically,
it now states that the issue results from using too few sequences.
- Minor bug fix in annotation class to not return by next.
0.08.5 02/09/15 Vancouver, BC
- Minor fix in log format (remove extra colons in displaying configuration details).
- Index files for calculating matches on disk are written to the results directory
instead of working directory.
0.08.4 02/07/15 Vancouver, BC
- Remove intermediate files from the CL application but not from method calls.
This reduces the output by removing the clustering and BLAST files.
- Compress cluster and annotation directories.
0.08.3 12/24/14 Vancouver, BC
- Rework SeqIO class to follow a factory pattern. Now, sequence I/O is done
through the Transposome::SeqFactory class.
- Add separate FASTA and FASTQ subclasses of the SeqIO class, which are returned
from the Transposome::SeqFactory class. The new methods do not involve seeking back
on a filehandle, so now it is possible to read from STDIN. Overall, there was a more
than 3X speed improvement for both FASTA and FASTQ reading, though this does not
account for format validation.
- Add support for reading sequence data from compressed files, STDIN (a pipe),
or from a file handle.
- Add tests for command line application and increase test coverage of methods. In total,
220+ tests have been added to this version, including 3 new test files.
0.08.2 12/08/14 Vancouver, BC
- Fix bug with variables for blast paths not being defined when
searching path.
- Add comments to example configuration, and provide real values for all slots instead
of test values that may cause confusion (and lead to issues if not changed).
0.08.1 12/08/14 Vancouver, BC
- Fix a bug with Transposome::Run::Blast class not being
able to find executables.
- Add results section to log so results can be easily grepped
at the command line.
0.08.0 12/07/14 Vancouver, BC
- Rework annotation method by removing use of 'each' operator, which
fixes a bug determining TE family name.
- Lower version requirement to 5.10 from 5.12, because of the above change.
- Improve installation instructions in the README file by putting deps in one command
and removing the need to install cpanminus. Also, improve wiki documentation for
installation and configuration of the package.
- Rename main test class from "TestUtils" to "TestFixture" because that is a more appropriate name.
When used with the aliased module in tests, it is now very clear what "TestFixture->new()"
is doing, which is setting up test data.
0.07.9 11/26/2014 Vancouver, BC
- Rework test framework to be under the Transposome namespace. This
will make maintenance and adding new features/tests much simpler.
This also avoids hardcoding paths in test files or manually add the
include paths.
- Add mgblast and formatdb executables for Linux to be installed, which will keep
users (at least on Linux) from having to compile this by hand.
- Remove use of MooseX roles for handling options in favor of core Getopt::Long.
- Reduce verbosity in log by not logging output of each blast process. Clean up
log formatting for readability.
- Rework tests to use local bin, so tests will pass without modifying the PATH.
- Add configuration details to log.
- Update documentation for all options to main application.
0.07.8 11/05/2014 Vancouver, BC
- Remove use of BerkeleyDB for indexing methods and use SQLite
instead. This makes installation easier as the BerkeleyDB module
required C libraries installed at the system level. Also, this change
adds a major improvement in performance for large data sets.
- Fix dependencies listed in Makefile (YAML was listed, though it is
no longer used).
- Add binary for mgblast for CI testing. This solves two main issues, 1)
we don't have to rely on remote servers for getting deps, and 2) the
build process is about 10X faster now. The latter point means that
adding/removing features should be painless now, instead of waiting a
long time to get test results.
0.07.7 9/25/2014 Vancouver, BC
- Update deps to include stable version of MooseX-Getopt-Usage
on CPAN.
0.07.6 9/23/2014 Vancouver, BC
- Add tied hash indexing method to Pairfinder class. This allowed
the code to be reduced by almost half as compared to using a tied
object for indexing.
- Add developmental release of MooseX-Getopt-Usage which fixes a
bug for Moose 2.014 (because this fix involves some trickery requiring
cpanm, no release will be made for this version).
- Add import for main_community.cpp file that will allow compilation on
Debian machines without editing the code.
0.07.5 8/19/2014 Vancouver, BC
- Add support for FASTQ format in main application. Prior versions
would only handle FASTA for doing the self comparison, though the
*SeqIO class would handle FASTQ.
0.07.4 8/15/2014 Vancouver, BC
- Modify graph-naming scheme (doesn't impact performance).
- Add DOI to make repo citable, until the manuscript is
published.
- Check if all vs. all blast was successful, otherwise
exit from application or check if input is empty when
constructing Transposome::Pairfinder object (so both
application and API methods should handle this case).
0.07.3 6/03/2014 Athens, GA
- Add method to evaluate if classes in main application can
be loaded.
0.07.2 5/27/2014 Athens, GA
- Complete redesign of main application for faster loading.
- Remove string eval of version causing issues on some machines.
0.07.1 5/19/2014 Athens, GA
- Fix typo in logging clustering results.
- Delete commented out types in SeqUtil class since tests are passing.
0.07 5/04/2014 Athens, GA
- Add method for incorporating singleton repeat sequences into
annotation summary (changes API).
- Add explicit check if repeat database and sequence file exist.
- Move custom types to separate types library.
- Add tests for missing files that are required.
0.06 4/25/2014 Athens, GA
- Change from using YAML to YAML::Tiny for parsing configuration.
This gives faster loading with less memory usage.
- Rename methods for parsing/getting configuration so there is no
ambiguity about which class methods are being called.
- Add option to get version.
- Modify option handling to generate usage faster, and
not try to parse the configuration by default.
- Bug fix for calculationg repeat percentage of singletons.
- Clean up file creation methods for all test.
0.05 4/17/2014 Athens, GA
- Rework CI build system by isolating build and tests, which
are now working as expected.
- Update package names in dependency list so there are no warnings
during install.
- Changed logic for checking dependencies when testing so required programs
can be in any custom location.
- Changed locations searched for clustering code so any type of installation
should work now. In particular, using local::lib or a custom lib path should
work just the same as a default installation.
0.04 4/08/2014 Athens, GA
- Add method for writing FASTA file of singleton sequences (changes API).
- Add method for annotating and logging repeat content
of singleton sequences (changes API).
- Add BerkeleyDB indexing method for finding pairs.
- Add portable methods for working with files and paths.
0.03 12/19/2014 Athens, GA
- Add CI support and include instructions for installing deps.
- Modified method for setting path to blast executables so there
is nothing custom about the install.
0.02 12/02/2013 Athens, GA
- Add method signatures (via Method::Signatures).
- Bug fixes and code clean up for annotation summary method.
- Rework next_seq method in SeqIO class to build fh instead of
taking one as an option. This is much more familiar to use, though
considerably slower according to my benchmarks.
0.01 10/01/2013 Athens, GA
Initial release.