Releases: th2-net/th2-data-services
v2.0.0
User impact and migration instructions
Installing the package no longer installs the RDP package.
If you want to use RDP, you have to specify it as a dependency in square brackets [ ].
- [I] The adapter interface got a required handle_stream method.
  [M] Implement the new method in your adapters.
- [I] It's no longer possible to import the Data object directly from the
  th2_data_services package.
  [M] All imports should be changed from "from th2_data_services import Data"
  to "from th2_data_services.data import Data".
- [I] The Provider module is removed.
  [M] Use data source implementations instead, such as th2-ds-source-lwdp.
- [I] INTERACTIVE_MODE can no longer be accessed as
  th2_data_services.INTERACTIVE_MODE.
  [M] Use th2_data_services.config.options.INTERACTIVE_MODE instead.
- [I] EventsTree is renamed to EventTree.
  [M] Change all references to EventTree.
- [I] The message utils method expand_message moved into MessageFieldResolver.
  [M] Implement the new method in your resolver.
- [I] Data iteration logic is changed.
  Why? The previous behavior caused problems in some cases, e.g. when we don't
  want to iterate objects inside the DataSet.
  [I.1] Lists and tuples used to build Data objects are treated as single
  items, and the items inside them aren't iterated anymore.
  [M.1] Update Data objects initialized with lists or tuples.
  [I.2] The change in iteration logic also changed how the map function
  behaves. If the map function returns lists or tuples, their content won't be
  iterated anymore.
  [M.2.1] If you need the previous map behavior, just change map to map_yield.
  [M.2.2] Change data.map(mfr.expand_message) to
  data.map_yield(mfr.expand_message).
  [I.3] A Data object will not iterate over the contents of the items in its
  stream, even if those items are iterables.
  It means that a Data object will not iterate lists and tuples inside the
  provided DataSet and will return them as is.
  The only exception is when all of the items are Data objects themselves.
  [M.3] Change nested lists in Data object initializations to Data objects,
  or switch to using the addition operator.
  [Examples]
  d1 = Data(['a', 'b'])
  a. Data([1, 2, [3, 4], d1]) will yield 1, 2, [3, 4], d1.
     Previous behavior: 1, 2, 3, 4, 'a', 'b'.
  b. Data([d1, d2]), where d1 and d2 are Data objects, will yield from d1 and
     after that yield from d2.
  c. You can update example a to Data([1, 2, 3, 4]) or to
     new_data = Data([1, 2]) + Data([3, 4]) + d1.
  d. You can also restore the previous behavior as follows:
     new_data = Data([1, 2, [3, 4], d1]).map_yield(lambda r: r)
- [I] The new version of the orjson lib requires Python 3.8+.
  [M] If you use Python 3.7, upgrade to Python 3.8+.
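The Data iteration rules above (I.1 to I.3, M.2) can be modelled in a few lines of plain Python. The sketch below uses a hypothetical MiniData class, not th2_data_services.data.Data: it materializes items eagerly for simplicity, and its map_yield only unpacks list and tuple results, as the TH2-5081 feature note describes. It reproduces examples a, b and the addition form of c.

```python
from typing import Any, Callable, Iterable, Iterator, List


class MiniData:
    """Hypothetical stand-in for Data that models only the v2.0.0 iteration rules."""

    def __init__(self, source: Iterable[Any]) -> None:
        # Materialized eagerly for simplicity; this is not how Data streams.
        self._items: List[Any] = list(source)

    def __iter__(self) -> Iterator[Any]:
        if self._items and all(isinstance(i, MiniData) for i in self._items):
            # The only exception: a stream made purely of Data objects is chained.
            for data in self._items:
                yield from data
        else:
            # Everything else (lists, tuples, Data objects mixed with other
            # items) is yielded as a single item and not iterated into.
            yield from self._items

    def __add__(self, other: "MiniData") -> "MiniData":
        # Addition builds a stream that contains only Data objects, so it is chained.
        return MiniData([self, other])

    def map(self, func: Callable[[Any], Any]) -> "MiniData":
        # New behavior: a list or tuple returned by func stays one record.
        return MiniData([func(record) for record in self])

    def map_yield(self, func: Callable[[Any], Any]) -> "MiniData":
        # Old-style behavior: list and tuple results are unpacked into records.
        out: List[Any] = []
        for record in self:
            result = func(record)
            if isinstance(result, (list, tuple)):
                out.extend(result)
            else:
                out.append(result)
        return MiniData(out)


d1 = MiniData(["a", "b"])
print(list(MiniData([1, 2, [3, 4], d1]))[:3])              # [1, 2, [3, 4]]
print(list(MiniData([1, 2]) + MiniData([3, 4]) + d1))      # [1, 2, 3, 4, 'a', 'b']
print(list(MiniData([1, 2]).map(lambda r: [r, r])))        # [[1, 1], [2, 2]]
print(list(MiniData([1, 2]).map_yield(lambda r: [r, r])))  # [1, 1, 2, 2]
```

The contrast between the last two lines is the reason for M.2.2: a function such as expand_message that returns a list now produces one list per record under map, and map_yield is what spreads those lists out again.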
Features
- [TH2-4128] pip no longer installs RDP by default.
- [TH2-4128][TH2-4738] Extra dependencies can be installed using square brackets after
  the package name.
  - Example: pip install th2-data-services[lwdp]
  - Available data source implementations:
    dependency name | provider version |
    ---|---|
    lwdp | latest version of lwdp |
    lwdp2 | latest version of lwdp v2 |
    lwdp3 | latest version of lwdp v3 |
    utils-rpt-viewer | latest version of utils-rpt-viewer |
    utils-rpt-viewer5 | latest version of utils-rpt-viewer v5 |
    utils-advanced | latest version of ds-utils |
- [TH2-4493] Adapter interface got handle_stream method.
- [TH2-4490] Added map_stream method to Data.
  - Almost the same as map, except it's designed to handle a stream of data
    rather than a single record.
  - The method accepts a generator function or a class which implements
    IStreamAdapter with a generator function.
- [TH2-4582] IAdapter interface removed.
  - IStreamAdapter interface added to handle streams.
  - IRecordAdapter interface added to handle single records.
  - The method accepts a generator function or an IStreamAdapter interface
    class with a generator function.
- [TH2-4609] Data.filter implementation changed to use yield.
- [TH2-4491] metadata attribute added to Data. It will contain request URLs.
- [TH2-4577] The map method can now take either a Callable or an Adapter which
  implements IRecordAdapter.
- [TH2-4611] DatetimeConverter and ProtobufTimestampConverter converters added.
- [TH2-4646]
  - metadata is carried over when using Data methods.
  - update_metadata method added to update metadata.
- [TH2-4684] Tree names changed from plural to singular (e.g. EventsTree ->
  EventTree).
- [TH2-4693] Implemented namespace packages structure, allowing other th2
  libraries to be grouped together.
- [TH2-4713] Added options module which enables users to tweak library
  settings.
- DummyDataSource added.
- [TH2-4881] Data.from_json method was added.
- [TH2-4919] Data.from_any_file method was added.
- [TH2-4928] Data.from_csv method was added.
- [TH2-4932] Data.to_json method was added. It puts your data into a valid JSON
  object.
- [TH2-4957] Added gzip option for the Data.to_json method.
- [TH2-4957] Added decompress_gzip_file function to utils.converters.
- Added to_csv method to the PerfectTable class.
- utils.converters.flatten_dict converter added.
- Added Data.to_jsons method that puts your data object into a jsons file
  (a file where every line is a separate JSON document; the file as a whole is
  not valid JSON).
  to_jsons was later renamed to to_json_lines.
  - to_jsons is deprecated now.
- [TH2-5049] Added ExpandedMessageFieldResolver.
- [TH2-5053] Added pickle_version to the Data.from_cache_file method.
- decode_base64 function added to converter utils.
- [TH2-5156] UniversalDatetimeStringConverter and UnixTimestampConverter
  added.
- [TH2-5167] Data.is_sorted, event_utils.is_sorted, message_utils.is_sorted
  and stream_utils.is_sorted methods were added.
- [TH2-5176] to_th2_timestamp method was added for converters.
- [TH2-5081] Added map_yield function, which should behave similarly to the
  old map method.
  That means that map_yield will iterate lists and tuples if the user map
  function returns them.
- [TH2-5197] Added the function read_all_pickle_files_from_the_folder
  to get a Data object from a folder with pickle files.
- [TH2-5213] Added Data.to_csv method, which converts data to valid CSV.
- [TH2-4900] Added Data.sort method, which also works with large amounts of data.
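To illustrate the difference between map (TH2-4577) and map_stream (TH2-4490), here is a minimal sketch assuming simplified adapter interfaces. RecordAdapter, StreamAdapter, their handle method, map_records and map_stream below are hypothetical stand-ins for IRecordAdapter, IStreamAdapter and the Data methods; check the library's interfaces for the real method names and signatures.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, Iterable, Iterator, Union


class RecordAdapter(ABC):
    """Hypothetical stand-in for IRecordAdapter: transforms one record."""

    @abstractmethod
    def handle(self, record: Any) -> Any:
        ...


class StreamAdapter(ABC):
    """Hypothetical stand-in for IStreamAdapter: transforms a whole stream."""

    @abstractmethod
    def handle(self, stream: Iterable[Any]) -> Iterator[Any]:
        ...


def map_records(
    stream: Iterable[Any], fn: Union[Callable[[Any], Any], RecordAdapter]
) -> Iterator[Any]:
    # map-style: called once per record, exactly one output per input.
    func = fn.handle if isinstance(fn, RecordAdapter) else fn
    for record in stream:
        yield func(record)


def map_stream(
    stream: Iterable[Any],
    gen: Union[Callable[[Iterable[Any]], Iterator[Any]], StreamAdapter],
) -> Iterator[Any]:
    # map_stream-style: the generator sees the whole stream, so it can drop,
    # merge or split records and keep state between them.
    func = gen.handle if isinstance(gen, StreamAdapter) else gen
    yield from func(stream)


class DropDuplicateIds(StreamAdapter):
    """Example stream adapter: keeps the first record for each 'id'."""

    def handle(self, stream: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
        seen = set()
        for record in stream:
            if record["id"] not in seen:
                seen.add(record["id"])
                yield record


class NameOnly(RecordAdapter):
    """Example record adapter: extracts the 'name' field."""

    def handle(self, record: Dict[str, Any]) -> Any:
        return record["name"]


events = [{"id": 1, "name": "a"}, {"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(list(map_records(events, NameOnly())))                  # ['a', 'a', 'b']
print([e["id"] for e in map_stream(events, DropDuplicateIds())])  # [1, 2]
```

A record-level function only ever sees one record, while a stream-level generator can keep state across records, which is what deduplication needs.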
BugFixes
- [TH2-4711] EventTreeCollection max_count parameter of findall functions
  worked incorrectly.
- [TH2-4917] Readme duplicates removed.
- [TH2-5083] Fixed comparison line formatting. Every event in a block is no
  longer formatted as failed if its parent is failed.
- [TH2-5081] Fixed iteration bug for the case where a Data object was made
  using lists and tuples.
- [TH2-5100] Fixed bug where a recursion exception was raised when too many
  Data objects iterated each other.
- [TH2-5190] Fixed Data.to_json.
- [TH2-5193] orjson library versions 3.7.0 through 3.9.14 have a vulnerability:
  https://devhub.checkmarx.com/cve-details/CVE-2024-27454/.
- [TH2-5201] Fixed DatetimeStringConverter.to_th2_timestamp() bug which occurred for inputs not ending with 'Z'.
- [TH2-5902] Fixed bug where the cache file was removed after calling data.show().
- [TH2-5220] Fixed bug where Data.update_metadata() would change a string into a list.
- [TH2-5101] Fixed bug where merging Data objects via + or += overwrote the source file.
Improvements
- Added vulnerabilities scanning
- [TH2-4828] EventNotFound and MessageNotFound now take the error description
  as an argument instead of using a pre-written one.
- [TH2-4775] Sped up Data.build_cache by disabling garbage collection while
  storing the pickle file.
- [TH2-4901] Added gap_mode and zero_anchor parameters for the message and
  event utils get_category_frequencies methods. See doc.
- [TH2-5048] Added typing hints for resolver methods.
- [TH2-5172] Added faster implementations of the following
  ProtobufTimestampConverter functions: to_microseconds, to_milliseconds,
  to_nanoseconds.
- [TH2-5081] Data.__str__ was changed: use Data.show() instead of print(data).
- [TH2-5101] Data.update_metadata() now takes a change_type argument (values:
  update (default), change), which denotes whether to update or overwrite with
  new values.
- [TH2-5099] Fixed slow iteration for Data objects created with many addition
  operators.
- [TH2-5201] Performance improvements have been made to converters.
  Benchmark:
  - 1 million iterations per test
  - input: 2022-03-05T23:56:44.123456789Z

Converter | Method | Before (seconds) | After (seconds) | Improvement (rate) |
---|---|---|---|---|
DatetimeStringConverter | parse_timestamp | 7.1721964 | 1.4974268 | x4.78 |
DatetimeStringConverter | to_datetime | 8.9945099 | 0.1266325 | x71.02 |
DatetimeStringConverter | to_seconds | 8.6180093 | 1.5360991 | x5.62 |
DatetimeStringConverter | to_microseconds | 7.9066440 | 1.7628856 | x4.48 |
DatetimeStringConverter | to_nanoseconds | 7.6787507 | 1.7114960 | x4.48 |
DatetimeStringConverter | to_milliseconds | 7.6059985 | 1.7688387 | x4.29 |
... |
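For context on what the benchmarked converter methods compute, here is a minimal standard-library sketch that parses the benchmark input into epoch seconds plus nanoseconds and derives the other units. It is not DatetimeStringConverter: the function names mirror the table but are hypothetical, offsets other than UTC are not handled, the trailing 'Z' is treated as optional (compare the TH2-5201 fix), and the {'epochSecond', 'nano'} shape of the th2 timestamp is an assumption.

```python
import calendar
import re
from datetime import datetime
from typing import Dict, Tuple

# Hypothetical accepted format: UTC 'YYYY-MM-DDTHH:MM:SS[.up to 9 digits]' with optional 'Z'.
_TS = re.compile(r"(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})(?:\.(\d{1,9}))?Z?")


def parse_timestamp(value: str) -> Tuple[int, int]:
    """Return (epoch seconds, nanoseconds), treating the string as UTC."""
    match = _TS.fullmatch(value)
    if match is None:
        raise ValueError(f"unsupported timestamp: {value!r}")
    whole, fraction = match.groups()
    dt = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S")
    seconds = calendar.timegm(dt.timetuple())  # naive datetime interpreted as UTC
    nanos = int((fraction or "").ljust(9, "0"))  # right-pad to nanosecond precision
    return seconds, nanos


def to_nanoseconds(value: str) -> int:
    seconds, nanos = parse_timestamp(value)
    return seconds * 1_000_000_000 + nanos


def to_microseconds(value: str) -> int:
    seconds, nanos = parse_timestamp(value)
    return seconds * 1_000_000 + nanos // 1_000


def to_milliseconds(value: str) -> int:
    seconds, nanos = parse_timestamp(value)
    return seconds * 1_000 + nanos // 1_000_000


def to_th2_timestamp(value: str) -> Dict[str, int]:
    # Assumed th2 timestamp shape: {'epochSecond': ..., 'nano': ...}.
    seconds, nanos = parse_timestamp(value)
    return {"epochSecond": seconds, "nano": nanos}


print(to_th2_timestamp("2022-03-05T23:56:44.123456789Z"))
# {'epochSecond': 1646524604, 'nano': 123456789}
```

Nanoseconds are kept as a separate integer because Python's datetime only resolves microseconds, so a nanosecond input cannot round-trip through a datetime object alone.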
v1.3.1
Improvements
- Updated Wheel to ~0.38
- Added vulnerabilities scanning
v1.3.0
User impact and migration instructions
This release implements performance bug fixes and provides Data object cache file saving and loading.
- [I] Logging was removed from the library. Only special builds will have logging.
  Users cannot use the add_stderr_logger and add_file_logger logging functions.
  [M] Remove DS lib logging usage everywhere.
- [I] Since v1.3.0, the library doesn't provide data source dependencies.
  [M] You should provide them manually during installation.
  Just add square brackets after the library name and put the dependency name inside:
  pip install th2-data-services[dependency_name]
  Dependencies list:
  dependency name | provider version |
  ---|---|
  RDP5 | 5 |
  RDP6 | 6 |
  Example: pip install th2-data-services[rdp5]
Features
- [TH2-4289] Data.build_cache and Data.from_cache_file features were added.
- Added Data.cache_status property.
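The idea behind Data.build_cache and Data.from_cache_file can be sketched with the standard pickle module, which the library also uses for its cache files. The file layout below (one pickle frame per record, read back until EOFError) is an assumption for illustration, not the library's documented format, and build_cache, from_cache_file and roundtrip here are hypothetical functions rather than Data methods.

```python
import os
import pickle
import tempfile
from typing import Any, Iterable, Iterator, List


def build_cache(records: Iterable[Any], path: str) -> None:
    """Hypothetical sketch: pickle records one after another into a single file."""
    with open(path, "wb") as f:
        for record in records:
            pickle.dump(record, f)  # one pickle frame per record, streamed to disk


def from_cache_file(path: str) -> Iterator[Any]:
    """Hypothetical sketch: lazily read the records back until end of file."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:  # no more pickled records
                return


def roundtrip(records: Iterable[Any]) -> List[Any]:
    # Demo helper: cache to a temporary file, read it back, then clean up.
    fd, path = tempfile.mkstemp(suffix=".pickle")
    os.close(fd)
    try:
        build_cache(records, path)
        return list(from_cache_file(path))
    finally:
        os.remove(path)


print(roundtrip([{"id": 1}, {"id": 2}]))  # [{'id': 1}, {'id': 2}]
```

Writing and reading one record at a time keeps memory flat for large streams. Only load cache files you trust: unpickling can execute arbitrary code.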
Improvements
- [TH2-4379] Speed improvements in json deserialization.
- StreamingSSEAdapter now converts bytes from the SSE stream into Dict objects.
- SSEAdapter is now a deprecated class.
- A Data object will now generate a warning if you pass it an object of generator type.
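The generator warning exists because a generator can be consumed only once. The hypothetical ReplayableData class below (not the library's Data, and with a made-up warning message) shows what happens when such a source is iterated twice.

```python
import types
import warnings
from typing import Any, Iterable, Iterator


class ReplayableData:
    """Hypothetical wrapper showing why a generator source deserves a warning."""

    def __init__(self, source: Iterable[Any]) -> None:
        if isinstance(source, types.GeneratorType):
            # A generator is exhausted after the first pass, so later
            # iterations of this object would silently yield nothing.
            warnings.warn("generator source can be iterated only once", UserWarning)
        self._source = source

    def __iter__(self) -> Iterator[Any]:
        yield from self._source


gen_backed = ReplayableData(x for x in range(3))  # emits a UserWarning
print(list(gen_backed))  # [0, 1, 2]
print(list(gen_backed))  # [] (the generator is already exhausted)

list_backed = ReplayableData([0, 1, 2])
print(list(list_backed) == list(list_backed))  # True
```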
BugFixes
- [TH2-4385] Logging in the Data object slowed down the DS library very much.
  - Logging was removed. add_stderr_logger and add_file_logger are not
    available anymore.
- [TH2-4380] Fixed apply_adapter feature for GetMessages / GetEvents / GetEventById / GetMessageById.
- [TH2-3767] Fixed bug with the limit of Data object on Windows.
- [TH2-4460] Fixed bug where GRPC omitted fields with None value in response.
v1.2.3
v0.6.3
v1.2.2
v1.2.1
v1.2.0
User impact and migration instructions
This release implements RDP v6 support, which requires a new grpc version. This means you cannot connect to rdp5.grpc and rdp6.grpc from the same environment. This DS lib version uses the grpc version for RDP v6: th2-grpc-data-provider v1.1.0.
- [I] A new version of grpc has been added.
  [M] If you require the RDP v6 version of the interface, you don't need to do anything.
  Otherwise, you need to reinstall the th2-grpc-data-provider lib with the version required for your RDP.
  More details here.
Features
- [TH2-3083] The problem with several versions of the grpc interface is solved.
- [TH2-3512] Provider V6 module is developed.
- [TH2-4141] Option to disable ssl certificate for rdp5 is added
- [TH2-4098] Added Streams class for the param 'stream'.
BugFixes
- [TH2-4072] Now ETC doesn't raise a warning for missing detached_events.
- GRPC requests (start_timestamp, end_timestamp) are now made in UTC.
v1.1.1
v1.1.0
User impact and migration instructions
This release doesn't require any additional steps to use.
Features
- [TH2-3497] EventsTreeCollection got get_leaves_iter method.
- [TH2-3497] EventsTreeCollection got len_trees and len_detached_events
  properties.
- [TH2-3497] EventsTree and EventsTreeCollection got representation (__repr__)
  and summary methods.
- [TH2-3558] Added module-level functions add_stderr_logger and
  add_file_logger to easily enable logging.
- [TH2-3546][TH2-3583] INTERACTIVE_MODE global parameter was introduced.
  Data.use_cache() <- True by default.
  - Added Data methods to get cache file paths: Data.get_cache_filepath()
    and Data.get_pending_cache_filepath().
- [TH2-3665] Added method get_tree_by_id in ETC.
- [TH2-3592] Added logging in EventsTreeCollection module when ETC is created with detached events.
- [TH2-3475] Implemented Data objects joining.
- [TH2-3467] Added utils classes to convert timestamps.
- [TH2-3662][TH2-3492] Added get_detached_events_iter and get_detached_events
  methods in EventsTreeCollection.
  - Warning: Property detached_events is deprecated and will be removed in the future.
- [TH2-3496] Added get_parentless_tree_collection method in EventsTreeCollection.
- [TH2-3905] Separate filter classes added instead of the
  th2_data_services.Filter class.
  - Warning: Class th2_data_services.Filter is deprecated and will be removed in the future.
Improvements
- [TH2-3003] Added automatic attachment of example.py code in readme.md.
- [TH2-3558] Added more debug info about Data cache usage.
- [TH2-3389] GetXById http-provider command handles 404 error status instead of raising JsonDecodeException.
- [TH2-3663] Sped up len_detached_events property.
BugFixes
- [TH2-3557][TH2-3560] The parent Data cache file is now created if you iterate a child Data object.
- [TH2-3545][TH2-3580] The Data object now uses an absolute path, so it doesn't lose its cache file if you change the working directory.
- [TH2-3546][TH2-3583] The Data cache file will not be removed if you use
  INTERACTIVE_MODE and the file is being read.
- [TH2-3487][TH2-3585] data = Data(source_data, cache=True).map(func)
  The Data object didn't write the cache in such a case before. Fixed.
- [TH2-3558] Fixed the names of the loggers used. Changed to __name__.
- [TH2-3733] Provider API class generates standard URLs (without duplicate '/' and without '/' before the query).
- [TH2-3598] Method get_subtree returns the tree as an EventsTree class.
- [TH2-3593][TH2-3664] Method get_root_by_id returns the root by any non-root ID as Th2Event.
- [TH2-3595] When ETC creates a subtree or itself, ETC doesn't copy the incoming data stream.
- [TH2-3732] Log message in http.GetMessages contains the name of the stream.
- [TH2-3734] EventsTreeCollection append_event method doesn't add duplicate events.
- [TH2-3596][TH2-3594][TH2-3473] EventsTreeCollection: get and find methods include parentless results, if parentless events exist.