Releases: th2-net/th2-data-services

v2.0.0

07 Feb 17:06
18d7979

User impact and migration instructions

Installing the package no longer installs the RDP package.
If you want to use RDP, you have to specify the dependency in square brackets [ ].

  1. [I] The Adapter interface now has a required handle_stream method.
    [M] Implement the new method in your adapters.

  2. [I] The Data object can no longer be imported directly from the
    th2_data_services package.
    [M] Change every "from th2_data_services import Data"
    to "from th2_data_services.data import Data".

  3. [I] The Provider module has been removed.
    [M] Use data source implementations instead, such as th2-ds-source-lwdp.

  4. [I] INTERACTIVE_MODE can no longer be accessed as
    th2_data_services.INTERACTIVE_MODE.
    [M] Use th2_data_services.config.options.INTERACTIVE_MODE instead.

  5. [I] EventsTree has been renamed to EventTree.
    [M] Replace every EventsTree reference with EventTree.

  6. [I] The message utils method expand_message has moved into MessageFieldResolver.
    [M] Implement the new method in your resolver.

  7. [I] The Data iteration logic has changed.
    Why? The previous behavior caused problems in some cases, e.g. when we don't
    want to iterate objects inside the DataSet.

    [I.1] Lists and tuples used to build Data objects are treated as a single
    item, and the items inside them are no longer iterated.
    [M.1] Update Data objects initialized with lists or tuples.

    [I.2] The change in iteration logic also changes how the map function behaves.
    If the map function returns lists or tuples, their contents are no longer
    iterated.
    [M.2.1] If you need the previous map behavior, replace map with
    map_yield.

    [M.2.2] Update data.map(mfr.expand_message) to data.map_yield(mfr.expand_message).

    [I.3] A Data object will not iterate over the contents of its stream if any
    of the items are iterables (other than Data objects).
    This means that a Data object will not iterate lists and tuples inside the
    provided DataSet and will return them as is.
    The only exception is when all of the items are Data objects themselves.
    [M.3] Update nested lists in Data object initializations to Data
    objects, or switch to using the addition operator.
    [Examples]
    d1 = Data(['a', 'b'])
    a. Data([1, 2, [3, 4], d1]) will yield 1, 2, [3, 4], d1. Previous behavior: 1, 2, 3, 4, 'a', 'b'.
    b. Data([d1, d2]), where d1 and d2 are Data objects, will yield from d1 and then from d2.
    c. You can update example a to Data([1, 2, 3, 4]) or to new_data = Data([1, 2]) + Data([3, 4]) + d1.
    d. You can also restore the previous behavior as follows: new_data = Data([1, 2, [3, 4], d1]).map_yield(lambda r: r)

  8. [I] The new version of the orjson lib requires Python 3.8+.
    [M] If you use Python 3.7, upgrade to 3.8+.
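
The difference between the new map behavior and map_yield described in item 7 can be illustrated with plain Python generators. This is only a sketch of the semantics stated above, not the library's implementation; map_records, map_yield_records and split_pair are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Iterable, Iterator


def map_records(stream: Iterable[Any], func: Callable[[Any], Any]) -> Iterator[Any]:
    """Mimic the v2 map semantics: every result is yielded as a single item."""
    for record in stream:
        yield func(record)


def map_yield_records(stream: Iterable[Any], func: Callable[[Any], Any]) -> Iterator[Any]:
    """Mimic map_yield (the previous map semantics): list/tuple results are unpacked."""
    for record in stream:
        result = func(record)
        if isinstance(result, (list, tuple)):
            yield from result
        else:
            yield result


def split_pair(record: str) -> list:
    # Hypothetical user function that returns a list for every record.
    return record.split("-")


source = ["a-b", "c-d"]
as_items = list(map_records(source, split_pair))
unpacked = list(map_yield_records(source, split_pair))
print(as_items)  # [['a', 'b'], ['c', 'd']]
print(unpacked)  # ['a', 'b', 'c', 'd']
```

If your map function returns lists or tuples and downstream code expects flat records, this is the case where switching to map_yield matters.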

Features

  1. [TH2-4128] pip no longer installs RDP by default

  2. [TH2-4128][TH2-4738] Extra dependencies can be installed using square brackets after
    the package name.

    • Example: pip install th2-data-services[lwdp]

    Available data source implementations:

    dependency name      provider version
    lwdp                 latest version of lwdp
    lwdp2                latest version of lwdp v2
    lwdp3                latest version of lwdp v3
    utils-rpt-viewer     latest version of utils-rpt-viewer
    utils-rpt-viewer5    latest version of utils-rpt-viewer v5
    utils-advanced       latest version of ds-utils

  3. [TH2-4493] The Adapter interface got a handle_stream method.

  4. [TH2-4490] Added map_stream method to Data.

    • Almost the same as map, except it's designed to handle a stream of data
      rather than a single record.
    • The method accepts a generator function or a class that implements
      IStreamAdapter with a generator function.
  5. [TH2-4582] IAdapter interface removed.

    • IStreamAdapter interface added to handle streams.
    • IRecordAdapter interface added to handle single record.
    • Method accepts Generator function or IStreamAdapter interface class with
      Generator function.
  6. [TH2-4609] Data.filter implementation changed to use yield.

  7. [TH2-4491] metadata attribute added to Data. It will contain request urls.

  8. [TH2-4577] The map method can now take either a Callable function or an Adapter that
    implements IRecordAdapter.

  9. [TH2-4611] DatetimeConverter, ProtobufTimestampConverter converters added.

  10. [TH2-4646]
    • metadata is carried over when using Data methods.
    • update_metadata method added to update metadata.
  11. [TH2-4684] Tree names changed from plural to singular (e.g.
    EventsTree -> EventTree).
  12. [TH2-4693] Implemented namespace packages structure, allowing other th2
    libraries to be grouped together.
  13. [TH2-4713] Added options module, which enables the user to tweak library
    settings.
  14. DummyDataSource added.
  15. [TH2-4881] Data.from_json method was added.
  16. [TH2-4919] Data.from_any_file method was added.
  17. [TH2-4928] Data.from_csv method was added.
  18. [TH2-4932] Data.to_json method was added. It puts your data into a valid
    JSON object.
  19. [TH2-4957] Added gzip option for Data.to_json method.
  20. [TH2-4957] Added decompress_gzip_file function to utils.converters.
  21. Added to_csv method to PerfectTable class.
  22. utils.converters.flatten_dict converter added.
  23. Added Data.to_jsons method, which writes your Data object to a jsons file
    (a file where every line is a separate JSON-format line; the file as a
    whole is not valid JSON).
    It was later renamed to to_json_lines; to_jsons is now deprecated.
  24. [TH2-5049] Added ExpandedMessageFieldResolver.
  25. [TH2-5053] Added pickle_version to Data.from_cache_file method.
  26. decode_base64 function added to converter utils.
  27. [TH2-5156] UniversalDatetimeStringConverter and UnixTimestampConverter
    added.
  28. [TH2-5167] Data.is_sorted, event_utils.is_sorted, message_utils.is_sorted
    and stream_utils.is_sorted methods were added.
  29. [TH2-5176] to_th2_timestamp method was added for converters.
  30. [TH2-5081] Added map_yield function, which should behave similarly to
    the old map method.
    That means that map_yield will iterate lists and tuples if the user map
    function returns them.
  31. [TH2-5197] Added the function read_all_pickle_files_from_the_folder
    to get a Data object from a folder with pickle files.
  32. [TH2-5213] Added Data.to_csv method, which converts data to valid CSV.
  33. [TH2-4900] Added Data.sort method, which also works with large amounts of data.
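
The distinction between a record handler (as accepted by map) and a stream handler (a generator function, as accepted by map_stream) can be sketched in plain Python. The function names and sample records below are hypothetical, and the sketch calls the handlers directly instead of going through the library.

```python
from typing import Any, Dict, Iterable, Iterator

Record = Dict[str, Any]


def upper_type(record: Record) -> Record:
    """Record handler: receives one record and returns one record."""
    return {**record, "type": record["type"].upper()}


def drop_consecutive_duplicates(stream: Iterable[Record]) -> Iterator[Record]:
    """Stream handler: a generator that sees the whole stream and can keep state."""
    previous = object()  # sentinel that never equals a real record
    for record in stream:
        if record != previous:
            yield record
        previous = record


records = [{"type": "event"}, {"type": "event"}, {"type": "message"}]

mapped = [upper_type(r) for r in records]  # one output per input
deduplicated = list(drop_consecutive_duplicates(records))  # output count may differ

print(mapped)
print(deduplicated)
```

A stream handler is the better fit whenever the result for one record depends on its neighbours, which a per-record function cannot express.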

BugFixes

  1. [TH2-4711] Fixed the max_count parameter of the EventTreeCollection findall
    functions, which worked incorrectly.
  2. [TH2-4917] Readme duplicates removed.
  3. [TH2-5083] Fixed comparison line formatting. Events in a block are no
    longer all formatted as failed when the parent is failed.
  4. [TH2-5081] Fixed iteration bug for the case where a Data object was made
    using lists and tuples.
  5. [TH2-5100] Fixed a bug where a recursion exception was raised when too many
    Data objects iterated over each other.
  6. [TH2-5190] Fixed Data.to_json
  7. [TH2-5193] The orjson library, versions 3.7.0 through 3.9.14, has a vulnerability:
    https://devhub.checkmarx.com/cve-details/CVE-2024-27454/.
  8. [TH2-5201] Fixed DatetimeStringConverter.to_th2_timestamp() bug which occurred for inputs not ending with 'Z'.
  9. [TH2-5902] Fixed a bug where the cache file was removed after calling data.show().
  10. [TH2-5220] Fixed a bug where Data.update_metadata() would change a string into a list.
  11. [TH2-5101] Fixed a bug where merging Data objects via + or += overwrote the source file.

Improvements

  1. Added vulnerabilities scanning
  2. [TH2-4828] EventNotFound and MessageNotFound now return the error description
    passed as an argument instead of a pre-written one.
  3. [TH2-4775] Sped up Data.build_cache by disabling garbage collection while
    the pickle file is being stored.
  4. [TH2-4901] Added gap_mode and zero_anchor parameters for message and event
    utils get_category_frequencies methods.
    See doc
  5. [TH2-5048] Added typing hints for resolver methods.
  6. [TH2-5172] Add faster implementations of the following
    ProtobufTimestampConverter functions: to_microseconds, to_milliseconds,
    to_nanoseconds.
  7. [TH2-5081] Data.__str__ was changed --> use Data.show() instead of print(data)
  8. [TH2-5201] Performance improvements have been made to converters:
  9. [TH2-5101] Data.update_metadata() now takes a change_type argument (values: update, the default,
    or change) that denotes whether to update or overwrite with new values.
  10. [TH2-5099] Fixed slow iteration for Data objects created with many addition operators.
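
The idea behind improvement 3 (pausing garbage collection while a pickle file is written) can be sketched with the standard library alone. This is a minimal illustration of the technique, not the code used in Data.build_cache; dump_records and load_records are hypothetical helpers.

```python
import gc
import pickle
import tempfile
from pathlib import Path


def dump_records(records, path):
    """Write records to a pickle file one by one with the garbage collector paused."""
    gc_was_enabled = gc.isenabled()
    gc.disable()
    try:
        with open(path, "wb") as file:
            for record in records:
                pickle.dump(record, file)
    finally:
        # Restore the collector only if it was running before the call.
        if gc_was_enabled:
            gc.enable()


def load_records(path):
    """Read back every object stored in the pickle file."""
    with open(path, "rb") as file:
        while True:
            try:
                yield pickle.load(file)
            except EOFError:
                return


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "cache.pickle"
    dump_records(({"id": i} for i in range(3)), cache_path)
    restored = list(load_records(cache_path))

print(restored)  # [{'id': 0}, {'id': 1}, {'id': 2}]
```

The try/finally block ensures the collector is restored even if writing fails part-way.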

Benchmark.

  • 1mln iterations per test
  • input: 2022-03-05T23:56:44.123456789Z
Converter                Method           Before (seconds)  After (seconds)  Improvement (rate)
DatetimeStringConverter  parse_timestamp  7.1721964         1.4974268        x4.78
                         to_datetime      8.9945099         0.1266325        x71.02
                         to_seconds       8.6180093         1.5360991        x5.62
                         to_microseconds  7.9066440         1.7628856        x4.48
                         to_nanoseconds   7.6787507         1.7114960        x4.48
                         to_milliseconds  7.6059985         1.7688387        x4.29
...
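
A comparable measurement can be made with the standard timeit module. The parser below is a hypothetical stand-in written for this sketch, not the library's DatetimeStringConverter, so its timings will not match the table above.

```python
import timeit
from datetime import datetime, timezone

TIMESTAMP = "2022-03-05T23:56:44.123456789Z"  # same input as in the benchmark


def to_nanoseconds(timestamp: str) -> int:
    """Hypothetical stand-in: ISO-8601 UTC string -> Unix time in nanoseconds."""
    seconds_part, _, fraction = timestamp.rstrip("Z").partition(".")
    parsed = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S")
    parsed = parsed.replace(tzinfo=timezone.utc)
    # Pad or cut the fractional part to exactly nine digits (nanoseconds).
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return int(parsed.timestamp()) * 1_000_000_000 + nanos


# The benchmark used 1 million iterations; fewer are used here to keep the run short.
ITERATIONS = 10_000
elapsed = timeit.timeit(lambda: to_nanoseconds(TIMESTAMP), number=ITERATIONS)
print(f"{ITERATIONS} calls took {elapsed:.4f} s")
```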

v1.3.1

27 Feb 08:45
4b06ac5

Improvements

  1. Updated Wheel to ~0.38
  2. Added vulnerabilities scanning

v1.3.0

24 Nov 09:04
febcf86

User impact and migration instructions

This release implements performance bug fixes and provides Data object cache file saving and loading.

  1. [I] Logging was removed from the library. Only special builds will have logging.
    The add_stderr_logger and add_file_logger logging functions are no longer available.
    [M] Remove all usage of DS lib logging.

  2. [I] Since v1.3.0, the library doesn't provide data source dependencies.

    [M] You should provide them manually during installation.
    Add square brackets after the library name and put the dependency name inside.

    pip install th2-data-services[dependency_name]
    

    Dependencies list

    dependency name    provider version
    RDP5               5
    RDP6               6

    Example

    pip install th2-data-services[rdp5]
    

Features

  1. [TH2-4289] Data.build_cache and Data.from_cache_file features were added.
  2. Added Data.cache_status property

Improvements

  1. [TH2-4379] Speed improvements in json deserialization.
    • StreamingSSEAdapter now converts bytes from the SSE stream into Dict objects.
    • SSEAdapter is now a deprecated class.
  2. The Data object now generates a warning if you pass it an object of generator type.
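
The reason a generator deserves a warning is that it can be iterated only once, so a second pass over the same source silently yields nothing. The sketch below shows that effect and one way such a check could be written; wrap_source is a hypothetical helper, not the library's code.

```python
import inspect
import warnings


def wrap_source(source):
    """Hypothetical check: warn when the data source is a one-shot generator."""
    if inspect.isgenerator(source):
        warnings.warn(
            "Generator objects can be iterated only once; "
            "a second iteration will produce no records.",
            RuntimeWarning,
            stacklevel=2,
        )
    return source


# A generator is exhausted after the first pass.
numbers = (n for n in range(3))
first_pass = list(numbers)
second_pass = list(numbers)
print(first_pass, second_pass)  # [0, 1, 2] []

# Only the generator triggers the warning; a list does not.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    wrap_source(n for n in range(3))
    wrap_source([0, 1, 2])
print(len(caught))  # 1
```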

BugFixes

  1. [TH2-4385] Logging in the Data object slowed down the DS library significantly.
    • Logging was removed.
    • add_stderr_logger and add_file_logger are not available anymore.
  2. [TH2-4380] Fixed apply_adpater feature for GetMessages / GetEvents / GetEventById / GetMessageById
  3. [TH2-3767] Fixed bug with limit of Data object in Windows.
  4. [TH2-4460] Fixed bug where GRPC omitted fields with None value in response.

v1.2.3

26 Oct 14:55
bb79ca8

BugFixes

  1. [TH2-4234] The library can now be run on Windows.

v0.6.3

15 Sep 08:19
a1ea42f

BugFixes

  1. [TH2-3168] Fixed iterations in nested loops for Data object with limit.
  2. [TH2-3336] Url now uses the utf-8 encoding.
  3. [TH2-3700] Fixed: Filter iterated values only once.

v1.2.2

13 Sep 14:26
df3e2a4

BugFixes

  1. [TH2-4195] EventsTree without parent raises EventIdNotInTree exception when trying to use get_parent() method

v1.2.1

14 Sep 07:40
df3e2a4

BugFixes

  1. Added missing library importlib_metadata

v1.2.0

05 Sep 07:49
e0faab7

User impact and migration instructions

This release implements rdp v6 support, which requires a new grpc version. This means you cannot connect to rdp5.grpc and rdp6.grpc from the same environment. This DS lib version ships with the grpc version for rdp v6, i.e. th2-grpc-data-provider v1.1.0.

  1. [I] The new version of grpc has been added.
    [M] If you require the rdp v6 version of the interface, you do not need to do anything.
    Otherwise, you need to reinstall the th2-grpc-data-provider lib at the version required by your rdp.

More detail in here

Features

  1. [TH2-3083] The problem with several versions of the grpc interface is solved.
  2. [TH2-3512] Provider V6 module is developed.
  3. [TH2-4141] Option to disable ssl certificate for rdp5 is added
  4. [TH2-4098] Added Streams class for the param 'stream'.

BugFixes

  1. [TH2-4072] Now ETC doesn't raise a warning for missing detached_events.
  2. GRPC requests (start_timestamp, end_timestamp) are now made in UTC.

v1.1.1

03 Aug 09:31
fee0cea

BugFixes

  1. [TH2-4039] An empty filter is validated.

v1.1.0

20 Jul 08:40
2998cc0

User impact and migration instructions

This release does not require any additional steps to use.

Features

  1. [TH2-3497] EventsTreeCollection got get_leaves_iter method.
  2. [TH2-3497] EventsTreeCollection got len_trees and len_detached_events properties.
  3. [TH2-3497] EventsTree and EventsTreeCollection got representation(__repr__) and summary methods.
  4. [TH2-3558] Added module-level functions add_stderr_logger and add_file_logger to easily enable logging.
  5. [TH2-3546][TH2-3583] INTERACTIVE_MODE - global parameter was introduced.
  6. Data.use_cache() <- True by default.
  7. Added data methods to get cache files paths Data.get_cache_filepath() and Data.get_pending_cache_filepath().
  8. [TH2-3665] Added method get_tree_by_id in ETC.
  9. [TH2-3592] Added logging in the EventsTreeCollection module when an ETC is created with detached events.
  10. [TH2-3475] Implemented joining of Data objects.
  11. [TH2-3467] Added utils classes to convert timestamps.
  12. [TH2-3662][TH2-3492] Added get_detached_events_iter and get_detached_events methods in EventsTreeCollections.
    • Warning: Property detached_events is deprecated and will be removed in the future.
  13. [TH2-3496] Added get_parentless_tree_collection method in EventsTreeCollection.
  14. [TH2-3905] Separate filter classes added instead of th2_data_services.Filter class.
    • Warning: Class th2_data_services.Filter is deprecated and will be removed in the future.

Improvements

  1. [TH2-3003] Added automatic attachment of example.py code in readme.md.
  2. [TH2-3558] Added more debug info about Data cache usage.
  3. [TH2-3389] GetXById http-provider command handles 404 error status instead of JsonDecodeException.
  4. [TH2-3663] Speed up len_detached_events property

BugFixes

  1. [TH2-3557][TH2-3560] Parent Data cache file will be created if you iterate a child Data object now.
  2. [TH2-3545][TH2-3580] The Data object now uses an absolute path, so it doesn't lose its cache file if you change the working directory.
  3. [TH2-3546][TH2-3583] Data cache file will not be removed if you use INTERACTIVE_MODE and the file is being read.
  4. [TH2-3487][TH2-3585] Fixed: a Data object created as data = Data(source_data, cache=True).map(func) previously didn't write the cache.
  5. [TH2-3558] Fixed the logger names used. Changed to __name__.
  6. [TH2-3733] The Provider API class now generates standard URLs (without duplicate '/' and without '/' before the query).
  7. [TH2-3598] Method get_subtree returns tree as EventsTree class.
  8. [TH2-3593][TH2-3664] Method get_root_by_id returns root by any non-root ID as Th2Event.
  9. [TH2-3595] An ETC no longer copies the incoming data stream when it creates a subtree or itself.
  10. [TH2-3732] Log message in http.GetMessages contains name of the stream.
  11. [TH2-3734] EventsTreeCollection append_event method doesn't add duplicate event.
  12. [TH2-3596][TH2-3594][TH2-3473] EventsTreeCollection get and find methods now include parentless results if any exist.
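
The URL clean-up described in bug fix 6 ([TH2-3733]) can be sketched with the standard library. The function and the sample URL are hypothetical and only illustrate the two rules named there (no duplicate '/' and no '/' before the query); the Provider API class itself may work differently.

```python
import re
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse repeated slashes in the path and drop a slash right before the query."""
    parts = urlsplit(url)
    path = re.sub(r"/{2,}", "/", parts.path)
    if parts.query and path.endswith("/"):
        path = path.rstrip("/")
    return urlunsplit((parts.scheme, parts.netloc, path, parts.query, parts.fragment))


# Hypothetical URL with a duplicate slash and a slash before the query string.
raw_url = "http://host:8080//search//sse/events/?limit=1"
clean_url = normalize_url(raw_url)
print(clean_url)  # http://host:8080/search/sse/events?limit=1
```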