WhyLabsWriter refactor (#1484)
## Description

`Writable` represents output to be serialized by whylogs. It writes
itself to one or more temporary files. A `Writer` takes the temporary
file(s) and sends them to their intended destination. The interface
allows any `Writer` to handle any `Writable`, but in practice some
`Writer`s are only interested in certain types of `Writable`s.
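
The split between `Writable` and `Writer` can be pictured with a minimal, self-contained sketch. None of the names below are the whylogs API: the method names (`write_to_temp_files`, `deliver`) and the concrete classes (`TextWritable`, `DirectoryWriter`) are hypothetical and only illustrate the "serialize to temporary files, then hand them to a writer" flow described above.

```python
import os
import shutil
import tempfile
from abc import ABC, abstractmethod
from typing import List


class Writable(ABC):
    """Hypothetical stand-in: an object that serializes itself to temp files."""

    @abstractmethod
    def write_to_temp_files(self) -> List[str]:
        """Write self to one or more temporary files and return their paths."""


class Writer(ABC):
    """Hypothetical stand-in: sends temp files to their intended destination."""

    @abstractmethod
    def deliver(self, paths: List[str]) -> bool:
        """Send the given files to the destination; return True on success."""

    def write(self, writable: Writable) -> bool:
        paths = writable.write_to_temp_files()
        try:
            return self.deliver(paths)
        finally:
            # The temporary files are only an intermediate step, so clean up.
            for path in paths:
                if os.path.exists(path):
                    os.remove(path)


class TextWritable(Writable):
    """Toy Writable that serializes a string to a single temp file."""

    def __init__(self, text: str) -> None:
        self.text = text

    def write_to_temp_files(self) -> List[str]:
        fd, path = tempfile.mkstemp(suffix=".txt")
        with os.fdopen(fd, "w") as handle:
            handle.write(self.text)
        return [path]


class DirectoryWriter(Writer):
    """Toy Writer that copies the temp files into a local directory."""

    def __init__(self, dest_dir: str) -> None:
        self.dest_dir = dest_dir
        self.delivered: List[str] = []

    def deliver(self, paths: List[str]) -> bool:
        os.makedirs(self.dest_dir, exist_ok=True)
        for path in paths:
            target = os.path.join(self.dest_dir, os.path.basename(path))
            shutil.copy(path, target)
            self.delivered.append(target)
        return True
```

Because `Writer.write()` only depends on the list of paths, any writer in this sketch can handle any writable, which mirrors the interface property described above.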

`WhyLabsWriter` acts as the main entry point for sending data to
WhyLabs. It makes use of a new `WhyLabsClient` interface for interacting
with WhyLabs REST APIs. For the sake of backwards compatibility,
`WhyLabsWriter` keeps some deprecated methods that duplicate
`WhyLabsClient` functionality. `WhyLabsWriter` delegates to the
`WhyLabsTransactionWriter`, `WhyLabsBatchWriter`, and
`WhyLabsReferenceWriter` classes according to the
`WhyLabsWriter::write()` use case. The three classes correspond to the
three REST API endpoints for uploading profiles (transaction,
batch/async, and reference).
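
The delegation described above might look roughly like the following sketch. It is not the actual `WhyLabsWriter` code: `FacadeWriter`, `StubEndpointWriter`, and the selection rule (an active transaction ID wins, then a reference profile name, otherwise batch) are assumptions made for illustration, since the description does not spell out how the use case is detected.

```python
from typing import Optional, Tuple


class StubEndpointWriter:
    """Hypothetical placeholder for one specialized writer."""

    def __init__(self, endpoint: str) -> None:
        self.endpoint = endpoint

    def write(self, profile: str) -> Tuple[bool, str]:
        # A real writer would upload the profile; here we just report the route.
        return True, f"{self.endpoint}:{profile}"


class FacadeWriter:
    """Hypothetical entry point that picks a specialized writer per call."""

    def __init__(self) -> None:
        self._transaction_writer = StubEndpointWriter("transaction")
        self._batch_writer = StubEndpointWriter("batch")
        self._reference_writer = StubEndpointWriter("reference")
        self.transaction_id: Optional[str] = None
        self.reference_profile_name: Optional[str] = None

    def write(self, profile: str) -> Tuple[bool, str]:
        # Assumed selection rule, for illustration only.
        if self.transaction_id is not None:
            delegate = self._transaction_writer
        elif self.reference_profile_name is not None:
            delegate = self._reference_writer
        else:
            delegate = self._batch_writer
        return delegate.write(profile)
```

Keeping one facade in front of three endpoint-specific writers lets callers keep a single `write()` call while each upload path stays in its own class.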

`WhyLabsWriterBase` implements a few utility methods shared by the
various `WhyLabs*Writer` classes. In particular,
`_prepare_view_for_upload()` handles the processing required before
uploading a profile (uncompounding, custom performance metric tagging).
`_send_writable_to_whylabs()` serializes a profile in either V0 or V1
format and uploads it to WhyLabs. `_upload_view()` is a convenience
method that just calls `_prepare_view_for_upload()` then
`_send_writable_to_whylabs()`.

`WhyLabsWriter::write()` accepts a variety of data structures
representing a profile: `ViewResultSet`, `ProfileResultSet`,
`SegmentedResultSet`, `DatasetProfile`, `DatasetProfileView`, and
`SegmentedDatasetProfileView`.
`WhyLabsWriterBase::_get_view_of_writable()` converts all of those
except `SegmentedResultSet` to either `DatasetProfileView` or
`SegmentedDatasetProfileView`, which represent a single profile/segment
to be uploaded to WhyLabs. The `WhyLabs*Writer` classes generally
iterate over a `SegmentedResultSet`, uploading each segment.
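
The relationship between these helpers can be sketched as follows. Only the three method names come from the description above; the signatures, the dictionary-based `View` type, the `use_v0` flag, `write_segments()`, and every method body are hypothetical placeholders rather than the real whylogs implementation.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for DatasetProfileView / SegmentedDatasetProfileView.
View = Dict[str, object]


class SketchWriterBase:
    """Illustrative base class; bodies are placeholders, not whylogs code."""

    def __init__(self) -> None:
        self.uploaded: List[View] = []

    def _prepare_view_for_upload(self, view: View) -> View:
        # Placeholder for uncompounding and performance metric tagging.
        prepared = dict(view)
        prepared["prepared"] = True
        return prepared

    def _send_writable_to_whylabs(self, view: View, use_v0: bool = False) -> Tuple[bool, str]:
        # Placeholder for serializing (V0 or V1) and uploading the profile.
        payload = dict(view, format="v0" if use_v0 else "v1")
        self.uploaded.append(payload)
        return True, f"upload-{len(self.uploaded)}"

    def _upload_view(self, view: View, use_v0: bool = False) -> Tuple[bool, str]:
        # Convenience wrapper: prepare first, then send.
        return self._send_writable_to_whylabs(self._prepare_view_for_upload(view), use_v0)

    def write_segments(self, segments: Dict[str, View]) -> List[Tuple[bool, str]]:
        # Hypothetical helper: upload each segment of a segmented result in turn.
        return [self._upload_view(dict(view, segment=key)) for key, view in segments.items()]
```

In this sketch a segmented result is just a mapping from segment key to view, so iterating over it and calling `_upload_view()` once per segment mirrors the per-segment upload described above.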

* Transactions do not support zipped profiles or reference profiles.
* Segmented batch or reference profiles can be zipped by passing
`zip=True` to `write()`.
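
As a rough illustration of what zipping a segmented profile amounts to, the sketch below bundles several per-segment payloads into one in-memory archive using only the standard library. The function name `zip_segments`, the `.bin` naming scheme, and the use of raw bytes in place of serialized profiles are all assumptions; this is not how whylogs implements `zip=True`.

```python
import io
import zipfile
from typing import Dict


def zip_segments(segments: Dict[str, bytes]) -> bytes:
    """Bundle per-segment payloads into a single zip archive (hypothetical helper)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for name, payload in segments.items():
            # One archive member per segment, named after the segment key.
            archive.writestr(f"{name}.bin", payload)
    return buffer.getvalue()
```

Sending one archive instead of one file per segment reduces the number of uploads, which is presumably the motivation for offering the option on segmented batch and reference profiles.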

## Changes

TODO: describe API changes

## Related

[zipped batch profiles](https://app.clickup.com/t/86azmk2t6)

- [ ] I have reviewed the [Guidelines for Contributing](CONTRIBUTING.md)
and the [Code of Conduct](CODE_OF_CONDUCT.md).

---------

Co-authored-by: Richard Rogers <[email protected]>
richard-rogers and Richard Rogers authored May 13, 2024
1 parent 3a777f4 commit 27c8a7e
Showing 47 changed files with 3,748 additions and 2,587 deletions.
2 changes: 1 addition & 1 deletion python/.bumpversion.cfg
@@ -1,5 +1,5 @@
[bumpversion]
current_version = 1.3.32
current_version = 1.4.0
tag = False
parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>\d+)(\-(?P<release>[a-z]+)(?P<build>\d+))?
serialize =
2 changes: 1 addition & 1 deletion python/Makefile
@@ -5,7 +5,7 @@ src.proto.dir := ../proto/src
src.proto := $(shell find $(src.proto.dir) -type f -name "*.proto")
src.proto.v0.dir := ../proto/v0
src.proto.v0 := $(shell find $(src.proto.v0.dir) -type f -name "*.proto")
version := 1.3.32
version := 1.4.0

dist.dir := dist
egg.dir := .eggs
2 changes: 1 addition & 1 deletion python/docs/conf.py
@@ -8,7 +8,7 @@
print("Pandoc is required to build our documentation.")
sys.exit(1)

version = "1.3.32"
version = "1.4.0"

project = "whylogs"
author = "whylogs developers"
11 changes: 6 additions & 5 deletions python/examples/advanced/Transaction_Examples.ipynb
@@ -41,7 +41,8 @@
"from whylabs_client.api.transactions_api import TransactionsApi\n",
"from whylogs.core.schema import DatasetSchema\n",
"from whylogs.core.segmentation_partition import segment_on_column\n",
"from whylogs.api.writer.whylabs import WhyLabsWriter, WhyLabsTransaction\n",
"from whylogs.api.writer.whylabs import WhyLabsWriter\n",
"from whylogs.api.writer.whylabs_transaction_writer import WhyLabsTransactionWrirter\n",
"import os\n",
"from uuid import uuid4\n",
"from whylogs.datasets import Ecommerce\n",
@@ -593,7 +594,7 @@
"cell_type": "markdown",
"source": [
"## Uploading multiple profiles with the same batch timestamp\n",
"The `WhyLabsTransaction` context manager can simplify error handling."
"The `WhyLabsTransactionWriter` can be used as a context manager to simplify transaction error handling and ensure `commit_transaction()` is called."
],
"metadata": {
"id": "K7hEBIyuzXh0"
@@ -630,7 +631,7 @@
"cell_type": "code",
"source": [
"try:\n",
" with WhyLabsTransaction(writer):\n",
" with WhyLabsTransactionWriter() as writer:\n",
" print(\"Started transaction\")\n",
" for i in range(5):\n",
" batch_df = list_daily_batches[i].data[columns]\n",
@@ -670,7 +671,7 @@
{
"cell_type": "markdown",
"source": [
"If a `write()` call returns a `False` status, the profile will not be included in the transaction. You might want to retry writing it. If not, that profile will be left out of the transaction, but those successfully written will still be included."
"If a `write()` call during the transaction fails (returns a `False` status), the transaction's commit will fail raising an exception."
],
"metadata": {
"id": "PaAQy-RDftXU"
@@ -692,7 +693,7 @@
"source": [
"schema = DatasetSchema(segments=segment_on_column(\"output_discount\"))\n",
"profile = why.log(df, schema=schema)\n",
"with WhyLabsTransaction(writer):\n",
"with WhyLabsTransactionWriter() as writer:\n",
" status, id = writer.write(profile)\n",
"\n",
"print(f\"{status} {id}\")"
4 changes: 2 additions & 2 deletions python/examples/basic/Getting_Started.ipynb
@@ -75,7 +75,7 @@
"outputs": [],
"source": [
"# Note: you may need to restart the kernel to use updated packages.\n",
"%pip install whylogs"
"%pip install whylogs==1.3.29.dev0"
]
},
{
@@ -394,7 +394,7 @@
"metadata": {},
"outputs": [],
"source": [
"why.write(profile,\"profile.bin\")"
"why.write(profile, \"profile.bin\")"
]
},
{
4 changes: 2 additions & 2 deletions python/examples/integrations/Feature_Stores_and_whylogs.ipynb
@@ -3183,7 +3183,7 @@
" # If new request is from the next day, close logger, save profile in-memory and start logger for the next day\n",
" if request_timestamp.day > day_to_log.day:\n",
" # let's write our profiles to whylogs_output folder\n",
" why.write(profile,os.path.join(\"whylogs_output\",\"profile_{}_{}_{}.bin\".format(day_to_log.day,day_to_log.month,day_to_log.year)))\n",
" why.write(profile,\"whylogs_output\",\"profile_{}_{}_{}.bin\".format(day_to_log.day,day_to_log.month,day_to_log.year))\n",
" day_to_log = request_timestamp.replace(hour=0, minute=0, second=0, microsecond=0)\n",
" print(\"Starting logger for day {}....\".format(day_to_log))\n",
" profile = None\n",
@@ -3221,7 +3221,7 @@
" else:\n",
" profile.track(assembled_feature_vector)\n",
" \n",
"why.write(profile,os.path.join(\"whylogs_output\",\"profile_{}_{}_{}.bin\".format(day_to_log.day,day_to_log.month,day_to_log.year)))"
"why.write(profile,\"whylogs_output\",\"profile_{}_{}_{}.bin\".format(day_to_log.day,day_to_log.month,day_to_log.year))"
]
},
{