Add standardized and overridable logging #234

andreped · 2024-02-05T19:58:10Z

This PR adds standardized, overridable logging as referred to in issue #176.

The solution uses Python's native logging module. I only added the simple logging solution and replaced all print statements with the appropriate logging type.

I tested the solution using the Getting_started.ipynb Jupyter Notebook. Logging seems to work as intended.

CC: @justbee007, @zainhoda

andreped · 2024-02-05T20:39:51Z

I realized that setting the log level does not require its own method. You can simply do:

import vanna as vn

vn.logger.get_logger().setLevel(level="INFO")

Perhaps we should update the documentation regarding this? Or do we want anything fancier than this?

Perhaps we want to support an API like this instead:

import vanna as vn

vn.get_logger().setLevel(level="INFO")

This is similar to what TensorFlow does, or at least used to do. Looks nicer, IMO.

Or should we go even simpler:

import vanna as vn

vn.setLogLevel(level="INFO")

Maybe call the method something else, like setVerbosity?

Suggestions, @zainhoda?

andreped · 2024-02-05T21:05:39Z

I added that setVerbosity convenience method as discussed above. So now logging verbosity can be easily set by:

import vanna as vn

vn.setVerbosity(level="DEBUG")

By default the logging level is "INFO".

NickCrews

Thanks for putting this up! Several suggestions, but this is a good start. Thanks for going through and doing the needed boilerplate conversions, and for figuring out which prints should be .info, .warning, .error, etc.

NickCrews · 2024-02-06T19:25:45Z

src/vanna/logger.py

+    Returns:
+        logging: Logger
+    """
+    return logging.getLogger(__name__)


Inside vanna/mistral/mistral.py, if we call get_logger(), I would expect to get the logger for vanna.mistral.mistral. But I think this always returns the logger for "vanna.logger"?

I think we should just remove this function entirely. Callers inside this library should just call import logging; logger = logging.getLogger(__name__) directly, it is the exact same amount of code.

Application developers should just call logging.getLogger("vanna") directly. We should add this to the docs somewhere too.

I'm not sure I follow. I thought the goal of the PR was to have one standardized logger that is fixed for the entire framework?

Have you read the Python logging cookbook? In the library code, we log to the vanna.mistral.mistral logger. Then a user can either configure that logger directly using logging.getLogger("vanna.mistral.mistral"), or configure ALL of vanna's loggers at the same time with logging.getLogger("vanna").
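The logger hierarchy described above can be sketched with the standard library alone. This is a minimal illustration, not code from the PR: the logger name is written out as a string literal to stand in for `__name__` as it would be evaluated inside vanna/mistral/mistral.py.

```python
import logging

# Inside the library, each module would create its own logger. The literal
# name stands in for __name__ as evaluated in vanna/mistral/mistral.py.
module_logger = logging.getLogger("vanna.mistral.mistral")

# An application can configure every vanna logger at once through the parent.
package_logger = logging.getLogger("vanna")
package_logger.setLevel(logging.WARNING)

# The child has no level of its own, so it inherits the parent's level.
print(module_logger.getEffectiveLevel() == logging.WARNING)

# A single module can still be tuned separately without touching the parent.
module_logger.setLevel(logging.DEBUG)
print(module_logger.isEnabledFor(logging.DEBUG))
print(package_logger.isEnabledFor(logging.DEBUG))
```

Because loggers are looked up by dotted name, no shared `get_logger()` helper is needed for the package-wide setting to reach each module.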

NickCrews · 2024-02-06T19:31:07Z

src/vanna/logger.py

+    return get_logger()
+
+
+def setVerbosity(level: str = "INFO"):


Drop this, we don't want multiple ways of doing things. Users should call logging.getLogger("vanna").setLevel(level) instead. Should document this.

Hmm, I disagree. Having a convenience method for fetching the logger used throughout the framework is useful, also for end users. TensorFlow does exactly this, AFAIK.

Sorry, I could have suggested this in a nicer way! I think this is more of a matter of taste, I'm curious if anyone else has thoughts. At the least, I would like this to be def set_log_level(level: str) so that there isn't the verbosity/level mismatch
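For comparison, the two options under discussion might look like this. It is a minimal sketch, assuming a helper named `set_log_level` as suggested above; it is not the code in this PR.

```python
import logging


def set_log_level(level: str = "INFO") -> None:
    """Hypothetical convenience wrapper: set the level of the package logger."""
    # Logger.setLevel accepts level names such as "DEBUG" as well as ints.
    logging.getLogger("vanna").setLevel(level)


# Option 1: the convenience wrapper.
set_log_level("DEBUG")

# Option 2: the direct stdlib call, which has the same effect.
logging.getLogger("vanna").setLevel("DEBUG")
```

Both calls end up on the same `logging.Logger` object, so the wrapper saves typing but adds no behaviour of its own.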

My opinions:

Use snake case, like the rest of the codebase.

setVerbosity currently does only one thing (one line), so as an interface it isn't providing a helpful abstraction for anything. If in the future this function should take care of other things, like pointing to the correct logger, maybe then it might be justified... but that would be building things for a potential use case, not a right-now use case.

Perhaps later, if a real need arises for more complicated logging (across multiple libraries etc.), a convenience function could be a nice addition. But it is best to wait for that situation to arise.

I've not been following this PR for long, so please take that into consideration with the above, these opinions may be uninformed!

Not much point doing anything here until I hear from @zainhoda, IMO.

We can always sketch some solution that works, but it needs to suit what the Vanna team wants. I agree with @NickCrews's proposals, though; I am just waiting for instructions.

NickCrews · 2024-02-06T19:34:14Z

src/vanna/__init__.py

@@ -184,13 +185,15 @@
 )
 from .utils import sanitize_model_name, validate_config_path

+_logger = initialize_logger()


You should not configure the output of the logger at the global level. Then as soon as someone imports this module, the logger will start printing to stdout. Instead, configure it as one of the first things in the actual main() function.

Also, if this is the only place where initialize_logger() is used, it doesn't need to be in that distant module, but just define it directly here.

Then as soon as someone imports this module, the logger will start printing to stdout

I thought the goal was to standardize all prints to the new logging solution? If you import vanna using my fix, all prints will follow this new standard. You are also free to change it yourself from outside the framework, if you'd like, using vn.setVerbosity("DEBUG"). That sounded like behaviour that we would like, no?

hmm I see what you're saying. I think it just comes down to what we want as the default:

default to not configuring the logger, and then users opt in to logging with logging.basicConfig(...) or the more fine-grained logging.getLogger("vanna").do_stuff(...).

default to configuring the logger, and have users opt-out.

Currently the behavior is default-on. But arguably the reason for that is just that we can't turn print() off. Some libraries go for default-on (e.g. splink), others go for default-off, and flask is more nuanced. I prefer quietness, so I prefer to not do anything in library code and let the end app developer configure logging. But I see the pros and cons of all three.

hmm, this is good reading: https://docs.python.org/3/howto/logging.html#configuring-logging-for-a-library

quote from there:

Note It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements.

So at least I think we should lean on the conservative side of adding handlers, maybe doing something similar to flask?

For all CLI/main entrypoints of this lib, we should configure logging of course, so behavior will stay similar to today. But if I'm using vanna as a lib, this is when behavior would change.
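The library/application split recommended by the logging HOWTO could be sketched as follows. This is an illustration under assumptions, not the PR's code: `train_step` and `main` are hypothetical names, and `vanna` is used only as the package-level logger name discussed in this thread.

```python
import io
import logging

# --- library side (what would live in the package) ---
lib_logger = logging.getLogger("vanna")
# Only a NullHandler is attached, so importing the library configures nothing.
lib_logger.addHandler(logging.NullHandler())


def train_step(document: str) -> None:
    """Hypothetical library function that logs instead of printing."""
    lib_logger.info("Adding document: %s", document)


# --- application side (a CLI or main entry point) ---
def main(stream) -> None:
    """Hypothetical entry point: the application opts in to log output.

    `stream` is any text stream, e.g. sys.stderr or an io.StringIO.
    """
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
    lib_logger.addHandler(handler)
    lib_logger.setLevel(logging.INFO)
    train_step("example DDL")


buffer = io.StringIO()
train_step("not shown")  # called before the application configured anything
main(buffer)
print(buffer.getvalue(), end="")
```

The explicit handler is used here only to keep the example self-contained; in a real application, calling `logging.basicConfig(level=logging.INFO)` at the start of `main()` would be the usual shortcut.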

Hmm, I see what you mean. It was a little unclear to me what you wanted the solution to look like. But still, we do not want print() inside the framework, as we cannot mute these calls. A good example is that when training Vanna, the documents and SQL statements are printed directly in the console. This is unfortunate for production use.

I'm fine with removing the _logger solution given the new insight, but I guess I could revert to just doing logging.info() (and similar) and then it is up to the end user to change the logging behaviour.

I can make an attempt at this later today. A bit busy atm.

Really appreciate the work here. I am rereading my review and it sounds a bit harsh, I could have been more graceful, so I'm sorry if I came off as a jerk.

Maybe wait for @zainhoda to chime in before you go through the rewrite? They might have a different idea totally for what they want the result to look like.

Really appreciate the work here. I am rereading my review and it sounds a bit harsh, I could have been more graceful, so I'm sorry if I came off as a jerk.

@NickCrews No worries! I figured it was late in the day ;) I have done some open-source work and learned to take criticism of my bad ideas and code suggestions. So it's all good!

Maybe wait for @zainhoda to chime in before you go through the rewrite? They might have a different idea totally for what they want the result to look like.

Yeah, that sounds like a good idea. I am in no rush. Just ping me if I fail to see an update :]

But it is a little unfortunate that when training Vanna, the training data is printed directly in the console. Not ideal for our production use case...

zainhoda · 2024-02-16T16:37:27Z

@NickCrews @andreped my apologies for the delay here. I haven’t yet had a chance to keep up with this PR and the thread. I’ll review this weekend and come back with thoughts. Thank you so much!

zainhoda · 2024-02-21T15:41:24Z

@NickCrews @andreped thank you so much for your contributions here. If you don't mind, I would like to put a temporary "hold" on this feature while we define it a little more. I'm going to take the suggestions you've come up with and use that as a basis for how we implement this. I'll submit an independent PR for this but I'll make sure you're in the loop.

To give you a sense of what I'm thinking:

We're moving to a model where vn is always an instantiated object that inherits (directly or indirectly) from the VannaBase class
Behavior customizations to Vanna are done by overriding certain methods or potentially adding new methods to the class
I think logging should function in a similar manner where the default option is to print the output and the user can override the method to do "something else" with the logs
A couple of the options for the logs should be sending them to a database as well as sending them to an optional Vanna endpoint. I think this is particularly helpful for reviewing the history of fully hydrated prompts. To do this well, I'll be looking to add some additional parameters to the logging function so that these fields can be routed to a database. A few fields that come to mind are timestamp, question_id, method in addition to the log level and the actual contents of the log.

I might also need to add some functions to retrieve the logs although I'll give that some more thought. Basically I want to be able to optionally support viewing the logs associated with a particular question within the UI
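The override-based design described above might look roughly like this. Everything here is a hypothetical sketch: the class names, the `log` signature, and the in-memory list standing in for a database are assumptions, not Vanna's actual API.

```python
from datetime import datetime, timezone
from typing import Optional


class BaseSketch:
    """Hypothetical stand-in for a base class with an overridable log method."""

    def log(self, message: str, level: str = "INFO",
            question_id: Optional[str] = None,
            method: Optional[str] = None) -> None:
        # Default behaviour: print the log entry.
        print(f"[{level}] {message}")

    def ask(self, question: str, question_id: str) -> None:
        # Library code calls self.log(...) instead of print(...).
        self.log(f"Received question: {question}",
                 question_id=question_id, method="ask")


class RecordingSketch(BaseSketch):
    """Overrides log() to collect structured records instead of printing."""

    def __init__(self) -> None:
        self.records: list = []

    def log(self, message, level="INFO", question_id=None, method=None) -> None:
        self.records.append({
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": level,
            "question_id": question_id,
            "method": method,
            "message": message,
        })

    def logs_for(self, question_id: str) -> list:
        # Retrieve the records associated with one question.
        return [r for r in self.records if r["question_id"] == question_id]


vn = RecordingSketch()
vn.ask("How many customers are there?", question_id="q-1")
vn.ask("What is the total revenue?", question_id="q-2")
print(len(vn.logs_for("q-1")))
```

A subclass that writes to a database or a remote endpoint would replace the list append with its own storage call; the fields shown are the ones named in the comment above.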

andreped · 2024-02-21T16:30:22Z

thank you so much for your contributions here. If you don't mind, I would like to put a temporary "hold" on this feature while we define it a little more. I'm going to take the suggestions you've come up with and use that as a basis for how we implement this. I'll submit an independent PR for this but I'll make sure you're in the loop.

@zainhoda Sounds good. Your argument is sound. Feel free to close this PR and tag it when you have opened a separate PR :]

andreped · 2024-03-18T11:50:05Z

@zainhoda I believe some work on this has already been done and merged. The original issue will remain open until it is fully resolved, but I am closing this PR, as this proposal will not be used.

andreped added 4 commits February 5, 2024 20:35

Added logging convenience methods

6ce66ef

Replaced stdout python print with new logging solution

7ab403e

Fix streamhandler in logging; init logger in vanna __init__

98f6f59

Added documentation to logger

7fceb31

andreped marked this pull request as ready for review February 5, 2024 20:32

Removed redundant set_log_level method

be22354

andreped changed the title ~~Logging~~ Add standardised and overridable logging Feb 5, 2024

andreped mentioned this pull request Feb 5, 2024

Add an overridable logging function #176

Open

andreped added 2 commits February 5, 2024 21:57

Added setVerbosity convenience method in logger

b19ef11

Use str input for setVerbosity instead of int

10c3590

andreped added 2 commits February 5, 2024 22:42

Fixed _logger.info

41b5220

Minor _logger.info multi-argument fixes; typo fixes; linting

57eee60

andreped changed the title ~~Add standardised and overridable logging~~ Add standardized and overridable logging Feb 5, 2024

NickCrews suggested changes Feb 6, 2024

andreped closed this Mar 18, 2024

NickCrews left a comment

NickCrews Feb 6, 2024

NickCrews Feb 6, 2024

andreped Feb 6, 2024

NickCrews Feb 6, 2024

NickCrews Feb 6, 2024

andreped Feb 6, 2024

NickCrews Feb 6, 2024

samoliverschumacher Feb 16, 2024

andreped Feb 16, 2024

NickCrews Feb 6, 2024

andreped Feb 6, 2024

NickCrews Feb 6, 2024

NickCrews Feb 6, 2024

NickCrews Feb 6, 2024

NickCrews Feb 6, 2024

andreped Feb 7, 2024

NickCrews Feb 7, 2024

andreped Feb 7, 2024
