Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add standardized and overridable logging #234

Closed
wants to merge 9 commits into from

Conversation

andreped
Copy link
Contributor

@andreped andreped commented Feb 5, 2024

This PR adds standardized, overridable logging as referred to in issue #176.

The solution is using native python logging. I only added the simple logging solution and replaced all print statements with the appropriate logging type.

I tested the solution using the Getting_started.ipynb Jupyter Notebook. Logging seems to work as intended.

CC: @justbee007, @zainhoda

@andreped andreped marked this pull request as ready for review February 5, 2024 20:32
@andreped
Copy link
Contributor Author

andreped commented Feb 5, 2024

I realized that setting the log level does not require its own method. You can simply do:

import vanna as vn

vn.logger.get_logger().set_level(level="INFO")

Perhaps we should update the documentations regarding this? Or do we want anything fancier than this.

Perhaps we want to support an API like this instead:

import vanna as vn

vn.get_logger().setLevel(level="INFO")

This is similar to what TensorFlow does, or at least used to do. Looks nicer, IMO.

Or should we go even simpler:

import vanna as vn

vn.setLogLevel(level="INFO")

Maybe call the method something else, like setVerbosity?

Suggestions, @zainhoda?

@andreped andreped changed the title Logging Add standardised and overridable logging Feb 5, 2024
@andreped
Copy link
Contributor Author

andreped commented Feb 5, 2024

I added that setVerbosity convenience method as discussed above. So now logging verbosity can be easily set by:

import vanna as vn

vn.setVerbository(level="DEBUG")

By default the logging level is "INFO".

@andreped andreped changed the title Add standardised and overridable logging Add standardized and overridable logging Feb 5, 2024
Copy link

@NickCrews NickCrews left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this up! Several suggestions, but this is a good start, thanks for going through and doing the needed boilerplate conversions, figuring out which prints should be .info, .warn, .error, etc.

Returns:
logging: Logger
"""
return logging.getLogger(__name__)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If inside vanna/mistral/mistral.py, if we call .get_logger(), I would expect to get the logger for vanna.mistral.mistral. But I think this always return the logger for "vanna.logger"?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just remove this function entirely. Callers inside this library should just call import logging; logger = logging.getLogger(__name__) directly, it is the exact same amount of code.

Application developers should just call logging.getLogger("vanna") directly. We should add this to the docs somewhere too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not sure I follow. I thought the goal of the PR is to have one standardized logger that is fixed for the entire framework?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you read the python logging cookbook? in the library code, we log to the vanna.mistral.mistral logger. Then, a user can either configure that logger directly using logging.getLogger("vanna.mistral.mistral"), or they configure ALL of vanna's loggers at the same time with logging.getLogger("vanna")

return get_logger()


def setVerbosity(level: str = "INFO"):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Drop this, we don't want multiple ways of doing things. Users should call logging.getLogger("vanna").setLevel(level) instead. Should document this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I disagree. Having a convenience method for fetching the logging used throughout the framework is useful. Also for end users. TensorFlow does exactly this, AFAIK

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I could have suggested this in a nicer way! I think this is more of a matter of taste, I'm curious if anyone else has thoughts. At the least, I would like this to be def set_log_level(level: str) so that there isn't the verbosity/level mismatch

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinions;

  • use snake case, like the rest of codebase
  • setVerbosity currently does only 1 thing (1 line), so this as an interface isn't providing a helpful abstraction for anything. If in the future, this function should take care of other things, like pointing to the correct logger, maybe then it might be justified... but that would be building things for a potential use case, not a right-now use case.
  • perhaps later, if a real need arises to have more complicated logging (across multiple libraries etc), then maybe a convenience func could be a nice addition. but best wait for that situation to arise.

I've not been following this PR for long, so please take that into consideration with the above, these opinions may be uninformed!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not much point doing anything here until I hear from @zainhoda, IMO.

We can always sketch some solution that works, but it needs to suit what the Vanna team wants. But I agree with @NickCrews proposals, I am just waiting for instructions.

@@ -184,13 +185,15 @@
)
from .utils import sanitize_model_name, validate_config_path

_logger = initialize_logger()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you should not configure the output of the logger at the global level. Then as soon as someone inports this module, the logger will start printing to stdout. Instead, configure it as one of the first things in the actual main() function.

Also, if this is the only place where initialize_logger() is used, it doesn't need to be in that distant module, but just define it directly here.

Copy link
Contributor Author

@andreped andreped Feb 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then as soon as someone inports this module, the logger will start printing to stdout

I thought the goal was to standardize all prints to the new logging solution? If you import vanna using my fix, all prints will follow this new standard. You are also free to change it yourself from outside the framework, if you'd like, using vn.setVerbosity("DEBUG"). That sounded like behaviour that we would like, no?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm I see what you're saying. I think it just comes down to what we want as the default:

  • default to not configuring the logger, and then users opt-in to logging with logging.basicConfig(...) or the more finetipped logging.getLogger("vanna").do_stuff(...).
  • default to configuring the logger, and have users opt-out.

Currently the behavior is default-on. But arguably the reason for that is just because we can't turn print() off. Some libraries go for default on (eg splink), others go for default off, flask is more nuanced. I prefer quietness, so I prefer to not do anything in library code, and let the end app developer configure logging. But I see the pros/cons of all 3.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

quote from there:

Note It is strongly advised that you do not add any handlers other than NullHandler to your library’s loggers. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements.

So at least I think we should lean on the conservative side of adding handlers, maybe doing something similar to flask?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all CLI/main entrypoints of this lib, we should configure logging of course, so behavior will stay similar to today. But if I'm using vanna as a lib, this is when behavior would change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see what you mean. It was a little unclear to me what the you wanted to solution to look like. But still, we do not want print() inside the framework, as we cannot mute these. A good example is that when training Vanna, the documents and SQLs will be printed directly in the console. This is unfortunate for production use.

I'm fine with removing the _logger solution given the new insight, but I guess I could revert to just doing logging.info() (and similar) and then it is up to the end user to change the logging behaviour.

I can make an attempt at this later today. A bit busy atm.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate the work here. I am rereading my review and it sounds a bit harsh, I could have been more graceful, so I'm sorry if I came off as a jerk.

Maybe wait for @zainhoda to chime in before you go through the rewrite? They might have a different idea totally for what they want the result to look like.

Copy link
Contributor Author

@andreped andreped Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate the work here. I am rereading my review and it sounds a bit harsh, I could have been more graceful, so I'm sorry if I came off as a jerk.

@NickCrews No worries! I figured it was late in the day ;) Have done some open-source and learned to take some criticism on my bad ideas and code suggestions. So its all good!

Maybe wait for @zainhoda to chime in before you go through the rewrite? They might have a different idea totally for what they want the result to look like.

Yeah, that sounds like a good idea. I am in no rush. Just ping me if I fail to see an update :]

But it is a little unfortunate that when training Vanna, the training data is printed directly in the console. Not ideal for our production use case...

@zainhoda
Copy link
Contributor

@NickCrews @andreped my apologies for the delay here. I haven’t yet had a chance to keep up with this PR and the thread. I’ll review this weekend and come back with thoughts. Thank you so much!

@zainhoda
Copy link
Contributor

@NickCrews @andreped thank you so much for your contributions here. If you don't mind, I would like to put a temporary "hold" on this feature while we define it a little more. I'm going to take the suggestions you've come up with and use that as a basis for how we implement this. I'll submit an independent PR for this but I'll make sure you're in the loop.

To give you a sense of what I'm thinking:

  • We're moving to a model where vn is always an instantiated object that inherits (directly or indirectly) from the VannaBase class
  • Behavior customizations to Vanna are done by overriding certain methods or potentially adding new methods to the class
  • I think logging should function in a similar manner where the default option is to print the output and the user can override the method to do "something else" with the logs
  • A couple of the options for the logs should be sending them to a database as well as sending them to an optional Vanna endpoint. I think this is particularly helpful for reviewing the history of fully hydrated prompts. To do this well, I'll be looking to add some additional parameters to the logging function so that these fields can be routed to a database. A few fields that come to mind are timestamp, question_id, method in addition to the log level and the actual contents of the log.

I might also need to add some functions to retrieve the logs although I'll give that some more thought. Basically I want to be able to optionally support viewing the logs associated with a particular question within the UI

@andreped
Copy link
Contributor Author

andreped commented Feb 21, 2024

thank you so much for your contributions here. If you don't mind, I would like to put a temporary "hold" on this feature while we define it a little more. I'm going to take the suggestions you've come up with and use that as a basis for how we implement this. I'll submit an independent PR for this but I'll make sure you're in the loop.

@zainhoda Sounds good. Your argument is sound. Feel free to close this PR and tag it when you have opened a separate PR :]

@andreped
Copy link
Contributor Author

@zainhoda I believe there has been done some work on this already, which have already been merged. The original issue will remain open till the issue is entirely closed, but I am closing this PR, as this proposal will not be used.

@andreped andreped closed this Mar 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants