Merge pull request #45 from cvs-health/release-branch/v0.2.0
Release PR: v0.2.0
dylanbouchard authored Nov 21, 2024
2 parents 9cbfc71 + d634800 commit 679464c
Showing 34 changed files with 1,751 additions and 1,453 deletions.
Binary file modified assets/images/autoeval_process.png
179 changes: 125 additions & 54 deletions examples/evaluations/text_generation/auto_eval_demo.ipynb
@@ -39,7 +39,7 @@
"name": "stderr",
"output_type": "stream",
"text": [
"/opt/conda/envs/brand-new3/lib/python3.9/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
"/opt/conda/envs/langchain/lib/python3.11/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n",
" from .autonotebook import tqdm as notebook_tqdm\n"
]
}
@@ -54,6 +54,7 @@
"\n",
"import pandas as pd\n",
"from dotenv import find_dotenv, load_dotenv\n",
"from langchain_core.rate_limiters import InMemoryRateLimiter\n",
"\n",
"from langfair.auto import AutoEval\n",
"\n",
@@ -152,14 +153,14 @@
"- `responses` - (**list of strings, default=None**)\n",
"A list of generated output from an LLM. If not available, responses are computed using the model.\n",
"- `langchain_llm` (**langchain llm (Runnable), default=None**) A langchain llm object to get passed to LLMChain `llm` argument. \n",
"- `max_calls_per_min` (**int, default=None**) Specifies how many api calls to make per minute to avoid a rate limit error. By default, no limit is specified.\n",
"- `suppressed_exceptions` (**tuple, default=None**) Specifies which exceptions to handle as 'Unable to get response' rather than raising the exception\n",
"- `metrics` - (**dict or list of str, default is all metrics**)\n",
"Specifies which metrics to evaluate.\n",
"- `toxicity_device` - (**str or torch.device input or torch.device object, default=\"cpu\"**)\n",
"Specifies the device that toxicity classifiers use for prediction. Set to \"cuda\" for classifiers to be able to leverage the GPU. Currently, 'detoxify_unbiased' and 'detoxify_original' will use this parameter.\n",
"- `neutralize_tokens` - (**bool, default=True**)\n",
"An indicator attribute to use masking for the computation of Blue and RougeL metrics. If True, counterfactual responses are masked using `CounterfactualGenerator.neutralize_tokens` method before computing the aforementioned metrics.\n",
"- `max_calls_per_min` (**Deprecated as of 0.2.0**) Use LangChain's InMemoryRateLimiter instead.\n",
"\n",
"**Class Methods:**\n",
"1. `evaluate` - Compute supported metrics.\n",
@@ -181,9 +182,32 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Below we use LangFair's `AutoEval` class to conduct a comprehensive bias and fairness assessment for our text generation/summarization use case. To instantiate the `AutoEval` class, provide prompts and LangChain LLM object. We provide two examples of LangChain LLMs below, but these can be replaced with a LangChain LLM of your choice.\n",
"Below we use LangFair's `AutoEval` class to conduct a comprehensive bias and fairness assessment for our text generation/summarization use case. To instantiate the `AutoEval` class, provide prompts and LangChain LLM object. \n",
"\n",
"**Important:** When installing community packages for LangChain, please ensure that the package version is compatible with `langchain<0.2.0`. Incompatibility may lead to unexpected errors or issues."
"**Important note: We provide three examples of LangChain LLMs below, but these can be replaced with a LangChain LLM of your choice.**"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Use LangChain's InMemoryRateLimiter to avoid rate limit errors. Adjust parameters as necessary.\n",
"rate_limiter = InMemoryRateLimiter(\n",
" requests_per_second=10, \n",
" check_every_n_seconds=10, \n",
" max_bucket_size=1000, \n",
")"
]
},
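The limiter above behaves as a token bucket: `requests_per_second` sets the refill rate, `max_bucket_size` caps bursts, and blocked callers re-check for a token every `check_every_n_seconds`. A small standalone sketch of that behavior; the parameter values here are chosen only to make the throttling visible and are not taken from the notebook:

```python
import time

from langchain_core.rate_limiters import InMemoryRateLimiter

demo_limiter = InMemoryRateLimiter(
    requests_per_second=2,      # token-bucket refill rate
    check_every_n_seconds=0.1,  # polling interval while waiting for a token
    max_bucket_size=2,          # maximum burst size
)

start = time.time()
for i in range(5):
    demo_limiter.acquire()  # blocks until a token is available
    print(f"request {i} released at {time.time() - start:.2f}s")
```

With two requests per second, the five calls should spread over roughly two seconds after the initial burst.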
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### Example 1: Gemini Pro with VertexAI"
]
},
{
@@ -194,13 +218,52 @@
},
"outputs": [],
"source": [
"# # Run if langchain-google-vertexai not installed (must be compatible with langchain<0.2.0). Note: kernel restart may be required.\n",
"# # Run if langchain-google-vertexai not installed. Note: kernel restart may be required.\n",
"# import sys\n",
"# !{sys.executable} -m pip install langchain-google-vertexai==0.1.3\n",
"# !{sys.executable} -m pip install langchain-google-vertexai\n",
"\n",
"# # Example with Gemini-Pro on VertexAI\n",
"# from langchain_google_vertexai import VertexAI\n",
"# llm = VertexAI(model_name='gemini-pro', temperature=1)"
"# llm = VertexAI(model_name='gemini-pro', temperature=1, rate_limiter=rate_limiter)\n",
"\n",
"# # Define exceptions to suppress\n",
"# suppressed_exceptions = (IndexError, ) # suppresses error when gemini refuses to answer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### Example 2: Mistral AI"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# # Run if langchain-mistralai not installed. Note: kernel restart may be required.\n",
"# import sys\n",
"# !{sys.executable} -m pip install langchain-mistralai\n",
"\n",
"# os.environ[\"MISTRAL_API_KEY\"] = os.getenv('M_KEY')\n",
"# from langchain_mistralai import ChatMistralAI\n",
"\n",
"# llm = ChatMistralAI(\n",
"# model=\"mistral-large-latest\",\n",
"# temperature=1,\n",
"# rate_limiter=rate_limiter\n",
"# )\n",
"# suppressed_exceptions = None"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###### Example 3: OpenAI on Azure"
]
},
{
@@ -211,11 +274,10 @@
},
"outputs": [],
"source": [
"# Run if langchain-openai not installed (must be compatible with langchain<0.2.0)\n",
"# # Run if langchain-openai not installed\n",
"# import sys\n",
"# !{sys.executable} -m pip install langchain-openai==0.1.6\n",
"# !{sys.executable} -m pip install langchain-openai\n",
"\n",
"# Example with AzureChatOpenAI\n",
"import openai\n",
"from langchain_openai import AzureChatOpenAI\n",
"llm = AzureChatOpenAI(\n",
@@ -224,8 +286,19 @@
" azure_endpoint=API_BASE,\n",
" openai_api_type=API_TYPE,\n",
" openai_api_version=API_VERSION,\n",
" temperature=1 # User to set temperature\n",
")"
" temperature=1, # User to set temperature\n",
" rate_limiter=rate_limiter\n",
")\n",
"\n",
"# Define exceptions to suppress\n",
"suppressed_exceptions = (openai.BadRequestError, ValueError) # this suppresses content filtering errors"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instantiate `AutoEval` class"
]
},
{
@@ -236,14 +309,13 @@
},
"outputs": [],
"source": [
"# import torch\n",
"# device = torch.device(\"cuda\") # use if GPU is available\n",
"auto_object = AutoEval(\n",
"# import torch # uncomment if GPU is available\n",
"# device = torch.device(\"cuda\") # uncomment if GPU is available\n",
"ae = AutoEval(\n",
" prompts=prompts, # small sample used as an example; in practice, a bigger sample should be used\n",
" langchain_llm=llm,\n",
" suppressed_exceptions=(openai.BadRequestError, ValueError), # this suppresses content filtering errors \n",
" neutralize_tokens=True\n",
" # toxicity_device=device # use if GPU is available\n",
" suppressed_exceptions=suppressed_exceptions,\n",
" # toxicity_device=device # uncomment if GPU is available\n",
")"
]
},
@@ -267,43 +339,42 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\u001b[1mStep 1: Fairness Through Unawareness\u001b[0m\n",
"------------------------------------\n",
"langfair: Number of prompts containing race words: 0\n",
"- langfair: The prompts satisfy fairness through unawareness for race words, the recommended risk assessment only include Toxicity\n",
"langfair: Number of prompts containing gender words: 31\n",
"- langfair: The prompts do not satisfy fairness through unawareness for gender words, the recommended risk assessments include Toxicity, Stereotype, and Counterfactual Discrimination.\n",
"\u001b[1mStep 1: Fairness Through Unawareness Check\u001b[0m\n",
"------------------------------------------\n",
"Number of prompts containing race words: 0\n",
"Number of prompts containing gender words: 31\n",
"Fairness through unawareness is not satisfied. Toxicity, stereotype, and counterfactual fairness assessments will be conducted.\n",
"\n",
"\u001b[1mStep 2: Generate Counterfactual Dataset\u001b[0m\n",
"---------------------------------------\n",
"langfair: gender words found in 31 prompts.\n",
"Gender words found in 31 prompts.\n",
"Generating 25 responses for each gender prompt...\n",
"langfair: Responses successfully generated!\n",
"Responses successfully generated!\n",
"\n",
"\u001b[1mStep 3: Generating Model Responses\u001b[0m\n",
"----------------------------------\n",
"langfair: Generating 25 responses per prompt...\n",
"langfair: Responses successfully generated!\n",
"Generating 25 responses per prompt...\n",
"Responses successfully generated!\n",
"\n",
"\u001b[1mStep 4: Evaluate Toxicity Metrics\u001b[0m\n",
"---------------------------------\n",
"langfair: Computing toxicity scores...\n",
"langfair: Evaluating metrics...\n",
"Computing toxicity scores...\n",
"Evaluating metrics...\n",
"\n",
"\u001b[1mStep 5: Evaluate Stereotype Metrics\u001b[0m\n",
"-----------------------------------\n",
"langfair: Computing stereotype scores...\n",
"langfair: Evaluating metrics...\n",
"Computing stereotype scores...\n",
"Evaluating metrics...\n",
"\n",
"\u001b[1mStep 6: Evaluate Counterfactual Metrics\u001b[0m\n",
"---------------------------------------\n",
"langfair: Evaluating metrics...\n"
"Evaluating metrics...\n"
]
}
],
"source": [
"warnings.filterwarnings(\"ignore\")\n",
"results = await auto_object.evaluate()"
"results = await ae.evaluate()"
]
},
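`evaluate` is a coroutine, so the notebook awaits it on Jupyter's already-running event loop. In a plain Python script, where no loop is running, the equivalent call would be driven with `asyncio`; a minimal sketch, assuming `ae` is constructed as above:

```python
# Equivalent invocation outside Jupyter, where top-level await is unavailable.
import asyncio

results = asyncio.run(ae.evaluate())
```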
{
@@ -325,27 +396,27 @@
"output_type": "stream",
"text": [
"\u001b[1m1. Toxicity Assessment\u001b[0m \n",
"- Toxic Fraction 0.0024 \n",
"- Expected Maximum Toxicity 0.0131 \n",
"- Toxic Fraction 0.0052 \n",
"- Expected Maximum Toxicity 0.0151 \n",
"- Toxicity Probability 0.0100 \n",
"\u001b[1m2. Stereotype Assessment\u001b[0m \n",
"- Stereotype Association 0.3534 \n",
"- Cooccurrence Bias 0.5349 \n",
"- Stereotype Fraction - gender 0.1892 \n",
"- Expected Maximum Stereotype - gender 0.3604 \n",
"- Stereotype Probability - gender 0.5800 \n",
"- Stereotype Association 0.3349 \n",
"- Cooccurrence Bias 0.5677 \n",
"- Stereotype Fraction - gender 0.2092 \n",
"- Expected Maximum Stereotype - gender 0.4009 \n",
"- Stereotype Probability - gender 0.6300 \n",
"\u001b[1m3. Counterfactual Assessment\u001b[0m \n",
" male-female \n",
"- Cosine Similarity 0.8403 \n",
"- RougeL Similarity 0.5065 \n",
"- Bleu Similarity 0.2784 \n",
"- Sentiment Bias 0.0040 \n",
"- Cosine Similarity 0.8675 \n",
"- RougeL Similarity 0.5190 \n",
"- Bleu Similarity 0.2791 \n",
"- Sentiment Bias 0.0022 \n",
"\n"
]
}
],
"source": [
"auto_object.print_results()"
"ae.print_results()"
]
},
{
Expand All @@ -363,7 +434,7 @@
},
"outputs": [],
"source": [
"auto_object.export_results(file_name=\"final_metrics.txt\")"
"ae.export_results(file_name=\"final_metrics.txt\")"
]
},
{
Expand Down Expand Up @@ -463,7 +534,7 @@
}
],
"source": [
"toxicity_data = pd.DataFrame(auto_object.toxicity_data)\n",
"toxicity_data = pd.DataFrame(ae.toxicity_data)\n",
"toxicity_data.sort_values(by='score', ascending=False).head()"
]
},
@@ -564,7 +635,7 @@
}
],
"source": [
"stereotype_data = pd.DataFrame(auto_object.stereotype_data)\n",
"stereotype_data = pd.DataFrame(ae.stereotype_data)\n",
"stereotype_data.sort_values(by='stereotype_score_gender', ascending=False).head()"
]
},
@@ -662,15 +733,15 @@
],
"metadata": {
"environment": {
"kernel": "langfair0.1.2-beta",
"kernel": "langchain",
"name": "workbench-notebooks.m125",
"type": "gcloud",
"uri": "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/workbench-notebooks:m125"
},
"kernelspec": {
"display_name": "langfair0.1.2-beta",
"display_name": "langchain",
"language": "python",
"name": "langfair0.1.2-beta"
"name": "langchain"
},
"language_info": {
"codemirror_mode": {
Expand All @@ -682,7 +753,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.20"
"version": "3.11.10"
}
},
"nbformat": 4,
