Datapipes

Learn how to use LLMs and create datasets with simple and reproducible notebooks.

Note

All notebooks were last checked and updated on February 6, 2025.

I try to execute them once a month to make sure they still work on Google Colab. If you find any bugs or issues, please let me know! I'll try to fix them as soon as possible.

  1. Browse the notebooks.
  2. Open them in Google Colab using the links.
  3. Some notebooks require an API key to call LLMs from OpenAI, Anthropic, or Google, or a Hugging Face token to access the Hugging Face Hub.
  4. Run, explore, and modify to suit your needs!
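
For step 3, here is a minimal sketch of one way to load a key inside a notebook. It is not taken from the notebooks themselves: the helper name `load_secret` is hypothetical, and the variable names in the comments (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `HF_TOKEN`) are assumptions, so check each notebook for the names it expects. The helper looks in the environment first, then in Colab's secrets panel, and finally asks for the value with a hidden prompt.

```python
import os
from getpass import getpass


def load_secret(name: str) -> str:
    """Return the secret called `name` and export it as an environment variable.

    Lookup order (an assumption of this sketch, not a Colab requirement):
    1. an existing environment variable,
    2. the Colab secrets panel (only available inside Google Colab),
    3. an interactive hidden prompt.
    """
    value = os.environ.get(name)

    if not value:
        try:
            # google.colab only exists inside a Colab runtime.
            from google.colab import userdata

            value = userdata.get(name)
        except Exception:
            # Not running in Colab, or the secret is missing / not shared
            # with this notebook: fall through to the prompt.
            value = None

    if not value:
        value = getpass(f"Enter {name}: ")

    # Export it so client libraries that read the environment can find it.
    os.environ[name] = value
    return value


# Example usage (uncomment the lines you need):
# load_secret("OPENAI_API_KEY")
# load_secret("ANTHROPIC_API_KEY")
# load_secret("HF_TOKEN")
```

Using a hidden prompt or the Colab secrets panel keeps the key out of the notebook cells, which helps when you share or commit a notebook.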

LLM Usage Basics

How to use LLMs as a beginner?

Proprietary models

  1. How to use an OpenAI Chat model
  2. How to use an Anthropic Claude model
  3. How to use a Google Gemini model

Structured Output

Proprietary models

  1. Structured Output with OpenAI
  2. Structured Output with Anthropic
  3. Structured Output with Google Gemini

LLM Datasets

Dataset Creation

  1. Simple Question Generation with Distilabel and OpenAI
  2. Getting Started with Genstruct7B
  3. SelfInstruct with Distilabel and OpenAI

Dataset Tooling

  1. Everything you need to know to work with the 🤗 Datasets library

Dataset Transformation

  1. Semantically Filter Existing Datasets for a Domain-Specific Project

Text Classification Dataset Generation

  1. Fluff Detector Text Classification Dataset Generation

LLM Evaluation

Automated Metrics

  1. Evaluation_101.ipynb

How to Use


Created with ❤️ by Patrick Fleith.