patrickfleith/datapipes

Datapipes

Simple guides to use LLMs and create datasets.

Note

All notebooks were last checked and updated on February 6, 2025.

I try to execute them once a month to make sure they still work on Google Colab. If you find any bugs or issues, please let me know! I'll try to fix them as soon as possible.

  1. Browse the notebooks.
  2. Open them in Google Colab using the links.
  3. In some notebooks, you may need to set API keys or your Hugging Face token to interact with LLMs from OpenAI, Anthropic, and Google, or with the Hugging Face Hub.
  4. Run, explore, and modify to suit your needs!
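
For step 3, one possible way to load a key is sketched below. It reads an environment variable first and falls back to Google Colab's secret store. The helper name `get_secret` and the variable name `OPENAI_API_KEY` are assumptions for illustration; the notebooks may load keys differently.

```python
import os


def get_secret(name: str) -> str:
    """Return a secret from an environment variable, else from Colab's secret store."""
    value = os.environ.get(name)
    if value:
        return value
    try:
        # google.colab is only importable inside a Google Colab runtime.
        from google.colab import userdata
    except ImportError as exc:
        raise RuntimeError(
            f"{name} is not set; export it or add it to Colab's Secrets panel."
        ) from exc
    return userdata.get(name)


# Hypothetical usage; the variable name is an assumption, not taken from the notebooks.
# api_key = get_secret("OPENAI_API_KEY")
```

Keeping the key out of the notebook cells means the notebook can be shared or committed without leaking credentials.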

LLM Usage Basics

How to use LLMs as a beginner?

Proprietary models

  1. How to use an OpenAI Chat model
  2. How to use an Anthropic Claude model
  3. How to use a Google Gemini model

Structured Output

Proprietary models

  1. Structured Output with OpenAI
  2. Structured Output with Anthropic
  3. Structured Output with Google Gemini
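
As a rough illustration of what structured output means in practice: the model is asked to reply with JSON in a fixed shape, and the caller parses and checks that reply before using it. The sketch below uses only the standard library and a hard-coded stand-in reply, so no provider API is called. The `Question` fields and the `parse_question` helper are hypothetical and not taken from the notebooks.

```python
import json
from dataclasses import dataclass


@dataclass
class Question:
    text: str
    difficulty: str


def parse_question(raw_reply: str) -> Question:
    """Parse a JSON reply and check it has the expected fields and types."""
    data = json.loads(raw_reply)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"text", "difficulty"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    if not all(isinstance(data[key], str) for key in ("text", "difficulty")):
        raise ValueError("fields must be strings")
    return Question(text=data["text"], difficulty=data["difficulty"])


# Stand-in for a model reply; no API call is made in this sketch.
reply = '{"text": "What is a token?", "difficulty": "easy"}'
question = parse_question(reply)
```

Each provider offers its own mechanism for requesting a schema-conforming reply; the validation step on the caller's side stays similar regardless.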

LLM Dataset Creation

Queries Generation

  1. Simple Question Generation with Distilabel and OpenAI
  2. Getting Started with Genstruct7B
  3. SelfInstruct with Distilabel and OpenAI

Text Classification Datasets Generation

  1. Fluff Detector Text Classification Dataset Generation

LLM Evaluation

Automated Metrics

  1. Evaluation_101.ipynb

How to Use


Created with ❤️ by Patrick Fleith.
