Skip to content

Easily view and modify JSON datasets for large language models

License

Notifications You must be signed in to change notification settings

LostRuins/datasetexplorer

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

32 Commits
 
 
 
 
 
 

Repository files navigation

Concedo's Dataset Explorer

Easily view and modify JSON and JSONL datasets for training large language models

image

Features

  • Easily view and modify JSON and JSONL datasets for training large language models
  • Supports Alpaca (Instruct), ShareGPT, and Text formats (and more)
  • Runs fully portable from your web browser, as a single file with zero other dependencies
  • Browse through your training datasets, with easy search and filter functions to segment your data
  • Supports searching and filtering with regex search or simple substrings search
  • Filter multiple samples by contents, length, matches, and number of turns. Allows combining multiple queries for composite results.
  • Includes an N-gram viewer to inspect selected examples for word frequency and repetition (word cloud)
  • Allows splitting and merging datasets by selecting desired subsets with different criteria.
  • Allows easy dataset deduplication
  • Includes a simple inline editor to modify individual samples or correct typos.
  • Pick individual samples or bulk-combine groups of them to curate your dataset, and save the results as a new JSON dataset
  • Fast and efficient, comfortably handles small to medium sized datasets of up to 400 MB. For larger datasets, it's recommended to split them first.
  • Fully open source, capable of running completely offline (just save the HTML file)

Free and open source. Try now at https://lostruins.github.io/datasetexplorer

Tips

  • JSON > Parquet
  • Alpaca > ChatML
  • Kobo > !Kobo

About

Easily view and modify JSON datasets for large language models

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages