RWKV implementation with pure torch and deepspeed.

hanlinxuy/RWKV_UNKNOWN

RWKV_UNKNOWN

This is my implementation of the RWKV language model.

cd src
deepspeed main.py --deepspeed --deepspeed_config=./configs/ds_config.config
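
The command above points DeepSpeed at a JSON config file. The contents of this repository's ./configs/ds_config.config are not shown here, so the following is only a minimal sketch of what such a file could contain. The helper name make_ds_config and every value in it (batch sizes, learning rate, ZeRO stage, bf16) are illustrative assumptions, not the author's settings. The keys themselves are standard DeepSpeed config keys.

```python
import json


def make_ds_config(micro_batch_size=4, grad_accum_steps=8, world_size=1, lr=1e-4):
    """Build a minimal DeepSpeed config dict (hypothetical helper, illustrative values)."""
    return {
        # DeepSpeed expects: train_batch_size ==
        #   micro batch per GPU * gradient accumulation steps * number of GPUs
        "train_batch_size": micro_batch_size * grad_accum_steps * world_size,
        "train_micro_batch_size_per_gpu": micro_batch_size,
        "gradient_accumulation_steps": grad_accum_steps,
        # Assumed optimizer choice; the repository may configure this differently.
        "optimizer": {"type": "AdamW", "params": {"lr": lr}},
        # Assumed precision and ZeRO stage, chosen only for the example.
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": 2},
    }


# Serialize to the JSON text that a DeepSpeed config file would hold.
config_text = json.dumps(make_ds_config(), indent=2)
print(config_text)
```

Writing config_text to a file and passing its path via --deepspeed_config would be one way to try different settings, assuming main.py accepts the standard DeepSpeed arguments as the command above suggests.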

TODO

  • jsonl data loading
  • ckpt saving/loading
  • logging dump
  • config dump
  • megatron
  • rwkv5
  • attention mask

Reference

  • RWKV: Parallelizable RNN with Transformer-level LLM Performance (pronounced as "RwaKuv", from 4 major params: R W K V)
  • Data preprocessor from TrainChatGalRWKV
  • neromous for the initial code.
  • RWKV-infctx-trainer for the model initialization code.
