Pull requests: NVIDIA/Megatron-LM

Pull requests list

Fix typo in GPTModel forward function comments
#1391 opened Feb 9, 2025 by Zzhiter
add qkv_bias
#1388 opened Feb 7, 2025 by Chandler-Bing
Update LICENSE
#1382 opened Feb 6, 2025 by maximevtush
Add LongRoPE support
#1377 opened Feb 5, 2025 by 16bitmood
add .sh file
#1373 opened Feb 4, 2025 by umsanmaru
KV-cache for T5 model
#1358 opened Jan 17, 2025 by YK-Fu
fix typo
#1352 opened Jan 10, 2025 by Jintao-Huang
fix param overwrite problem in saver_mcore
#1351 opened Jan 9, 2025 by Force1ess
Fix typo
#1347 opened Jan 4, 2025 by deep-sci
fix bugs of data preprocessing with multiple json keys
#1337 opened Dec 25, 2024 by junjzhang
Create python-package.yml
#1332 opened Dec 21, 2024 by invisiblepancake
Add Mamba TRTLLM support
#1320 opened Dec 12, 2024 by meatybobby
update network interface env
#1319 opened Dec 12, 2024 by lizamd
fix args.mock_data bug caused by func get_blend_and_blend_per_split [stale: no activity in 60 days]
#1306 opened Nov 29, 2024 by 1195343015
[Update] Print training log in rank0
#1296 opened Nov 21, 2024 by shijungg
support qwen2 hf<->mcore ckpt converter
#1290 opened Nov 19, 2024 by wenyujin333
Fix: Resolve multimodal model errors and update README usage instructions [stale]
#1286 opened Nov 13, 2024 by singleheart
Set torch.multiprocessing start method as 'spawn' [stale]
#1285 opened Nov 12, 2024 by hxdtest
Huvu/update t5 attentionmasktype [stale]
#1273 opened Nov 4, 2024 by huvunvidia
Update t5_model.py [stale]
#1271 opened Nov 2, 2024 by huvunvidia
Enable huggingface tokenizer [stale]
#1268 opened Oct 30, 2024 by msiddaiah