```shell
git clone https://github.com/wangchunliu/DRS-pretrained-LMM.git
cd DRS-pretrained-LMM
```
```python
# A case of DRS-to-text generation
from tokenization_mlm import MLMTokenizer
from transformers import MBartForConditionalGeneration

# For DRS parsing, src_lang should be set to en_XX, de_DE, it_IT, or nl_XX
tokenizer = MLMTokenizer.from_pretrained('laihuiyuan/DRS-LMM', src_lang='<drs>')
model = MBartForConditionalGeneration.from_pretrained('laihuiyuan/DRS-LMM')

# gold text: The court is adjourned until 3:00 p.m. on March 1st.
inp_ids = tokenizer.encode(
    "court.n.01 time.n.08 EQU now adjourn.v.01 Theme -2 Time -1 Finish +1 time.n.08 ClockTime 15:00 MonthOfYear 3 DayOfMonth 1",
    return_tensors="pt")

# For DRS parsing, the forced bos token here should be <drs>
forced_ids = tokenizer.encode("en_XX", add_special_tokens=False, return_tensors="pt")
outs = model.generate(input_ids=inp_ids, forced_bos_token_id=forced_ids.item(),
                      num_beams=5, max_length=150)
text = tokenizer.decode(outs[0].tolist(), skip_special_tokens=True,
                        clean_up_tokenization_spaces=False)
```
Preprocess
Remove unused tokens based on the training text data, and then update the tokenizer and the model's embeddings.
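The pruning step can be sketched as follows. This is a minimal illustration, not the repo's actual preprocessing script: `prune_vocab` and its inputs are hypothetical names, and the resulting old-id-to-new-id map would then be used to gather the kept rows of the model's embedding matrix.

```python
# Hypothetical helper: keep only tokens that occur in the training text
# (plus special tokens), renumbering ids contiguously so the model's
# embedding rows can be gathered in the same order.
def prune_vocab(vocab, corpus, specials=("<s>", "</s>", "<pad>", "<unk>")):
    used = set(specials)
    for line in corpus:
        used.update(line.split())
    # preserve the original id order of the kept tokens
    kept = sorted((tok for tok in vocab if tok in used), key=vocab.get)
    new_vocab = {tok: i for i, tok in enumerate(kept)}
    id_map = {vocab[tok]: i for i, tok in enumerate(kept)}  # old id -> new id
    return new_vocab, id_map

vocab = {"<s>": 0, "</s>": 1, "<pad>": 2, "<unk>": 3, "cat": 4, "dog": 5, "xyz": 6}
new_vocab, id_map = prune_vocab(vocab, ["cat dog", "dog"])
# "xyz" never appears in the corpus, so it is dropped; the rest are renumbered
```

With a real model you would then slice the embedding weights with the kept old ids (e.g. `model.model.shared.weight[old_ids]` for an mBART-style model) before saving the updated tokenizer and checkpoint.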
You can download the original tokenizer and model from Hugging Face
here,
and download the setting files (or both the updated tokenizer and model)
here.