French Verbs Transformation #250

Open. Wants to merge 2 commits into base: main

20 changes: 20 additions & 0 deletions transformations/french_synonym_verbs_transformation/README.md
@@ -0,0 +1,20 @@
# Verb Synonym Substitution 🦎 + ⌨️ → 🐍


This transformation replaces verbs (words whose POS tag is VERB) with synonyms in simple French sentences. It requires spacy_lefff (an extension of spaCy for French POS tagging and lemmatization) and the nltk package with the Open Multilingual WordNet dictionary.

Authors : Lisa Barthe and Louanes Hamla from Fablab by Inetum in Paris

Review comment (Contributor):

Please add an email address. You could use this style:

  • Author name:
  • Author email:
  • Author Affiliation:


## What type of transformation is it?
This transformation creates French paraphrases by changing one word. The general meaning of the sentence is preserved, and a sentence can yield several paraphrases, each differing by one verb.

## Supported Task

This perturbation can be used for any French task.

## What does it intend to benefit?

This perturbation would benefit all tasks that take a sentence, paragraph, or document as input (text classification, text generation, etc.) and require synthetic data augmentation or diversification.

## What are the limitations of this transformation?
This tool does not take the general context into account, so the output will sometimes not match the general sense of the sentence.
@@ -0,0 +1,2 @@
from .transformation import *

Binary file not shown.
52 changes: 52 additions & 0 deletions transformations/french_synonym_verbs_transformation/test.json
@@ -0,0 +1,52 @@
{
    "type": "french_verbs_transformation",
    "test_cases": [
        {
            "class": "FrenchVerbsSynonymTransformation",
            "inputs": {
                "sentence": "je vais finir ce devoir avant demain"
            },
            "outputs": [{
                "sentence": "je vais terminer ce devoir avant demain"
            }]
        },
        {
            "class": "FrenchVerbsSynonymTransformation",
            "inputs": {
                "sentence": "Puis-je entrer ? Cela fait 10 minutes que je suis en face."
            },
            "outputs": [{
                "sentence": "Puis-je venir ? Cela fait 10 minutes que je suis en face."
            }]
        },
        {
            "class": "FrenchVerbsSynonymTransformation",
            "inputs": {
                "sentence": "Les psychologues vont devoir calmer les tensions"
            },
            "outputs": [{
                "sentence": "Les psychologues vont devoir soulager les tensions"
            }]
        },
        {
            "class": "FrenchVerbsSynonymTransformation",
            "inputs": {
                "sentence": "J'ai enfin pu faire remorquer la voiture !"
            },
            "outputs": [{
                "sentence": "J'ai enfin pu faire rouler la voiture !"

Review comment (Contributor):

Do you change only one verb per instance, i.e., do you generate only one additional sentence per instance?

            }]
        }
    ]
}
@@ -0,0 +1,83 @@
from textblob import TextBlob, Blobber, Word
import re
from textblob_fr import PatternTagger, PatternAnalyzer
import nltk
nltk.download('wordnet')
from textblob.wordnet import NOUN, VERB, ADV, ADJ
import spacy
from spacy_lefff import LefffLemmatizer, POSTagger
from spacy.language import Language
from nltk.corpus import wordnet
import nltk
nltk.download('omw')

Review comment (Contributor):

Maybe you could add nltk to initialize.py, in a similar way to spacy.
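
A minimal sketch of what deferring the downloads could look like, assuming the goal is to avoid calling `nltk.download` unconditionally at import time. The helper name `ensure_resources` and the resource paths are illustrative assumptions, not code from this PR or from initialize.py. The lookup and download callables are passed in so the helper can be exercised without network access.

```python
from typing import Callable, Iterable, List, Tuple

# (lookup path, package id) pairs. Assumed values for the two corpora
# that this PR downloads at import time.
NLTK_RESOURCES: List[Tuple[str, str]] = [
    ("corpora/wordnet", "wordnet"),
    ("corpora/omw", "omw"),
]


def ensure_resources(
    resources: Iterable[Tuple[str, str]],
    find: Callable[[str], object],
    download: Callable[[str], object],
) -> List[str]:
    """Download only the resources that `find` cannot locate.

    `find` is expected to raise LookupError for a missing resource,
    which is how nltk.data.find behaves.
    Returns the package ids that had to be downloaded.
    """
    downloaded = []
    for path, package in resources:
        try:
            find(path)
        except LookupError:
            download(package)
            downloaded.append(package)
    return downloaded
```

With nltk installed, the call would be `ensure_resources(NLTK_RESOURCES, nltk.data.find, nltk.download)`, placed wherever the shared models are initialized.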


from interfaces.SentenceOperation import SentenceOperation
from tasks.TaskTypes import TaskType

@Language.factory('french_lemmatizer')
def create_french_lemmatizer(nlp, name):
    return LefffLemmatizer()

@Language.factory('POSTagger')
def create_POSTagger(nlp, name):
    return POSTagger()


nlp = spacy.load('fr_core_news_md')

Review comment (Contributor):

You might want to use spacy like this.


nlp.add_pipe('POSTagger', name ='pos')
nlp.add_pipe('french_lemmatizer', name='lefff', after='pos')

def synonym_transformation(text):
    doc = nlp(text)
    verbs = [d.text for d in doc if d.pos_ == "VERB"]
    synonyms_verb_list = []
    for i in verbs :
        dict_verb_synonyms = {}
        dict_verb_synonyms['verb'] = i
        dict_verb_synonyms['synonyms'] = list(set([l.name() for syn in wordnet.synsets(i, lang = 'fra', pos = VERB) for l in syn.lemmas('fra')]))
        if len(dict_verb_synonyms['synonyms']) > 0:
            synonyms_verb_list.append(dict_verb_synonyms)
    valid_verb_list = []
    for j in synonyms_verb_list:
        for k in j['synonyms']:
            valid_verb_dict = {}
            valid_verb_dict['verb'] = j['verb']
            valid_verb_dict['syn'] = k
            if nlp(j['verb']).similarity(nlp(k)) > .60 and not nlp(j['verb']).similarity(nlp(k)) >= .999:
                valid_verb_list.append(valid_verb_dict)
    text_verb_generated = []
    pertu=[]
    for l in valid_verb_list:
        text_verb_generated.append(text.replace(l['verb'], l['syn']))
    text_verb_generated.sort(reverse=True)
    for sent in text_verb_generated:
        if nlp(text).similarity(nlp(i)) > .10 and not nlp(text).similarity(nlp(i)) >= .999:
            pertu.append(sent)
            break

    return pertu
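
One thing to watch in the function above: `text.replace(l['verb'], l['syn'])` substitutes every occurrence of the verb as a substring, so a short verb can also be rewritten inside a longer word. A minimal sketch of a whole-word replacement with the standard `re` module follows. The helper name `replace_word` is hypothetical and not part of this PR.

```python
import re


def replace_word(text: str, word: str, replacement: str) -> str:
    """Replace `word` only where it appears as a whole word."""
    # \b matches a word boundary; re.escape protects any special
    # characters in the verb. For str patterns, \b is Unicode-aware,
    # so accented letters count as word characters.
    pattern = r"\b" + re.escape(word) + r"\b"
    # A function replacement avoids backslash processing in `replacement`.
    return re.sub(pattern, lambda match: replacement, text)
```

For example, `replace_word("je vais définir puis finir", "finir", "terminer")` leaves "définir" untouched, whereas `str.replace` would turn it into "déterminer".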





class FrenchVerbsSynonymTransformation(SentenceOperation):
    tasks = [
        TaskType.TEXT_CLASSIFICATION,
        TaskType.TEXT_TO_TEXT_GENERATION,
        TaskType.TEXT_TAGGING,
    ]
    languages = ["fr"]

Review comment (Contributor):

Please add some keywords too.

    def __init__(self, seed=0, max_outputs=1):
        super().__init__(seed, max_outputs=max_outputs)

Review comment (Contributor):

You don't use the max_outputs parameter. Does that mean you are generating all possible candidates?


    def generate(self, sentence : str):
        perturbed_texts = synonym_transformation(
            sentence
        )
        print("perturbed text inside of class",perturbed_texts)
        return perturbed_texts
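
On the `max_outputs` question: because of the `break` in `synonym_transformation`, `pertu` holds at most one sentence, so the parameter currently has no effect. If the function were changed to return every accepted candidate, the list could be capped as in this minimal sketch. `cap_outputs` is a hypothetical helper, not part of this PR.

```python
from typing import List


def cap_outputs(candidates: List[str], max_outputs: int) -> List[str]:
    """Return at most `max_outputs` candidates, dropping duplicates in order."""
    # dict.fromkeys keeps the first occurrence of each sentence, in order.
    unique = list(dict.fromkeys(candidates))
    # A negative limit is treated as zero rather than slicing from the end.
    return unique[:max(max_outputs, 0)]
```

Inside `generate`, this would amount to `return cap_outputs(perturbed_texts, self.max_outputs)`, assuming the base class stores the constructor argument as `self.max_outputs`.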