This is a chatbot that implements NLP using Python and NLTK.
- Python
- python-NLTK
- python-Flask
- HTML & CSS
- JavaScript
STEPS | DESCRIPTION |
---|---|
"Sentence Tokenization" | A sentence tokenizer splits a paragraph of text into sentences. |
"Word Tokenization" | A word tokenizer splits a paragraph of text into words. |
"Stemming" | Stemming strips the last few characters of a word, which often produces incorrect meanings and misspelled stems. |
"Lemmatization" | Lemmatization considers the context and converts a word to its meaningful base form (its lemma). |
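
The four steps in the table can be illustrated with a small standard-library sketch. It does not use NLTK: `naive_stem` and `naive_lemmatize` are hypothetical stand-ins written only for this example, and the `LEMMAS` table is a made-up substitute for a real lexicon. The point is the contrast between chopping suffixes and looking up a base form.

```python
import re

text = "The dogs are running. She studies hard!"

# Sentence tokenization: split on whitespace that follows ., ! or ?
sentences = re.split(r"(?<=[.!?])\s+", text.strip())

# Word tokenization: lowercase the text and keep runs of letters only
words = re.findall(r"[a-z]+", text.lower())

def naive_stem(word):
    """Hypothetical stemmer: chop a common suffix, no dictionary lookup."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# Made-up lookup table standing in for a real lexicon such as WordNet
LEMMAS = {"dogs": "dog", "are": "be", "running": "run", "studies": "study"}

def naive_lemmatize(word):
    """Hypothetical lemmatizer: return the known base form, else the word."""
    return LEMMAS.get(word, word)

print(sentences)                            # ['The dogs are running.', 'She studies hard!']
print([naive_stem(w) for w in words])       # ['the', 'dog', 'are', 'runn', 'she', 'studi', 'hard']
print([naive_lemmatize(w) for w in words])  # ['the', 'dog', 'be', 'run', 'she', 'study', 'hard']
```

The stems `runn` and `studi` are not real words, while the lemmas `run` and `study` are, which is the difference the table describes.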
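import nltk
import string
# Opens the NLTK downloader; the code below needs the tokenizer models and the WordNet data
nltk.download()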
sent_tokens = nltk.sent_tokenize(text)
word_tokens = nltk.word_tokenize(text)
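lemmer = nltk.stem.WordNetLemmatizer()
def LemTokens(tokens):
    # Lemmatize each token (WordNetLemmatizer treats words as nouns by default)
    return [lemmer.lemmatize(token) for token in tokens]
# Translation table that deletes every punctuation character (requires `import string`)
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)
def LemNormalize(text):
    # Lowercase, strip punctuation, split into words, then lemmatize
    return LemTokens(nltk.word_tokenize(text.lower().translate(remove_punct_dict)))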
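
To see what the punctuation table does on its own, here is a minimal standard-library sketch of the `translate` step used inside `LemNormalize`. The sample sentence is an assumption chosen for illustration.

```python
import string

# Map every ASCII punctuation code point to None so str.translate deletes it
remove_punct_dict = dict((ord(punct), None) for punct in string.punctuation)

cleaned = "Hello, World! How's it going?".lower().translate(remove_punct_dict)
print(cleaned)  # hello world hows it going
```

Note that the apostrophe is removed as well, so contractions such as "How's" collapse into a single token.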
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
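# Build TF-IDF vectors, using LemNormalize as the tokenizer and dropping English stop words
TfidfVec = TfidfVectorizer(tokenizer=LemNormalize, stop_words='english')
tfidf = TfidfVec.fit_transform(sent_tokens)
# Cosine similarity of the last sentence against every sentence (including itself)
vals = cosine_similarity(tfidf[-1], tfidf)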
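
The following is a rough plain-Python sketch of how TF-IDF and cosine similarity can be used to pick a reply. It assumes the last entry in the list is the user's query appended to the known sentences, which the `tfidf[-1]` index above suggests but the source does not state. The helper names `tfidf_vectors` and `cosine`, the sample `corpus`, and the whitespace tokenization are all assumptions made for this example, not the project's code.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # Smoothed idf: ln((1 + n) / (1 + df)) + 1
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
        )
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical knowledge base and user query
corpus = [
    "a chatbot answers questions",
    "python is a programming language",
    "nltk is a toolkit for language processing",
]
query = "what is python"

# Append the query last, then compare it with every corpus sentence
docs = [s.split() for s in corpus + [query]]
vectors = tfidf_vectors(docs)
scores = [cosine(vectors[-1], v) for v in vectors[:-1]]

best = max(range(len(scores)), key=scores.__getitem__)
print(corpus[best])  # python is a programming language
```

A score of zero for every sentence means the query shares no terms with the corpus, which is a natural point for a chatbot to return a fallback message.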