Commit
Aayush Bajaj authored and committed Dec 27, 2024
1 parent 7d32fd7, commit 86285a9
Showing 25 changed files with 611 additions and 30 deletions.
Empty file.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1 +1 @@ | ||
/home/rpi/hugo/static/code/ | ||
/Users/aayushbajaj/Documents/site/static/code |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,22 @@ | ||
import os | ||
import re | ||
|
||
import lyricsgenius | ||
|
||
||
# Assumption: the Genius API token is exported as GENIUS_ACCESS_TOKEN | ||
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"]) | ||
|
||
# Fetch Kanye West's songs | ||
artist = genius.search_artist("Kanye West", max_songs=100, sort="title") | ||
|
||
||
# Save lyrics to a text file | ||
with open("kanye_lyrics.txt", "w") as file: | ||
    for song in artist.songs: | ||
        file.write(song.lyrics + "\n\n") | ||
|
||
# Now we clean the data: | ||
|
||
||
# Load raw lyrics | ||
with open("kanye_lyrics.txt", "r") as file: | ||
    raw_data = file.read() | ||
|
||
||
# Clean lyrics | ||
cleaned_data = re.sub(r"\[.*?\]", "", raw_data) # Remove metadata like [Chorus] | ||
cleaned_data = re.sub(r"\s+", " ", cleaned_data) # Collapse whitespace runs (including newlines) into one space | ||
|
||
||
# Save cleaned lyrics | ||
with open("cleaned_kanye_lyrics.txt", "w") as file: | ||
    file.write(cleaned_data) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,163 @@ | ||
+++ | ||
title = "Kanye West RNN" | ||
author = "Aayush Bajaj" | ||
categories = ["ai", "ml", "music", "supervised"] | ||
tags = ["rnn"] | ||
+++ | ||
|
||
{{< collapse folded="false">}} | ||
|
||
* About | ||
|
||
This document contains the code to create an RNN chatbot that emulates Kanye West's speech style. | ||
|
||
* Setting up the environment. | ||
|
||
I am starting from scratch on this machine: | ||
|
||
#+BEGIN_SRC sh | ||
/opt/homebrew/bin/neofetch --stdout | ||
#+END_SRC | ||
|
||
#+RESULTS: | ||
| [email protected] | | | | | | | | ||
| ------------------------------------- | | | | | | | | ||
| OS: | macOS | 15.2 | 24C101 | arm64 | | | | ||
| Host: | MacBookPro17,1 | | | | | | | ||
| Kernel: | 24.2.0 | | | | | | | ||
| Uptime: | 1 | day, | 22 | hours, | 56 | mins | | ||
| Shell: | zsh | 5.9 | | | | | | ||
| Resolution: | 3840x2160 | @ | UHDHz, | 2560x1600 | | | | ||
| DE: | Aqua | | | | | | | ||
| WM: | Quartz | Compositor | | | | | | ||
| WM | Theme: | Blue | (Dark) | | | | | ||
| Terminal: | Emacs-arm64-11 | | | | | | | ||
| CPU: | Apple | M1 | | | | | | ||
| GPU: | Apple | M1 | | | | | | ||
| Memory: | 1369MiB | / | 8192MiB | | | | | ||
| | | | | | | | | ||
|
||
That is why I first need to install conda. I went with the whole suite from https://www.anaconda.com/download. | ||
|
||
Then I initialised my environment and installed the correct packages: | ||
|
||
#+BEGIN_SRC sh | ||
conda create -n metal -f metal.yaml python=3.11 | ||
conda activate metal | ||
conda install numpy | ||
conda install pandas | ||
pip install tensorflow-macos | ||
pip install lyricsgenius | ||
#+END_SRC | ||
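A quick way to confirm that the environment picked up these packages is to ask Python whether each one is importable. This is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical, and the package list simply mirrors the install commands above (`tensorflow-macos` is imported as `tensorflow`).

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module names assumed from the conda/pip commands above.
    wanted = ["numpy", "pandas", "tensorflow", "lyricsgenius"]
    missing = missing_packages(wanted)
    print("missing:", missing if missing else "none")
```

Running this inside the activated environment should report nothing missing; anything it lists still needs installing.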
|
||
* Sourcing data and cleaning: | ||
|
||
I got an API key from [[https://genius.com][genius]] to pull Kanye's lyrics into a text file: | ||
|
||
#+BEGIN_SRC python :tangle yes | ||
|
||
import os | ||
import re | ||
|
||
import lyricsgenius | ||
|
||
||
# Assumption: the Genius API token is exported as GENIUS_ACCESS_TOKEN | ||
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"]) | ||
|
||
# Fetch Kanye West's songs | ||
artist = genius.search_artist("Kanye West", max_songs=100, sort="title") | ||
|
||
||
# Save lyrics to a text file | ||
with open("kanye_lyrics.txt", "w") as file: | ||
    for song in artist.songs: | ||
        file.write(song.lyrics + "\n\n") | ||
|
||
# Now we clean the data: | ||
|
||
||
# Load raw lyrics | ||
with open("kanye_lyrics.txt", "r") as file: | ||
    raw_data = file.read() | ||
|
||
||
# Clean lyrics | ||
cleaned_data = re.sub(r"\[.*?\]", "", raw_data) # Remove metadata like [Chorus] | ||
cleaned_data = re.sub(r"\s+", " ", cleaned_data) # Collapse whitespace runs (including newlines) into one space | ||
|
||
||
# Save cleaned lyrics | ||
with open("cleaned_kanye_lyrics.txt", "w") as file: | ||
    file.write(cleaned_data) | ||
#+END_SRC | ||
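To see what the two substitutions do, here is a small sketch that applies the same patterns to a made-up snippet. The helper name `clean_lyrics`, the sample text, and the trailing `.strip()` are my own additions for illustration, not part of the block above.

```python
import re


def clean_lyrics(raw: str) -> str:
    """Drop bracketed section tags, then collapse every whitespace run."""
    without_tags = re.sub(r"\[.*?\]", "", raw)     # removes e.g. [Chorus]
    collapsed = re.sub(r"\s+", " ", without_tags)  # newlines become spaces too
    return collapsed.strip()                       # extra step: trim the ends


# Hypothetical sample, not real lyrics.
sample = "[Verse 1]\nFirst line of the song\n\n[Chorus]\nSecond   line"
print(clean_lyrics(sample))  # -> First line of the song Second line
```

Note that `\s+` also matches newlines, so the cleaned file ends up as one long line and the model never sees line breaks.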
|
||
|
||
|
||
* TODO Architecture | ||
|
||
* Code | ||
|
||
The code below works, but ChatGPT wrote it for me. | ||
For the moment it is mainly a proof of concept. I shall refactor it all soon. | ||
|
||
|
||
#+BEGIN_SRC python :tangle yes | ||
import numpy as np | ||
import tensorflow as tf | ||
from tensorflow.keras.models import Sequential | ||
from tensorflow.keras.layers import LSTM, Dense, Embedding | ||
from tensorflow.keras.preprocessing.text import Tokenizer | ||
from tensorflow.keras.preprocessing.sequence import pad_sequences | ||
|
||
# Load the data | ||
with open("cleaned_kanye_lyrics.txt", "r") as file: | ||
    data = file.read() | ||
|
||
# Tokenize text | ||
tokenizer = Tokenizer() | ||
tokenizer.fit_on_texts([data]) | ||
sequence_data = tokenizer.texts_to_sequences([data])[0] | ||
|
||
# Define vocabulary size and max sequence length | ||
vocab_size = len(tokenizer.word_index) + 1 | ||
sequence_length = 50 | ||
|
||
# Create sequences | ||
sequences = [] | ||
for i in range(sequence_length, len(sequence_data)): | ||
    seq = sequence_data[i - sequence_length:i] | ||
    sequences.append(seq) | ||
|
||
# Convert sequences into numpy array | ||
sequences = np.array(sequences) | ||
|
||
# Split sequences into input (X) and output (y) | ||
X, y = sequences[:, :-1], sequences[:, -1] | ||
y = tf.keras.utils.to_categorical(y, num_classes=vocab_size) | ||
|
||
# Build the RNN Model | ||
model = Sequential([ | ||
    Embedding(input_dim=vocab_size, output_dim=100, input_length=sequence_length - 1), | ||
    LSTM(units=128, return_sequences=True), | ||
    LSTM(units=128), | ||
    Dense(units=vocab_size, activation='softmax') | ||
]) | ||
|
||
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy']) | ||
|
||
# Train the Model | ||
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2) | ||
model.save('kanye_rnn_model.h5') | ||
|
||
# Generate Text | ||
def generate_text(seed_text, next_words, model, tokenizer, max_sequence_len): | ||
    """Greedily append next_words predicted words to seed_text.""" | ||
    for _ in range(next_words): | ||
        token_list = tokenizer.texts_to_sequences([seed_text])[0] | ||
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre') | ||
        predicted = model.predict(token_list, verbose=0) | ||
        # Pick the single most likely word (argmax) | ||
        output_word = tokenizer.index_word.get(np.argmax(predicted), "") | ||
        seed_text += " " + output_word | ||
    return seed_text.strip() | ||
|
||
# Chatbot Interface | ||
if __name__ == "__main__": | ||
    print("Kanye Bot: Hi, I’m Kanye Bot. What’s on your mind?") | ||
    while True: | ||
        user_input = input("You: ") | ||
        if user_input.lower() == "exit": | ||
            print("Kanye Bot: Peace out!") | ||
            break | ||
        response = generate_text(user_input, next_words=20, model=model, tokenizer=tokenizer, max_sequence_len=sequence_length) | ||
        print(f"Kanye Bot: {response}") | ||
|
||
|
||
#+END_SRC | ||
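To make the "Create sequences" and "Split sequences" steps concrete, here is a pure-Python sketch of the same sliding window on toy token IDs. `make_windows` is a hypothetical helper, not part of the code above; it mirrors the loop bounds used there and needs neither NumPy nor TensorFlow.

```python
def make_windows(token_ids, window):
    """Slide a fixed-size window over token_ids.

    Each window is split into window - 1 context tokens (X)
    and the final token as the prediction target (y).
    """
    contexts, targets = [], []
    for i in range(window, len(token_ids)):
        seq = token_ids[i - window:i]
        contexts.append(seq[:-1])
        targets.append(seq[-1])
    return contexts, targets


# Toy IDs standing in for the tokenizer output.
X, y = make_windows([1, 2, 3, 4, 5, 6], window=4)
print(X)  # -> [[1, 2, 3], [2, 3, 4]]
print(y)  # -> [4, 5]
```

With a window of 4, each training row is three context tokens plus one target. Because the loop stops at `len(token_ids) - 1`, the final token (6 here) never appears as a target; extending the range by one would include it.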
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
import os | ||
import re | ||
|
||
import lyricsgenius | ||
|
||
||
# Assumption: the Genius API token is exported as GENIUS_ACCESS_TOKEN | ||
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"]) | ||
|
||
# Fetch Kanye West's songs | ||
artist = genius.search_artist("Kanye West", max_songs=100, sort="title") | ||
|
||
||
# Save lyrics to a text file | ||
with open("kanye_lyrics.txt", "w") as file: | ||
    for song in artist.songs: | ||
        file.write(song.lyrics + "\n\n") | ||
|
||
# Now we clean the data: | ||
|
||
||
# Load raw lyrics | ||
with open("kanye_lyrics.txt", "r") as file: | ||
    raw_data = file.read() | ||
|
||
||
# Clean lyrics | ||
cleaned_data = re.sub(r"\[.*?\]", "", raw_data) # Remove metadata like [Chorus] | ||
cleaned_data = re.sub(r"\s+", " ", cleaned_data) # Collapse whitespace runs (including newlines) into one space | ||
|
||
||
# Save cleaned lyrics | ||
with open("cleaned_kanye_lyrics.txt", "w") as file: | ||
    file.write(cleaned_data) | ||
|
||
import numpy as np | ||
import pandas as pd | ||
import tensorflow as tf | ||
from tensorflow.keras.models import Sequential | ||
from tensorflow.keras.layers import LSTM, Dense, Embedding | ||
from tensorflow.keras.preprocessing.text import Tokenizer | ||
from tensorflow.keras.preprocessing.sequence import pad_sequences |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,7 @@ | ||
import numpy as np | ||
import pandas as pd | ||
import tensorflow as tf | ||
from tensorflow.keras.models import Sequential | ||
from tensorflow.keras.layers import LSTM, Dense, Embedding | ||
from tensorflow.keras.preprocessing.text import Tokenizer | ||
from tensorflow.keras.preprocessing.sequence import pad_sequences |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
+++ | ||
title = "Non-descriptive Frisbee Statistics" | ||
categories = ["computer-vision", "dl"] | ||
tags = ["ultimate-frisbee", "statistics", "non-descriptive"] | ||
+++ | ||
|
||
** Non-descriptive frisbee stats | ||
A computer vision model that takes in streamed games and outputs a player statistic that factors in non-descriptive events --- i.e. giving the correct call at the correct time, or poaching in the lane to force a bad throw. | ||
|
||
I expect this to be trained using a transformer and written in Python. It is inspired by [[https://github.com/AndyWood91][Andrew Wood's]] analytical Ultimate dream. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
+++ | ||
title = "Non-descriptive Frisbee Statistics" | ||
categories = ["computer-vision", "dl"] | ||
tags = ["ultimate-frisbee", "statistics", "non-descriptive"] | ||
+++ | ||
|
||
## Non-descriptive frisbee stats | ||
A computer vision model that takes in streamed games and outputs a player statistic that factors in non-descriptive events --- i.e. giving the correct call at the correct time, or poaching in the lane to force a bad throw. | ||
|
||
I expect this to be trained using a transformer and written in Python. It is inspired by [[https://github.com/AndyWood91][Andrew Wood's]] analytical Ultimate dream. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
+++ | ||
title = "My notes on Andrej Karpathy's micrograd repository" | ||
tags = ["back-propagation", "gradient-descent", "differentiation"] | ||
+++ | ||
|
||
#+PROPERTY: HEADER-ARGS:python+ :python /opt/anaconda3/envs/metal/bin/python | ||
|
||
#+BEGIN_SRC python | ||
import tensorflow as tf | ||
#+END_SRC | ||
|
||
#+RESULTS: |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,5 @@ | ||
+++ | ||
title = "My notes on Andrej Karpathy's micrograd repository" | ||
tags = ["back-propagation", "gradient-descent", "differentiation"] | ||
+++ | ||
|