kanye rnn

abaj8494 · Dec 27, 2024 · 86285a9
1 parent 7d32fd7

Showing 25 changed files with 611 additions and 30 deletions.
diff --git a/.hugo_build 2.lock b/.hugo_build 2.lock
diff --git a/content/code b/content/code
@@ -1 +1 @@
-/home/rpi/hugo/static/code/
+/Users/aayushbajaj/Documents/site/static/code
diff --git a/content/projects/_index.org b/content/projects/_index.org
@@ -99,34 +99,38 @@ These projects are all those that have had a lifecycle.
 :PROPERTIES:
 :CUSTOM_ID: supervised-learning
 :END:
-- [[/projects/ai/ml/supervised/mnist][MNIST]]
-- [[/projects/ai/ml/supervised/fmnist][FMNIST (Fashion)]]
-- [[/projects/ai/ml/supervised/kmnist][KMNIST (Kuzushiji)]] 
-- [[/projects/ai/ml/supervised/cifar][CIFAR]]
-- [[/projects/ai/ml/supervised/iris][IRIS]]
-- [[/projects/ai/ml/supervised/imagenet][ImageNet]]
-- [[/projects/ai/ml/supervised/boston-housing][Boston Housing]] 
-- [[/projects/ai/ml/supervised/wine-quality][Wine Quality]]
-- [[/projects/ai/ml/supervised/pima-indians][Pima Indians Diabetes]]
-- [[/projects/ai/ml/supervised/imdb-reviews][IMDB Reviews]]
-- [[/projects/ai/ml/supervised/titanic][Titanic Deaths]]
+- [[/projects/ml/supervised/mnist][MNIST]]
+- [[/projects/ml/supervised/fmnist][FMNIST (Fashion)]]
+- [[/projects/ml/supervised/kmnist][KMNIST (Kuzushiji)]] 
+- [[/projects/ml/supervised/cifar][CIFAR]]
+- [[/projects/ml/supervised/iris][IRIS]]
+- [[/projects/ml/supervised/imagenet][ImageNet]]
+- [[/projects/ml/supervised/boston-housing][Boston Housing]] 
+- [[/projects/ml/supervised/wine-quality][Wine Quality]]
+- [[/projects/ml/supervised/pima-indians][Pima Indians Diabetes]]
+- [[/projects/ml/supervised/imdb-reviews][IMDB Reviews]]
+- [[/projects/ml/supervised/titanic][Titanic Deaths]]
 
 *** [[/projects/ai/unsupervised][Unsupervised Learning]]
 :PROPERTIES:
 :CUSTOM_ID: unsupervised-learning
 :END:
-- [[/projects/ai/ml/unsupervised/kdd-cup][KDD Cup 1999]]
-- [[/projects/ai/ml/unsupervised/digits][Digits]]
+- [[/projects/ml/unsupervised/kdd-cup][KDD Cup 1999]]
+- [[/projects/ml/unsupervised/digits][Digits]]
 
 ** [[/projects/dl][Deep Learning]]
 :PROPERTIES:
 :CUSTOM_ID: deep-learning
 :END:
-- [[/projects/ai/dl/KiTS19][KiTS19 Kidney and Kidney Tumour Segmentation]]
-- [[/projects/ai/dl/llm-tune][Fine Tuning LLM]]
-- [[/projects/ai/dl/rag][RAG]]
-- [[/projects/ai/dl/cnn-scratch][CNN from scratch]]
-- [[/projects/ai/dl/llm-scratch][LLM from scratch]]
-- [[/projects/ai/dl/Kanye-West-RNN][RNN on the Music of Kanye West]]
-- [[/projects/ai/dl/sentiment-analysis][Sentiment Analysis]]
-- [[/projects/ai/dl/cartpole][CartPole]]
+- [[/projects/dl/KiTS19][KiTS19 Kidney and Kidney Tumour Segmentation]]
+- [[/projects/dl/llm-tune][Fine Tuning LLM]]
+- [[/projects/dl/rag][RAG]]
+- [[/projects/dl/cnn-scratch][CNN from scratch]]
+- [[/projects/dl/llm-scratch][LLM from scratch]]
+- [[/projects/dl/Kanye-West-RNN][RNN on the Music of Kanye West]]
+- [[/projects/ai/sentiment-analysis][Sentiment Analysis]]
+- [[/projects/dl/cartpole][CartPole]]
+- Neetcode.io
+- [[/projects/dl/micrograd][Micrograd - Andrej Karpathy]]
+- minGPT - Karpathy
+- nanoGPT - Karpathy
diff --git a/content/projects/csp/peg-solitaire.org b/content/projects/csp/peg-solitaire.org
@@ -4,4 +4,47 @@ categories = ["cs", "projects"]
 tags = ["bfs", "dfs", "memory", "puzzle", "combinatorics"]
 +++
 
-* TODO Import code
+* Personal Motivations
+
+I grew up with this puzzle in my house. My mother could solve it, as could perhaps a couple of relatives on her side of the family.
+
+Mum never knew an algorithm or any technique beyond "my hand just knows", so as a child I spent four days on it before I solved it.
+I learned that the trick is to consider the L shape `___|`: for every group of 4 marbles in that shape, you can make legal moves until only 1 marble remains.
+
+Then, since there are 32 marbles, you do this 8 times until you have 4 left, and finally you do it once more to leave a single peg in the middle of the board.
+
+#+BEGIN_SRC
+    O O O      
+    O O O      
+O O O O O O O  
+O O O . O O O  
+O O O O O O O  
+    O O O      
+    O O O   
+#+END_SRC
+to
+#+BEGIN_SRC
+    · · ·      
+    · · ·      
+· · · · · · ·  
+· · · O · · ·  
+· · · · · · ·  
+    · · ·      
+    · · ·
+#+END_SRC
+
+After battling hard for this solution, I found the Wikipedia [[https://en.wikipedia.org/wiki/Peg_solitaire][article]], only to learn that there are upward of 18,000 distinct solutions.
+
+Fast-forward a little: I can now code, so the directory above contains a *DFS* implementation that searches every possible move until it finds a winning configuration:
+
+`s s w w s w w w w s a a s d d a d d d d a`
+
+Here the letters are the basic `wasd` movements, and each space executes the move.
+
+Internally, the solution is represented as a list of moves:
+`[[3, 3, 's', 10], [2, 3, 'd', 9], [2, 2, 's', 4], [0, 2, 'a', 2], [2, 1, 'a', 1], [2, 2, 'd', 8], [0, 4, 'w', 6], [2, 4, 'a', 12], [1, 2, 'w', 7], [2, 2, 's', 16], [3, 2, 'd', 15], [1, 2, 'w', 3], [1, 4, 'w', 13], [2, 4, 's', 17], [3, 4, 'a', 18], [1, 4, 'w', 11], [3, 2, 'w', 22], [4, 2, 'd', 21], [2, 2, 'w', 27], [3, 2, 's', 20], [3, 4, 'd', 5], [2, 4, 'w', 14], [3, 4, 's', 24], [4, 4, 'a', 25], [4, 5, 'd', 26], [4, 4, 'w', 29], [5, 4, 's', 32], [6, 4, 'd', 31], [4, 4, 'w', 19], [4, 3, 'a', 30], [3, 3, 'w', 23]]`
+
+where the first two elements are the coordinates of the peg being moved, the letter is the direction of the move, and the number is the 'id' of the marble being _killed_.
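A minimal Python sketch of such a depth-first search follows. It is not the code in the directory above, only an illustration under my own assumptions: the board is a set of `(row, col)` holes on a 7×7 grid, a move is recorded as a `(from, to)` pair instead of the `[row, col, direction, id]` format above, and positions already proven unsolvable are cached in a `dead` set so the search never explores them twice.

```python
from typing import FrozenSet, Iterator, List, Optional, Set, Tuple

Pos = Tuple[int, int]   # (row, col) on a 7x7 grid
Move = Tuple[Pos, Pos]  # (from, to); the jumped-over peg is removed

# Up, down, left, right, i.e. w, s, a, d.
DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]


def english_board() -> Set[Pos]:
    """All 33 holes of the English cross-shaped board."""
    return {(r, c) for r in range(7) for c in range(7) if 2 <= r <= 4 or 2 <= c <= 4}


def legal_moves(holes: Set[Pos], pegs: FrozenSet[Pos]) -> Iterator[Tuple[Pos, Pos, Pos]]:
    """Yield (from, over, to) for every jump that is legal in this position."""
    for r, c in sorted(pegs):
        for dr, dc in DIRECTIONS:
            over = (r + dr, c + dc)
            to = (r + 2 * dr, c + 2 * dc)
            if over in pegs and to in holes and to not in pegs:
                yield (r, c), over, to


def solve(holes: Set[Pos], pegs: Set[Pos], target: Optional[Pos] = None) -> Optional[List[Move]]:
    """Depth-first search for a move sequence that leaves exactly one peg.

    If `target` is given, the last peg must end on that hole.
    Returns None when no solution exists.
    """
    dead: Set[FrozenSet[Pos]] = set()  # positions known to be unsolvable
    path: List[Move] = []

    def dfs(state: FrozenSet[Pos]) -> bool:
        if len(state) == 1:
            return target is None or target in state
        if state in dead:
            return False
        for src, over, dst in legal_moves(holes, state):
            path.append((src, dst))
            if dfs((state - {src, over}) | {dst}):
                return True
            path.pop()  # backtrack
        dead.add(state)
        return False

    return list(path) if dfs(frozenset(pegs)) else None
```

On the full board, `solve(english_board(), english_board() - {(3, 3)}, target=(3, 3))` starts from the usual empty centre and returns a list of 31 moves, since every jump removes exactly one of the 32 pegs. It may take a while in pure Python; the dead-state cache avoids re-exploring positions reached by different move orders.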
+
+** Next Steps
+Looking ahead, I want to train an agent to solve this puzzle with reinforcement learning.
diff --git a/content/projects/dl/#Kanye-West-RNN.py# b/content/projects/dl/#Kanye-West-RNN.py#
@@ -0,0 +1,22 @@
+# Fetch Kanye West's songs
+artist = genius.search_artist("Kanye West", max_songs=100, sort="title")
+
+# Save lyrics to a text file
+with open("kanye_lyrics.txt", "w") as file:
+    for song in artist.songs:
+        file.write(song.lyrics + "\n\n")
+
+    #now we clean the data:
+
+# Load raw lyrics
+with open("kanye_lyrics.txt", "r") as file:
+    raw_data = file.read()
+
+  # Clean lyrics
+cleaned_data = re.sub(r"\[.*?\]", "", raw_data)  # Remove metadata like [Chorus]
+cleaned_data = re.sub(r"\s+", " ", cleaned_data)  # Replace multiple spaces with one
+
+# Save cleaned lyrics
+with open("cleaned_kanye_lyrics.txt", "w") as file:
+    file.write(cleaned_data)
+
diff --git a/content/projects/dl/Kanye-West-RNN.org b/content/projects/dl/Kanye-West-RNN.org
@@ -1,5 +1,163 @@
 +++
 title = "Kanye West RNN"
+author = "Aayush Bajaj"
 categories = ["ai", "ml", "music", "supervised"]
 tags = ["rnn"]
 +++
+
+{{< collapse folded="false">}}
+
+* About
+
+This document contains the code for an RNN chatbot that emulates Kanye West's style, trained on his lyrics.
+
+* Setting up the environment
+
+I am starting from scratch on this machine:
+
+#+BEGIN_SRC sh
+/opt/homebrew/bin/neofetch --stdout
+#+END_SRC
+
+#+RESULTS:
+: OS: macOS 15.2 24C101 arm64
+: Host: MacBookPro17,1
+: Kernel: 24.2.0
+: Uptime: 1 day, 22 hours, 56 mins
+: Shell: zsh 5.9
+: Resolution: 3840x2160 @ UHDHz, 2560x1600
+: DE: Aqua
+: WM: Quartz Compositor
+: WM Theme: Blue (Dark)
+: Terminal: Emacs-arm64-11
+: CPU: Apple M1
+: GPU: Apple M1
+: Memory: 1369MiB / 8192MiB
+
+Because this is a fresh machine, I first needed to install conda. I went with the full Anaconda distribution from https://www.anaconda.com/download.
+
+Then I created my environment and installed the required packages:
+
+#+BEGIN_SRC sh
+conda create -n metal python=3.11
+conda activate metal
+conda install numpy pandas
+pip install tensorflow-macos
+pip install lyricsgenius
+#+END_SRC
+
+* Sourcing and cleaning the data
+
+I got an API key from [[https://genius.com][Genius]] and used the lyricsgenius package to pull Kanye's lyrics into a text file:
+
+#+BEGIN_SRC python :tangle yes
+import os
+import re
+
+import lyricsgenius
+
+# Assumption: the Genius API token is stored in the GENIUS_ACCESS_TOKEN
+# environment variable (the variable name is my own choice).
+genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])
+
+# Fetch Kanye West's songs
+artist = genius.search_artist("Kanye West", max_songs=100, sort="title")
+
+# Save lyrics to a text file
+with open("kanye_lyrics.txt", "w", encoding="utf-8") as file:
+    for song in artist.songs:
+        file.write(song.lyrics + "\n\n")
+
+# Now we clean the data.
+# Load raw lyrics
+with open("kanye_lyrics.txt", "r", encoding="utf-8") as file:
+    raw_data = file.read()
+
+# Clean lyrics
+cleaned_data = re.sub(r"\[.*?\]", "", raw_data)  # Remove section tags like [Chorus]
+cleaned_data = re.sub(r"\s+", " ", cleaned_data)  # Collapse all whitespace, newlines included, into single spaces
+
+# Save cleaned lyrics
+with open("cleaned_kanye_lyrics.txt", "w", encoding="utf-8") as file:
+    file.write(cleaned_data)
+#+END_SRC
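To see what the two substitutions actually do, here is a small check on a made-up snippet. The sample text is invented, and the final `.strip()` is an extra step I added, not part of the cleaning code above.

```python
import re


def clean_lyrics(raw: str) -> str:
    """Apply the same two substitutions as the cleaning step, then trim the ends."""
    text = re.sub(r"\[.*?\]", "", raw)  # drop section tags such as [Chorus]
    text = re.sub(r"\s+", " ", text)    # collapse every whitespace run, newlines included
    return text.strip()                 # extra step, not in the original code


sample = "[Chorus]\nHello  world\n\n[Verse 1: Kanye West]\nYeah"
print(clean_lyrics(sample))  # Hello world Yeah
```

A side effect worth knowing: because newlines are collapsed too, the training text loses every line and verse boundary, so the model only ever sees one long stream of words.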
+
+
+
+* TODO Architecture
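As a starting point for this section: the model in the Code section is an embedding layer (100 dimensions) feeding two stacked 128-unit LSTMs and a softmax over the vocabulary. Counting parameters shows where its weights sit. This sketch assumes Keras's standard LSTM parameterisation (four gates, each with input weights, recurrent weights and a bias); the vocabulary size of 10,000 is hypothetical, since the real value depends on the scraped lyrics.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with an input kernel, a recurrent kernel and a bias."""
    return 4 * units * (input_dim + units + 1)


def model_params(vocab_size: int, embed_dim: int = 100, units: int = 128) -> dict:
    """Parameter count per layer for Embedding -> LSTM -> LSTM -> Dense(softmax)."""
    counts = {
        "embedding": vocab_size * embed_dim,
        "lstm_1": lstm_params(embed_dim, units),
        "lstm_2": lstm_params(units, units),
        "dense": units * vocab_size + vocab_size,  # weights + biases
    }
    counts["total"] = sum(counts.values())
    return counts


for name, n in model_params(10_000).items():  # hypothetical vocabulary size
    print(f"{name:>9}: {n:,}")
```

For a vocabulary of V words the total is 229V + 248,832 (2,538,832 when V = 10,000), so once V exceeds roughly 1,100 the embedding and output layers hold more weights than both LSTMs combined. Calling `model.summary()` on the built model is the definitive check.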
+
+* Code
+
+The code below works, but ChatGPT wrote it for me.
+It is a proof of concept for now; I plan to refactor all of it soon.
+
+
+#+BEGIN_SRC python :tangle yes
+import numpy as np
+import tensorflow as tf
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.layers import LSTM, Dense, Embedding
+from tensorflow.keras.preprocessing.text import Tokenizer
+from tensorflow.keras.preprocessing.sequence import pad_sequences
+
+# Load the data
+with open("cleaned_kanye_lyrics.txt", "r") as file:
+    data = file.read()
+
+# Tokenize text
+tokenizer = Tokenizer()
+tokenizer.fit_on_texts([data])
+sequence_data = tokenizer.texts_to_sequences([data])[0]
+
+# Define vocabulary size and window length (49 input words + 1 target word)
+vocab_size = len(tokenizer.word_index) + 1
+sequence_length = 50
+
+# Create overlapping windows; the + 1 keeps the final window
+sequences = []
+for i in range(sequence_length, len(sequence_data) + 1):
+    seq = sequence_data[i - sequence_length:i]
+    sequences.append(seq)
+
+# Convert sequences into numpy array
+sequences = np.array(sequences)
+
+# Split sequences into input (X) and output (y).
+# y stays as integer word ids: sparse_categorical_crossentropy avoids
+# building a (num_sequences x vocab_size) one-hot matrix in memory.
+X, y = sequences[:, :-1], sequences[:, -1]
+
+# Build the RNN Model
+model = Sequential([
+    tf.keras.Input(shape=(sequence_length - 1,)),
+    Embedding(input_dim=vocab_size, output_dim=100),
+    LSTM(units=128, return_sequences=True),
+    LSTM(units=128),
+    Dense(units=vocab_size, activation='softmax')
+])
+
+model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
+
+# Train the Model
+model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)
+model.save('kanye_rnn_model.h5')
+
+# Generate Text
+def generate_text(seed_text, next_words, model, tokenizer, max_sequence_len):
+    for _ in range(next_words):
+        token_list = tokenizer.texts_to_sequences([seed_text])[0]
+        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
+        predicted = model.predict(token_list, verbose=0)
+        output_word = tokenizer.index_word.get(np.argmax(predicted), "")
+        seed_text += " " + output_word
+    return seed_text.strip()
+
+# Chatbot Interface
+if __name__ == "__main__":
+    print("Kanye Bot: Hi, I’m Kanye Bot. What’s on your mind?")
+    while True:
+        user_input = input("You: ")
+        if user_input.lower() == "exit":
+            print("Kanye Bot: Peace out!")
+            break
+        response = generate_text(user_input, next_words=20, model=model, tokenizer=tokenizer, max_sequence_len=sequence_length)
+        print(f"Kanye Bot: {response}")
+
+
+#+END_SRC
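The training data comes from a sliding window over the token stream. A pure-Python sketch of that step on a toy list (hypothetical token ids, no NumPy or TensorFlow) makes the shapes easier to see:

```python
from typing import List, Tuple


def make_windows(tokens: List[int], sequence_length: int) -> Tuple[List[List[int]], List[int]]:
    """Split a token stream into (inputs, target) pairs.

    Each window holds sequence_length tokens: the first sequence_length - 1
    are the input and the last one is the word the model must predict.
    """
    windows = [tokens[i - sequence_length:i] for i in range(sequence_length, len(tokens) + 1)]
    X = [w[:-1] for w in windows]
    y = [w[-1] for w in windows]
    return X, y


X, y = make_windows([1, 2, 3, 4, 5, 6], sequence_length=3)
print(X)  # [[1, 2], [2, 3], [3, 4], [4, 5]]
print(y)  # [3, 4, 5, 6]
```

A stream of n tokens yields n - sequence_length + 1 training pairs; ending the range at `len(tokens)` instead of `len(tokens) + 1` silently drops the final window.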
+
diff --git a/content/projects/dl/Kanye-West-RNN.py b/content/projects/dl/Kanye-West-RNN.py
@@ -0,0 +1,29 @@
+import os
+import re
+
+import lyricsgenius
+import numpy as np
+import pandas as pd
+import tensorflow as tf
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.layers import LSTM, Dense, Embedding
+from tensorflow.keras.preprocessing.text import Tokenizer
+from tensorflow.keras.preprocessing.sequence import pad_sequences
+
+# Assumption: the Genius API token is stored in the GENIUS_ACCESS_TOKEN
+# environment variable (the variable name is my own choice).
+genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])
+
+# Fetch Kanye West's songs
+artist = genius.search_artist("Kanye West", max_songs=100, sort="title")
+
+# Save lyrics to a text file
+with open("kanye_lyrics.txt", "w", encoding="utf-8") as file:
+    for song in artist.songs:
+        file.write(song.lyrics + "\n\n")
+
+# Now we clean the data.
+# Load raw lyrics
+with open("kanye_lyrics.txt", "r", encoding="utf-8") as file:
+    raw_data = file.read()
+
+# Clean lyrics
+cleaned_data = re.sub(r"\[.*?\]", "", raw_data)  # Remove section tags like [Chorus]
+cleaned_data = re.sub(r"\s+", " ", cleaned_data)  # Collapse all whitespace, newlines included, into single spaces
+
+# Save cleaned lyrics
+with open("cleaned_kanye_lyrics.txt", "w", encoding="utf-8") as file:
+    file.write(cleaned_data)
diff --git a/content/projects/dl/Kanye-West-RNN.python b/content/projects/dl/Kanye-West-RNN.python
@@ -0,0 +1,7 @@
+import numpy as np
+import pandas as pd
+import tensorflow as tf
+from tensorflow.keras.models import Sequential
+from tensorflow.keras.layers import LSTM, Dense, Embedding
+from tensorflow.keras.preprocessing.text import Tokenizer
+from tensorflow.keras.preprocessing.sequence import pad_sequences
diff --git a/content/projects/dl/frisbee-stats.org b/content/projects/dl/frisbee-stats.org
@@ -0,0 +1,10 @@
++++
+title = "Non-descriptive Frisbee Statistics"
+categories = ["computer-vision", "dl"]
+tags = ["ultimate-frisbee", "statistics", "non-descriptive"]
++++
+
+** Non-descriptive frisbee stats
+A computer vision model that takes in streamed games and outputs a player statistic that factors in non-descriptive events: for example, making the correct call at the correct time, or poaching in the lane to force a bad throw.
+
+I expect this to be trained using a transformer and written in Python. It is inspired by [[https://github.com/AndyWood91][Andrew Wood's]] analytical Ultimate dream.
diff --git a/content/projects/dl/frisbee-stats.org~ b/content/projects/dl/frisbee-stats.org~
@@ -0,0 +1,10 @@
++++
+title = "Non-descriptive Frisbee Statistics"
+categories = ["computer-vision", "dl"]
+tags = ["ultimate-frisbee", "statistics", "non-descriptive"]
++++
+
+## Non-descriptive frisbee stats
+A computer vision model that takes in streamed games and outputs a player statistic that factors in non-descriptive events --- i.e. giving the correct call at the correct time, or poaching in the lane to force a bad throw.
+
+I expect this to be trained using a transformer and written in Python. It is inspired by [[https://github.com/AndyWood91][Andrew Wood's]] analytical Ultimate dream.
diff --git a/content/projects/dl/micrograd.org b/content/projects/dl/micrograd.org
@@ -0,0 +1,12 @@
++++
+title = "My notes on Andrej Karpathy's micrograd repository"
+tags = ["back-propagation", "gradient-descent", "differentiation"]
++++
+
+#+PROPERTY: HEADER-ARGS:python+ :python /opt/anaconda3/envs/metal/bin/python
+
+#+BEGIN_SRC python
+import tensorflow as tf
+#+END_SRC
+
+#+RESULTS:
diff --git a/content/projects/dl/micrograd.org~ b/content/projects/dl/micrograd.org~
@@ -0,0 +1,5 @@
++++
+title = "My notes on Andrej Karpathy's micrograd repository"
+tags = ["back-propagation", "gradient-descent", "differentiation"]
++++
+