Commit

kanye rnn
Aayush Bajaj authored and Aayush Bajaj committed Dec 27, 2024
1 parent 7d32fd7 commit 86285a9
Showing 25 changed files with 611 additions and 30 deletions.
Empty file added .hugo_build 2.lock
2 changes: 1 addition & 1 deletion content/code
46 changes: 25 additions & 21 deletions content/projects/_index.org
@@ -99,34 +99,38 @@ These projects are all those that have had a lifecycle.
 :PROPERTIES:
 :CUSTOM_ID: supervised-learning
 :END:
-- [[/projects/ai/ml/supervised/mnist][MNIST]]
-- [[/projects/ai/ml/supervised/fmnist][FMNIST (Fashion)]]
-- [[/projects/ai/ml/supervised/kmnist][KMNIST (Kuzushiji)]]
-- [[/projects/ai/ml/supervised/cifar][CIFAR]]
-- [[/projects/ai/ml/supervised/iris][IRIS]]
-- [[/projects/ai/ml/supervised/imagenet][ImageNet]]
-- [[/projects/ai/ml/supervised/boston-housing][Boston Housing]]
-- [[/projects/ai/ml/supervised/wine-quality][Wine Quality]]
-- [[/projects/ai/ml/supervised/pima-indians][Pima Indians Diabetes]]
-- [[/projects/ai/ml/supervised/imdb-reviews][IMDB Reviews]]
-- [[/projects/ai/ml/supervised/titanic][Titanic Deaths]]
+- [[/projects/ml/supervised/mnist][MNIST]]
+- [[/projects/ml/supervised/fmnist][FMNIST (Fashion)]]
+- [[/projects/ml/supervised/kmnist][KMNIST (Kuzushiji)]]
+- [[/projects/ml/supervised/cifar][CIFAR]]
+- [[/projects/ml/supervised/iris][IRIS]]
+- [[/projects/ml/supervised/imagenet][ImageNet]]
+- [[/projects/ml/supervised/boston-housing][Boston Housing]]
+- [[/projects/ml/supervised/wine-quality][Wine Quality]]
+- [[/projects/ml/supervised/pima-indians][Pima Indians Diabetes]]
+- [[/projects/ml/supervised/imdb-reviews][IMDB Reviews]]
+- [[/projects/ml/supervised/titanic][Titanic Deaths]]
 
 *** [[/projects/ai/unsupervised][Unsupervised Learning]]
 :PROPERTIES:
 :CUSTOM_ID: unsupervised-learning
 :END:
-- [[/projects/ai/ml/unsupervised/kdd-cup][KDD Cup 1999]]
-- [[/projects/ai/ml/unsupervised/digits][Digits]]
+- [[/projects/ml/unsupervised/kdd-cup][KDD Cup 1999]]
+- [[/projects/ml/unsupervised/digits][Digits]]
 
 ** [[/projects/dl][Deep Learning]]
 :PROPERTIES:
 :CUSTOM_ID: deep-learning
 :END:
-- [[/projects/ai/dl/KiTS19][KiTS19 Kidney and Kidney Tumour Segmentation]]
-- [[/projects/ai/dl/llm-tune][Fine Tuning LLM]]
-- [[/projects/ai/dl/rag][RAG]]
-- [[/projects/ai/dl/cnn-scratch][CNN from scratch]]
-- [[/projects/ai/dl/llm-scratch][LLM from scratch]]
-- [[/projects/ai/dl/Kanye-West-RNN][RNN on the Music of Kanye West]]
-- [[/projects/ai/dl/sentiment-analysis][Sentiment Analysis]]
-- [[/projects/ai/dl/cartpole][CartPole]]
+- [[/projects/dl/KiTS19][KiTS19 Kidney and Kidney Tumour Segmentation]]
+- [[/projects/dl/llm-tune][Fine Tuning LLM]]
+- [[/projects/dl/rag][RAG]]
+- [[/projects/dl/cnn-scratch][CNN from scratch]]
+- [[/projects/dl/llm-scratch][LLM from scratch]]
+- [[/projects/dl/Kanye-West-RNN][RNN on the Music of Kanye West]]
+- [[/projects/ai/sentiment-analysis][Sentiment Analysis]]
+- [[/projects/dl/cartpole][CartPole]]
+- Neetcode.io
+- [[/projects/dl/micrograd.org][Micrograd - Andrej Karpathy]]
+- minGPT - Karpathy
+- nanoGPT - Karpathy
45 changes: 44 additions & 1 deletion content/projects/csp/peg-solitaire.org
@@ -4,4 +4,47 @@ categories = ["cs", "projects"]
tags = ["bfs", "dfs", "memory", "puzzle", "combinatorics"]
+++

* TODO Import code
* Personal Motivations

I grew up with this puzzle in my house. My mother could solve it, as could perhaps a couple of relatives on her side of the family.

Mum never knew an algorithm, or any technique beyond "My hand just knows", so as a kid I spent 4 days on it before I solved it.
I learned that the trick is to consider the L shape ~___|~ and realise that for every such group of 4 marbles, you can perform legal moves until you are left with 1 marble.

Then, since there are 32 marbles, you do this 8 times until you have 4 left, and finally you do it once more to leave a single peg in the middle of the board.

#+BEGIN_SRC
    O O O
    O O O
O O O O O O O
O O O · O O O
O O O O O O O
    O O O
    O O O
#+END_SRC
to
#+BEGIN_SRC
    · · ·
    · · ·
· · · · · · ·
· · · O · · ·
· · · · · · ·
    · · ·
    · · ·
#+END_SRC

After battling hard for this solution, I later found the Wikipedia [[https://en.wikipedia.org/wiki/Peg_solitaire][article]] on the puzzle, only to learn that there are upwards of 18,000 distinct solutions.

Anyway, fast-forward a little: now I can code, so the directory above contains a *DFS* implementation that explores the possible moves depth-first until it finds a winning configuration:

~s s w w s w w w w s a a s d d a d d d d a~

Here, each letter is one of the basic ~wasd~ directions, and each space marks the execution of that move.

Ultimately, the game logic represents the solution like this:
~[[3, 3, 's', 10], [2, 3, 'd', 9], [2, 2, 's', 4], [0, 2, 'a', 2], [2, 1, 'a', 1], [2, 2, 'd', 8], [0, 4, 'w', 6], [2, 4, 'a', 12], [1, 2, 'w', 7], [2, 2, 's', 16], [3, 2, 'd', 15], [1, 2, 'w', 3], [1, 4, 'w', 13], [2, 4, 's', 17], [3, 4, 'a', 18], [1, 4, 'w', 11], [3, 2, 'w', 22], [4, 2, 'd', 21], [2, 2, 'w', 27], [3, 2, 's', 20], [3, 4, 'd', 5], [2, 4, 'w', 14], [3, 4, 's', 24], [4, 4, 'a', 25], [4, 5, 'd', 26], [4, 4, 'w', 29], [5, 4, 's', 32], [6, 4, 'd', 31], [4, 4, 'w', 19], [4, 3, 'a', 30], [3, 3, 'w', 23]]~

where the first two values are the coordinates of the peg being moved, the letter is the direction of the move, and the number is the 'id' of the marble being _killed_ (jumped over and removed).
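The depth-first search described above can be sketched as follows. This is a minimal illustration, not the solver in the repository: it assumes the standard 33-hole English board, stores pegs as a ~frozenset~ of ~(row, col)~ holes, assumes ~w~/~s~/~a~/~d~ mean up/down/left/right, and records only ~(row, col, direction)~ per move rather than the marble id used above. It also memoises dead positions so that no failing configuration is searched twice.

```python
# Hypothetical sketch of a DFS peg solitaire solver (not the repository's code).

# Standard 33-hole English board: the union of rows 2-4 and columns 2-4 of a 7x7 grid.
ENGLISH_BOARD = frozenset(
    (r, c) for r in range(7) for c in range(7) if 2 <= r <= 4 or 2 <= c <= 4
)

# Assumed wasd convention: w = up, s = down, a = left, d = right.
DIRECTIONS = {"w": (-1, 0), "s": (1, 0), "a": (0, -1), "d": (0, 1)}


def solve(board, pegs, target):
    """Depth-first search for jumps that leave exactly one peg, on `target`.

    Returns a list of (row, col, direction) moves, or None if no solution exists.
    """
    dead = set()  # peg configurations already proven unsolvable

    def dfs(state):
        if len(state) == 1:
            return [] if target in state else None
        if state in dead:
            return None
        for r, c in sorted(state):
            for name, (dr, dc) in DIRECTIONS.items():
                over = (r + dr, c + dc)          # peg that gets jumped ("killed")
                land = (r + 2 * dr, c + 2 * dc)  # hole the moving peg lands in
                if over in state and land in board and land not in state:
                    rest = dfs((state - {(r, c), over}) | {land})
                    if rest is not None:
                        return [(r, c, name)] + rest
        dead.add(state)
        return None

    return dfs(frozenset(pegs))


# Example on a tiny 1x3 strip: the left peg jumps right over the middle one.
strip = frozenset({(0, 0), (0, 1), (0, 2)})
print(solve(strip, {(0, 0), (0, 1)}, (0, 2)))  # [(0, 0, 'd')]
```

On the full board the call would be ~solve(ENGLISH_BOARD, ENGLISH_BOARD - {(3, 3)}, (3, 3))~; the state space is large, so expect that to take a while in pure Python.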

** Prospects
Looking forward, I want to train an agent to solve this puzzle via reinforcement learning.
22 changes: 22 additions & 0 deletions content/projects/dl/#Kanye-West-RNN.py#
@@ -0,0 +1,22 @@
# Fetch Kanye West's songs
artist = genius.search_artist("Kanye West", max_songs=100, sort="title")

# Save lyrics to a text file
with open("kanye_lyrics.txt", "w") as file:
    for song in artist.songs:
        file.write(song.lyrics + "\n\n")

#now we clean the data:

# Load raw lyrics
with open("kanye_lyrics.txt", "r") as file:
    raw_data = file.read()

# Clean lyrics
cleaned_data = re.sub(r"\[.*?\]", "", raw_data) # Remove metadata like [Chorus]
cleaned_data = re.sub(r"\s+", " ", cleaned_data) # Replace multiple spaces with one

# Save cleaned lyrics
with open("cleaned_kanye_lyrics.txt", "w") as file:
    file.write(cleaned_data)

158 changes: 158 additions & 0 deletions content/projects/dl/Kanye-West-RNN.org
@@ -1,5 +1,163 @@
+++
title = "Kanye West RNN"
author = "Aayush Bajaj"
categories = ["ai", "ml", "music", "supervised"]
tags = ["rnn"]
+++

{{< collapse folded="false">}}

* About

This document contains the code to create an RNN (LSTM) chatbot that emulates the style of Kanye West's lyrics.

* Setting up the environment

I am starting from scratch on this machine:

#+BEGIN_SRC sh :results verbatim
/opt/homebrew/bin/neofetch --stdout
#+END_SRC

#+RESULTS:
: [email protected]
: -------------------------------------
: OS: macOS 15.2 24C101 arm64
: Host: MacBookPro17,1
: Kernel: 24.2.0
: Uptime: 1 day, 22 hours, 56 mins
: Shell: zsh 5.9
: Resolution: 3840x2160 @ UHDHz, 2560x1600
: DE: Aqua
: WM: Quartz Compositor
: WM Theme: Blue (Dark)
: Terminal: Emacs-arm64-11
: CPU: Apple M1
: GPU: Apple M1
: Memory: 1369MiB / 8192MiB

Because this is a fresh machine, I first need to install conda. I went with the full Anaconda distribution from https://www.anaconda.com/download.

Then I created an environment and installed the required packages:

#+BEGIN_SRC sh
conda create -n metal python=3.11
conda activate metal
conda install numpy pandas
pip install tensorflow-macos
pip install lyricsgenius
#+END_SRC

* Sourcing and cleaning the data

I got an API key from [[https://genius.com][Genius]] and used it to pull Kanye's lyrics into a text file:

#+BEGIN_SRC python :tangle yes

import os
import re

import lyricsgenius

# Authenticate with the Genius API (the environment variable name is an assumption)
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])

# Fetch Kanye West's songs
artist = genius.search_artist("Kanye West", max_songs=100, sort="title")

# Save lyrics to a text file
with open("kanye_lyrics.txt", "w", encoding="utf-8") as file:
    for song in artist.songs:
        file.write(song.lyrics + "\n\n")

# Now we clean the data.

# Load raw lyrics
with open("kanye_lyrics.txt", "r", encoding="utf-8") as file:
    raw_data = file.read()

# Clean lyrics
cleaned_data = re.sub(r"\[.*?\]", "", raw_data)  # Remove section labels like [Chorus]
cleaned_data = re.sub(r"\s+", " ", cleaned_data)  # Collapse runs of whitespace (incl. newlines) into one space

# Save cleaned lyrics
with open("cleaned_kanye_lyrics.txt", "w", encoding="utf-8") as file:
    file.write(cleaned_data)
#+END_SRC
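To check what the two substitutions do, here is a small hypothetical helper, ~clean_lyrics~ (not part of the original script), that applies the same regular expressions to a string and then trims the ends. The non-greedy ~.*?~ matters: with a greedy ~.*~, everything between the first ~[~ and the last ~]~ on a line would be deleted, lyrics included. Also, since ~.~ does not match newlines by default, a label broken across two lines is left in place.

```python
import re


def clean_lyrics(raw: str) -> str:
    """Hypothetical helper: the script's two substitutions, plus a final strip()."""
    text = re.sub(r"\[.*?\]", "", raw)  # drop section labels such as [Chorus]
    text = re.sub(r"\s+", " ", text)    # collapse whitespace runs, newlines included
    return text.strip()


sample = "[Chorus]\nHello   world\n\n[Verse 1: Kanye West]\nYo"
print(clean_lyrics(sample))                 # Hello world Yo
print(clean_lyrics("[Intro] hey [Outro]"))  # hey
```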



* TODO Architecture
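As a rough guide to the model's size, the sketch below counts trainable parameters for the stack used in the Code section (a 100-dimensional Embedding, two 128-unit LSTMs, and a softmax Dense layer over the vocabulary). It uses the standard Keras formulas, where each of the LSTM's four gates has input weights, recurrent weights and a bias; the function names and the 10,000-word vocabulary are hypothetical. The total works out to 229V + 248,832 for a vocabulary of size V, so the embedding and output layers dominate once V is large.

```python
def embedding_params(vocab_size, dim):
    return vocab_size * dim  # one dim-sized vector per word id


def lstm_params(input_dim, units):
    # 4 gates (input, forget, cell, output), each with input weights,
    # recurrent weights and a bias vector
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units  # weight matrix plus bias


def kanye_rnn_params(vocab_size):
    """Trainable parameters of Embedding(100) -> LSTM(128) -> LSTM(128) -> Dense(vocab)."""
    return (embedding_params(vocab_size, 100)
            + lstm_params(100, 128)
            + lstm_params(128, 128)
            + dense_params(128, vocab_size))


print(lstm_params(100, 128), lstm_params(128, 128))  # 117248 131584
print(kanye_rnn_params(10_000))                       # 2538832
```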

* Code

The code below works, but ChatGPT wrote it for me.
It is mainly a proof of concept for now; I shall refactor it all soon.


#+BEGIN_SRC python :tangle yes
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Load the data
with open("cleaned_kanye_lyrics.txt", "r", encoding="utf-8") as file:
    data = file.read()

# Tokenize text into word ids (id 0 is reserved for padding)
tokenizer = Tokenizer()
tokenizer.fit_on_texts([data])
sequence_data = tokenizer.texts_to_sequences([data])[0]

# Define vocabulary size and window length
vocab_size = len(tokenizer.word_index) + 1
sequence_length = 50

# Create overlapping windows of `sequence_length` ids (the + 1 keeps the final window)
sequences = []
for i in range(sequence_length, len(sequence_data) + 1):
    seq = sequence_data[i - sequence_length:i]
    sequences.append(seq)

# Convert sequences into numpy array
sequences = np.array(sequences)

# Split each window into input (first 49 ids) and output (the next id).
# y stays as integer ids: sparse_categorical_crossentropy avoids building a
# large one-hot matrix of shape (num_windows, vocab_size).
X, y = sequences[:, :-1], sequences[:, -1]

# Build the RNN Model
model = Sequential([
    Embedding(input_dim=vocab_size, output_dim=100, input_length=sequence_length - 1),
    LSTM(units=128, return_sequences=True),
    LSTM(units=128),
    Dense(units=vocab_size, activation='softmax')
])

model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Train the Model
model.fit(X, y, epochs=20, batch_size=32, validation_split=0.2)
model.save('kanye_rnn_model.h5')

# Generate Text (greedy: always append the most likely next word)
def generate_text(seed_text, next_words, model, tokenizer, max_sequence_len):
    for _ in range(next_words):
        token_list = tokenizer.texts_to_sequences([seed_text])[0]
        token_list = pad_sequences([token_list], maxlen=max_sequence_len - 1, padding='pre')
        predicted = model.predict(token_list, verbose=0)
        output_word = tokenizer.index_word.get(int(np.argmax(predicted)), "")
        seed_text += " " + output_word
    return seed_text.strip()

# Chatbot Interface
if __name__ == "__main__":
    print("Kanye Bot: Hi, I’m Kanye Bot. What’s on your mind?")
    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            print("Kanye Bot: Peace out!")
            break
        response = generate_text(user_input, next_words=20, model=model, tokenizer=tokenizer, max_sequence_len=sequence_length)
        print(f"Kanye Bot: {response}")
#+END_SRC
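The windowing step turns one long id sequence into many (context, next word) training pairs. The toy sketch below follows the same idea in plain NumPy; ~make_windows~ is a hypothetical helper and the ids are made up, and this version keeps the final window as well. With ~sequence_length = 50~, each row of ~X~ would hold 49 ids, matching ~input_length=sequence_length - 1~ in the Embedding layer. Keeping ~y~ as integer ids (rather than one-hot vectors from ~to_categorical~) is the form Keras's ~sparse_categorical_crossentropy~ loss expects.

```python
import numpy as np


def make_windows(token_ids, window):
    """Cut every run of `window` consecutive ids; split into (context, next id)."""
    seqs = np.array([token_ids[i - window:i]
                     for i in range(window, len(token_ids) + 1)])
    return seqs[:, :-1], seqs[:, -1]


X, y = make_windows([4, 8, 15, 16, 23, 42], window=3)
print(X.shape, y.shape)  # (4, 2) (4,)
```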

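~generate_text~ performs greedy autoregressive decoding: at each step it re-tokenises the growing text, left-pads (and, if needed, left-truncates) it to 49 ids, and appends the single most likely next word. The sketch below reproduces that loop on token ids without TensorFlow so it can be tested; ~left_pad~ approximates ~pad_sequences(..., padding='pre')~ with its default ~truncating='pre'~, and ~toy_predict~ is a made-up stand-in for ~model.predict~.

```python
import numpy as np


def left_pad(ids, maxlen, pad=0):
    """Keep the last `maxlen` ids and left-pad with `pad` (padding/truncating 'pre')."""
    ids = list(ids)[-maxlen:]
    return np.array([[pad] * (maxlen - len(ids)) + ids])


def greedy_decode(seed_ids, steps, predict, maxlen):
    """Append argmax(predict(context)) `steps` times, feeding each new id back in."""
    ids = list(seed_ids)
    for _ in range(steps):
        probs = predict(left_pad(ids, maxlen))  # expected shape: (1, vocab_size)
        ids.append(int(np.argmax(probs)))
    return ids


def toy_predict(batch, vocab_size=5):
    """Made-up stand-in for model.predict: all probability on (last id + 1) % vocab_size."""
    probs = np.zeros((1, vocab_size))
    probs[0, (int(batch[0, -1]) + 1) % vocab_size] = 1.0
    return probs


print(greedy_decode([1], steps=3, predict=toy_predict, maxlen=3))  # [1, 2, 3, 4]
```

Because ~argmax~ is deterministic, the same prompt always yields the same reply; sampling from the predicted distribution instead is a common way to get more varied output.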
29 changes: 29 additions & 0 deletions content/projects/dl/Kanye-West-RNN.py
@@ -0,0 +1,29 @@
import os
import re

import lyricsgenius
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences

# Authenticate with the Genius API (the environment variable name is an assumption)
genius = lyricsgenius.Genius(os.environ["GENIUS_ACCESS_TOKEN"])

# Fetch Kanye West's songs
artist = genius.search_artist("Kanye West", max_songs=100, sort="title")

# Save lyrics to a text file
with open("kanye_lyrics.txt", "w", encoding="utf-8") as file:
    for song in artist.songs:
        file.write(song.lyrics + "\n\n")

# Now we clean the data.

# Load raw lyrics
with open("kanye_lyrics.txt", "r", encoding="utf-8") as file:
    raw_data = file.read()

# Clean lyrics
cleaned_data = re.sub(r"\[.*?\]", "", raw_data)  # Remove section labels like [Chorus]
cleaned_data = re.sub(r"\s+", " ", cleaned_data)  # Collapse runs of whitespace into one space

# Save cleaned lyrics
with open("cleaned_kanye_lyrics.txt", "w", encoding="utf-8") as file:
    file.write(cleaned_data)
7 changes: 7 additions & 0 deletions content/projects/dl/Kanye-West-RNN.python
@@ -0,0 +1,7 @@
import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Embedding
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
10 changes: 10 additions & 0 deletions content/projects/dl/frisbee-stats.org
@@ -0,0 +1,10 @@
+++
title = "Non-descriptive Frisbee Statistics"
categories = ["computer-vision", "dl"]
tags = ["ultimate-frisbee", "statistics", "non-descriptive"]
+++

** Non-descriptive frisbee stats
A computer vision model that takes in streamed games and outputs a player statistic that factors in non-descriptive events, e.g. making the correct call at the correct time, or poaching in the lane to force a bad throw.

I expect it to be a transformer-based model written in Python. It is inspired by [[https://github.com/AndyWood91][Andrew Wood's]] analytical Ultimate dream.
10 changes: 10 additions & 0 deletions content/projects/dl/frisbee-stats.org~
12 changes: 12 additions & 0 deletions content/projects/dl/micrograd.org
@@ -0,0 +1,12 @@
+++
title = "My notes on Andrej Karpathy's micrograd repository"
tags = ["back-propagation", "gradient-descent", "differentiation"]
+++

#+PROPERTY: HEADER-ARGS:python+ :python /opt/anaconda3/envs/metal/bin/python

#+BEGIN_SRC python
import tensorflow as tf
#+END_SRC

#+RESULTS:
5 changes: 5 additions & 0 deletions content/projects/dl/micrograd.org~
0 comments on commit 86285a9
