Merge branch 'ht23' of https://github.com/NBISweden/workshop-python into ht23
NBISweden · Oct 13, 2023 · cd66a25
2 parents 8af243a + 9813567
commit cd66a25
Showing 68 changed files with 39,535 additions and 29,182 deletions.
diff --git a/environment.yml b/environment.yml
@@ -14,3 +14,4 @@ dependencies:
   - beautifulsoup4=4.10.0
   - fsspec=2021.10.0
   - openpyxl=3.0.9
+  - scikit-learn=1.3.0
diff --git a/exercises/day2/Day_2_Exercise_ChatGPT.ipynb b/exercises/day2/Day_2_Exercise_ChatGPT.ipynb
@@ -0,0 +1,89 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c8d1efdc",
+   "metadata": {},
+   "source": [
+    "<span style=\"color:green; font-size:30px\">ChatGPT</span><span style=\"color:red; font-size:30px\"> Exercise</span>  \n",
+    "\n",
+    "<br>\n",
+    "ChatGPT can be a tremendous help in writing and understanding code. However, it often makes mistakes, so you cannot always trust the result. Even so, it is worth learning to use it as a tool in our work.\n",
+    "\n",
+    "In this exercise you are given a piece of code that is much more complex than anything you have worked with so far. Because the code has no comments explaining what it does, we will use ChatGPT to help us understand and modify it. You will often receive code, or find code online that you want to use, without fully understanding it. This is where ChatGPT comes in handy.\n",
+    "\n",
+    "This exercise has tasks at different levels of difficulty. Try a few of the tasks below:\n",
+    "\n",
+    "1. Paste the code below into ChatGPT and have it explain, line by line, what the code does\n",
+    "2. Modify the code (on your own, not using ChatGPT) to use 4 clusters instead of 2, and write the results to a file instead of printing them\n",
+    "3. Use ChatGPT to generate code that groups the data into 2 clusters and then splits the larger of the two into 3 subclusters\n",
+    "4. Use ChatGPT to plot the results from both task 2 and task 3 with t-SNE. Try coloring the points by cluster\n",
+    "\n",
+    "If there are parts of ChatGPT's answer that you do not understand, prompt it for further explanation."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4ba19843",
+   "metadata": {},
+   "source": [
+    "### The code to understand and modify:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "50460557",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import numpy as np\n",
+    "from sklearn.cluster import KMeans\n",
+    "\n",
+    "def one_hot_encode(sequence):\n",
+    "    encoding = {'A': [1, 0, 0, 0], 'C': [0, 1, 0, 0], 'G': [0, 0, 1, 0], 'T': [0, 0, 0, 1]}\n",
+    "    return np.array([encoding[nucleotide] for nucleotide in sequence]).flatten()\n",
+    "\n",
+    "def generate_random_dna_sequence(length):\n",
+    "    nucleotides = ['A', 'C', 'G', 'T']\n",
+    "    return ''.join(np.random.choice(nucleotides, size=length))\n",
+    "\n",
+    "def main():\n",
+    "    sequences = [generate_random_dna_sequence(20) for _ in range(50)]\n",
+    "\n",
+    "    encoded_data = [one_hot_encode(sequence) for sequence in sequences]\n",
+    "\n",
+    "    kmeans = KMeans(n_clusters=2, random_state=42)\n",
+    "    clusters = kmeans.fit_predict(encoded_data)\n",
+    "\n",
+    "    for i, sequence in enumerate(sequences):\n",
+    "        cluster_label = clusters[i]\n",
+    "        print(f\"Sequence: {sequence}, Cluster: {cluster_label}\")\n",
+    "\n",
+    "if __name__ == \"__main__\":\n",
+    "    main()"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.4"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/exercises/day4/Day_4_exercise_1_hints.ipynb b/exercises/day4/Day_4_exercise_1_hints.ipynb
@@ -101,7 +101,8 @@
    "source": [
     "This gives us the code\n",
     "```py\n",
-    "    fields = line.split('|')```"
+    "fields = line.split('|')\n",
+    "```"
    ]
   },
   {
@@ -158,7 +159,8 @@
    "source": [
     "The genres are at position 5, which is the second last position\n",
     "```py\n",
-    "genres = fields[5].strip()```\n",
+    "genres = fields[5].strip()\n",
+    "```\n",
     "or\n",
     "```py\n",
     "genres = fields[-2].strip()\n",
@@ -170,7 +172,8 @@
     "to `[\"Action\", \"Drama\"]`  we must split the string at `,`:\n",
     "\n",
     "```py\n",
-    "genres = fields[5].strip().split(',')```"
+    "genres = fields[5].strip().split(',')\n",
+    "```"
    ]
   },
   {
@@ -192,7 +195,8 @@
     "            rating = float(fields[1])\n",
     "            title = fields[-1].strip()\n",
     "            m_year = int(fields[2])\n",
-    "            genres = fields[-2].strip().split(',')```\n",
+    "            genres = fields[-2].strip().split(',')\n",
+    "```\n",
     "        \n",
     "        \n",
     "------------"
@@ -287,7 +291,7 @@
     "            movie_ok = False\n",
     "        if rating_max and rating_max < rating:\n",
     "            movie_ok = False\n",
-    "          ```  "
+    "```  "
    ]
   },
   {
@@ -436,7 +440,7 @@
  ],
  "metadata": {
   "kernelspec": {
-   "display_name": "Python 3",
+   "display_name": "Python 3 (ipykernel)",
    "language": "python",
    "name": "python3"
   },
@@ -450,7 +454,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.6.9"
+   "version": "3.9.4"
   }
  },
  "nbformat": 4,