diff --git a/2024/_sources/exercise_solutions/exercise_1/exercise_1.1_solution.ipynb.txt b/2024/_sources/exercise_solutions/exercise_1/exercise_1.1_solution.ipynb.txt new file mode 100644 index 00000000..6394ced9 --- /dev/null +++ b/2024/_sources/exercise_solutions/exercise_1/exercise_1.1_solution.ipynb.txt @@ -0,0 +1,100 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.1: Command line exercises\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this exercise, you will play around with the command line on your machine and get more familiar with it.\n", + "\n", + "**a)** Let's play around with some options for the `ls` command. First `cd` into a directory that has some interesting files in it (like `~git/bootcamp/command_line_tutorial`). Try the following if you are using `bash`.\n", + "\n", + " ls -F\n", + " ls -G # Might not be as cool with Git Bash on Windows\n", + " ls -l\n", + " ls -lh\n", + " ls -lS\n", + " ls -FGLh\n", + " \n", + "You should be able to infer what these different options do, but you can ask the course staff as well.\n", + "\n", + "Normally, files that begin with a dot (`.`) are omitted when listing things. They are also generally omitted when you use your OS's GUI-based file handling system (like Finder on Macs). To see them, use `ls -a`. So, `cd` into your home directory (you remember how to do that, right?), and then do\n", + "\n", + " ls -a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** The nuclear option to delete *everything* in a directory is `rm -rf`. The `r` means to delete recursively, and the `f` means to \"force\" deletion. I was going to give you an exercise that uses the nuclear option, but I'm not going to do that. So, just forget I said anything. For this part of the problem, I want you to discuss with someone else in the class *when* the nuclear option might be used, and what needs to be in place before exercising it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Try doing this if you are using macOS or Linux:\n", + "\n", + " ls /\n", + " \n", + "What is `/`? Try `cd`-ing there and seeing what's in there. **Do not delete anything!**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "This problem more or less consisted of messing around with the command line." + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/_sources/exercise_solutions/exercise_1/exercise_1.2_solution.ipynb.txt b/2024/_sources/exercise_solutions/exercise_1/exercise_1.2_solution.ipynb.txt new file mode 100644 index 00000000..bbb3a8f2 --- /dev/null +++ b/2024/_sources/exercise_solutions/exercise_1/exercise_1.2_solution.ipynb.txt @@ -0,0 +1,128 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.2: Making an rc file\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Having a `.bashrc` or `.zshrc` file allows you to configure your shell how you like.\n", + "\n", + "**a)** If you are using Linux or macOS, open a terminal and type\n", + "\n", + " echo $SHELL\n", + " \n", + "This will tell you if you are using a Bash shell or Zsh, which will tell you which kind of rc file to set up in the next part of the exercise. If you are using Windows, you will create a `.bashrc` file.\n", + "\n", + "**b)** Create a `.bashrc` or `.zshrc` file in your home directory. If you already have one, open it up for editing using Jupyter's text editor.\n", + "\n", + "**c)** It is often useful to `alias` functions to other functions. For example, I am always worried I will accidentally delete things by accident. I therefore have the following line in my `.zshrc` file.\n", + "\n", + " alias rm=\"rm -i\"\n", + " \n", + "You should create aliases for commands like `ls` based on the flags you like to *always* use. Do the same for `rm` and `mv` (I use the `-i` flag with these). To figure out what flags are available, you can look at the `man` pages. Asking Google will usually give you the information you need on flags.\n", + "\n", + "If you like, you can use my `.bashrc` file, available in `~/git/bootcamp/misc/jb_bashrc`, or my `.zshrc` file, available in `~/git/bootcamp/misc/jb_zshrc`.\n", + "\n", + "**d)** Depending on your operating system, if you are using Bash, your `~/.bashrc` file may or may not be properly loaded upon opening a new bash shell. You may, e.g. for new macOS versions, need to explicitly source your `.bashrc` file in your `~/.bash_profile` file. Therefore, you should add the following to the bottom of your `~/.bash_profile` file.\n", + "\n", + "```bash\n", + "if [ -f $HOME/.bashrc ]; then\n", + " . $HOME/.bashrc\n", + "fi\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "Again, this was mostly you messing around with the command line. The contents of my .bashrc file are shown below.\n", + "\n", + "```bash\n", + "# Give me a nice prompt that tells me my pwd\n", + "export PS1=\"\\[\\e[1;32m\\]\\u\\[\\e[0m\\]@\\e[1;36m\\]\\h\\[\\e[0m\\] [\\w]\\n% \"\n", + "\n", + "\n", + "# Keep me out of trouble!\n", + "alias rm=\"rm -i\"\n", + "alias mv=\"mv -i\"\n", + "alias cp=\"cp -i\"\n", + "\n", + "\n", + "# customize list output\n", + "alias ls=\"ls -FGh\"\n", + "export LSCOLORS=\"gxfxcxdxCxegedabagacad\"\n", + "```\n", + "\n", + "And my .zshrc file is:\n", + "\n", + "```zsh\n", + "# This is a nice prompt; gives green check mark if last command executed\n", + "# without a problem and gives a red questionmark with an exit code\n", + "# if it didn't, along with pwd.\n", + "PROMPT='%(?.%F{green}√.%F{red}?%?)%f [%B%F{240}%10~%f%b] \n", + "%# '\n", + "\n", + "# Aliases for save moving, removing, and copying of files\n", + "alias rm=\"rm -i\"\n", + "alias mv=\"mv -i\"\n", + "alias cp=\"cp -i\"\n", + "\n", + "# Nicely formatted listing\n", + "alias ls=\"ls -FGh\"\n", + "\n", + "# Nice set of colors\n", + "export LSCOLORS=\"gxfxcxdxCxegedabagacad\"\n", + "```" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/_sources/exercise_solutions/exercise_1/exercise_1.3_solution.ipynb.txt b/2024/_sources/exercise_solutions/exercise_1/exercise_1.3_solution.ipynb.txt new file mode 100644 index 00000000..8e2c6b16 --- /dev/null +++ 
b/2024/_sources/exercise_solutions/exercise_1/exercise_1.3_solution.ipynb.txt @@ -0,0 +1,160 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.3: Time and type conversions\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using the techniques you have learned in the first day of bootcamp, generate a time stamp (like 13:29:45 for nearly half past one in the afternoon) for the time that is 63,252 seconds after midnight. Start with this statement:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "seconds_past_midnight = 63252" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After that statement, the only numeric keys you should need or want to push are `0`, `2` or `3`, and `6`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get the number of hours, we floor divide by 3600. To get the number of minutes, we take the modulus of division by 3600, and then divide that by 60. Finally, the seconds are what is left over when we divide by 60." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "hours = seconds_past_midnight // 60**2\n", + "minutes = (seconds_past_midnight % 60**2) // 60\n", + "seconds = seconds_past_midnight % 60" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have these, we concatenate a string together." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "17:34:12\n" + ] + } + ], + "source": [ + "time_str = str(hours) + ':' + str(minutes) + ':' + str(seconds)\n", + "\n", + "print(time_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "There are much elegant ways of doing these two operations, including using string methods. For most applications using time stamps, you would use the [datetime module](https://docs.python.org/3/library/datetime.html) of the standard library." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/_sources/exercise_solutions/exercise_1/exercise_1.4_solution.ipynb.txt b/2024/_sources/exercise_solutions/exercise_1/exercise_1.4_solution.ipynb.txt new file mode 100644 index 00000000..89de02a8 --- /dev/null +++ b/2024/_sources/exercise_solutions/exercise_1/exercise_1.4_solution.ipynb.txt @@ -0,0 +1,261 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.4: Using string methods\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In [Lesson 7](l07_intro_to_functions.ipynb), we wrote a function to compute the reverse complement of a sequence. \n", + "\n", + "**a)** Write that function again, still using a `for` loop, but do not use the built-in `reversed()` function.\n", + "\n", + "**b)** Write the function one more time, but without any loops." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** The trick here is to do what we did in Lesson 7, except use `[::-1]` indexing instead of the `reversed()` function." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def complement_base(base):\n", + " \"\"\"Returns the Watson-Crick complement of a base.\"\"\"\n", + " if base == 'A' or base == 'a':\n", + " return 'T'\n", + " elif base == 'T' or base == 't':\n", + " return 'A'\n", + " elif base == 'G' or base == 'g':\n", + " return 'C'\n", + " else:\n", + " return 'G'\n", + "\n", + "\n", + "def reverse_complement(seq):\n", + " \"\"\"Compute reverse complement of a sequence.\"\"\"\n", + " # Initialize reverse complement\n", + " rev_seq = ''\n", + " \n", + " # Loop through and populate list with reverse complement\n", + " for base in seq:\n", + " rev_seq += complement_base(base)\n", + " \n", + " return rev_seq[::-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And we'll do a quick test with the same sequence as in lesson 7." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'TGCAACTGC'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reverse_complement('GCAGTTGCA')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Bingo!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** We can eliminate the `for` loop by using the `replace()` method of strings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def reverse_complement(seq):\n", + "    \"\"\"Compute reverse complement of a sequence.\"\"\"\n", + "    # Initialize rev_seq to a lowercase seq\n", + "    rev_seq = seq.lower()\n", + " \n", + "    # Substitute bases\n", + "    rev_seq = rev_seq.replace('t', 'A')\n", + "    rev_seq = rev_seq.replace('a', 'T')\n", + "    rev_seq = rev_seq.replace('g', 'C')\n", + "    rev_seq = rev_seq.replace('c', 'G')\n", + " \n", + "    return rev_seq[::-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And let's give it a test!" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'TGCAACTGC'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reverse_complement('GCAGTTGCA')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note:** We haven't learned about it yet, but some Googling would allow you to use the `translate()` and `maketrans()` string methods. `maketrans()` makes a **translation table** for characters in a string, and the `translate()` method then uses it to replace the characters in the string. Strings are immutable, so `translate()` returns a new string." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'TGCAACTGC'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def reverse_complement(seq):\n", + "    \"\"\"Compute reverse complement of a sequence.\"\"\"\n", + "    return seq.translate(str.maketrans('ATGCatgc', 'TACGTACG'))[::-1]\n", + "\n", + "reverse_complement('GCAGTTGCA')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "So, we were able to do it in one line!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/_sources/exercise_solutions/exercise_1/exercise_1.5_solution.ipynb.txt b/2024/_sources/exercise_solutions/exercise_1/exercise_1.5_solution.ipynb.txt new file mode 100644 index 00000000..9a8c7af1 --- /dev/null +++ b/2024/_sources/exercise_solutions/exercise_1/exercise_1.5_solution.ipynb.txt @@ -0,0 +1,233 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.5: Longest common substring\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Write a function that takes two sequences and returns the longest common substring. A substring is a contiguous portion of a string. For example:\n", + "\n", + "Substrings of `ATGCATAT`:\n", + "\n", + " TGCA\n", + " T\n", + " TAT\n", + " \n", + "Not substrings of `ATGCATAT`:\n", + "\n", + " AGCA # Skipped T\n", + " CCATA # Added another C\n", + " Hello, world. # Has nothing to do with the input sequence\n", + " \n", + "There may be more than one longest common substring; you only need to return one of them.\n", + "\n", + "The call signature of the function should be\n", + "\n", + "```python\n", + "longest_common_substring(s1, s2)\n", + "```\n", + "\n", + "Here are some return values you should get.\n", + "\n", + "|Function call|Result |\n", + "|:---|---:|\n", + "|`longest_common_substring('ATGC', 'ATGCA')` | `'ATGC'`|\n", + "|`longest_common_substring('GATGCCATGCA', 'ATGCC')` | `'ATGCC'`|\n", + "|`longest_common_substring('ACGTGGAAAGCCA', 'GTACACACGTTTTGAGAGACAC') `|`'ACGT'`|" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is actually an [important problem](https://en.wikipedia.org/wiki/Longest_common_substring_problem), and there are clever algorithms to solve it. We will take a more brute force approach. Let $n$ be the length of the shorter of the two strings. We will start with the entirety of the shorter string and see if it is in the longer. We then will take both substrings of length $n - 1$ in the shorter string and check to see if they are in the longer string. We then take all three substrings of length $n - 2$ and see if they are in the longer string. We continue like this until we get a hit, which will necessarily be one of the longest common substrings." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def longest_common_substring(s1, s2):\n", + " \"\"\"Return one of the longest common substrings\"\"\"\n", + " # Make sure s1 is the shorter\n", + " if len(s1) > len(s2):\n", + " s1, s2 = s2, s1\n", + " \n", + " # Start with the entire sequence and shorten\n", + " substr_len = len(s1)\n", + " while substr_len > 0: \n", + " # Try all substrings\n", + " for i in range(len(s1) - substr_len + 1):\n", + " if s1[i:i+substr_len] in s2:\n", + " return s1[i:i+substr_len]\n", + "\n", + " substr_len -= 1\n", + " \n", + " # If we haven't returned, there is no common substring\n", + " return ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try our function out with the tests." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'ATGC'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "longest_common_substring('ATGC', 'ATGCA')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'ATGCC'" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "longest_common_substring('GATGCCATGCA', 'ATGCC')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'ACGT'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "longest_common_substring('ACGTGGAAAGCCA', 'GTACACACGTTTTGAGAGACAC')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All look good!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/_sources/exercise_solutions/exercise_1/exercise_1.6_solution.ipynb.txt b/2024/_sources/exercise_solutions/exercise_1/exercise_1.6_solution.ipynb.txt new file mode 100644 index 00000000..a4c19e12 --- /dev/null +++ b/2024/_sources/exercise_solutions/exercise_1/exercise_1.6_solution.ipynb.txt @@ -0,0 +1,484 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.6: RNA secondary structure validator\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## RNA secondary structure validator\n", + "\n", + "In this problem, we will write a function that takes an RNA sequence and an RNA secondary structure and decides if the secondary structure is possible given the sequence. Remember, single stranded RNA can fold back on itself and form base pairs. An RNA secondary structure is simply the list of base pairs that are present. We will represent the base pairs in dot-parentheses notation. For example, a sequence/secondary structure pair would be\n", + "\n", + " 0123456789\n", + " GCAUCUAUGC\n", + " (((....)))\n", + "\n", + "For convenience of discussion, I have labeled the indices of the bases on the top row. In this case, base `0`, a `G`, pairs with base `9`, a `C`. Base `1` pairs with base `8`, and base `2` pairs with base `7`. Bases `3`, `4`, `5`, and `6` are unpaired. (This structure is aptly called a \"hairpin.\")\n", + "\n", + "I hope the dot-parentheses notation is clear. An open parenthesis is paired with the parenthesis that closes it. Dots are unpaired.\n", + "\n", + "So, the goal of our function is to check all base pairs present in a secondary structure and see if they are with `G-C`, `A-U`, or (optionally) `G-U`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Write a function to make sure that the number of closed parentheses is equal to the number of open parentheses, a requirement for a valid secondary structure. It should return `True` if the parentheses are valid and `False` otherwise." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Write a function that converts the dot-parens notation to a tuple of 2-tuples representing the base pairs. We'll call this function `dotparen_to_bp()`. 
An example input/output of this function would be:\n", + "\n", + " dotparen_to_bp('(((....)))')\n", + " \n", + " ((0, 9), (1, 8), (2, 7))\n", + " \n", + "*Hint*: You should look at [methods that are available for lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists). You might find the `append()` and `pop()` methods useful." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Because of sterics, the minimal length of a hairpin loop is three bases. A hairpin loop is a series of unpaired bases that are closed by a base pair. For example, the secondary structure `(.(....).)` has a single hairpin loop of length 4. So, the structure `((((..))))` is not valid because it has a hairpin loop of only two bases.\n", + "\n", + "Write a function that verifies that a list of base pairs (as outputted by `dotparen_to_bp()`) satisfies the minimal hairpin length requirement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Now write your validator function. The function definition should look like this:\n", + "\n", + " def rna_ss_validator(seq, sec_struc, wobble=True):\n", + " \n", + "It should return `True` if the sequence is commensurate with a valid secondary structure and `False` otherwise. The `wobble` keyword argument is `True` if we allow wobble pairs (`G` paired with `U`). Here are some expected results:\n", + "\n", + "Returns `True`:\n", + "\n", + " rna_ss_validator('GCAUCUAUGC', '(((....)))')\n", + " rna_ss_validator('GCAUCUAUGU', '(((....)))') \n", + " rna_ss_validator('GCAUCUAUGU', '(.(....).)') \n", + "\n", + "Returns `False`:\n", + "\n", + " rna_ss_validator('GCAUCUACGC', '(((....)))')\n", + " rna_ss_validator('GCAUCUAUGU', '(((....)))', wobble=False) \n", + " rna_ss_validator('GCAUCUAUGU', '(.(....)).') \n", + " rna_ss_validator('GCCCUUGGCA', '(.((..))).')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "**a)** This part of the validation is simple. We just need to make sure the number of open and closed parentheses are equal." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def parens_count(struc):\n", + " \"\"\"\n", + " Ensures there are equal number of open and closed parentheses\n", + " in structure.\n", + " \"\"\"\n", + " return struc.count('(') == struc.count(')')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's give it a try." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n", + "False\n" + ] + } + ], + "source": [ + "print(parens_count('(((..(((...)).))))'))\n", + "print(parens_count('(((..(((...)).)))'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** As we scan a dot-parens structure from left to right, we can keep a list of the positions of open parentheses. Whenever we encounter a closed one, we have closed the last open one we added. So, we can just scan through the dot-parens string and pop out base pairs. If this procedure fails, we know that there was an error in the input structure (i.e., a closed parenthesis appeared without a corresponding open one before it)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def dot_parens_to_bp(struc):\n", + "    \"\"\"\n", + "    Convert a dot-parens structure to a tuple of base pairs.\n", + "    Return False if the structure is invalid.\n", + "    \"\"\"\n", + "    if not parens_count(struc):\n", + "        print('Error in input structure.')\n", + "        return False\n", + " \n", + "    # Initialize list of open parens and list of base pairs\n", + "    open_parens = []\n", + "    bps = []\n", + " \n", + "    # Scan through string\n", + "    for i, x in enumerate(struc):\n", + "        if x == '(':\n", + "            open_parens.append(i)\n", + "        elif x == ')':\n", + "            if len(open_parens) > 0:\n", + "                bps.append((open_parens.pop(), i))\n", + "            else:\n", + "                print('Error in input structure.')\n", + "                return False\n", + "\n", + "    # Return the result as a tuple\n", + "    return tuple(sorted(bps))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try it on some legitimate structures and on some bad ones." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "((2, 10), (3, 9), (4, 8), (11, 25), (12, 23), (13, 22), (14, 21), (15, 20))" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Good structure\n", + "dot_parens_to_bp('..(((...)))(((((....)))).)..')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "((0, 17), (1, 16), (2, 15), (5, 14), (6, 12), (7, 11))" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Good structure\n", + "dot_parens_to_bp('(((..(((...)).))))')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Error in input structure.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Bad structure\n", + "dot_parens_to_bp('((....)))(')" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Error in input structure.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dot_parens_to_bp('())....))))')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** It is quite easy to detect short hairpins once we have a list of base pairs. A hairpin loop needs at least three unpaired bases, so we just need to make sure the indices of any two paired bases differ by at least four." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "def hairpin_check(bps):\n", + "    \"\"\"Check to make sure no hairpins are too short.\"\"\"\n", + "    for bp in bps:\n", + "        if bp[1] - bp[0] < 4:\n", + "            print('A hairpin is too short.')\n", + "            return False\n", + " \n", + "    # Everything checks out\n", + "    return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Almost everything is in place. We just need to check the sequence." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def rna_ss_validator(seq, sec_struc, wobble=True):\n", + "    \"\"\"Validate an RNA structure\"\"\"\n", + "    # Convert structure to base pairs\n", + "    bps = dot_parens_to_bp(sec_struc)\n", + " \n", + "    # False signals an invalid structure; an empty tuple (no pairs) is fine\n", + "    if bps is False:\n", + "        return False\n", + " \n", + "    # Do the hairpin check\n", + "    if not hairpin_check(bps):\n", + "        return False\n", + " \n", + "    # Possible base pairs\n", + "    if wobble:\n", + "        ok_bps = ('gc', 'cg', 'au', 'ua', 'gu', 'ug')\n", + "    else:\n", + "        ok_bps = ('gc', 'cg', 'au', 'ua')\n", + "\n", + "    # Check complementarity\n", + "    for bp in bps:\n", + "        bp_str = (seq[bp[0]] + seq[bp[1]]).lower()\n", + "        if bp_str not in ok_bps:\n", + "            print('Invalid base pair.')\n", + "            return False\n", + " \n", + "    # Everything passed\n", + "    return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's test it on the test cases from the problem statement." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Should be True:\n", + "True\n", + "True\n", + "True\n", + "True\n", + "\n", + "Should be False:\n", + "Invalid base pair.\n", + "False \n", + "\n", + "Invalid base pair.\n", + "False \n", + "\n", + "Invalid base pair.\n", + "False \n", + "\n", + "A hairpin is too short.\n", + "False \n", + "\n", + "Invalid base pair.\n", + "False\n" + ] + } + ], + "source": [ + "print('Should be True:')\n", + "print(rna_ss_validator('GCAUCUAUGC', '(((....)))'))\n", + "print(rna_ss_validator('GCAUCUAUGU', '(((....)))'))\n", + "print(rna_ss_validator('GCAUCUAUGU', '(.(....).)'))\n", + "print(rna_ss_validator('AUUGAUGCACGUGCAUCCCCAGCGGGUCCCGCGAGCUCACCCCCUUCCAAAAGCACCACGUGCCAGGCCUCGCCCCCGGAAGUAUACCUGUGAGCCAGA',\n", + " '...(((((....)))))....((((...))))..((((((...(((((....((((...))))..(((...)))...))))).......))))))....'))\n", + "\n", + "print('\\nShould be False:')\n", + "print(rna_ss_validator('GCAUCUACGC', '(((....)))'), '\\n')\n", + "print(rna_ss_validator('GCAUCUAUGU', '(((....)))', wobble=False), '\\n')\n", + "print(rna_ss_validator('GCAUCUAUGU', '(.(....)).'), '\\n')\n", + "print(rna_ss_validator('GCCCUUGGCA', '(.((..))).'),'\\n')\n", + "print(rna_ss_validator('AUUGAUGCACGUGCAUCCCCAGCGGGUCCCGCGAGCCCACCCCCUUCCAAAAGCACCACGUGCCAGGCCUCGCCCCCGGAAGUAUACCUGUGAGCCAGA',\n", + " '...(((((....)))))....((((...))))..((((((...(((((....((((...))))..(((...)))...))))).......))))))....'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "Looks good!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/_sources/exercise_solutions/exercise_1/index.rst.txt b/2024/_sources/exercise_solutions/exercise_1/index.rst.txt new file mode 100644 index 00000000..f2554466 --- /dev/null +++ b/2024/_sources/exercise_solutions/exercise_1/index.rst.txt @@ -0,0 +1,14 @@ +****************************************************************** +Exercise 1 solutions +****************************************************************** + +.. 
toctree:: + :maxdepth: 1 + + exercise_1.1_solution.ipynb + exercise_1.2_solution.ipynb + exercise_1.3_solution.ipynb + exercise_1.4_solution.ipynb + exercise_1.5_solution.ipynb + exercise_1.6_solution.ipynb + diff --git a/2024/_sources/index.rst.txt b/2024/_sources/index.rst.txt index c89e23ae..7e245144 100644 --- a/2024/_sources/index.rst.txt +++ b/2024/_sources/index.rst.txt @@ -94,7 +94,6 @@ Files you will need to complete the bootcamp can be found in `this repository on exercises/exercise_3/index.rst exercises/exercise_4/index.rst exercises/exercise_5/index.rst - exercises/exercise_6/index.rst .. toctree:: @@ -106,7 +105,6 @@ Files you will need to complete the bootcamp can be found in `this repository on exercise_solutions/exercise_3/index.rst exercise_solutions/exercise_4/index.rst exercise_solutions/exercise_5/index.rst - exercise_solutions/exercise_6/index.rst .. toctree:: diff --git a/2024/_sources/lessons/bootcamp_live/Untitled.ipynb.txt b/2024/_sources/lessons/bootcamp_live/Untitled.ipynb.txt new file mode 100644 index 00000000..60909096 --- /dev/null +++ b/2024/_sources/lessons/bootcamp_live/Untitled.ipynb.txt @@ -0,0 +1,1637 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "id": "61bc3f6a-870a-4903-a5df-c120add940d2", + "metadata": {}, + "outputs": [], + "source": [ + "my_list = [1, 2, 3, 4]" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "id": "9468b95a-6692-4825-8dde-823ea531f79b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "list" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(my_list)" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "id": "b7fab275-9ab5-4eba-833d-11404b96f1d8", + "metadata": {}, + "outputs": [], + "source": [ + "my_list = [1, 2.4, 'a string', ['a string in another list', 5]]" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "id": "6f3772ff-c06d-47bf-a860-6a5b515a0e65", + 
"metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[5, 15, 16]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list = [2 + 3, 5 * 3, 4**2]\n", + "\n", + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "id": "fe9c81ff-ecbf-4f2b-a4ac-5d25d3db9ab2", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "42" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "int('42')" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "id": "fbc51335-b5fe-45cd-90bc-1abe1fc02841", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', ' ', 's', 't', 'r', 'i', 'n', 'g']" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list('a string')" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "id": "e3b449d9-277b-45db-bf60-d440d6918950", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 4, 5, 6]" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[1, 2, 3] + [4, 5, 6] " + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "id": "3b665855-9e00-4eb4-a6a5-6a6fb4cf5920", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2, 3, 1, 2, 3, 1, 2, 3]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "[1, 2, 3] * 3" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "id": "010ef92b-fa37-48cd-a784-680e90de0cc6", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[5, 15, 16]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "id": "30b6dc71-73bf-43b9-bc17-16ca54cb14cc", + 
"metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "15 in my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "id": "735ff7a5-b11d-4a28-ba1f-bd70e97a1641", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'jeffrey lebowski' not in my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "id": "65755e75-be34-4f72-a333-6f8c505190c8", + "metadata": {}, + "outputs": [], + "source": [ + "my_list = [1, 2.4, 'a string', ['a string in another list', 5]]" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "id": "c8f09808-828d-4c21-9040-c854aad64562", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2.4, 'a string', ['a string in another list', 5]]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "id": "1c0cdfb0-5b8b-4361-b40a-1c91011ecbed", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "['a string in another list', 5] in my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 73, + "id": "f4962ab0-4b4c-49a2-a5fc-ab1fe9e922d5", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "This codon is a stop codon.\n" + ] + } + ], + "source": [ + "codon = 'UAA'\n", + "\n", + "if codon == 'AUG':\n", + " print('This codon is the start codon.')\n", + "elif codon in ('UAA', 'UAG', 'UGA'):\n", + " print('This codon is a stop codon.')\n", + "else:\n", + " print('This codon is neither a start nor stop codon.')" + ] + }, + { 
+ "cell_type": "code", + "execution_count": 21, + "id": "e1f9fb88-9f0d-4e1a-834b-842c1cbc0d33", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2.4, 'a string', ['a string in another list', 5]]" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "id": "f9dfa49b-6a6f-414b-872d-67bc015f6646", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'a string'" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list[2]" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "id": "60db5f5f-605f-4684-9296-57948133cef0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 2.4, 'a string', ['a string in another list', 5]]" + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "id": "1ab50364-b2cb-493d-8e7d-f3063ef4fafb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list[3][1]" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "id": "52960b27-9415-4e39-9824-35de067be06d", + "metadata": {}, + "outputs": [], + "source": [ + "my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "id": "a649fc66-50f5-466a-bfd8-eee2a11d6b75", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "4" + ] + }, + "execution_count": 31, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list[4]" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "id": "35517bda-ce6e-40e4-b217-833fa52373b7", + "metadata": {}, + "outputs": [ + { + "data": { + 
"text/plain": [ + "0" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "id": "c3de8081-c330-4d34-bee0-2ae7a5a56985", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + ] + }, + "execution_count": 41, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "id": "49cd38ec-615b-4721-8c01-4d37125ada27", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[9, 7, 5, 3]" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list[-2:2:-2] " + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "id": "49dd5809-ce1b-4360-bc98-27d645ab9047", + "metadata": {}, + "outputs": [], + "source": [ + "my_slice = my_list[1:7:-3]" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "id": "1d7f98d9-d3d0-42b6-a628-bff1fcd07c97", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 51, + "id": "e02cc9ba-0b04-4508-91b1-0d8df34c7dfb", + "metadata": {}, + "outputs": [], + "source": [ + "my_list[4] = 'four'" + ] + }, + { + "cell_type": "code", + "execution_count": 52, + "id": "bf6bab8c-111d-4219-a8b3-ff2baaf0e34a", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3, 'four', 5, 6, 7, 8, 9, 10]" + ] + }, + "execution_count": 52, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 56, + "id": "eaae8e68-d031-4345-af11-14a9d66a5476", + "metadata": {}, + "outputs": [], + 
"source": [ + "my_list = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]\n", + "my_list2 = my_list\n", + "\n", + "my_list2[0] = 'a'" + ] + }, + { + "cell_type": "code", + "execution_count": 57, + "id": "94ab0c98-5a4f-427b-a5e0-c8f2e44fd733", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + ] + }, + "execution_count": 57, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list2" + ] + }, + { + "cell_type": "code", + "execution_count": 59, + "id": "497d6aba-3c41-4b3e-ace3-637741d8048f", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['a', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]" + ] + }, + "execution_count": 59, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list" + ] + }, + { + "cell_type": "code", + "execution_count": 61, + "id": "d09c3dbb-62cc-41ae-bbad-40864beb7a7e", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 61, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_list is my_list2" + ] + }, + { + "cell_type": "code", + "execution_count": 53, + "id": "3c6d3742-e3a8-40a3-92c2-8d480b1b63cf", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 53, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a = 5\n", + "b = 7\n", + "a = b\n", + "\n", + "a" + ] + }, + { + "cell_type": "code", + "execution_count": 62, + "id": "771f206d-f33b-4f45-937d-5322124cd217", + "metadata": {}, + "outputs": [], + "source": [ + "my_tuple = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10)" + ] + }, + { + "cell_type": "code", + "execution_count": 63, + "id": "d77ccc15-5f15-4629-bba2-5ad96b7682b4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)" + ] + }, + "execution_count": 63, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_tuple" + 
] + }, + { + "cell_type": "code", + "execution_count": 65, + "id": "1613cc56-5faf-421d-b76a-b16a6898fde3", + "metadata": {}, + "outputs": [], + "source": [ + "my_tuple2 = my_tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 67, + "id": "7cf257da-bea8-4ccd-b59d-19e5c015ea1d", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 67, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_tuple2 is my_tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 68, + "id": "43892cb6-658e-4ecf-89e6-a12c454e9782", + "metadata": {}, + "outputs": [], + "source": [ + "my_tuple = (5, 6, 7)" + ] + }, + { + "cell_type": "code", + "execution_count": 69, + "id": "ef0e66f8-728b-4588-be63-aeda22abe6ef", + "metadata": {}, + "outputs": [], + "source": [ + "a, b, c = my_tuple" + ] + }, + { + "cell_type": "code", + "execution_count": 70, + "id": "65e4ec42-83f3-4794-a8d4-1a014a1a0d9c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "5" + ] + }, + "execution_count": 70, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a" + ] + }, + { + "cell_type": "code", + "execution_count": 71, + "id": "c5d7fc1b-829e-409d-b991-6c2fc42a6fbb", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 71, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "b" + ] + }, + { + "cell_type": "code", + "execution_count": 72, + "id": "12fda73e-16b1-4e64-8960-e590f3aa9d11", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "7" + ] + }, + "execution_count": 72, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "c" + ] + }, + { + "cell_type": "code", + "execution_count": 89, + "id": "9b483621-600c-4d9a-a940-77743f0ec47b", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "2 5 6 9 11 12 14 16 19 20 22 23 24 
25 26 31 32 34 " + ] + } + ], + "source": [ + "seq = 'UACUACGAUCAGGACUGAUCGACGCGCUAUACGACUA'\n", + "\n", + "for i, base in enumerate(seq):\n", + " if base in 'GCgc':\n", + " print(i, end=' ')" + ] + }, + { + "cell_type": "code", + "execution_count": 75, + "id": "94910ea0-afc5-4177-ae44-7debcc9541b8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "37" + ] + }, + "execution_count": 75, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(seq)" + ] + }, + { + "cell_type": "code", + "execution_count": 85, + "id": "d8847189-3c93-4ce1-8ee9-302592d26366", + "metadata": {}, + "outputs": [], + "source": [ + "my_integers = [1, 2, 3, 4, 5]\n", + "\n", + "for i in range(len(my_integers)):\n", + " my_integers[i] *= 2" + ] + }, + { + "cell_type": "code", + "execution_count": 86, + "id": "00af6e15-ecce-402c-bb70-89ece2685040", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[2, 4, 6, 8, 10]" + ] + }, + "execution_count": 86, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_integers" + ] + }, + { + "cell_type": "code", + "execution_count": 84, + "id": "22b0d933-99b2-4389-9653-1882ca6d2a70", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[0, 1, 2, 3, 4]" + ] + }, + "execution_count": 84, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "list(range(5))" + ] + }, + { + "cell_type": "code", + "execution_count": 90, + "id": "aefb5663-c2e9-45d9-a594-3ad5f85f871a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "23 Acosta MF\n", + "3 Murillo D\n", + "11 Bale F\n" + ] + } + ], + "source": [ + "names = ('Acosta', 'Murillo', 'Bale')\n", + "positions = ('MF', 'D', 'F')\n", + "numbers = (23, 3, 11)\n", + "\n", + "for num, pos, name in zip(numbers, positions, names):\n", + " print(num, name, pos)" + ] + }, + { + "cell_type": "code", + "execution_count": 92, + "id": 
"3d044c33-d318-4069-b9e2-4bc1953fea30", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "10\n", + "9\n", + "8\n", + "7\n", + "6\n", + "5\n", + "4\n", + "3\n", + "2\n", + "1\n", + "ignition\n" + ] + } + ], + "source": [ + "count_up = ('ignition', 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)\n", + "\n", + "for count in reversed(count_up):\n", + " print(count)" + ] + }, + { + "cell_type": "code", + "execution_count": 98, + "id": "2b281a56-ba37-4052-98cf-3d80bdc6067a", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Didn't find codon.\n" + ] + } + ], + "source": [ + "seq = 'UAGUACUACUAGUAUGAUGCCAUCCCUA'\n", + "codon = 'GGG'\n", + "\n", + "i = 0\n", + "\n", + "while seq[i:i+3] != codon and i < len(seq):\n", + " i += 1\n", + "\n", + "if i == len(seq):\n", + " print(\"Didn't find codon.\")\n", + "else:\n", + " print('The index of the codon is', i)" + ] + }, + { + "cell_type": "code", + "execution_count": 96, + "id": "bc2a2d22-4792-4d7d-a9fa-58089b71862b", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "''" + ] + }, + "execution_count": 96, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "seq[100:103]" + ] + }, + { + "cell_type": "code", + "execution_count": 99, + "id": "cf51940d-8b3e-4093-b7f0-93f65e8568cd", + "metadata": {}, + "outputs": [], + "source": [ + "def ratio(x, y):\n", + " \"\"\"The ratio of `x` to `y`.\"\"\"\n", + " return x / y" + ] + }, + { + "cell_type": "code", + "execution_count": 102, + "id": "1c530201-d085-418d-beb7-7008767a82d8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2.0" + ] + }, + "execution_count": 102, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ratio(4, 2)" + ] + }, + { + "cell_type": "code", + "execution_count": 103, + "id": "56d1184f-7d81-46e8-9724-25e82942fb4a", + "metadata": {}, + "outputs": [], + "source": [ + "def 
answer_to_the_ultimate_question_of_life_the_universe_and_everything():\n", + " \"\"\"Simpler program than Deep Thought's, I bet.\"\"\"\n", + " return 42" + ] + }, + { + "cell_type": "code", + "execution_count": 104, + "id": "5f5b7b87-3e5e-4fd9-9388-a2277727bea0", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "42" + ] + }, + "execution_count": 104, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "answer_to_the_ultimate_question_of_life_the_universe_and_everything()" + ] + }, + { + "cell_type": "code", + "execution_count": 105, + "id": "7ebfe2da-794f-4f7d-bc4a-853b9bfe9dff", + "metadata": {}, + "outputs": [], + "source": [ + "def think_too_much():\n", + " \"\"\"Express Caesar's skepticism about Cassius.\"\"\"\n", + " print(\"\"\"Yond Cassius has a lean and hungry look,\n", + "He thinks too much; such men are dangerous.\"\"\")" + ] + }, + { + "cell_type": "code", + "execution_count": 111, + "id": "4a845090-18fa-4061-8ac0-13265c89d6c7", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Yond Cassius has a lean and hungry look,\n", + "He thinks too much; such men are dangerous.\n" + ] + } + ], + "source": [ + "return_val = think_too_much()" + ] + }, + { + "cell_type": "code", + "execution_count": 114, + "id": "ff8400ab-90e5-48fe-88b5-1748fe6de5b9", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "None\n" + ] + } + ], + "source": [ + "print(return_val)" + ] + }, + { + "cell_type": "code", + "execution_count": 107, + "id": "32001416-baec-4017-ab83-8958f3477521", + "metadata": {}, + "outputs": [], + "source": [ + "def evens_up_to_8():\n", + " return 2, 4, 6, 8" + ] + }, + { + "cell_type": "code", + "execution_count": 115, + "id": "ee7d4cb5-ec8c-433b-a2bb-0b67ad192c68", + "metadata": {}, + "outputs": [], + "source": [ + "a, b, c, d = evens_up_to_8()" + ] + }, + { + "cell_type": "code", + "execution_count": 117, + "id": 
"f19293e5-c7a8-400b-98bb-64ad1e2ce880", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 117, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "c" + ] + }, + { + "cell_type": "code", + "execution_count": 126, + "id": "f6afd3c7-4b51-40a8-9425-125715b71aef", + "metadata": {}, + "outputs": [], + "source": [ + "def complement_base(base, material='DNA'):\n", + " \"\"\"Return the Watson-Crick complement of a base.\"\"\"\n", + " if base in 'Aa':\n", + " if material == 'DNA':\n", + " return 'T'\n", + " elif material == 'RNA':\n", + " return 'U'\n", + " elif base in 'TtUu':\n", + " return 'A'\n", + " elif base in 'Gg':\n", + " return 'C'\n", + " elif base in 'Cc':\n", + " return 'G'\n", + " else:\n", + " return ''\n", + " \n", + "\n", + "def reverse_complement(seq, material='DNA'):\n", + " \"\"\"Compute the reverse of a sequence.\"\"\"\n", + " # Initialize the rev comp\n", + " rev_seq = ''\n", + "\n", + " # Loop through in reverse and add each base\n", + " for base in reversed(seq):\n", + " rev_seq += complement_base(base, material)\n", + "\n", + " return rev_seq" + ] + }, + { + "cell_type": "code", + "execution_count": 130, + "id": "f8cf4dac-15c3-4f0c-828c-d946c3252023", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'UGCAACUGC'" + ] + }, + "execution_count": 130, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reverse_complement(seq='GCAGUUGCA', material='RNA')" + ] + }, + { + "cell_type": "code", + "execution_count": 138, + "id": "a1ba54ff-3540-41de-982e-4f036f756f40", + "metadata": {}, + "outputs": [], + "source": [ + "def is_almost_right(a, b, c):\n", + " \"\"\"Check to see if a triangle with side lengths a, b, and c is right.\"\"\"\n", + " # Use sorted() to make sure c is largest\n", + " a, b, c = sorted([a, b, c])\n", + "\n", + " if abs(a**2 + b**2 - c**2) < 1e-12:\n", + " return True\n", + " else:\n", + " return False\n" + ] + }, + { 
+ "cell_type": "code", + "execution_count": 143, + "id": "f53c0101-24a9-4e2c-b48a-0adb803aed11", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 143, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_almost_right(5, 12, 13)" + ] + }, + { + "cell_type": "code", + "execution_count": 144, + "id": "75bb3a9f-fba8-4e7d-b953-80053c1a6058", + "metadata": {}, + "outputs": [], + "source": [ + "side_lengths = (5, 12, 13)" + ] + }, + { + "cell_type": "code", + "execution_count": 146, + "id": "8f51234e-ba43-4813-b193-3ecc349076c3", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 146, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "is_almost_right(*side_lengths) " + ] + }, + { + "cell_type": "code", + "execution_count": 147, + "id": "922cf9d8-4ee2-406c-aedc-a9411993e5d2", + "metadata": {}, + "outputs": [], + "source": [ + "def ratio(x, y):\n", + " \"\"\"ratio\"\"\"\n", + " return x / y" + ] + }, + { + "cell_type": "code", + "execution_count": 148, + "id": "88a7cd90-bd44-434e-8007-f46c42cf583b", + "metadata": {}, + "outputs": [], + "source": [ + "ratio = lambda x, y: x / y" + ] + }, + { + "cell_type": "code", + "execution_count": 149, + "id": "b4d7b4a8-eaeb-4194-b7e0-68b6d11f7c32", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.7142857142857143" + ] + }, + "execution_count": 149, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "ratio(5, 7)" + ] + }, + { + "cell_type": "code", + "execution_count": 157, + "id": "d4e00fd9-5e3a-494e-aa40-ad9aa78d16be", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Kellyn Acosta', 'Gareth Bale', 'Jesus Murillo']" + ] + }, + "execution_count": 157, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sorted(['Kellyn Acosta', 'Jesus Murillo', 'Gareth Bale'], key=lambda x: 
x[x.find(' ')+1:])" + ] + }, + { + "cell_type": "code", + "execution_count": 158, + "id": "4f2399bc-0c79-4fbd-aacf-5b9efd3d5228", + "metadata": {}, + "outputs": [], + "source": [ + "last_name = lambda x: x[x.find(' ')+1:]" + ] + }, + { + "cell_type": "code", + "execution_count": 159, + "id": "716eb5ef-6b6a-4fc5-941a-9da9d7b49c54", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Lebowski'" + ] + }, + "execution_count": 159, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "last_name(\"Jeffrey Lebowski\")" + ] + }, + { + "cell_type": "code", + "execution_count": 161, + "id": "e8ed68b7-de26-40bb-be4f-8225a4413650", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'.sediba eduD ehT'" + ] + }, + "execution_count": 161, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "my_str = 'The Dude abides.'\n", + "\n", + "my_str[::-1]" + ] + }, + { + "cell_type": "code", + "execution_count": 164, + "id": "ab1337a0-547c-44fc-bf35-33f53d2c66a4", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0.5428571428571428" + ] + }, + "execution_count": 164, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "seq = 'ATCGATCGCTTCTAGGCGATCGTACGATCGACTGC'\n", + "\n", + "(seq.count('G') + seq.count('C')) / len(seq)" + ] + }, + { + "cell_type": "code", + "execution_count": 165, + "id": "5172ca87-3eef-49ce-b47e-cb3b73134ab8", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "6" + ] + }, + "execution_count": 165, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'AGTAGATACAGTATAGTAGT'.count('T')" + ] + }, + { + "cell_type": "code", + "execution_count": 168, + "id": "9eed63b7-a76b-40da-9383-34970e260126", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 168, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + 
"'AAAAAAA'.count('nonsense')" + ] + }, + { + "cell_type": "code", + "execution_count": 171, + "id": "3836b019-949d-4cc5-a9bb-84b51d170136", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "25" + ] + }, + "execution_count": 171, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'AGATCGAGAUAGAUGATCGATCAGGGATCG'.rfind('GAT')" + ] + }, + { + "cell_type": "code", + "execution_count": 173, + "id": "83e13637-347d-48c6-9b19-3239d7b25eec", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'AGTGAGATGAG'" + ] + }, + "execution_count": 173, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'AGTGAGATGAG'.lower().upper()" + ] + }, + { + "cell_type": "code", + "execution_count": 175, + "id": "6631a01c-da7d-4da5-9f49-ad326273601c", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'The*Dude*abides.'" + ] + }, + "execution_count": 175, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "word_tuple = ('The', 'Dude', 'abides.')\n", + "\n", + "'*'.join(word_tuple)" + ] + }, + { + "cell_type": "code", + "execution_count": 180, + "id": "8df46e1f-6af7-4302-947a-88d684764f13", + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "During this bootcamp, I feel tiny.\n", + "The instructors give us flowers.\n", + "\n" + ] + } + ], + "source": [ + "adjective = 'tiny'\n", + "plural_noun = 'flowers'\n", + "\n", + "my_str = f\"\"\"\n", + "During this bootcamp, I feel {adjective}.\n", + "The instructors give us {plural_noun}.\n", + "\"\"\"\n", + "\n", + "print(my_str)" + ] + }, + { + "cell_type": "code", + "execution_count": 182, + "id": "2a5acfeb-2e4c-4ef8-b0fc-785c74826e79", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'There are 0050 states in the US.'" + ] + }, + "execution_count": 182, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "'There are 
{n:04d} states in the US.'.format(n=50)" + ] + }, + { + "cell_type": "code", + "execution_count": 184, + "id": "c8b29be1-f827-443b-b35c-660ff5f790f9", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'π is approximately 3.141593e+00'" + ] + }, + "execution_count": 184, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "pi = 3.1415926535\n", + "f'π is approximately {pi:.6e}'" + ] + }, + { + "cell_type": "code", + "execution_count": 185, + "id": "89d7a73b-32fe-4a3b-8fb9-e9603b3c99fe", + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1000000000000" + ] + }, + "execution_count": 185, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "1_000_000_000_000" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6ef7922c-fda7-4e76-9c24-a365467fbd37", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.9" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/2024/_sources/lessons/l03_variables_operators_types.ipynb.txt b/2024/_sources/lessons/l03_variables_operators_types.ipynb.txt index 0d397dbe..88639307 100644 --- a/2024/_sources/lessons/l03_variables_operators_types.ipynb.txt +++ b/2024/_sources/lessons/l03_variables_operators_types.ipynb.txt @@ -16,6 +16,7 @@ "Whether you are programming in Python or pretty much any other language, you will be working with **variables**. While the precise definition of a variable will vary from language to language, we'll focus on Python variables here. 
Like many of the concepts in this bootcamp, though, the knowledge you gain about Python variables will translate to other languages.\n", "\n", "We will talk more about **objects** later, but a variable, like everything in Python, is an object. For now, you can think of it this way. The following can be properties of a variable:\n", + "\n", "1. The **type** of variable. E.g., is it an integer, like `2`, or a string, like `'Hello, world.'`?\n", "2. The **value** of the variable.\n", "\n", diff --git a/2024/_sources/lessons/l04_more_operators_and_conditionals.ipynb.txt b/2024/_sources/lessons/l04_more_operators_and_conditionals.ipynb.txt index 8eb74b1c..e605c19a 100644 --- a/2024/_sources/lessons/l04_more_operators_and_conditionals.ipynb.txt +++ b/2024/_sources/lessons/l04_more_operators_and_conditionals.ipynb.txt @@ -610,8 +610,8 @@ "\n", "|English|Python|\n", "|:-------|:----------:|\n", - "|is the same object | **`is`**|\n", - "|is not the same object | **`is not`**|\n", + "|is the same object | `is` |\n", + "|is not the same object | `is not` |\n", "\n", "That's right. The operators are pretty much the same as English! Let's see these operators in action and get at the difference between `==` and `is`. Let's use the **`is`** operator to investigate how Python stored variables in memory, starting with `float`s." ] diff --git a/2024/exercise_solutions/exercise_1/exercise_1.1_solution.html b/2024/exercise_solutions/exercise_1/exercise_1.1_solution.html new file mode 100644 index 00000000..f379d791 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.1_solution.html @@ -0,0 +1,270 @@ + + + + + + + Exercise 1.1: Command line exercises — Programming Bootcamp documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Exercise 1.1: Command line exercises

+

In this exercise, you will play around with the command line on your machine and get more familiar with it.

+

a) Let’s play around with some options for the ls command. First cd into a directory that has some interesting files in it (like ~/git/bootcamp/command_line_tutorial). Try the following if you are using bash.

+
ls -F
+ls -G    # Might not be as cool with Git Bash on Windows
+ls -l
+ls -lh
+ls -lS
+ls -FGLh
+
+
+

You should be able to infer what these different options do, but you can ask the course staff as well.

+

Normally, files that begin with a dot (.) are omitted when listing things. They are also generally omitted when you use your OS’s GUI-based file handling system (like Finder on Macs). To see them, use ls -a. So, cd into your home directory (you remember how to do that, right?), and then do

+
ls -a
+
+
+

b) The nuclear option to delete everything in a directory is rm -rf. The r means to delete recursively, and the f means to “force” deletion. I was going to give you an exercise that uses the nuclear option, but I’m not going to do that. So, just forget I said anything. For this part of the problem, I want you to discuss with someone else in the class when the nuclear option might be used, and what needs to be in place before exercising it.

+

c) Try doing this if you are using macOS or Linux:

+
ls /
+
+
+

What is /? Try cd-ing there and seeing what’s in there. Do not delete anything!

+
+

Solution

+

This problem more or less consisted of messing around with the command line.

+
+
+ + +
+
+ +
+
+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/2024/exercise_solutions/exercise_1/exercise_1.1_solution.ipynb b/2024/exercise_solutions/exercise_1/exercise_1.1_solution.ipynb new file mode 100644 index 00000000..6394ced9 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.1_solution.ipynb @@ -0,0 +1,100 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.1: Command line exercises\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this exercise, you will play around with the command line on your machine and get more familiar with it.\n", + "\n", + "**a)** Let's play around with some options for the `ls` command. First `cd` into a directory that has some interesting files in it (like `~git/bootcamp/command_line_tutorial`). Try the following if you are using `bash`.\n", + "\n", + " ls -F\n", + " ls -G # Might not be as cool with Git Bash on Windows\n", + " ls -l\n", + " ls -lh\n", + " ls -lS\n", + " ls -FGLh\n", + " \n", + "You should be able to infer what these different options do, but you can ask the course staff as well.\n", + "\n", + "Normally, files that begin with a dot (`.`) are omitted when listing things. They are also generally omitted when you use your OS's GUI-based file handling system (like Finder on Macs). To see them, use `ls -a`. So, `cd` into your home directory (you remember how to do that, right?), and then do\n", + "\n", + " ls -a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** The nuclear option to delete *everything* in a directory is `rm -rf`. The `r` means to delete recursively, and the `f` means to \"force\" deletion. I was going to give you an exercise that uses the nuclear option, but I'm not going to do that. So, just forget I said anything. For this part of the problem, I want you to discuss with someone else in the class *when* the nuclear option might be used, and what needs to be in place before exercising it." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Try doing this if you are using macOS or Linux:\n", + "\n", + " ls /\n", + " \n", + "What is `/`? Try `cd`-ing there and seeing what's in there. **Do not delete anything!**" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "This problem more or less consisted of messing around with the command line." + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/exercise_solutions/exercise_1/exercise_1.2_solution.html b/2024/exercise_solutions/exercise_1/exercise_1.2_solution.html new file mode 100644 index 00000000..7651b483 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.2_solution.html @@ -0,0 +1,302 @@ + + + + + + + Exercise 1.2: Making an rc file — Programming Bootcamp documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Exercise 1.2: Making an rc file

+

Having a .bashrc or .zshrc file allows you to configure your shell how you like.

+

a) If you are using Linux or macOS, open a terminal and type

+
echo $SHELL
+
+
+

This will tell you if you are using a Bash shell or Zsh, which will tell you which kind of rc file to set up in the next part of the exercise. If you are using Windows, you will create a .bashrc file.

+

b) Create a .bashrc or .zshrc file in your home directory. If you already have one, open it up for editing using Jupyter’s text editor.

+

c) It is often useful to alias commands to other commands. For example, I am always worried I will delete things by accident. I therefore have the following line in my .zshrc file.

+
alias rm="rm -i"
+
+
+

You should create aliases for commands like ls based on the flags you like to always use. Do the same for rm and mv (I use the -i flag with these). To figure out what flags are available, you can look at the man pages. Asking Google will usually give you the information you need on flags.

+

If you like, you can use my .bashrc file, available in ~/git/bootcamp/misc/jb_bashrc, or my .zshrc file, available in ~/git/bootcamp/misc/jb_zshrc.

+

d) Depending on your operating system, if you are using Bash, your ~/.bashrc file may or may not be properly loaded upon opening a new bash shell. You may, e.g. for new macOS versions, need to explicitly source your .bashrc file in your ~/.bash_profile file. Therefore, you should add the following to the bottom of your ~/.bash_profile file.

+
if [ -f $HOME/.bashrc ]; then
+    . $HOME/.bashrc
+fi
+
+
+
+

Solution

+

Again, this was mostly you messing around with the command line. The contents of my .bashrc file are shown below.

+
# Give me a nice prompt that tells me my pwd
+export PS1="\[\e[1;32m\]\u\[\e[0m\]@\[\e[1;36m\]\h\[\e[0m\] [\w]\n% "
+
+
+# Keep me out of trouble!
+alias rm="rm -i"
+alias mv="mv -i"
+alias cp="cp -i"
+
+
+# customize list output
+alias ls="ls -FGh"
+export LSCOLORS="gxfxcxdxCxegedabagacad"
+
+
+

And my .zshrc file is:

+
# This is a nice prompt; gives green check mark if last command executed
+# without a problem and gives a red question mark with an exit code
+# if it didn't, along with pwd.
+PROMPT='%(?.%F{green}√.%F{red}?%?)%f [%B%F{240}%10~%f%b]
+%# '
+
+# Aliases for safe moving, removing, and copying of files
+alias rm="rm -i"
+alias mv="mv -i"
+alias cp="cp -i"
+
+# Nicely formatted listing
+alias ls="ls -FGh"
+
+# Nice set of colors
+export LSCOLORS="gxfxcxdxCxegedabagacad"
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/2024/exercise_solutions/exercise_1/exercise_1.2_solution.ipynb b/2024/exercise_solutions/exercise_1/exercise_1.2_solution.ipynb new file mode 100644 index 00000000..bbb3a8f2 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.2_solution.ipynb @@ -0,0 +1,128 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.2: Making an rc file\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Having a `.bashrc` or `.zshrc` file allows you to configure your shell how you like.\n", + "\n", + "**a)** If you are using Linux or macOS, open a terminal and type\n", + "\n", + " echo $SHELL\n", + " \n", + "This will tell you if you are using a Bash shell or Zsh, which will tell you which kind of rc file to set up in the next part of the exercise. If you are using Windows, you will create a `.bashrc` file.\n", + "\n", + "**b)** Create a `.bashrc` or `.zshrc` file in your home directory. If you already have one, open it up for editing using Jupyter's text editor.\n", + "\n", + "**c)** It is often useful to `alias` functions to other functions. For example, I am always worried I will accidentally delete things by accident. I therefore have the following line in my `.zshrc` file.\n", + "\n", + " alias rm=\"rm -i\"\n", + " \n", + "You should create aliases for commands like `ls` based on the flags you like to *always* use. Do the same for `rm` and `mv` (I use the `-i` flag with these). To figure out what flags are available, you can look at the `man` pages. Asking Google will usually give you the information you need on flags.\n", + "\n", + "If you like, you can use my `.bashrc` file, available in `~/git/bootcamp/misc/jb_bashrc`, or my `.zshrc` file, available in `~/git/bootcamp/misc/jb_zshrc`.\n", + "\n", + "**d)** Depending on your operating system, if you are using Bash, your `~/.bashrc` file may or may not be properly loaded upon opening a new bash shell. You may, e.g. for new macOS versions, need to explicitly source your `.bashrc` file in your `~/.bash_profile` file. Therefore, you should add the following to the bottom of your `~/.bash_profile` file.\n", + "\n", + "```bash\n", + "if [ -f $HOME/.bashrc ]; then\n", + " . $HOME/.bashrc\n", + "fi\n", + "```" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "Again, this was mostly you messing around with the command line. The contents of my .bashrc file are shown below.\n", + "\n", + "```bash\n", + "# Give me a nice prompt that tells me my pwd\n", + "export PS1=\"\\[\\e[1;32m\\]\\u\\[\\e[0m\\]@\\e[1;36m\\]\\h\\[\\e[0m\\] [\\w]\\n% \"\n", + "\n", + "\n", + "# Keep me out of trouble!\n", + "alias rm=\"rm -i\"\n", + "alias mv=\"mv -i\"\n", + "alias cp=\"cp -i\"\n", + "\n", + "\n", + "# customize list output\n", + "alias ls=\"ls -FGh\"\n", + "export LSCOLORS=\"gxfxcxdxCxegedabagacad\"\n", + "```\n", + "\n", + "And my .zshrc file is:\n", + "\n", + "```zsh\n", + "# This is a nice prompt; gives green check mark if last command executed\n", + "# without a problem and gives a red questionmark with an exit code\n", + "# if it didn't, along with pwd.\n", + "PROMPT='%(?.%F{green}√.%F{red}?%?)%f [%B%F{240}%10~%f%b] \n", + "%# '\n", + "\n", + "# Aliases for save moving, removing, and copying of files\n", + "alias rm=\"rm -i\"\n", + "alias mv=\"mv -i\"\n", + "alias cp=\"cp -i\"\n", + "\n", + "# Nicely formatted listing\n", + "alias ls=\"ls -FGh\"\n", + "\n", + "# Nice set of colors\n", + "export LSCOLORS=\"gxfxcxdxCxegedabagacad\"\n", + "```" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/exercise_solutions/exercise_1/exercise_1.3_solution.html b/2024/exercise_solutions/exercise_1/exercise_1.3_solution.html new file mode 100644 index 00000000..c52812a9 --- /dev/null +++ 
b/2024/exercise_solutions/exercise_1/exercise_1.3_solution.html @@ -0,0 +1,316 @@ + + + + + + + Exercise 1.3: Time and type conversions — Programming Bootcamp documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Exercise 1.3: Time and type conversions

+

Using the techniques you have learned in the first day of bootcamp, generate a time stamp (like 13:29:45 for nearly half past one in the afternoon) for the time that is 63,252 seconds after midnight. Start with this statement:

+
+
[1]:
+
+
+
seconds_past_midnight = 63252
+
+
+
+

After that statement, the only numeric keys you should need or want to push are 0, 2 or 3, and 6.

+
+

Solution

+

To get the number of hours, we floor divide by 3600. To get the number of minutes, we take the remainder of division by 3600 and floor divide that by 60. Finally, the seconds are the remainder left over when we divide by 60.

+
+
[2]:
+
+
+
hours = seconds_past_midnight // 60**2
+minutes = (seconds_past_midnight % 60**2) // 60
+seconds = seconds_past_midnight % 60
+
+
+
+

Now that we have these, we concatenate a string together.

+
+
[3]:
+
+
+
time_str = str(hours) + ':' + str(minutes) + ':' + str(seconds)
+
+print(time_str)
+
+
+
+
+
+
+
+
+17:34:12
+
+
+

There are much more elegant ways of doing these two operations, including using string methods. For most applications using time stamps, you would use the datetime module of the standard library.
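
As one possible illustration of a more compact approach, the sketch below uses the built-in `divmod()` function and an f-string with zero padding. The function name `time_stamp` is hypothetical and not part of the original solution. Zero padding matters for times with single-digit fields, which plain `str()` concatenation would render as, e.g., `1:1:1`.

```python
def time_stamp(seconds_past_midnight):
    """Return an HH:MM:SS time stamp for a number of seconds past midnight."""
    # divmod(a, b) returns the pair (a // b, a % b)
    minutes, seconds = divmod(seconds_past_midnight, 60)
    hours, minutes = divmod(minutes, 60)

    # The 02d format specifier pads each field to two digits with zeros
    return f'{hours:02d}:{minutes:02d}:{seconds:02d}'


print(time_stamp(63252))
print(time_stamp(3661))
```

This still only requires the numeric keys `0`, `2`, and `6` after the initial assignment.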

+
+
+

Computing environment

+
+
[4]:
+
+
+
%load_ext watermark
+%watermark -v -p jupyterlab
+
+
+
+
+
+
+
+
+Python implementation: CPython
+Python version       : 3.11.3
+IPython version      : 8.12.0
+
+jupyterlab: 3.6.3
+
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/2024/exercise_solutions/exercise_1/exercise_1.3_solution.ipynb b/2024/exercise_solutions/exercise_1/exercise_1.3_solution.ipynb new file mode 100644 index 00000000..8e2c6b16 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.3_solution.ipynb @@ -0,0 +1,160 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.3: Time and type conversions\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Using the techniques you have learned in the first day of bootcamp, generate a time stamp (like 13:29:45 for nearly half past one in the afternoon) for the time that is 63,252 seconds after midnight. Start with this statement:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "seconds_past_midnight = 63252" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "After that statement, the only numeric keys you should need or want to push are `0`, `2` or `3`, and `6`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To get the number of hours, we floor divide by 3600. To get the number of minutes, we take the modulus of division by 3600, and then divide that by 60. Finally, the seconds are what is left over when we divide by 60." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [], + "source": [ + "hours = seconds_past_midnight // 60**2\n", + "minutes = (seconds_past_midnight % 60**2) // 60\n", + "seconds = seconds_past_midnight % 60" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now that we have these, we concatenate a string together." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "17:34:12\n" + ] + } + ], + "source": [ + "time_str = str(hours) + ':' + str(minutes) + ':' + str(seconds)\n", + "\n", + "print(time_str)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "There are much elegant ways of doing these two operations, including using string methods. For most applications using time stamps, you would use the [datetime module](https://docs.python.org/3/library/datetime.html) of the standard library." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/exercise_solutions/exercise_1/exercise_1.4_solution.html b/2024/exercise_solutions/exercise_1/exercise_1.4_solution.html new file mode 100644 index 00000000..ade200da --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.4_solution.html @@ -0,0 +1,388 @@ + + + + + + + Exercise 1.4: Using string methods — Programming Bootcamp documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Exercise 1.4: Using string methods

+

In Lesson 7, we wrote a function to compute the reverse complement of a sequence.

+

a) Write that function again, still using a for loop, but do not use the built-in reversed() function.

+

b) Write the function one more time, but without any loops.

+
+

Solution

+

a) The trick here is to do what we did in Lesson 7, except use [::-1] indexing instead of the reversed() function.

+
+
[1]:
+
+
+
def complement_base(base):
+    """Returns the Watson-Crick complement of a base."""
+    if base == 'A' or base == 'a':
+        return 'T'
+    elif base == 'T' or base == 't':
+        return 'A'
+    elif base == 'G' or base == 'g':
+        return 'C'
+    else:
+        return 'G'
+
+
+def reverse_complement(seq):
+    """Compute reverse complement of a sequence."""
+    # Initialize reverse complement
+    rev_seq = ''
+
+    # Loop through the sequence and build up the complement string
+    for base in seq:
+        rev_seq += complement_base(base)
+
+    return rev_seq[::-1]
+
+
+
+

And we’ll do a quick test with the same sequence as in Lesson 7.

+
+
[2]:
+
+
+
reverse_complement('GCAGTTGCA')
+
+
+
+
+
[2]:
+
+
+
+
+'TGCAACTGC'
+
+
+

Bingo!

+

b) We can eliminate the for loop by using the replace() method of strings.

+
+
[3]:
+
+
+
def reverse_complement(seq):
+    """Compute reverse complement of a sequence."""
+    # Initialize rev_seq to a lowercase seq
+    rev_seq = seq.lower()
+
+    # Substitute bases
+    rev_seq = rev_seq.replace('t', 'A')
+    rev_seq = rev_seq.replace('a', 'T')
+    rev_seq = rev_seq.replace('g', 'C')
+    rev_seq = rev_seq.replace('c', 'G')
+
+    return rev_seq[::-1]
+
+
+
+

And let’s give it a test!

+
+
[4]:
+
+
+
reverse_complement('GCAGTTGCA')
+
+
+
+
+
[4]:
+
+
+
+
+'TGCAACTGC'
+
+
+

Note: We haven’t learned about it yet, but some Googling would allow you to use the translate() and maketrans() string methods. maketrans() makes a translation table for characters in a string, and then the translate() method uses it to return a new string with those characters substituted.

+
+
[5]:
+
+
+
def reverse_complement(seq):
+    """Compute reverse complement of a sequence."""
+    return seq.translate(str.maketrans('ATGCatgc', 'TACGTACG'))[::-1]
+
+reverse_complement('GCAGTTGCA')
+
+
+
+
+
[5]:
+
+
+
+
+'TGCAACTGC'
+
+
+

So, we were able to do it in one line!
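
One caveat worth noting: `translate()` leaves any character that is not in the translation table unchanged, whereas the `complement_base()` function from part (a) maps every unrecognized character to `'G'`. If you would rather be told about unexpected input, a check can be added up front. The sketch below is only one way to do that; the name `reverse_complement_checked` and the choice to raise a `ValueError` are assumptions for illustration, not part of the original solution.

```python
def reverse_complement_checked(seq):
    """Compute reverse complement, raising ValueError on non-ATGC characters."""
    # Collect any characters that are not valid bases (case-insensitive)
    invalid = set(seq.upper()) - set('ATGC')
    if invalid:
        raise ValueError(f'Invalid bases in sequence: {sorted(invalid)}')

    return seq.translate(str.maketrans('ATGCatgc', 'TACGTACG'))[::-1]


print(reverse_complement_checked('GCAGTTGCA'))
```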

+
+
+

Computing environment

+
+
[6]:
+
+
+
%load_ext watermark
+%watermark -v -p jupyterlab
+
+
+
+
+
+
+
+
+Python implementation: CPython
+Python version       : 3.11.3
+IPython version      : 8.12.0
+
+jupyterlab: 3.6.3
+
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/2024/exercise_solutions/exercise_1/exercise_1.4_solution.ipynb b/2024/exercise_solutions/exercise_1/exercise_1.4_solution.ipynb new file mode 100644 index 00000000..89de02a8 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.4_solution.ipynb @@ -0,0 +1,261 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.4: Using string methods\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In [Lesson 7](l07_intro_to_functions.ipynb), we wrote a function to compute the reverse complement of a sequence. \n", + "\n", + "**a)** Write that function again, still using a `for` loop, but do not use the built-in `reversed()` function.\n", + "\n", + "**b)** Write the function one more time, but without any loops." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** The trick here is to do what we did in Lesson 7, except use `[::-1]` indexing instead of the `reversed()` function." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def complement_base(base):\n", + " \"\"\"Returns the Watson-Crick complement of a base.\"\"\"\n", + " if base == 'A' or base == 'a':\n", + " return 'T'\n", + " elif base == 'T' or base == 't':\n", + " return 'A'\n", + " elif base == 'G' or base == 'g':\n", + " return 'C'\n", + " else:\n", + " return 'G'\n", + "\n", + "\n", + "def reverse_complement(seq):\n", + " \"\"\"Compute reverse complement of a sequence.\"\"\"\n", + " # Initialize reverse complement\n", + " rev_seq = ''\n", + " \n", + " # Loop through and populate list with reverse complement\n", + " for base in seq:\n", + " rev_seq += complement_base(base)\n", + " \n", + " return rev_seq[::-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And we'll do a quick test with the same sequence as in lesson 7." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'TGCAACTGC'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reverse_complement('GCAGTTGCA')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Bingo!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** We can eliminate the `for` loop by using the `replace()` method of strings." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def reverse_complement(seq):\n", + " \"\"\"Compute reverse complement of a sequence.\"\"\"\n", + " # Initialize rev_seq to a lowercase seq\n", + " rev_seq = seq.lower()\n", + " \n", + " # Substitute bases\n", + " rev_seq = rev_seq.replace('t', 'A')\n", + " rev_seq = rev_seq.replace('a', 'T')\n", + " rev_seq = rev_seq.replace('g', 'C')\n", + " rev_seq = rev_seq.replace('c', 'G')\n", + " \n", + " return rev_seq[::-1]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "And let's give it a test!" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'TGCAACTGC'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "reverse_complement('GCAGTTGCA')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**Note:** We haven't learned about it yet, but some Googling would allow you to use the `translate()` and `maketrans()` string methods. `maketrans()` makes a **translation table** for characters in a string, and then the `translate()` functions uses it to mutate the characters in the list." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'TGCAACTGC'" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "def reverse_complement(seq):\n", + " \"\"\"Compute reverse complement of a sequence.\"\"\"\n", + " return seq.translate(str.maketrans('ATGCatgc', 'TACGTACG'))[::-1]\n", + "\n", + "reverse_complement('GCAGTTGCA')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "So, we were able to do it in one line!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/exercise_solutions/exercise_1/exercise_1.5_solution.html b/2024/exercise_solutions/exercise_1/exercise_1.5_solution.html new file mode 100644 index 00000000..2e184c05 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.5_solution.html @@ -0,0 +1,392 @@ + + + + + + + Exercise 1.5: Longest common substring — Programming Bootcamp documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Exercise 1.5: Longest common substring

+

Write a function that takes two sequences and returns the longest common substring. A substring is a contiguous portion of a string. For example:

+

Substrings of ATGCATAT:

+
TGCA
+T
+TAT
+
+
+

Not substrings of ATGCATAT:

+
AGCA              # Skipped T
+CCATA             # Added another C
+Hello, world.     # Has nothing to do with the input sequence
+
+
+

There may be more than one longest common substring; you only need to return one of them.

+

The call signature of the function should be

+
longest_common_substring(s1, s2)
+
+
+

Here are some return values you should get.

+ + + + + + + + + + + + + + + + + +

Function call

Result

longest_common_substring('ATGC', 'ATGCA')

'ATGC'

longest_common_substring('GATGCCATGCA', 'ATGCC')

'ATGCC'

longest_common_substring('ACGTGGAAAGCCA', 'GTACACACGTTTTGAGAGACAC')

'ACGT'

+
+

Solution

+

This is actually an important problem, and there are clever algorithms to solve it. We will take a more brute force approach. Let \(n\) be the length of the shorter of the two strings. We will start with the entirety of the shorter string and see if it is in the longer. We then will take both substrings of length \(n - 1\) in the shorter string and check to see if they are in the longer string. We then take all three substrings of length \(n - 2\) and see if they are in the longer string. We continue like this until we get a hit, which will necessarily be one of the longest common substrings.

+
+
[1]:
+
+
+
def longest_common_substring(s1, s2):
+    """Return one of the longest common substrings"""
+    # Make sure s1 is the shorter
+    if len(s1) > len(s2):
+        s1, s2 = s2, s1
+
+    # Start with the entire sequence and shorten
+    substr_len = len(s1)
+    while substr_len > 0:
+        # Try all substrings
+        for i in range(len(s1) - substr_len + 1):
+            if s1[i:i+substr_len] in s2:
+                return s1[i:i+substr_len]
+
+        substr_len -= 1
+
+    # If we haven't returned, there is no common substring
+    return ''
+
+
+
+

Let’s try our function out with the tests.

+
+
[2]:
+
+
+
longest_common_substring('ATGC', 'ATGCA')
+
+
+
+
+
[2]:
+
+
+
+
+'ATGC'
+
+
+
+
[3]:
+
+
+
longest_common_substring('GATGCCATGCA', 'ATGCC')
+
+
+
+
+
[3]:
+
+
+
+
+'ATGCC'
+
+
+
+
[4]:
+
+
+
longest_common_substring('ACGTGGAAAGCCA', 'GTACACACGTTTTGAGAGACAC')
+
+
+
+
+
[4]:
+
+
+
+
+'ACGT'
+
+
+

All look good!
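
For comparison with the brute force approach, here is a minimal sketch of the standard dynamic programming technique for this problem. It is not the solution used above; the function name `longest_common_substring_dp` is hypothetical. The idea is to track, for each pair of positions in the two strings, the length of the longest common substring ending at those positions.

```python
def longest_common_substring_dp(s1, s2):
    """Return one of the longest common substrings using dynamic programming."""
    best_len = 0  # Length of the best match found so far
    best_end = 0  # Index in s1 just past the end of the best match

    # prev[j] is the length of the common substring ending at
    # s1[i-2] and s2[j-1], i.e., the previous row of the table
    prev = [0] * (len(s2) + 1)

    for i in range(1, len(s1) + 1):
        curr = [0] * (len(s2) + 1)
        for j in range(1, len(s2) + 1):
            if s1[i - 1] == s2[j - 1]:
                # Extend the match that ended one character earlier
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr

    return s1[best_end - best_len:best_end]


print(longest_common_substring_dp('ATGC', 'ATGCA'))
print(longest_common_substring_dp('GATGCCATGCA', 'ATGCC'))
print(longest_common_substring_dp('ACGTGGAAAGCCA', 'GTACACACGTTTTGAGAGACAC'))
```

This version does work proportional to the product of the two string lengths. When several longest common substrings exist, it may return a different one than the brute force function does.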

+
+
+

Computing environment

+
+
[5]:
+
+
+
%load_ext watermark
+%watermark -v -p jupyterlab
+
+
+
+
+
+
+
+
+Python implementation: CPython
+Python version       : 3.11.3
+IPython version      : 8.12.0
+
+jupyterlab: 3.6.3
+
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/2024/exercise_solutions/exercise_1/exercise_1.5_solution.ipynb b/2024/exercise_solutions/exercise_1/exercise_1.5_solution.ipynb new file mode 100644 index 00000000..9a8c7af1 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.5_solution.ipynb @@ -0,0 +1,233 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.5: Longest common substring\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Write a function that takes two sequences and returns the longest common substring. A substring is a contiguous portion of a string. For example:\n", + "\n", + "Substrings of `ATGCATAT`:\n", + "\n", + " TGCA\n", + " T\n", + " TAT\n", + " \n", + "Not substrings of `ATGCATAT`:\n", + "\n", + " AGCA # Skipped T\n", + " CCATA # Added another C\n", + " Hello, world. # Has nothing to do with the input sequence\n", + " \n", + "There may be more than one longest common substring; you only need to return one of them.\n", + "\n", + "The call signature of the function should be\n", + "\n", + "```python\n", + "longest_common_substring(s1, s2)\n", + "```\n", + "\n", + "Here are some return values you should get.\n", + "\n", + "|Function call|Result |\n", + "|:---|---:|\n", + "|`longest_common_substring('ATGC', 'ATGCA')` | `'ATGC'`|\n", + "|`longest_common_substring('GATGCCATGCA', 'ATGCC')` | `'ATGCC'`|\n", + "|`longest_common_substring('ACGTGGAAAGCCA', 'GTACACACGTTTTGAGAGACAC') `|`'ACGT'`|" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This is actually an [important problem](https://en.wikipedia.org/wiki/Longest_common_substring_problem), and there are clever algorithms to solve it. We will take a more brute force approach. Let $n$ be the length of the shorter of the two strings. We will start with the entirety of the shorter string and see if it is in the longer. We then will take both substrings of length $n - 1$ in the shorter string and check to see if they are in the longer string. We then take all three substrings of length $n - 2$ and see if they are in the longer string. We continue like this until we get a hit, which will necessarily be one of the longest common substrings." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def longest_common_substring(s1, s2):\n", + " \"\"\"Return one of the longest common substrings\"\"\"\n", + " # Make sure s1 is the shorter\n", + " if len(s1) > len(s2):\n", + " s1, s2 = s2, s1\n", + " \n", + " # Start with the entire sequence and shorten\n", + " substr_len = len(s1)\n", + " while substr_len > 0: \n", + " # Try all substrings\n", + " for i in range(len(s1) - substr_len + 1):\n", + " if s1[i:i+substr_len] in s2:\n", + " return s1[i:i+substr_len]\n", + "\n", + " substr_len -= 1\n", + " \n", + " # If we haven't returned, there is no common substring\n", + " return ''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try our function out with the tests." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'ATGC'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "longest_common_substring('ATGC', 'ATGCA')" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'ATGCC'" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "longest_common_substring('GATGCCATGCA', 'ATGCC')" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "tags": [] + }, + "outputs": [ + { + "data": { + "text/plain": [ + "'ACGT'" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "longest_common_substring('ACGTGGAAAGCCA', 'GTACACACGTTTTGAGAGACAC')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All look good!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/exercise_solutions/exercise_1/exercise_1.6_solution.html b/2024/exercise_solutions/exercise_1/exercise_1.6_solution.html new file mode 100644 index 00000000..2b4f2b4c --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.6_solution.html @@ -0,0 +1,575 @@ + + + + + + + Exercise 1.6: RNA secondary structure validator — Programming Bootcamp documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+
+
+ +
+

Exercise 1.6: RNA secondary structure validator

+
+

RNA secondary structure validator

+

In this problem, we will write a function that takes an RNA sequence and an RNA secondary structure and decides if the secondary structure is possible given the sequence. Remember, single stranded RNA can fold back on itself and form base pairs. An RNA secondary structure is simply the list of base pairs that are present. We will represent the base pairs in dot-parentheses notation. For example, a sequence/secondary structure pair would be

+
0123456789
+GCAUCUAUGC
+(((....)))
+
+
+

For convenience of discussion, I have labeled the indices of the bases on the top row. In this case, base 0, a G, pairs with base 9, a C. Base 1 pairs with base 8, and base 2 pairs with base 7. Bases 3, 4, 5, and 6 are unpaired. (This structure is aptly called a “hairpin.”)

+

I hope the dot-parentheses notation is clear. An open parenthesis is paired with the parenthesis that closes it. Dots are unpaired.

+

So, the goal of our function is to check every base pair present in a secondary structure and see whether it is a G-C, A-U, or (optionally) G-U pair.

+

a) Write a function to make sure that the number of closed parentheses is equal to the number of open parentheses, a requirement for a valid secondary structure. It should return True if the parentheses are valid and False otherwise.

+

b) Write a function that converts the dot-parens notation to a tuple of 2-tuples representing the base pairs. We’ll call this function dotparen_to_bp(). An example input/output of this function would be:

+
dotparen_to_bp('(((....)))')
+
+((0, 9), (1, 8), (2, 7))
+
+
+

Hint: You should look at methods that are available for lists. You might find the append() and pop() methods useful.

+

c) Because of sterics, the minimal length of a hairpin loop is three bases. A hairpin loop is a series of unpaired bases that are closed by a base pair. For example, the secondary structure (.(....).) has a single hairpin loop of length 4. So, the structure ((((..)))) is not valid because it has a hairpin loop of only two bases.

+

Write a function that verifies that a list of base pairs (as outputted by dotparen_to_bp()) satisfies the minimal hairpin length requirement.

+

d) Now write your validator function. The function definition should look like this:

+
def rna_ss_validator(seq, sec_struc, wobble=True):
+
+
+

It should return True if the sequence is commensurate with a valid secondary structure and False otherwise. The wobble keyword argument is True if we allow wobble pairs (G paired with U). Here are some expected results:

+

Returns True:

+
rna_ss_validator('GCAUCUAUGC', '(((....)))')
+rna_ss_validator('GCAUCUAUGU', '(((....)))')
+rna_ss_validator('GCAUCUAUGU', '(.(....).)')
+
+
+

Returns False:

+
rna_ss_validator('GCAUCUACGC', '(((....)))')
+rna_ss_validator('GCAUCUAUGU', '(((....)))', wobble=False)
+rna_ss_validator('GCAUCUAUGU', '(.(....)).')
+rna_ss_validator('GCCCUUGGCA', '(.((..))).')
+
+
+
+
+

Solution

+

a) This part of the validation is simple. We just need to make sure the numbers of open and closed parentheses are equal.

+
+
[1]:
+
+
+
def parens_count(struc):
+    """
+    Return True if the structure has equal numbers of open and
+    closed parentheses.
+    """
+    return struc.count('(') == struc.count(')')
+
+
+
+

Let’s give it a try.

+
+
[2]:
+
+
+
print(parens_count('(((..(((...)).))))'))
+print(parens_count('(((..(((...)).)))'))
+
+
+
+
+
+
+
+
+True
+False
+
+
+
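Equal counts are necessary but not sufficient: a string such as `)(` has one parenthesis of each kind yet is not a valid structure. As a minimal sketch (the name `parens_balanced` is hypothetical and not part of the exercise), a running depth counter also catches a closing parenthesis that appears before its partner:

```python
def parens_balanced(struc):
    """Return True if every ')' closes an earlier '(' and none stay open."""
    depth = 0
    for char in struc:
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
            # A negative depth means a ')' appeared with no '(' to close
            if depth < 0:
                return False

    # Every opened parenthesis must have been closed
    return depth == 0


print(parens_balanced('(((....)))'))  # True
print(parens_balanced(')(....'))      # False
```

The count-based `parens_count()` is still sufficient for this solution, because the base-pair conversion in part b) reports the same error when it finds a closing parenthesis with nothing left to close.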

b) As we scan a dot-parens structure from left to right, we can keep a list of the positions of open parentheses. Whenever we encounter a closed one, we have closed the last open one we added. So, we can just scan through the dot-parens string and pop out base pairs. If this procedure fails, we know that there was an error in the input structure (i.e., a closed parenthesis appeared without a corresponding open one before it).

+
+
[3]:
+
+
+
def dot_parens_to_bp(struc):
+    """
+    Convert a dot-parens structure to a tuple of base pairs.
+    Return False if the structure is invalid.
+    """
+    if not parens_count(struc):
+        print('Error in input structure.')
+        return False
+
+    # Initialize list of open parens and list of base pairs
+    open_parens = []
+    bps = []
+
+    # Scan through string
+    for i, x in enumerate(struc):
+        if x == '(':
+            open_parens.append(i)
+        elif x == ')':
+            if len(open_parens) > 0:
+                bps.append((open_parens.pop(), i))
+            else:
+                print('Error in input structure.')
+                return False
+
+    # Return the result as a tuple
+    return tuple(sorted(bps))
+
+
+
+

Let’s try it on some legitimate structures and on some bad ones.

+
+
[4]:
+
+
+
# Good structure
+dot_parens_to_bp('..(((...)))(((((....)))).)..')
+
+
+
+
+
[4]:
+
+
+
+
+((2, 10), (3, 9), (4, 8), (11, 25), (12, 23), (13, 22), (14, 21), (15, 20))
+
+
+
+
[5]:
+
+
+
# Good structure
+dot_parens_to_bp('(((..(((...)).))))')
+
+
+
+
+
[5]:
+
+
+
+
+((0, 17), (1, 16), (2, 15), (5, 14), (6, 12), (7, 11))
+
+
+
+
[6]:
+
+
+
# Bad structure
+dot_parens_to_bp('((....)))(')
+
+
+
+
+
+
+
+
+Error in input structure.
+
+
+
+
[6]:
+
+
+
+
+False
+
+
+
+
[7]:
+
+
+
dot_parens_to_bp('())....))))')
+
+
+
+
+
+
+
+
+Error in input structure.
+
+
+
+
[7]:
+
+
+
+
+False
+
+
+
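To make the stack behavior concrete, the sketch below prints the list of open positions after each character of a short structure. `trace_stack` is a hypothetical helper for illustration only, and it assumes a well-formed input (it does no error handling).

```python
def trace_stack(struc):
    """Return the base pairs of a well-formed structure, printing the stack."""
    open_parens = []
    bps = []
    for i, char in enumerate(struc):
        if char == '(':
            open_parens.append(i)
        elif char == ')':
            # The most recently opened parenthesis is the one being closed
            bps.append((open_parens.pop(), i))
        # Show the index, the character, and the positions still open
        print(i, char, open_parens)

    return sorted(bps)


pairs = trace_stack('(.(..))')
print(pairs)  # [(0, 6), (2, 5)]
```

The inner pair `(2, 5)` is found first, because the position pushed last is the first to be popped; sorting afterwards puts the pairs in order of their opening index.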

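c) It is quite easy to detect short hairpins once we have a list of base pairs. A loop of at least three unpaired bases means the two paired bases must sit at least four positions apart, so we just need to make sure the difference in index of any pair of paired bases is not less than four.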

+
+
[8]:
+
+
+
def hairpin_check(bps):
+    """Check to make sure no hairpins are too short."""
+    for bp in bps:
+        if bp[1] - bp[0] < 4:
+            print('A hairpin is too short.')
+            return False
+
+    # Everything checks out
+    return True
+
+
+
+
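The check runs over every base pair, but only a pair with no other pair inside it actually closes a hairpin loop. Any pair that encloses another is necessarily farther apart, so it passes whenever the inner pair does. As an illustrative sketch (the helper `hairpin_loop_lengths` is an assumption, not part of the exercise), the loop lengths can also be read directly off the dot-parens string:

```python
import re


def hairpin_loop_lengths(struc):
    """Return the length of each hairpin loop in a dot-parens string."""
    # A hairpin loop is a run of dots directly between '(' and ')';
    # zero dots, as in '()', counts as a loop of length 0.
    return [len(m.group(1)) for m in re.finditer(r'\((\.*)\)', struc)]


print(hairpin_loop_lengths('(.(....).)'))  # [4]
print(hairpin_loop_lengths('((((..))))'))  # [2]
```

A structure satisfies the steric requirement when every returned length is at least three.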

d) Most everything is in place. We just need to check the sequence.

+
+
[9]:
+
+
+
def rna_ss_validator(seq, sec_struc, wobble=True):
+    """Validate an RNA structure"""
+    # Convert structure to base pairs
+    bps = dot_parens_to_bp(sec_struc)
+
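+    # If this failed, the structure was invalid. Test for False explicitly,
+    # because a fully unpaired structure gives an empty (falsy) tuple.
+    if bps is False: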
+        return False
+
+    # Do the hairpin check
+    if not hairpin_check(bps):
+        return False
+
+    # Possible base pairs
+    if wobble:
+        ok_bps = ('gc', 'cg', 'au', 'ua', 'gu', 'ug')
+    else:
+        ok_bps = ('gc', 'cg', 'au', 'ua')
+
+    # Check complementarity
+    for bp in bps:
+        bp_str = (seq[bp[0]] + seq[bp[1]]).lower()
+        if bp_str not in ok_bps:
+            print('Invalid base pair.')
+            return False
+
+    # Everything passed
+    return True
+
+
+
+

Let’s test it on the test cases from the problem statement.

+
+
[10]:
+
+
+
print('Should be True:')
+print(rna_ss_validator('GCAUCUAUGC', '(((....)))'))
+print(rna_ss_validator('GCAUCUAUGU', '(((....)))'))
+print(rna_ss_validator('GCAUCUAUGU', '(.(....).)'))
+print(rna_ss_validator('AUUGAUGCACGUGCAUCCCCAGCGGGUCCCGCGAGCUCACCCCCUUCCAAAAGCACCACGUGCCAGGCCUCGCCCCCGGAAGUAUACCUGUGAGCCAGA',
+                       '...(((((....)))))....((((...))))..((((((...(((((....((((...))))..(((...)))...))))).......))))))....'))
+
+print('\nShould be False:')
+print(rna_ss_validator('GCAUCUACGC', '(((....)))'), '\n')
+print(rna_ss_validator('GCAUCUAUGU', '(((....)))', wobble=False), '\n')
+print(rna_ss_validator('GCAUCUAUGU', '(.(....)).'), '\n')
+print(rna_ss_validator('GCCCUUGGCA', '(.((..))).'),'\n')
+print(rna_ss_validator('AUUGAUGCACGUGCAUCCCCAGCGGGUCCCGCGAGCCCACCCCCUUCCAAAAGCACCACGUGCCAGGCCUCGCCCCCGGAAGUAUACCUGUGAGCCAGA',
+                       '...(((((....)))))....((((...))))..((((((...(((((....((((...))))..(((...)))...))))).......))))))....'))
+
+
+
+
+
+
+
+
+Should be True:
+True
+True
+True
+True
+
+Should be False:
+Invalid base pair.
+False
+
+Invalid base pair.
+False
+
+Invalid base pair.
+False
+
+A hairpin is too short.
+False
+
+Invalid base pair.
+False
+
+
+

Looks good!
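Two kinds of input are not covered by the test cases above: a structure with no base pairs at all, and a sequence whose length differs from that of the structure. The sketch below is a compact, self-contained variant that accepts a fully unpaired structure and rejects a length mismatch. The name `validate_rna_ss`, the `min_loop` parameter, and the handling of these two cases are assumptions made for illustration, not the reference solution.

```python
def validate_rna_ss(seq, sec_struc, wobble=True, min_loop=3):
    """Return True if sec_struc is a valid secondary structure for seq."""
    # Sequence and structure must describe the same number of bases
    if len(seq) != len(sec_struc):
        return False

    ok_bps = {'GC', 'CG', 'AU', 'UA'}
    if wobble:
        ok_bps |= {'GU', 'UG'}

    open_parens = []
    for j, char in enumerate(sec_struc):
        if char == '(':
            open_parens.append(j)
        elif char == ')':
            # A ')' with nothing open is an invalid structure
            if not open_parens:
                return False
            i = open_parens.pop()
            # Paired bases must have at least min_loop bases between them
            if j - i - 1 < min_loop:
                return False
            if (seq[i] + seq[j]).upper() not in ok_bps:
                return False
        elif char != '.':
            # Assumption: any other character makes the structure invalid
            return False

    # Any parenthesis still open was never closed
    return not open_parens


print(validate_rna_ss('GCAUCUAUGC', '(((....)))'))  # True
print(validate_rna_ss('AUGC', '....'))              # True
print(validate_rna_ss('AUGC', '...'))               # False
```

Doing all checks in a single pass avoids building the tuple of base pairs, at the cost of no longer reporting which check failed.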

+
+
+

Computing environment

+
+
[11]:
+
+
+
%load_ext watermark
+%watermark -v -p jupyterlab
+
+
+
+
+
+
+
+
+Python implementation: CPython
+Python version       : 3.11.3
+IPython version      : 8.12.0
+
+jupyterlab: 3.6.3
+
+
+
+
+
+ + +
+
+ +
+
+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/2024/exercise_solutions/exercise_1/exercise_1.6_solution.ipynb b/2024/exercise_solutions/exercise_1/exercise_1.6_solution.ipynb new file mode 100644 index 00000000..a4c19e12 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/exercise_1.6_solution.ipynb @@ -0,0 +1,484 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Exercise 1.6: RNA secondary structure validator\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## RNA secondary structure validator\n", + "\n", + "In this problem, we will write a function that takes an RNA sequence and an RNA secondary structure and decides if the secondary structure is possible given the sequence. Remember, single stranded RNA can fold back on itself and form base pairs. An RNA secondary structure is simply the list of base pairs that are present. We will represent the base pairs in dot-parentheses notation. For example, a sequence/secondary structure pair would be\n", + "\n", + " 0123456789\n", + " GCAUCUAUGC\n", + " (((....)))\n", + "\n", + "For convenience of discussion, I have labeled the indices of the bases on the top row. In this case, base `0`, a `G`, pairs with base `9`, a `C`. Base `1` pairs with base `8`, and base `2` pairs with base `7`. Bases `3`, `4`, `5`, and `6` are unpaired. (This structure is aptly called a \"hairpin.\")\n", + "\n", + "I hope the dot-parentheses notation is clear. An open parenthesis is paired with the parenthesis that closes it. Dots are unpaired.\n", + "\n", + "So, the goal of our function is to check all base pairs present in a secondary structure and see if they are with `G-C`, `A-U`, or (optionally) `G-U`." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**a)** Write a function to make sure that the number of closed parentheses is equal to the number of open parentheses, a requirement for a valid secondary structure. It should return `True` if the parentheses are valid and `False` otherwise." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** Write a function that converts the dot-parens notation to a tuple of 2-tuples representing the base pairs. We'll call this function `dotparen_to_bp()`. 
An example input/output of this function would be:\n", + "\n", + " dotparen_to_bp('(((....)))')\n", + " \n", + " ((0, 9), (1, 8), (2, 7))\n", + " \n", + "*Hint*: You should look at [methods that are available for lists](https://docs.python.org/3/tutorial/datastructures.html#more-on-lists). You might find the `append()` and `pop()` methods useful." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** Because of sterics, the minimal length of a hairpin loop is three bases. A hairpin loop is a series of unpaired bases that are closed by a base pair. For example, the secondary structure `(.(....).)` has a single hairpin loop of length 4. So, the structure `((((..))))` is not valid because it has a hairpin loop of only two bases.\n", + "\n", + "Write a function that verifies that a list of base pairs (as outputted by `dotparen_to_bp()`) satisfies the minimal hairpin length requirement." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Now write your validator function. The function definition should look like this:\n", + "\n", + " def rna_ss_validator(seq, sec_struc, wobble=True):\n", + " \n", + "It should return `True` if the sequence is commensurate with a valid secondary structure and `False` otherwise. The `wobble` keyword argument is `True` if we allow wobble pairs (`G` paired with `U`). Here are some expected results:\n", + "\n", + "Returns `True`:\n", + "\n", + " rna_ss_validator('GCAUCUAUGC', '(((....)))')\n", + " rna_ss_validator('GCAUCUAUGU', '(((....)))') \n", + " rna_ss_validator('GCAUCUAUGU', '(.(....).)') \n", + "\n", + "Returns `False`:\n", + "\n", + " rna_ss_validator('GCAUCUACGC', '(((....)))')\n", + " rna_ss_validator('GCAUCUAUGU', '(((....)))', wobble=False) \n", + " rna_ss_validator('GCAUCUAUGU', '(.(....)).') \n", + " rna_ss_validator('GCCCUUGGCA', '(.((..))).')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Solution\n", + "\n", + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "**a)** This part of the validation is simple. We just need to make sure the number of open and closed parentheses are equal." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "def parens_count(struc):\n", + " \"\"\"\n", + " Ensures there are equal number of open and closed parentheses\n", + " in structure.\n", + " \"\"\"\n", + " return struc.count('(') == struc.count(')')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's give it a try." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "True\n", + "False\n" + ] + } + ], + "source": [ + "print(parens_count('(((..(((...)).))))'))\n", + "print(parens_count('(((..(((...)).)))'))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**b)** As we scan a dot-parens structure from left to right, we can keep a list of the positions of open parentheses. Whenever we encounter a closed one, we have closed the last open one we added. So, we can just scan through the dot-parens string and pop out base pairs. If this procedure fails, we know that there was an error in the input structure (i.e., a closed parenthesis appeared without a corresponding open one before it)." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [], + "source": [ + "def dot_parens_to_bp(struc):\n", + " \"\"\"\n", + " Convert a dot-parens structure to a list of base pairs.\n", + " Return False if the structure is invalid.\n", + " \"\"\"\n", + " if not parens_count(struc):\n", + " print('Error in input structure.')\n", + " return False\n", + " \n", + " # Initialize list of open parens and list of base pairs\n", + " open_parens = []\n", + " bps = []\n", + " \n", + " # Scan through string\n", + " for i, x in enumerate(struc):\n", + " if x == '(':\n", + " open_parens.append(i)\n", + " elif x == ')':\n", + " if len(open_parens) > 0:\n", + " bps.append((open_parens.pop(), i))\n", + " else:\n", + " print('Error in input structure.')\n", + " return False\n", + "\n", + " # Return the result as a tuple\n", + " return tuple(sorted(bps))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's try it on some legitimate sequences and on some bad ones." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "((2, 10), (3, 9), (4, 8), (11, 25), (12, 23), (13, 22), (14, 21), (15, 20))" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Good structure\n", + "dot_parens_to_bp('..(((...)))(((((....)))).)..')" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "((0, 17), (1, 16), (2, 15), (5, 14), (6, 12), (7, 11))" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Good structure\n", + "dot_parens_to_bp('(((..(((...)).))))')" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Error in input structure.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Bad structure\n", + "dot_parens_to_bp('((....)))(')" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Error in input structure.\n" + ] + }, + { + "data": { + "text/plain": [ + "False" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dot_parens_to_bp('())....))))')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**c)** It is quite easy to detect short hairpins once we have a list of base pairs. We just need to make sure the difference in index of any pair of paired bases is not less than three." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "def hairpin_check(bps):\n", + " \"\"\"Check to make sure no hairpins are too short.\"\"\"\n", + " for bp in bps:\n", + " if bp[1] - bp[0] < 4:\n", + " print('A hairpin is too short.')\n", + " return False\n", + " \n", + " # Everything checks out\n", + " return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "**d)** Most everything is in place. We just need to check the sequence." + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [], + "source": [ + "def rna_ss_validator(seq, sec_struc, wobble=True):\n", + " \"\"\"Validate and RNA structure\"\"\"\n", + " # Convert structure to base pairs\n", + " bps = dot_parens_to_bp(sec_struc)\n", + " \n", + " # If this failed, the structure was invalid\n", + " if not bps:\n", + " return False\n", + " \n", + " # Do the hairpin check\n", + " if not hairpin_check(bps):\n", + " return False\n", + " \n", + " # Possible base pairs\n", + " if wobble:\n", + " ok_bps = ('gc', 'cg', 'au', 'ua', 'gu', 'ug')\n", + " else:\n", + " ok_bps = ('gc', 'cg', 'au', 'ua')\n", + "\n", + " # Check complementarity\n", + " for bp in bps:\n", + " bp_str = (seq[bp[0]] + seq[bp[1]]).lower()\n", + " if bp_str not in ok_bps:\n", + " print('Invalid base pair.')\n", + " return False\n", + " \n", + " # Everything passed\n", + " return True" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's test it on the test cases from the problem statement." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Should be True:\n", + "True\n", + "True\n", + "True\n", + "True\n", + "\n", + "Should be False:\n", + "Invalid base pair.\n", + "False \n", + "\n", + "Invalid base pair.\n", + "False \n", + "\n", + "Invalid base pair.\n", + "False \n", + "\n", + "A hairpin is too short.\n", + "False \n", + "\n", + "Invalid base pair.\n", + "False\n" + ] + } + ], + "source": [ + "print('Should be True:')\n", + "print(rna_ss_validator('GCAUCUAUGC', '(((....)))'))\n", + "print(rna_ss_validator('GCAUCUAUGU', '(((....)))'))\n", + "print(rna_ss_validator('GCAUCUAUGU', '(.(....).)'))\n", + "print(rna_ss_validator('AUUGAUGCACGUGCAUCCCCAGCGGGUCCCGCGAGCUCACCCCCUUCCAAAAGCACCACGUGCCAGGCCUCGCCCCCGGAAGUAUACCUGUGAGCCAGA',\n", + " '...(((((....)))))....((((...))))..((((((...(((((....((((...))))..(((...)))...))))).......))))))....'))\n", + "\n", + "print('\\nShould be False:')\n", + "print(rna_ss_validator('GCAUCUACGC', '(((....)))'), '\\n')\n", + "print(rna_ss_validator('GCAUCUAUGU', '(((....)))', wobble=False), '\\n')\n", + "print(rna_ss_validator('GCAUCUAUGU', '(.(....)).'), '\\n')\n", + "print(rna_ss_validator('GCCCUUGGCA', '(.((..))).'),'\\n')\n", + "print(rna_ss_validator('AUUGAUGCACGUGCAUCCCCAGCGGGUCCCGCGAGCCCACCCCCUUCCAAAAGCACCACGUGCCAGGCCUCGCCCCCGGAAGUAUACCUGUGAGCCAGA',\n", + " '...(((((....)))))....((((...))))..((((((...(((((....((((...))))..(((...)))...))))).......))))))....'))" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "tags": [] + }, + "source": [ + "Looks good!" 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Computing environment" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "tags": [ + "hide-input" + ] + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Python implementation: CPython\n", + "Python version : 3.11.3\n", + "IPython version : 8.12.0\n", + "\n", + "jupyterlab: 3.6.3\n", + "\n" + ] + } + ], + "source": [ + "%load_ext watermark\n", + "%watermark -v -p jupyterlab" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.3" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} diff --git a/2024/exercise_solutions/exercise_1/index.html b/2024/exercise_solutions/exercise_1/index.html new file mode 100644 index 00000000..0cc51863 --- /dev/null +++ b/2024/exercise_solutions/exercise_1/index.html @@ -0,0 +1,252 @@ + + + + + + + Exercise 1 solutions — Programming Bootcamp documentation + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ + +
+ +
+
+
+ +
+
+ + +
+
+
+
+ + + + + + + + + + + \ No newline at end of file diff --git a/2024/exercises/exercise_1/exercise_1.1.html b/2024/exercises/exercise_1/exercise_1.1.html index 128b4cde..8c7a4225 100644 --- a/2024/exercises/exercise_1/exercise_1.1.html +++ b/2024/exercises/exercise_1/exercise_1.1.html @@ -128,6 +128,10 @@
  • Exercise 4
  • Exercise 5
  • +

    Exercise solutions

    +

    Schedule
