-
Notifications
You must be signed in to change notification settings - Fork 2
/
Copy pathhw3.txt
116 lines (75 loc) · 3.76 KB
/
hw3.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
Homework 3 (due Tuesday, Oct 22nd, at 11:59pm PST)
##################################################
.. note::
Need help? Go to http://angus.askbot.com/ to ask questions and see
other people's answers!
.. note::
See :doc:`amazon/using-screen` for information on using screen.
1. Annotate E. coli; compare annotations
----------------------------------------
Hint: you definitely want to coordinate with some other people on this one.
Each Prokka run will take about 45 minutes.
Start up an m1.xlarge. Install BLAST (just the part under "Next,
install BLAST", in :doc:`week2/blastkit`) and download ngs-scripts::
git clone https://github.com/ngs-docs/ngs-scripts /usr/local/share/ngs-scripts
You'll also need screed::
pip install screed
Next: install prokka, as per :doc:`week3/prokka-annotation`.
Go to /mnt and download a bunch of velvet assemblies that I ran for you
(what would be produced by running through :doc:`week2/assembly-lab` for
many different k)::
cd /mnt
curl -O http://public.ged.msu.edu.s3.amazonaws.com/velvet-dn-ecoli.tar.gz
tar xzf velvet-dn-ecoli.tar.gz
Run the command ``ls``; you should see about 11 assemblies, named
``ecoli-kXX.fa``.
Now, download the "official" E. coli MG 1655 file::
curl -O http://ftp.ncbi.nlm.nih.gov/genomes/Bacteria/Escherichia_coli_K_12_substr__MG1655_uid57779/NC_000913.faa
and format it for BLAST::
formatdb -i NC_000913.faa -o T -p T
Then, build and compare annotations by doing the following: for each
E. coli file, run Prokka and construct a set of pairwise BLAST files::
/mnt/prokka-1.7/bin/prokka ecoli-k19.fa --outdir k19 --prefix k19
formatdb -i k19/k19.faa -o T -p T
blastall -i k19/k19.faa -d NC_000913.faa -p blastp -e 1e-12 -o k19.x.mg1655
blastall -d k19/k19.faa -i NC_000913.faa -p blastp -e 1e-12 -o mg1655.x.k19
Finally, for each set of annotated E. coli files, calculate the
orthologs and then count them::
python /usr/local/share/ngs-scripts/blast/blast-to-ortho-csv.py k19/k19.faa NC_000913.faa k19.x.mg1655 mg1655.x.k19 > k19-ortho.csv
wc -l k19-ortho.csv
You should see output like this: ``3804 k19-ortho.csv``. Record the
number (3804) and the k value (19); do this for each of the E. coli
assemblies & annotations, then put the list in a spreadsheet and send
to me.
.. note::
You can automate the above by putting it in a for loop in the shell::
for i in {19..51..2}
do
/mnt/prokka-1.7/bin/prokka ecoli-k${i}.fa --outdir k${i} --prefix k${i}
# other commands involving ${i}
done
Here, the for loop will run the commands between 'do' and 'done' once for
every value of "i" between 19 and 51, skipping numbers forward by 2 each
time (e.g. 19, 21, 23, 25, ...).
A good trick is to use 'echo' to see if your commands are correct before
actually running them -- e.g. run ::
for i in {19..51..2}
do
echo /mnt/prokka-1.7.bin/prokka ... ${i}
done
and then look at the commands; after it looks like you're going to
run the right commands, then take the 'echo' commands out.
2. Programming in Python: lists and dictionaries and functions, oh my! part 2
-----------------------------------------------------------------------------
Log in to your EC2 instance via SSH and install ipythonblocks::
pip install ipythonblocks
and Ruby/Ruby gems::
apt-get install -y ruby1.9.1 rubygems
Then install 'gist'::
gem install gist
and download the class 3 notebook::
cd /usr/local/notebooks
curl -O https://raw.github.com/beacon-center/2013-intro-computational-science/master/notebooks/class3-lists-dicts-functions.ipynb
Connect to IPython Notebook ('https://' + your EC2 machine name) and
solve the problems. Post the notebook as a github 'gist' (see last
line for instructions) and send me the nbviewer link.