I was recently asked how to get up-to-date annotations for microarrays. One way is to align the probe sequences supplied by the manufacturer against a current version of the cDNAs. The following examples were used with Agilent 44K microarrays (Physcomitrella and rice, to be specific). I may add an Affymetrix example as well.
The pre-processing script relies on a working Bowtie installation, and the Bowtie binaries are assumed to be in your PATH. To see all options of the Python script, type:
python prepareMicroarrayProbes.py
Download and unpack the pre-processing test data. In the commands below, the file Agilent-017743_GPL14653_spotSequences.txt takes the place of seqTable.txt, and PpatensV6_filtered_cosmoss_mRNA.fasta takes the place of cDNA.fasta.
Before the probes can be aligned to the cDNAs of interest, a Bowtie index has to be built. Note that the locus ID (the ID that receives the expression value) is the first field after the greater-than sign (>) in each FASTA header, where fields are separated by spaces.
python prepareMicroarrayProbes.py BUILD cDNA.fasta cDNA_index
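To check which locus IDs a given cDNA file will produce before building the index, the rule described above (first space-separated field after the ">") can be sketched in a few lines of Python. This is only an illustration of the rule, not code taken from prepareMicroarrayProbes.py; the function names and the example headers are hypothetical.

```python
def locus_id_from_header(header_line):
    """Return the first space-separated field after the leading '>'."""
    if not header_line.startswith(">"):
        raise ValueError("not a FASTA header: %r" % header_line)
    return header_line[1:].strip().split(" ")[0]


def locus_ids(fasta_lines):
    """Yield one locus ID per header line in an iterable of FASTA lines."""
    for line in fasta_lines:
        if line.startswith(">"):
            yield locus_id_from_header(line)


# Hypothetical records, only to show the behaviour.
example = [
    ">locusA.1 some description",
    "ATGGCGTTT",
    ">locusB.1",
    "ATGCCCAAA",
]
print(list(locus_ids(example)))
```

Running `locus_ids(open("cDNA.fasta"))` on the real file and looking at the first few IDs is a quick way to confirm that the headers are split the way you expect.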
Probe sequences are frequently distributed in tabular form. Such a table needs to be converted into a FASTA file:
python prepareMicroarrayProbes.py TABTOFASTA seqTable.txt 1 1 2 probes.fasta
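For readers who want to see what such a conversion involves, here is a minimal sketch of turning a tab-separated probe table into FASTA. It is not the script's implementation, and it does not interpret the numeric arguments of the TABTOFASTA command (the script's help output documents those). The function name, the zero-based column indices and the single header row are all assumptions chosen for illustration.

```python
import csv


def table_to_fasta(table_path, fasta_path, name_col=0, seq_col=1, skip_rows=1):
    """Write one FASTA record per table row; return the number written.

    Assumptions (hypothetical defaults): tab-separated input, probe name in
    column `name_col`, sequence in column `seq_col` (both zero-based), and
    `skip_rows` header rows at the top of the file.
    """
    written = 0
    with open(table_path, newline="") as src, open(fasta_path, "w") as dst:
        reader = csv.reader(src, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row_number, row in enumerate(reader):
            if row_number < skip_rows:
                continue  # header line(s)
            if len(row) <= max(name_col, seq_col):
                continue  # row too short to hold both columns
            name = row[name_col].strip()
            seq = row[seq_col].strip().upper()
            if not name or not seq:
                continue  # skip rows without a name or a sequence
            dst.write(">%s\n%s\n" % (name, seq))
            written += 1
    return written
```

A call such as `table_to_fasta("seqTable.txt", "probes.fasta")` would then produce a file that can be passed to the alignment step, provided the column assumptions match your table.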
To align the probes to the cDNAs of interest:
python prepareMicroarrayProbes.py ALIGN cDNA_index probes.fasta unaligned.txt aligned.txt
Extract the probe name to locus ID mappings (and vice versa):
python prepareMicroarrayProbes.py EXTRACT aligned.txt probeNameToID.txt IDtoProbeName.txt
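The idea behind the two mapping files can be sketched as follows. This assumes that aligned.txt is in Bowtie 1's default tab-separated output, where the first column is the read (probe) name and the third column is the reference name; the script may use a different format, so treat the column positions, the function name and the example lines as assumptions.

```python
from collections import defaultdict


def build_mappings(alignment_lines, probe_col=0, ref_col=2):
    """Return (probe -> set of locus IDs, locus ID -> set of probes)."""
    probe_to_ids = defaultdict(set)
    id_to_probes = defaultdict(set)
    for line in alignment_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) <= max(probe_col, ref_col):
            continue  # malformed or empty line
        probe = fields[probe_col]
        # Keep only the first space-separated field of the reference name,
        # matching the locus ID rule used for the cDNA FASTA headers.
        locus = fields[ref_col].split(" ")[0]
        if not probe or not locus:
            continue
        probe_to_ids[probe].add(locus)
        id_to_probes[locus].add(probe)
    return probe_to_ids, id_to_probes


# Hypothetical alignment lines, only to show the behaviour.
example = [
    "probe1\t+\tlocusA.1 some description\t10\tACGT\tIIII\t0\t",
    "probe2\t+\tlocusA.1 some description\t55\tTTGA\tIIII\t0\t",
    "probe2\t-\tlocusB.1\t7\tTCAA\tIIII\t0\t",
]
probe_to_ids, id_to_probes = build_mappings(example)
print(sorted(probe_to_ids["probe2"]))
print(sorted(id_to_probes["locusA.1"]))
```

Using sets makes it easy to spot probes that hit more than one locus (here probe2). How the script itself treats such ambiguous probes is not covered here, so check its output before relying on them.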
For a single-channel example, download and unpack the rice test data and see agilentSingleChannelExample.R.
For a dual-channel example, download and unpack the Physcomitrella test data and see agilentDualChannelExample.R.