Add friendzymes collection #238

Open
wants to merge 16 commits into
base: develop

@jcahill jcahill commented Feb 14, 2022

About

This PR is for inclusion of the Friendzymes Collection.

Description

This collection is aimed at expanding what people are able to do with FreeGenes collections and the iGEM distribution, both in terms of genetic assembly and in terms of biomanufacturing. Friendzymes' primary goals are to democratize strain engineering and recombinant protein manufacturing and purification.

For manufacturing, this collection contains an expansion of the FreeGenes Open Yeast Collection, including P. pastoris-optimized target enzymes for recombinant production (such as Eco31I, an IP-free BsaI isoschizomer, and its cognate methyltransferases), additional purification tags, an anti-His tag antibody for protein blotting and quantification, and additional yeast promoters. This collection also contains complements to the FreeGenes Bacillus subtilis Secretion Tag Library Plasmids, for recombinant protein production and secretion from B. subtilis. These include B. subtilis promoters, target proteins for production such as Pfu-Sso7d polymerase, and various B. subtilis regulatory elements.

For strain engineering, we include E. coli origins of replication; E. coli, B. subtilis and P. pastoris selection markers; counterselection markers for E. coli; an origin of transfer for conjugation from E. coli to other bacterial species; homology arm pairs for genomic integration into B. subtilis and P. pastoris; and 5' and 3' recombinase site parts for insertion, deletion or inversion of synthetic genetic elements. Many of these parts are not elements of a canonical transcription unit and do not have clearly defined part types in the MoClo/uLoop assembly standard. Moreover, inserting some of these parts into a transcription unit would require changing the overhangs on the core promoter, RBS, CDS, and/or terminator parts.

To address this challenge, we designed AllClo (https://docs.google.com/spreadsheets/d/1TICnbGYY96myM7TPXWwBsLvyadgSfmtbVTGsUN5iMI8/edit?usp=sharing), a high-fidelity, backwards-compatible expansion of the MoClo assembly standard. AllClo uses a single 26-overhang set that includes all uLoop overhangs and the vector assembly overhangs used in the Open Yeast Collection, and its predicted ligation fidelity in a 26-part assembly is 96%.

We further designed a set of part switching linkers that take canonical uLoop transcription unit components as input and output those parts with new 5' and 3' overhangs. These part switching reactions enable, for instance, insertion of recombination sites 5' to the promoter and/or 3' to the terminator in a TU, or of ribozymes 3' to the promoter and 5' to the RBS/start site. In this way, standard uLoop parts can participate in assembly reactions that construct modular vector backbones, composite 5' and 3' UTRs, and multi-tagged CDSs.

The part switching linkers were designed to work in one of two ways: with an orthogonal, linker-specific Type IIS restriction site (BbsI), or with a conditionally methylatable, idempotent BsaI restriction site (mBsaI). The mBsaI site is suppressed when the linker is cloned inside an E. coli cell expressing HpaII and/or MspI, and becomes active when the part is cloned into an MspI-/HpaII- strain or PCR amplified to remove the methylation sites. These parts and this expanded assembly standard have the potential to equip iGEM teams with tools and a framework to manufacture their own enzymatic reagents and perform their own sophisticated modification of strains' genomic background.

Figure: AllClo overview

Technical Notes

  1. SwitchClo linkers may cause some automated checks to fail. This is because they contain Type IIS restriction sites, by design.
  2. Parts are all housed under the benchling.com/friendzymes namespace. These are available for individual attachment if the maintainers wish. Some Benchling items contain additional documentation in their Description fields.
  3. We have not yet enumerated any items in the Libraries and Composites sub-sheet. We can amend the submission further if the maintainers wish for this tab to contain additional information.
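
To illustrate why automated checks might flag the linkers mentioned in note 1, here is a minimal Python sketch that scans a sequence for BsaI (GGTCTC) and BbsI (GAAGAC) recognition sites on both strands. The function names are hypothetical and this is not the repository's actual validation code; the example sequence is the 3' part switch linker from this PR's FASTA file.

```python
# Recognition sites for the two Type IIS enzymes named in this PR.
TYPE_IIS_SITES = {"BsaI": "GGTCTC", "BbsI": "GAAGAC"}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def find_type_iis_sites(seq):
    """Return {enzyme: [(0-based position, strand), ...]} for every site found.

    Positions always refer to the top strand; "-" means the site reads
    in the recognition orientation on the bottom strand.
    """
    seq = seq.upper()
    hits = {}
    for enzyme, site in TYPE_IIS_SITES.items():
        found = []
        for strand, pattern in (("+", site), ("-", reverse_complement(site))):
            start = seq.find(pattern)
            while start != -1:
                found.append((start, strand))
                start = seq.find(pattern, start + 1)
        if found:
            hits[enzyme] = sorted(found)
    return hits


linker = "GGTCTCATACTTGTGATGTCTTCGCCTACGGATTGTCTGTCAAGGCATGAGACC"
print(find_type_iis_sites(linker))
# {'BsaI': [(0, '+'), (48, '-')], 'BbsI': [(17, '-')]}
```

A check like this would report the flanking BsaI pair and the internal BbsI site, which is the expected layout for a linker rather than an error.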

Thanks,
Friendzymes Contributors

@jakebeal
Contributor

@jcahill Can you please run the workflows on your fork? The automation needs to run the build in order to validate whether this can be integrated.

@jcahill
Author

jcahill commented Feb 14, 2022

@jakebeal Running script regression testing now: https://github.com/friendzymes/iGEM-distribution/actions/runs/1842097864

@jakebeal
Contributor

The synchronize.yml workflow is needed too, since that's what validates the constructs (as opposed to the script code).

@jcahill
Author

jcahill commented Feb 14, 2022

After several rounds of trial-and-error with source prefix and ID columns, synchronize.yml continues to fail at SBOL export. We are requesting assistance on how to proceed.

Blocker 1

Build automation rejects non-unique data source IDs, but it's unclear how this value can be meaningful if required to be unique.

Blocker 2

If the workflow logs are to be trusted, URI expansions are not being generated correctly. No combination of the following in the two relevant columns has generated a correct expansion:

| Data Source Prefix | Data Source ID |
| --- | --- |
| Prefix from dropdown menu | https?://explicit.url.tld/to/part/ID |
| Prefix from dropdown menu | PREFIX:ID |
| Prefix from dropdown menu | ID |

That is, all of the following fail:

| Data Source Prefix | Data Source ID |
| --- | --- |
| iGEM Registry | http://parts.igem.org/Part:BBa_K1074001 |
| iGEM Registry | iGEM:BBa_K1074001 |
| iGEM Registry | BBa_K1074001 |

Logs

Using the final example from above, the wiki namespace path /Part: is not included in the URI expansion:

Could not export SBOL file for package Friendzymes: An entity with identity "http://parts.igem.org/BBa_K1074001" already exists in document

@jakebeal
Contributor

With respect to your blockers, there are two key pieces of information that I think will help you:

  1. Data sources have a "Literal Part" column that indicates whether a 1:1 correspondence between identifier and sequence is expected. NCBI and iGEM, for example, are both literal-part sources: if I tell you "NCBI accession FJ859897.1" or "iGEM part BBa_K1074001", that should map to one particular sequence. PubMed, on the other hand, is non-literal. So when you say BBa_K1074001 is EcoOri_ColE1pMB1pBR32, it's a mismatch, because if we retrieve BBa_K1074001, the sequence we find won't be the one that's in your sheet. If you got the sequence by extracting it from BBa_K1074001, that information would be better placed in the design notes. Right now, the build believes it's finding several conflicting definitions for BBa_K1074001 and is complaining accordingly.
  2. The URI generated (http://parts.igem.org/BBa_K1074001) is the intended one. Since the source material in the iGEM repository isn't in SBOL, we need to convert it into an SBOL object, and this is the name for that object, not the literal URI used to access the SBOL object. (We are working towards an implementation of the packaging approach described in SEP 054). Each import source currently has a special case for how to remap URIs in order to access the import, which is required because there is no standardization across the databases that we import from (lots of future work to be done in generalization of import approaches...)
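
The two points above can be sketched in Python as follows. This is an illustrative reconstruction, not the distribution's actual import code: the `NAMESPACES` table, the function names, and the second part name in the usage example are assumptions; only the iGEM namespace and the resulting identity are taken from the log earlier in this thread.

```python
# Hypothetical prefix-to-namespace table. Only the iGEM entry is confirmed
# by the error message in this thread.
NAMESPACES = {"iGEM Registry": "http://parts.igem.org/"}


def identity_for(prefix, source_id):
    """Name of the SBOL object: namespace plus bare ID.

    This is an identity, not the URL used to retrieve the source record,
    so the wiki path "/Part:" does not appear in it.
    """
    return NAMESPACES[prefix] + source_id


def find_conflicts(rows):
    """Find literal identities claimed by more than one part.

    rows: iterable of (part_name, prefix, source_id, literal) tuples.
    Returns {identity: [part names]} for each conflicting identity.
    """
    claims = {}
    for name, prefix, source_id, literal in rows:
        if not literal:
            continue  # non-literal sources (e.g. PubMed) may back many parts
        claims.setdefault(identity_for(prefix, source_id), []).append(name)
    return {ident: names for ident, names in claims.items() if len(names) > 1}


rows = [
    ("EcoOri_ColE1pMB1pBR32", "iGEM Registry", "BBa_K1074001", True),
    ("Hypothetical_Other_Part", "iGEM Registry", "BBa_K1074001", True),
]
print(find_conflicts(rows))
```

Under these assumptions, two sheet rows that cite the same literal ID collapse onto one identity, which matches the "already exists in document" failure reported above.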

On a separate note, I would also ask you to consider whether it would be a good idea to split this collection up into more than one package. I see inside of it a number of sub-collections that seem like they might stand on their own, such as the linker subcollection. Most other packages in the distribution are organized around function rather than around source: is that possible to do here, or is this something that needs to be monolithic like the current OpenYeast import from FreeGenes?

@jcahill
Author

jcahill commented Feb 14, 2022

Re: 1 and 2, thanks. We'll revise based on these notes.

Re: the size/scope of the package: We have had some discussion around handling this. I've re-raised the topic with the team in light of your suggestion.

So far, the working model has been to handle the whole collection as a single package. We judged the risk of confusion and fragmentation from spreading a fairly complex assembly standard across multiple packages to be greater than the risk of concentrating too much material in one place.

Would grouping the natural classes of parts into libraries nested within the package be a suitable middle-ground?

@jakebeal
Contributor

Ah, if it's got an alternate assembly standard, then it probably does want to be isolated in a single package right now (and that will be a discussion necessary with iGEM HQ). If we had a full implementation of SEP 054, then sub-packages would be the right answer, but at the moment that's not an option.

@eyesmo

eyesmo commented Feb 14, 2022

To clarify, in this collection, all parts that are defined with specified overhangs in uLoop (all promoters, RBSs, CDSs, and terminators) have uLoop overhangs. The new overhangs and part definitions apply only to part types that are not explicitly defined in uLoop: vector backbone subcomponents, recombination sites, and ribozymes. So at least for level 0/level 1 assemblies, it's intended less as an alternate assembly standard than as an expansion and extension of the existing iGEM/uLoop assembly standard. Happy to talk more about this on this thread, in Wednesday's meeting or on a call.

@jakebeal
Contributor

@eyesmo Yes, I think a discussion on the Wednesday distribution call would likely be a good thing.

@jcahill
Author

jcahill commented Feb 15, 2022

Workflows have run successfully on the fork.

- Rec3_Lox66 (recombination_signal_sequence) in
- Rec3_LoxP (recombination_signal_sequence) in
- Rec5_Lox71 (recombination_signal_sequence) _<span style="color:red">not included in distribution</span>_
- Rec5_LoxP (recombination_signal_sequence) _<span style="color:red">not included in distribution</span>_

Contributor

Are these two intentionally not included, or do you want to update the build plan?

Author

I believe this is simply an unnoticed error in porting parts to the spreadsheet.

Libraries and Composite Parts,Blue text column headers are optional,,,,,,,,,,,,,,,,,,,,,,,,,,,
,,,,,,,,,,,,,,,,,,,,,,,,,,,,
Part/Library Name,Design Notes,Part Description,Final Product,Backbone/locus,Constraints,Part 1,Part 2,Part 3,Part 4,Part 5,Part 6,Part 7,Part 8,Part 9,,,,,,,,,,,,,,
,,,False,,,,,,,,,,,,,,,,,,,,,,,,,

Contributor

I see there's nothing in the composites sheet. Everything else is being ordered in a vector (generally pSB1C5). Right now, your sheet says that you want to have things delivered as just linear DNA fragments, which may not be compatible with FreeGenes processes. Is that the intention (in which case a discussion with @vinoo-igem is likely needed)? If it's not the intention, then the build plans should be expressed on this sheet and the "final product" markers on the parts sheet set to false.

@eyesmo eyesmo Feb 15, 2022

To make sure I understand, you're saying that the Libraries/Composites tab is where we should put information about the cloning/holding vector these parts should be stored in, correct? So currently the sheet implies no cloning vector, just raw DNA?

I think it would be useful to discuss the relative merits of pSB1C5 vs pOpen_v3 vs pOpen_v4. Is there a thread where the engineering committee has covered this? We'd be open to pSB1C5 and would ideally like to use the same standard vector as the new iGEM Distribution; my main (possibly unfounded) concern here is about compatibility with the existing FreeGenes libraries that are going into the Distro (e.g. Open Yeast Collection and the Protein Expression Toolkit), which to my knowledge use pOpen_v3.

Contributor

Started issue #244 to open up the discussion.

Contributor

That's correct: you want to use the "Backbone/locus" column to indicate the vector holding the part. You should also consider whether you need to add flanking sequences, depending on whether the vector comes with them built in already.

Which vectors can be used is a separate discussion with @vinoo-igem

All FreeGenes parts, including Open Yeast Collection and Open Enzyme Collection, are cloned by Twist in pOpen_v3. pOpen_v3 is ampR. All our parts should be cloned by Twist in this vector to make them useful as AllClo/OYC parts for GGA.

GACCAGGTAGCATAACTTCGTATAATGTATGCTATACGAACGGTAATGATGAGACCGTGC
AC
>3_APOSTROPHE_Part_Switch_Linker
GGTCTCATACTTGTGATGTCTTCGCCTACGGATTGTCTGTCAAGGCATGAGACC

Contributor

These short parts cannot be built by Twist, whose minimum synthesis length is 300bp. They need to have padding and flanking sequences added to them. See the Anderson Promoters package for an example of how this has been done.

Author

By my count, we have 36 parts under 300bp in length. This is just to note my intention to pad all of those parts.

Yes, I added ~50% GC, hairpin-free random sequence to all of my OYC parts that were under 300bp. I made those parts 310bp total.
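
The padding approach described above could be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the script actually used for the OYC parts: padding is appended to the 3' end only, hairpins are not screened for, and the only constraint checked is that padding adds no new BsaI or BbsI sites. All function names are hypothetical.

```python
import random

TARGET_LENGTH = 310  # padded length reported in this thread
# BsaI and BbsI recognition sites, written for both strands.
FORBIDDEN = ("GGTCTC", "GAGACC", "GAAGAC", "GTCTTC")


def count_sites(seq):
    """Total number of forbidden Type IIS sites in seq."""
    return sum(seq.count(site) for site in FORBIDDEN)


def random_padding(length, rng):
    """Random sequence whose GC content is as close to 50% as length allows."""
    gc = length // 2
    bases = [rng.choice("GC") for _ in range(gc)]
    bases += [rng.choice("AT") for _ in range(length - gc)]
    rng.shuffle(bases)
    return "".join(bases)


def pad_part(part, target=TARGET_LENGTH, seed=0, max_tries=1000):
    """Append random padding so the part reaches the target length.

    Assumption: simple 3' padding. Real designs may also need flanking
    sequences, as in the Anderson Promoters package.
    """
    part = part.upper()
    if len(part) >= target:
        return part
    rng = random.Random(seed)  # seeded so the result is reproducible
    baseline = count_sites(part)
    for _ in range(max_tries):
        padded = part + random_padding(target - len(part), rng)
        # Reject padding that creates a new site, including at the junction.
        if count_sites(padded) == baseline:
            return padded
    raise RuntimeError("no site-free padding found; increase max_tries")


linker = "GGTCTCATACTTGTGATGTCTTCGCCTACGGATTGTCTGTCAAGGCATGAGACC"
padded = pad_part(linker)
print(len(linker), "->", len(padded))
```

A hairpin or secondary-structure screen would have to be added separately, for example by rejecting candidates that contain long reverse-complement repeats.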

@vinoo-igem
Contributor

I think discussing this on our next call will be good! I do want to surface that this is ambitious and will take some effort for review, as this will feed directly into a number of different topics that we need to address this year, primarily what iGEM will be defining as the assembly standard beyond L0 basic parts (which also clearly needs work #236 #214) and vector construction and whether this would constitute testing and/or adoption.

@eyesmo

eyesmo commented Feb 15, 2022

> I do want to surface that this is ambitious and will take some effort for review, as this will feed directly into a number of different topics that we need to address this year, primarily what iGEM will be defining as the assembly standard beyond L0 basic parts (which also clearly needs work #236 #214) and vector construction and whether this would constitute testing and/or adoption.

Very much looking forward to this review/discussion! A core desired outcome of mine is to help move the ball forward on these topics for iGEM
