
2. Library Cloning

Tom Röschinger edited this page Jul 17, 2020 · 38 revisions

Initial Amplification of Oligo Library

Upon receiving the oligo pool from TWIST (which typically takes ~1-2 weeks) or IDT, we amplify the oligo pool with a small number of PCR cycles to increase the amount of DNA.

In Part 1 of this protocol, we discussed the importance of pooling small groups of mutated sequences together; typically 3-5 genes for each orthogonal primer pair. This grouping saves time in this part of the protocol, as it eliminates the need to perform a separate PCR reaction for each gene (and its 1200-1500 mutated versions).

The TWIST website provides instructions on oligo pool amplification. We recommend that you familiarize yourself with that page before proceeding.

An initial amplification of the aliquoted oligo library ensures that we have ample DNA for cloning of the library into plasmids, followed by transformation into E. coli or another organism of interest. This part of the protocol assumes that you have divided the oligo library into smaller aliquots, at a concentration of 10 ng/μL (see the last step of Part 1 of this protocol).

When you are ready to work with the DNA, follow the TWIST guidelines to amplify the oligo library. Remove a 5 μL aliquot of the oligo pool, and set up one PCR reaction per primer pair, with a total volume of 50 μL per PCR reaction. If you have 5 primer pairs (which should accommodate up to 25 TSS/genes), this would require 5 PCR reactions.

For each PCR reaction:

| reagent | concentration | volume (μL) |
|---|---|---|
| DNA (oligo pool) | 10 ng/μL | 1 |
| Q5 polymerase mix | 2x stock | 25 |
| Primer Fwd | 10 μM | 2.5 |
| Primer Rev | 10 μM | 2.5 |
| Water | N/A | 19 |
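When several primer pairs are amplified in parallel, it can help to scale the shared reagents into a master mix. The following is a minimal sketch of that arithmetic using the per-reaction volumes from the table above. The 10% pipetting overage and the group size of 5 genes per primer pair are assumptions for illustration, not values prescribed by this protocol.

```python
import math

# Per-reaction volumes (uL) from the reagent table; they sum to 50 uL.
PER_REACTION_UL = {
    "DNA (oligo pool, 10 ng/uL)": 1.0,
    "Q5 polymerase mix (2x)": 25.0,
    "Primer Fwd (10 uM)": 2.5,
    "Primer Rev (10 uM)": 2.5,
    "Water": 19.0,
}

# Components that are identical in every reaction. The primers differ per
# primer pair, so they are added to each tube individually.
SHARED = ("DNA (oligo pool, 10 ng/uL)", "Q5 polymerase mix (2x)", "Water")


def reactions_needed(n_genes, genes_per_pair=5):
    """One PCR per primer pair; genes_per_pair is an assumed group size."""
    return math.ceil(n_genes / genes_per_pair)


def master_mix(n_reactions, overage=0.10):
    """Scale the shared components, with an assumed pipetting overage."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(PER_REACTION_UL[name] * factor, 2) for name in SHARED}


if __name__ == "__main__":
    n = reactions_needed(25)
    print(f"{n} reactions")
    for name, volume in master_mix(n).items():
        print(f"{name}: {volume} uL")
```

Each tube then receives 45 μL of master mix plus 2.5 μL of each primer for its primer pair.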

Mix these reagents carefully, and keep everything on ice throughout the experiment. The 'Primer Fwd' and 'Primer Rev' will differ based on the 20-nucleotide sequences flanking each group within the oligo pool (e.g. primer pair 101 is distinct from primer pair 102). We already have most primer pairs stored away at -20°C in the Phillips laboratory. The "forward" and "reverse" primers that you use for this amplification are simply the last 20 nucleotides in the oligo pool for each gene group that was ordered from TWIST.

Use the following thermocycler settings with 12 cycles.

| cycles | temperature | time |
|---|---|---|
| 1 | 98°C | 30 seconds |
| 12 | 98°C | 10 seconds |
| 12 | 64°C (anneal) | 30 seconds |
| 12 | 72°C (extend) | 30 seconds |
| 1 | 72°C | 120 seconds |
| Hold | 4°C | |
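For planning bench time, the program above can be written down as data and its total hold time summed. This is only a sketch: the data layout is an illustrative choice, and the result excludes ramp times between temperatures, which depend on the thermocycler.

```python
# Each entry is (repeats, [(temperature_C, seconds), ...]), transcribed
# from the thermocycler table. The final 4 C hold is left out.
INITIAL_AMPLIFICATION = [
    (1, [(98, 30)]),
    (12, [(98, 10), (64, 30), (72, 30)]),
    (1, [(72, 120)]),
]


def hold_time_seconds(program):
    """Sum of programmed hold times; ramping between steps is not included."""
    return sum(
        repeats * sum(seconds for _, seconds in steps)
        for repeats, steps in program
    )


if __name__ == "__main__":
    total = hold_time_seconds(INITIAL_AMPLIFICATION)
    print(f"{total} s ({total / 60:.1f} min) before ramp times")
```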

For more information on primer annealing temperatures (and a very user-friendly annealing temperature calculator), visit the NEB Tm calculator. For more information on the thermocycler conditions for Q5 polymerase, see the NEB website.

Given the low concentration of DNA in the oligo pool, even 12 cycles of amplification will not provide an overabundance of DNA, and loss of material is a major concern at this stage. Nonetheless, it is a good idea to perform a gel extraction after each amplification step, as this can greatly reduce the number of "improper" length sequences. If you are struggling to obtain enough DNA, it is usually better to run a few more cycles of amplification and still do the gel extraction.

To perform the gel extraction, add 10 μL of 6x NEB DNA dye to each 50 μL PCR reaction and load the full volume on a thick, 2% agarose gel. Only load wells in the top row -- leave the bottom row empty. Perform electrophoresis for 45 minutes at 120 V, or until the bands have migrated more than halfway down the gel. Use a scalpel to excise only the band that corresponds to the expected size of the amplified oligo library, then purify the DNA using one of the many commercially available gel extraction kits. We have previously obtained good results with the Zymoclean Gel DNA Recovery Kit.

After performing the gel extraction according to the manufacturer's protocol, NanoDrop the eluted DNA and record the concentration and purity. Store away DNA at -20°C.

Barcoding of Oligo Library

After the first amplification (which does not add barcodes), it is time to add barcodes to the oligo pool. Barcoding DNA, as a process, involves adding random, unique, 20-nucleotide "barcodes" to a DNA sequence of interest. Barcodes assist in downstream sequencing, as each unique barcode is associated with one mutated promoter sequence. By "mapping" promoters to a unique barcode, and counting the relative frequency of each barcode via next-generation sequencing (NGS), gene expression can be computed for each mutated promoter.
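To make the "mapping" and counting idea concrete, here is a minimal sketch that tallies barcode reads and reports the relative frequency per promoter variant. The function name, the toy 4-nt barcodes, and the variant labels are hypothetical. A real analysis would use 20-nt barcodes and a mapping table built from sequencing, and this sketch is not the analysis pipeline used in the Reg-Seq study.

```python
from collections import Counter


def relative_frequency_per_promoter(barcode_reads, barcode_to_promoter):
    """Fraction of mapped reads assigned to each promoter variant.

    barcode_reads: iterable of barcode strings extracted from NGS reads.
    barcode_to_promoter: dict mapping each known barcode to its promoter.
    Reads whose barcode is not in the map are ignored.
    """
    per_promoter = Counter()
    for barcode in barcode_reads:
        promoter = barcode_to_promoter.get(barcode)
        if promoter is not None:
            per_promoter[promoter] += 1
    total = sum(per_promoter.values())
    if total == 0:
        return {}
    return {promoter: n / total for promoter, n in per_promoter.items()}


if __name__ == "__main__":
    # Toy data with shortened barcodes, for illustration only.
    mapping = {"AAAA": "variant_1", "CCCC": "variant_2"}
    reads = ["AAAA", "AAAA", "CCCC", "GGGG"]
    print(relative_frequency_per_promoter(reads, mapping))
```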

Genetic barcodes can be added to the oligo pool via PCR, using primers with randomly-synthesized "overhangs". During extension of DNA, these random synthesis regions of the primer are incorporated into the amplified DNA. Adding barcodes via PCR is a biased process, however, wherein some barcodes are incorporated with a higher fidelity than others. Bias is especially pronounced at higher amplification cycles, when more DNA is present in a PCR reaction. Therefore, adding barcodes should be done with care to minimize bias wherever possible.
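A quick calculation shows why 20 random nucleotides give ample barcode diversity. The sketch below assumes that each base is incorporated with equal probability, which is an idealization given the bias described above. The function names are illustrative.

```python
import random

BASES = "ACGT"
BARCODE_LENGTH = 20


def barcode_space(length=BARCODE_LENGTH):
    """Number of distinct barcodes of the given length (4 ** length)."""
    return len(BASES) ** length


def expected_colliding_pairs(n_molecules, length=BARCODE_LENGTH):
    """Expected number of molecule pairs that share a barcode, assuming
    barcodes are drawn uniformly at random (an idealization)."""
    pairs = n_molecules * (n_molecules - 1) / 2
    return pairs / barcode_space(length)


def random_barcode(rng, length=BARCODE_LENGTH):
    """Draw one uniform random barcode, mimicking an 'N' synthesis run."""
    return "".join(rng.choice(BASES) for _ in range(length))


if __name__ == "__main__":
    print(f"possible 20-nt barcodes: {barcode_space():,}")
    print(f"example barcode: {random_barcode(random.Random(0))}")
    print(f"expected collisions among 1e6 molecules: "
          f"{expected_colliding_pairs(10**6):.3f}")
```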

An important step to minimize bias in barcoding DNA involves performing quantitative PCR (qPCR) to determine the "optimal" number of cycles to amplify DNA without saturating a PCR reaction mixture. Specifically, we wish to determine the number of PCR cycles required for the oligo pool to be exponentially amplified, but not saturated. We then use that number of cycles to perform the "real" barcoding PCR reactions. qPCR also serves as an important control because it ensures that the oligos for each group of genes (with orthogonal primer pairs) are being amplified at relatively even levels.

There are many commercially-available qPCR reaction mixtures, which typically contain all of the salts, buffers, and polymerases necessary for quantitative amplification. These mixtures also contain dyes, which the qPCR machine uses to "read out" the amount of DNA present in a tube after each cycle of amplification.

In the past, our lab has used PerfeCTa SYBR Green SuperMix, Low ROX for qPCR. We used this product simply because it was already available in the lab, and it worked. Other qPCR reagents, such as the NEB Luna qPCR kit, also work for this purpose.

When performing qPCR, you should use the actual "barcoding" primers that you will use for barcoding the oligo pools. The results from qPCR can change based on the primers used. Thus, it is important that you perform all steps in qPCR as you would in the actual, final PCR reaction. We discuss the sequence of these "barcoding primers" in the next step of this protocol, after first outlining the qPCR process.

Our lab has an Applied Biosystems qPCR machine with MxPro software installed. To perform qPCR, set up 20 μL reaction volumes, and also prepare a "blank" control sample (no template DNA added) for each group of genes (e.g. for each primer pair present in the oligo pool). Mix the following reagents on ice, preparing two reactions per gene "set": one with 1 ng of DNA, and the other with 10 ng of DNA. Note that the DNA template, in the reagent list below, is the PCR product that was gel-purified in the previous step of this protocol.

| reagent | volume (μL) | concentration |
|---|---|---|
| PerfeCTa SYBR Green SuperMix, Low ROX (2X) | 10 | N/A |
| Forward primer | 1 | 500 nM |
| Reverse primer | 1 | 500 nM |
| Nuclease-free water | 7 | N/A |
| DNA template | 1 | 10 ng/μL OR 1 ng/μL |

Again, there should be 2 reactions per primer pair (one with 1 ng of DNA and the other with 10 ng of DNA used as template). There should also be one reaction per primer pair in which no DNA is added at all. In this "blank" control, replace the DNA template with an additional 1 μL of water.
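The reaction set described above (two template amounts plus one blank per primer pair) can be enumerated programmatically when planning a plate. The sketch below also checks the primer dilution. It assumes the 500 nM in the table is the final concentration, in which case adding 1 μL to a 20 μL reaction implies a 10 μM stock. That reading is an inference, and the primer pair labels are placeholders.

```python
def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """C1 * V1 = C2 * V2, solved for the final concentration C2."""
    return stock_conc * stock_volume_ul / total_volume_ul


def qpcr_reactions(primer_pairs, template_ng=(1, 10)):
    """List the reactions to set up: one per template amount plus a blank
    (water instead of DNA) for every primer pair."""
    reactions = []
    for pair in primer_pairs:
        for ng in template_ng:
            reactions.append((pair, f"{ng} ng template"))
        reactions.append((pair, "blank (1 uL water)"))
    return reactions


if __name__ == "__main__":
    # Assumed 10 uM (10,000 nM) primer stock, 1 uL in a 20 uL reaction.
    print(final_concentration(10_000, 1, 20), "nM final primer")
    for pair, label in qpcr_reactions(["pair_101", "pair_102"]):
        print(pair, label)
```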

Set up these qPCR reactions on ice and then load them into the qPCR machine. Specify the wells and reference dye in the MxPro software. The QuantaBio qPCR kit contains a reaction buffer with magnesium chloride, dNTPs (dATP, dCTP, dGTP, dTTP), AccuStart Taq DNA Polymerase, SYBR Green I dye, and a ROX Reference Dye. Input the dye information as necessary in the MxPro software, and then run the experiment with the thermocycler settings below. NOTE: AccuStart Taq will demand different settings than, say, the polymerase in the Luna qPCR kit. Ideally, you would use the same polymerase for both the qPCR and barcoding PCR reactions, as polymerases differ in behavior (different annealing temperatures, different thermocycler settings, and wildly different fidelities). Check the specific thermocycler settings for each polymerase that you wish to use.

| cycles | temperature | time |
|---|---|---|
| 1 | 94°C | 120 seconds |
| 30 | 94°C | 20 seconds |
| 30 | (anneal temp.) | 30 seconds |
| 30 | 72°C (extend) | 20 seconds |
| HOLD | 4°C | |

The annealing temperature should be determined (using NEB's Tm calculator) for each pair of primers used in the barcoding PCR reaction. When computing an annealing temperature, only input the nucleotides in the primer that actually bind to the template DNA -- do not include "overhang" nucleotides that do not anneal to the template DNA.

Once the qPCR has finished running, check the qPCR curves. These curves should resemble the curves shown in Fig. 1, below.

Figure 1: Image from ThermoFisher's resource page on qPCR. Each curve, plotted in various colors, represents a unique sample that was amplified. The farther to the right that a curve lies, the more cycles it took to amplify that signal to some arbitrary threshold (given by the red, horizontal line).

After some number of cycles (given on the x-axis), the amplification profile begins to increase until it reaches saturation. The position on the x-axis should be displaced between your samples -- "blank" samples should not amplify at all, while samples with 1 ng of DNA should have an amplification curve that is delayed when compared to samples containing 10 ng of DNA.

By looking at these amplification curves, it is straightforward to determine the "optimal" number of PCR cycles to use for barcoding: for each primer pair, simply choose the number of cycles that corresponds to the mid-point of the curve obtained with 10 ng of template DNA. As an example, consider the red curve in the amplification plot above. The midpoint of this curve corresponds to approximately 10 cycles. Thus, you would use 10 cycles when performing your barcoding PCR. This ensures that DNA levels do not saturate, which can significantly bias your PCR amplifications when adding barcodes.
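The protocol reads the midpoint off the plot by eye. If the per-cycle fluorescence values are exported from the qPCR software, the same choice can be sketched numerically as the first cycle at which the signal reaches halfway between baseline and plateau. The function name and the example values below are invented for illustration, and real data may need baseline correction first.

```python
def midpoint_cycle(fluorescence):
    """Return the first cycle (1-based) at which the signal reaches the
    halfway point between its minimum (baseline) and maximum (plateau).

    fluorescence[i] is the reading after cycle i + 1.
    """
    if not fluorescence:
        raise ValueError("no fluorescence readings given")
    baseline = min(fluorescence)
    plateau = max(fluorescence)
    half = baseline + 0.5 * (plateau - baseline)
    for cycle, signal in enumerate(fluorescence, start=1):
        if signal >= half:
            return cycle
    raise ValueError("signal never reaches the midpoint")


# Made-up readings for a sigmoidal curve that saturates after ~14 cycles.
EXAMPLE_CURVE = [0.01, 0.01, 0.01, 0.02, 0.03, 0.06, 0.12, 0.24,
                 0.45, 0.70, 0.88, 0.96, 0.99, 1.00, 1.00]

if __name__ == "__main__":
    print("cycles to use for barcoding PCR:", midpoint_cycle(EXAMPLE_CURVE))
```

A flat curve, such as a clean blank, has no meaningful midpoint, so this sketch should only be applied to samples that visibly amplify.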

After performing qPCR, the next step is to perform the barcoding PCRs. This is done by setting up PCR reactions in much the same way, with the same primers, but with 2x Q5 polymerase mix rather than the 2x qPCR mix. We recommend setting up 50 μL reaction volumes (in triplicate, for each set of primers) to ensure that you obtain enough DNA for later steps of this protocol. Depending on your amplification results from the qPCR experiment, you should decide whether you'd like to use 1 ng template or 10 ng template; just be consistent and use the same DNA concentration across all primer sets. Program the thermocycler with the following settings:

| cycles | temperature | time |
|---|---|---|
| 1 | 98°C | 30 seconds |
| see qPCR results | 98°C | 10 seconds |
| see qPCR results | (anneal) | 30 seconds |
| see qPCR results | 72°C | 30 seconds |
| 1 | 72°C | 120 seconds |
| Hold | 4°C | |

After amplifying the oligo libraries for each primer pair, perform another gel extraction. Add 10 μL of 6x NEB DNA dye to each 50 μL PCR reaction. Load the full volumes on a thick, 2% agarose gel. Perform electrophoresis for 45 minutes at 120V. Use a scalpel to remove the DNA band corresponding to the amplified oligo libraries. Perform a gel extraction using one of many commercially-available kits. We have previously obtained good results with the Zymoclean Gel DNA Recovery Kit. Only extract that band which corresponds to the expected size of the amplified PCR product.

Note on Use of the qPCR Machine

After using the qPCR machine, ensure that the lamp is turned off. When starting a run, there is a checkbox that enables you to automatically turn the lamp to "off" once the run has been completed. Prolonged use of the lamp can burn it out. Also, ensure that you use optically-clear qPCR tubes for your samples and controls, and avoid introducing any air bubbles in your samples. Lastly, do not mark on the tops of the clear qPCR tubes with a marker -- use the sides of the tubes if necessary.

Primer Design Considerations for Barcoding Oligo Pools

Amplification of oligo pools must account for several considerations. First, the reverse primer in the barcoding PCR should add the random, 20-nt barcode sequence. Such a primer can be ordered from IDT by inputting 'NNNNNNNNNNNNNNNNNNNN' in the desired barcode position. The reverse primer must also contain a region that overlaps and binds to the DNA template; aim for an annealing temperature between 61°C and 64°C. Finally, the reverse primer should include a sequence to be used for cloning into the desired plasmid later (via Golden Gate or Gibson assembly, depending on the plasmid to be used). We discuss this in greater detail in the next part of this protocol.

The forward primer for barcoding PCRs must also contain a sequence that binds to the DNA template (be sure to match the annealing temperatures for forward and reverse primers using the NEB Tm calculator). There is no need to add a barcode via this primer. However, you must add a sequence corresponding to the other Gibson site (or a sequence that includes restriction sites, if using genome integration).
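The layout of the barcoding reverse primer can be sketched as simple string assembly. The 5' to 3' order used here (cloning site, then barcode, then template-binding region at the 3' end) is an assumption based on the requirement that the 3' end anneals to the template. Check it against the Benchling sequences before ordering. Both example sequences are placeholders, not real primer sequences from this protocol.

```python
BARCODE_LENGTH = 20


def barcoding_reverse_primer(cloning_site, annealing_region):
    """Assemble a reverse primer, written 5' to 3', as:
    cloning site + 20 random bases ('N') + template-binding region."""
    return cloning_site + "N" * BARCODE_LENGTH + annealing_region


def annealing_portion(primer, annealing_length):
    """Return only the 3' bases that bind the template. This is the part
    to enter into a Tm calculator; overhang bases are excluded."""
    if not 0 < annealing_length <= len(primer):
        raise ValueError("annealing_length must be between 1 and len(primer)")
    return primer[-annealing_length:]


if __name__ == "__main__":
    # Placeholder sequences for illustration only.
    cloning_site = "GGTCTCACTAGT"
    annealing = "ACGTTGCAGGTCATCGATCC"
    primer = barcoding_reverse_primer(cloning_site, annealing)
    print(primer)
    print("enter into Tm calculator:", annealing_portion(primer, len(annealing)))
```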

In all situations, design your primers carefully and pay close attention to the plasmid sequences on Benchling:

- Reg-Seq on Plasmid-Encoded Libraries (pJK14)
- Reg-Seq on Genome-Integrated Libraries (pLibAcceptorV2)

The final constructs should be designed as follows: the barcode should be inserted 110 base pairs from the 5’ end of the mRNA, followed by 45 base pairs from the targeted regulatory region, followed by 64 base pairs containing primer sites used in the construction of the plasmid, and 11 base pairs containing a three-frame stop codon. Following the barcode there is an RBS and a GFP coding region. All of the sequences used in the original Reg-Seq study can be found in Supplementary Table 1 of the paper.

Insertion of Barcoded Library into Plasmid Backbone

After amplifying, barcoding, and purifying the variety of oligo pools, the next step is to insert each of these library "groups" into a plasmid, which can then be cloned into E. coli. In prior Reg-Seq experiments, all oligo pools were cloned and expressed from a plasmid; they were not genome-integrated. For the purposes of this protocol, we will discuss both plasmid expression and a potential method for genome-integration and sequencing of DNA libraries.

Cloning and Expressing Oligo Libraries from a Plasmid (Gibson assembly)

Reg-Seq experiments have previously been performed by cloning oligo libraries into pJK14 plasmid (SC101 origin) via Gibson assembly. Overhang "arms" from the PCR amplicons with homology to this plasmid were used to insert the oligo pools. To use Gibson assembly for cloning the PCR-amplified, barcoded oligo pools into pJK14, you should first amplify the backbone using the primer-binding sites specified on the Benchling sequence (https://benchling.com/s/seq-M9lQusDbSzsjmGihPxYr). See the pink annotations on the Benchling sequence for the Gibson sites and primer amplification binding sites. This plasmid also encodes kanamycin resistance.

Perform Gibson assembly according to NEB's instructions. Prior to electroporation, perform drop dialysis with water for at least 30 minutes. Electroporate into highly electrocompetent DH5α cells (these can be purchased from NEB) at 1800 V, aiming for a time constant exceeding 5.0 milliseconds. If desired, one can also electroporate directly into the strain to be studied (E. coli K-12 MG1655), but it is typically a good idea to first electroporate into a highly competent strain, isolate the library again (there should now be many, many more copies), perform routine checks (e.g. gel electrophoresis), and then transform into the final strain. This also enables one to store the isolated, cloned DNA library for future use in raw DNA form.

Cloning and Genome-Integrating Oligo Libraries

In future iterations of Reg-Seq, it might be a good idea to clone promoter mutant libraries directly into the genome of E. coli. Lambda red recombination is not an efficient process, however, and thus would highly bias the library of interest. Therefore, we have previously tested other methods, especially those used by the Kosuri lab at UCLA.

In our lab, we have a plasmid called pLibAcceptorV2 (Addgene link) which can be used to genome-integrate any DNA sequences cloned onto this plasmid into a genetic locus of interest. This method requires two crucial steps:

  1. The accepting strain of E. coli (K-12 MG1655) must first be genomically modified, using classical recombination, by inserting a "landing pad" at the genomic position of interest. Specifically, lox sites must be inserted into the genome position of interest, which correspond to lox sites on the pLibAcceptorV2 plasmid. lox71 recombines with lox66. Note that the genomic landing pad sequences are reversed, so one must consider their positioning if concerned with the orientation of the reporter after integration.

| Construct | Site | Sequence (spacer in bold) |
|---|---|---|
| pLibAcceptorV2 | lox66 | 5'- ATAACTTCGTATA**GCATACAT**TATACGAAcggta -3' |
| pLibAcceptorV2 | loxm2/71 | 5'- taccgTTCGTATA**TGGTTTCT**TATACGAAGTTAT -3' |
| Genomic Landing Pad | loxm2/66 (reverse orientation) | 5'- taccgTTCGTATA**AGAAACCA**TATACGAAGTTAT -3' |
| Genomic Landing Pad | lox71 (reverse orientation) | 5'- ATAACTTCGTATA**ATGTATGC**TATACGAAcggta -3' |
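Because the landing pad sites are listed in reverse orientation, it is easy to lose track of which sites pair. The sketch below splits each 34-bp site into its 13-bp arm, 8-bp spacer, and 13-bp arm, flips the landing pad sites with a reverse complement, and confirms that the spacers then match those on the plasmid. The sequences are copied from the table above. The dictionary keys and helper names are illustrative.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement that preserves upper/lower case."""
    return seq.translate(COMPLEMENT)[::-1]


def split_lox(site):
    """Split a 34-bp lox site into (left arm, spacer, right arm)."""
    if len(site) != 34:
        raise ValueError("a lox site is expected to be 34 bp long")
    return site[:13], site[13:21], site[21:]


PLASMID = {
    "lox66": "ATAACTTCGTATAGCATACATTATACGAAcggta",
    "loxm2/71": "taccgTTCGTATATGGTTTCTTATACGAAGTTAT",
}
LANDING_PAD_REVERSED = {
    "loxm2/66": "taccgTTCGTATAAGAAACCATATACGAAGTTAT",
    "lox71": "ATAACTTCGTATAATGTATGCTATACGAAcggta",
}

# Pairs expected to recombine: a 66-type site with a 71-type site.
PAIRS = [("lox66", "lox71"), ("loxm2/71", "loxm2/66")]

if __name__ == "__main__":
    for plasmid_site, genomic_site in PAIRS:
        plasmid_spacer = split_lox(PLASMID[plasmid_site])[1]
        flipped = reverse_complement(LANDING_PAD_REVERSED[genomic_site])
        genomic_spacer = split_lox(flipped)[1]
        print(plasmid_site, genomic_site, plasmid_spacer == genomic_spacer)
```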

The landing pad sequence can be found at this Benchling link. This landing pad also contains constitutively active mCherry & Chloramphenicol resistance markers flanked by complementary loxP sites. Insert this into the genomic position of interest using loxP recombination, the full protocol for which is provided in Part 3 of this protocol.

The pLibAcceptorV2 plasmid also contains an arabinose-inducible Cre recombinase, a heat-curable origin of replication (to remove the plasmid after Cre-lox cassette exchange), and a selectable marker/library cloning site flanked by loxP sites. Built around the library cloning site are priming sites for sequencing as well as terminators to block outside transcriptional interference. The plasmid also has built-in restriction sites for restriction-based cloning of oligo libraries.

If you wish to perform genome-integration for oligo library cloning, you should first purify ample amounts of pLibAcceptorV2 plasmid (at least 666 ng per library transformation) by growing cells expressing this plasmid at 30°C for 24 hours, and then performing a maxiprep to isolate plasmid (which is very low-copy). This plasmid cannot tolerate temperatures higher than 30°C. DNA should then be concentrated using a Promega Wizard SV Gel and PCR Clean-up System. This ensures that genomic DNA, which can sometimes "bleed" through a maxiprep, is fully removed and solely plasmid remains.

A full, highly detailed protocol that walks through all of the steps for cloning and genomically-integrating a pLib-library can be found in this GitHub repository in the Home directory, or at the link here. This protocol was developed and validated by Guillaume Urtecho, a former graduate student in the Kosuri lab at UCLA.

Notes on Plating and Scraping of Libraries

After electroporation and recovery of the E. coli cells, the cell+LB mixture should be plated onto oversized petri dishes filled with LB + kanamycin at a concentration of 25 μg/mL. Prepare at least three plates per oligo "pool": one plate with 300 μL of undiluted recovery mixture (1:1), one plate with a 1:10 dilution, and one plate with a 1:100 dilution.

After 24 hours of growth at 30°C, physically scrape the plates using sterilized glass pipettes by adding 100 μL of sterile LB to each plate, scraping the cells off the surface, resuspending them with a clean pipette, and then storing them in 10% glycerol for later use.

To grow the cell libraries, inoculate 800 million cells into 450 mL M9 + 0.5% glucose and grow at 30°C to an OD600 of 0.3. The number of cells in each library can be determined by preparing 1:100 and 1:1000 dilutions (in sterile M9) and then measuring the OD. Use the Agilent OD600 calculator to determine how many cells are in the undiluted cell mixture. Note that an OD600 of 1.0 corresponds to 800 million cells/mL.
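The inoculum volume follows directly from the conversion stated above (OD600 of 1.0 corresponds to 800 million cells/mL). Here is a minimal sketch of that arithmetic. It assumes OD scales linearly with cell density in the diluted sample, and the example reading of 0.25 at a 1:100 dilution is invented for illustration.

```python
CELLS_PER_ML_AT_OD1 = 8e8  # from the protocol: OD600 of 1.0 = 800 million cells/mL


def cells_per_ml(od600_measured, dilution_factor=1):
    """Cell density of the undiluted stock, given the OD600 of a dilution.

    dilution_factor is 100 for a 1:100 dilution, 1000 for 1:1000, etc.
    """
    return od600_measured * dilution_factor * CELLS_PER_ML_AT_OD1


def inoculum_volume_ml(target_cells, od600_measured, dilution_factor=1):
    """Volume of undiluted stock that contains target_cells cells."""
    return target_cells / cells_per_ml(od600_measured, dilution_factor)


if __name__ == "__main__":
    # Hypothetical reading: OD600 = 0.25 measured on a 1:100 dilution.
    density = cells_per_ml(0.25, dilution_factor=100)
    volume = inoculum_volume_ml(8e8, 0.25, dilution_factor=100)
    print(f"stock density: {density:.2e} cells/mL")
    print(f"volume for 800 million cells: {volume * 1000:.0f} uL")
```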

Growth and DNA Isolation of Library

Remove the growth culture that has reached OD600 0.3 and extract the DNA using a miniprep kit from Qiagen. Follow the manufacturer's protocol. Each gene group (recall the primer sets from earlier) should be grown and isolated independently. Store away each DNA library at -20°C until ready for sequencing.