QUEEN (a framework to generate quinable and efficiently editable nucleotide sequence resources) is a Python programming module designed to describe, share, and credit DNA building processes and resources. DNA part information can be imported from external annotated DNA files (GenBank and FASTA formats). The output file (GenBank format) encodes the complete information of the constructed DNA and its annotations, and enables the production of a quine code that self-reproduces the output file itself. In QUEEN, all of the manipulations required in DNA construction are covered by four simple operational functions, "cutdna", "modifyends", "flipdna", and "joindna", which can collectively represent any of the standard molecular DNA cloning processes; two search functions, "searchsequence" and "searchfeature"; and two super functions, "editsequence" and "editfeature". A new DNA can be designed by writing a Python script or by using Jupyter Notebook, an interactive Python programming environment. The designed DNA product can be output in the GenBank file format, which includes the history of its building process. The "quinable" feature of a QUEEN-generated GenBank file certifies that the annotated DNA material information and its production process are fully transparent, reproducible, inheritable, and modifiable by the community.
If you find QUEEN useful for your research, please consider citing our paper published in Nature Communications.
Mori, H., Yachie, N. A framework to efficiently describe and share reproducible DNA materials and construction protocols. Nat Commun 13, 2894 (2022). https://doi.org/10.1038/s41467-022-30588-x
- Software dependency
- Installation
- Usage
- QUEEN class
- Output functions
- Search Function
- Operational functions
- Common parameters of the quinable functions
- Quine
- Visualization
qexperiment module was added
The qexperiment module enables users to easily describe and simulate experimental molecular cloning processes with newly implemented methods that mirror actual experimental procedures. Currently, the following methods are available.
- pcr
- digestion
- ligation
- homology_based_assembly
- annealing
- gateway_reaction
- goldengate_assembly
- primerdesign (prototype)
For details, please see qexperiment_usage.md, qexperiment_demo.ipynb, and the Google Colab notebook.
Please see changelog.md.
Python 3.7.0 or later
- Install QUEEN using one of the following commands.
  - For the official release (v1.1.0) on the Python Package Index:
    pip install python-queen
  - For the developmental version on GitHub:
    pip install git+https://github.com/yachielab/QUEEN.git@(branch name)
- Install Graphviz (optional; required for visualizing flowcharts of DNA building processes using the visualizeflow() function described below). The Graphviz package is available at the following link.
QUEEN provides the QUEEN class to define a double-stranded (ds)DNA object with sequence annotations. The QUEEN class and its operational functions are described below. Jupyter Notebook files for all of the example code are provided in ./demo/tutorial of QUEEN (https://github.com/yachielab/QUEEN) and can be executed in Google Colaboratory.
Command line interface
Some QUEEN functions can also be used from the command-line interface (CLI) without writing Python code.
For details, please see CLI_usage.md.
Simulators for general molecular cloning methods
Simple molecular cloning simulators for both homology-based and digestion/ligation-based assembly are provided on Google Colab. With these simulators, you can experience the benefits of QUEEN without writing Python code.
The QUEEN class can define a dsDNA object with sequence annotations. It can be created by specifying a DNA sequence or importing a sequence data file in GenBank or FASTA file format (single sequence entry). When a GenBank format file is imported, its NCBI accession number, Addgene plasmid ID, or Benchling share link can be provided instead of downloading the file to your local environment.
A QUEEN_object (blunt-end) is created by providing its top-strand sequence (5′-to-3′). By default, the DNA topology is linear.
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCGTCGA")
The left and right values separated by "/" show the top and bottom strand sequences of the generated QUEEN_object, respectively. The top strand sequence is provided in the 5′-to-3′ direction from left to right, whereas the bottom strand sequence is provided in the 3′-to-5′ direction from left to right. Single-stranded regions can be provided by "-" at the corresponding nucleotide positions on the opposite strand. The A:T and G:C base-pairing rule must hold between the two strings except at the single-stranded positions.
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCG----/----ATACGCAGCT")
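The pairing rule above can be checked mechanically. The following is a minimal sketch, not QUEEN's own parser: check_dsdna_notation is a hypothetical helper that splits a "top/bottom" string and verifies that both strings have the same length (an assumption implied by the "-"-padded example) and that every position not opposite a "-" obeys A:T/G:C pairing.

```python
# Hypothetical helper, not part of QUEEN: validates a "top/bottom" dsDNA string.
# Assumes both strings are padded with "-" to the same length, as in the example above.
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def check_dsdna_notation(notation):
    """Return (top, bottom) if the string obeys the pairing rule, else raise ValueError.

    top is written 5'->3' and bottom 3'->5', both left to right, so characters
    at the same index face each other; "-" marks a single-stranded position.
    """
    top, bottom = notation.split("/")
    if len(top) != len(bottom):
        raise ValueError("top and bottom strings must have the same length")
    for i, (t, b) in enumerate(zip(top, bottom)):
        # Positions opposite a "-" are single-stranded and need no partner.
        if t != "-" and b != "-" and (t, b) not in PAIRS:
            raise ValueError(f"position {i}: {t}:{b} violates A:T/G:C pairing")
    return top, bottom


print(check_dsdna_notation("CCGGTATGCG----/----ATACGCAGCT"))
```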
The sequence topology of the generated QUEEN_object can be specified by "linear" or "circular".
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCGTCGA", topology="circular")
A single-stranded QUEEN_object can be generated by specifying ssdna=True.
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
dna = QUEEN(seq="CCGGTATGCGTCGA", ssdna=True)
A GenBank file can be loaded by specifying its local file path.
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
pUC19 = QUEEN(record="./input/pUC19.gbk")
A QUEEN_object can be generated from an NCBI accession number with dbtype="ncbi".
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
#"M77789.2" is NCBI accession number for pUC19 plasmid
pUC19 = QUEEN(record="M77789.2", dbtype="ncbi")
A QUEEN_object can be generated from an Addgene plasmid ID with dbtype="addgene".
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
#"50005" is Addgene plasmid ID for pUC19 plasmid
pUC19 = QUEEN(record="50005", dbtype="addgene")
A QUEEN_object can be generated from a Benchling share link with dbtype="benchling".
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
plasmid = QUEEN(record="https://benchling.com/s/seq-U4pePb09KHutQzjyOPQV", dbtype="benchling")
The pX330 plasmid, which encodes a Cas9 gene and a gRNA expression unit, is loaded in the above example. The QUEEN_object generated here is used in the following example code throughout this document.
- .project: str
  Project name of the QUEEN_object construction. In QUEEN, this property is also used as a dictionary key to access the .productdict described below. If a QUEEN_object is created from a GenBank or FASTA file, its sequence ID will be inherited here. Otherwise, the project name is automatically generated to be unique among the existing .productdict keys.
- .seq: str
  Top strand sequence (5′→3′). This property cannot be edited directly; only the built-in operational functions of QUEEN described below can edit it.
- .rcseq: str
  Bottom strand sequence (5′→3′). This property cannot be edited directly; only the built-in operational functions of QUEEN described below can edit it.
- .topology: str ("linear" or "circular")
  Sequence topology. When a QUEEN_object is created by loading a GenBank file, the topology is set according to the description in the GenBank file. Only the built-in operational functions of QUEEN described below can edit this property.
- .dnafeatures: list of DNAfeature_objects
  When a QUEEN_object is loaded from a GenBank file, .dnafeatures will automatically be generated from the GenBank file's sequence features. Otherwise, .dnafeatures will be an empty list. Each DNAfeature_object, with the following attributes, provides an annotation for a given range of DNA sequence in a QUEEN_object.
  - .feature_id: str
    Unique identifier. It is automatically assigned to each feature when a QUEEN_object is loaded from a GenBank file.
  - .feature_type: str
    Biological nature. Any value is acceptable. The GenBank format requires registering a biological nature for each feature.
  - .start: int
    Start position of the DNAfeature_object in the QUEEN_object.
  - .end: int
    End position of the DNAfeature_object in the QUEEN_object.
  - .strand: int (1 or -1)
    Direction of the DNAfeature_object in the QUEEN_object: top strand (1) or bottom strand (-1).
  - .sequence: str
    Sequence of the DNAfeature_object in its encoded direction.
  - .qualifiers: dict
    Qualifiers. When a GenBank file is imported, the qualifiers of each feature will be registered here. Qualifier names and values serve as dictionary keys and values, respectively.
  A DNAfeature_object can be edited only by the editfeature() function described below. The DNAfeature class is implemented as a subclass of the Biopython SeqFeature class. Therefore, apart from the above attributes, the DNAfeature class inherits all the attributes and methods of the SeqFeature class. For details about the SeqFeature class, see https://biopython.org/docs/dev/api/Bio.SeqFeature.html.
- .productdict: dict
  Dictionary of all of the inherited QUEEN_objects used to construct the present QUEEN_object. The .project of each QUEEN_object serves as a key of this dictionary.
- .record: Bio.SeqRecord
  Bio.SeqRecord object that was used as the source for creating the QUEEN_object.
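The relationship between .seq, .rcseq, and a feature's .sequence can be sketched in plain Python. This assumes zero-based, end-exclusive positions, as in Python slicing; reverse_complement and feature_sequence are hypothetical helpers, not QUEEN functions, and they handle only the letters A, C, G, and T.

```python
# Hypothetical helpers (not QUEEN code); A/C/G/T only, zero-based end-exclusive.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Bottom strand read 5'->3', i.e. what .rcseq would hold for a given .seq."""
    return seq.translate(COMPLEMENT)[::-1]


def feature_sequence(seq, start, end, strand):
    """Feature sequence in its encoded direction (strand 1 or -1)."""
    region = seq[start:end]
    return region if strand == 1 else reverse_complement(region)


seq = "CCGGTATGCGTCGA"
print(reverse_complement(seq))            # TCGACGCATACCGG
print(feature_sequence(seq, 4, 10, 1))    # TATGCG
print(feature_sequence(seq, 4, 10, -1))   # CGCATA
```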
QUEEN_objects hold a simple set of functions to output their information.
- .printsequence(start=int, end=int, strand=int, display=bool, hide_middle=int, linebreak=int)
  Returns and displays part or all of the dsDNA sequence and sequence end structures of the QUEEN_object.
  - start: int (zero-based indexing; default: 0)
    Start position of the sequence.
  - end: int (zero-based indexing; default: the last sequence position of the QUEEN_object)
    End position of the sequence.
  - strand: int: 1 (top strand only), -1 (bottom strand only), or 2 (both strands) (default: 2)
    Sequence strand(s) to be returned.
  - display: bool (True or False; default: False)
    If True, the output will be displayed in STDOUT.
  - hide_middle: int or None (default: None)
    Length of both end sequences to be displayed.
  - linebreak: int (default: length of the QUEEN_object sequence)
    Length of sequence for linebreak.
  Returns:
  If strand is 1 or -1, the sequence of the defined strand (5′→3′).
  If strand is 2, "str/str": "top strand sequence (5′→3′)/bottom strand sequence (3′→5′)".
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
fragment = QUEEN(seq="CCGGTATGCG----/----ATACGCAGCT")
fragment.printsequence(display=True)
Output
5′ CCGGTATGCG---- 3′
3′ ----ATACGCAGCT 5′
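The two-line layout above, and the effect of hide_middle, can be imitated with a small formatter. format_dsdna is a hypothetical helper, not QUEEN's implementation; in particular, leaving sequences no longer than twice hide_middle unabbreviated is an assumption.

```python
# Hypothetical formatter (not QUEEN code) imitating the printsequence() layout.
def format_dsdna(top, bottom, hide_middle=None):
    """Render top (5'->3') over bottom (3'->5'); optionally show only both ends."""
    if hide_middle is not None and len(top) > 2 * hide_middle:
        # Keep hide_middle characters from each end and elide the middle.
        top = top[:hide_middle] + "..." + top[-hide_middle:]
        bottom = bottom[:hide_middle] + "..." + bottom[-hide_middle:]
    return f"5' {top} 3'\n3' {bottom} 5'"


print(format_dsdna("CCGGTATGCG----", "----ATACGCAGCT"))
print(format_dsdna("CCGGTATGCG----", "----ATACGCAGCT", hide_middle=3))
```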
- .printfeature(feature_list=list, attribute=list, seq=bool, separation=str, output=str, x_based_index=int)
  Print a tidy data table of annotation features/attributes of the QUEEN_object. Default output attributes are "feature_id", "feature_type", "qualifier:label", "start", "end", and "strand".
  - feature_list: list of DNAfeature_objects (default: .dnafeatures)
    List of features to be displayed in the output table. If not given, all features held by the QUEEN_object will be displayed.
  - attribute: list of feature attributes (default: ["feature_id", "feature_type", "qualifier:label", "start", "end", "strand"])
    List of feature attributes to be displayed in the output table. If the value is "all", a table of all the attributes held by the features of the QUEEN_object will be generated, except for "sequence".
  - seq: bool (True or False; default: False)
    If True, the sequence of each feature in its encoded direction will be displayed in the output table.
  - separation: str (default: space(s) to generate a well-formatted table)
    String to separate values in each line.
  - output: str (default: STDOUT)
    Output file name.
  - x_based_index: 0 or 1 (default: 0)
    By default, positions of all features are given in zero-based indexing in QUEEN (same as Python). If this parameter is set to 1, they will be shown in 1-based indexing (as seen in the GenBank format).
  Returns: None
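For reference, converting a zero-based, end-exclusive interval to the 1-based, inclusive form used in GenBank location strings only shifts the start by one. The sketch below assumes that QUEEN stores positions in the zero-based, end-exclusive convention of Biopython; to_genbank_location is a hypothetical helper, and the coordinates are taken from the printfeature() output below.

```python
# Hypothetical helper (not QUEEN code). Assumes zero-based, end-exclusive input,
# the convention used by Biopython feature locations.
def to_genbank_location(start, end, strand):
    """Convert (start, end, strand) to a 1-based, inclusive GenBank location."""
    location = f"{start + 1}..{end}"  # only the start shifts by one
    return location if strand == 1 else f"complement({location})"


print(to_genbank_location(0, 21, 1))        # hU6-F   -> 1..21
print(to_genbank_location(5524, 5542, -1))  # BGH-rev -> complement(5525..5542)
```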
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
plasmid = QUEEN(record="input/px330.gb")
plasmid.printfeature()
Output
feature_id  feature_type   qualifier:label     start  end   strand
1           source         source              0      8484  +
100         primer_bind    hU6-F               0      21    +
200         promoter       U6 promoter         0      241   +
300         primer_bind    LKO.1 5'            171    191   +
400         misc_RNA       gRNA scaffold       267    343   +
500         enhancer       CMV enhancer        439    725   +
600         intron         hybrid intron       983    1211  +
700         regulatory     Kozak sequence      1222   1232  +
800         CDS            3xFLAG              1231   1297  +
900         CDS            SV40 NLS            1303   1324  +
1000        CDS            Cas9                1348   5449  +
1100        CDS            nucleoplasmin NLS   5449   5497  +
1200        primer_bind    BGH-rev             5524   5542  -
1300        polyA_signal   bGH poly(A) signal  5530   5738  +
1400        repeat_region  AAV2 ITR            5746   5876  +
1500        repeat_region  AAV2 ITR            5746   5887  +
1600        rep_origin     f1 ori              5961   6417  +
1700        primer_bind    F1ori-R             6048   6068  -
1800        primer_bind    F1ori-F             6258   6280  +
1900        primer_bind    pRS-marker          6433   6453  -
2000        primer_bind    pGEX 3'             6552   6575  +
2100        primer_bind    pBRforEco           6612   6631  -
2200        promoter       AmpR promoter       6698   6803  +
2300        CDS            AmpR                6803   7664  +
2400        primer_bind    Amp-R               7021   7041  -
2500        rep_origin     ori                 7834   8423  +
2600        primer_bind    pBR322ori-F         8323   8343  +
- Output the QUEEN_object to a GenBank file. In addition to all of the DNAfeature_objects in the input QUEEN_object, a DNAfeature_object that encodes, in qualifiers:building_history, the entire construction process that generated the QUEEN_object will also be output to the GenBank file.
  - output: str (default: STDOUT)
    Output file name.
  - format: str (default: "genbank")
    Output file format ("genbank" or "fasta").
  - annotation: dict (default: None)
    Dictionary of annotations for the GenBank record. For details, please see https://biopython.org/docs/latest/api/Bio.SeqRecord.html.
  - export_history: bool (default: True)
    If False, the construction history of the QUEEN_object will not be output.
  Returns: None
-
QUEEN_objects hold .searchsequence() and .searchfeature() functions that enable users to search for query sequences and values in DNAfeature_objects.
- .searchsequence(query=regex or str, start=int, end=int, strand=int, unique=bool, product=str, process_name=str, process_description=str)
  Search for specific sequences in a user-defined region of a QUEEN_object and return a list of DNAfeature_objects. The start and end attributes of the returned DNAfeature_objects represent the sequence regions of the QUEEN_object that matched the user's query. Note that the returned DNAfeature_objects are not assigned a .feature_id and are not reflected in the parental QUEEN_object. The returned DNAfeature_objects can be added to QUEEN_object.dnafeatures by editfeature() with the createattribute option, as explained below.
  - query: regex or str (default: ".+")
    Search query sequence. If the value is not provided, the user-specified search region of the QUEEN_object sequence, defined by start and end explained below, will be returned. Fuzzy matching and regular expressions are supported; for details, see https://pypi.org/project/regex/. All IUPAC nucleotide symbols can be used. A restriction enzyme cut motif can be used as a query, written with "^" and "_" or with "(int/int)". For example, the EcoRI cut motif can be provided as "G^AATT_C", where "^" and "_" represent the cut positions on the top and bottom strands, respectively, or as "GAATTC(-5/-1)" or "(-5/-1)GAATTC", where the left and right integers between the parentheses represent the cut positions on the top and bottom strands, respectively. Similarly, the cut motif of the Type IIS restriction enzyme BsaI can be given as "GGTCTCN^NNNN_", "^NNNN_NGAGACC", "GGTCTC(1/5)", or "(5/1)GAGACC". The DNAfeature_objects returned for a restriction enzyme cut motif query hold the cutting rule in the "qualifier:cutsite" attribute and can be added to QUEEN_object.dnafeatures by editfeature() with the createattribute option, as explained below. Regular expressions are disabled for restriction enzyme cut motifs.
  - start: int (zero-based indexing; default: 0)
    Start position of the target range of the QUEEN_object sequence for the search.
  - end: int (zero-based indexing; default: the last sequence position of the QUEEN_object)
    End position of the target range of the QUEEN_object sequence for the search.
  - strand: int: 1 (top strand only), -1 (bottom strand only), or 2 (both strands) (default: 2)
    Sequence strand to be searched.
  - unique: bool: True or False (default: False)
    If True and more than one sequence region is detected in the search, an error is raised. If False, multiple detections are accepted.
  Returns: list (list of DNAfeature_objects)
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
match_list = plasmid.searchsequence(query="G[ATGC]{19}GGG")
plasmid.printfeature(match_list, seq=True, attribute=["start", "end", "strand"])
Output
start  end   strand  sequence
115    138   +       GTAGAAAGTAATAATTTCTTGGG
523    546   +       GACTTTCCATTGACGTCAATGGG
816    839   +       GTGCAGCGATGGGGGCGGGGGGG
1372   1395  +       GACATCGGCACCAACTCTGTGGG
1818   1841  +       GGCCCACATGATCAAGTTCCGGG
3097   3120  +       GATCGGTTCAACGCCTCCCTGGG
3300   3323  +       GCGGCGGAGATACACCGGCTGGG
3336   3359  +       GAAGCTGATCAACGGCATCCGGG
3529   3552  +       GGCAGCCCCGCCATTAAGAAGGG
3577   3600  +       GACGAGCTCGTGAAAGTGATGGG
︙
493    516   -       GCGTTACTATTGACGTCAATGGG
654    677   -       GTCCCATAAGGTCATGTACTGGG
758    781   -       GGTGGGGAGGGGGGGGAGATGGG
1014   1037  -       GCGCGAGGCGGCGGCGGAGCGGG
1301   1324  -       GACCTTCCGCTTCTTCTTTGGGG
1820   1843  -       GCCCCGGAACTTGATCATGTGGG
2090   2113  -       GAAGTTGCTCTTGAAGTTGGGGG
2183   2206  -       GGCGTACTGGTCGCCGATCTGGG
2288   2311  -       GATCATAGAGGCGCTCAGGGGGG
2689   2712  -       GCCAGAGGGCCCACGTAGTAGGG
︙
Search for the "AAAAAAAA" sequence, permitting a single nucleotide mismatch.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
match_list = plasmid.searchsequence(query="(?:AAAAAAAA){s<=1}")
plasmid.printfeature(match_list, seq=True)
Output
feature_id feature_type qualifiers:label start end strand sequence
null misc_feature null 5484 5492 + AAAAAAGA
null misc_feature null 6369 6377 + AACAAAAA
null misc_feature null 7872 7880 + AAACAAAA
null misc_feature null 346 354 - AAAACAAA
null misc_feature null 799 807 - AAAAAATA
null misc_feature null 1201 1209 - GAAAAAAA
null misc_feature null 6716 6724 - AAAAATAA
null misc_feature null 7844 7852 - AGAAAAAA
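The query "(?:AAAAAAAA){s<=1}" uses the fuzzy-matching syntax of the regex package, where {s<=1} allows at most one substitution. The idea can be sketched with the standard library alone as a sliding-window mismatch count. mismatch_hits is a hypothetical helper that scans a single strand (the bottom strand could be scanned through its reverse complement) and reports every overlapping window, whereas a regex search may report only non-overlapping matches, so the two can differ.

```python
# Hypothetical helper (not QUEEN code): substitution-only approximate search,
# similar in spirit to the regex-package query "(?:AAAAAAAA){s<=1}".
def mismatch_hits(seq, query, max_mismatches=1):
    """Return (start, end, window) for every window within the mismatch budget."""
    hits = []
    k = len(query)
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        mismatches = sum(a != b for a, b in zip(window, query))
        if mismatches <= max_mismatches:
            hits.append((i, i + k, window))
    return hits


print(mismatch_hits("CCAAAAAAGATT", "AAAAAAAA"))  # [(2, 10, 'AAAAAAGA')]
```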
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
match_list = plasmid.searchsequence(query="SWSWSWDSDSBHBRHH")
plasmid.printfeature(match_list, seq=True)
Output
feature_id feature_type qualifiers:label start end strand sequence
null misc_feature null 4098 4114 + GAGACAGCTGGTGGAA
null misc_feature null 3550 3566 - CTGTCTGCAGGATGCC
null misc_feature null 5239 5255 - CTCTGATGGGCTTATC
null misc_feature null 6415 6431 - GAGAGTGCACCATAAA
null misc_feature null 8357 8373 - GTCAGAGGTGGCGAAA
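IUPAC symbols such as S (G or C) or H (A, C, or T) act as character classes. A minimal sketch of that expansion with Python's re module is shown below; iupac_to_regex is a hypothetical helper, and whether QUEEN translates queries this way internally is an assumption. The sketch confirms that the top-strand hit and the first bottom-strand hit listed above both fit the query.

```python
import re

# IUPAC nucleotide codes expanded to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[GC]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def iupac_to_regex(query):
    """Hypothetical helper: translate an IUPAC query into a regex pattern."""
    return "".join(IUPAC[base] for base in query.upper())


pattern = iupac_to_regex("SWSWSWDSDSBHBRHH")
print(bool(re.fullmatch(pattern, "GAGACAGCTGGTGGAA")))  # True
print(bool(re.fullmatch(pattern, "CTGTCTGCAGGATGCC")))  # True
```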
- .searchfeature(key_attribute=str, query=regex or str, source=list of DNAfeature_objects, start=int, end=int, strand=int)
  Search for DNAfeature_objects holding a queried value in a designated key_attribute in the QUEEN_object.
  - key_attribute: str (default: "all")
    Attribute type to be searched (feature_id, feature_type, "qualifier:*", or sequence). If the value is not provided, the search will be applied to all of the attributes in the QUEEN_object, excluding sequence. However, if the query value consists only of the four nucleotide letters (A, T, G, and C), this value will automatically be set to sequence.
  - query: regex or str (default: ".+")
    Query term. DNAfeature_objects whose key_attribute value matches this query will be returned. Fuzzy matching and regular expressions are supported; for details, see https://pypi.org/project/regex/. If the key_attribute is sequence, all IUPAC nucleotide symbols can be used.
  - source: list of DNAfeature_objects (default: QUEEN_object.dnafeatures)
    Source DNAfeature_objects to be searched. DNAfeature_objects outside the search range defined by start, end, and strand will be removed from the source. Any DNAfeature_objects can be provided here. For example, a list of DNAfeature_objects returned from another searchsequence() or searchfeature() operation can be used as the source to achieve an AND search with multiple queries.
  - start: int (zero-based indexing; default: 0)
    Start position of the target range of the QUEEN_object sequence for the search.
  - end: int (zero-based indexing; default: the last sequence position of the QUEEN_object)
    End position of the target range of the QUEEN_object sequence for the search.
  - strand: int: 1 (top strand only), -1 (bottom strand only), or 2 (both strands) (default: 2)
    Sequence strand to be searched.
  Returns: list (list of DNAfeature_objects)
Search for DNAfeature_objects with the feature type "primer_bind", and then further screen for those holding a specific string in "qualifier:label".
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
feature_list = plasmid.searchfeature(key_attribute="feature_type", query="primer_bind")
plasmid.printfeature(feature_list)
sub_feature_list = plasmid.searchfeature(key_attribute="qualifier:label", query=".+-R$", source=feature_list)
plasmid.printfeature(sub_feature_list)
Output
feature_id feature_type qualifiers:label start end strand
200 primer_bind hU6-F 0 21 +
300 primer_bind LKO.1 5' 171 191 +
1200 primer_bind BGH-rev 5524 5542 -
1700 primer_bind F1ori-R 6048 6068 -
1800 primer_bind F1ori-F 6258 6280 +
1900 primer_bind pRS-marker 6433 6453 -
2000 primer_bind pGEX 3' 6552 6575 +
2100 primer_bind pBRforEco 6612 6631 -
2400 primer_bind Amp-R 7021 7041 -
2600 primer_bind pBR322ori-F 8323 8343 +
feature_id feature_type qualifiers:label start end strand
1700 primer_bind F1ori-R 6048 6068 -
2400 primer_bind Amp-R 7021 7041 -
QUEEN objects can be manipulated by four simple operational functions, cutdna(), modifyends(), flipdna(), and joindna(), which can collectively represent any of the standard molecular DNA cloning processes, and by two super functions, editsequence() and editfeature().
- Cut the QUEEN_object at queried positions or by queried DNAfeature_objects and return a list of fragmented QUEEN_objects. Each existing DNAfeature_object in the original QUEEN_object will be inherited by the resulting QUEEN_objects. If any DNAfeature_objects span a cut boundary and are split into fragments, each piece will also be carried over to the new QUEEN_object with the "qualifier:broken_feature" attribute set to "[.project of the original QUEEN_object]:[.feature_id of the original DNAfeature_object]:[sequence length of the original DNAfeature_object]:[sequence of the original DNAfeature_object]:[start..end positions of the original DNAfeature_object in the sequence of the original QUEEN_object]:[5'..3' end positions of the broken DNAfeature_object in the original DNAfeature_object]". This function also linearizes a circular QUEEN_object.
  - input: QUEEN_object
  - cutsites: list of int, "int/int", and/or DNAfeature_objects
    List of cut positions. For a blunt-end cut, a cut position can be provided as an int. For a sticky-end cut, a cut position can be specified as "int/int", where the left and right integers represent the cut positions on the top and bottom strands, respectively. DNAfeature_objects holding "qualifier:cut_site" attributes can also be provided to cut a query DNA. This operation cannot proceed with multiple cut sites where a nick or blunt-end cut of one cutting event falls between the two nick positions of another sticky-end cut.
    Valid case: cutdna(object, *["100/105", "120/110", "50/55"])
    Invalid case: cutdna(object, *["50/105", "100/55", "120/110"])
  Returns: list (list of QUEEN_objects)
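The "qualifier:broken_feature" value described above is a colon-separated record. The sketch below only assembles such a string from its six fields to make the layout concrete; broken_feature_tag and all input values are hypothetical, and the indexing convention QUEEN uses for the two position fields is not specified in this section, so the numbers are placeholders.

```python
# Hypothetical illustration of the field layout; all values are made up.
def broken_feature_tag(project, feature_id, feature_seq,
                       start, end, piece_start, piece_end):
    """Join the six documented fields with ":"."""
    return ":".join([
        project,                        # .project of the original QUEEN_object
        feature_id,                     # .feature_id of the original feature
        str(len(feature_seq)),          # length of the original feature
        feature_seq,                    # sequence of the original feature
        f"{start}..{end}",              # position in the original QUEEN_object
        f"{piece_start}..{piece_end}",  # 5'..3' ends of the piece within the feature
    ])


print(broken_feature_tag("demo", "100", "ATGCCCGGGTAA", 10, 22, 1, 6))
# demo:100:12:ATGCCCGGGTAA:10..22:1..6
```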
Cut the circular plasmid pX330 at three different positions, generating three fragments. Then, cut one of the three fragments again.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
print(plasmid)
fragments = cutdna(plasmid, 1000, 2000, 4000)
print(fragments)
fragment3, fragment4 = cutdna(fragments[1], 500)
print(fragment3)
print(fragment4)
Output
[<queen.QUEEN object; project='pX330', length='1000 bp', topology='linear' >, <queen.QUEEN object; project='pX330', length='2000 bp', topology='linear' >, <queen.QUEEN object; project='pX330', length='5484 bp', topology='linear' >]
<queen.QUEEN object; project='pX330', length='500 bp', topology='linear' >
<queen.QUEEN object; project='pX330', length='1500 bp', topology='linear' >
If an invalid cut pattern is provided, an error message will be returned.
Source code (continued from the previous code)
fragments = cutdna(plasmid, *["50/105", "100/55", "120/110"])
Error message
ValueError: Invalid cut pattern.
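The restriction on cut patterns can be expressed as a pairwise check: a pattern is rejected if any nick position of one cut falls strictly between the two nick positions of another cut. The following sketch encodes that reading of the rule; parse_cut and is_valid_cut_pattern are hypothetical helpers, and QUEEN's own check may treat edge cases (such as coinciding positions) differently.

```python
# Hypothetical helpers (not QUEEN code) encoding the cut-pattern rule above.
def parse_cut(cut):
    """'100/105' -> (100, 105); a blunt cut given as an int -> (n, n)."""
    if isinstance(cut, int):
        return cut, cut
    top, bottom = cut.split("/")
    return int(top), int(bottom)


def is_valid_cut_pattern(cuts):
    """False if a nick of one cut lies strictly between the two nicks of another."""
    parsed = [parse_cut(c) for c in cuts]
    for i, (top, bottom) in enumerate(parsed):
        low, high = min(top, bottom), max(top, bottom)
        for j, (other_top, other_bottom) in enumerate(parsed):
            if i != j and (low < other_top < high or low < other_bottom < high):
                return False
    return True


print(is_valid_cut_pattern(["100/105", "120/110", "50/55"]))  # True
print(is_valid_cut_pattern(["50/105", "100/55", "120/110"]))  # False
```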
Digestion of the pX330 plasmid with EcoRI can be simulated as follows.
- Search for EcoRI recognition sites in pX330 with its cut motif and obtain the DNAfeature_objects representing its cut position(s) and motif.
- Use the DNAfeature_objects to cut pX330 with cutdna().
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
sites = plasmid.searchsequence("G^AATT_C")
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)
Output
<queen.QUEEN object; project='pX330', length='8488 bp', topology='linear' >
5' AATTCCTAGA...AGTAAG---- 3'
3' ----GGATCT...TCATTCTTAA 5'
QUEEN provides a library of restriction enzyme motifs (as described on the New England Biolabs website).
Source code (continued from the previous code)
from QUEEN import cutsite  # import the restriction enzyme library
sites = plasmid.searchsequence(cutsite.lib["EcoRI"])
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)
Output
<queen.QUEEN object; project='pX330', length='8488 bp', topology='linear' >
5' AATTCCTAGA...AGTAAG---- 3'
3' ----GGATCT...TCATTCTTAA 5'
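The reported fragment lengths follow from where the cuts fall: a fragment spans from the leftmost nick of one cut to the rightmost nick of the next, so its single-stranded overhangs count toward its length. That is why a single EcoRI cut (4-nt offset between nicks) of the 8484-bp circular pX330 yields an 8488-bp linear fragment, while the blunt cuts at 1000, 2000, and 4000 in the earlier cutdna() example yield 1000-, 2000-, and 5484-bp fragments. fragment_lengths is a hypothetical helper written to reproduce these numbers, not QUEEN code; the EcoRI cut position used below is a placeholder, since only the offset between the nicks matters.

```python
# Hypothetical helper (not QUEEN code) computing displayed fragment lengths.
# Each cut is (top, bottom), zero-based; a blunt cut has top == bottom.
def fragment_lengths(length, cuts, circular=True):
    cuts = sorted(cuts, key=min)
    lengths = []
    if circular:
        for i, (top, bottom) in enumerate(cuts):
            next_top, next_bottom = cuts[(i + 1) % len(cuts)]
            end = max(next_top, next_bottom)
            if i == len(cuts) - 1:
                end += length  # the last fragment wraps past position 0
            lengths.append(end - min(top, bottom))
    else:
        bounds = [(0, 0)] + cuts + [(length, length)]
        for (top, bottom), (next_top, next_bottom) in zip(bounds, bounds[1:]):
            lengths.append(max(next_top, next_bottom) - min(top, bottom))
    return lengths


print(fragment_lengths(8484, [(1000, 1000), (2000, 2000), (4000, 4000)]))  # [1000, 2000, 5484]
print(fragment_lengths(8484, [(100, 104)]))                                # [8488]
print(fragment_lengths(2000, [(500, 500)], circular=False))                # [500, 1500]
```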
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
sites = plasmid.searchsequence("GAAGAC(2/6)")
fragments = cutdna(plasmid,*sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)
Output
<queen.QUEEN object; project='pX330', length='8466 bp', topology='linear' >
5' GTTTTAGAGC...ACGAAA---- 3'
3' ----ATCTCG...TGCTTTGTGG 5'
<queen.QUEEN object; project='pX330', length='26 bp', sequence='CACCGGGTCTTCGAGAAGACCTGTTT', topology='linear'>
5' CACCGGGTCT...AGACCT---- 3'
3' ----CCCAGA...TCTGGACAAA 5'
Here, the BbsI recognition motif can also be represented by "(6/2)GTCTTC", "GAAGACNN^NNNN_", or "^NNNN_NNGTCTTC".
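In the caret form of a cut motif, "^" and "_" mark the top- and bottom-strand cut positions within the motif string, so the distance between them gives the overhang length. parse_caret_motif is a hypothetical helper that only illustrates this reading of the notation; it does not handle the "(int/int)" form.

```python
# Hypothetical helper (not QUEEN code) for motifs such as "G^AATT_C".
def parse_caret_motif(motif):
    """Return (site, top_cut, bottom_cut) with cut offsets counted from the site's 5' end."""
    site, top_cut, bottom_cut = "", None, None
    for ch in motif:
        if ch == "^":
            top_cut = len(site)      # top-strand cut before the next nucleotide
        elif ch == "_":
            bottom_cut = len(site)   # bottom-strand cut, in top-strand coordinates
        else:
            site += ch
    if top_cut is None or bottom_cut is None:
        raise ValueError("motif must contain both '^' and '_'")
    return site, top_cut, bottom_cut


for motif in ["G^AATT_C", "GAAGACNN^NNNN_", "^NNNN_NNGTCTTC"]:
    site, top_cut, bottom_cut = parse_caret_motif(motif)
    print(motif, site, top_cut, bottom_cut, "overhang:", bottom_cut - top_cut)
```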
The BbsI recognition motif is also available from the library of restriction enzyme motifs.
Source code (continued from the previous code)
from QUEEN import cutsite #Import a restriction enzyme library
sites = plasmid.searchsequence(cutsite.lib["BbsI"])
fragments = cutdna(plasmid, *sites)
for fragment in fragments:
    print(fragment)
    fragment.printsequence(display=True, hide_middle=10)
Output
<queen.QUEEN object; project='pX330', length='8466 bp', topology='linear' >
5' GTTTTAGAGC...ACGAAA---- 3'
3' ----ATCTCG...TGCTTTGTGG 5'
<queen.QUEEN object; project='pX330', length='26 bp', sequence='CACCGGGTCTTCGAGAAGACCTGTTT', topology='linear'>
5' CACCGGGTCT...AGACCT---- 3'
3' ----CCCAGA...TCTGGACAAA 5'
- cropdna(input=QUEEN_object, start=int, "int/int", or DNAfeature_object, end=int, "int/int", or DNAfeature_object)
  This is a subfunction of cutdna() and extracts a partial fragment from the QUEEN_object.
  - input: QUEEN_object
  - start: int, "int/int" (zero-based indexing), or DNAfeature_object (default: 0)
    Start position of the fragment of the QUEEN_object sequence to be extracted.
  - end: int, "int/int" (zero-based indexing), or DNAfeature_object (default: the last sequence position of the QUEEN_object)
    End position of the fragment of the QUEEN_object sequence to be extracted. If the topology of the QUEEN_object is "linear", the end position must be larger than the start position. If the topology is "circular" and the start position is larger than the end position, the fragment spanning the zero position will be returned.
  Returns: QUEEN_object
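The position rules for start and end can be mirrored on a plain string. crop is a hypothetical helper operating on a top-strand string only (no features, no sticky ends, zero-based and end-exclusive positions); it follows the documented behavior of raising an error for a reversed range on linear DNA and wrapping across position 0 on circular DNA.

```python
# Hypothetical helper (not QUEEN code) working on the top-strand string only.
def crop(seq, start, end, topology="linear"):
    """Return seq between zero-based start and end (end-exclusive)."""
    if start < end:
        return seq[start:end]
    if topology == "circular":
        return seq[start:] + seq[:end]  # the fragment runs across position 0
    raise ValueError("'end' position must be larger than 'start' position.")


print(crop("AACCGGTT", 2, 6))              # CCGG
print(crop("AACCGGTT", 6, 2, "circular"))  # TTAA
```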
If only the second fragment from the cutdna() example above is needed for further manipulation, cropdna() is convenient.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
fragment = cropdna(plasmid, 2000, 4000)
print(fragment)
Output
<queen.QUEEN object; project='pX330', length='2000 bp', topology='linear' >
If the start position is larger than the end position of a linear QUEEN_object, an error message will be returned.
Source code (continued from the previous code)
fragment = cropdna(fragment, 1500, 1000)
Error message
ValueError: 'end' position must be larger than 'start' position.
- Modify the sequence end structures of the QUEEN_object. This function does not work on a QUEEN_object with "circular" topology; an error is raised instead.
  - input: QUEEN_object
  - left: "str" or "str/str" (default: None)
    Left sequence end structure of the QUEEN_object. The following examples show how to provide this parameter.
  - right: "str" or "str/str" (default: None)
    Right sequence end structure of the QUEEN_object. The following examples show how to provide this parameter.
  Returns: QUEEN_object
Sticky ends can be generated by trimming nucleotides. Each end structure is given as a top-strand string and a bottom-strand string separated by "/", written with "*" and "-": "-" indicates a nucleotide to be trimmed, and "*" indicates one to be retained.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
fragment = cropdna(plasmid, 100, 120)
fragment.printsequence(display=True)
fragment = modifyends(fragment, "-----/*****", "**/--")
fragment.printsequence(display=True)
Output
5' CTTAACGTTGGCTTGCCACG 3'
3' GAATTGCAACCGAACGGTGC 5'
5' ----ACGTTGGCTTGCCACG 3'
3' GAATTGCAACCGAACGGT-- 5'
The following code achieves the same manipulation.
Source code (continued from the previous code)
fragment = cropdna(plasmid,'105/100', '120/118')
fragment.printsequence(display=True)
A regex-like format can also be used.
Source code (continued from the previous code)
fragment = modifyends(fragment, "-{5}/*{5}","*{2}/-{2}")
fragment.printsequence(display=True)
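The "*"/"-" end notation, including the "-{5}"-style shorthand, can be mimicked on plain top/bottom strings. expand, mask, and trim_ends are hypothetical helpers that cover trimming only (not adding adapter sequences), and the toy sequence below is chosen purely for illustration.

```python
import re


# Hypothetical helpers (not QUEEN code) mimicking the "*"/"-" end notation.
def expand(pattern):
    """Expand shorthand such as "-{5}/*{5}" into "-----/*****"."""
    return re.sub(r"([*-])\{(\d+)\}", lambda m: m.group(1) * int(m.group(2)), pattern)


def mask(region, pattern):
    """Replace nucleotides marked "-" with gaps; keep those marked "*"."""
    return "".join("-" if p == "-" else s for s, p in zip(region, pattern))


def trim_ends(top, bottom, left, right):
    """Apply left/right end structures ("top/bottom" patterns) to both strands."""
    left_top, left_bottom = expand(left).split("/")
    right_top, right_bottom = expand(right).split("/")
    strands = []
    for strand, lp, rp in [(top, left_top, right_top), (bottom, left_bottom, right_bottom)]:
        strand = mask(strand[:len(lp)], lp) + strand[len(lp):]
        cut = len(strand) - len(rp)
        strand = strand[:cut] + mask(strand[cut:], rp)
        strands.append(strand)
    return tuple(strands)


print(trim_ends("ATGCATGC", "TACGTACG", "--/**", "*/-"))
# ('--GCATGC', 'TACGTAC-')
print(trim_ends("ATGCATGC", "TACGTACG", "-{2}/*{2}", "*{1}/-{1}"))
# ('--GCATGC', 'TACGTAC-')
```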
If a QUEEN object with circular topology is given, an error message will be returned.
Source code (continued from the previous code)
fragment = modifyends(plasmid, "-----/*****", "**/--")
Error message
ValueError: End sequence structures cannot be modified. The topology of the QUEEN_object is circular.
modifyends() can also add adapter sequences to DNA ends.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
#Add blunt-ended dsDNA sequences to both ends
fragment = cropdna(plasmid, 100, 120)
fragment = modifyends(fragment,"TACATGC","TACGATG")
fragment.printsequence(display=True)
#Add sticky-ended dsDNA sequences to both ends
fragment = cropdna(plasmid, 100, 120)
fragment = modifyends(fragment,"---ATGC/ATGTACG","TACG---/ATGCTAC")
fragment.printsequence(display=True)
Output
5' TACATGCTACAAAATACGTGACGTAGATACGATG 3'
3' ATGTACGATGTTTTATGCACTGCATCTATGCTAC 5'
5' ---ATGCTACAAAATACGTGACGTAGATACG--- 3'
3' ATGTACGATGTTTTATGCACTGCATCTATGCTAC 5'
- Invert the QUEEN_object.
  - input: QUEEN_object
  Returns: QUEEN_object
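Conceptually, inverting a dsDNA swaps the roles of the two strands: the old bottom strand, read 5'→3', becomes the new top strand, while each feature's coordinates are mirrored and its strand reversed. The sketch below applies this to the "top/bottom" strings used earlier. flip is a hypothetical helper, and the assumption that QUEEN transforms feature coordinates exactly this way (zero-based, end-exclusive) is illustrative only.

```python
# Hypothetical helper (not QUEEN code). Features are (start, end, strand)
# with zero-based, end-exclusive positions.
def flip(top, bottom, features):
    """Invert a dsDNA given as top (5'->3') and bottom (3'->5') strings."""
    length = len(top)
    new_top = bottom[::-1]     # old bottom strand read 5'->3'
    new_bottom = top[::-1]     # old top strand read 3'->5'
    new_features = [(length - end, length - start, -strand)
                    for start, end, strand in features]
    return new_top, new_bottom, new_features


print(flip("CCGGTATGCG----", "----ATACGCAGCT", [(4, 10, 1)]))
# ('TCGACGCATA----', '----GCGTATGGCC', [(4, 10, -1)])
```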
-
joindna(*inputs=*list of QUEEN objects, topology=str, compatibility=str, homology_length=int, autoflip=bool)
Assemble
QUEEN_objects
. Therefore, the connecting DNA end structures must include compatible region (i.e., only blunt ends and sequence ends including compatible sticky ends can be assembled).From QUEEN v1.1.0,
joindna
can also accept ssDNA objects as inputs. When ssdna objects are specified, it can take only two ssDNA objects.
The first one is set as the top strand and the second one is set as the bottom strand. Then, they are annealed according to the longest complementary sequence between them and return the new dsDNA object. If the assembly restores unfragmented sequences ofDNAfeature_objects
that are fragmented before the assembly and hold"qualifier:broken_feature"
attributes, the originalDNAfeature_objects
will be restored in the outputQUEEN_object
(the fragmentedDNAfeature_objects
will not be inherited). A single linearQUEEN_object
processed by this function will be circularized.-
inputs:
list
ofQUEEN_object
-
topology:
str
("linear"
or"circular"
; default:"linear"
)
Topology of the outputQUEEN_object
. -
- compatibility: str ("complete" or "partial"; default: "partial")
  If the value is "complete", the connecting DNA end structures must be entirely compatible. Otherwise, at least homology_length bases from the end of the protruding sequence must be compatible. For details, see the following example.
  Connecting DNA end sequences when the value is "partial":
  If the value is "complete", Sequence A and Sequence B cannot be joined because their sticky ends have different lengths. However, if the value is "partial", the two sequences can be joined, yielding Sequence C as shown below.
  Sequence A
  GGGGATGCAT
  CCCC------
  Sequence B
  -----GGGG
  ACGTACCCC
  Sequence C
  GGGGATGCATGGGG
  CCCCTACGTACCCC
- homology_length: int (default: 2 if compatibility == "partial", else 0)
  The minimum length of compatible end sequence required for the assembly. If the compatible end length is shorter than this value, the joindna operation will be interrupted and an error will be raised. However, if the connecting DNA end structures are blunt ends, this threshold will be ignored and the QUEEN_objects will be joined.
- autoflip: bool (default: True)
  If True and the joining fails, the joining process is automatically retried with the fragment flipped. If False, the fragments must be oriented correctly.
Return: QUEEN_object
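The difference between "complete" and "partial" compatibility can be illustrated with a simplified model. It reproduces the Sequence A/B example above. It is an assumption, not QUEEN's code, and it covers only one orientation: the left fragment ends with a single-stranded top-strand overhang (5'->3'), and the right fragment starts with a single-stranded bottom-strand overhang (written 3'->5', as in the printed duplexes). The function name is hypothetical.

```python
# Simplified model (assumption, not QUEEN's code) of "complete" vs "partial" compatibility.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def ends_compatible(left_top_overhang, right_bottom_overhang,
                    compatibility="partial", homology_length=2):
    if compatibility == "complete":
        # Every protruding base must find a partner on the other fragment.
        return left_top_overhang.translate(COMPLEMENT) == right_bottom_overhang
    # "partial": compare only the bases that face each other next to the junction
    # (the last n bases of both overhangs) and require n >= homology_length.
    n = min(len(left_top_overhang), len(right_bottom_overhang))
    if n < homology_length:
        return False
    return left_top_overhang[-n:].translate(COMPLEMENT) == right_bottom_overhang[-n:]

# Sequence A protrudes ATGCAT on its top strand; Sequence B protrudes ACGTA on its bottom strand.
print(ends_compatible("ATGCAT", "ACGTA", "complete"))  # False: overhang lengths differ
print(ends_compatible("ATGCAT", "ACGTA", "partial"))   # True: TGCAT pairs with ACGTA
```

In Sequence C, the one unpaired top-strand base (A, paired with T) is filled in on the bottom strand.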
Example code 17 demonstrates how to:
- Generate a QUEEN class object for an EGFP fragment,
- Create EcoRI sites at both ends of the EGFP fragment,
- Digest the EGFP fragment and pX330 with EcoRI, and
- Assemble the EGFP fragment with the linearized pX330.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
EGFP = QUEEN(record="input/EGFP.fasta")
#Add EcoRI sites to both ends of the EGFP fragment
EGFP = modifyends(EGFP, cutsite.lib["EcoRI"].seq, cutsite.lib["EcoRI"].seq)
sites = EGFP.searchsequence(cutsite.lib["EcoRI"])
insert = cutdna(EGFP, *sites)[1]
insert.printsequence(display=True, hide_middle=10)
#Linearize pX330 at its EcoRI site
sites = plasmid.searchsequence(cutsite.lib["EcoRI"])
backbone = cutdna(plasmid, *sites)[0]
backbone.printsequence(display=True, hide_middle=10)
pEGFP = joindna(backbone, insert, topology="circular")
print(plasmid)
print(EGFP)
print(pEGFP)
Output
5' AATTCGGCAG...ACAAGG---- 3'
3' ----GCCGTC...TGTTCCTTAA 5'
5' AATTCCTAGA...AGTAAG---- 3'
3' ----GGATCT...TCATTCTTAA 5'
<queen.QUEEN object; project='pX330', length='8484 bp', topology='circular'>
<queen.QUEEN object; project='EGFP', length='789 bp', topology='linear'>
<queen.QUEEN object; project='pX330', length='9267 bp', topology='circular'>
If the connecting DNA end structures of the input QUEEN_objects are not compatible, an error will be raised.
Source code (continued from the previous code)
EGFP = QUEEN(record="input/EGFP.fasta")
EGFP = modifyends(EGFP, cutsite.lib["BamHI"].seq, cutsite.lib["BamHI"].seq)
sites = EGFP.searchsequence(cutsite.lib["BamHI"])
insert = cutdna(EGFP, *sites)[1]
insert.printsequence(display=True, hide_middle=10)
#BamHI ends are not compatible with the EcoRI ends of the backbone
pEGFP = joindna(backbone, insert, topology="circular")
Error message
ValueError: The QUEEN_objects cannot be joined due to the end structure incompatibility.
pX330 serves as a standard gRNA expression backbone plasmid. A gRNA spacer can simply be cloned into the BbsI-digested destination site of pX330 as follows (Example code 18):
- Generate a QUEEN object for a sticky-ended gRNA spacer dsDNA,
- Digest pX330 with BbsI, and
- Assemble the spacer with the BbsI-digested pX330.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
#Anneal two ssDNA oligos into a sticky-ended gRNA spacer and annotate it with supfeature
gRNA_top = QUEEN(seq="CACCGACCATTGTTCAATATCGTCC", ssdna=True)
gRNA_bottom = QUEEN(seq="AAACGGACGATATTGAACAATGGTC", ssdna=True)
gRNA = joindna(gRNA_top, gRNA_bottom, supfeature={"feature_id":"gRNA-1", "feature_type":"gRNA", "qualifier:label":"gRNA"})
gRNA.printsequence(display=True)
sites = plasmid.searchsequence(cutsite.lib["BbsI"])
fragments = cutdna(plasmid, *sites)
#Use the larger fragment as the backbone
backbone = fragments[0] if len(fragments[0].seq) > len(fragments[1].seq) else fragments[1]
pgRNA = joindna(gRNA, backbone, topology="circular", product="pgRNA")
pgRNA.printfeature()
print(backbone)
print(insert)
print(pgRNA)
Output
5' CACCGACCATTGTTCAATATCGTCC---- 3'
3' ----CTGGTAACAAGTTATAGCAGGCAAA 5'
feature_id feature_type qualifier:label start end strand
0 primer_bind hU6-F 0 21 +
100 promoter U6 promoter 0 241 +
200 source source 0 249 +
300 primer_bind LKO.1 5' 171 191 +
gRNA-1 gRNA gRNA 245 274 +
500 misc_RNA gRNA scaffold 270 346 +
600 source source 270 8487 +
700 enhancer CMV enhancer 442 728 +
800 intron hybrid intron 986 1214 +
900 regulatory Kozak sequence 1225 1235 +
1000 CDS 3xFLAG 1234 1300 +
1100 CDS SV40 NLS 1306 1327 +
1200 CDS Cas9 1351 5452 +
1300 CDS nucleoplasmin NLS 5452 5500 +
1400 primer_bind BGH-rev 5527 5545 -
1500 polyA_signal bGH poly(A) signal 5533 5741 +
1600 repeat_region AAV2 ITR 5749 5879 +
1700 repeat_region AAV2 ITR 5749 5890 +
1800 rep_origin f1 ori 5964 6420 +
1900 primer_bind F1ori-R 6051 6071 -
2000 primer_bind F1ori-F 6261 6283 +
2100 primer_bind pRS-marker 6436 6456 -
2200 primer_bind pGEX 3' 6555 6578 +
2300 primer_bind pBRforEco 6615 6634 -
2400 promoter AmpR promoter 6701 6806 +
2500 CDS AmpR 6806 7667 +
2600 primer_bind Amp-R 7024 7044 -
2700 rep_origin ori 7837 8426 +
2800 primer_bind pBR322ori-F 8326 8346 +
<queen.QUEEN object; project='pX330_26', length='8466 bp', topology='linear'>
<queen.QUEEN object; project='EGFP_2', length='787 bp', topology='linear'>
<queen.QUEEN object; project='pgRNA', length='8487 bp', topology='circular'>
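For ssDNA inputs, joindna() anneals the two strands by their longest complementary stretch. The sketch below shows that idea with plain strings: find the longest common substring between the top oligo and the reverse complement of the bottom oligo, then lay both strands out as aligned rows. It is not QUEEN's implementation and the helper names are hypothetical. It reproduces the duplex printed above for the gRNA oligos.

```python
# Minimal sketch (not QUEEN's implementation) of annealing two ssDNA oligos
# by their longest complementary stretch.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def longest_common_substring(a, b):
    """Return (length, end_in_a, end_in_b) of the longest common substring (dynamic programming)."""
    best = (0, 0, 0)
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                if curr[j] > best[0]:
                    best = (curr[j], i, j)
        prev = curr
    return best

def anneal(top, bottom):
    """Return the duplex as two aligned rows: top 5'->3' and bottom 3'->5'."""
    rc = reverse_complement(bottom)  # bottom strand expressed in top-strand orientation
    length, end_top, end_rc = longest_common_substring(top, rc)
    if length == 0:
        raise ValueError("no complementary region")
    offset = end_top - end_rc  # start of the bottom strand relative to the top strand
    left = min(0, offset)
    width = max(len(top), offset + len(rc)) - left
    top_row = ("-" * (0 - left) + top).ljust(width, "-")
    bottom_row = ("-" * (offset - left) + bottom[::-1]).ljust(width, "-")
    return top_row, bottom_row

for row in anneal("CACCGACCATTGTTCAATATCGTCC", "AAACGGACGATATTGAACAATGGTC"):
    print(row)
```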
Example code 19 demonstrates how to:
- Search for the ampicillin-resistance gene in pX330,
- Cut pX330 at the start and end positions of the ampicillin-resistance gene,
- Flip the ampicillin-resistance gene fragment, and
- Join it with the other fragment.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
site = plasmid.searchfeature(query="^AmpR$")[0]
fragments = cutdna(plasmid, site.start, site.end)
#fragments[0] holds the AmpR region; invert it before re-joining
fragments[0] = flipdna(fragments[0])
new_plasmid = joindna(*fragments, topology="circular")
plasmid.printfeature(plasmid.searchfeature(query="^AmpR$"))
new_plasmid.printfeature(new_plasmid.searchfeature(query="^AmpR$"))
Output
feature_id feature_type qualifiers:label start end strand
2300 CDS AmpR 6803 7664 +
feature_id feature_type qualifiers:label start end strand
2400 CDS AmpR 6803 7664 -
-
editsequence(input=QUEEN_object, source_sequence=regex or str, destination_sequence=str, start=int, end=int, strand=int)
Edit the sequence of a QUEEN_object by searching for target sequence fragments that match source_sequence and replacing each of them with destination_sequence. All DNAfeature_objects located on the edited sequence regions will be given the "qualifier:broken_feature" attribute. For any sequence edit that changes the sequence length of the QUEEN_object, the coordinates of all affected DNAfeature_objects will be adjusted. This is the parental function of searchsequence(). If destination_sequence is not provided, it works just like searchsequence().
- input: QUEEN_object
- source_sequence: regex or str (default: ".+")
  Source sequence(s) to be replaced. If the value is not provided, the entire QUEEN_object sequence will be replaced with destination_sequence. Fuzzy matching and regular expressions are supported. For details, see https://pypi.org/project/regex/. All IUPAC nucleotide symbols can also be used. Substrings of the regex value can be isolated by enclosing them in parentheses. Each pair of parentheses is indexed sequentially by numbers from left to right. Isolated substrings can be replaced at once by providing a destination_sequence in which each substring replacement is designated by referring to the index numbers. For details, see https://docs.python.org/3/library/re.html#re.sub
- destination_sequence: str (default: None)
  Destination sequence.
- start: int (zero-based indexing; default: 0)
  Start position of the target range of the QUEEN_object sequence to be searched for the replacement.
- end: int (zero-based indexing; default: the last sequence position of QUEEN_object)
  End position of the target range of the QUEEN_object sequence to be searched for the replacement.
- strand: int: 1 (top strand only), -1 (bottom strand only), or 2 (both strands) (default: 2)
  Sequence strand to be searched for the replacement.
Return: If destination_sequence is not provided, it acts as searchsequence() and returns a list of DNAfeature_objects. Otherwise, it returns a QUEEN_object.
-
An EGFP sequence insertion into the EcoRI site, demonstrated in Example code 17, can be described with simpler code using editsequence().
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
EGFP = QUEEN(record="input/EGFP.fasta")
pEGFP = editsequence(plasmid, "({})".format(cutsite.lib["EcoRI"].seq), r"\1{}\1".format(EGFP.seq))
print(plasmid)
print(pEGFP)
Output
<queen.QUEEN object; project='pX330', length='8484 bp', topology='circular'>
<queen.QUEEN object; project='pX330', length='9267 bp', topology='linear'>
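The replacement pattern r"\1{}\1" captures the EcoRI site as group 1 and re-emits it on both sides of the insert. Each match therefore grows the sequence by len(insert) + len(site). The earlier outputs suggest the unmodified EGFP sequence is 777 bp (789 bp after two 6-bp EcoRI sites were added). If so, this matches 8484 + 777 + 6 = 9267 bp.

The sketch below applies the same substitution to a toy sequence with Python's re module. It searches the top strand only. QUEEN additionally supports fuzzy matching via the regex package and both-strand searches. The helper name is hypothetical.

```python
import re

# Sketch of the substitution used above, on a toy sequence (plain re, top strand only).
ECORI = "GAATTC"

def insert_between_duplicated_site(seq, site, insert):
    # Group 1 captures the recognition site; \g<1> re-emits it on both sides of the insert.
    # (\g<1> avoids ambiguity if the text after the backreference starts with a digit.)
    pattern = "({})".format(re.escape(site))
    return re.sub(pattern, r"\g<1>" + insert + r"\g<1>", seq)

backbone = "AAAA" + ECORI + "TTTT"
edited = insert_between_duplicated_site(backbone, ECORI, "CCCGGG")
print(edited)
print(len(edited) - len(backbone))  # len(insert) + len(site) = 12
```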
-
editfeature(input=QUEEN_object, key_attribute=str, query=regex or str, source=list of DNAfeature_objects, start=int, end=int, strand=int, target_attribute=str, operation=function, quine=bool, new_copy=bool)
Search for DNAfeature_objects holding a query value in a designated key_attribute, and edit a target_attribute of the same DNAfeature_objects with one of three operations: removeattribute, replaceattribute, or createattribute. This is the parental function of searchfeature(). If target_attribute is not provided, it works just like searchfeature().
- input: QUEEN_object
- key_attribute: str (default: "all")
  Attribute type to be searched (feature_id, feature_type, "qualifier:*", or sequence). If the value is not provided, the search is applied to all of the attributes in the QUEEN_object, excluding sequence. However, if the query value consists only of the four nucleotide letters (A, T, G, and C), this value will automatically be set to sequence.
- query: regex or str (default: ".+")
  Query term. DNAfeature_objects that have a value matching the query value for the key_attribute designated above will be subjected to the edit. Fuzzy matching and regular expressions are supported. For details, see https://pypi.org/project/regex/. If the key_attribute is sequence, all IUPAC nucleotide symbols can be used.
- source: list of DNAfeature_objects (default: QUEEN_object.dnafeatures)
  Source DNAfeature_objects to be searched for the editing. DNAfeature_objects outside the search range defined by start, end, and strand will be removed from the source. Any DNAfeature_objects can be provided here. For example, a list of DNAfeature_objects returned from a searchsequence() or searchfeature() operation can be used as the source.
- start: int (zero-based indexing; default: 0)
  Start position of the target range of the QUEEN_object sequence for the editing.
- end: int (zero-based indexing; default: the last sequence position of QUEEN_object)
  End position of the target range of the QUEEN_object sequence for the editing.
- strand: int: 1 (top strand only), -1 (bottom strand only), or 2 (both strands) (default: 2)
  Sequence strand to be searched.
- target_attribute: str (default: None)
  Attribute type of the target DNAfeature_objects to be edited (feature_id, feature_type, "qualifier:*", strand, start, end, or sequence). If the value is not provided, this function works just like searchfeature().
- operation: removeattribute(), createattribute(value="str"), or replaceattribute(source_value=regex or str, destination_value=str or int) (default: None)
  If the operation is not specified, this function works just like searchfeature().
  removeattribute(): Removes target_attribute from the target DNAfeature_objects. This operation is available only for feature_id or "qualifier:*". If target_attribute is feature_id, the entire DNAfeature_objects will be erased from the QUEEN_object.
  createattribute(value="str"): Creates or overwrites the target_attribute of the target DNAfeature_objects with "str". If target_attribute is feature_id and there is no existing DNAfeature_object with the feature_id "str", a new DNAfeature_object will be created in QUEEN_object.dnafeatures. If the search identifies multiple DNAfeature_objects to be created, the feature_id of each new DNAfeature_object is generated as "str-number", where the numbers follow the order in which the objects were found. If a DNAfeature_object with the feature_id "str" already exists in the operating QUEEN_object.dnafeatures, the new DNAfeature_object will be generated with feature_id="str-number". If target_attribute is "qualifier:*", a qualifier whose value is "str" will be added to the .qualifiers of the target DNAfeature_object, as long as it does not overlap with the existing .qualifiers.
  replaceattribute(source_value=regex or str, destination_value=str or int): Searches for substrings in the values of the target_attribute of the target DNAfeature_objects that match source_value, and replaces them with destination_value. As with editsequence(), substrings of the regex value can be isolated by enclosing them in parentheses. Each pair of parentheses is indexed sequentially by numbers from left to right. Isolated substrings can be replaced at once by providing a destination_value in which each substring replacement is designated by referring to the index numbers. For details, see https://docs.python.org/3/library/re.html#re.sub. If the target_attribute is sequence, the sequences corresponding to the target DNAfeature_objects can be modified as with editsequence(). When source_value is not provided, the entire value will be replaced with destination_value. If the target_attribute is feature_id, the replacement will be performed only when it does not conflict with an existing DNAfeature_object. If target_attribute is start, end, or strand, no source_value is required, and destination_value must be an int.
- new_copy: bool (default: True)
  If True, the QUEEN_object will first be copied, and the copy will be edited. Otherwise, the original QUEEN_object will be edited directly (note that this mode does not record the operation into the building history).
Return: If operation or target_attribute is not specified, it acts as searchfeature() and returns a list of DNAfeature_objects. Otherwise, it returns a QUEEN_object if new_copy is True, or None if new_copy is False.
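-
The following example prepends "AAAAA" to the sequence of every top-strand CDS of pX330. It then prints the coordinates, the first 20 bases, and the label of each edited CDS.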
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
new_plasmid = editfeature(plasmid, key_attribute="feature_type", query="CDS",
strand=1, target_attribute="sequence", operation=replaceattribute(r"(.+)", r"AAAAA\1"))
for feat in new_plasmid.searchfeature(key_attribute="feature_type", query="CDS", strand=1):
print(feat.start, feat.end, new_plasmid.printsequence(feat.start, feat.start+20, strand=1), feat.qualifiers["label"][0], sep="\t")
Output
1231 1302 AAAAAGACTATAAGGACCAC 3xFLAG
1308 1334 AAAAACCAAAGAAGAAGCGG SV40 NLS
1358 5464 AAAAAGACAAGAAGTACAGC Cas9
5464 5517 AAAAAAAAAGGCCGGCGGCC nucleoplasmin NLS
6823 7689 AAAAAATGAGTATTCAACAT AmpR
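Two things happen in the edit above. The regex substitution rewrites each CDS sequence, and every coordinate downstream of an insertion is shifted. The sketch below reproduces both with plain Python. It is not QUEEN's implementation, and it assumes sorted, non-overlapping top-strand features. The input coordinates are the CDS positions of the unedited pX330, taken from the printfeature() output of the next example. The function shifts them to the start/end values printed above.

```python
import re

# 1) replaceattribute(r"(.+)", r"AAAAA\1") rewrites a value by regex substitution.
print(re.sub(r"(.+)", r"AAAAA\1", "ATGGAC"))

# 2) Prepending k bases to each feature shifts that feature's end and all downstream coordinates.
#    Hypothetical helper; assumes sorted, non-overlapping top-strand features.
def prepend_to_features(features, k):
    shifted, offset = [], 0
    for start, end in sorted(features):
        new_start = start + offset  # bases inserted by upstream edits
        offset += k                 # this feature gains k bases at its own start
        shifted.append((new_start, end + offset))
    return shifted

cds = [(1231, 1297), (1303, 1324), (1348, 5449), (5449, 5497), (6803, 7664)]
print(prepend_to_features(cds, 5))
```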
The following example changes the feature_type of every CDS feature to "gene".
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
new_plasmid = editfeature(plasmid, key_attribute="feature_type", query="CDS",
target_attribute="feature_type", operation=replaceattribute("gene"))
new_plasmid.printfeature()
Output
feature_id feature_type qualifier:label start end strand
1 source null 0 8484 +
100 promoter U6 promoter 0 241 +
200 primer_bind hU6-F 0 21 +
300 primer_bind LKO.1 5' 171 191 +
400 misc_RNA gRNA scaffold 267 343 +
500 enhancer CMV enhancer 439 725 +
600 intron hybrid intron 983 1211 +
700 regulatory null 1222 1232 +
800 gene 3xFLAG 1231 1297 +
900 gene SV40 NLS 1303 1324 +
1000 gene Cas9 1348 5449 +
1100 gene nucleoplasmin NLS 5449 5497 +
1200 primer_bind BGH-rev 5524 5542 -
1300 polyA_signal bGH poly(A) signal 5530 5738 +
1400 repeat_region AAV2 ITR 5746 5887 +
1500 repeat_region AAV2 ITR 5746 5876 +
1600 rep_origin f1 ori 5961 6417 +
1700 primer_bind F1ori-R 6048 6068 -
1800 primer_bind F1ori-F 6258 6280 +
1900 primer_bind pRS-marker 6433 6453 -
2000 primer_bind pGEX 3' 6552 6575 +
2100 primer_bind pBRforEco 6612 6631 -
2200 promoter AmpR promoter 6698 6803 +
2300 gene AmpR 6803 7664 +
2400 primer_bind Amp-R 7021 7041 -
2500 rep_origin ori 7834 8423 +
2600 primer_bind pBR322ori-F 8323 8343 +
The following example demonstrates how to:
- Search for all of the single-cutter restriction enzymes in pX330, using the library of restriction enzymes listed on the New England Biolabs website, and
- Add the single-cutter annotations to pX330.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
unique_cutters = []
#"enzyme" is used instead of "re" to avoid shadowing Python's re module
for key, enzyme in cutsite.lib.items():
    sites = plasmid.searchsequence(enzyme.cutsite)
    if len(sites) == 1:
        unique_cutters.append(sites[0])
new_plasmid = editfeature(plasmid, source=unique_cutters, target_attribute="feature_id", operation=createattribute("RE"))
new_plasmid = editfeature(new_plasmid, key_attribute="feature_id", query="RE", target_attribute="feature_type", operation=replaceattribute("misc_bind"))
features = new_plasmid.searchfeature(key_attribute="feature_type", query="misc_bind")
new_plasmid.printfeature(features, seq=True)
Output
RE-1 misc_bind Acc65I 433 439 + GGTACC
RE-2 misc_bind AgeI 1216 1222 + ACCGGT
RE-3 misc_bind ApaI 2700 2706 + GGGCCC
RE-4 misc_bind BglII 1595 1601 + AGATCT
RE-5 misc_bind BsaBI 4839 4849 + GATCACCATC
RE-6 misc_bind BseRI 1098 1104 - GAGGAG
RE-7 misc_bind BsmI 4979 4985 + GAATGC
RE-8 misc_bind CspCI 4127 4139 + CAAAGCACGTGG
RE-9 misc_bind EcoRI 5500 5506 + GAATTC
RE-10 misc_bind EcoRV 3196 3202 + GATATC
RE-11 misc_bind FseI 5472 5480 + GGCCGGCC
RE-12 misc_bind FspI 7365 7371 + TGCGCA
RE-13 misc_bind KasI 5887 5893 + GGCGCC
RE-16 misc_bind NotI 5738 5746 + GCGGCCGC
RE-17 misc_bind PaqCI 1184 1191 + CACCTGC
RE-19 misc_bind PmlI 4132 4138 + CACGTG
RE-20 misc_bind PsiI 6317 6323 + TTATAA
RE-22 misc_bind PvuI 7218 7224 + CGATCG
RE-23 misc_bind SacII 7522 7528 + CCGCGG
RE-24 misc_bind SbfI 5879 5887 + CCTGCAGG
RE-26 misc_bind SnaBI 698 704 + TACGTA
RE-27 misc_bind XbaI 427 433 + TCTAGA
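The single-cutter search can be sketched without QUEEN. Count each recognition site on both strands of the circular sequence, then keep the enzymes found exactly once. This is not QUEEN's implementation. It handles only plain A/C/G/T sites; degenerate IUPAC sites such as BsaBI (GATNNNNATC) would first need translating to a regex. The enzyme dictionary and sequence are hypothetical toy inputs.

```python
import re

# Sketch (not QUEEN's implementation): count sites on both strands of a circular sequence.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def count_sites(seq, site, circular=True):
    # For a circular molecule, append the first len(site)-1 bases so sites spanning the origin are found.
    target = seq + seq[:len(site) - 1] if circular else seq
    hits = set()
    for query in {site, reverse_complement(site)}:  # a palindromic site collapses to one query
        for m in re.finditer("(?={})".format(query), target):  # lookahead also finds overlapping hits
            if m.start() < len(seq):
                hits.add((m.start(), query))
    return len(hits)

def single_cutters(seq, enzymes, circular=True):
    return sorted(name for name, site in enzymes.items() if count_sites(seq, site, circular) == 1)

# Hypothetical toy inputs
enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "BsmI": "GAATGC"}
print(single_cutters("GAATTCAAAGGATCCTTTGGATCCAAA", enzymes))
```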
The functions described up to here can be progressively recorded into the QUEEN object being manipulated:
- QUEEN() for generating QUEEN objects,
- the search functions searchsequence() and searchfeature(),
- the operational functions cutdna(), cropdna(), modifyends(), flipdna(), and joindna(), and
- the super functions editsequence() and editfeature().
This record enables the quine() function described below to generate a quine code that replicates the same QUEEN object. From here on, we call these functions "quinable" functions.
In addition to the parameters and options described above, the quinable functions can commonly take five parameters. Three of them, process_name, process_description, and product, enable annotation and structured visualization of the construction process (see below). These three optional parameters do not affect the behavior of the quinable functions. From ver 1.1, two additional common parameters, quinable and supfeature, were added (supfeature is accepted only by the functions listed below).
- process_name (or pn): str (default: "")
  This option enables users to provide label names for process flow groups. An experimental flow composed of sequential operations by quinable functions can be grouped and labeled with a user-defined name by giving the same name to every quinable function operation in the group. Such group labels can be, for example, "PCR 1", "EcoRI digestion", "Gibson Assembly", etc. visualizeflow(), described below, takes the group information into account to generate experimental flow maps from QUEEN_objects.
- process_description (or pd): str (default: "")
  Similar to process_name, this option enables users to provide narrative descriptions of operations performed by quinable functions. This enables the generation of the whole "Materials and Methods" description for a DNA construction process, along with its DNA construction flow, from a QUEEN_object (or a QUEEN-generated GenBank file) using the quine() function described below.
- product: str (default: "")
  This option enables users to provide label names for the QUEEN_objects being produced. The provided labels are stored in QUEEN_object.project.
- supfeature: dict, list of dict, or list of list of dict
  This option is accepted only by QUEEN() and the basic operational functions cutdna(), cropdna(), modifyends(), flipdna(), and joindna(). A dict object is composed of key-value pairs of the attributes of a DNAfeature object. The DNAfeature object generated from the dictionary will be added to the .dnafeatures of the newly generated QUEEN object.
  When adding multiple DNAfeature objects, the value should be specified as a list of dict. However, for cutdna(), the value should be specified as a list of list of dict.
  The following attributes have default values. If they are not specified in a dict object, they are set to these defaults:
  feature_id: str (default: a random unique ID that is not used in the .dnafeatures of the QUEEN object)
  feature_type: str (default: "misc_feature")
  start: int (default: 0)
  end: int (default: length of the QUEEN_object sequence)
  strand: int (-1, 0, or 1; default: 1)
  The use of the supfeature parameter is demonstrated in "Example code 18".
- quinable: bool (True or False; default: True)
  If False, the operational process will not be recorded into the building history.
-
quine(input=QUEEN_object, output=str, process_description=bool, execution=bool)
Generate the "quine code" of a QUEEN_object, which produces the same QUEEN_object. A quine code can be executed as a Python script.
- input: QUEEN_object
- output: str (default: STDOUT)
  Output file name.
- process_description: bool (default: False)
  If True, this outputs the process_descriptions registered to quinable operations along the process flow, instead of generating the quine code. The output can be used for the "Materials and Methods" of the QUEEN_object construction process.
- execution: bool (default: False)
  If True, this reconstructs the QUEEN_object by generating and executing its quine code, and confirms whether the reconstructed QUEEN_object is identical to the original one. If execution is True and output is None, the quine code will be written to a temporary file instead of STDOUT; the temporary file will be removed after the operation. The execution does not happen if process_description is True.
Return: If execution is False, None. If execution is True, True if the reconstructed QUEEN_object is identical to the original one; otherwise, False.
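The record, regenerate, execute, compare cycle behind quine(..., execution=True) can be illustrated with a toy model. Each operation is stored with its arguments. Python source is regenerated from the record and executed in a fresh namespace, and the rebuilt object is compared with the original. Everything here (History, Ref, load, crop, flip) is hypothetical and only mirrors the idea. It is not how QUEEN serializes its history.

```python
# Toy illustration (hypothetical, not QUEEN's implementation) of record -> regenerate -> execute -> compare.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def load(seq):
    return seq

def crop(seq, start, end):
    return seq[start:end]

def flip(seq):
    return seq.translate(COMPLEMENT)[::-1]

class Ref(str):
    """Marks an argument as the name of a previously produced object."""

class History:
    def __init__(self):
        self.steps = []    # (product name, function name, arguments)
        self.objects = {}

    def run(self, product, func, *args):
        resolved = [self.objects[a] if isinstance(a, Ref) else a for a in args]
        self.objects[product] = func(*resolved)
        self.steps.append((product, func.__name__, args))
        return self.objects[product]

    def quine(self):
        lines = []
        for product, name, args in self.steps:
            rendered = ", ".join(str(a) if isinstance(a, Ref) else repr(a) for a in args)
            lines.append("{} = {}({})".format(product, name, rendered))
        return "\n".join(lines)

history = History()
history.run("plasmid", load, "AAAGAATTCCCC")
history.run("fragment", crop, Ref("plasmid"), 2, 8)
history.run("flipped", flip, Ref("fragment"))

code = history.quine()
namespace = {"load": load, "crop": crop, "flip": flip}
exec(code, namespace)  # analogous to executing the generated quine code
print(code)
print(namespace["flipped"] == history.objects["flipped"])
```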
The Target-AID plasmid (pCMV-Target-AID) was constructed by assembling two fragments encoding the N- and C-terminal halves of Target-AID with a backbone fragment. Both Target-AID fragments were amplified from pcDNA3.1_pCMV-nCas-PmCDA1-ugi pH1-gRNA(HPRT) (Addgene 79620), using primer pairs RS045/HM129 and HM128/RS046, respectively. The backbone fragment was amplified from pCMV-ABE7.10 using RS047/RS048. The construction process was simulated using quinable functions, and the GenBank file was generated. The quine code generated from the GenBank file by quine() successfully reconstructed the same GenBank file. The Python scripts for the following Example codes 24-27 can be found in "./demo/tutorial_ex24-28.ipynb".
(Expected runtime: less than 1 sec.)
Source code
from QUEEN.queen import *
︙(omitted)
pCMV_Target_AID = QUEEN(record="./output/pCMV-Target-AID.gbk")
quine(pCMV_Target_AID, output="./output/pCMV-Target-AID_clone.py")
Shell commands
%python3 ./output/pCMV-Target-AID_clone.py > ./output/clone_pCMV-Target-AID.gbk
%diff -s ./output/clone_pCMV-Target-AID.gbk ./output/pCMV-Target-AID.gbk
Output
Files ./output/clone_pCMV-Target-AID.gbk and ./output/pCMV-Target-AID.gbk are identical.
If a QUEEN_object is loaded from a QUEEN-generated GenBank file for a new DNA construction, the quine code of the original QUEEN_object will be inherited by the newly produced QUEEN_object. The following example demonstrates this inheritance. A QUEEN_object representing a DNA fragment cropped from the pCMV-Target-AID QUEEN_object holds not only the process history of the cropping but also the whole previous construction process of pCMV-Target-AID.
(Expected runtime: less than 1 sec.)
Source code (continued from the previous code)
description = "Extract a fragment spanning from 8,000 nt to 2,000 nt of pCMV-Target-AID"
fragment = cropdna(pCMV_Target_AID, 8000, 2000, product="fragment", process_description=description)
quine(fragment)
Output (quine code generated from the "fragment" product)
︙(omitted)
description5 = 'Extract a fragment spanning from 8,000 nt to 2,000 nt of pCMV-Target-AID'
cropdna(QUEEN.dna_dict['pCMV-BE4max_8'], start='8000/8000', end='2000/2000', project='pCMV-BE4max', product='fragment', process_description=description5)
QUEEN() provides the option import_history=False, which disables the inheritance of the operational process histories of previously generated QUEEN_objects into a newly produced QUEEN_object.
quine() provides each quinable process in a quine code with a unique process identifier in the process_id option, like "process_id=QUEEN_object.project-XXXXXXXXXXXXXXXXXXXXXXXX". Here the "X"s represent the md5() transformation of the quinable operation, excluding process_id and original_ids (described below). This process_id serves as a checksum to detect whether any modification has been made to the operation code.
When a new QUEEN script is created by editing a quine code generated from an existing QUEEN_object, or by combining process parts from multiple quine codes, the newly generated QUEEN_object will hold these previous process_ids. When a new quine code is generated from the new QUEEN_object, these process_ids are passed to the original_ids list of the corresponding new operation. Hence, the editing histories of quine codes and their inheritance can also be tracked and stored in QUEEN_objects.
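The checksum idea can be sketched with hashlib. Strip process_id and original_ids from an operation, hash what remains, and the hash changes only when the operation itself changes. QUEEN's own IDs use a different encoding (see the process_id values in the quine code of this section), and the exact string QUEEN hashes is not documented here. This sketch only illustrates the principle, and its helper names are hypothetical.

```python
import hashlib
import re

# Sketch of the checksum principle (not QUEEN's exact encoding).
def strip_ids(call):
    call = re.sub(r",\s*process_id='[^']*'", "", call)
    call = re.sub(r",\s*original_ids=\[[^\]]*\]", "", call)
    return call

def process_checksum(call):
    return hashlib.md5(strip_ids(call).encode("utf-8")).hexdigest()

original = "flipdna(fragments[0], product='fragments0_rc', process_id='new_plasmid-30A7468VURMAE1DIOMS3A7JJP324', original_ids=[])"
relabeled = original.replace("30A7468VURMAE1DIOMS3A7JJP324", "SOMETHINGELSE")
edited = original.replace("fragments0_rc", "flipped")
print(process_checksum(original) == process_checksum(relabeled))  # True: IDs are excluded
print(process_checksum(original) == process_checksum(edited))     # False: the operation changed
```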
By default, QUEEN cannot track and record the user-defined variable names of QUEEN_objects used in the original code. Therefore, the .project value of each QUEEN_object is used as its variable name when a quine code is generated. To generate a quine code with user-defined variable names for each operational step, the Python command set_namespace(globals()) needs to be executed before the QUEEN_objects are generated. This allows the variable names of the objects being produced to be passed to their operational functions as arguments (via product), so the variable names can be recovered in quine codes.
For example, Example code 19 can be written in this format as follows.
(Expected runtime: less than 1 sec.)
Original code
import sys
from QUEEN.queen import *
set_namespace(globals())
QUEEN(record="input/pX330.gbk", product="plasmid")
plasmid.searchfeature(query="^AmpR$", product="sites")
cutdna(plasmid, sites[0].start, sites[0].end, product="fragments")
flipdna(fragments[0], product="fragments0_rc")
joindna(fragments0_rc, fragments[1], topology="circular", product="new_plasmid")
Quine code
import sys
sys.path.append("/content/colab")
from QUEEN.queen import *
from QUEEN import cutsite as cs
set_namespace(globals())
QUEEN(record='input/pX330.gbk', product='plasmid', process_id='new_plasmid-9WZX2KEVGD9NVBR4DSWNBCSND320', original_ids=[])
plasmid.searchfeature(key_attribute='all', query='^AmpR$', product='sites', process_id='new_plasmid-52D5THPBJ8G961IHBTPEGI4JH321', original_ids=[])
cutdna(plasmid, sites[0].start, sites[0].end, product='fragments', process_id='new_plasmid-2XUXT7UUAIIY5UFUMC8TBPOHC322', original_ids=[])
flipdna(fragments[0], product='fragments0_rc', process_id='new_plasmid-30A7468VURMAE1DIOMS3A7JJP324', original_ids=[])
joindna(*[fragments0_rc, fragments[1]], topology='circular', product='new_plasmid', process_id='new_plasmid-649L08K2L92IMQ1SY8NDKX76B325', original_ids=[])
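Why does passing globals() let product="..." create a variable? A function that holds a reference to the caller's module namespace can bind its result under any name. The following is a hypothetical illustration of that mechanism. use_namespace and crop are not QUEEN functions, and QUEEN's internals may differ.

```python
# Hypothetical illustration (not QUEEN's code) of binding results via a stored globals() dict.
_namespace = None

def use_namespace(namespace):
    global _namespace
    _namespace = namespace

def crop(seq, start, end, product=None):
    result = seq[start:end]
    if product is not None and _namespace is not None:
        _namespace[product] = result  # creates a variable named `product` in the caller's module
    return result

use_namespace(globals())
crop("ACGTACGT", 2, 6, product="fragment")
print(fragment)  # defined at runtime by crop(), not by an assignment statement
```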
QUEEN provides the following visualization functions.
-
visualizemap(input=QUEEN_object, map_view=str, feature_list=list, start=int, end=int, width_scale=float, height_scale=float, label_location=str, linebreak=int, seq=bool, diameter=float)
Generate an annotated sequence map of a QUEEN_object with selected DNAfeature_objects. Each feature annotation label is retrieved from the "qualifier:label" attribute. The locations of feature annotations and their labels are automatically adjusted to prevent overlaps on the sequence map. The face color and edge color of each feature annotation are also automatically assigned from the default colormap. However, they can be set with the "qualifier:edgecolor_queen" and "qualifier:facecolor_queen" attributes of DNAfeature_objects.
- input: QUEEN_object
- map_view: str ("linear" or "circular"; default: "linear")
  Visualization style.
- feature_list: list of DNAfeature_objects (default: QUEEN_object.dnafeatures, excluding those with the feature type "source")
  DNAfeature_objects to be displayed on the sequence map.
- fontsize: int (default: 12 for "circular" maps and 10 for "linear" maps)
  Common font size. Separate font sizes can also be defined for different DNAfeature_objects by editing the "qualifier:fontsize_queen" attribute, which overrides the common font size.
- labelcolor: str (default: "black")
  Common font color for all feature labels. Separate font colors can also be defined for different DNAfeature_objects by editing the "qualifier:labelcolor_queen" attribute, which overrides the common font color.
- display_label: 0, 1, or 2 (default: 2)
  If 2, all of the labels will be displayed. If 1, only the feature labels that fit inside the object boxes will be displayed. If 0, feature labels will not be displayed.
- tick_interval: int (default: None)
  Tick interval of the sequence map (base pairs).
- display_axis: bool (default: True)
- title: str (default: QUEEN_object.project)
- start: int (zero-based indexing; default: 0)
  Start position of the QUEEN_object sequence to be displayed.
- end: int (zero-based indexing; default: the last sequence position of QUEEN_object)
  End position of the QUEEN_object sequence to be displayed.
- width_scale: float (default: see the following description)
  Scaling factor for the width of the sequence map. The default value is 1.0 if the DNA length is > 4000, 4.0 if > 1000, 10 if > 500, and 20 otherwise. However, if seq is True, the value is 40.
- height_scale: float (default: 1.0)
  Scaling factor for the height of the sequence map.
- label_location: str (default: "either" when seq is False, otherwise "top")
  Feature label locations. Each feature label is generally placed inside its object box. However, if a feature label is larger than the object box, the label will be placed outside. If this value is "either", labels will be placed below or above the object boxes, whichever is available. If this value is "top", labels will be placed above the object boxes. If seq is True, the value must be set to "top".
- linebreak: int or None (default: length of the QUEEN_object sequence)
  Sequence length for line breaks.
- seq: bool (default: False)
  When True, a color map representing the QUEEN_object sequence will be displayed below the sequence map.
- rcseq: bool (default: False)
  When True, a color map representing the reverse complement sequence of the QUEEN_object will be displayed below the sequence map.
- diameter_scale: float
  Scaling factor for the diameter of the sequence map.
Return: a patchworklib.Bricks object if patchworklib is installed; otherwise, a matplotlib.pyplot.figure object.
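The default width_scale rule described above is easier to check when written as a function. This is a transcription of the documented rule for reference. The function name is hypothetical and is not part of QUEEN's API.

```python
# Transcription of the documented default width_scale rule (hypothetical function name).
def default_width_scale(dna_length, seq=False):
    if seq:
        return 40
    if dna_length > 4000:
        return 1.0
    if dna_length > 1000:
        return 4.0
    if dna_length > 500:
        return 10
    return 20

for length in (8484, 2000, 789, 120):
    print(length, default_width_scale(length))
```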
(Expected runtime: less than a few minutes.)
Source code (continued from the example code 24)
fig_a = visualizemap(fragment1, title="fragment-1")
fig_b = visualizemap(fragment2, title="fragment-2")
fig_c = visualizemap(fragment3, linebreak=120, seq=True, title="fragment-3")
#Select all features whose feature_type does not contain "primer"
features = pCMV_Target_AID.searchfeature(key_attribute="feature_type", query="^(?!.*primer).*$")
fig_d = visualizemap(pCMV_Target_AID, feature_list=features, map_view="circular", tick_interval=1000, title="pCMV-Target-AID")
Output figures
-
visualizeflow(*input=list of QUEEN_objects, search_function=bool, grouping=bool, inherited_process=bool, process_description=bool, alias_dict=dict)
Generate flow charts representing the construction processes of QUEEN_objects. The charts use four types of nodes:
- file-shaped nodes for input GenBank files,
- round nodes for QUEEN_objects,
- uncolored rectangle nodes for DNAfeature_objects, and
- colored box nodes for quinable functions.
- input: list of QUEEN_objects
- search_function (or sf): bool (default: True)
  If True, the generated flow charts will display all quinable processes involved in producing the QUEEN_objects. Otherwise, operations by the search functions will be omitted from the visualization.
- grouping: bool (default: True)
  If True, operations that have the same process_name will be grouped in a parental box.
- inherited_process (or ip): bool (default: False)
  If True, the construction processes of previous QUEEN_objects inherited in the present QUEEN_object construction will also be displayed.
- process_description (or pd): bool (default: False)
  If True, both process_names and process_descriptions will be displayed at the top left of the operational object box nodes. If False, only process_names will be displayed at the top center of the operational object box nodes. However, if grouping is False, neither of them will be displayed.
- alias_dict: dict (default: None)
  Alias name dictionary for QUEEN_object.project names. The alias names will be displayed instead of QUEEN_object.project.
Return: graphviz.dot.Digraph object
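Graphviz is the rendering backend behind the returned Digraph, and the grouping option maps naturally onto Graphviz clusters. The sketch below emits DOT text from a list of recorded steps. It is not QUEEN's implementation. The step tuples are hypothetical, and the shapes are one possible reading of the node conventions above: Graphviz "note" for files, ellipses for objects, and filled boxes for operations. Subgraphs whose names start with "cluster" are drawn as a box around their members.

```python
# Sketch (not QUEEN's implementation): emit Graphviz DOT for a recorded construction flow.
def flow_to_dot(steps, grouping=True):
    """steps: list of (process_name, function_name, input_names, product_name)."""
    node_lines, edge_lines, clusters = [], [], {}
    for i, (pname, func, inputs, product) in enumerate(steps):
        op = "op{}".format(i)
        op_node = '{} [label="{}", shape=box, style=filled, fillcolor=lightblue];'.format(op, func)
        if grouping and pname:
            clusters.setdefault(pname, []).append(op_node)  # grouped by process_name
        else:
            node_lines.append(op_node)
        for src in inputs:
            shape = "note" if src.endswith((".gbk", ".fasta")) else "ellipse"
            node_lines.append('"{}" [shape={}];'.format(src, shape))
            edge_lines.append('"{}" -> {};'.format(src, op))
        node_lines.append('"{}" [shape=ellipse];'.format(product))
        edge_lines.append('{} -> "{}";'.format(op, product))
    lines = ["digraph flow {"]
    for k, (pname, members) in enumerate(clusters.items()):
        lines.append('  subgraph cluster_{} {{ label="{}";'.format(k, pname))
        lines.extend("    " + m for m in members)
        lines.append("  }")
    lines.extend("  " + line for line in node_lines + edge_lines)
    lines.append("}")
    return "\n".join(lines)

# Hypothetical record of Example code 19
steps = [
    ("", "QUEEN", ["pX330.gbk"], "plasmid"),
    ("AmpR flip", "cutdna", ["plasmid"], "fragments"),
    ("AmpR flip", "flipdna", ["fragments"], "fragments0_rc"),
    ("AmpR flip", "joindna", ["fragments0_rc", "fragments"], "new_plasmid"),
]
print(flow_to_dot(steps))
```

The printed text can be rendered with the dot command-line tool or with the graphviz Python package.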
(Expected runtime: less than 1 sec.)
Source code (continued from the example code 24)
graph_a = visualizeflow(pCMV_Target_AID, sf=False,ip=True, grouping=False)
graph_b = visualizeflow(pCMV_Target_AID, sf=True, ip=True, grouping=False)
graph_c = visualizeflow(pCMV_Target_AID, sf=True, ip=True, grouping=True, pd=True)
graph_d = visualizeflow(pCMV_Target_AID) #default setting
graph_a.render("pCMV-Target-AID_flow1")
graph_b.render("pCMV-Target-AID_flow2")
graph_c.render("pCMV-Target-AID_flow3")
graph_d.render("pCMV-Target-AID_flow4")
Output figures