Objects whose values can be changed during the course of program execution are called mutable. Objects whose value is unchangeable after they've been created are immutable.
Each object in Python has 3 characteristics:
- Type
- value
- identity
Most objects has either data or methods associated with them, they are called attributes. Instance is one occurrence of an object.
Sample module interaction:
import math
from math import pi
math.sqrt(math.pi / 2)
import math
import numpy as np
math.sqrt(2)
np.sqrt([2,3,4])
math and np are different Namespaces.
name = "Amy"
type(name)
dir(name) # to get all available methods
dir(str) # can use a type
help(name.upper)
Three numerical types are available in Python - int, float, complex. Int
have unlimited precision.
import random
random.choice(['aa','bb','cc'])
OR
, AND
and NOT
are the only bool operatinos.
There are total of 8 comparison operations.
2 < 4 # True
2 <= 2 # True
2 == 2 # True
3 != 3 # False
[2,3] == [3,2] # False
[2,3] is [2,3] # False
[2,3] is not [2,3] # True
2.0 == 2 # True. 2nd operand will be converted to float => 2.0
There are three basic Sequences: list
, tuple
, range
.
s = (1,2,3,4,5,6,7,8)
s[0] # = 1
s[1] # = 2
s[-1] # = 8, very last objects
Sequence supports slising.
x = s[0:2] # x is (1,2)
y = s[:5] # y is (1,2,3,4,5)
z = s[5:] # z is (4,5,6,7,8)
Lists are mutable Sequences.
Tuples are immutable Sequences.
x = 11.23
y = 12.31
coordinate = (x,y) # <-- this is called tuple packing
type(coordinate) # returns 'tuple'
(c1, c2) = coordinate # <-- and this is tuple unpacking
coordinates = [(0,0),(1,1),(2,4),(3,9),(4,16),(5,25),(6,36),(7,49),(8,64)]
for (x,y) in coordinates:
print(x,y)
Ranges are immutable Sequences often used in for
loops.
range(5) # creates range(0,5)
list(range(5)) # returns a list [0, 1, 2, 3, 4]
list(range(1,6)) # returns a list [1, 2, 3, 4, 5]
list(range(1,13,2)) # returns a list [1, 3, 5, 7, 9, 11]
Using ranges saves a lot of memory.
s = "abc"
3*s # creates s + s +s string --> "abcabcabc"
Sets are unordered collections of distinct hashable objects.
There is a set
, and then there is a frozen set
(yes, it is immutable).
Sets are especially useful for keeping track of distinct objects.
ids = set(list(range(1,10)))
ids # prints {1, 2, 3, 4, 5, 6, 7, 8, 9}
ids.pop() # pops "1" from the set
ids.add(10) # adds "10" object to the set
males = set([1, 3, 4, 6])
females = ids = males
females # prints {2, 5, 7, 8, 9, 10}
everyone = males | females # "|" means UNION for Sets
everyone & set([-1, -2, 3, 5, 11]) # returns {3, 5}
word = "antidisestablcihmentarianism"
letters = set(word)
len(letters) # gives us a number of unique letters in a string
Key - Value pairs, where key
is immutable, and value
is anything.
Dictionaries can be used to perform very fast lookups on unordered data.
age = {} # empty Dictionary
age = dict() # same effect
age = {"Tim": 29, "Tom": 31, "Ally": 18}
age["Ally"] # prints 18
age["Tim"] += 1
names = age.keys
type(names) # returns "dict_keys"
Type checking is performed during runtime.
x = 3
# 1) Python creates an object '3'
# 2) Variable named 'x' created
# 3) Reference from 'x' to '3' created
How dynamic typing works with immutable objects:
x = 3 # immutable object '3' created, 'x' created and linked to '3'
y = x # 'y' created and linked to '3', as variable name cannot reference other name
y = y - 1 # '2' (as 3-1 = 2) created and 'y' is linked to '2'
print(x) # '3' is printed
print(y) # '2' is printed
How dynamic typing works with mutable objects:
L1 = [2, 3, 4]
L2 = L1
L1[0] = 24
print(L1) # [24, 3, 4]
print(L2) # [24, 3, 4]
L2 is L1 # True
Scope Rules application acronym: L ocal E nclosing Function G lobal B uilt-in
def update():
x.append(1)
x = [1, 1]
update()
x # [1, 1, 1]
This is so called side effect, generally you should avoid this style of programming. It makes errors difficult to find.
def update(n, x):
n = 2
x.append(4)
print('update: ', n, x)
def main():
n = 1
x = [0, 1, 2, 3]
print('main: ', n, x)
update(n, x)
print('main: ', n, x)
main()
# main: 1, [1, 2, 3]
# update: 2, [1, 2, 3, 4]
# main: 1, [1, 2, 3, 4]
Inheritance
class MyList(list):
def remove_min(self):
self.remove(min(self))
def remove_max(self):
self.remove(max(self))
x = [10, 3, 5, 1, 2, 7, 6, 4, 8]
y = MyList(x)
y.remove_min() # y is [10, 3, 5, 2, 7, 6, 4, 8]
y.remove_max() # y is [3, 5, 2, 7, 6, 4, 8]
NumPy arrays used for representing vectors and matrices. In oppose to standard Python arrays they have size that is fixed on creation.
import numpy as np
zero_vector = np.zeros(5)
zero_matrix = np.zeros((5,3))
x = np.array([1, 2, 3])
y = np.array([2, 4, 6])
Z = np.array([[1, 3, 5], [5, 9, 4]])
Z.transpose()
import numpy as np
x = np.array([1, 2, 3])
y = np.array([2, 4, 6])
X = np.array([[1, 2, 3], [4, 5, 6]])
X = np.array([[2, 4, 6], [8, 10, 12]])
x[0:2] # array([1, 2])
z = x + y # z is [3, 6, 9]
X[:, 1] # array([2, 5])
X[:, 1] + Y[:, 1] # array([6, 15])
X[1, :] # array([4, 5, 6])
X[1, :] + Y[1, :] # array([12, 15, 18])
X[1] # array([4, 5, 6])
NumPy arrays can also be indexed with other arrays or other sequence-like objects.
import numpy as np
z1 = np.array([1, 3, 5, 7, 9])
z2 = z1 + 1 # array([2, 4, 6, 8, 10])
ind = [0, 2, 3]
z1[ind] # array([1, 5, 7])
ind = np.array([0, 2, 3])
z1[ind] # SAME RESULT: array([1, 5, 7])
# Boolean OR logical array can be used:
z1 > 6 # array([False, False, False, True, True], dtype = bool)
z1[z1 > 6] # array([7, 9])
z2[z1 > 6] # array([8, 10])
When you slice an array using the colon
operator, you get a view of an object.
If you modify it the original array is also modified. This is in contrast with what happens when you index an array,
in which case what is returned to you is a copy of the original data.
import numpy as np
z1 = np.array([1, 3, 5, 7, 9])
w = z1[0:3] # w is array([1, 3, 5])
w[0] = 3 # w is array([3, 3, 5])
z1 # array([3, 3, 5, 7, 9])
z1 = np.array([1, 3, 5, 7, 9])
ind = [0, 1, 2]
w = z1[ind] # array([1 , 3, 5])
w[0] = 3 # w is array([3, 3, 5])
z1 # array([1, 3, 5, 7, 9])
import numpy as np
np.linspace(start = 0, stop = 100, num = 10) # array([0., 11,11111111, ..., 88,88888889, 100.])
np.logspace(start = 1, stop = 2, num = 10) # start = log(10) = 1, stop = log(100) = 2, array consisting of 10 elements where the first element is 10 and the last element is 100.
np.logspace(np.log10(250), np.log10(500), 10) # array([250., ..., 500.]) of 10 elements uniformly spaced in log10 space
X = np.array([[1, 2, 3], [4, 5, 6]])
X.shape # (2,3)
X.size # 6
# shape and size are DATA AATRIBUTES, not methods
x = np.random.random(10) # NumPy has it sown RANDOM Module
np.any(x > 0.9) # True
np.all(x >= 0.1) # True
import matplotlib.pyplot as plt
plt.plot([i**2 for i in range(10)])
plt.show()
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 20)
y1 = x**2
y2 = x**1.5
plt.plot(x, y1, 'bo-', linewidth = 2, markersize = 12, label = 'First')
plt.plot(x, y2, 'gs-', linewidth = 2, markersize = 12, label = 'Second')
plt.xlabel('X')
plt.ylabel('Y')
plt.axis(-0.5, 10.5, -5, 105)
plt.legeng(loc = 'upper left')
plt.savefig('myPlot.pdf')
import numpy as np
import random
random.choice(['H', 'T'])
random.choice([0, 1]) # H & T replaced with Numbers
random.choice([i for i in range(1,7)])
random.choice(range(1,7))
random.choice(random.choice([range(1,7), range(1,9), range(1,11)]))
# Draw a histogram of 10000 dice rolls
rolls = [random.choice(range(1,7)) for i in range(10000)]
plt.hist(rolls, bins=np.linspace(.5,6.5,7))
plt.show()
# Draw a histogram of 10000 rolls of 10 dices
Y = [sum([random.choice(range(1,7)) for i in range(10)]) for j in range(1000000)]
plt.hist(Y, bins=np.linspace(9.5,60.5,51))
plt.show()
import numpy as np
np.random.random((5,3)) # returns 5x3 array of random number between 0 and 1
x = np.random.randint(1,7, (100,10))
y = np.sum(x, axis=1)
plt.hist(y)
plt.show()
import time
start_time = time.clock()
end_time = time.clock()
print(end_time - start_time)
import numpy as np
import matplotlib.pyplot as plt
x_0 = np.array([[0], [0]])
delta_x = np.random.normal(0,1,(2,100))
x = np.concatenate((x_0, np.cumsum(delta_x, axis = 1)), axis = 1)
plt.plot(x[0], x[1], "ro-", markersize = 4)
plt.show()
We have 4 tasks to accomplish the proper DNA to amino acid translation:
- Manually download DNA and protein sequence data
- Import the DNA data into python
- Create an algo to translate the DNA
- Check if translation matches the download
- Go to NCBI - public repo of DNA
- Search for Nucleotide NM_207618.2
- get FASTA data
- Save to dna.txt
- Get CDS translation sequence
inputfile = "dna.txt"
f = open(inputfile, 'r')
seq = f.read()
seq = seq.perlace('\n','')
seq = seq.perlace('\r','')
def translate(seq):
'''
Translate a string of Nucleotide sequence into a string of corresponding sequence of amino acids
'''
table = {
'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'_', 'TAG':'_',
'TGC':'C', 'TGT':'C', 'TGA':'_', 'TGG':'W',
}
protein = ''
if len(seq) % 3 == 0:
# loop over the sequence
for i in range(0, len(seq), 3):
# extract the single codon (triplet)
codon = seq[i: i+3]
# look up it in the table and store the result
protein += table[codon]
return protein
def read_seq(inputfile):
with open(inputfile, 'r') as f:
seq = f.read()
seq = seq.replace('\n','')
seq = seq.replace('\r','')
return seq
text = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. Praesent ac elit non tellus fringilla ullamcorper fermentum sed libero. Aenean porta nisi ut consectetur imperdiet. Etiam posuere pulvinar luctus. Nulla facilisi. In neque ligula, convallis vitae ex sit amet, blandit fringilla libero. Vestibulum ante ipsum primis in faucibus orci luctus et.'
def count_words(text):
'''
Count the number of times each word occurs in text (str). Return dictionary
where keys are unique words and values are word counts. Skip punctuation.
'''
text = text.lower()
skips = [',', '.', ':', ';', '\'', '"']
for ch in skips:
text = text.replace(ch, '')
words = {}
for word in text.split(' '):
if word in words:
words[word] += 1
else:
words[word] = 1
return words
words = count_words(text)
# FAST implementation with collections.Counter
from collections import Counter
def count_words_fast(text):
'''
FAST Count (with collections.Counter) the number of times each word occurs in text (str). Return dictionary
where keys are unique words and values are word counts. Skip punctuation.
'''
text = text.lower()
skips = [',', '.', ':', ';', '\'', '"']
for ch in skips:
text = text.replace(ch, '')
words = Counter(text.split(' '))
return words
def read_book(title_path):
'''
Read a book and return it as a string.
'''
with open(title_path, 'r', encoding='utf8') as current_file:
text = current_file.read()
text = text.replace('\n', '').replace('\r', '')
return text