
jinlei-playground

Program daily, do something fun, and link to some interesting people.

Contents

forked from gto76/python-cheatsheet

    1. Collections: List, Dictionary, Set, Tuple, Range, Enumerate, Iterator, Generator.
    2. Types: Type, String, Regular_Exp, Format, Numbers, Combinatorics, Datetime.
    3. Syntax: Args, Inline, Closure, Decorator, Class, Duck_Type, Enum, Exception.
    4. System: Exit, Print, Input, Command_Line_Arguments, Open, Path, OS_Commands.
    5. Data: JSON, Pickle, CSV, SQLite, Bytes, Struct, Array, Memory_View, Deque.
    6. Advanced: Threading, Operator, Introspection, Metaprograming, Eval, Coroutines.
    7. Libraries: Progress_Bar, Plot, Table, Curses, Logging, Scraping, Web, Profile, NumPy, Image, Audio, Games, Data, Cython.

Daily coding list

LBYL vs EAFP: Preventing or Handling Errors in Python

Look before you leap (LBYL)

Easier to ask forgiveness than permission (EAFP)
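A minimal sketch contrasting the two styles on the same dictionary lookup; the `config` dictionary, key names, and fallback port are illustrative:

```python
config = {"host": "localhost"}

# LBYL: check the condition before acting
def get_port_lbyl(cfg):
    if "port" in cfg:
        return cfg["port"]
    return 8080  # fallback default

# EAFP: act first, handle the failure if it comes
def get_port_eafp(cfg):
    try:
        return cfg["port"]
    except KeyError:
        return 8080

print(get_port_lbyl(config), get_port_eafp(config))
```

Both return the same result here; EAFP avoids the check-then-use race when the dictionary can change between the check and the access.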

That said, Python as a language doesn't have an explicit preference between these two coding styles. Guido van Rossum, the creator of Python, has said as much:

[…] I disagree with the position that EAFP is better than LBYL, or “generally recommended” by Python. (Source)

"As with many other things in life, the answer to the initial questions is: it depends! If the problem at hand suggests that EAFP is the best approach, then go for it. On the other hand, if the best solution implies using LBYL, then use it without thinking that you’re violating a Pythonic rule."

LBYL vs EAFP: Preventing or Handling Errors in Python

Avoiding Race Conditions

Python Type Checking (Guide)

Type hint 101

Pairwise distances or affinity

Common Assertion Formats

  • Comparison assertions

  • Membership assertions

  • Identity assertions

  • Type check assertions

The goal of assertions should be to uncover programmer errors rather than user errors.
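One tiny example per assertion format from the list above; the values are made up:

```python
x = 42
items = [1, 2, 3]
alias = items          # a second name for the same list object

assert x > 0               # comparison assertion
assert 2 in items          # membership assertion
assert alias is items      # identity assertion: same object, not merely equal
assert isinstance(x, int)  # type check assertion
```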

Handling exceptions with a context manager.
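One standard-library way to do this is `contextlib.suppress`, which swallows the named exception types inside its `with` block; the input strings here are made up:

```python
import contextlib

results = []
for raw in ["3", "oops", "7"]:
    # ValueError from int("oops") is silently suppressed;
    # that iteration simply appends nothing
    with contextlib.suppress(ValueError):
        results.append(int(raw))

print(results)  # -> [3, 7]
```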

Matrix-vector multiplication

Matrix-vector multiplication is an operation between a matrix and a vector that produces a new vector.
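A small numeric sketch, assuming NumPy is available: `A @ x` is the linear combination of A's columns weighted by the entries of x.

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

y = A @ x                                     # matrix-vector product
by_columns = x[0] * A[:, 0] + x[1] * A[:, 1]  # same thing, column by column

assert np.array_equal(y, by_columns)
print(y)  # -> [17 39]
```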

Matrix multiplication

That is, given two matrices A and B, each column of the product matrix AB is formed by performing matrix-vector multiplication between A and each column of B.
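The column view above can be checked numerically (a sketch assuming NumPy; the matrices are made up):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])

AB = A @ B
# Each column j of AB is the matrix-vector product of A with column j of B
for j in range(B.shape[1]):
    assert np.array_equal(AB[:, j], A @ B[:, j])

print(AB)  # -> [[19 22] [43 50]]
```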

Code demo

Boolean logic best-practices

Using Python’s and Operator in Non-Boolean Contexts
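In a non-Boolean context, `and` returns the first falsy operand, or the last operand when all are truthy; the values below are illustrative:

```python
name = "" and "default"    # "" is falsy, so the expression is ""
port = 8080 and 80         # both truthy, so the last operand wins: 80

user = {"id": 1}
uid = user and user["id"]  # guard-then-access idiom: {} would short-circuit

print(repr(name), port, uid)
```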

Functional Programming

Functions are first-class citizens; note the distinction between a function name and a function call. A bare function name is a reference to an object, just as a class name is.
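A minimal sketch of the name-vs-call distinction; the function and names are made up:

```python
def greet(name):
    return f"hello, {name}"

alias = greet          # no parentheses: just another reference to the object
result = alias("ada")  # parentheses: this is the call

assert alias is greet           # both names point at the same function object
assert result == "hello, ada"
```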

perceptron learning algorithm

data read for myself

statistical inference

the use of except, else (which might better have been named "then"), and finally
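A sketch of all three clauses: `else` runs only when no exception was raised (hence the "then" reading), while `finally` runs in every case. The `parse` helper is made up for illustration:

```python
def parse(text):
    trace = []
    try:
        value = int(text)
    except ValueError:
        trace.append("except")   # only on failure
        value = None
    else:
        trace.append("else")     # only on success ("then")
    finally:
        trace.append("finally")  # always
    return value, trace

assert parse("7") == (7, ["else", "finally"])
assert parse("x") == (None, ["except", "finally"])
```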

Logarithmic coordinates

Flip coins

Just play for fun

Frequently used database config code

Load model file

Cluster plot

Clustering performance evaluation

Wrote a WeChat public-account article today; friends who like it are welcome to subscribe 🙏. It covers the two styles of handling exceptions in Python: LBYL vs EAFP

Vector norms

Comparing two ways of loading a machine learning model in Python

Saving a pipeline's intermediate results in a class

Selecting columns based on dtype

Sometimes, when operating on a df, no new object was produced, which caused a bug in the program

Verifying two ways of computing the vector dot product

On a bug from earlier code: if no elegant fix is available yet, solve it with the most basic (and ugliest) approach first; after all, something beats nothing.

A close reading of the official Python documentation on iterables and iterators

Some thoughts on unsupervised learning

Pickle sklearn pipeline

Argument Tuple Packing vs. Argument Tuple Unpacking
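A minimal sketch of the two directions; function and variable names are illustrative:

```python
def pack(*args):      # packing: extra positional arguments land in a tuple
    return args

def add(a, b, c):
    return a + b + c

nums = (1, 2, 3)

packed = pack(1, 2, 3)  # -> (1, 2, 3)
total = add(*nums)      # unpacking: the tuple is spread into positionals

assert packed == (1, 2, 3)
assert total == 6
```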

Clustering evaluation: before vs. after clustering

Chain of Responsibility pattern, v1.0

An abstract framework for the Chain of Responsibility pattern

Conditional probability

$$ p(A,B,C) = p(A|B,C) \cdot p(C|B) \cdot p(B) $$

$$ p(B,C) = p(C|B) \cdot p(B) = p(B|C) \cdot p(C) $$

$$ p(A|B) = p(A|B,C) \cdot p(C|B) + p(A|B, \overline{C}) \cdot p(\overline{C}|B) $$

$$ \begin{align*}
p(A|B) &= \frac{p(A,B)}{p(B)} \\
&= \frac{p(A,B,C)}{p(B)} + \frac{p(A,B, \overline{C})}{p(B)} \\
&= \frac{p(A,B,C)}{p(C|B) \cdot p(B)} \cdot p(C|B) + \frac{p(A,B, \overline{C})}{p(\overline{C}|B) \cdot p(B)} \cdot p(\overline{C}|B) \\
&= p(A|B,C) \cdot p(C|B) + p(A|B, \overline{C}) \cdot p(\overline{C}|B)
\end{align*} $$
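The identity holds for any joint distribution, so it can be spot-checked numerically. A sketch with a made-up joint table over three binary events:

```python
from itertools import product

# Toy joint distribution p(A, B, C); the eight probabilities sum to 1
vals = [0.10, 0.05, 0.15, 0.20, 0.05, 0.10, 0.25, 0.10]
p = {k: v for k, v in zip(product([True, False], repeat=3), vals)}

def prob(pred):
    """Total probability of all outcomes satisfying the predicate."""
    return sum(v for (a, b, c), v in p.items() if pred(a, b, c))

pB = prob(lambda a, b, c: b)
lhs = prob(lambda a, b, c: a and b) / pB            # p(A|B)
pCgB = prob(lambda a, b, c: b and c) / pB           # p(C|B)
pAgBC = prob(lambda a, b, c: a and b and c) / prob(lambda a, b, c: b and c)
pAgBnC = prob(lambda a, b, c: a and b and not c) / prob(lambda a, b, c: b and not c)
rhs = pAgBC * pCgB + pAgBnC * (1 - pCgB)            # law of total probability given B

assert abs(lhs - rhs) < 1e-9
```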

Interpret ML

A study of SOM networks

The central limit theorem in Python

Rereading and studying the Chinese edition of "Neural Networks and Deep Learning" (深入浅出神经网络与深度学习)

Face recognition with SVM

Test code based on minisom

Encapsulating a data class in Python

Encapsulating a data class in Python, v2

Encapsulating a data class in Python, v3

WOE (weight of evidence) encoding

Slicing and concatenating multidimensional arrays

Discretizing continuous features, numericalizing discrete features

KBinsDiscretizer
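A pure-Python sketch of what KBinsDiscretizer's uniform (equal-width) strategy does; the helper name and sample values are made up:

```python
def equal_width_bins(values, n_bins):
    """Assign each value an equal-width bin index in [0, n_bins - 1]."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    out = []
    for v in values:
        idx = int((v - lo) / width) if width else 0
        out.append(min(idx, n_bins - 1))  # clamp the max value into the last bin
    return out

print(equal_width_bins([0, 1, 2, 9], 3))  # -> [0, 0, 0, 2]
```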

Feature evaluation metrics: entropy and the chi-square statistic $$ \chi^2 = \sum\limits_{i=1}^n \sum\limits_{j=1}^m \frac{(Bin_{ij} - E_{ij})^2}{E_{ij}} $$

$$ \frac{N \times \big(P(f_i,c_j) \times P(\overline{f_i}, \overline{c_j}) - P(f_i,\overline{c_j}) \times P(\overline{f_i},c_j) \big) }{P(f_i) \times P(\overline{f_i}) \times P(c_j) \times P(\overline{c_j})} $$

Two derivations of conditional entropy $$ \begin{align*} H(X, Y) - H(X) &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(x,y)} + \sum\limits_{x \in X} p(x) \log{p(x)} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(x,y)} + \sum\limits_{x \in X} \big(\sum\limits_{y \in Y} p(x,y) \big) \log{p(x)} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{\frac{p(x,y)}{p(x)}} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(y|x)} \end{align*} $$

$$ \begin{align*} H(Y|X) &= \sum\limits_{x \in X} p(x) H(Y|X=x) \\ &= \sum\limits_{x \in X} p(x) \times \big( - \sum\limits_{y \in Y} p(y|x) \log{p(y|x)} \big) \\ &= - \sum\limits_{x \in X} p(x) \sum\limits_{y \in Y} p(y|x) \log{p(y|x)} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(y|x)} \end{align*} $$
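Both derivations end at the same quantity, so they can be checked numerically. A sketch with a made-up joint distribution over two binary variables:

```python
from math import log2

# Toy joint distribution p(x, y); the four probabilities sum to 1
p = {(0, 0): 0.30, (0, 1): 0.20, (1, 0): 0.10, (1, 1): 0.40}

H_XY = -sum(v * log2(v) for v in p.values())        # joint entropy H(X, Y)
px = {0: p[(0, 0)] + p[(0, 1)], 1: p[(1, 0)] + p[(1, 1)]}  # marginal p(x)

# Route 1: H(Y|X) = H(X, Y) - H(X)
H_X = -sum(v * log2(v) for v in px.values())
route1 = H_XY - H_X

# Route 2: H(Y|X) = -sum_{x,y} p(x, y) log p(y|x)
route2 = -sum(v * log2(v / px[x]) for (x, y), v in p.items())

assert abs(route1 - route2) < 1e-9
```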

Model "feature selection"

Feature selection is the process of reducing a model's features from many to few: after the feature-derivation step, it further selects features that are stable and help preserve model accuracy.

Common feature selection methods: filter, wrapper, and embedded methods.

Filter methods are model-agnostic. Any metric that reflects a feature's predictive power can be used for variable selection; in addition, from the data-input perspective there are variance and missing-value rate, and from the correlation perspective, raising feature-to-label correlation and lowering correlation among features.

Wrapper methods are model-dependent: each round of feature selection requires training the model once. The main variants are forward search, backward search, and bidirectional search.

Embedded methods are also model-dependent, but don't require training the model repeatedly: selection happens alongside model training.

Further notes on entropy:

Information entropy, conditional entropy, mutual information, joint entropy

Cross entropy, KL divergence

A FeatureUnion demo

A common pitfall in sklearn standardization

The RFM model scores members on recency of last purchase (R), purchase frequency (F), and purchase amount (M, monetary); these three dimensions assess the activity value of a customer's orders and are commonly used for customer segmentation or value tiering. The model is often used for member analysis at e-commerce (i.e., transaction-driven) companies. RFM is computed at a fixed point in time, so a score computed today may differ from one computed seven days ago, because each customer's data differs across time points. Citation: 宋天龙, Python数据分析与数据化运营 (2nd ed.), 机械工业出版社华章分社, 2019: 509. Dedao e-book: https://d.dedao.cn/DQdnq8oBxNBZdr6x
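A minimal pure-Python sketch of the three RFM dimensions; the order history, customer IDs, and reference date are all made up:

```python
from datetime import date

# Hypothetical order history: customer -> list of (order_date, amount)
orders = {
    "u1": [(date(2024, 1, 5), 120.0), (date(2024, 3, 1), 80.0)],
    "u2": [(date(2023, 11, 20), 40.0)],
}
today = date(2024, 3, 10)  # RFM is always computed at a fixed point in time

def rfm(history):
    recency = (today - max(d for d, _ in history)).days  # days since last purchase
    frequency = len(history)                             # number of purchases
    monetary = sum(a for _, a in history)                # total spend
    return recency, frequency, monetary

scores = {cust: rfm(h) for cust, h in orders.items()}
print(scores)
```

Moving `today` is exactly why a score computed now can differ from one computed a week ago.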

Data binning with k-means

pd.merge(on="account_id")
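A minimal merge sketch, assuming pandas is installed; the table contents are made up and only the `account_id` join key comes from the note above:

```python
import pandas as pd

accounts = pd.DataFrame({"account_id": [1, 2, 3], "name": ["a", "b", "c"]})
orders = pd.DataFrame({"account_id": [1, 1, 3], "amount": [10, 20, 30]})

# Inner join by default: account 2 has no orders and drops out;
# account 1 appears twice because it matches two order rows
merged = pd.merge(accounts, orders, on="account_id")
print(merged)
```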

Read an excellent book today, 「人人可懂的数据科学」; its machine learning chapter was very inspiring.

Continued working through 「模型思维:简化世界的人工智能模型」 over the weekend

PCA code practice

SVD decomposition, with references

Univariate gradient descent

Multivariate gradient descent

One way to pick the optimal number of clusters (via the silhouette coefficient)

One way to put DataFrame visualization into practice

Looked into running page; didn't get it working

Nonlinearity: polynomial, non-polynomial, and piecewise functions; this demo covers polynomials

Least squares. Citation: 龚才春, 模型思维:简化世界的人工智能模型, 电子工业出版社, 2021: 74. Dedao e-book: https://d.dedao.cn/DRYHV2hTZezuIcm2

Loss function

$$ L(a,b) = \big(h(x) - y\big)^2 $$

The loss function is computed on a single sample.

Cost function

$$ J(a,b) = \sum\limits_{i=1}^{m}\big(h(x^{(i)}) - y^{(i)}\big)^2 $$

The cost function is computed over the entire sample set. Assuming a linear model, $$ J(a,b) = \sum\limits_{i=1}^{m}(ax^{(i)} + b - y^{(i)})^2 $$ Next, minimize the cost function (it is convex): $$ \frac{\partial{J(a,b)}}{\partial{a}} = \sum\limits_{i=1}^{m} \big(2a{x^{(i)}}^2 + 2x^{(i)}b - 2x^{(i)}y^{(i)}\big) $$

$$ \frac{\partial{J(a,b)}}{\partial{b}} = \sum\limits_{i=1}^{m} (2ax^{(i)} + 2b - 2y^{(i)}) $$
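The two partial derivatives above are enough to run gradient descent. A sketch on made-up data that lies exactly on y = 2x + 1; the learning rate and iteration count are arbitrary choices:

```python
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # exactly y = 2x + 1

a, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    # dJ/da and dJ/db from the formulas above
    grad_a = sum(2 * a * x * x + 2 * x * b - 2 * x * y for x, y in zip(xs, ys))
    grad_b = sum(2 * a * x + 2 * b - 2 * y for x, y in zip(xs, ys))
    a -= lr * grad_a
    b -= lr * grad_b

# Because J is convex, the iterates converge to the unique minimum a=2, b=1
assert abs(a - 2) < 1e-6 and abs(b - 1) < 1e-6
```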

Per-sample silhouette coefficients

Hierarchical clustering with complete linkage

Got a personal running page up and running, as a first pass

Gaussian mixture clustering

Logistic regression demo

one-hot: the difference before vs. after normalization

Implementing a decision tree by hand, part 1

pickle can also save a Python list
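A round-trip sketch: any Python list (with picklable contents) can be dumped and loaded; the file path and sample data are made up:

```python
import os
import pickle
import tempfile

data = [1, "two", [3.0]]
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

with open(path, "wb") as f:   # pickle is a binary format, hence "wb"/"rb"
    pickle.dump(data, f)
with open(path, "rb") as f:
    restored = pickle.load(f)

assert restored == data and restored is not data  # equal copy, new object
```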

Plotting logistic regression decision contours
