jinlei-playground

Program daily, do something fun, and link to some interesting people.

    1. Collections: List, Dictionary, Set, Tuple, Range, Enumerate, Iterator, Generator.
    2. Types:       Type, String, Regular_Exp, Format, Numbers, Combinatorics, Datetime.
    3. Syntax:      Args, Inline, Closure, Decorator, Class, Duck_Type, Enum, Exception.
    4. System:      Exit, Print, Input, Command_Line_Arguments, Open, Path, OS_Commands.
    5. Data:        JSON, Pickle, CSV, SQLite, Bytes, Struct, Array, Memory_View, Deque.
    6. Advanced:    Threading, Operator, Introspection, Metaprogramming, Eval, Coroutines.
    7. Libraries:   Progress_Bar, Plot, Table, Curses, Logging, Scraping, Web, Profile,
                    NumPy, Image, Audio, Games, Data, Cython.

Daily coding list

day-001

day-002

day-003

LBYL vs EAFP: Preventing or Handling Errors in Python

Look before you leap (LBYL)

Easier to ask forgiveness than permission (EAFP)
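The two styles named above can be contrasted with a minimal sketch. The `prices` dictionary and both helper functions are hypothetical names chosen only for illustration; they do not come from the linked article.

```python
def get_price_lbyl(prices, item):
    # LBYL: check the precondition first, then act.
    if item in prices:
        return prices[item]
    return None


def get_price_eafp(prices, item):
    # EAFP: attempt the operation and handle the failure if it happens.
    try:
        return prices[item]
    except KeyError:
        return None


prices = {"apple": 3}  # hypothetical data
print(get_price_lbyl(prices, "apple"), get_price_lbyl(prices, "pear"))
print(get_price_eafp(prices, "apple"), get_price_eafp(prices, "pear"))
```

Both functions return the same results here; which one reads better depends on how often the lookup is expected to fail.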

However, the fact remains that Python as a language doesn't have an explicit preference regarding these two coding styles. Guido van Rossum, the creator of Python, has said as much:

[…] I disagree with the position that EAFP is better than LBYL, or “generally recommended” by Python. (Source)

"As with many other things in life, the answer to the initial questions is: it depends! If the problem at hand suggests that EAFP is the best approach, then go for it. On the other hand, if the best solution implies using LBYL, then use it without thinking that you’re violating a Pythonic rule."

day-004

LBYL vs EAFP: Preventing or Handling Errors in Python

Avoiding Race Conditions

day-005

Python Type Checking (Guide)

Type hint 101

day-006

Pairwise distances or affinity

day-007

Common Assertion Formats

Comparison assertions
Membership assertions
Identity assertions
Type check assertions
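As a rough illustration of the four formats listed above, here is one assertion of each kind. The variables `values`, `total`, and `marker` are made up for this sketch.

```python
values = [1, 2, 3]  # hypothetical data
total = sum(values)
marker = None

# Comparison assertion
assert total == 6, f"expected 6, got {total}"
# Membership assertion
assert 2 in values, "2 should be in values"
# Identity assertion
assert marker is None, "marker should be None"
# Type check assertion
assert isinstance(total, int), "total should be an int"
```

Each assertion carries a message so that a failure points directly at the broken assumption.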

day-008

The goal of assertion should be to uncover programmers’ errors rather than users’ errors.

day-009

Handling exceptions with the context manager.

day-010

Matrix-vector multiplication

Matrix-vector multiplication is an operation between a matrix and a vector that produces a new vector.

Matrix multiplication

That is, given two matrices A and B, each column of the product matrix AB is formed by performing matrix-vector multiplication between A and each column of B.
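The column-by-column view described above can be sketched in plain Python. This is not the original code demo; `matvec` and `matmul_by_columns` are hypothetical helper names, and matrices are assumed to be lists of rows.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (sequence of numbers)."""
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]


def matmul_by_columns(a, b):
    """Build AB one column at a time: column j of AB is A times column j of B."""
    columns_of_b = list(zip(*b))
    product_columns = [matvec(a, column) for column in columns_of_b]
    # Transpose the list of columns back into a list of rows.
    return [list(row) for row in zip(*product_columns)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matvec(A, [1, 1]))        # [3, 7]
print(matmul_by_columns(A, B))  # [[19, 22], [43, 50]]
```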

Code demo

day-011

Boolean logic best-practices

day-012

Using Python’s and Operator in Non-Boolean Contexts

day-013

Functional Programming

Functions are first-class citizens. Note the distinction between a function name and a function call: the bare name refers to the function object, just as a class name refers to the class, while adding parentheses calls it.
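A small sketch of the name-versus-call distinction. `shout` and `Greeter` are hypothetical names used only for this example.

```python
def shout(text):
    return text.upper() + "!"


class Greeter:
    pass


# The bare name refers to the function object; nothing runs yet.
alias = shout
# Adding parentheses calls the function and yields its return value.
result = alias("hello")

# Functions can be passed as arguments like any other object.
lengths = list(map(len, ["a", "bb", "ccc"]))

print(result)                               # HELLO!
print(lengths)                              # [1, 2, 3]
print(callable(shout), callable(Greeter))   # True True
```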

day-014

perceptron learning algorithm

day-015

Reading data for my own use

day-016

statistical inference

day-017

Using except, else (for which "then" might be the more accurate name), and finally
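A minimal sketch of the order in which the clauses run. `parse_int` is a hypothetical helper that records which branches were taken.

```python
def parse_int(text):
    steps = []
    try:
        value = int(text)
    except ValueError:
        # Runs only if the try block raised ValueError.
        steps.append("except")
        value = None
    else:
        # Runs only if the try block finished without an exception.
        steps.append("else")
    finally:
        # Runs in every case, after except or else.
        steps.append("finally")
    return value, steps


print(parse_int("42"))   # (42, ['else', 'finally'])
print(parse_int("abc"))  # (None, ['except', 'finally'])
```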

day-018

Logarithmic coordinates

day-019

Flip coins

day-020

Just play for fun

day-021

Frequently used database config code

day-022

Load model file

day-023

Cluster plot

day-024

Clustering performance evaluation

day-025

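Today I wrote a post for my WeChat official account; if you like it, feel free to subscribe 🙏. It draws on the two coding styles for handling exceptions in Python: LBYL vs EAFP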

day-026

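Vector norms

day-027

Comparing two ways of loading machine learning models in Python

day-028

Using a class to store the intermediate results of a processing pipeline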

day-029

Selecting columns based on dtype

day-030

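Sometimes an operation on a df does not produce a new object, and that caused a bug in the program

day-031

Verifying two ways of computing the vector dot product

day-032

On bugs from earlier code: if no elegant fix is in sight yet, resolve them first with the most basic (ugly) approach. After all, something is better than nothing.

day-033

A close reading of the official Python documentation on iterable and iterator

day-034

Some thoughts on unsupervised learning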

day-035

Pickle sklearn pipeline

day-036

Argument Tuple Packing vs. Argument Tuple Unpacking

day-037

Clustering evaluation: before and after clustering

day-038

Responsibility pattern v1.0

day-039

Abstract framework for the responsibility pattern

day-040

Conditional probability

$$ p(A,B,C) = p(A|B,C) * p(C|B) * p(B) $$

$$ p(B,C) = p(C|B) * p(B) = p(B|C) * p(C) $$

$$ p(A|B) = p(A|B,C) * p(C|B) + p(A|B, \overline{C}) * p(\overline{C}|B) $$

$$ \begin{align*} p(A|B) &= \frac{p(A,B)}{p(B)} \\ &= \frac{p(A,B,C)}{p(B)} + \frac{p(A,B, \overline{C})}{p(B)} \\ &= \frac{p(A,B,C)}{p(C|B) * p(B)} * p(C|B) + \frac{p(A,B, \overline{C})}{p(\overline{C}|B) * p(B)} * p(\overline{C}|B) \\ &= p(A|B,C) * p(C|B) + p(A|B, \overline{C}) * p(\overline{C}|B) \end{align*} $$
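The identity p(A|B) = p(A|B,C) * p(C|B) + p(A|B, not C) * p(not C|B) can be checked numerically. The sketch below uses a made-up joint distribution over three binary events and exact fractions; `prob` and `cond` are hypothetical helpers.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint distribution p(A, B, C) over three binary events.
weights = [1, 2, 3, 4, 5, 6, 7, 8]
weight_sum = sum(weights)
joint = {
    outcome: Fraction(weight, weight_sum)
    for outcome, weight in zip(product([True, False], repeat=3), weights)
}


def prob(a=None, b=None, c=None):
    """Marginal probability; None means the event is summed out."""
    return sum(
        p
        for (ea, eb, ec), p in joint.items()
        if (a is None or ea == a)
        and (b is None or eb == b)
        and (c is None or ec == c)
    )


def cond(numerator, denominator):
    """Conditional probability as a ratio of two marginals."""
    return prob(**numerator) / prob(**denominator)


lhs = cond({"a": True, "b": True}, {"b": True})
rhs = (
    cond({"a": True, "b": True, "c": True}, {"b": True, "c": True})
    * cond({"b": True, "c": True}, {"b": True})
    + cond({"a": True, "b": True, "c": False}, {"b": True, "c": False})
    * cond({"b": True, "c": False}, {"b": True})
)
print(lhs == rhs)  # True
```

Using `Fraction` keeps the comparison exact, so the equality check does not depend on floating-point rounding.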

day-041

Interpret ML

day-042

A study of SOM networks

day-043

The central limit theorem in Python

day-044

Rereading and studying 深入浅出神经网络与深度学习 (Chinese edition)

day-045

Face recognition with SVM

day-046

Test-set code based on minisom

day-047

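Encapsulating a data class in Python

day-048

Encapsulating a data class in Python, v2

day-049

Encapsulating a data class in Python, v3

day-050

WOE (weight of evidence) encoding

day-051

Slicing and concatenating multidimensional arrays

day-052

Discretizing continuous features and converting discrete features to numeric values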

day-053

KBinsDiscretizer

day-054

Feature evaluation metrics: entropy and the chi-square value

$$ \chi^2 = \sum\limits_{i=1}^n \sum\limits_{j=1}^m \frac{(Bin_{ij} - E_{ij})^2}{E_{ij}} $$

For a binary feature f_i and a binary class c_j over N samples, this reduces to

$$ \chi^2(f_i, c_j) = \frac{N \times \big(P(f_i,c_j) \times P(\overline{f_i}, \overline{c_j}) - P(f_i,\overline{c_j}) \times P(\overline{f_i},c_j) \big)^2 }{P(f_i) \times P(\overline{f_i}) \times P(c_j) \times P(\overline{c_j})} $$
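As a sanity check, the general observed-versus-expected sum and the 2×2 shortcut written in counts, N(ad - bc)^2 / ((a+b)(c+d)(a+c)(b+d)), should give the same value on a 2×2 contingency table. The table below is invented for illustration, and both function names are hypothetical.

```python
def chi2_general(table):
    """Sum of (observed - expected)^2 / expected over all cells."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def chi2_two_by_two(table):
    """Shortcut for a 2x2 table [[a, b], [c, d]] of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical counts: rows = feature present/absent, columns = class yes/no.
table = [[30, 10], [20, 40]]
print(chi2_general(table), chi2_two_by_two(table))
```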

day-055

Two ways to derive conditional entropy

$$ \begin{align*} H(X, Y) - H(X) &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(x,y)} + \sum\limits_{x \in X} p(x) \log{p(x)} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(x,y)} + \sum\limits_{x \in X} \big(\sum\limits_{y \in Y} p(x,y) \big) \log{p(x)} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(x,y)} + \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x,y) \log{p(x)} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{\frac{p(x,y)}{p(x)}} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(y|x)} \end{align*} $$

$$ \begin{align*} H(Y|X) &= \sum\limits_{x \in X} p(x) H(Y|X=x) \\ &= \sum\limits_{x \in X} p(x) \times \big( - \sum\limits_{y \in Y} p(y|x) \log{p(y|x)} \big) \\ &= - \sum\limits_{x \in X} p(x) \sum\limits_{y \in Y} p(y|x) \log{p(y|x)} \\ &= - \sum\limits_{x \in X} \sum\limits_{y \in Y} p(x, y) \log{p(y|x)} \end{align*} $$
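A numeric check that the chain-rule form H(X, Y) - H(X) and the definition sum over x of p(x) * H(Y|X=x) agree for H(Y|X). The joint distribution is made up for this sketch, and `entropy` is a hypothetical helper using base-2 logarithms.

```python
from math import log2

# Hypothetical joint distribution p(x, y).
joint = {
    ("sunny", "walk"): 0.4,
    ("sunny", "bus"): 0.1,
    ("rainy", "walk"): 0.1,
    ("rainy", "bus"): 0.4,
}

# Marginal p(x), obtained by summing the joint over y.
p_x = {}
for (x, _), p in joint.items():
    p_x[x] = p_x.get(x, 0.0) + p


def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


# Way 1: chain rule, H(Y|X) = H(X, Y) - H(X).
h_chain = entropy(joint.values()) - entropy(p_x.values())

# Way 2: definition, H(Y|X) = sum over x of p(x) * H(Y | X = x).
h_definition = sum(
    px * entropy([p / px for (x, _), p in joint.items() if x == name])
    for name, px in p_x.items()
)

print(abs(h_chain - h_definition) < 1e-9)  # True
```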

day-056

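Feature selection for models

Feature selection reduces the number of model features from many to few. After the feature derivation of the previous stage, it further picks out features that are stable and help preserve model accuracy.

Common feature selection methods are the filter method, the wrapper method, and the embedded method.

The filter method is model-agnostic. Any indicator that reflects a feature's predictive power can be used for variable selection. In addition, from the data-input perspective there are variance and the share of missing values; from the correlation perspective, the aim is to raise the correlation between features and the model label and to lower the correlation among features.

The wrapper method is model-dependent: every round of feature selection requires training a model. The main variants are forward search, backward search, and bidirectional search.

The embedded method is also model-dependent, but it does not require training a model for each selection step; selection happens during model training itself.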
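As a rough sketch of the filter method, the snippet below drops features whose variance does not exceed a threshold, using only the standard library. The feature table and the `filter_by_variance` helper are hypothetical; a real pipeline would likely combine this with other indicators such as missing-value share or correlation with the label.

```python
from statistics import pvariance

# Hypothetical feature columns.
features = {
    "age": [23, 35, 47, 51],
    "constant_flag": [1, 1, 1, 1],
    "income": [3.2, 4.8, 5.1, 7.9],
}


def filter_by_variance(features, threshold=0.0):
    """Keep only features whose population variance exceeds the threshold."""
    return [
        name
        for name, values in features.items()
        if pvariance(values) > threshold
    ]


selected = filter_by_variance(features)
print(selected)  # ['age', 'income']
```

No model is trained at any point, which is what makes this a filter rather than a wrapper or embedded approach.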

day-057

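More on entropy:

Information entropy, conditional entropy, mutual information, joint entropy

Cross-entropy, K-L divergence

day-058

A FeatureUnion demo

day-059

A common misconception about standardization in sklearn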

day-060

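The RFM model computes an RFM score from a member's most recent purchase time R (Recency), purchase frequency F (Frequency), and purchase amount M (Monetary). These three dimensions are used to assess the order-activity value of a customer, typically for customer segmentation or value tiering. The model is commonly used for member analysis in e-commerce (that is, transaction-based) businesses.

The RFM model is analyzed relative to a fixed point in time, so an RFM score computed today may differ from one computed 7 days ago, because the data available for each customer differs at different points in time.

Source: Song Tianlong, Python数据分析与数据化运营 (2nd ed.), China Machine Press (Huazhang), 2019, p. 509. Dedao e-book: https://d.dedao.cn/DQdnq8oBxNBZdr6x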
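A minimal sketch of the raw R, F, and M values relative to a fixed snapshot date. The order data and the `rfm` helper are hypothetical, and the step that turns raw values into scores (for example by binning) is deliberately left out.

```python
from datetime import date

# Hypothetical orders: (customer_id, order_date, amount).
orders = [
    ("c1", date(2022, 1, 5), 120.0),
    ("c1", date(2022, 3, 1), 80.0),
    ("c2", date(2021, 11, 20), 300.0),
]


def rfm(orders, as_of):
    """Return {customer: (recency_days, frequency, monetary)} as of a fixed date."""
    summary = {}
    for customer, order_date, amount in orders:
        if order_date > as_of:
            continue  # ignore orders placed after the snapshot date
        last, count, spent = summary.get(customer, (None, 0, 0.0))
        if last is None or order_date > last:
            last = order_date
        summary[customer] = (last, count + 1, spent + amount)
    return {
        customer: ((as_of - last).days, count, spent)
        for customer, (last, count, spent) in summary.items()
    }


today = rfm(orders, date(2022, 3, 8))
week_ago = rfm(orders, date(2022, 3, 1))
print(today["c1"], week_ago["c1"])  # (7, 2, 200.0) (0, 2, 200.0)
```

The same orders give different recency values for the two snapshot dates, which is why a score computed today can differ from one computed a week earlier.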

day-061

Data binning with KMeans

day-062

pd.merge(on="account_id")

day-063

Today I read a great book, 人人可懂的数据科学; its machine learning chapter was very inspiring.

day-064

Kept reading over the weekend: 模型思维：简化世界的人工智能模型

day-065

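PCA in code

day-066

SVD decomposition, reference

day-067

Univariate gradient descent

day-068

Multivariate gradient descent

day-069

One way to find the best number of clusters (via the silhouette coefficient)

day-070

A way to put DataFrame visualization into practice

day-071

Looked into running page; no success

day-072

Nonlinearity: polynomial, non-polynomial, and piecewise functions; this demo covers polynomials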

day-073

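Least squares. Source: Gong Caichun, 模型思维：简化世界的人工智能模型, Publishing House of Electronics Industry, 2021, p. 74. Dedao e-book: https://d.dedao.cn/DRYHV2hTZezuIcm2

Loss function

$$ L(a,b) = (h(x) - y)^2 $$

The loss function is computed on a single sample.

Cost function

$$ J(a,b) = \sum\limits_{i=1}^{m}(h(x^{(i)}) - y^{(i)})^2 $$

The cost function is computed over the whole sample set. Assuming the model is linear, then

$$ J(a,b) = \sum\limits_{i=1}^{m}(ax^{(i)} + b - y^{(i)})^2 $$

Next, find the minimum of the cost function (it is convex) by taking the partial derivatives and setting them to zero:

$$ \frac{\partial{J(a,b)}}{\partial{a}} = \sum\limits_{i=1}^{m} (2a{x^{(i)}}^2 + 2x^{(i)}b - 2x^{(i)}y^{(i)}) $$

$$ \frac{\partial{J(a,b)}}{\partial{b}} = \sum\limits_{i=1}^{m} (2ax^{(i)} + 2b - 2y^{(i)}) $$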
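Setting both partial derivatives to zero gives two linear equations in a and b, which can be solved in closed form. The sketch below does this in plain Python; `fit_line` and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a * x + b from the two normal equations."""
    m = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # From dJ/da = 0 and dJ/db = 0:
    #   a * sum_xx + b * sum_x = sum_xy
    #   a * sum_x  + b * m     = sum_y
    denominator = m * sum_xx - sum_x ** 2
    a = (m * sum_xy - sum_x * sum_y) / denominator
    b = (sum_y - a * sum_x) / m
    return a, b


# Hypothetical points lying exactly on y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
a, b = fit_line(xs, ys)
print(a, b)  # 2.0 1.0
```

The denominator is zero when all x values are identical, so this sketch assumes at least two distinct x values.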
