Joyenjoye

2021/12/30 Thursday

Machine Learning：K-Means Mathematics

This post covers the mathematics behind K-Means. Specifically, it covers the following: Cost Function, Optimization, Initialization, How to Choose the Number of Cluster...

2021/12/29 Wednesday

Machine Learning：K-Means Overview

The K-Means algorithm is one of the most widely used clustering methods in practice. It is categorized as unsupervised learning, which learns from unlabelled data instead of from...

2021/12/28 Tuesday

Machine Learning：Overview

This post gives an overview of the different types of machine learning, as well as the notation and terminology used.

2022/06/15 Wednesday

Machine Learning：Decision Tree Models

Decision tree models can be used for both regression and classification tasks. They are the building blocks of popular ensemble models such as random forests.

2022/06/13 Monday

Python Environment Management

This post covers Python environment and package management using different tools for macOS. Specifically, it covers the following tools: venv, Anaconda, and Miniforge.

2022/06/09 Thursday

Machine Learning：Linear Models

Linear models provide simple and fast baselines for more complicated models. When the number of features is large, it may be hard for more complex models to beat linear models.

2022/06/08 Wednesday

Machine Learning：Loss Functions

This post covers popular loss functions used in machine learning and deep learning models.

2022/06/16 Thursday

Machine Learning：Ensemble Models - Bagging

Bagging is a general strategy that can work with any base model, such as linear models or decision trees.

2022/01/08 Saturday

Machine Learning：SVM Mathematics

This post covers the mathematics behind the Support Vector Machine (SVM). Specifically, it covers the following: Margin and Support Vector

2022/01/06 Thursday

Machine Learning：SVM Overview

Support Vector Machine (SVM), also called the max-margin classifier, is a very popular supervised learning algorithm. It can handle linear or nonlinear classification, regression as well as ou...

2022/01/01 Saturday

Machine Learning：Prepare Data for K-Means Clustering

This post covers data preprocessing steps for K-Means Clustering. Specifically, it covers the following:

2022/12/30 Sunday

Welcome to Joyenjoye 🤗

Welcome to my blog. My name is Joye. If you can read Chinese, my Chinese name is 李拙 (LI ZHUO). 🤔 Want to know more about me? Check the About tab for my work experiences ...

2022/04/15 Friday

NLP Materials

This post collects some overview material on NLP.

2020/04/25 Saturday

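Paper Summary：GloVe - Global Vectors for Word Representation

For a given word $k$, the ratio $\frac{P_{ik}}{P_{jk}}$ of its probabilities of appearing in two different contexts $i$ and $j$ can be used to distinguish its meaning.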

2020/04/05 Sunday

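Paper Summary：XGBoost - A Scalable Tree Boosting System

Gradient Tree Boosting is a highly effective method for classification problems in machine learning, and it is often applied to ad click-through rate prediction and machine learning competitions. In 2014, building on the traditional gradient tree boosting model, the authors proposed XGBoost and released a corresponding toolkit. Because of its fast computation and good model performance, XGBoost is widely used in data competitions of all kinds, including store sales prediction, web text classification, click-through rate, ...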

2020/03/12 Friday

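Paper Summary：From Word Embeddings to Document Distance

The paper proposes the Word Mover’s Distance (WMD) for computing the distance between documents. The distance between documents is treated as a weighted average of the word-to-word distances in the documents. Word-to-word distances can be computed from the word vectors obtained from a word embedding; the word-to-word mapping between the two documents is the variable, and the objective is to minimize the distance between the documents. The minimum document distance obtained by solving this problem is the Word Mover’s Distance. This optimization problem is Earth’s M...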

2020/01/30 Thursday