Joyenjoye

Welcome to Joyenjoye 🤗

2022/12/30 Sunday

Welcome to my blog. My name is Joye. If you can read Chinese, my Chinese name is 李拙 (LI ZHUO).

🤔 Want to know more about me?

Check the About tab for my work experience and skill set.
Check the Hobbies tab for my hobbies outside of work.

🤔 What can you find here?

Lessons learned from my day-to-day work as a data scientist.
Summaries of papers and methodologies in machine learning, deep learning, and NLP.

Python Environment Management

2022/06/13 Monday

This post covers Python environment and package management on macOS with three tools: venv, Anaconda, and Miniforge.
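
As a taste of the venv part, here is a minimal sketch that creates a virtual environment from Python itself using the standard-library `venv` module. The helper name `create_env` and the directory name `demo-env` are my own placeholders, and the environment is built in a temporary directory so nothing is left behind.

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(env_dir: Path) -> Path:
    """Create a virtual environment and return the path to its interpreter.

    `create_env` is a hypothetical helper for this sketch, not part of venv.
    """
    # with_pip=False keeps the sketch fast and offline; pass True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    # The interpreter lives in bin/ on macOS and Linux, Scripts/ on Windows.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return env_dir / bin_dir / exe


with tempfile.TemporaryDirectory() as tmp:
    env_root = Path(tmp) / "demo-env"  # placeholder name
    env_python = create_env(env_root)
    env_created = env_python.exists()
    has_cfg = (env_root / "pyvenv.cfg").exists()
    print("interpreter:", env_python)
    print("created:", env_created, "| pyvenv.cfg present:", has_cfg)
```

On the command line, the usual equivalent is `python3 -m venv demo-env` followed by `source demo-env/bin/activate`.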

NLP Materials

2022/04/15 Friday

This post collects introductory material that gives an overview of NLP.

Paper Summary: GloVe - Global Vectors for Word Representation

2020/04/25 Saturday

For a given word $k$, the ratio $\frac{P_{ik}}{P_{jk}}$ of its probabilities of appearing in two different contexts $i$ and $j$ can be used to distinguish its meaning.
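
To make the ratio concrete, here is a small sketch. It assumes the usual definition $P_{ik} = X_{ik} / X_i$, where $X_{ik}$ counts how often word $k$ appears in the context of word $i$. The counts below are invented for illustration only and do not come from the paper or any real corpus.

```python
# Hypothetical co-occurrence counts X[i][k]: how often probe word k appears
# in the context of word i. The numbers are invented for illustration only.
X = {
    "ice":   {"solid": 80, "gas": 4,  "water": 300, "fashion": 2},
    "steam": {"solid": 5,  "gas": 70, "water": 290, "fashion": 2},
}


def cooccurrence_prob(counts, i, k):
    """P_ik = X_ik / X_i, where X_i is the total count for context word i."""
    return counts[i][k] / sum(counts[i].values())


def prob_ratio(counts, i, j, k):
    """Ratio P_ik / P_jk for probe word k and context words i and j."""
    return cooccurrence_prob(counts, i, k) / cooccurrence_prob(counts, j, k)


ratios = {k: prob_ratio(X, "ice", "steam", k) for k in X["ice"]}
for k, r in ratios.items():
    print(f"k = {k:<8} P_ik / P_jk = {r:.2f}")
```

With these toy counts, the ratio is well above 1 for a probe word tied to "ice" only, well below 1 for one tied to "steam" only, and close to 1 for words related to both or to neither.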

Paper Summary: XGBoost - A Scalable Tree Boosting System

2020/04/05 Sunday

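Gradient tree boosting is a highly effective machine learning method for classification problems. It is widely used for ad click-through rate prediction and in machine learning competitions.

In 2014, building on the traditional gradient tree boosting model, the authors proposed XGBoost and released the corresponding toolkit. Thanks to its fast computation and strong model performance, XGBoost has been widely used in data competitions, including store sales prediction, web text classification, click-through rate prediction, and product categorization. The paper was published two years later, at KDD 2016.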

Paper Summary: From Word Embeddings to Document Distance

2020/03/12 Friday

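The paper proposes the Word Mover's Distance (WMD) for measuring the distance between documents. The distance between two documents is viewed as a weighted average of the distances between their words. Word-to-word distances can be computed from word vectors obtained through word embeddings. The mapping between the words of the two documents is the decision variable, and the objective is to minimize the distance between the documents. The minimum document distance found this way is the Word Mover's Distance. This optimization problem is a special case of the Earth Mover's Distance, so it can be solved with the corresponding algorithms.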
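
The idea can be sketched for one easy special case: two documents with the same number of distinct words, each word carrying equal weight. Under that assumption an optimal mapping can always be chosen as a one-to-one matching, so brute force over permutations gives the exact distance. This is not the general solver the paper relies on, and the 2-D word vectors and the function name `wmd_uniform` are hypothetical, chosen only for illustration.

```python
import math
from itertools import permutations

# Hypothetical 2-D word vectors, invented for illustration only.
EMBEDDINGS = {
    "obama": (1.0, 0.9), "president": (0.9, 1.0),
    "speaks": (0.2, 0.1), "greets": (0.1, 0.2),
    "media": (0.5, 0.0), "press": (0.5, 0.1),
    "band": (-1.0, 0.3), "plays": (-0.8, -0.5), "concert": (-0.9, 0.0),
}


def word_distance(a, b):
    """Euclidean distance between the vectors of words a and b."""
    return math.dist(EMBEDDINGS[a], EMBEDDINGS[b])


def wmd_uniform(doc_a, doc_b):
    """WMD for two equal-length lists of distinct words, each weighted 1/n.

    Tries every one-to-one word mapping and keeps the cheapest one,
    which is only practical for very short documents.
    """
    if len(doc_a) != len(doc_b):
        raise ValueError("this sketch only handles documents of equal length")
    n = len(doc_a)
    return min(
        sum(word_distance(a, b) for a, b in zip(doc_a, perm)) / n
        for perm in permutations(doc_b)
    )


d1 = ["obama", "speaks", "media"]
d2 = ["president", "greets", "press"]
d3 = ["band", "plays", "concert"]

near = wmd_uniform(d1, d2)
far = wmd_uniform(d1, d3)
print(f"WMD(d1, d2) = {near:.3f}")
print(f"WMD(d1, d3) = {far:.3f}")
```

Even though `d1` and `d2` share no words, their distance comes out smaller than the distance between `d1` and `d3`, because each word in `d1` has a close neighbor in `d2` in the toy embedding space.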