Joyenjoye

It is all about what Joye enjoys

Welcome to Joyenjoye 🤗

2022/12/30 Sunday

Welcome to my blog. My name is Joye. If you can read Chinese, my Chinese name is 李拙.


🤔 Want to know more details about me?

  • check the About tab for my work experience and skill set
  • check the Hobbies tab for my hobbies outside work


🤔 What can you find here?

  • Lessons from the day-to-day work of a data scientist in the PR and marketing industry.
  • Summaries of papers and methodologies in machine learning, deep learning, and NLP.


Git Notes

2023/05/11 Thursday



Machine Learning: Decision Tree Models

2022/06/15 Wednesday

Decision tree models can be used for both regression and classification tasks. They are the building blocks of popular ensemble models such as random forests.
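
Decision trees are usually grown greedily: at each node, the algorithm picks the split that most reduces impurity. The sketch below is minimal and assumes Gini impurity as the criterion (as in CART) and a single numeric feature. The helpers `gini` and `best_split` are hypothetical and do not come from the post. Thresholds are taken at observed values; libraries such as scikit-learn use midpoints between adjacent values instead.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, weighted_gini) for the best split `x <= threshold`."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right child empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Children's impurities weighted by their share of the samples.
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(ys)
        if score < best[1]:
            best = (t, score)
    return best

xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = ["a", "a", "a", "b", "b", "b"]
print(best_split(xs, ys))  # (3.0, 0.0): a perfectly pure split
```

A full tree applies `best_split` recursively to each child until a stopping rule is reached, such as a maximum depth or a minimum number of samples per leaf.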




Python Environment Management

2022/06/13 Monday

This post covers Python environment and package management on macOS with three tools: venv, Anaconda, and Miniforge.
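
venv ships with Python. On the command line it is usually run as `python3 -m venv .venv` followed by `source .venv/bin/activate`. The same step can be scripted with the standard-library `venv.create` function. In this minimal sketch, the helper `create_env` and the directory name `demo-env` are hypothetical.

```python
import tempfile
import venv
from pathlib import Path

def create_env(env_dir):
    """Create an isolated environment, like `python -m venv <env_dir>`."""
    # with_pip=False keeps this fast and offline; pass with_pip=True to bootstrap pip.
    venv.create(env_dir, with_pip=False)
    return Path(env_dir)

with tempfile.TemporaryDirectory() as tmp:
    env = create_env(Path(tmp) / "demo-env")
    # Every venv records its base interpreter in pyvenv.cfg.
    print((env / "pyvenv.cfg").exists())  # True
```

Anaconda and Miniforge manage environments through the `conda` CLI rather than through this module.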



Machine Learning: Linear Models

2022/06/09 Thursday

Linear models provide simple and fast baselines for more complicated models. When the number of features is large, linear models can be hard for more complex models to beat.
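
To show how cheap such a baseline is, here is ordinary least squares for a single feature, solved in closed form. This is a minimal sketch: `fit_simple_ols` is a hypothetical helper. In practice a library estimator such as scikit-learn's `LinearRegression` would handle many features.

```python
def fit_simple_ols(xs, ys):
    """Ordinary least squares for y ≈ w*x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = cov(x, y) / var(x); the intercept makes the line pass through the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    w = sxy / sxx
    b = mean_y - w * mean_x
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # exactly y = 2x + 1
print(fit_simple_ols(xs, ys))  # (2.0, 1.0)
```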



Machine Learning: Loss Functions

2022/06/08 Wednesday

This post covers popular loss functions used in machine learning and deep learning models.
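
Three of the most common losses can be written in a few lines. This is a minimal sketch with hypothetical function names: mean squared error and mean absolute error for regression, and binary cross-entropy (log loss) for classification. In the cross-entropy, predicted probabilities are clipped to avoid `log(0)`.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error: penalizes large errors quadratically."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def mae(y_true, y_pred):
    """Mean absolute error: less sensitive to outliers than MSE."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def binary_cross_entropy(y_true, p_pred, eps=1e-12):
    """Log loss for labels in {0, 1} and predicted probabilities of class 1."""
    total = 0.0
    for t, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to keep log() finite
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(y_true)

print(mse([1.0, 2.0], [1.0, 4.0]))               # 2.0
print(mae([1.0, 2.0], [1.0, 4.0]))               # 1.0
print(binary_cross_entropy([1, 0], [0.5, 0.5]))  # ln 2 ≈ 0.6931
```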

NLP Materials

2022/04/15 Friday

This post collects some overview materials on NLP.

Machine Learning: SVM Mathematics

2022/01/08 Saturday

This post covers the mathematics behind the Support Vector Machine (SVM). Specifically, it covers the following:

  1. Margin and Support Vector
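
Consider the hyperplane w·x + b = 0 and a label y in {-1, +1}. The functional margin of a point is y(w·x + b), and its geometric margin is that value divided by ||w||. The margin of a dataset is the smallest geometric margin, set by the closest points (the support vectors). This minimal sketch uses a hypothetical hyperplane and hypothetical points.

```python
import math

def geometric_margin(w, b, x, y):
    """Signed distance from (x, y) to the hyperplane w·x + b = 0, with y in {-1, +1}."""
    functional = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
    return functional / math.sqrt(sum(wi * wi for wi in w))

# Hypothetical separating hyperplane x1 + x2 - 3 = 0 and labeled points.
w, b = [1.0, 1.0], -3.0
points = [([1.0, 1.0], -1), ([2.0, 2.0], +1), ([3.0, 3.0], +1)]
margins = [geometric_margin(w, b, x, y) for x, y in points]
# (1, 1) and (2, 2) have functional margin exactly 1, so they are the support
# vectors, and the full margin width is 2 / ||w|| = sqrt(2).
print(min(margins))  # ≈ 0.7071
```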

Machine Learning: SVM Overview

2022/01/06 Thursday

The Support Vector Machine (SVM), also called the max-margin classifier, is a very popular supervised learning algorithm. It can handle linear or nonlinear classification, regression, and outlier detection[2]. SVMs are particularly well suited to classifying complex but small- or medium-sized datasets[1].
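
For the linear soft-margin case, the max-margin idea is usually written as minimizing 0.5·||w||² plus C times the sum of hinge losses. C trades margin width against margin violations. This minimal sketch only evaluates that primal objective for a given (w, b). The data, hyperplane, and function names are hypothetical. Training (for example with a QP solver) and kernels for the nonlinear case are not shown.

```python
def hinge_loss(w, b, x, y):
    """max(0, 1 - y(w·x + b)): zero once a point is correctly classified with margin >= 1."""
    return max(0.0, 1.0 - y * (sum(wi * xi for wi, xi in zip(w, x)) + b))

def soft_margin_objective(w, b, data, C=1.0):
    """Primal soft-margin SVM objective: 0.5*||w||^2 + C * sum of hinge losses."""
    reg = 0.5 * sum(wi * wi for wi in w)
    return reg + C * sum(hinge_loss(w, b, x, y) for x, y in data)

# Hypothetical data: the last point sits on the wrong side of x1 + x2 - 3 = 0.
data = [([1.0, 1.0], -1), ([2.0, 2.0], +1), ([1.5, 1.4], +1)]
print(soft_margin_objective([1.0, 1.0], -3.0, data))  # ≈ 2.1 (1.0 from ||w||, 1.1 from hinge)
```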

Machine Learning: Prepare Data for K-Means Clustering

2022/01/01 Saturday

This post covers data preprocessing steps for K-Means clustering.
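
K-Means relies on Euclidean distance, so features with larger numeric ranges dominate the clustering. One common preprocessing step is z-score standardization; it is assumed here and is not necessarily one of the post's listed steps. This minimal sketch uses a hypothetical helper and made-up customer features.

```python
import statistics

def standardize(column):
    """Rescale one feature to mean 0 and population standard deviation 1.

    Assumes the column is not constant (otherwise the standard deviation is 0).
    """
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    return [(v - mu) / sigma for v in column]

# Hypothetical features on very different scales.
ages = [20.0, 30.0, 40.0]
incomes = [20000.0, 50000.0, 80000.0]
print(standardize(ages))     # ≈ [-1.2247, 0.0, 1.2247]
print(standardize(incomes))  # ≈ the same values, so both features now weigh equally
```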

Machine Learning: K-Means Mathematics

2021/12/30 Thursday

This post covers the mathematics behind K-Means.
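
As a brief summary of the standard formulation, K-Means minimizes the within-cluster sum of squared distances. It does so by alternating two steps, each of which can only lower or keep the objective, so the algorithm converges to a local minimum. The notation is assumed here and not taken from the post: m points x^{(i)}, K centroids μ_k, and assignment indicators r_{ik}.

```latex
% Objective: within-cluster sum of squared distances.
J(r, \mu) = \sum_{i=1}^{m} \sum_{k=1}^{K} r_{ik} \, \lVert x^{(i)} - \mu_k \rVert^2,
\quad r_{ik} \in \{0, 1\}, \quad \sum_{k=1}^{K} r_{ik} = 1

% Assignment step (hold \mu fixed): each point goes to its nearest centroid.
r_{ik} = 1 \iff k = \arg\min_{j} \lVert x^{(i)} - \mu_j \rVert^2

% Update step (hold r fixed): \partial J / \partial \mu_k = -2 \sum_i r_{ik} (x^{(i)} - \mu_k) = 0 gives
\mu_k = \frac{\sum_{i=1}^{m} r_{ik} \, x^{(i)}}{\sum_{i=1}^{m} r_{ik}}
```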

Machine Learning: K-Means Overview

2021/12/29 Wednesday

The K-Means algorithm is one of the most widely used clustering methods in practice. It is an unsupervised learning algorithm: it learns from unlabeled rather than labeled data and tries to find “structure” or “patterns” in the data. As a clustering algorithm, it aims to automatically group the data into coherent clusters[1]. Typical use cases include customer segmentation[2], social network analysis, and document clustering.
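
The classic way to fit K-Means is Lloyd's algorithm. It alternates an assignment step (each point joins its nearest centroid) with an update step (each centroid moves to the mean of its points). This minimal sketch works on 2-D points and assumes random initialization from the data points (not k-means++) with a fixed seed. The function names and the toy points are hypothetical.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2

def mean(points):
    """Component-wise mean of a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialize with k distinct data points
    labels = []
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its old centroid).
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # no centroid moved, so the assignments are stable
        centroids = new_centroids
    return centroids, labels

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, k=2)
print(labels)  # points 0-2 share one label, points 3-5 the other
```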

Machine Learning: Overview

2021/12/28 Tuesday

This post gives an overview of the different types of machine learning, along with common notation and terminology.



Paper Summary: GloVe - Global Vectors for Word Representation

2020/04/25 Saturday

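For a given word $k$, the ratio $\frac{P_{ik}}{P_{jk}}$ of its probabilities of appearing in the contexts of two different words $i$ and $j$ distinguishes its meaning: a large ratio suggests $k$ is more related to $i$, and a small ratio suggests it is more related to $j$.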
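
The ratio can be computed from a word-word co-occurrence count matrix, where P_ik = X_ik / X_i and X_i is the total count in word i's row. The words ice and steam follow the example in the GloVe paper, but the counts below are invented for illustration. The helper `cooc_prob` is hypothetical.

```python
# Hypothetical co-occurrence counts X[i][k]: how often word k appears in the context of word i.
X = {
    "ice":   {"solid": 8, "gas": 2, "water": 30, "fashion": 1},
    "steam": {"solid": 2, "gas": 8, "water": 30, "fashion": 1},
}

def cooc_prob(X, i, k):
    """P_ik = P(k | i) = X_ik / X_i."""
    return X[i][k] / sum(X[i].values())

for k in ["solid", "gas", "water", "fashion"]:
    ratio = cooc_prob(X, "ice", k) / cooc_prob(X, "steam", k)
    print(f"{k}: {ratio:.2f}")
# solid: 4.00    large ratio: related to ice rather than steam
# gas: 0.25      small ratio: related to steam rather than ice
# water: 1.00    near 1: related to both
# fashion: 1.00  near 1: related to neither
```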

Paper Summary: XGBoost - A Scalable Tree Boosting System

2020/04/05 Sunday

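Gradient tree boosting is a highly effective machine learning method for classification problems. It is widely used for ad click-through rate prediction and in machine learning competitions.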

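In 2014, building on the traditional gradient tree boosting model, the authors proposed XGBoost and released the corresponding toolkit. Because of its fast computation and strong model performance, XGBoost has been widely used in all kinds of data competitions, including store sales prediction, web text classification, click-through rate prediction, and product categorization. The paper was published two years later at KDD 2016.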
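
The paper uses a second-order approximation of the loss with per-sample gradients g_i and Hessians h_i. This gives a closed-form optimal leaf weight w* = -G / (H + λ) and a split gain 0.5·[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ, where G and H are sums over a node. This minimal sketch implements only these two formulas, not the xgboost library. The function names and toy numbers are hypothetical.

```python
def leaf_weight(g, h, lam=1.0):
    """Optimal leaf value w* = -G / (H + lambda)."""
    return -sum(g) / (sum(h) + lam)

def split_gain(g_left, h_left, g_right, h_right, lam=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into left/right children."""
    def score(G, H):
        return G * G / (H + lam)
    GL, HL, GR, HR = sum(g_left), sum(h_left), sum(g_right), sum(h_right)
    # gamma is the complexity penalty paid for the extra leaf.
    return 0.5 * (score(GL, HL) + score(GR, HR) - score(GL + GR, HL + HR)) - gamma

# Squared loss l = (y - yhat)^2 / 2 gives g_i = yhat_i - y_i and h_i = 1.
y = [1.0, 1.0, 3.0, 3.0]
yhat = [2.0] * 4                      # current prediction for every sample
g = [p - t for p, t in zip(yhat, y)]  # [1.0, 1.0, -1.0, -1.0]
h = [1.0] * 4
print(split_gain(g[:2], h[:2], g[2:], h[2:]))             # ≈ 1.3333
print(leaf_weight(g[:2], h[:2]), leaf_weight(g[2:], h[2:]))  # ≈ -0.6667 0.6667
```

A positive gain means the split lowers the regularized objective. Raising γ makes the tree more reluctant to split.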

Paper Summary: From Word Embeddings to Document Distances

2020/03/12 Friday

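The paper proposes the Word Mover’s Distance (WMD) for measuring the distance between documents. The distance between two documents is treated as a weighted average of the distances between their words. Word-to-word distances can be computed from word vectors learned by word embeddings. The mapping (flow) between the words of the two documents is the free variable, and the objective is to minimize the distance between the documents; the minimum document distance obtained this way is the Word Mover’s Distance. This optimization problem is a special case of the Earth Mover’s Distance, so existing EMD algorithms can be used to solve it.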
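
In general, WMD is a linear program and needs an EMD or LP solver. The minimal sketch below covers only one special case. It assumes both documents contain the same number n of distinct words, each with weight 1/n, and that word distance is the Euclidean distance between embeddings. In that case an optimal flow can be taken to be a one-to-one matching of words, so tiny examples can be solved by trying every permutation. The 2-D "embeddings" are invented, and `wmd_uniform` is a hypothetical helper.

```python
import itertools
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def wmd_uniform(doc_a, doc_b, emb):
    """WMD for two documents of n distinct words, each with weight 1/n (brute force)."""
    assert len(doc_a) == len(doc_b)
    n = len(doc_a)
    best = math.inf
    # Each permutation matches word i of doc_a with word perm[i] of doc_b.
    for perm in itertools.permutations(range(n)):
        cost = sum(euclidean(emb[doc_a[i]], emb[doc_b[j]]) for i, j in enumerate(perm)) / n
        best = min(best, cost)
    return best

# Hypothetical 2-D embeddings, for illustration only.
emb = {
    "obama": (1.0, 0.0), "president": (1.1, 0.0),
    "speaks": (0.0, 1.0), "greets": (0.0, 1.2),
}
print(wmd_uniform(["obama", "speaks"], ["president", "greets"], emb))  # ≈ 0.15
```

The factorial cost means this only illustrates the definition. For real documents, use an EMD solver.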

Python Web Scraping Study Notes 3-1: Hands-on Project with 58同城 (58.com)

2020/01/30 Thursday

This project scrapes second-hand housing listings in Chengdu, collecting information such as location, price, and floor plan.

Python Web Scraping Study Notes 2-3: Scrapy Middleware

2019/12/25 Wednesday

The previous section explained how to use Scrapy item pipelines; this section introduces how to use middleware.
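
Downloader middlewares sit between the Scrapy engine and the downloader and can modify every request before it is sent. This minimal sketch shows a common use: rotating the User-Agent header. It relies only on Scrapy's documented `process_request(request, spider)` hook, so the class itself needs no import. The class name and the User-Agent strings are hypothetical placeholders.

```python
import random

class RandomUserAgentMiddleware:
    """Hypothetical downloader middleware that sets a random User-Agent per request."""

    # Placeholder strings; substitute real browser User-Agent values.
    USER_AGENTS = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)",
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    ]

    def process_request(self, request, spider):
        request.headers["User-Agent"] = random.choice(self.USER_AGENTS)
        return None  # None tells Scrapy to keep processing this request normally
```

To enable it, add it to `DOWNLOADER_MIDDLEWARES` in `settings.py`, for example `{"myproject.middlewares.RandomUserAgentMiddleware": 543}`, where `myproject` stands for your own project package.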

Python Web Scraping Study Notes 2-2: Scrapy Item Pipelines

2019/11/16 Sunday

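In the previous section, we introduced the Scrapy framework, its installation, and basic usage. We mentioned that the main jobs of item pipelines are cleaning and validating data, checking for and dropping duplicates, and storing data in a database. This section explains how to use Scrapy item pipelines.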
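
The duplicate-dropping job mentioned above fits in a small pipeline class. Scrapy calls `process_item(item, spider)` for each item. Returning the item passes it on to the next pipeline, and raising `DropItem` discards it. This is a minimal sketch that assumes each item has a unique `url` field (a hypothetical field name). The import fallback only lets the sketch run without Scrapy installed.

```python
try:
    from scrapy.exceptions import DropItem
except ImportError:  # fallback so this sketch runs without Scrapy
    class DropItem(Exception):
        pass

class DuplicatesPipeline:
    """Drop items whose 'url' field has already been seen (hypothetical field)."""

    def __init__(self):
        self.seen = set()

    def process_item(self, item, spider):
        key = item["url"]
        if key in self.seen:
            raise DropItem(f"Duplicate item: {key}")
        self.seen.add(key)
        return item  # hand the item to the next pipeline
```

The pipeline is enabled through the `ITEM_PIPELINES` setting, for example `{"myproject.pipelines.DuplicatesPipeline": 300}`, with `myproject` standing for your own package.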

Python Web Scraping Study Notes 2-1: The Scrapy Framework

2019/11/17 Sunday

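The previous sections mainly used Requests to scrape web pages. Requests works well for small-scale scraping, but large-scale, concurrent crawling calls for Scrapy. This section covers Scrapy's basic architecture, its installation, and basic usage.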

Python Web Scraping Study Notes 1-4: Scraping Web Pages with Selenium

2019/08/11 Sunday

Using Taobao product data as an example, this section explains how to scrape web page data with Selenium.

Python Web Scraping Study Notes 1-3: Scraping Ajax-Loaded Pages

2019/08/11 Sunday

Using the list of people Joyenjoye follows as an example, this section explains how to scrape pages whose content is loaded with Ajax or JavaScript.
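
Ajax pages usually fetch their data as JSON from a separate endpoint, which you can find in the browser developer tools' Network tab. Requesting that endpoint directly is often simpler than rendering the page. This is a minimal sketch using only the standard library. The function names and the response shape `{"data": [{"name": ...}]}` are hypothetical, and the demo parses a made-up response string instead of making a live request.

```python
import json
import urllib.request

def fetch_json(url, timeout=10):
    """GET a JSON endpoint (e.g. one found in the Network tab) and decode it."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

def extract_names(payload):
    """Pull user names out of a hypothetical {"data": [{"name": ...}, ...]} response."""
    return [user["name"] for user in payload.get("data", [])]

# Offline demo: a made-up response body stands in for fetch_json(<endpoint URL>).
sample = '{"data": [{"name": "alice"}, {"name": "bob"}]}'
print(extract_names(json.loads(sample)))  # ['alice', 'bob']
```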

Python Web Scraping Study Notes 1-2: A First Look at Python Web Scraping

2019/08/11 Sunday

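Using the Douban short reviews of The Little Prince as an example, this section gives a first look at web scraping from three aspects: 1. data acquisition, 2. web page parsing, 3. data storage.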
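
The three steps can be sketched end to end with the standard library alone; the post itself may use other libraries. In this minimal sketch, a fixed HTML string replaces the download step, `html.parser` does the parsing, and `csv` does the storage. The `<span class="short">` markup and the class name `ReviewParser` are hypothetical.

```python
import csv
import io
from html.parser import HTMLParser

class ReviewParser(HTMLParser):
    """Collect the text of every <span class="short"> (assumes no nested <span>)."""

    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs.
        if tag == "span" and ("class", "short") in attrs:
            self.in_review = True
            self.reviews.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_review = False

    def handle_data(self, data):
        if self.in_review:
            self.reviews[-1] += data

# Step 1, acquisition: normally a download; a fixed string stands in here.
html = '<div><span class="short">Beautiful story.</span><span class="short">A classic.</span></div>'

# Step 2, parsing.
parser = ReviewParser()
parser.feed(html)

# Step 3, storage: CSV written to an in-memory buffer instead of a file.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["review"])
writer.writerows([r] for r in parser.reviews)
print(parser.reviews)  # ['Beautiful story.', 'A classic.']
```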

Python Web Scraping Study Notes 1-1: What Is a Web Scraper?

2019/08/11 Sunday

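This section gives a comprehensive introduction to web scraping from three angles: 1. the definition and use cases of web scrapers, 2. basic scraping concepts, 3. the robots exclusion protocol (robots.txt).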
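
Python's standard library can check robots.txt rules before a crawler requests a page. This is a minimal sketch with `urllib.robotparser`. The rules, the user agent `my-crawler`, and the example.com URLs are made up. Against a real site you would call `rp.set_url(...)` and `rp.read()` instead of `rp.parse(...)`.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt. urllib.robotparser applies the first rule that matches,
# in file order, so the more specific Disallow line comes first.
rules = """\
User-agent: *
Disallow: /private/
Allow: /
""".splitlines()

rp = RobotFileParser()
rp.parse(rules)
print(rp.can_fetch("my-crawler", "https://example.com/books/1"))      # True
print(rp.can_fetch("my-crawler", "https://example.com/private/data"))  # False
```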
