Joyenjoye

It is all about what Joye enjoys

2021/12/30 Thursday

Machine Learning:K-Means Mathematics

This post covers the mathematics behind K-Means. Specifically, it covers the following: Cost Function, Optimization, Initialization, How to Choose the Number of Cluster...

Read More →

2021/12/29 Wednesday

Machine Learning:K-Means Overview

The K-Means algorithm is one of the most widely used clustering methods in practice. It is categorized as unsupervised learning, which learns from unlabelled data instead of from...

Read More →

2021/12/28 Tuesday

Machine Learning:Overview

This post gives an overview of the different types of machine learning, as well as the notation and terminology used.

Read More →

2022/06/15 Wednesday

Machine Learning:Decision Tree Models

Decision tree models can be used for both regression and classification tasks. They are the building blocks of popular ensemble models such as random forests.

Read More →

2022/06/13 Monday

Python Environment Management

This post covers Python environment and package management with different tools on macOS. Specifically, it covers the following tools: venv, Anaconda, and Miniforge.

Read More →

2022/06/09 Thursday

Machine Learning:Linear Models

Linear models provide simple and fast baselines for more complicated models. When the number of features is large, it can be hard for more complex models to beat linear models.

Read More →

2022/06/08 Wednesday

Machine Learning:Loss Functions

This post covers popular loss functions used in machine learning and deep learning models.

Read More →

2022/06/16 Thursday

Machine Learning:Ensemble Models - Bagging

Bagging is a general strategy that can work with any base model, such as linear models or decision trees.

Read More →

2022/01/08 Saturday

Machine Learning:SVM Mathematics

This post covers the mathematics behind Support Vector Machines (SVM). Specifically, it covers the following: Margin and Support Vector

Read More →

2022/01/06 Thursday

Machine Learning:SVM Overview

Support Vector Machine (SVM), also called the max-margin classifier, is a very popular supervised learning algorithm. It can handle linear or nonlinear classification, regression as well as ou...

Read More →

2022/01/01 Saturday

Machine Learning:Prepare Data for K-Means Clustering

This post covers data preprocessing steps for K-Means Clustering. Specifically, it covers the following:

Read More →

2022/12/30 Sunday

Welcome to Joyenjoye 🤗

Welcome to my blog. My name is Joye. If you can read Chinese, my Chinese name is 李拙 (LI ZHUO). 🤔 Want to know more about me? Check the About tab for my work experiences ...

Read More →

2022/04/15 Friday

NLP Materials

This post collects some overview material on NLP.

Read More →

2020/04/25 Saturday

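Paper Summary: GloVe - Global Vectors for Word Representation

For a given word $k$, the ratio of its probabilities of appearing in two different contexts $i$ and $j$, $\frac{P_{ik}}{P_{jk}}$, can be used to distinguish its meaning.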

Read More →

2020/04/05 Sunday

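Paper Summary: XGBoost - A Scalable Tree Boosting System

Gradient tree boosting is a highly effective method for classification problems in machine learning, and is often applied to ad click-through rate prediction and machine learning competitions. In 2014, building on the traditional tree boosting model, the authors proposed XGBoost and released the corresponding toolkit. Because of its fast computation and strong model performance, XGBoost is widely used in data competitions of all kinds, including: store sales prediction, web text classification, click-through rate, ...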

Read More →

2020/03/12 Friday

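Paper Summary: From Word Embeddings to Document Distance

The paper proposes the Word Mover’s Distance (WMD) for computing the distance between documents. The distance between documents is treated as a weighted average of the word-to-word distances in the documents. Word-to-word distances can be computed from the word vectors obtained from a word embedding; the mapping between the words of the two documents is the decision variable, and the objective is to minimize the distance between the documents. The minimum document distance obtained by solving this problem is the Word Mover’s Distance. This optimization problem is Earth’s M...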

Read More →

2020/01/30 Thursday

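Python Web Scraping Study Notes 3-1: Hands-On Project with 58同城

This project aims to scrape information such as location, price, and layout for second-hand housing listings in Chengdu.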

Read More →

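2019/12/25 Wednesday

Python Web Scraping Study Notes 2-3: Scrapy Middleware

The previous section covered how to use Scrapy's item pipelines; this section introduces how to use middleware.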

Read More →

2019/11/16 Sunday

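Python Web Scraping Study Notes 2-2: Scrapy Item Pipelines

The previous section introduced the Scrapy framework, its installation, and basic usage. It mentioned that the main roles of an item pipeline include cleaning and validating data, checking for and removing duplicates, and storing data in a database. This section explains how to use Scrapy's item pipelines.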

Read More →

2019/11/17 Sunday

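Python Web Scraping Study Notes 2-1: The Scrapy Framework

The earlier chapters mainly used Requests to scrape web pages. For small-scale scraping, Requests meets the need effectively, but large-scale, multi-threaded scraping calls for Scrapy. This section covers the basic architecture of Scrapy, its installation, and basic usage.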

Read More →

2019/08/11 Sunday

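Python Web Scraping Study Notes 1-4: Scraping Web Pages with Selenium

Using Taobao product data as an example, this section explains how to scrape web page data with Selenium.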

Read More →

2019/08/11 Sunday

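Python Web Scraping Study Notes 1-3: Scraping AJAX-Loaded Pages

Using the people Joyenjoye follows as an example, this section explains how to scrape pages loaded via AJAX or JavaScript.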

Read More →

2019/08/11 Sunday

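Python Web Scraping Study Notes 1-2: A First Look at Python Web Scraping

Using Douban short reviews of The Little Prince as an example, this section gives a first look at web scraping from three angles: 1 data acquisition 2 page parsing 3 data storage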

Read More →

2019/08/11 Sunday

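Python Web Scraping Study Notes 1-1: What Is a Web Scraper

This section gives a full introduction to web scraping through three points: 1 the definition and use cases of scrapers 2 scraping fundamentals 3 the robots protocol.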

Read More →