## The Tricky Conditional Probability

Date: 23 January 2021. Recently, I received an image saying that the Covid-19 test accuracy for positive cases is 93% and for negative cases is 99%. Does it say anything good or bad about the Covid-19 test? Are 93% and 99% good? Or is it bad that the test missed 7% of the positive cases? …
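As a rough illustration of why the two percentages alone do not settle the question, here is a minimal Python sketch of Bayes' theorem applied to the numbers above. It assumes that "accuracy for positive cases" means sensitivity (93%) and "accuracy for negative cases" means specificity (99%). The prevalence values are hypothetical and chosen only for illustration; they do not come from the image or from any real data.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(infected | test positive) using Bayes' theorem."""
    # Share of the whole population that is infected AND tests positive.
    true_positive = sensitivity * prevalence
    # Share of the whole population that is healthy BUT tests positive.
    false_positive = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive / (true_positive + false_positive)


# Figures quoted in the image (interpreted as sensitivity and specificity).
SENSITIVITY = 0.93
SPECIFICITY = 0.99

# Hypothetical prevalence levels, for illustration only.
for prevalence in (0.001, 0.01, 0.1):
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.1%}  P(infected | positive)={ppv:.1%}")
```

The same test gives very different answers to "I tested positive, am I infected?" depending on how common the disease is, which is why the conditional probability is tricky.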

## Some Beginner Resources for Learning Data Visualization in Tableau

Date: 06 Jan 2021. Recently, I have been preparing the launch of Tableau at my company. I have therefore collected some useful resources and want to share them with my readers. General Learning Path: Understand the terminology used in Tableau (you may refer to this link). Understand the Tableau environment, meaning the functions of buttons and cards …

## Data Governance vs. Data Engineering vs. Data Analysis

Date: 31 Dec 2020. Like most of you, when I first explored a career in the data field, I wanted to be a data analyst. It is the sexiest job of the 21st century (according to the Harvard Business Review article "Data Scientist: The Sexiest Job of the 21st Century"). At that time, I didn’t …

## Typical Credit Risk Modelling

Date: 6 Dec 2020. I will share what I know, and some thoughts, on developing a credit risk model. Expected Loss: Expected loss is calculated to determine how much capital a bank or financial institution (FI) should hold. The expected loss is not the only factor in determining capital requirements. There are other factors, …

## Predicting House Price in Hong Kong #8: KNN

In #7, I talked about choosing adjusted R-squared as the error metric. In this article, I am going to summarize the features of the k-nearest neighbours (KNN) model, the simplest model to use in predicting house prices. The reason I want to use KNN is just that it is simple. I hope the …

## Predicting House Price in Hong Kong #7: Selecting Error Metric

In #6, I talked about building a Minimum Viable Product (MVP). In this post, I am going to discuss the different metrics for evaluating a regression model and my choice for this problem. NOTATIONS: n is the number of testing samples. y_i is the i-th predicted value. x_i is the i-th actual …

## Predicting House Price in Hong Kong #6

It has been a while since I made the last git commit in the project repository. Although I have been a bit busy with my day job recently, I will keep the side project rolling. This blog post is about my recent reflections on the project direction. There are two goals I want to achieve in …

## Predicting House Price in Hong Kong #5

In #4, I talked about how to find the hidden APIs of the Centaline Android app. In this article, I am going to show you how to write a scraper to scrape the data. You can find the Scrapy crawler in this Git directory. 3 Levels of Data: The scraping was done in 3 levels. First, …

## Predicting House Price in Hong Kong #4

Date: 26 July 2020. In #3, I ran into a lot of missing values in the property transaction data. After searching the web, I found that Centaline claims to have spent 10 million HKD to fill in the missing values. Maybe I should try scraping data from Centaline. Luckily, I have scraped Centaline data …

## Predicting House Price in Hong Kong #3

Date: 13 July 2020. In part 2, we talked about splitting data into a training set and a testing set. In part 3, I would like to share some findings from the first data exploration. Big Problem: Missing Data. I did not expect so much missing data in some of the key fields. I …
