17.11.20

CatBoost for big data


For structured, heterogeneous data, gradient boosting is the way to go.

For all of the hoo-ha about deep learning, the most widely used machine learning algorithm is still either logistic regression or gradient boosted decision trees. Gradient boosting is a method whereby you iteratively fit simple models (typically shallow trees) to your data, with each new model fitted to the errors of the ensemble built so far. It tends to produce good predictions on medium to large datasets.
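To make that concrete, here is a minimal sketch of the boosting loop for squared-error loss, using scikit-learn trees purely for illustration (the real libraries add regularisation, learning-rate schedules and much more):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gradient_boost(X, y, n_rounds=100, learning_rate=0.1, max_depth=3):
    """Fit n_rounds shallow trees, each to the residuals of the ensemble so far."""
    base = y.mean()
    pred = np.full(len(y), base, dtype=float)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred                      # errors of the current ensemble
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)                    # shallow tree fitted to the errors
        pred += learning_rate * tree.predict(X)   # small step toward correcting them
        trees.append(tree)
    return base, trees

def boosted_predict(base, trees, X, learning_rate=0.1):
    return base + learning_rate * sum(tree.predict(X) for tree in trees)
```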

This paper reviews CatBoost which, alongside XGBoost and LightGBM, is one of the most popular gradient boosting implementations. It is particularly well suited to categorical data (hence the name) but doesn't work well with homogeneous numeric data such as images. The paper compares the implementations and describes applications in fields such as psychology, transport and chemistry.
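As a quick illustration of why CatBoost suits this kind of data, the sketch below fits a classifier on a made-up mixed table. The column names and toy values are invented for the example, but the pattern of passing raw categorical columns via cat_features, with no one-hot encoding, is how the library is used:

```python
import pandas as pd
from catboost import CatBoostClassifier

# Toy data: two categorical columns, one numeric column, binary target.
df = pd.DataFrame({
    "city":      ["london", "paris", "london", "berlin", "paris", "berlin"],
    "device":    ["mobile", "desktop", "desktop", "mobile", "mobile", "desktop"],
    "age":       [34, 52, 23, 41, 36, 29],
    "converted": [1, 0, 0, 1, 1, 0],
})

X = df.drop(columns="converted")
y = df["converted"]

model = CatBoostClassifier(iterations=200, depth=4, verbose=False)
model.fit(X, y, cat_features=["city", "device"])  # categorical columns handled natively

print(model.predict_proba(X)[:, 1])
```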



If you enjoyed this then please sign up for a weekly 5-minute briefing on AI and data strategy.
