In today’s blog post I will look at historical data from the Tour de France. This data was used in the TidyTuesday series back in April, but I thought I’d take a closer look at it now, as I am currently suffering from Tour de France withdrawal symptoms (a July without the Tour de France is like December without Christmas).
First, let’s load the data and have a look at the structure.
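To make that loading step concrete, here is a minimal sketch using the tidytuesdayR package. The release date ("2020-04-07") and the dataset name (tdf_winners) are my assumptions about the April TidyTuesday round, so double-check them against the TidyTuesday repository.

```r
# Hedged sketch: load the Tour de France TidyTuesday data and inspect it.
# The date and dataset name below are assumptions, not confirmed by the post.
library(tidyverse)
library(tidytuesdayR)

tt_data <- tt_load("2020-04-07")   # downloads all files for that week
tdf_winners <- tt_data$tdf_winners

glimpse(tdf_winners)               # overview of columns and types
```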
Today’s blog post is written in collaboration with my talented colleague Eivind Kvitstein, an exceptionally versatile southerner who is an actuary, data scientist, auditor and now also a hobby epidemiologist.
The Covid-19 virus has spread across the whole world and turned our everyday lives upside down. In online newspapers such as VG we can follow the development in the number of deaths and the number of infected, both in Norway and worldwide. This has led to a number of misinterpretations of the data.
Today I will look at how to connect to the Strava API and do some quick analysis of the activity data.
To use the Strava API, first register a developer app at strava.com. After doing this, you will receive your API credentials, including your personal secret and client id. Here, I have added my secret to keyring using key_set before fetching it in the code below.
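As an illustration, a minimal authentication sketch with httr and keyring could look like the following. The keyring service names ("strava_client_id", "strava_secret") are placeholders I chose, and the endpoint URLs should be verified against Strava’s API documentation.

```r
# Hedged sketch: OAuth2 authentication against the Strava API with httr.
# Keyring service names are placeholders; store them first with key_set().
library(httr)
library(keyring)

client_id     <- key_get("strava_client_id")
client_secret <- key_get("strava_secret")

strava_app <- oauth_app("strava", key = client_id, secret = client_secret)

strava_endpoint <- oauth_endpoint(
  authorize = "https://www.strava.com/oauth/authorize",
  access    = "https://www.strava.com/oauth/token"
)

token <- oauth2.0_token(
  endpoint = strava_endpoint,
  app      = strava_app,
  scope    = "activity:read_all"
)

# Fetch the most recent activities for the authenticated athlete
resp <- GET(
  "https://www.strava.com/api/v3/athlete/activities",
  config = config(token = token)
)
activities <- content(resp)
```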
In today’s post, I will go through how to get started with solving Kaggle competitions in R using e.g. xgboost and recipes. The competition used in this example is IEEE-CIS Fraud Detection: https://www.kaggle.com/c/ieee-fraud-detection/overview.
In just 100 lines of code and without creating any new features, we will create an xgboost model which puts us at 93.5% AUC. This is quite far down the leaderboard, as the competition is fierce, but it is actually only 3 percentage points away from the current leader.
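As a rough outline of what such a pipeline can look like, here is a hedged sketch combining recipes and xgboost. The file name (train_transaction.csv) and target column (isFraud) come from the competition’s data page; the specific preprocessing steps and hyperparameters are illustrative, not the post’s actual 100-line solution.

```r
# Hedged sketch: basic recipes preprocessing + xgboost model for the
# IEEE-CIS Fraud Detection data. Steps and parameters are illustrative.
library(tidyverse)
library(recipes)
library(xgboost)

train <- read_csv("train_transaction.csv")

rec <- recipe(isFraud ~ ., data = train) %>%
  step_impute_median(all_numeric_predictors()) %>%
  step_novel(all_nominal_predictors()) %>%
  step_dummy(all_nominal_predictors(), one_hot = TRUE) %>%
  prep()

train_baked <- bake(rec, new_data = NULL)

dtrain <- xgb.DMatrix(
  data  = as.matrix(select(train_baked, -isFraud)),
  label = train_baked$isFraud
)

model <- xgb.train(
  params  = list(objective = "binary:logistic", eval_metric = "auc",
                 eta = 0.1, max_depth = 6),
  data    = dtrain,
  nrounds = 200
)
```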
Today I will discuss a few common errors in statistical modelling/machine learning, and how to avoid them. Admittedly, I have made some of these mistakes myself, and some I have simply observed (more than once).
1. The time-travelling model
If we were actually capable of time travel, we wouldn’t have to spend so much time creating predictive models! (And we would have to find another job.)
However, I have often seen people build models that time-travel by mistake.
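To show what avoiding this looks like in practice, here is a small hedged sketch: split on a cutoff date so the model only ever sees information from before the period it is evaluated on, and estimate any preprocessing parameters on the training period alone. The data frame and column names (df, event_date, x, y) are hypothetical.

```r
# Hedged sketch: a time-based split that prevents the model from
# "travelling" into the future. df, event_date, x and y are hypothetical.
library(dplyr)

cutoff <- as.Date("2019-01-01")

train <- filter(df, event_date <  cutoff)
test  <- filter(df, event_date >= cutoff)

# Estimate scaling parameters on the training period only, then apply
# them unchanged to the test period - otherwise information leaks
# backwards in time.
mu    <- mean(train$x, na.rm = TRUE)
sigma <- sd(train$x, na.rm = TRUE)

train <- mutate(train, x_scaled = (x - mu) / sigma)
test  <- mutate(test,  x_scaled = (x - mu) / sigma)

fit  <- glm(y ~ x_scaled, data = train, family = binomial())
pred <- predict(fit, newdata = test, type = "response")
```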
Few things are more frustrating than writing code that is mind-numbingly slow when you have no idea how to fix it, so you just end up waiting several minutes each time your code runs. Now, this isn’t always a huge problem (maybe you needed a break anyway). However, for code in production, speed is absolutely essential.
In this blog post, I will go over 5 tips that may help you speed up your R code.
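To give a flavour of the kind of gains involved, here is a small example of my own (not one of the post’s five tips): replacing a vector-growing loop with a vectorised expression and timing both.

```r
# Illustrative example: growing a vector inside a loop vs. vectorisation.
x <- runif(1e5)

slow_square <- function(x) {
  out <- c()
  for (i in seq_along(x)) {
    out <- c(out, x[i]^2)   # copies the whole vector on every iteration
  }
  out
}

fast_square <- function(x) x^2   # vectorised, single pass

system.time(slow_square(x))
system.time(fast_square(x))
```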
Hello, and welcome to my first blog post!
In this post we will combine two of my primary interests: cycling and data analysis. We will look at the Bergen Bicycle Data, a dataset (available through a public API) consisting of all data from the rental bikes that have been placed all over Bergen.
In this first part, we will do some exploratory analysis and get familiar with the data at hand.
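As a starting point, a hedged sketch of pulling the station information from the public API could look like this. The GBFS URL is my assumption about where the feed lives, and the selected fields follow the GBFS standard; check the provider’s documentation for the exact endpoints.

```r
# Hedged sketch: fetch city-bike station information from a GBFS feed.
# The URL is an assumption; verify it against the official documentation.
library(jsonlite)
library(dplyr)

url <- "https://gbfs.urbansharing.com/bergenbysykkel.no/station_information.json"

stations <- fromJSON(url)$data$stations %>%
  as_tibble() %>%
  select(station_id, name, lat, lon, capacity)

glimpse(stations)
```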