Recently, I needed a few location points to create a test for a visualization.
To make it realistic, I decided to see if I could use the data I create every day when I use Google Maps.
Why am I not shy about sharing this data? Simply because it is outdated: I have recently moved. (It would be a shame to be robbed by a data-savvy burglar because of a blog post :) ).
I ended up finding a lot of interesting things:
Over two years, Google had stored 12K points with associated time stamps.
This data says a lot about me:
Where I work
Where I live
My typical day
How I move: by car, by bike or on foot.
All in all, this data set reflects my weekly routine pretty accurately and is perfect for advertisers and more. Well done, Google.
Presentation of the Dataset
I went to the Takeout manager of my Google account to see if I could use the data from my Google Maps account.
The process is fairly easy and fast. In my case, I was able to get 12K coordinates over 2 years of use of Google products.
In addition to the coordinates, Google also provides a time stamp and a bunch of sparsely filled variables: velocity, altitude, accuracy, and activity with a reliability score.
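To give an idea of what a record looks like once parsed, here is a minimal Python sketch. The field names (timestampMs, latitudeE7, longitudeE7, activitys) are my assumption about the export format of that period, and the sample record is made up for illustration; check them against your own file.

```python
from datetime import datetime, timezone

def parse_location(record):
    """Turn one raw record into plain Python values.

    Field names are assumed, not taken from the post.
    """
    return {
        # Time stamps are assumed to be milliseconds since the Unix epoch.
        "time": datetime.fromtimestamp(int(record["timestampMs"]) / 1000, tz=timezone.utc),
        # Coordinates are assumed to be stored as integers scaled by 1e7.
        "lat": record["latitudeE7"] / 1e7,
        "lon": record["longitudeE7"] / 1e7,
        # The optional variables are often missing, so default to None.
        "accuracy": record.get("accuracy"),
        "velocity": record.get("velocity"),
        "altitude": record.get("altitude"),
        "activities": record.get("activitys", []),
    }

# Made-up sample record (central London, 1 March 2016, midnight UTC).
sample = {
    "timestampMs": "1456790400000",
    "latitudeE7": 515074000,
    "longitudeE7": -1278000,
    "accuracy": 25,
}
point = parse_location(sample)
```

Using .get() for the optional variables keeps the parser from failing on the many rows where they are absent.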
Heat Map of my Historical Positions
The data set is available on my GitHub account. I have limited it to London and 2 years.
The first thing to do with this data set is to create a heat map to visually assess the content.
Here, one difficulty is to interpret the density scale: What exactly are we representing?
Read the Scale
The density is computed with a two-dimensional kernel density estimate, evaluated at each point of a grid. In this case the grid has 100 * 100 points.
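For readers who want to see what such an estimate involves, here is a minimal sketch in plain Python. It is not the code behind the map: the Gaussian kernel, the fixed bandwidth and the function name kde_grid are all assumptions made for illustration.

```python
import math

def kde_grid(points, n=100, bandwidth=0.01):
    """Evaluate a 2D Gaussian kernel density estimate on an n x n grid.

    points: list of (x, y) pairs. bandwidth: kernel standard deviation,
    in the same units as the coordinates. Requires n >= 2.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1 = min(xs), max(xs)
    y0, y1 = min(ys), max(ys)
    # Evenly spaced grid covering the bounding box of the data.
    grid_x = [x0 + (x1 - x0) * i / (n - 1) for i in range(n)]
    grid_y = [y0 + (y1 - y0) * j / (n - 1) for j in range(n)]
    # Normalisation so that the density integrates to 1 over the plane.
    norm = 1.0 / (2 * math.pi * bandwidth ** 2 * len(points))
    density = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = sum(
                math.exp(-((gx - x) ** 2 + (gy - y) ** 2) / (2 * bandwidth ** 2))
                for x, y in points
            )
            row.append(norm * total)
        density.append(row)
    return grid_x, grid_y, density

# Tiny made-up example: two points close together and one far away.
demo_points = [(0.0, 0.0), (0.0, 0.1), (1.0, 1.0)]
gx, gy, dens = kde_grid(demo_points, n=5, bandwidth=0.5)
```

The grid cell nearest the two clustered points gets the highest value, which is exactly what the hot spots on the heat map represent.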
A short calculation gives the area of one grid square: 53,290 square meters.
A density of 6 can then be read as exp(6), or about 403 points per 53,290 square meters, which is roughly 76 points per hectare.
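The conversion can be checked in a few lines of Python. The cell area is the figure quoted above; the function name is just for illustration.

```python
import math

SQUARE_METERS_PER_HECTARE = 10_000

def points_per_hectare(log_density, cell_area_m2):
    """Convert a log-scale density for one grid cell into points per hectare."""
    points_in_cell = math.exp(log_density)
    return points_in_cell * SQUARE_METERS_PER_HECTARE / cell_area_m2

per_cell = math.exp(6)                        # about 403 points in one cell
per_hectare = points_per_hectare(6, 53_290)   # about 76 points per hectare
```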
Heat Map per Time Period
Is it possible to identify specific places at specific times?
Let’s have a look at the full time period:
Strangely, no data appears between November 2015 and March 2016.
Most time stamps fall between 6 and 10 p.m.
That seems logical to me, as it is the time of day when I am the most active.
More surprisingly, there are points recorded during the night. It may be nice to create a heat map per hour to see if we can predict where I will be at certain hours:
As the graph shows, I am pretty quiet during the night, wake up in Acton between 8 and 9 a.m., and run around London for the rest of the day, coming back home between 7 and 10 p.m.
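Splitting the points by hour is the step that feeds both the hourly counts and the per-hour heat maps. A minimal sketch, assuming millisecond time stamps and using UTC by default (pass another tz for local time); the sample points are made up:

```python
from collections import defaultdict
from datetime import datetime, timezone

def group_by_hour(points, tz=timezone.utc):
    """Bucket (timestamp_ms, lat, lon) tuples by hour of day."""
    buckets = defaultdict(list)
    for ts, lat, lon in points:
        hour = datetime.fromtimestamp(int(ts) / 1000, tz=tz).hour
        buckets[hour].append((lat, lon))
    return buckets

# Made-up points on 1 March 2016.
sample = [
    ("1456822800000", 51.5081, -0.2674),  # 09:00 UTC
    ("1456855200000", 51.5155, -0.0922),  # 18:00 UTC
    ("1456858800000", 51.5074, -0.1278),  # 19:00 UTC
    ("1456858860000", 51.5081, -0.2674),  # 19:01 UTC
]
by_hour = group_by_hour(sample)
# Number of points per hour, e.g. for a bar chart of activity by hour.
hour_counts = {hour: len(coords) for hour, coords in by_hour.items()}
```

Each bucket can then be passed to the density estimate separately to draw one heat map per hour.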
Can the data be broken down by day of the week?
I have more data stored for Saturday and Sunday, which seems normal, as these are the days when I use Google Maps the most.
It reflects a typical week well: I am generally out on Monday and Wednesday, so I stay put on Tuesday. Saturday and especially Sunday are days when I move around London a lot.
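The same idea works for days of the week. This is only a sketch under the same assumptions as before (millisecond time stamps, UTC unless another tz is given), with made-up sample values:

```python
from collections import Counter
from datetime import datetime, timezone

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def count_by_weekday(timestamps_ms, tz=timezone.utc):
    """Count how many points were recorded on each day of the week."""
    counts = Counter()
    for ts in timestamps_ms:
        moment = datetime.fromtimestamp(int(ts) / 1000, tz=tz)
        # weekday() returns 0 for Monday through 6 for Sunday.
        counts[DAY_NAMES[moment.weekday()]] += 1
    return counts

sample_ms = [
    "1456833600000",  # Tuesday 1 March 2016, 12:00 UTC
    "1457179200000",  # Saturday 5 March 2016, 12:00 UTC
    "1457182800000",  # Saturday 5 March 2016, 13:00 UTC
]
weekday_counts = count_by_weekday(sample_ms)
```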
The data set also contains an activity variable, which is a list of activities that are not very well defined.
4,072 rows include a list of activities, which is 38.8% of the rows.
Activities have time stamps of their own.
In total, 79,392 activities are recorded, an average of 7 per location. Each one comes with a confidence score, though. If we keep only the highly confident ones, with a score over 90, we are left with 16,657 rows.
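Because the activities are nested inside each location, they have to be flattened before they can be counted or filtered. Here is a minimal sketch; the nested layout (an activitys list holding readings, each with its own timestampMs and an activities list of type/confidence guesses) is my assumption about the export format, and the sample data is made up. The filter keeps scores of 90 or above.

```python
def flatten_activities(locations):
    """Produce one row per activity guess, keeping both time stamps."""
    rows = []
    for loc in locations:
        for reading in loc.get("activitys", []):
            for guess in reading.get("activities", []):
                rows.append({
                    "location_ms": int(loc["timestampMs"]),
                    "activity_ms": int(reading["timestampMs"]),
                    "type": guess["type"],
                    "confidence": guess["confidence"],
                })
    return rows

def high_confidence(rows, threshold=90):
    """Keep only the guesses whose confidence is at or above the threshold."""
    return [r for r in rows if r["confidence"] >= threshold]

# Made-up sample: one location with activity readings, one without.
locations = [
    {
        "timestampMs": "1456822800000",
        "activitys": [
            {"timestampMs": "1456822920000",
             "activities": [{"type": "still", "confidence": 92},
                            {"type": "inVehicle", "confidence": 5}]},
        ],
    },
    {"timestampMs": "1456855200000"},
]
all_rows = flatten_activities(locations)
confident_rows = high_confidence(all_rows)
```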
Most activity time stamps fall a couple of minutes after the time stamp of the coordinates.
However, the delays also show a decreasing sinusoidal trend, with peaks at 24h, 48h and 64h.
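The delay between the two time stamps is easy to compute and bin. A minimal sketch, assuming both time stamps are in milliseconds; the pairs below are made up and the one-hour bin width is an arbitrary choice:

```python
from collections import Counter

def lag_minutes(location_ms, activity_ms):
    """Delay of the activity time stamp after the location time stamp."""
    return (activity_ms - location_ms) / 60_000

def lag_histogram(pairs, bin_minutes=60):
    """Count delays per bin; bin k covers [k, k + 1) * bin_minutes."""
    counts = Counter()
    for location_ms, activity_ms in pairs:
        bin_index = int(lag_minutes(location_ms, activity_ms) // bin_minutes)
        counts[bin_index] += 1
    return counts

# Made-up (location_ms, activity_ms) pairs.
pairs = [
    (0, 120_000),      # 2 minutes later
    (0, 90_000),       # 1.5 minutes later
    (0, 86_400_000),   # exactly 24 hours later
]
hist = lag_histogram(pairs)
```

With one-hour bins, the daily peaks described above would show up as bumps around bins 24 and 48.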
I have neither a car nor a bike, so I was astonished to see so many “inVehicle” and “onBicycle” tags. The reason there are so many of them is that they come with a low confidence. When limiting to a confidence of 90 or above, the activities finally make sense.
Most of my activities are standing still and tilting. I tilt when I ride my unicycle, so it makes sense to see so many of those tags, and I stand still mainly at work and at home.
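To see how much the confidence filter changes the picture, the share of each activity type can be compared before and after filtering. A minimal sketch with made-up (type, confidence) pairs; the type names other than “inVehicle” and “onBicycle” are assumptions:

```python
from collections import Counter

def type_shares(guesses, min_confidence=0):
    """Share of each activity type among guesses at or above min_confidence."""
    counts = Counter(t for t, confidence in guesses if confidence >= min_confidence)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {activity: n / total for activity, n in counts.items()}

# Made-up guesses: the vehicle and bicycle tags only appear with low confidence.
guesses = [
    ("still", 100), ("inVehicle", 30), ("tilting", 100),
    ("onBicycle", 12), ("still", 95), ("onFoot", 91),
]
all_shares = type_shares(guesses)
confident_shares = type_shares(guesses, min_confidence=90)
```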
Is it possible to predict that way if a person has a car or a bike? Yes, definitely.
I have created a graph by type that can be found here.
In that graph, it is possible to see where I am standing still and where I am just passing through.