App Analysis: Airbnb

This blog series focuses on examining the collection of device data by various popular mobile applications. This data is often collected in the name of advertising, error monitoring, fraud detection, and social media integration.


The application featured in this blog is Airbnb, a marketplace for arranging and offering lodging. Whether Airbnb is "disrupting" the hotel industry or quiet neighborhoods, the fact remains that Airbnb provides this platform to ~150 million users for booking temporary lodging.

Bookings are often added, searched for, or made through the Airbnb mobile application, which communicates data back to Airbnb servers. Alongside the Airbnb data, device data is collected by the third-party partners whose tools Airbnb packages in its application. This data is anything but minimal, containing device information whose usefulness and practicality are hard to decipher.

Servers communicated with while using the Airbnb iOS application

Who's collecting what and why?

Airbnb's iOS application makes use of four data collection tools: Sift Science, Bugsnag, mParticle, and Facebook's Graph API. The size, granularity, and underlying reason for collecting the device data differ for each tool. The following focuses on Sift Science, as it collects the largest and most detailed set of data from the device.

Sift Science: Knows I'm facing North-East

Sift Science collects device data with the aim of preventing payment fraud, account abuse, and platform misuse. At the core of Sift Science is a system which uses machine learning to achieve these goals. Airbnb leverages this tool to improve user experience and protect itself from potential fraud. However, it is not clear whether the data collected by Sift Science solely benefits Airbnb, as monetizing data is a common practice among analytics tools.

Sample of device data collected by Sift Science

Google's Director of Research Peter Norvig is often quoted as saying "We don't have better algorithms than anyone else; we just have more data"; this seems to have inspired Sift Science. By collecting swaths of device data, presumably used to train their machine-learned models, Sift Science is able to provide a robust detection capability. In the world of machine learning and data collection, less is never more. However, not all data is created equal; it's easy to understand why Sift Science would want to collect a device's type, but less so why they collect the device's compass orientation.

Sift Science collects the orientation of the device
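
To make the "facing North-East" point concrete, here is a minimal sketch of how a raw compass heading can be turned into a human-readable direction. It assumes the heading is reported in degrees clockwise from north, which is the usual convention for compass readings. The function name and the eight-point table are illustrative and are not taken from Sift Science's code.

```python
# Eight-point compass labels, clockwise starting from north.
COMPASS_POINTS = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]


def heading_to_compass(heading_degrees: float) -> str:
    """Map a heading in degrees (0 = north, clockwise) to a compass label.

    Hypothetical helper for illustration only.
    """
    # Normalize into [0, 360) so negative or >360 readings still work.
    normalized = heading_degrees % 360.0
    # Shift by half a sector (22.5 degrees) so that each label covers the
    # 45-degree arc centred on it, then pick the sector index.
    index = int((normalized + 22.5) // 45) % 8
    return COMPASS_POINTS[index]


if __name__ == "__main__":
    print(heading_to_compass(45.0))
```

A single number like this says nothing about the device itself, but it does describe what the person holding it is doing at that moment.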

Sift Science's data model, available on GitHub, shows that their service also collects a user's location, direction, and speed. For more information on the type of data retrieved by Sift Science, refer to their iOS client's source code, which documents the device data retrieved and is available here and here.

What users can do about it

If any user feels uncomfortable with the amount of data they send to Airbnb's third-party partners, there are methods to prevent this. One is to stop the DNS server they use from resolving the following names: api3.siftscience.com, notify.bugsnag.com, graph.facebook.com, and *.mparticle.com.

It should be noted that this method is heavy-handed and will stop other applications from using these tracking tools as well. For example, the Graph API provided by Facebook is prevalent in many applications and its loss may cause unexpected behavior. That being said, Sift Science, Bugsnag, and mParticle are strictly for data collection. Blocking them should not cause an application to degrade.
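
As a rough illustration of the decision a filtering DNS resolver has to make, the sketch below checks a queried hostname against the names listed above. It is not a DNS server and does not configure one. The names `BLOCKED_PATTERNS` and `is_blocked` are hypothetical, and the wildcard is assumed to cover any subdomain of mparticle.com but not the bare domain.

```python
from fnmatch import fnmatch

# Names listed in this post. "*.mparticle.com" is treated as a wildcard
# pattern covering any subdomain (an assumption about the intended scope).
BLOCKED_PATTERNS = [
    "api3.siftscience.com",
    "notify.bugsnag.com",
    "graph.facebook.com",
    "*.mparticle.com",
]


def is_blocked(hostname: str) -> bool:
    """Return True if the hostname matches one of the blocked patterns.

    Hypothetical helper for illustration only.
    """
    # DNS names are case-insensitive and may carry a trailing root dot,
    # so normalize before comparing.
    name = hostname.strip().rstrip(".").lower()
    return any(fnmatch(name, pattern) for pattern in BLOCKED_PATTERNS)


if __name__ == "__main__":
    # "example.mparticle.com" is a made-up subdomain used for the demo.
    for host in ["api3.siftscience.com", "example.mparticle.com", "www.airbnb.com"]:
        print(host, is_blocked(host))
```

A real resolver would answer a blocked query with a failure or an unroutable address. Note that a plain hosts file cannot express the wildcard entry, so the mParticle names would need a resolver that supports domain-level rules.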

Data model developed by Sift Science for collecting device data

Conclusion

Once again, this is the trade-off between security and usability. Hopefully this blog allows users to make an informed decision about how they share their device data.