Wow - thank you so much for sharing! This is very exciting to hear. And big congratulations to you, Sophie, Andrew, and Paul!
I was very interested to see the different domains using smartphone-GPS data in Figure 2. Since I work with researchers using primarily SafeGraph data, we see several common research topics and use cases emerge within the Community. It was interesting to compare common use cases for SafeGraph data to what you had displayed in Figure 2. For example, I was surprised to see in your literature review that Sports Tracking and Physical Activity was the second most common domain using smartphone-GPS data. That’s one research topic that doesn’t surface in the Dewey Community as frequently as something like Health and Wellbeing. It would be interesting to do a similar analysis and see the different thematic groupings of research by SafeGraph users only. I would guess that the top three groupings would likely be Health and Wellbeing, Movement and Places in the City, and Neighborhoods and Society. I know you’re a little newer to SafeGraph since we recently did onboarding. Just off of your first impressions, what do you think the common groupings would be?
I was looking at the variables used by researchers in Table 1, and it’s interesting to see which ones do and don’t overlap with columns in SafeGraph datasets. I’m not as familiar with work in urban design, and I’m always interested to hear feedback on how we can improve SafeGraph data. Which variables would be helpful in your research area that we might currently be lacking?
I love the discussion on data collection and privacy. We recently had @U01K1G1JAN9 present her research using automotive data from a safe-driving app in conjunction with SafeGraph Geometry. The big focal point of the discussion was: what are the trade-offs between the value of data and consumers’ privacy concerns? Here’s the talk if you want to check it out.
Thank you so much for your comments and feedback. I have a lot to say to answer your questions! Ha! I have been thinking a lot about how data products can be designed to be useful for urban design research while still meeting privacy needs. In terms of thematic groupings I think your reflection is probably accurate. The sports tracking studies were directly related to new sports apps sharing data (like Strava) and there was a burst of related papers for a few years. I haven’t been involved with SafeGraph for long but it appears that data products like SafeGraph are often marketed towards retail and private industry. I also think they are applied in banking, economics and finance. There is also a link to insurance issues, and risks related to natural disasters. There is definitely a connection between these themes and pedestrian activity, but I didn’t see it in the literature we reviewed. Not to say it didn’t exist, it just didn’t come up with our search terms and I expect there will be a lot more overlap in the future.
I am still waiting for SafeGraph Patterns in Canada – I am really excited to use it! I have not yet spent a lot of time with the data so I am not very familiar with the variables and aggregation methods, although I am very interested in how this kind of data is structured to be usable for research. One challenge we are finding is that it is often aggregated by place rather than by user. This means we can learn about the trends of use in a place (and postal codes of users), but it is harder to see how certain groups or types of people travel. The latter is very useful for equity research, health and wellbeing research and for evaluating urban designs. Another challenge is identifying trips and routes of individuals or groups such that they can be joined with other population data. Overall, I think the most important thing for researchers is to be able to join variables with other sources. Many companies focus on providing a service where they join data themselves. As a researcher this is not highly valuable because I will do the joining myself. Instead, being able to cluster (temporally or by user) via specific characteristics that do not follow a consecutive or obvious pattern (for example, specific weather days, or smartphone user characteristics such as socio-economic background) is helpful.
In terms of privacy, there is always much to say. But overall it is true that it is a value vs cost analysis. People are willing to share data if they see the value. My feeling is that in many cases (particularly in urban design) the value has not been very clearly manifested for the public, and thus there is a lot of suspicion or lack of trust.
OK, that’s a lot! But I think it is a very interesting discussion and thank you for the questions. Also, I am happy to share the PDF directly with anyone who can’t access it online.
I think this is a really useful overview of the literature to date! I had a few follow-up questions, particularly considering privacy.
You mention that you include papers from multiple disciplines. I’d love to see the breakdown, particularly over time. Are particular disciplines discovering/turning away from GPS-based papers? Are most papers in urban planning? Public health? I feel like political science (my field) is just starting to discover the promise of GPS-based data, and would show up more in 2021 data than 2018 data, and I’d love to know if that hunch is borne out in the data.
I thought the privacy section was really important, but I was left with a number of questions about protecting individual privacy. For a start, I wasn’t sure what countries the data sets were from – for instance, I have very different privacy concerns in the US/UK/Canada than I might in countries where free movement is more likely to be restricted. Particularly if future studies may focus on marginalized populations, I’d like to know more about where the data is collected and how it is used by authorities.
Even within the US, I wondered if there were any more concrete examples of privacy issues that academics should think about when designing studies. I know there have been a number of privacy concerns raised by company use of location data. I agree that considering this, academics must develop safe and trustworthy systems of information that respect the privacy of participants.
Has the academic use of location information caused any concerns so far? Have researchers been able to re-identify subjects? Have there been news reports about such information being misused or of participants feeling like they did not give truly informed consent, or didn’t understand what their data would be used for? I hope not, and I’m not aware of any, but I think privacy of subjects is one of the most important things to consider going forward with GPS-based data. I’d love to read more of your thoughts on how the literature deals with it so far, and how we can improve on that going forward.
Hello @Lauren Gilbert
Thank you so much for the feedback about our paper! I’m happy to answer or discuss your questions and observations. It is always a pleasure.
Overall, I would like to preface by saying that the review we conducted includes a majority of studies published in the last five years, but does extend back to some studies conducted in the mid-2000s. I believe this review covers very preliminary work in this area that will grow exponentially in the next few years. I think if I did this review in 2-3 years we would see a massive shift in the domains, sources and uses of smartphone data in social science research. Many of the papers we reviewed were early in their discipline and working with very few other precedents. I think this sample represents a lot of experimentation in collecting data and figuring out how to use it. In my own discipline (Urban Planning) smartphone data is very new. The majority of studies we reviewed were in public health, transportation or technology fields. That said, the vast majority of publications in this area are in engineering and computer science disciplines, with technology-centered research. This has been ongoing for decades and includes topics like indoor localization, positioning, urban canyons, machine learning methods etc. I think social science disciplines across the board are still very new to this data. I did not see much in political science, but that may also be because they do not refer specifically to “pedestrian behaviors” and so they did not show up in our search.
I agree that privacy is a primary topic of interest for researchers at this point. In terms of collection methods, the review also includes a detailed supplementary data spreadsheet that lists all of the papers that we reviewed, as well as the collection method, and many entries also include the name of the collection device. In these papers I think we see a lot of experimentation and freedom because many privacy issues were not a part of mainstream discussion, and I believe many researchers were still learning the limitations and challenges related to privacy. I believe the major challenge for researchers in the field will be developing usable data sets that also protect privacy. Our review did not focus on this, and we did not review papers that covered privacy methods, although there are many available and it is an independent topic of research in its own right. Privacy in the studies we reviewed was mostly addressed through university research ethics boards, but new privacy-ensuring methods will likely become more prevalent in future. In addition, most of our studies were in Europe and North America with a few in Asia. Yes, ethics and privacy are very different depending on the country you are from. For example, very large data sets directly from phone companies were used in Japan, but would not be available in other countries. I do not know of cases where smartphone location information collected by academics has become a privacy concern. I also do not know if any participants of studies have publicly declared that they felt their rights or privacy were compromised, or have complained in the media. I do think that the public is becoming more educated about these topics and in the future may be better able to make an informed decision about participating in data collection projects. In 2009 many people did not understand location data enough to make an informed decision about sharing it.
I think that this public awareness goes hand-in-hand with the development of anonymization methods. Informed consent and data security are two essential pillars of the ethical use of personally revealing data for research.