Wondering what's going on with these particular CBGs and why there's a significantly higher number of device counts here?

Hello! My team and I are producing choropleth maps using the Neighborhood Patterns dataset for New York (Jul 2019-Jun 2021).

Our map shows the raw_device_count in each census tract over a 6-month period. We noticed that Tract #003100 (GEOID #36061003100) in the Downtown Manhattan area of NY consistently has a higher concentration of devices than its immediately surrounding areas. The shared screenshot shows the concentrated tract from Jul 2019 - Dec 2019 with over 1 million devices, versus the adjacent tract with fewer than 200k devices.

We are wondering what’s going on with these particular CBGs and why there’s a significantly higher number of device counts here. The one thing we did notice is that it’s around the City Hall area, but other than that, we’re not sure if this is expected or if it has to do with SafeGraph’s methods for counting devices. Just as a note, no normalization is being done in these maps. Any clarifications or suggestions would be much appreciated!

Thank you!

This topic was automatically generated from Slack. You can find the original thread here.

Steve Scott (City of New York, Mayor’…) what do you mean? There are lots of protests at City Hall, but not millions of people. Occasionally there is a ticker tape parade on that tract, but that is infrequent.

Just noting that in the movement data from which the higher-level data are generated, there have sometimes been hotspots of anomalously high density, e.g., NYC dr5regmyk, Moscow ucftpuxvm, Dallas 9vg4mpqg9, …etc.

SafeGraph filters these out intelligently, but when I see ‘City Hall’ and ‘device density’ it rings bells in my memory. Sorry for the noise.

Hi! Thanks for the question. It will take a little further investigation, so your patience is greatly appreciated. Give us a few days (as it is also the weekend) to get you some answers. Thanks, Trang!

Hi - just to keep you in the loop. I’ve asked our Product team for an update on this.

Hi - the short answer, as James noted above, is that sometimes there are sinks in the data (i.e., unnatural locations to which many devices are inaccurately assigned). This does not mean that a lot of people were at City Hall, but that devices from nearby areas may have been assigned a location within this CBG. This owes to the fact that GPS positioning can sometimes be inaccurate. It’s not ideal but is a wart that is inherent in the data.

Here’s a longer explanation:
• GPS works by having your phone receive signals from several satellites at once. It generally needs at least 4 for a full position fix (3 can give a rough 2D estimate). Connecting to more satellites helps it locate itself more accurately, and connecting to too few means that the phone won’t have enough information to locate itself accurately.
• When phones go underground, as one example, their GPS receivers oftentimes can’t connect to the GPS satellites orbiting Earth. When this happens, your phone will sometimes “guesstimate” its position by either reporting its last-known location or by choosing the midpoint of the state or country that it knows it’s in. This latter phenomenon, of course, contributes to sinks, which are locations on a map which see a disproportionate number of location pings that usually aren’t actually accurate.
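One rough way to spot a candidate sink like the one described above is to compare each area's count with the median of its neighbors. This is only a minimal sketch, not SafeGraph's method: the function name, the `neighbors` adjacency map, the `neighbor_*` ids, the 3x threshold, and the counts are all assumptions for illustration (the counts loosely mirror the "over 1 million" versus "less than 200k" figures in the question).

```python
from statistics import median


def flag_possible_sinks(counts, neighbors, ratio_threshold=3.0):
    """Return ids whose count is at least ratio_threshold times the
    median count of their neighbors (hypothetical heuristic)."""
    flagged = []
    for area_id, count in counts.items():
        # Only compare against neighbors we actually have counts for.
        neighbor_counts = [counts[n] for n in neighbors.get(area_id, []) if n in counts]
        if not neighbor_counts:
            continue
        baseline = median(neighbor_counts)
        if baseline > 0 and count / baseline >= ratio_threshold:
            flagged.append(area_id)
    return flagged


# Illustrative values only; the neighbor ids are placeholders, not real GEOIDs.
counts = {
    "36061003100": 1_000_000,
    "neighbor_a": 200_000,
    "neighbor_b": 180_000,
    "neighbor_c": 150_000,
}
neighbors = {
    "36061003100": ["neighbor_a", "neighbor_b", "neighbor_c"],
    "neighbor_a": ["36061003100", "neighbor_b"],
    "neighbor_b": ["36061003100", "neighbor_a", "neighbor_c"],
    "neighbor_c": ["36061003100", "neighbor_b"],
}
suspects = flag_possible_sinks(counts, neighbors)
print(suspects)
```

A flag from a check like this only says the area looks unusual next to its surroundings. It cannot tell a sink apart from a genuinely busy location, so it is a prompt for a closer look rather than a filter.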

…best practices for dealing with sinks would be great metis to share. When there are tiny geohash9s with a million people in the middle of a lake or a park, simple density filtering is generally enough to weed them out. But when they land in denser areas, … I always wonder how to know when filtering is ‘enough.’ They also make effective normalization extremely difficult on all scales.
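As a sketch of what "simple density filtering" could look like, the snippet below drops geohash cells whose count sits far above a robust cutoff (median plus a multiple of the median absolute deviation). The cutoff rule, the multiplier `k`, the `cell_*` ids, and all counts are assumptions for illustration; only `dr5regmyk` comes from the thread, and its count here is made up.

```python
from statistics import median


def split_by_density(cell_counts, k=10.0):
    """Split cells into (kept, dropped) using median + k * MAD as the cutoff."""
    values = list(cell_counts.values())
    med = median(values)
    # Median absolute deviation: a spread estimate that a single huge
    # outlier cannot inflate the way a standard deviation would be.
    mad = median([abs(v - med) for v in values])
    cutoff = med + k * max(mad, 1.0)  # floor avoids a zero-width band
    kept = {c: n for c, n in cell_counts.items() if n <= cutoff}
    dropped = {c: n for c, n in cell_counts.items() if n > cutoff}
    return kept, dropped, cutoff


# Illustrative ping counts per geohash9 cell.
cell_counts = {
    "dr5regmyk": 1_000_000,
    "cell_a": 120,
    "cell_b": 95,
    "cell_c": 140,
    "cell_d": 110,
}
kept, dropped, cutoff = split_by_density(cell_counts)
print(sorted(dropped), cutoff)
```

This works when the sink dwarfs everything around it, as in the lake or park case. In dense areas the legitimate counts are themselves high and variable, so a single global cutoff is much less reliable, which is exactly the open question raised above.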

If it’s at the CBG level, one thing that can be done is to aggregate up to higher levels (e.g., census tracts, cities, states), assuming the application allows for this. We’ve found that the devices in the types of sinks described here usually come from adjacent CBGs or surrounding areas. Aggregating up therefore preserves the consistency of the data.

Of course, this is not always possible!
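For Census-nested levels, aggregating up can be sketched with GEOID prefixes: a 12-digit CBG GEOID is state (2 digits) + county (3) + tract (6) + block group (1), so the first 11 digits identify the tract and the first 5 the county. The function name and the sample counts below are assumptions for illustration. ZIP codes and cities do not nest this way and would need a separate crosswalk, which is not shown.

```python
from collections import defaultdict

# Number of leading GEOID digits that identify each level.
PREFIX_LENGTH = {"state": 2, "county": 5, "tract": 11}


def aggregate_cbg_counts(cbg_counts, level="tract"):
    """Sum per-CBG counts up to the given level using GEOID prefixes."""
    width = PREFIX_LENGTH[level]
    totals = defaultdict(int)
    for cbg_geoid, count in cbg_counts.items():
        # Restore a leading zero that may have been lost if ids were read as integers.
        geoid = str(cbg_geoid).zfill(12)
        totals[geoid[:width]] += count
    return dict(totals)


# Illustrative counts only, not real SafeGraph values.
cbg_counts = {
    "360610031001": 900_000,
    "360610031002": 100_000,
    "360610029001": 150_000,
}
tract_totals = aggregate_cbg_counts(cbg_counts, "tract")
county_totals = aggregate_cbg_counts(cbg_counts, "county")
print(tract_totals)
print(county_totals)
```

One caveat: a device seen in more than one CBG would be counted once per CBG, so summed device counts are better read as an upper bound than as a count of unique devices.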

Hey everyone! Thank you for all the input and responses. The original analysis was done at the tract level and this is where we found a high density of devices. But, aggregating to the zip code level seems to be solving this issue. I’ll relay the information to my team and follow up if there are additional questions. Thanks for looking into it!

Hey! Thanks again for the question. As this thread is several days old, I’m going to go ahead and close it out. If you have any more questions or follow-ups, we’re always here to help! Just be sure to make a new post to safegraphdata, as we aren’t monitoring old threads at this time. Thanks!