Missing CBGs in weekly patterns data


I had the following queries when mapping the CBGs:

  1. I was looking at the census block groups (CBGs) listed under the ‘visitor_home_cbgs’ attribute for specific sub-categories in Louisiana. When I tried to map these CBGs using the file ‘cbg_2020.geojson’ under ‘safegraph_open_census_data_2020_to_2029_geometry’, many of them were not in the file. Could anyone please tell me where I can find the geometry/shapefile for these missing CBGs?
  2. When looking at the CBGs under ‘visitor_home_cbgs’ for one state, I could see CBGs located in other states. Does this mean that these people actually live in other states and were probably present there for only a few days?

The ‘visitor_home_cbgs’ data had 18,827 unique CBGs, of which 2,363 had an ID present in ‘cbg_2020.geojson’ for Louisiana. I have attached an image showing the missing CBGs below. The red-colored CBGs are the missing ones.
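For anyone reproducing this kind of count, here is a minimal sketch of the comparison described above. It assumes each ‘visitor_home_cbgs’ cell is a JSON string mapping a CBG ID to a visitor count, and that the geometry IDs have already been loaded into a set of strings. The function names and the toy values are made up for illustration.

```python
import json

def unique_home_cbgs(visitor_home_cbgs_cells):
    """Collect the distinct CBG IDs across many visitor_home_cbgs cells."""
    cbgs = set()
    for cell in visitor_home_cbgs_cells:
        if not cell:  # skip missing or empty cells
            continue
        # Keys are kept as strings so that leading zeros are not lost.
        cbgs.update(json.loads(cell).keys())
    return cbgs

def split_by_geometry(cbgs, geometry_ids):
    """Return (matched, missing) relative to the IDs in a geometry file."""
    return cbgs & geometry_ids, cbgs - geometry_ids

# Toy data (hypothetical IDs, not real Patterns rows).
cells = [
    '{"220330001001": 5, "220330001002": 4}',
    '{"220330001001": 7, "480010001001": 4}',
    '{}',
    None,
]
geometry_ids = {"220330001001", "220710001001"}

home_cbgs = unique_home_cbgs(cells)
matched, missing = split_by_geometry(home_cbgs, geometry_ids)
print(len(home_cbgs), len(matched), len(missing))
```

Working with sets keeps the "unique CBGs" and "missing CBGs" counts unambiguous, since each ID is counted once no matter how many POIs it appears under.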

Hi @utkarsh! Hopefully we can find a solution to your questions. Thank you for the attached image, it is very helpful!

  1. Which sub-categories of POIs were you looking at here? The visitor_home_cbgs attribute will only include a CBG if it had visitors to the POI. So, for example, if you were filtering to train stations, you might expect that a lot of CBGs would not have any visitors. The SafeGraph docs have more info on this attribute.
  2. It is somewhat common to have visitors coming from a different state than the POI is in. Airports, for example, will have visitors from a variety of states. It is difficult to determine how many days the out-of-state people were there. It could be just one day, or it could be longer.

For the remaining unmatched ~16,500 CBGs in the visitor_home_cbgs data, how many of them began with “22”, which is Louisiana’s FIPS code? The ones beginning with anything other than “22” are non-Louisiana. So it’s possible most of the home CBGs for your sub-categories are for out-of-state visitors.
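A quick way to check this is to group the unmatched IDs by their first two digits, which are the state FIPS code. This is only a sketch: the helper name and sample IDs are hypothetical, and it assumes numeric IDs may have lost a leading zero if they were ever read as integers, so they are padded back to 12 digits first.

```python
from collections import Counter

LOUISIANA_FIPS = "22"

def count_by_state(cbg_ids):
    """Count CBG IDs by their two-digit state FIPS prefix."""
    counts = Counter()
    for cbg in cbg_ids:
        cbg = str(cbg)
        if cbg.isdigit():
            # A CBG ID is 12 digits; restore a dropped leading zero.
            cbg = cbg.zfill(12)
        counts[cbg[:2]] += 1
    return counts

# Toy unmatched IDs (hypothetical values).
unmatched = ["220330001002", "220710017003", "480010001001", 10010201001]

by_state = count_by_state(unmatched)
in_louisiana = by_state[LOUISIANA_FIPS]
out_of_state = sum(by_state.values()) - in_louisiana
print(in_louisiana, out_of_state)
```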

Hi @ryank! Thank you for your reply.

  1. I used grocery stores as the sub-categories. The exact names of the sub-categories I was looking at are shown in the image. The data I looked at covers four months. I was surprised to see that no one from the missing CBGs visited a grocery store in that four-month period.
  2. Is it common to see people from other states visiting grocery stores?
  3. Out of the 18,827 unique CBGs I mentioned, 6,661 are missing (codes do not match) from the ‘cbg_2020.geojson’. This includes 1,088 from Louisiana (start with ‘22’) and the rest (5,573) belong to other states.

Thanks @utkarsh!

  1. Is it possible that many of the grocery stores you are looking for do not have a sub_category? According to the SafeGraph Places Docs, the sub_category will be null if the POI’s NAICS category is only 4 digits long. So there could be more grocery stores that just have a top_category but no sub_category.
  2. I would expect it’s reasonably common to see quite a few out-of-state visitors at grocery stores.
  3. This is strange. I wonder if this is a mismatch between the Census map used in the Patterns files vs the cbg_2020.geojson. Do you have access to any other CBG geojson files (perhaps an older version) to test this?
  1. Before processing the data, I removed all the rows with NaN values. I will check whether there are rows with a top_category but no sub_category.
  2. I see. Thank you for clarifying this.
  3. Yes, I had the same thought initially. If there is a code for the CBG, there should be some file that contains these codes along with the geometry. I couldn’t find the older versions of these files (or maybe I came across them but didn’t know which ones they were). Could you please share the links to these files if you have them?
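To illustrate point 1 of this exchange, here is a small sketch of how dropping every row with a missing value can silently remove POIs that have a top_category but no sub_category. The rows are invented for illustration and use plain dictionaries with None standing in for NaN. The category strings are examples only.

```python
# Toy POI rows (hypothetical placekeys; None stands in for NaN).
pois = [
    {"placekey": "poi-a", "top_category": "Grocery Stores",
     "sub_category": "Supermarkets and Other Grocery (except Convenience) Stores"},
    {"placekey": "poi-b", "top_category": "Grocery Stores",
     "sub_category": None},
    {"placekey": "poi-c", "top_category": "Restaurants and Other Eating Places",
     "sub_category": "Full-Service Restaurants"},
]

def has_missing_value(row):
    """True if any field in the row is missing."""
    return any(value is None for value in row.values())

# Dropping rows with any missing value loses poi-b, a grocery store.
after_drop = [row["placekey"] for row in pois if not has_missing_value(row)]

# Filtering on top_category keeps grocery stores with or without a sub_category.
grocery = [row["placekey"] for row in pois
           if row["top_category"] == "Grocery Stores"]

print(after_drop)
print(grocery)
```

If the analysis needs the sub_category, one option is to drop rows only when the columns actually used downstream are missing, rather than when any column is.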


Hey @utkarsh! Good news, I found a link for the 2010-2019 CBG geometries that may be what is currently referenced in the SafeGraph Patterns data you’ve been working with. Does this file give what you need and seem to match up better?

Please let me know if that (combined with the NaN sub_category adjustment mentioned above) seems to fix things!

Hi @utkarsh, just following up here to see if you’ve had the chance to try the above-mentioned adjustments!

Hi @ryank! I checked the 2010-2019 CBG geometries and was able to find geometries for all of the CBGs, without making any adjustment for NaN sub_category values. It seems that SafeGraph Weekly Patterns uses the CBG codes from the geojson file in the 2010-2019 CBG geometries.

Thanks a lot for your help!
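For readers hitting the same mismatch, a rough sketch of the check that resolved this thread: compute the share of home CBGs found in each geometry vintage and see which one matches better. The GeoJSON property name `CensusBlockGroup` is an assumption about the file layout, and the two small feature collections below are invented stand-ins for the real 2010-2019 and 2020 files.

```python
def geometry_ids(feature_collection, id_property):
    """Extract the CBG IDs from a parsed GeoJSON FeatureCollection."""
    return {str(feature["properties"][id_property])
            for feature in feature_collection["features"]}

def match_rate(cbgs, ids):
    """Fraction of cbgs that have a geometry in ids."""
    if not cbgs:
        return 0.0
    return len(cbgs & ids) / len(cbgs)

def toy_collection(ids):
    """Build a minimal FeatureCollection (geometry omitted for brevity)."""
    return {"type": "FeatureCollection",
            "features": [{"type": "Feature", "geometry": None,
                          "properties": {"CensusBlockGroup": i}} for i in ids]}

# Hypothetical IDs: the older vintage covers all home CBGs, the newer only half.
home_cbgs = {"220330001001", "220330001002", "220710017003", "480010001001"}
vintage_2010 = toy_collection(sorted(home_cbgs) + ["220710001001"])
vintage_2020 = toy_collection(["220330001001", "480010001001", "220719999001"])

rates = {
    "2010-2019": match_rate(home_cbgs, geometry_ids(vintage_2010, "CensusBlockGroup")),
    "2020": match_rate(home_cbgs, geometry_ids(vintage_2020, "CensusBlockGroup")),
}
print(rates)
```

With real files, the parsed collection would come from `json.load` on the geojson file, and the vintage with the higher match rate is the likelier basis for the CBG codes in the Patterns data.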

