Does anyone have any input on how the variables visitor_home_cbgs, visitor_work_cbgs, and popularity_by_hour from the monthly patterns dataset can be aggregated and used in conjunction with raw visits/visitors totals on a month-postal code level?

Sadhvika_Viswanath_University_of_Chicago · June 27, 2020, 12:00am

Does anyone have any input on how the variables visitor_home_cbgs, visitor_work_cbgs, and popularity_by_hour from the monthly patterns dataset can be aggregated and used in conjunction with raw visits/visitors totals on a month-postal code level? (Given their format, it is difficult to collapse the observations)

Nick_H-K_Seattle_University · June 27, 2020, 8:08pm

Since there’s no crosstab of these variables, you’ll likely need a separate aggregation for each of home, work, and hour (unless you want to store everything in wide format). For the home and work CBG counts, you’ll then need a CBG/postal code crosswalk that gives the proportion of each CBG falling in each postal code. Sum the counts weighted by those proportions to get to the postal code level.
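Here is a rough sketch of those steps in Python with pandas, in case it helps as a starting point. It assumes `visitor_home_cbgs` (and `visitor_work_cbgs`) hold a JSON object mapping CBG to a count, and that `popularity_by_hour` holds a JSON array with one entry per hour. The `month` column, the crosswalk columns `cbg`, `postal_code`, and `share`, and the function names are all placeholders I made up, so rename them to match your files.

```python
import json
import pandas as pd

def explode_cbg_counts(patterns, column):
    """Expand a JSON dict column (CBG -> count) into one row per month/CBG pair."""
    rows = []
    for month, raw in zip(patterns["month"], patterns[column]):
        # Missing or non-string cells are treated as "no visitors reported".
        counts = json.loads(raw) if isinstance(raw, str) else {}
        for cbg, count in counts.items():
            rows.append((month, cbg, count))
    return pd.DataFrame(rows, columns=["month", "cbg", "count"])

def to_postal_level(long_counts, crosswalk):
    """Weight each CBG count by its share in each postal code, then sum."""
    # Inner join: CBGs absent from the crosswalk are dropped.
    merged = long_counts.merge(crosswalk, on="cbg", how="inner")
    merged["weighted_count"] = merged["count"] * merged["share"]
    return merged.groupby(["month", "postal_code"], as_index=False)["weighted_count"].sum()

def sum_hourly(patterns):
    """Sum popularity_by_hour element-wise within each month and POI postal code."""
    hours = pd.DataFrame(
        patterns["popularity_by_hour"].map(json.loads).tolist(),
        index=patterns.index,
    )
    hours.columns = [f"hour_{h}" for h in hours.columns]
    keyed = pd.concat([patterns[["month", "postal_code"]], hours], axis=1)
    return keyed.groupby(["month", "postal_code"], as_index=False).sum()
```

Calling `to_postal_level(explode_cbg_counts(patterns, "visitor_home_cbgs"), crosswalk)` gives visitor counts by month and home postal code, and the same call with `"visitor_work_cbgs"` gives the work version. The hourly totals are keyed by the POI's own postal code, so they need no crosswalk and can be merged directly with the raw visits/visitors totals.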

Sadhvika_Viswanath_University_of_Chicago · June 28, 2020, 6:20am

Thank you! I will try this.

Jessica_Williams-Holt · June 29, 2020, 6:54pm

@Sadhvika_Viswanath_University_of_Chicago Check out the awesome list. There is a version of the crosswalk in R that you might find useful as a jumping-off point. Additional resources will be added for Python and R.

Also, if you come up with a good method, please share it there as well!