Hi! I have a question about one variable in the social distancing data. Most variables are well defined, but I'm unsure about one specific variable, “distance_traveled_from_home”, which is described as: “We first find the median for each device and then find the median across all of the devices.” Shouldn’t we use the maximum distance for each device, or sum up the distance travelled by that device? Taking the median across all of the devices is great, but within each device I am not sure taking the median is the best way to aggregate the data.
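The quoted two-stage aggregation can be sketched roughly as follows, purely as an illustration. It assumes each device contributes a list of distances from home in meters; the dictionary `distances_by_device`, the device IDs, the values, and the helper `two_stage_median` are all hypothetical and not the dataset's actual schema or pipeline.

```python
from statistics import median

# Hypothetical input: device ID -> observed distances from home (meters).
# Field names, units, and values are made up for illustration only.
distances_by_device = {
    "device_a": [120.0, 450.0, 300.0],
    "device_b": [5000.0, 80.0, 60.0, 4000.0],
    "device_c": [0.0, 25.0],
}


def two_stage_median(distances_by_device):
    """Median within each device, then the median of those per-device medians."""
    # Stage 1: one summary value per device (devices with no data are skipped).
    per_device = [median(d) for d in distances_by_device.values() if d]
    # Stage 2: summarize across devices.
    return median(per_device)


print(two_stage_median(distances_by_device))
```

On these made-up values the per-device medians are 300.0, 2040.0 and 12.5, so the overall result is 300.0.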
@Kunru_Zou we are trying to summarize the typical travel distance across all trips, so the median is a good descriptive statistic for the distribution. If people are making longer trips, that number will be larger; if people are making shorter trips, it will be smaller.
Why do you want to know the max()?
It’s great to take the median distance across devices. My worry is that if one device makes several trips (for example, two long ones and several short ones), taking the median within that device would make its distance too small. I don’t know the distribution of travel distances within devices in the raw data, so that’s just my conjecture.
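To make the concern concrete, here is a small sketch comparing three within-device aggregates for a single hypothetical device with two long trips and five short ones. The trip distances (in km) are invented for illustration and do not come from the raw data.

```python
from statistics import median

# Hypothetical device: two long trips and five short ones (km); made-up values.
trips_km = [25.0, 30.0, 1.0, 0.5, 2.0, 1.5, 0.8]

aggregates = {
    "median": median(trips_km),  # middle trip; ignores the two long trips
    "max": max(trips_km),        # longest single trip only
    "sum": sum(trips_km),        # total distance; grows with the number of trips
}

for name, value in aggregates.items():
    print(f"{name}: {value:.1f} km")
```

With these values the median is 1.5 km, the max is 30.0 km and the sum is 60.8 km. Which one is appropriate depends on the question: the median describes a typical trip, the max describes the farthest excursion, and the sum mixes trip distance with trip count.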
It’s too trivial… better not to waste your time on this.