
Hello,
I have a question about brand_info and core_POI in “Core Places US”.
Currently each month has the following files:
“brand_info.csv”, “core_poi-part1.csv”, “core_poi-part2.csv”, ….
I am curious about the general rule behind these files. I have two hypotheses; could you let me know which one is correct?

  1. Are more recent files updated versions of previous ones? In this case, I need to keep just the most recent ones.
    
  2. Or do the files in a given month's folder show the brands and POIs that existed in that month? In this case, I could keep all POIs (brands) in a single file and generate a variable showing each POI's active duration.
    

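Hypothesis 2 could be sketched roughly as follows. This is a minimal sketch, assuming each month's place IDs have already been loaded into a set keyed by a zero-padded "YYYY/MM" month string; the function name `active_spans` is hypothetical. Presence in a monthly file is not necessarily the same as the POI actually being open, because IDs can also appear or disappear when the data itself is corrected.

```python
def active_spans(monthly_ids: dict[str, set[str]]) -> dict[str, tuple[str, str, int]]:
    """Map each place ID to (first month seen, last month seen, months present).

    Assumption: month keys are zero-padded "YYYY/MM" strings, so plain
    string sorting is also chronological order.
    """
    spans: dict[str, tuple[str, str, int]] = {}
    for month in sorted(monthly_ids):
        for pid in monthly_ids[month]:
            if pid in spans:
                first, _, count = spans[pid]
                # Keep the first month, move the last month forward, bump the count
                spans[pid] = (first, month, count + 1)
            else:
                spans[pid] = (month, month, 1)
    return spans
```

For example, with `{"2020/03": {"a", "b"}, "2020/04": {"b"}, "2020/05": {"a", "b", "c"}}`, ID `"a"` maps to `("2020/03", "2020/05", 2)`: first seen in March, last seen in May, but present in only two of the three months. Keeping the count alongside the first and last months makes such gaps easy to flag.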
Thank you!

Hi @Seung_Hoon_Lee_Georgia_Institute_of_Technology, generally, you want to use the most recent Core Places because each month has improvements over the previous month, which are documented in the Release Notes.

However, the Patterns files are only backfilled every six months or so. So the newest Patterns file is up-to-date with Core Places, but the previous Patterns may be ever-so-slightly outdated (the month-to-month changes in Core Places are minor so this is really not anything to worry about). The Monthly Patterns always uses the corresponding month’s Core Places. You can see which Core Places corresponds to each Weekly Patterns in the release_metadata file.

To answer your question, (1) is almost always correct. The opened_on and closed_on columns are intended to track POIs that no longer exist.
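If you follow (1) and keep only the most recent Core Places, a rough way to ask "was this POI open in a given month?" is to read these two columns instead of comparing monthly files. This is a minimal sketch, assuming opened_on and closed_on hold zero-padded "YYYY-MM" strings and are blank when unknown (check the Core Places schema for the actual format). The function names `was_open_in` and `open_place_ids` are hypothetical. A blank opened_on may simply mean the opening date is unknown, and this sketch treats such POIs as open.

```python
import csv


def was_open_in(row: dict[str, str], month: str) -> bool:
    """True if a Core Places row looks open during `month` ("YYYY-MM").

    Assumption: opened_on / closed_on are zero-padded "YYYY-MM" strings,
    blank when unknown. A POI counts as open in the month it closed.
    """
    opened = (row.get("opened_on") or "").strip()
    closed = (row.get("closed_on") or "").strip()
    if opened and opened > month:
        return False  # opened after the month of interest
    if closed and closed < month:
        return False  # already closed before the month of interest
    return True


def open_place_ids(path: str, month: str) -> set[str]:
    """safegraph_place_id values from one core_poi part file that look open in `month`."""
    with open(path, newline="", encoding="utf-8") as f:
        return {
            row["safegraph_place_id"]
            for row in csv.DictReader(f)
            if was_open_in(row, month)
        }
```

Because zero-padded "YYYY-MM" strings sort chronologically, plain string comparison is enough here; no date parsing is needed.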

Thank you @Ryan_Kruse_MN_State for your response. Let me try to clarify my question with an example.

I looked at the following two folders in “Core Places US (Pre-Nov-2020)”: “2020/03” and “2020/10”.

For each folder, I combined 5 files (“core_poi-part1.csv” - “core_poi-part5.csv”) and compared POIs between the two months, “2020/03” and “2020/10”.

  1. There are 4,915,222 POIs (counted by “safegraph_place_id”) that exist in both months.
    
  2. There are 418,279 POIs that only exist in 2020/10.
    
  3. There are 477,398 POIs that only exist in 2020/03.
    

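The comparison described above can be reproduced with a short sketch like the following. It assumes each month folder holds uncompressed core_poi-part*.csv files with a safegraph_place_id column; the function names and folder paths are hypothetical. The filename pattern deliberately skips brand_info.csv.

```python
import csv
import glob
import os


def load_place_ids(folder: str) -> set[str]:
    """Collect every safegraph_place_id from the core_poi-part*.csv files in one month folder."""
    ids: set[str] = set()
    # sorted() only makes the read order deterministic; the result is a set either way
    for path in sorted(glob.glob(os.path.join(folder, "core_poi-part*.csv"))):
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f):
                ids.add(row["safegraph_place_id"])
    return ids


def compare_months(old_folder: str, new_folder: str) -> dict[str, int]:
    """Count place IDs present in both months, only in the newer one, and only in the older one."""
    old_ids = load_place_ids(old_folder)
    new_ids = load_place_ids(new_folder)
    return {
        "in_both": len(old_ids & new_ids),
        "only_new": len(new_ids - old_ids),
        "only_old": len(old_ids - new_ids),
    }
```

For instance, `compare_months("2020/03", "2020/10")` (with paths adjusted to wherever the folders live locally) returns the three counts as a dict. Set operations on IDs treat every part file as one combined table, so a POI listed in any part counts once.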
What are the POIs in #2? Are they new POIs, or did you fix errors when you updated the 2020/10 files?

Likewise, what are the POIs in #3? Why do they disappear?

You are welcome, @Seung_Hoon_Lee_Georgia_Institute_of_Technology. If you were to look at the months between 2020/03 and 2020/10, the month-to-month changes would be much smaller. For example, the October 2020 Release Notes say that month had about 31,750 new POIs compared with the previous month (September 2020). Every month, SafeGraph publishes Release Notes detailing the POI changes from the previous month (and anything else to be aware of). POIs appear in and disappear from the dataset for a variety of reasons. To see what specifically happened in the months you are looking at, I would recommend checking the Release Notes for those months.

@Ryan_Kruse_MN_State Thank you so much Ryan! It became a lot clearer to me. Have a great evening! :slightly_smiling_face: