I want to know if it's possible to learn more about the sources you use for POI data, and what information in those sources allows you to make the NAICS assignment

Hi, following up on my questions about NAICS assignment: I wanted to know if it's possible to learn more about the sources you use for POI data, and what information in those sources allows you to make the NAICS assignment. Thanks again for your help.

This topic was automatically generated from Slack. You can find the original thread here.

Hi, thanks for reaching out! We are looking into it and will get back to you once we have an answer.

Hi Nathalie, thanks for reaching out! Before I loop in other members of our team, I just wanted to clarify something from our previous threads: were those answers not what you were looking for?

May I also ask why you need the sources of our POI data? Knowing that may help us understand your request better. Thanks, Nathalie!

Hi Niki, sure. I'm trying to understand the process you follow to assign NAICS codes. The information in the FAQ gives a useful general idea of the process, but I would like to understand it in more detail. I don't need the names of the private data sources you use, just, more generally, the information those sources provide and how you process it. For example, one possible source of POI information is business registries; however, that type of data includes many non-employer businesses that one wouldn't want to include as POIs. If you use that kind of data, do you follow any particular procedure to remove the non-employer records? Some business registries (Yellow Pages-type data) already provide sector codes, and some companies have dedicated staff who call businesses to collect detailed data, including the sector. Is that an important source of your NAICS data?

Thanks for the details! I'm passing this on to our Product team to get you more specifics. We'll get back to you soon with updates!

Hi, we appreciate your patience on this. We prefer data sources (like business registries) that contain rich metadata about each POI; the more, the merrier, really. We then convert that text to word vectors so that our naics_code model can compare it against the training data and predict a naics_code. Sources that report their own category codes are helpful because those codes usually come with descriptions, and those descriptions are the most useful pieces of text for our model.
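The reply above doesn't describe the actual model, so what follows is only a minimal sketch of the general idea, under simplifying assumptions: bag-of-words term counts stand in for learned word vectors, and a 1-nearest-neighbour lookup by cosine similarity stands in for the naics_code model. The training descriptions, the `poi_text` helper and every function name here are hypothetical; only the NAICS codes themselves (722511 Full-Service Restaurants, 811111 General Automotive Repair, 621210 Offices of Dentists) are real.

```python
import math
import re
from collections import Counter

# Hypothetical toy training data: (description text, NAICS code).
# Illustrative strings only, not SafeGraph's actual training set.
TRAINING_DATA = [
    ("full service restaurant table service dinner menu wait staff", "722511"),
    ("family restaurant serving breakfast lunch and dinner with table service", "722511"),
    ("general automotive repair engine brakes oil change mechanic", "811111"),
    ("auto repair shop car maintenance and engine diagnostics", "811111"),
    ("dentist office dental cleaning fillings and oral exams", "621210"),
    ("family dental practice teeth cleaning and dentists", "621210"),
]


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def to_vector(text: str) -> Counter:
    """Bag-of-words term counts: a simple stand-in for learned word vectors."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def poi_text(name: str, source_category_description: str = "") -> str:
    """Combine a POI's name with any category description its source provides (hypothetical fields)."""
    return f"{name} {source_category_description}".strip()


def predict_naics(text: str) -> str:
    """Return the NAICS code of the most similar training description (1-nearest neighbour)."""
    query = to_vector(text)
    _, best_code = max(
        TRAINING_DATA, key=lambda pair: cosine(query, to_vector(pair[0]))
    )
    return best_code


if __name__ == "__main__":
    print(predict_naics(poi_text("Joe's Garage", "auto repair and oil change")))
```

With only the name "Joe's Garage", this toy setup has almost nothing to match on; adding the source's category description supplies the overlapping words ("repair", "oil", "change") that drive the prediction to 811111, which loosely mirrors the point above about category descriptions being the most useful text.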

Thanks so much for following up on this! I will let you know if I have any other questions. Have a great rest of the day!

Thanks, Nathalie! I’ll close this thread out for now. Let us know if you have additional or follow-up questions by starting a new thread in safegraphdata.