Hi, following up on my questions about NAICS assignment: is it possible to learn more about the sources you use for POI data, and what information in those sources allows you to make the NAICS assignment? Thanks again for your help.
This topic was automatically generated from Slack. You can find the original thread here.
Hi Niki, sure. I’m trying to understand the process you follow to assign NAICS codes. The information you provide in the FAQ is useful for getting a general idea of the process, but I would still like to understand more of the details. I don’t need the names of the private data sources you use, only, more generally, what information those sources provide and how you process it. For example, one possible source of POI information is business registries. However, that type of data contains a lot of records for non-employers that one wouldn’t want to include as POIs. If you are using that kind of data, do you follow any particular procedure to clean out the non-employer records? Also, some business registries (yellow-pages-style data) already provide sector codes, and some companies have dedicated staff who call businesses to collect detailed data, including the sector. Is that an important source of your NAICS data?
Hi - appreciate your patience on this. We prefer data sources (like business registries) that contain rich metadata about the POI; the more the merrier, really. We then convert that text to word vectors so that our naics_code model can compare it against the training data and make a naics_code prediction. Sources that report their own category codes are helpful because those codes often come with corresponding descriptions, which are the most useful pieces of text for our model.
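
For readers who want a concrete picture of the idea described above, here is a minimal toy sketch, not the actual model. It assumes a simple bag-of-words vectorization and a nearest-example lookup by cosine similarity; the real vectorization, model, and training data are not described in this thread. The names `vectorize`, `cosine`, `TRAINING`, and `predict_naics_code` are hypothetical, and the three training rows are illustrative examples only.

```python
import math
import re
from collections import Counter

# Illustrative labeled examples: (description text, NAICS code).
# Hypothetical training rows, not real training data.
TRAINING = [
    ("full service restaurant dining table service meals", "722511"),
    ("supermarket grocery store food retail", "445110"),
    ("beauty salon hair styling cut color", "812112"),
]


def vectorize(text):
    """Turn free text into a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_naics_code(poi_text):
    """Return (code, similarity) for the closest training example.

    Returns (None, 0.0) when the text shares no words with any example.
    """
    query = vectorize(poi_text)
    scored = [(cosine(query, vectorize(text)), code) for text, code in TRAINING]
    best_score, best_code = max(scored)
    if best_score == 0.0:
        return None, 0.0
    return best_code, best_score
```

For example, `predict_naics_code("Joe's Hair Salon, category: beauty salon")` picks the `812112` row, because the source's own category description ("beauty salon") overlaps heavily with that training text. This is the sense in which category descriptions from a source can be the most useful text for a model like this.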