I am wondering whether anyone could tell me that how to parse a column with comma separated string

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 12:00am

0      [18,4,6]
1      [43,1,3]
2      [1,4,1]
   ....

and I want to convert to three int columns: day1,  day2,  day3, 
0           18   4  6
1          43    1  3 
2           1    4  1```
....

Jack_Lindsay_Kraken1 · June 9, 2020, 2:20pm

Hi @QIULI_SU_Louisiana_State_Univerisity, funny you should mention that, I am working on a coLab notebook for that at this very moment, however, until I get that sorted, I will point you to the safegraph documentation found HERE.

If you scroll down a bit, you will find some sample code for a few different methods. Safegraph also offers a webapp to handle files of no more than 1000 rows.

Let me know whether or not this helps!

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 2:30pm

Thank you very much. yes, I want to convert" visit by day" column into several int columns. But I did not the sample code at the web page you mentioned. Can you let know where is the sample code in python? thanks.

Ryan_Kruse_MN_State · June 9, 2020, 2:35pm

Hi @QIULI_SU_Louisiana_State_Univerisity, if @Jack_Lindsay_Kraken1’s solution doesn’t work for you, I would suggest something like this, assuming all lists in the column have length of 3:

df['day1'] = 0
df['day2'] = 0
df['day3'] = 0

#iterate through the column with string list entries
  #parse string lists and replace corresponding column
for index, row in df.iterrows():
     new_entries = df['stringcol',index].replace('[','').replace(']','').split(',')
     df['day1', index] = new_entries[0]
     df['day2', index] = new_entries[1]
     df['day3', index] = new_entries[2]```
I didn't actually run this so there's probably some syntax errors, but this general flow should work. Then if you want you could delete the 'stringcol' column that you are exploding into the new columns.

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 2:50pm

@Jack_Lindsay_Kraken1 Thanks for the code, but it looks a little bit different with “visit by day” format. So I am using Ryan’s method now.

Jack_Lindsay_Kraken1 · June 9, 2020, 2:51pm

Great! Let us know if you need anything else!

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 2:52pm

@Ryan_Kruse_MN_State I run the code and it have some errors, do you know how to change it? thanks.#create placeholder columns

parse string column into int column

new_merge['day2'] = 0
new_merge['day3'] = 0
new_merge['day4'] = 0
new_merge['day5'] = 0
new_merge['day6'] = 0
new_merge['day7'] = 0
#iterate through the column with string list entries
  #parse string lists and replace corresponding column
for index, row in new_merge.iterrows():
     new_entries = new_merge['visits_by_day',index].replace('[','').replace(']','').split(',')
     new_merge['day1', index] = new_entries[0]
     new_merge['day2', index] = new_entries[1]
     new_merge['day3', index] = new_entries[2]
     new_merge['day4', index] = new_entries[3]
     new_merge['day5', index] = new_entries[4]
     new_merge['day6', index] = new_entries[5]
     new_merge['day7', index] = new_entries[6]```
error
```---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-&gt; 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('visits_by_day', 0)

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
&lt;ipython-input-120-e12bddb5345f&gt; in &lt;module&gt;
     12   #parse string lists and replace corresponding column
     13 for index, row in new_merge.iterrows():
---&gt; 14      new_entries = new_merge['visits_by_day',index].replace('[','').replace(']','').split(',')
     15      new_merge['0330', index] = new_entries[0]
     16      new_merge['0331', index] = new_entries[1]

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels &gt; 1:
   2799                 return self._getitem_multilevel(key)
-&gt; 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-&gt; 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim &gt; 1 or indexer.size &gt; 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('visits_by_day', 0)```

Ryan_Kruse_MN_State · June 9, 2020, 2:58pm

@QIULI_SU_Louisiana_State_Univerisity Yes, the format of how I tried to update the dataframe isn’t quite correct. See this post: python - Edit pandas dataframe row-by-row - Stack Overflow. Basically you just need to change new_merge['day1', index] = new_entries[0] to new_merge.loc[index,'day1'] = new_entries[0] for each column

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 3:09pm

new_merge['day2'] = 0
new_merge['day3'] = 0
new_merge['day4'] = 0
new_merge['day5'] = 0
new_merge['day6'] = 0
new_merge['day7'] = 0
#iterate through the column with string list entries
  #parse string lists and replace corresponding column
for index, row in new_merge.iterrows():
     new_entries = new_merge['visits_by_day',index].replace('[','').replace(']','').split(',')
     new_merge.loc[index,'day1'] = new_entries[0]
     new_merge.loc[index,'day2'] = new_entries[1]
     new_merge.loc[index,'day3'] = new_entries[2]
     new_merge.loc[index,'day4'] = new_entries[3]
     new_merge.loc[index,'day5'] = new_entries[4]
     new_merge.loc[index,'day6'] = new_entries[5]
     new_merge.loc[index,'day7'] = new_entries[6]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-&gt; 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('visits_by_day', 0)

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
&lt;ipython-input-123-8655d1f00b3c&gt; in &lt;module&gt;
     14   #parse string lists and replace corresponding column
     15 for index, row in new_merge.iterrows():
---&gt; 16      new_entries = new_merge['visits_by_day',index].replace('[','').replace(']','').split(',')
     17      new_merge.loc[index,'day1'] = new_entries[0]
     18      new_merge.loc[index,'day2'] = new_entries[1]

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels &gt; 1:
   2799                 return self._getitem_multilevel(key)
-&gt; 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-&gt; 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim &gt; 1 or indexer.size &gt; 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('visits_by_day', 0)```
I think the error because of this sentence, since I run this sentence only :                                           `for index, row in new_merge.iterrows():`
     `new_entries = new_merge['visits_by_day',index].replace('[','').replace(']','').split(',')`
the error is same:

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 3:09pm

KeyError                                  Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2645             try:
-&gt; 2646                 return self._engine.get_loc(key)
   2647             except KeyError:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('visits_by_day', 0)

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
&lt;ipython-input-122-fc0659bea268&gt; in &lt;module&gt;
      1 for index, row in new_merge.iterrows():
----&gt; 2      new_entries = new_merge['visits_by_day',index].replace('[','').replace(']','').split(',')

C:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2798             if self.columns.nlevels &gt; 1:
   2799                 return self._getitem_multilevel(key)
-&gt; 2800             indexer = self.columns.get_loc(key)
   2801             if is_integer(indexer):
   2802                 indexer = [indexer]

C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2646                 return self._engine.get_loc(key)
   2647             except KeyError:
-&gt; 2648                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2649         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2650         if indexer.ndim &gt; 1 or indexer.size &gt; 1:

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: ('visits_by_day', 0)```

Ryan_Kruse_MN_State · June 9, 2020, 3:10pm

@QIULI_SU_Louisiana_State_Univerisity You need to make the same change for new_merge['visits_by_day',index] as you did for the other ones

Jack_Lindsay_Kraken1 · June 9, 2020, 3:36pm

@QIULI_SU_Louisiana_State_Univerisity keep us posted! If that method does’t work, we can work through other methods together.

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 3:38pm


new_merge['new_visits_by_day']= new_merge['visits_by_day'].str.strip('[]')

# split by ","

split1 = new_merge['new_visits_by_day'].str.split(',', expand=True)

# making separate column
new_merge['0330']= split[0] 
new_merge['0331']= split[1] 
new_merge['0401']= split[2] 
new_merge['0402']= split[3] 
    
new_merge['0403']= split[4] 
new_merge['0404']= split[5] 
new_merge['0405']= split[6] 


# convert string to float and then to int
new_merge['0330_float'] = new_merge['0330'].apply(pd.to_numeric, errors='coerce')

new_merge['0330_int'] = new_merge['0330_float'] .astype('Int64')```

Jack_Lindsay_Kraken1 · June 9, 2020, 3:41pm

Fantastic. As long as it is working.

for future reference, there is a library called ast that parses the JSON data for you. I will share the unfinished notebook with you here

I am happy you got your code working and please let us know if you have any other questions!

QIULI_SU_Louisiana_State_Univerisity · June 9, 2020, 3:42pm

Thanks for your help too! @Jack_Lindsay_Kraken1 @Ryan_Kruse_MN_State