处理数据时偶尔会遇到某列中有包含多个值,用逗号或其他分隔符分割,如下表:
| 12
 3
 4
 5
 6
 7
 
 | +-------------------------------------+------------+|               Company               |   Field    |
 +-------------------------------------+------------+
 | [UBC, GoldmanSachs, Morgan Stanley] |  Finance   |
 |     [Amazon, Microsoft, Huawei]     | Technology |
 |               [Pfizer]              |   Pharma   |
 +-------------------------------------+------------+
 
 | 
我们需要纵向展开表格中的list,得到如下结果:
| 12
 3
 4
 5
 6
 7
 8
 9
 10
 11
 
 | +----------------+------------+|    Company     |   Field    |
 +----------------+------------+
 |      UBC       |  Finance   |
 |  GoldmanSachs  |  Finance   |
 | Morgan Stanley |  Finance   |
 |     Amazon     | Technology |
 |   Microsoft    | Technology |
 |     Huawei     | Technology |
 |     Pfizer     |   Pharma   |
 +----------------+------------+
 
 | 
用 Pandas 如何实现? 答案是 pandas.DataFrame.explode:
| 12
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 
 | >>>import pandas as pd>>>data = {
 'Company': [['UBC', 'GoldmanSachs', 'Morgan Stanley'],['Amazon', 'Microsoft', 'Huawei'], ['Pfizer']],
 'Field': ['Finance','Technology','Pharma']}
 >>>df = pd.DataFrame(data)
 >>>df
 '''
 输出
 +-------------------------------------+------------+
 |               Company               |   Field    |
 +-------------------------------------+------------+
 | [UBC, GoldmanSachs, Morgan Stanley] |  Finance   |
 |     [Amazon, Microsoft, Huawei]     | Technology |
 |               [Pfizer]              |   Pharma   |
 +-------------------------------------+------------+
 '''
 >>>df.explode('Company',ignore_index=True)
 '''
 输出
 +----------------+------------+
 |    Company     |   Field    |
 +----------------+------------+
 |      UBC       |  Finance   |
 |  GoldmanSachs  |  Finance   |
 | Morgan Stanley |  Finance   |
 |     Amazon     | Technology |
 |   Microsoft    | Technology |
 |     Huawei     | Technology |
 |     Pfizer     |   Pharma   |
 +----------------+------------+
 '''
 
 | 
对于非 list-like 的 element,则需要先预处理下:
| 12
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 
 | >>>data = {'Company': ['UBC|GoldmanSachs|Morgan Stanley','Amazon|Microsoft|Huawei', 'Pfizer'],
 'Field': ['Finance', 'Technology', 'Pharma']
 }
 >>>df = pd.DataFrame(data)
 >>>df
 '''
 +---------------------------------+---------+
 |             Company             |  Field  |
 +---------------------------------+---------+
 | UBS|GoldmanSachs|Morgan Stanley | Finance |
 |     Amazon|Microsoft|Huawei     | Finance |
 |              Pfizer             | Finance |
 +---------------------------------+---------+
 '''
 >>>df = df.assign(Company=df.Company.str.split('|'))
 >>>df
 '''
 输出
 +-------------------------------------+------------+
 |               Company               |   Field    |
 +-------------------------------------+------------+
 | [UBC, GoldmanSachs, Morgan Stanley] |  Finance   |
 |     [Amazon, Microsoft, Huawei]     | Technology |
 |               [Pfizer]              |   Pharma   |
 +-------------------------------------+------------+
 '''
 
 | 
当然也可以用 iterrows 方法逐行处理:
| 12
 3
 4
 5
 6
 7
 8
 9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 
 | >>>data = {'Company': ['UBC|GoldmanSachs|Morgan Stanley','Amazon|Microsoft|Huawei', 'Pfizer'],
 'Field': ['Finance', 'Technology', 'Pharma']
 }
 >>>df = pd.DataFrame(data)
 >>>for index, row in df.iterrows():
 ...    row_list = row['Company'].split('|')
 ...    df.loc[index, 'Company'] = row_list
 ...
 >>>df
 '''
 输出
 +-------------------------------------+------------+
 |               Company               |   Field    |
 +-------------------------------------+------------+
 | [UBC, GoldmanSachs, Morgan Stanley] |  Finance   |
 |     [Amazon, Microsoft, Huawei]     | Technology |
 |               [Pfizer]              |   Pharma   |
 +-------------------------------------+------------+
 '''
 
 |