Adjacent records grouping with the original order
Task:List the team information with the most consecutive NBA titles.
Python
1 | import pandas as pd |
2 | import numpy as np |
3 | pd.set_option('display.max_columns', None) |
4 | nba_file = 'E:\\txt\\nba.txt' |
5 | nba_champion = pd.read_csv(nba_file,sep='\t') |
6 | nba_champion = nba_champion.sort_values(by = 'Year') |
7 | arr = np.zeros(len(nba_champion)) |
8 | arr[nba_champion['Champion']!=nba_champion['Champion'].shift(1)]=1 |
9 | arr = np.cumsum(arr) |
10 | nba_champion['flag']=arr |
11 | nba_champion_g = nba_champion.groupby(by='flag') |
12 | max_num = nba_champion_g.size().idxmax() |
13 | max_champion = nba_champion_g.get_group(max_num) |
14 | print(max_champion) |
Python doesn't have the ability to group by adjacent conditions. You need to create a list of grouping flags.
esProc
A | ||
1 | =connect("mysql") | Connect to database |
2 | =A1.query("select * from nba order by year") | Sort by year |
3 | =A2.group@o(Champion) | When adjacency is different, start a new group |
4 | =A3.maxp(~.len()) | List the team information with the most consecutive NBA titles. |
The set of esProc is ordered, and it is very convenient to merge the adjacent same records into one group and start a new group when the adjacent record is different.