Group by specified order
Task:List in order the number and average age of female employees of technology, production, sales and HR department.
Python
1 | import pandas as pd |
2 | import datetime |
3 | emp_file = 'E:\\txt\\employee.txt' |
4 | dept_seq = ['Technology','Production','Sales','HR'] |
5 | emp_info = pd.read_csv(emp_file,sep='\t') |
6 | emp_g = emp_info.groupby(by='DEPT') |
7 | emp_g_index = [index for index,group in emp_g] |
8 | today = datetime.datetime.today().year |
9 | dept_femal_list = [] |
10 | for dept in dept_seq: |
11 | if dept not in emp_g_index: |
12 | dept_femal_num = 0 |
13 | age_a = None |
14 | else: |
15 | dept_emp = emp_g.get_group(dept) |
16 | dept_emp_femal = dept_emp[dept_emp['GENDER']=='F'] |
17 | dept_femal_num = len(dept_emp_femal) |
18 | arr = pd.to_datetime(dept_emp_femal['BIRTHDAY']) |
19 | age_a = (today-arr.dt.year).mean() |
20 | dept_femal_list.append([dept,dept_femal_num,age_a]) |
21 | dept_femal_info = pd.DataFrame(dept_femal_list,columns=['DEPT','femal_num','femal_age']) |
22 | print(dept_femal_info) |
esProc
A | |
1 | E:\\txt\\employee.txt |
2 | [Technology,Production,Sales,HR] |
3 | =file(A1).import@t() |
4 | =A3.align@a(A2,DEPT) |
5 | =A4.new(A2(#):DEPT,(F_EMP=~.select(GENDER:"F"),F_EMP.count()):Femal_num,F_EMP.avg(age@y(BIRTHDAY)):Femal_age) |
It is much more convenient to deal with this kind of calculation by esProc based on the ordered set and the special alignment operation.