Filter according to cumulative value

Task:Find the top n customers whose sales accounts for half of the total sales, and rank them by sales in descending order.


1 import pandas as pd
2 sale_file = "E:\\txt\\sales_volume.csv"
3 sale_info = pd.read_csv(sale_file)
4 sale_info.sort_values(by='Amount',inplace=True,ascending=False)
5 half_amount = sale_info['Amount'].sum()/2
6 vip_list = []
7 amount = 0
8 for client_info in sale_info.itertuples():
9     amount += getattr(client_info, 'Amount')
10     if amount < half_amount:
11         vip_list.append(getattr(client_info, 'Client'))
12     else:
13         vip_list.append(getattr(client_info, 'Client'))
14         break
15 print(vip_list)

Pandas does not have an existing loop function to calculate the position that meets the conditions, so it can only use for loop to complete the calculation.


1 E:\\txt\\sales_volume.csv Data storage path
2 =file(A1).import@tc().sort@z(Amount) Import data and sort by amount in descending order
3 =A2.sum(Amount)/2 Calculate half of total sales
4 =A2.pselect(cum(Amount)>=A3) Find the position where the cumulative sales is more than half of total sales
5 =A2.(Client).m(:A4) Take the client names before the position

It is recommended to use loop function to calculate in esProc, and filter and record the position information according to the conditions.