General operation

Task: From the text file, select the males aged 25 or over and the females aged 23 or over, then: 1) sort them by name; 2) group by gender and calculate the average age; 3) list all the surnames that appear (compound surnames are not considered).

Python

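import pandas as pd
file = "D.csv"
data = pd.read_csv(file)
# Filter: males aged 25+ or females aged 23+ (literals must match the casing used in D.csv)
data_select = data[((data['sex']=='male')&(data['age']>=25))|((data['sex']=='female')&(data['age']>=23))].copy()
# Sort the filtered rows by name
data_sort = data_select.sort_values('name')
# Group by gender and average the age
data_group = data_select.groupby('sex')['age'].mean()
# Helper column: the first character of the name is taken as the surname
data_select['sur'] = data_select['name'].str[0]
# Keep one row per distinct surname
data_distinct = data_select.drop_duplicates(['sur'])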

Here pandas first builds a helper column holding the surname and then removes duplicates on that column.

esProc

	A
1	=file("D.csv").import@t(name,sex,age;",")
2	=A1.select(sex=="Male"&&age>=25||sex=="Female"&&age>=23)	Filter
3	=A2.sort(name)	Sort
4	=A2.groups(sex;avg(age):age)	Group and aggregation
5	=A2.id(left(name,1))	Unique value

esProc provides a rich set of structured computing functions. To some extent it can treat a text file as a database table, giving SQL-like computing power without a database.
When the data is too large to fit in memory, pandas has to read it in chunks, compute on each chunk and then merge the partial results, which sharply increases both the amount of code and the amount of computation.
In esProc, when the data volume is large, the same calculations can be done with cursors:
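The chunk-by-chunk approach in pandas might look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `chunked_stats` and the chunk size are hypothetical, the column names and the 'male'/'female' literals follow the Python example earlier in this section, and the sort step is left out because sorting across chunks would need an external merge.

```python
import pandas as pd

def chunked_stats(path, chunksize=100_000):
    """Filter, average age by sex, and collect surnames, one chunk at a time."""
    sums = {}         # sex -> running sum of ages
    counts = {}       # sex -> running row count
    surnames = set()  # distinct first characters of the names seen so far
    for chunk in pd.read_csv(path, chunksize=chunksize):
        mask = (((chunk['sex'] == 'male') & (chunk['age'] >= 25))
                | ((chunk['sex'] == 'female') & (chunk['age'] >= 23)))
        part = chunk[mask]
        # A mean cannot be merged across chunks directly, so keep sum and count
        agg = part.groupby('sex')['age'].agg(['sum', 'count'])
        for sex, row in agg.iterrows():
            sums[sex] = sums.get(sex, 0) + row['sum']
            counts[sex] = counts.get(sex, 0) + row['count']
        surnames.update(part['name'].str[0].dropna())
    # Merge step: turn the accumulated sums and counts into averages
    avg_age = {sex: sums[sex] / counts[sex] for sex in sums}
    return avg_age, sorted(surnames)
```

A call such as `avg_age, surnames = chunked_stats("D.csv")` would return the average age per gender and the sorted list of surnames. Even without sorting, the bookkeeping for partial results already makes this noticeably longer than the in-memory version.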

	A
1	=file("D.csv").cursor@tm(name,sex,age;",")
2	=A1.select(sex=="Male"&&age>=25||sex=="Female"&&age>=23)	Filter
3	=A2.sortx(name)	Sort
4	=A2.groups(sex;avg(age):age)	Group and aggregation
5	=A2.groupx(left(name,1);)	Unique value
6	=A3.fetch(…)	Fetch result

Unlike in-memory calculation, a cursor can be traversed only once, so only one of the sorting and grouping operations above can be executed on it. To execute another one, the cursor has to be rebuilt.
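For readers more familiar with Python, the one-pass restriction can be compared to a Python iterator. The sketch below is only an analogy using the standard `csv` module and made-up sample rows; it does not show how esProc cursors are implemented, and the helper `open_rows` is a hypothetical name.

```python
import csv
import io

# Made-up sample data standing in for a large file
CSV_TEXT = "name,sex,age\nAnn,female,30\nCarl,male,40\nDan,male,26\n"

def open_rows():
    """Build a fresh one-pass reader over the sample data."""
    return csv.DictReader(io.StringIO(CSV_TEXT))

rows = open_rows()
ages = [int(r["age"]) for r in rows]      # first pass consumes the reader
second_pass = [r["name"] for r in rows]   # nothing left: the reader is exhausted

rows = open_rows()                        # rebuild before the next operation
names = sorted(r["name"] for r in rows)
```

The second traversal yields no rows until the reader is rebuilt, which mirrors the need to recreate the cursor before running another operation.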
