Excel has limits: it becomes very slow with hundreds of thousands of rows and cannot handle millions of rows.
Excel is not good at handling complex operations like set operations and calculations after grouping. For example:
While databases can handle large volumes of data, setting up the environment deters 90% of Excel guys. Additionally, the complexity of SQL often exceeds that of Excel.
This nested code is a challenge even for professional programmers.
SELECT CODE, MAX(con_rise) AS longest_up_days FROM ( SELECT CODE, COUNT(*) AS con_rise FROM ( SELECT CODE, DT, SUM(updown_flag) OVER (PARTITION BY CODE ORDER BY CODE, DT) AS no_up_days FROM ( SELECT CODE, DT, CASE WHEN CL > LAG(CL) OVER (PARTITION BY CODE ORDER BY CODE, DT) THEN 0 ELSE 1 END AS updown_flag FROM stock ) ) GROUP BY CODE, no_up_days ) GROUP BY CODE
Excel: An Intuitive Four-Step Solution
Though BI, leveraging databases, can handle large volumes of data, it also come with setup challenges.
BI lacks the advanced computational capabilities of SQL, leaving experienced Excel users feeling quite constrained
Poor support for tabular data, and even simple tasks require extensive code
It lacks built-in big data capabilities, and self-implementation is quite tedious.
Decent capabilities for tabular data but still somewhat complex.
import pandas as pd stock_file = "StockRecords.txt" stock_info = pd.read_csv(stock_file,sep="\t") stock_info.sort_values(by=['CODE','DT'],inplace=True) stock_group = stock_info.groupby(by='CODE') stock_info['label'] = stock_info.groupby('CODE')['CL'].diff().fillna(0).le(0).astype(int).cumsum() max_increase_days = {} for code, group in stock_info.groupby('CODE'): max_increase_days[code] = group.groupby('label').size().max() – 1 max_rise_df = pd.DataFrame(list(max_increase_days.items()), columns=['CODE', 'max_increase_days'])
It also lacks built-in big data support, and you have to hard code manually
The interactivity of programming languages is also bad, nowhere near the user-friendly WYSWYG of Excel.
A | |
1 | =T("stock.xlsx") |
2 | =A1.sort(DT) |
3 | =A2.group(CODE;~.group@i(CL< CL[-1]).max(~.len()):max_increase_days) |
Solving Excel puzzles along the way
A | |
1 | =T("stock.xlsx") |
2 | =A1.sort(CODE,DT) |
3 | =A2.group@i(CODE!=CODE[-1]||CL< CL[-1]) |
4 | =A3.select(~.len()>=5) |
5 | =A4.conj() |
Cursor technology effortlessly handles big data that exceeds memory limits; simply adding options enables parallel computing to boost performance.
A | |
1 | StockRecords.txt |
2 | =file(A1).cursor@t().sortx(CODE,DT) |
3 | =A2.group(CODE;~.group@i(CL< CL[-1]).max(~.len()):max_increase_days) |
A | |
1 | StockRecords.txt |
2 | =file(A1).cursor@t().sortx(CODE,DT) |
3 | =A2.group(CODE).conj(~.group@i(CL< CL[-1])) |
4 | =A3.select(~.len()>=5).conj() |
Highly interactive, WYSWYG interface.
Additionally, there is XLL to enhance Excel
=spl("=E(?1).sort(CODE,DT).group@i(CODE!=CODE[-1]||CL< CL[-1]).select(~.len()>=5).conj()",A1:D253)