Scudata originates from Raqsoft and is made up of Raqsoft's SPL team. Scudata will further develop and promote SPL as open-source software.

For over a decade, Raqsoft has been the leading reporting tool vendor in China. In the course of business, we were often asked to help our users develop complex reports. This happened so often that over time we found that complex reports have not only complex layouts but, more importantly, hard-to-handle data. Raw database data is generally far from ready for presentation, and multistep, complicated computations are often needed to process it. Reporting tools can handle only a small number of computations in the presentation stage and cannot handle the computations that come before it. Report development is painful and inefficient because reporting tools are only good at layouts and at computations that have already reached the presentation phase.

The problem has remained unsolved, and the industry has come up with workarounds: developers turn to writing complicated SQL queries (and stored procedures) in the application. These methods are slow and cumbersome, and they bring problems such as tight coupling and difficult maintenance because of certain characteristics of SQL as a programming language.

Where there is a problem, there is an opportunity, and we wanted to find a way to approach this one.

Around the end of 2007, when our reporting tool had become stable, we set about cracking this hard nut. We collected and analyzed various data computing problems and designed a brand-new data model and, based on it, a programming language (**SPL**, which stands for Structured Process Language) to concisely describe algorithms for complex and big data computing.

It's easier to think of an idea than to realize it. There were numerous alterations, because we kept coming up with better ideas or discovering that important needs had been neglected, either of which led to a remodeling. Beginning in 2008, SPL underwent four major transformations before it finally evolved into a finished product with a stable model and architecture. Along the way, parallel computing and cluster computing were added in 2012 and 2013, together with countless smaller modifications. And although SPL always had a few trial customers, it did not become an independent commercial product until the second half of 2016.

Simply put, SPL is software mainly for computing (semi-)structured data. Technologically, it is three things in one:

**A programming language**: SPL is a brand-new programming language designed to make structured data computing easier to program, and it has proved to perform excellently in most computing scenarios. It is particularly suitable for complicated procedural computations. SQL, Java and Python are the languages commonly used for similar tasks.

**A data computing middleware**: SPL provides computing ability independent of databases. It can perform various types of structured data computations on its own, and it is integration-friendly because it can be easily embedded into an application. It is suitable for scenarios where no database is available or where multiple databases are involved. Databases and ETL tools are what is generally used for such computations today.

**A big data computing platform**: SPL has its own cluster computing system for processing large amounts of data. The system offers a distributed computing environment where programmers can control task distribution in order to implement the most appropriate and efficient algorithm. The two main rivals in this field are Hadoop and MPP databases.

SPL aims to **facilitate algorithm description** and **speed up execution.**

An algorithm that is difficult to describe results in high development cost and makes coding really hard. Slow execution is time-consuming and can make a program unusable. **Essentially, fast execution comes from efficient algorithm description.** Since software cannot enhance hardware performance, creating more efficient algorithms is the only choice. But if a fast algorithm is hard or impossible to describe, efficient execution is impossible, too.

How does SPL achieve both goals?

As we know, computers can only recognize and execute formalized programming statements. Using a computer to solve a problem means translating an algorithm into a programming language. If a language's syntax and data types are badly designed, it is quite possible that **the translation is more difficult than the solution**. We may have a great, efficient algorithm but find it hard to express or simply impossible to implement. It is like having the key but not knowing how to use it, and then resorting to the clumsiest way of all: breaking the door open.

Unfortunately, this is not uncommon in structured data computations, especially multi-step procedural computations and order-related computations. SQL, the mainstream programming language for structured data, is responsible for this. Here are two simple cases:

1. Finding the maximum number of consecutive days on which a stock rises

It's a piece of cake for a Java or C++ programmer: create a counter with an initial value of 0, sort the data by date and traverse it, adding 1 to the counter when the price rises and resetting the counter when it falls, and finally take the maximum value the counter reached. The solution is natural but difficult to implement in SQL. In SQL, one has to number the dates and create a grouping mark so that a rising date is grouped with the previous date while a falling date is grouped with the next one, and then calculate COUNT() over these groups. The code is lengthy and not easy to understand.
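The counter-based approach described above can be sketched in a few lines. The following is only an illustration in Python, not SPL code. The function name `max_consecutive_rise`, the `(date, price)` record layout and the sample prices are assumptions made for the example, and an unchanged price is assumed to end a rising streak just as a fall does.

```python
def max_consecutive_rise(records):
    """Return the longest run of consecutive rising days.

    records: iterable of (date, closing_price) pairs in any order.
    Dates are assumed to sort chronologically (e.g. ISO strings).
    """
    longest = 0           # best streak seen so far
    run = 0               # the counter from the description, starts at 0
    previous_price = None
    for _, price in sorted(records, key=lambda r: r[0]):
        if previous_price is not None and price > previous_price:
            run += 1                      # price rose: extend the streak
            longest = max(longest, run)
        else:
            run = 0                       # fall (or no change): reset
        previous_price = price
    return longest


# Hypothetical sample data, not taken from any real stock.
sample_prices = [
    ("2024-01-02", 10.0),
    ("2024-01-03", 10.5),
    ("2024-01-04", 11.0),
    ("2024-01-05", 10.8),
    ("2024-01-08", 11.2),
    ("2024-01-09", 11.5),
    ("2024-01-10", 12.0),
]
print(max_consecutive_rise(sample_prices))  # 3
```

The whole computation is one sort followed by one pass over the data, which is the natural procedural form that is awkward to express with grouping and COUNT().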

2. Finding the top 10 from 1 billion records

For a programmer familiar with algorithms, there's no need to sort all the records. They create an initially empty set that holds at most ten members and traverse the records, making sure the set always contains the top 10 of the records traversed so far. Just one traversal is needed, and memory consumption is small. However, it's hard, even impossible, to express this algorithm in SQL. Logically, SQL can only sort all the records and then take the top 10. We can rely on database optimization when the computation is simple, but there's no better way when the computation is complicated (for example, one involving a subquery or a JOIN operation).
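The single-traversal idea can be sketched as follows. This is a minimal illustration in Python rather than SPL. The function name `top_n` is hypothetical, and using a min-heap as the small set is one possible implementation choice, assumed here for the example.

```python
import heapq


def top_n(values, n=10):
    """Return the n largest values, in descending order, using one pass.

    Only n values are kept in memory at any time, so the input can be a
    generator over a very large data set.
    """
    if n <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest of the current top n
    for v in values:
        if len(heap) < n:
            heapq.heappush(heap, v)       # set not full yet: just add
        elif v > heap[0]:
            heapq.heapreplace(heap, v)    # evict the smallest, insert v
    return sorted(heap, reverse=True)     # sorts only n items


print(top_n(range(1_000_000), 10))
```

Python's standard library already provides `heapq.nlargest`, which serves the same purpose. The loop is written out here only to show that no full sort of the input is needed.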

SPL can easily describe the fast algorithms in the two cases.

Why is SPL a more efficient programming language? In other words, why can SPL do better than SQL, the mainstream programming language?

SQL is founded on relational algebra, which was developed in the 1970s. The theory, more than half a century old, has undergone modifications, but its core has remained the same.

It's understandable that such an old theory struggles with today's computing needs and hardware environments.

Back when relational algebra was budding, computers were not yet widely used for information management, and data processing needs were as simple as ordinary queries and aggregates. Now computers are an indispensable part of contemporary organization management, and data computing needs have reached a new height of complexity. The old theory has begun to stagger and limp. One example is that relational algebra, being based on unordered sets, has trouble handling order-related computations, for which there is strong user demand (scenarios include computing the link relative ratio and the year-over-year growth ratio). Besides, the hardware of the 1970s was unsophisticated. The algebraic theory was designed to fit that hardware environment in order to be practically feasible. It was hard for its inventors to foresee a hardware environment featuring large memory, multiple CPUs and clusters. SQL's genes mean that it can hardly take advantage of the hardware capabilities of contemporary computers to achieve optimal performance.

SPL belongs to the present because **it is based on an innovative theoretical model**.

The significance of revolutionizing an old theoretical model can be illustrated through the following two examples:

1. It is convenient to use Arabic numerals to represent numbers and perform the four arithmetic operations. But could you still do so easily with Roman numerals?

2. All multiplications of integers can be expressed as additions. But by introducing multiplication into the arithmetic system to represent the repeated addition of the same integer, and by inventing the multiplication table, we can calculate much faster.

SPL and SQL are like Arabic numerals and Roman numerals: one is convenient to use while the other is awkward. SPL has the performance advantage because it enables a series of “multiplication operations” to replace the slow “addition operations”.

This performance advantage, proved in many field tests, does not derive from more elegant program code (in this respect we are no better than others, particularly the established database products) but from an innovative data model that makes it possible to describe simple and efficient algorithms. For simple computations whose algorithms cannot be optimized any further, SPL cannot do better than traditional databases, which, after all, are still powerful. Generally, a complicated computation has more room for algorithm optimization, so the performance gain is larger, with the extra benefit of simpler code. That is the situation where SPL shows its strength: it can produce a much shorter program with performance many times higher.

Related Reading: New Algebra New Database

**Address:** C009, Floor 2, Building 1, Yard 41, Shangdi West Road, Haidian District, Beijing