Intersection Analysis Optimization
QDBase optimizes ...
The Challenge
Bank X needed to compute intersections of customer groups efficiently on large datasets (up to 100 million records per group), filtered along multiple dimensions (e.g., gender, age, region). The existing OLAP server-based solution performed poorly: a single task took 2 minutes on 100 CPUs, far from the target of running 10–20 concurrent tasks within 10 seconds. Pre-calculating the intersections was infeasible because the millions of possible group combinations would require excessive storage and computation.
Performance
Query execution on 12 months of data dropped from 120 seconds on 100 CPUs to 4 seconds on 12 CPUs: 30x faster in elapsed time, and a 250x improvement when measured in CPU-seconds.
Scalability
Achieved the target of 10–20 concurrent tasks within 10 seconds with efficient resource utilization.
Flexibility
Eliminated the need for pre-calculations, enabling direct queries on raw data with consistent performance across any number of groups.
The QDBase Solution
Performance Improvement
The optimization reduced query execution time from 120 seconds on a 100-CPU virtual machine to just 4 seconds on a 12-CPU virtual machine. That is a 30x reduction in elapsed time using only 12% of the original CPU count, which together amount to a 250x improvement in CPU-seconds consumed. The solution met the performance target of completing 10–20 tasks concurrently within 10 seconds, significantly enhancing the operational efficiency of the analysis system.
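The 250x figure is easiest to reproduce as a ratio of CPU-seconds (CPU count multiplied by elapsed time) rather than elapsed time alone. The short Python sketch below works through that arithmetic using only the numbers quoted above. The helper name `cpu_seconds` is illustrative, and treating every CPU as fully busy for the whole run is a simplifying assumption.

```python
def cpu_seconds(cpus: int, elapsed_s: float) -> float:
    """CPU count times elapsed seconds (assumes all CPUs are busy throughout)."""
    return cpus * elapsed_s


before = cpu_seconds(100, 120)   # original OLAP setup
after = cpu_seconds(12, 4)       # optimized setup

elapsed_speedup = 120 / 4        # improvement in wall-clock time
cpu_time_ratio = before / after  # improvement in CPU-seconds
cpu_share = 12 / 100             # fraction of the original CPU count

print(elapsed_speedup, cpu_time_ratio, cpu_share)
```

Read this way, the elapsed-time gain and the CPU reduction are two factors of the same 250x figure, not separate gains to be added on top of it.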
Scalability
By addressing the inefficiencies in the original OLAP system, the optimized solution provided robust scalability. Even as data volumes grew, the approach maintained consistent performance by leveraging efficient storage and computational techniques. This ensured that Bank X could handle increasing data demands without further infrastructure investments.
Flexibility
The redesigned solution eliminated the need for pre-calculated intersections, enabling direct queries on raw, monthly data. This flexibility allowed the analysis system to adapt to varying customer group combinations dynamically, without impacting performance. It also supported additional filtering dimensions and group combinations without requiring changes to the data structure.
Reliability and Accuracy
The use of boolean sequences and bitwise operations ensured high reliability and precision. These methods minimized the risk of errors typically associated with complex joins and SQL-based filtering on large datasets. The results were consistent across all queries, instilling confidence in the analytical insights provided.
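To illustrate the idea behind the bitwise approach, here is a minimal Python sketch. It is not the SPL code used in the project. It assumes each customer row carries an integer whose bit i is set when the customer belongs to group i; the group numbering, the sample data, and the function names are hypothetical.

```python
from typing import Iterable, List


def group_mask(group_ids: Iterable[int]) -> int:
    """Build an integer with one bit set per requested group id."""
    mask = 0
    for gid in group_ids:
        mask |= 1 << gid
    return mask


def count_intersection(memberships: List[int], group_ids: Iterable[int]) -> int:
    """Count customers that belong to every group in group_ids."""
    mask = group_mask(group_ids)
    # A customer is in the intersection when all requested bits are set.
    return sum(1 for bits in memberships if (bits & mask) == mask)


# Hypothetical data: one integer per customer, bit i = member of group i.
memberships = [
    0b0111,  # groups 0, 1, 2
    0b0101,  # groups 0, 2
    0b0010,  # group 1
    0b1111,  # groups 0, 1, 2, 3
]

print(count_intersection(memberships, [0, 2]))     # customers in groups 0 and 2
print(count_intersection(memberships, [0, 1, 3]))  # customers in groups 0, 1 and 3
```

Because membership in any combination of groups reduces to one AND and one comparison per row, no join between group tables is needed and the result does not depend on the order in which groups are listed.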
Resource Efficiency
The optimized system utilized fewer hardware resources to achieve superior performance. This was particularly beneficial for cost management, as the reduced computational requirements translated to lower energy consumption and infrastructure costs. With only 12 CPUs needed to achieve the desired results, the solution provided a cost-effective alternative to traditional big data analytics systems.
Ease of Implementation and Maintenance
QDBase SPL's high-level abstraction and rich algorithm support enabled rapid implementation of the optimization strategy. Despite the complexity of the algorithms, the SPL code was concise and modular, comprising only 20 lines for the primary logic. This streamlined approach simplified debugging, reduced development time, and made the solution easier to maintain and extend in the future.
Business Impact
By resolving the performance bottlenecks and introducing an efficient, scalable, and adaptable system, Bank X could enhance its customer analytics capabilities. Faster and more reliable insights allowed for better decision-making and improved targeting for marketing and customer engagement strategies.
QDBase Feature Highlights
- High-Performance Columnar Storage: SPL supports compressed, binary columnar storage, reducing data size and improving read performance. Optimized for sequential access, ideal for the traversal-based filtering and bitwise operations used in the solution.
- Boolean Sequence Filtering: Enables efficient, constant-time evaluation of filtering conditions by converting dimension attributes into boolean sequences. Eliminates the performance bottlenecks of traditional SQL IN operations on large datasets.
- Bitwise Calculations: SPL provides native support for bitwise operations, crucial for intersecting customer groups stored as binary-encoded integers. This feature allowed for rapid computation of group intersections with minimal computational overhead.
- Precursor Filtering: Implements early elimination of rows that do not meet initial conditions, reducing the need to read and process unnecessary fields. Significantly minimizes memory usage and speeds up query execution.
- Concise and Modular Code: SPL's built-in functions for common algorithms allowed the solution to be implemented in only 20 lines of code. Its modular design made the codebase easier to develop, debug, and extend compared to high-level languages like Java or C++.
- Rich Functionality for Advanced Analytics: Provides prebuilt support for high-performance algorithms, such as dimensional encoding and boolean filtering, reducing the need for custom implementations. These built-in functions allowed for rapid deployment of complex optimization strategies.
- JDBC Driver Integration: SPL can be seamlessly integrated into existing systems through its JDBC driver. This allowed Bank X's front-end applications to access SPL's optimized analytics capabilities as easily as calling a database stored procedure.
- Efficient Memory Management: SPL leverages small integer objects and efficient data structures, ensuring low memory overhead even with large datasets. Avoids the excessive object generation associated with traditional high-level languages, enhancing runtime performance.
- Visual Debugging Mechanism: SPL offers an intuitive debugging interface, simplifying the development process and ensuring the correctness of complex algorithms.
- Cost and Resource Efficiency: By optimizing both storage and computation, SPL significantly reduced the hardware requirements for Bank X, providing a scalable and cost-effective solution.
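The boolean-sequence and precursor-filtering ideas from the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not QDBase SPL code: dimension values are assumed to be pre-encoded as small integers, columns are plain Python lists, and names such as `bool_sequence` and `filter_rows` are hypothetical.

```python
from typing import Dict, Iterable, List, Sequence


def bool_sequence(selected: Iterable[int], cardinality: int) -> List[bool]:
    """Turn a set of selected dimension codes into a lookup table.

    flags[code] is True when the code is selected, so testing a row is one
    index operation instead of a search through an IN list.
    """
    flags = [False] * cardinality
    for code in selected:
        flags[code] = True
    return flags


def filter_rows(columns: Dict[str, Sequence[int]],
                filters: Dict[str, List[bool]]) -> List[int]:
    """Return indexes of rows that pass every dimension filter.

    Columns are checked one at a time and only for rows that survived the
    previous check, so later columns are never read for rejected rows.
    """
    n_rows = len(next(iter(columns.values())))
    survivors = list(range(n_rows))
    for name, flags in filters.items():
        column = columns[name]
        survivors = [i for i in survivors if flags[column[i]]]
        if not survivors:
            break  # nothing left to check
    return survivors


# Hypothetical encoded data: gender 0-1, age band 0-3, region 0-4.
columns = {
    "gender": [0, 1, 1, 0, 1],
    "age":    [2, 3, 1, 2, 3],
    "region": [4, 0, 0, 1, 4],
}
filters = {
    "gender": bool_sequence([1], cardinality=2),
    "age":    bool_sequence([1, 3], cardinality=4),
    "region": bool_sequence([0], cardinality=5),
}

print(filter_rows(columns, filters))
```

The lookup cost per row stays the same however many values are selected for a dimension, which is the property the Boolean Sequence Filtering item describes. A real columnar engine would apply the same pattern while reading compressed column blocks rather than in-memory lists.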
Conclusion
This optimized approach not only solved Bank X's performance bottlenecks but also demonstrated the potential of tailored algorithms over traditional SQL in handling complex, large-scale data operations. By leveraging capabilities provided by QDBase, the solution set a new standard for efficiency and scalability in customer group analysis.
Read the full in-depth study, with code examples, on the SCUDATA Blog
Read about more use cases with real results on our Case Studies page