Home
News
Research
Teaching
Publications
Software
Group
Biography
                     
             

Research Interests

I am interested in the general areas of nonlinear optimization, artificial intelligence, data warehousing, and data mining. More specifically, I have performed research in:

  • Large-scale constrained nonlinear optimization, in discrete, continuous, and mixed-integer spaces
  • Partitioned and distributed optimization
  • High-performance optimization on parallel and distributed computers
  • Stochastic optimization algorithms
  • Automated planning and scheduling
  • Machine learning algorithms
  • OLAPing and data cubing, statistical databases
  • Data mining, including classification, clustering, and stream data modeling

Research Synopsis

Constrained Nonlinear Optimization and AI/OR Applications. The central contribution of our research is a general approach that can significantly reduce the complexity in solving constrained nonlinear optimization (NLP) problems arising from applications including engineering design, operations research, planning and scheduling, and machine learning.

A key observation we have made is that most NLPs from real-world applications have structured arrangements of constraints. We have developed techniques to exploit these constraint structures by partitioning the constraints into subproblems related by global constraints. Constraint partitioning leads to much relaxed subproblems that are significantly easier to solve. Therefore, the total time to solve all subproblems is also significantly reduced. A key factor deciding the effectiveness of constraint partitioning is the efficiency in resolving violated global constraints.

We have proposed the theory of extended saddle points (ESP) to support constraint partitioning. ESP offers a necessary and sufficient condition for constrained local optima of NLPs satisfying all global constraints. It facilitates constraint partitioning by providing a set of necessary conditions, one for each subproblem, to characterize the local optima. It further reduces the complexity by defining a much smaller search space in each subproblem for backtracking. Since resolving the global constraints only incurs a small amount of overhead, our approach leads to a significant reduction of complexity. The partitioned search algorithms are well-suited for parallel implementation on high-performance architectures due to the strong parallelism in solving independent subproblems.

Our constraint-partitioning approach has achieved substantial improvements over existing methods in AI planning and mathematical programming. We have
applied our method to solve some large-scale AI planning problems, as well as some continuous and mixed-integer NLPs in standard benchmarks. We have solved some large-scale problems that were not solvable by other leading methods and have improved the solution quality on many problems.

We are applying the constraint partitioning approach to solve large-scale NLPs from many scientific, engineering, and medical applications, such as AI/OR planning and scheduling, high-performance network optimization, molecular biology, particle simulations, radiation oncology, and machine learning.  

Data Warehousing and Data Mining. Data mining engines are typically built on top of data warehouses that provide online analytical processing (OLAP) tools for interactive analysis of multi-dimensional data. The fast development of OLAP technology has led to high demand for more sophisticated data analyzing capabilities, such as prediction, trend monitoring, and exception detection of multi-dimensional data. The simple distributive or algebraic measures, e.g. sum and avg, become insufficient to support these functions. We have been the first to extend the measures to support nonlinear regression analysis measures in a data cube architecture. OLAPing regression measures offer an analytical modeling engine to generate trend summarization for forecasting and statistical analysis at each level of granularity and at each dimension intersection.
                     
The needs for stream data mining can be found in abundant applications, such as power supply, network traffic and stock exchange. A fundamental difference in the analysis of stream data from that of relational and static warehouse data is that stream data can be examined in a single pass only. The requirement for multi-level, multi-dimensional, online analysis of stream data raises a challenging research question: Is it feasible to mine a huge volume of stream data since the data cube is usually much larger than the original data set, and its construction may take multiple database scans? We have answered this question by proposing a feasible architecture called Stream Cube for online computation of a multi-dimensional, multi-level stream data cube under stringent time and space constraints. Based on several key innovations, including tilted time frames, critical layers, and path cubing, the Stream Cube architecture achieves a reasonable trade-off between space, computation time, and flexibility. It has both quick aggregation time and analysis time. It provides a realistic approach to mine stream data based on the current computer technology.
                      
Clustering is an important subject for data mining and machine learning. We have proposed a fundamentally new cellular automata model called the Ants Sleeping Model (ASM) for ant colony clustering. ASM makes a key change over previous models in that it does not differentiate data objects and ant agents. Instead, in ASM, each agent represents both a data unit and an ant. This critical change of the model leads to fundamental impacts to clustering efficiency as it avoids a lot of random idle moves and repetition in previous methods. The Adaptive Artificial Ants Clustering Algorithm (A4C) based on ASM, when applied to standard clustering benchmarks, converges much faster than previous methods, saves a large amount of memory space, and generates solutions of higher quality.

Related Information

Sponsors