|
|
Research Interests
I am interested in
the general areas of nonlinear optimization,
artificial intelligence, data warehousing, and data mining. More specifically,
I have performed research in:
- Large-scale constrained nonlinear
optimization, in discrete, continuous, and mixed-integer spaces
- Partitioned and distributed optimization
- High-performance optimization on parallel and
distributed computers
- Stochastic optimization algorithms
- Automated planning and scheduling
- Machine learning algorithms
- OLAPing and data cubing, statistical databases
- Data mining, including classification, clustering,
and stream data modeling
Research Synopsis
Constrained Nonlinear Optimization and AI/OR
Applications. The central contribution of our research is a
general approach that can significantly reduce the complexity in solving
constrained nonlinear optimization (NLP) problems arising from
applications including engineering design, operations research, planning
and scheduling, and machine learning.
A key observation we have made is that most NLPs
from real-world applications have structured arrangements of constraints.
We have developed techniques to exploit these constraint structures by
partitioning the constraints into subproblems related by global
constraints. Constraint partitioning leads to much relaxed subproblems
that are significantly easier to solve. Therefore, the total time to solve
all subproblems is also significantly reduced. A key factor deciding the
effectiveness of constraint partitioning is the efficiency in resolving
violated global constraints.
We have proposed the theory of extended saddle points (ESP) to support
constraint partitioning. ESP offers a necessary and sufficient condition
for constrained local optima of NLPs satisfying all global constraints. It
facilitates constraint partitioning by providing a set of necessary
conditions, one for each subproblem, to characterize the local optima. It
further reduces the complexity by defining a much smaller search space in
each subproblem for backtracking. Since resolving the global constraints
only incurs a small amount of overhead, our approach leads to a
significant reduction of complexity. The partitioned search algorithms are
well-suited for parallel implementation on high-performance architectures
due to the strong parallelism in solving independent subproblems.
Our constraint-partitioning approach has achieved substantial improvements
over existing methods in AI planning and mathematical programming. We have
applied our method to solve some large-scale AI planning problems, as well
as some continuous and mixed-integer NLPs in standard benchmarks. We have
solved some large-scale problems that were not solvable by other leading
methods and have improved the solution quality on many problems.
We are applying the constraint partitioning
approach to solve large-scale NLPs from many scientific, engineering, and
medical applications, such as AI/OR planning and scheduling,
high-performance network optimization, molecular biology, particle
simulations, radiation oncology, and machine learning.
Data Warehousing and Data Mining. Data
mining engines are typically built on top of data warehouses that provide
online analytical processing (OLAP) tools for interactive analysis of
multi-dimensional data. The fast development of OLAP technology has led to
high demand for more sophisticated data analyzing capabilities, such as
prediction, trend monitoring, and exception detection of multi-dimensional
data. The simple distributive or algebraic measures, e.g. sum and avg,
become insufficient to support these functions. We have been the first to
extend the measures to support nonlinear regression analysis measures in a
data cube architecture. OLAPing regression measures offer an analytical
modeling engine to generate trend summarization for forecasting and
statistical analysis at each level of granularity and at each dimension
intersection.
The needs for stream data mining can be found in abundant applications,
such as power supply, network traffic and stock exchange. A fundamental
difference in the analysis of stream data from that of relational and
static warehouse data is that stream data can be examined in a single pass
only. The requirement for multi-level, multi-dimensional, online analysis
of stream data raises a challenging research question: Is it feasible to
mine a huge volume of stream data since the data cube is usually much
larger than the original data set, and its construction may take multiple
database scans? We have answered this question by proposing a feasible
architecture called Stream Cube for online computation of a
multi-dimensional, multi-level stream data cube under stringent time and
space constraints. Based on several key innovations, including tilted time
frames, critical layers, and path cubing, the Stream Cube architecture
achieves a reasonable trade-off between space, computation time, and
flexibility. It has both quick aggregation time and analysis time. It
provides a realistic approach to mine stream data based on the current
computer technology.
Clustering is an important subject for data mining and machine learning.
We have proposed a fundamentally new cellular automata model called the
Ants Sleeping Model (ASM) for ant colony clustering. ASM makes a key
change over previous models in that it does not differentiate data objects
and ant agents. Instead, in ASM, each agent represents both a data unit
and an ant. This critical change of the model leads to fundamental impacts
to clustering efficiency as it avoids a lot of random idle moves and
repetition in previous methods. The Adaptive Artificial Ants Clustering
Algorithm (A4C) based on ASM, when applied to standard clustering
benchmarks, converges much faster than previous methods, saves a large
amount of memory space, and generates solutions of higher quality.
Related Information
Sponsors


|