GECCO 2012 Tutorial: Real World Modeling

Abstract: Capturing the value in real-world data requires more than fitting trivial models or visually exploring the data. Rather, we must efficiently isolate driving variables, confirm or reject potential outliers, and build models that are both accurate and trustworthy. Fortunately, multi-objective genetic programming (a.k.a. ParetoGP) lets us achieve these goals. ParetoGP will be the foundation technology in this tutorial; however, we will address the entire modeling process, including data balancing, outlier detection, and model usage/exploitation, as well as model development itself.

In addition to covering the basic theory of ParetoGP, we will explore key points using real-world industrial data-modeling case studies and review best practices for industrial data modeling. Current economic conditions demand maximum efficiency in developing and exploiting the highest-quality models. ParetoGP has been used in applications ranging from energy trading and active design of experiments to plant troubleshooting and patent-litigation modeling.

Download the slides in PDF (85 MB) or in Keynote (zipped, 25 MB).

A short presentation on the design decisions behind DataModeler is also available in PDF and in Keynote (zipped).