FERNANDES, T. C. R. L.; http://lattes.cnpq.br/2107979968723473; FERNANDES, Thalita Cristine Ribeiro Lucas.
Abstract:
This work attempts to create a simple but systematic automated machine learning (AutoML) algorithm. Its main contribution is to produce the simplest regression model whenever possible (e.g., a second-order polynomial regression model obtained via OLS-based sequential feature selection), and otherwise to generate more complex, and therefore less desirable, nonlinear models (e.g., Gaussian process regression). It does so by using sequential design techniques to fill the sample space efficiently with “interesting” points, generating on demand a dataset that includes the responses obtained by “querying” the actual underlying process. This dataset is used to select the simplest possible regression model from a predefined set of candidates, iteratively, until particular convergence criteria are met. The goal is therefore to minimize the number of calls to the generating process, and hence the number of samples required. Each dataset produced along the iterations is used exhaustively in an effort to converge even difficult responses, i.e., those that fail to meet the criteria even with a large number of samples. Application of the proposed algorithm to important cases shows its effectiveness in building metamodels with significant predictive capability. Pure nonlinear regression techniques are suggested for situations in which data take more time to gather than to be processed by the algorithm. In general, a carefully chosen mix of linear and nonlinear regression methods for metamodel building is recommended for most cases, as a tradeoff between processing time and predictive capacity.
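
The loop described in the abstract can be illustrated with a minimal sketch. It is not the author's implementation: the function names (`build_metamodel`, `forward_select`, and so on), the maximin space-filling rule used to pick new points, and the leave-one-out RMSE convergence criterion are all assumptions chosen for illustration. The sketch covers only the linear stage (second-order polynomial terms selected by greedy forward selection with OLS). The fallback to a nonlinear model such as Gaussian process regression is left out and is only signalled by the `converged` flag.

```python
import itertools
import numpy as np

def quadratic_features(X):
    """Linear, interaction and squared terms of X (no intercept column)."""
    d = X.shape[1]
    pairs = itertools.combinations_with_replacement(range(d), 2)
    cols = [X[:, i] for i in range(d)] + [X[:, i] * X[:, j] for i, j in pairs]
    return np.column_stack(cols)

def design(F, idx):
    """Intercept column plus the selected feature columns."""
    return np.column_stack([np.ones(F.shape[0])] + [F[:, k] for k in idx])

def loo_rmse(F, y, idx):
    """Leave-one-out RMSE of an OLS fit, using the hat-matrix shortcut."""
    A = design(F, idx)
    H = A @ np.linalg.pinv(A)
    resid = y - H @ y
    denom = np.maximum(1.0 - np.diag(H), 1e-12)  # guard against leverage of 1
    return float(np.sqrt(np.mean((resid / denom) ** 2)))

def forward_select(F, y, tol):
    """Greedy forward selection: smallest subset meeting tol, else the best seen."""
    chosen, remaining = [], list(range(F.shape[1]))
    best_idx, best_err = [], loo_rmse(F, y, [])
    while remaining and best_err > tol:
        err, k = min((loo_rmse(F, y, chosen + [k]), k) for k in remaining)
        chosen.append(k)
        remaining.remove(k)
        if err < best_err:
            best_idx, best_err = list(chosen), err
    return best_idx, best_err

def build_metamodel(process, d, n0=12, budget=40, tol=1e-6, seed=0):
    """Query `process` sequentially until the polynomial model meets tol (assumed criterion)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n0, d))
    y = np.array([process(x) for x in X])
    while True:
        F = quadratic_features(X)
        idx, err = forward_select(F, y, tol)
        if err <= tol or len(y) >= budget:
            break
        # Assumed space-filling rule (maximin): add the candidate farthest from all samples.
        cand = rng.uniform(-1.0, 1.0, size=(200, d))
        dist = np.linalg.norm(cand[:, None, :] - X[None, :, :], axis=2).min(axis=1)
        x_new = cand[np.argmax(dist)]
        X = np.vstack([X, x_new])
        y = np.append(y, process(x_new))
    coef, *_ = np.linalg.lstsq(design(F, idx), y, rcond=None)

    def model(Xq):
        return design(quadratic_features(np.atleast_2d(Xq)), idx) @ coef

    return model, idx, err <= tol, len(y)

# Hypothetical underlying process: an exact second-order polynomial in two inputs.
def process(x):
    return 1.0 + 2.0 * x[0] - 3.0 * x[0] * x[1] + 0.5 * x[1] ** 2

model, terms, converged, n_calls = build_metamodel(process, d=2)
```

When `converged` is `False` after the sampling budget is spent, the response is one of the difficult cases mentioned above, and a nonlinear candidate would be tried on the same dataset instead of requesting further samples.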