1 Laboratoire d’Informatique de Paris 6, CNRS UMR 7606, 8 rue du capitaine Scott, 75015 Paris, France
2 Laboratoire de Génétique Moléculaire de la Neurotransmission et des Processus Dégénératifs, CNRS UMR 7091, Hôpital La Pitié-Salpêtrière, 75013, Paris, France
Received on March 17, 2003; accepted on June 9, 2003
This article addresses the identification of gene regulatory networks from experimental data using a statistical machine learning approach. We propose a stochastic model of gene interactions that can handle missing (unobserved) variables. The model can be described as a dynamic Bayesian network, which is particularly well suited to the stochastic nature of both gene regulation and gene expression measurement. Its parameters are learned by penalized likelihood maximization, implemented through an extended version of the EM algorithm. The approach is tested on experimental data from the S.O.S. DNA repair network of the bacterium Escherichia coli. It appears able to extract the main regulations between the genes involved in this network, and an added missing variable is found to model the main protein of the network. Good prediction abilities are observed on data not used for learning. These first results are very promising: they show the power of the learning algorithm and the ability of the model to capture gene interactions.
Keywords: gene regulatory networks, structure extraction, expression profiles, dynamic Bayesian networks, Kalman filter, penalized likelihood, EM algorithm.
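As a rough illustration of the kind of model the keywords point to, the following minimal sketch evaluates the likelihood of expression profiles under a linear-Gaussian state-space model (a simple dynamic Bayesian network) with a Kalman filter, where one state component is hidden. Everything in it is an assumption made for illustration: the function names, the matrices `A`, `C`, `Q`, `R`, the toy dimensions, and in particular the L1 penalty on `A`, which is only a placeholder since the form of the penalty is not specified in this section. It is not the authors' implementation, and the EM parameter updates are not shown.

```python
import numpy as np

def kalman_loglik(y, A, C, Q, R, mu0, P0):
    """Log-likelihood of y (T x p) under the assumed model
    x_{t+1} = A x_t + w_t, w_t ~ N(0, Q);  y_t = C x_t + v_t, v_t ~ N(0, R),
    with prior x_0 ~ N(mu0, P0)."""
    T, p = y.shape
    x, P = mu0.copy(), P0.copy()          # predicted mean/covariance of x_t
    ll = 0.0
    for t in range(T):
        e = y[t] - C @ x                  # innovation
        S = C @ P @ C.T + R               # innovation covariance
        _, logdet = np.linalg.slogdet(S)
        ll += -0.5 * (p * np.log(2 * np.pi) + logdet + e @ np.linalg.solve(S, e))
        K = P @ C.T @ np.linalg.inv(S)    # Kalman gain
        x = x + K @ e                     # measurement update
        P = P - K @ C @ P
        x = A @ x                         # time update (prediction)
        P = A @ P @ A.T + Q
    return ll

def penalized_loglik(y, A, C, Q, R, mu0, P0, lam):
    """Likelihood minus a placeholder L1 penalty on the interaction matrix A
    (hypothetical choice, used here only to show the structure of the criterion)."""
    return kalman_loglik(y, A, C, Q, R, mu0, P0) - lam * np.abs(A).sum()

def simulate(A, C, Q, R, mu0, T, rng):
    """Draw T observations from the assumed state-space model."""
    n, p = A.shape[0], C.shape[0]
    x = mu0.copy()
    ys = np.empty((T, p))
    for t in range(T):
        ys[t] = C @ x + rng.multivariate_normal(np.zeros(p), R)
        x = A @ x + rng.multivariate_normal(np.zeros(n), Q)
    return ys

# Toy setting: two observed genes plus one hidden variable (third state component).
A = np.array([[0.6, 0.0, 0.3],
              [0.0, 0.7, -0.4],
              [0.2, 0.0, 0.5]])
C = np.hstack([np.eye(2), np.zeros((2, 1))])   # only the two genes are measured
Q, R = 0.05 * np.eye(3), 0.1 * np.eye(2)
mu0, P0 = np.zeros(3), np.eye(3)

rng = np.random.default_rng(0)
y = simulate(A, C, Q, R, mu0, T=50, rng=rng)
ll = kalman_loglik(y, A, C, Q, R, mu0, P0)
pll = penalized_loglik(y, A, C, Q, R, mu0, P0, lam=1.0)
```

In an EM scheme of the kind named in the abstract, a quantity like `pll` would be the objective increased at each iteration, with a smoothing pass supplying the expected statistics of the hidden component; those steps are omitted from this sketch.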