e-Science and bioinformatics

 

 

DOWNLOAD OF SOFTWARE TOOLS:

The poliGMDH application for adaptive neural network construction, written in Java, can be downloaded from here.

The IReNNS software for learning from structures can be downloaded from here.

 

 

e-Science is the way science will develop in the near future: through distributed global collaborations enabled by the Internet. An important feature of e-Science is that it requires access to very large data collections and very large-scale computing resources. Another important feature is that previously unseen correlations in such large data sets can be detected automatically by data mining and inductive systems. Bioinformatics, the broad area that develops a methodological approach to biology through the tools of informatics, is the most important example of e-Science.

 

We are developing mathematical models of chemical, biological, and toxicological activities. The table summarizes the projects, techniques, and applications.

 

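| Project name | EST | NATO | COMET | IMAGETOX | openmolGRID | DEMETRA | easyring | fateallchem | ION | ECB | RAINBOW | CAESAR | ORCHESTRA | ANTARES |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Funded by | EU | NATO | EU | EU | EU | EU | EU | EU | EU | EU | EU | EU | EU | EU |
| Years | 1995-1998 | 1998-1999 | 1998-2000 | 2000-2004 | 2002-2004 | 2003-2006 | 2003-2006 | 2003-2005 | 2004-2006 | 2005 | 2006 | 2006-2009 | 2009-2012 | 2010-2012 |
| Techniques |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| SAR/QSAR | x |  | x | x | x | x | x | x | x | x |  | x | x | x |
| Ensembling |  |  |  |  |  | x |  |  |  |  |  | x |  |  |
| Hybrid | x |  |  | x |  | x |  |  | x |  |  |  |  | x |
| Evolutive NN |  |  |  | x |  |  |  |  | x |  |  | x |  |  |
| Neuro-fuzzy |  |  | x | x |  |  |  |  |  |  |  |  |  |  |
| Applications |  |  |  |  |  |  |  |  |  |  |  |  |  |  |
| Drug discovery |  |  |  |  |  |  |  |  | x |  |  |  |  |  |
| Environment protection |  |  |  | x |  | x |  |  |  |  |  |  | x | x |
| REACH |  |  |  |  |  |  |  |  |  |  |  | x | x | x |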

 

 

QSAR

The seminal work in the field of QSAR was reported in Hansch's papers (1963), where he demonstrated the use of regression analysis for model building.

As the number of descriptors increases, as it does in modern computational systems, regression analysis becomes problematic. One problem is redundant information when descriptors are correlated. A second problem is the a priori assumption of a model form (e.g. quadratic, cubic, use of cross terms).
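The redundancy problem can be illustrated with a minimal sketch, assuming a simple greedy filter on the Pearson correlation between descriptors. The function names, the 0.95 threshold, and the toy descriptor values are all hypothetical and are not taken from any of the projects listed here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant descriptor carries no correlation information
    return cov / sqrt(vx * vy)

def drop_correlated(descriptors, threshold=0.95):
    """Greedily keep descriptors whose |r| with every kept one is below threshold.

    `descriptors` maps a descriptor name to its values (one per molecule).
    The threshold of 0.95 is an arbitrary illustrative choice.
    """
    kept = []
    for name, values in descriptors.items():
        if all(abs(pearson(values, descriptors[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical toy data: four molecules, three descriptors.
toy = {
    "logP": [1.0, 2.0, 3.0, 4.0],
    "mol_weight": [100.0, 210.0, 290.0, 405.0],  # almost proportional to logP here
    "h_bond_donors": [2.0, 0.0, 3.0, 1.0],
}
selected = drop_correlated(toy)
print(selected)
```

On this toy set the second descriptor is nearly a rescaled copy of the first, so the filter discards it and keeps the other two. A greedy filter depends on the order of the descriptors; it is shown only because it is short.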

We therefore explore modern approaches based on machine learning methods.

DOWNLOAD

G. Gini, E. Benfenati, D. L. Boley, "Clustering and classification techniques to assess aquatic toxicity", Proc. IEEE KES, IOS Press, Brighton (UK), September 2000

G. Gini, P. Mazzatorta, E. Benfenati, C.-D. Neagu, "The importance of scaling in data mining for toxicity prediction", J of Chemical Information and Computer Sciences, 42, pp. 1250-1255, 2002.

E. Benfenati, G. Gini, N. Piclin, A. Roncaglioni, M.R. Vari', "Predicting logP of pesticides using different software", Chemosphere, 53, pp. 1155-1164, 2003.

G. Gini, "Scoping study for the development of an internet based decision support system for (quantitative) structure activity relationships", posted online 3 September 2005, http://ecb.jrc.it/QSAR/

T. I. Netzeva, A. O. Aptula, E. Benfenati, M. T. D. Cronin, G. Gini, I. Lessigiarska, U. Maran, M. Vracko, G. Schürmann, "Description of the Electronic Structure of Organic Chemicals Using Semi-Empirical and Ab Initio Methods for Development of Toxicological QSARs", J of Chemical Information and Modeling (The American Chemical Society), 45 (1), pp. 106-114, 2005.

PREDICTIVE TOXICOLOGY – IN SILICO

Predictive toxicology addresses the specific problem of inferring toxicity against a given biological target. The field may use QSAR methods as well as other knowledge-based approaches. Applications are in medicine, life science, and environment protection. The general area of toxicology is illustrated below.

[Figure: the general area of toxicology. Labels: concentration in environment, bioavailability, individual concentration, individual effect, species effect, environment effect.]


DOWNLOAD:

G. Gini, E. Benfenati, "Computational predictive programs (expert systems) in toxicology", Toxicology (Elsevier), 119, pp. 213-225, 1997.

G. Gini, E. Benfenati, V. Testaguzza, R. Todeschini, "Hytex (Hybrid Toxicology Expert System): Architecture and implementation of a multi-domain hybrid expert system for toxicology", Chemometrics and Intelligent Laboratory Systems (Elsevier), 43, pp. 135-145, 1998.

G. Gini, E. Benfenati, M. Lorenzini, M. Bruschi, P. Grasso, "Predictive Carcinogenicity: a Model for Aromatic Compounds, with Nitrogen-containing Substituents, Based on Molecular Descriptors Using an Artificial Neural Network", J of Chemical Information and Computer Sciences, 39, pp. 1076-1080, 1999.

D. J. Musliner, B. Pell, W. Dobson, K. Goebel, G. Biswas, S. A. McIlraith, G. C. Gini, S. Koenig, S. Zilberstein, W. Zhang, "Reports on the AAAI Spring Symposia (March 1999)", AI Magazine, 21(2), pp. 79-84, 2000.

 

EVOLUTIVE AND NEURO-FUZZY SYSTEMS

The successes of neural networks in chemistry have also highlighted important factors that must be considered when using them. First, the design of the network is critical with respect to the number of hidden units. The network will overfit, or memorize, the data if too many hidden units are used. Conversely, the network will fail to generalize and become unstable if too few hidden units are used. To address this, we are developing evolutive neural networks (NN) based on the GMDH (Group Method of Data Handling) approach. Finally, the results obtained from neural networks can be difficult to interpret and apply. For this reason, neuro-fuzzy systems, which can incorporate symbolic knowledge, are also of interest.
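To make the GMDH idea concrete, here is a minimal sketch of a single GMDH-style layer. It assumes the generic textbook scheme: one small polynomial neuron is fitted for every pair of inputs on a training set, and only the neurons with the lowest error on a separate validation set are kept. It is not the poliGMDH implementation; the function names, the choice of neuron polynomial, and the toy data are hypothetical.

```python
from itertools import combinations

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def features(u, v):
    """Terms of the neuron polynomial: 1, u, v, u*v."""
    return [1.0, u, v, u * v]

def fit_neuron(X, y, i, j):
    """Least-squares fit of y ~ a0 + a1*x_i + a2*x_j + a3*x_i*x_j (normal equations)."""
    F = [features(row[i], row[j]) for row in X]
    k = len(F[0])
    A = [[sum(f[p] * f[q] for f in F) for q in range(k)] for p in range(k)]
    b = [sum(f[p] * t for f, t in zip(F, y)) for p in range(k)]
    return solve(A, b)

def predict(coef, X, i, j):
    return [sum(c * t for c, t in zip(coef, features(row[i], row[j]))) for row in X]

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def gmdh_layer(X_train, y_train, X_val, y_val, keep=2):
    """Fit one neuron per input pair and keep those with the lowest validation error."""
    candidates = []
    for i, j in combinations(range(len(X_train[0])), 2):
        coef = fit_neuron(X_train, y_train, i, j)
        err = mse(predict(coef, X_val, i, j), y_val)
        candidates.append((err, (i, j), coef))
    candidates.sort(key=lambda c: c[0])
    return candidates[:keep]

# Hypothetical toy problem: the target depends only on inputs 0 and 2.
def target(row):
    return 1 + 2 * row[0] + 3 * row[0] * row[2]

X_train = [[0, 1, 2], [1, 0, 1], [2, 3, 0], [3, 1, 1],
           [1, 2, 3], [2, 0, 2], [0, 3, 1], [3, 2, 3]]
X_val = [[1, 1, 0], [2, 2, 1], [0, 2, 3], [3, 0, 2]]
y_train = [target(r) for r in X_train]
y_val = [target(r) for r in X_val]

best = gmdh_layer(X_train, y_train, X_val, y_val)
print(best[0][1])  # input pair of the best neuron
```

In a full GMDH network the outputs of the kept neurons would become the inputs of the next layer, and layers would be added until the validation error stops improving. That is how the structure grows to fit the data instead of being fixed in advance. Only the first layer is sketched here.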

DOWNLOAD

P. Mazzatorta, E. Benfenati, C.-D. Neagu, G. Gini, "Tuning Neural and Fuzzy-Neural Networks for Toxicity Modeling", J of Chemical Information and Computer Sciences, 43, pp. 513-518, 2003.

M. Pintore, N. Piclin, E. Benfenati, G. Gini, J.R. Chretien, "Predicting toxicity against the Fathead Minnow by Adaptive Fuzzy Partition", QSAR Comb. Sci (Wiley-VCH), 22, pp. 210-219, 2003.

C.-D. Neagu, E. Benfenati, G. Gini, P. Mazzatorta, A. Roncaglioni, "Neuro-fuzzy knowledge representation for toxicity prediction of organic compounds", Proc. 15th European Conf on Artificial Intelligence, ECAI, Lyon (France), July 2002, pp 498-50

G. Gini, M. Giumelli, E. Benfenati, N. Piclin, J. Chrétien, M. Pintore, "A Comparison of Probabilistic, Neural, and Fuzzy Modeling in Ecotoxicity", Proc. 3rd Int Conf on Knowledge-based intelligent information engineering systems, KES 2002, IOS Press, pp. 542-546.

C.-D. Neagu, A. O. Aptula, G. Gini, "Neural and Neuro-fuzzy models of toxic action of phenols", Proc. First International IEEE Symposium "Intelligent Systems", IS 2002, Varna (Bulgaria), September 2002, pp. 283-288.

 

ENSEMBLING

 

Combining the predictions of a set of classifiers has been shown to be an effective way to create composite classifiers that are more accurate than any of the component classifiers.

Starting from basic combination strategies, we tried to extend the concept of ensembling different models in order to build a model with the maximum possible value for our application. We employed methods ranging from Pattern Recognition to Artificial Intelligence, paying attention to the statistical meaning of the result and to the knowledge level of the proposed combination. Instead of concentrating on building the best expert, we combine several good experts that are accurate and conceptually different, so that they make different errors.
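The benefit of combining experts that make different errors can be seen in a minimal sketch of the most basic combination strategy, a plurality vote. The toy labels and expert names are hypothetical, and the more elaborate combination schemes used in the projects are not reproduced here.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier label lists into one label list by plurality vote.

    `predictions` holds one list of labels per classifier, all of equal length.
    """
    combined = []
    for votes in zip(*predictions):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined

def accuracy(pred, truth):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Hypothetical toy labels (T = toxic, N = non-toxic).
# Each expert is wrong on a different compound.
truth    = ["T", "N", "T", "T", "N", "N"]
expert_a = ["N", "N", "T", "T", "N", "N"]  # wrong on compound 0
expert_b = ["T", "T", "T", "T", "N", "N"]  # wrong on compound 1
expert_c = ["T", "N", "T", "N", "N", "N"]  # wrong on compound 3

ensemble = majority_vote([expert_a, expert_b, expert_c])
for name, pred in [("a", expert_a), ("b", expert_b), ("c", expert_c), ("vote", ensemble)]:
    print(name, round(accuracy(pred, truth), 3))
```

Because the three experts never err on the same compound, every individual mistake is outvoted and the combined classifier is more accurate than each of its components. If the experts made the same errors, voting would bring no gain.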

DOWNLOAD:

C. Koening, G. Gini, M. Craciun, E. Benfenati, "Multi-class classifier from a combination of local experts: toward distributed computation for real-problem classifiers", Int J of Pattern Recognition and Artificial Intelligence, Vol. 18, No. 5, pp. 801-817, 2004.

G. Gini, M. Craciun, E. Benfenati, "Combining unsupervised and supervised artificial neural networks to predict aquatic toxicity", J of Chemical Information and Computer Sciences (The American Chemical Society), Vol. 44, No. 6, pp. 1897-1902, 2004.

 

HYBRID MODELS

 

In the present investigation we integrate models according to averaging or stacking criteria. To compare the different models, we can plot together their ROC curves (for classifiers) or their REC curves (for regression models), as in the figure.
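The averaging criterion and the two kinds of comparison curves can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, scores, and labels are hypothetical, the area under each curve is taken with the trapezoidal rule, and stacking (training a further model on the outputs of the base models) is not shown.

```python
def roc_points(scores, labels):
    """ROC points (false positive rate, true positive rate) for a scoring classifier.

    `labels` are 1 (positive) or 0 (negative); the decision threshold sweeps
    over the distinct scores from high to low.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and l == 0)
        points.append((fp / neg, tp / pos))
    return points

def rec_points(pred, truth, tolerances):
    """REC points: fraction of samples whose absolute error is within each tolerance."""
    errors = [abs(p - t) for p, t in zip(pred, truth)]
    return [(tol, sum(e <= tol for e in errors) / len(errors)) for tol in tolerances]

def area(points):
    """Trapezoidal area under a curve given as (x, y) points sorted by x."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

def average(*model_outputs):
    """Averaging integration: mean of the models' outputs, sample by sample."""
    return [sum(vals) / len(vals) for vals in zip(*model_outputs)]

# Hypothetical classifier scores for six compounds (label 1 = toxic).
labels  = [1, 1, 0, 1, 0, 0]
model_a = [0.9, 0.4, 0.6, 0.8, 0.3, 0.2]
model_b = [0.6, 0.9, 0.3, 0.5, 0.7, 0.1]
combined = average(model_a, model_b)

for name, scores in [("a", model_a), ("b", model_b), ("average", combined)]:
    print(name, round(area(roc_points(scores, labels)), 3))

# REC curve for a hypothetical regression model.
print(rec_points([1.0, 2.5, 4.0], [1.0, 2.0, 3.0], [0.0, 0.5, 1.0]))
```

In this toy case each model ranks one toxic compound too low, but not the same one, so the averaged scores separate the two classes better than either model alone. Plotting the points returned by `roc_points` or `rec_points` for several models on the same axes gives the kind of comparison described above.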

DOWNLOAD

G. Gini, M. Lorenzini, E. Benfenati, R. Brambilla, L. Malve', "Mixing a symbolic and a subsymbolic expert to improve carcinogenicity prediction of aromatic compounds", Proc. Second Workshop on Multiple Classifier Systems (MCS 2001), Cambridge, UK, July 2001, Springer-Verlag.

G. Gini, M. Lorenzini, E. Benfenati, R. Brambilla, L. Malve', "Integrating rules and neural nets for carcinogenicity prediction", Proc. IEEE IFSA/NAFIPS, Vancouver, Canada, July 2001.

E. Benfenati, P. Mazzatorta, C.-D. Neagu, G. Gini, "Combining classifiers of pesticides toxicity through a neuro-fuzzy approach", Proc. 3rd International Workshop on Multiple Classifier Systems, MCS 2002, Springer, Cagliari (Italy), June 2002, pp. 293-303.

G. Gini, E. Benfenati, C.-D. Neagu, "Training through European Research Training Networks - Analysis of IMAGETOX", ISBN 9637154 07 8, Proc. 3rd IEEE Int. Conf on Information Technology based higher education and training, ITHET 2002, Budapest (Hungary), July 2002, pp. 237-242.

G. Gini, T. Garg, M. Stefanelli, "Ensembling regression models to improve their predictivity: a case study in QSAR (Quantitative Structure Activity Relationships) within computational chemometrics", Applied Artificial Intelligence, 23, pp. 261-281, March 2009.

 
