MyWEST
My Web Extraction Software Tool

Marco Masseroli, PhD and Andrea Stella, MS

 

MyWEST is a software package for mining web-interfaced biomolecular databanks. It provides a visual interface for building templates that define which information to extract from the HTML pages of web databanks. It then uses these templates to mine information from multiple web pages of different databanks, stores and aggregates the extracted data in a common database, and supports complex queries on the aggregated data to help uncover significant biological information that would otherwise remain hidden.

A template configuration module lets users visually define the information to mine on selected reference HTML pages of the web-interfaced databanks of interest, and create the corresponding extraction templates. It also lets users define the access parameters both for the web-accessible databanks of interest and for the relational database that stores all extracted data.
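As a rough illustration of what an extraction template can amount to, the sketch below represents a template as a mapping from field names to patterns and applies it to an HTML page. It is not MyWEST's template format or implementation: the `TEMPLATE` structure, the `apply_template` function, the use of regular expressions, and the sample page are all assumptions made for this example.

```python
import re
from html import unescape

# Hypothetical template: each field name maps to a regular expression whose
# first capture group holds the value to extract from the page's HTML.
TEMPLATE = {
    "gene_symbol": r"<td>\s*Symbol\s*</td>\s*<td>(.*?)</td>",
    "description": r"<td>\s*Description\s*</td>\s*<td>(.*?)</td>",
}


def apply_template(template, html):
    """Return {field: value or None} for every field the template defines."""
    record = {}
    for field, pattern in template.items():
        match = re.search(pattern, html, flags=re.IGNORECASE | re.DOTALL)
        # Decode HTML entities in the captured text; None marks a missing field.
        record[field] = unescape(match.group(1).strip()) if match else None
    return record


# Made-up reference page, for illustration only.
SAMPLE_PAGE = """
<table>
  <tr><td>Symbol</td><td>ABC1</td></tr>
  <tr><td>Description</td><td>example protein &amp; family</td></tr>
</table>
"""

record = apply_template(TEMPLATE, SAMPLE_PAGE)
```

Because the template is plain data, the same function can in principle be reused on every page of a databank that shares the layout of the reference page; a field that is absent from a page is reported as `None` rather than raising an error.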

In a data extraction module, users provide the identification codes of nucleotide or amino acid sequences of interest and use the created templates to automatically mine the available annotations of interest, in batch mode and from several web-interfaced databanks at once. The mined information is stored both in text files that can be opened in Excel, for easy and immediate use, and in a relational database. In the database, all extracted data are aggregated and structured to support complex queries for further comprehensive mining.
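The storage and query steps described above can be sketched as follows. This is a minimal example under stated assumptions, not MyWEST's actual schema or code: the `annotation` table, the function names, the tab-separated layout, the use of SQLite, and every identifier and value in `MINED` are hypothetical, and the rows stand in for data already mined from web pages.

```python
import csv
import io
import sqlite3

# Stand-in for mined records: (sequence ID, databank, field, value).
# All identifiers and values are made up for illustration.
MINED = [
    ("SEQ001", "databank_a", "gene_symbol", "ABC1"),
    ("SEQ001", "databank_b", "chromosome", "7"),
    ("SEQ002", "databank_a", "gene_symbol", "XYZ2"),
    ("SEQ002", "databank_b", "chromosome", "7"),
]


def write_tsv(rows, handle):
    """Write the records as tab-separated text that a spreadsheet can open."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["sequence_id", "databank", "field", "value"])
    writer.writerows(rows)


def store(rows, conn):
    """Aggregate records from all databanks into one relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS annotation ("
        "sequence_id TEXT, databank TEXT, field TEXT, value TEXT, "
        "PRIMARY KEY (sequence_id, databank, field))"
    )
    # Re-storing a record with the same key replaces the older value.
    conn.executemany("INSERT OR REPLACE INTO annotation VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def sequences_by_value(conn, field):
    """Group sequence IDs by the value they have for one annotation field."""
    groups = {}
    query = (
        "SELECT value, sequence_id FROM annotation "
        "WHERE field = ? ORDER BY value, sequence_id"
    )
    for value, sequence_id in conn.execute(query, (field,)):
        groups.setdefault(value, []).append(sequence_id)
    return groups


buffer = io.StringIO()
write_tsv(MINED, buffer)

conn = sqlite3.connect(":memory:")
store(MINED, conn)
shared = sequences_by_value(conn, "chromosome")
```

Keeping one row per sequence, databank, and field is one simple way to let a single query span annotations that came from different databanks, here grouping the sequences that share a chromosome value.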

A purpose-built updating software agent automatically updates all information contained in the database of mined data.



MyWEST has been developed by Andrea Stella, MS¹ and Marco Masseroli, PhD¹ in collaboration with Myriam Alcalay, MD²,³. Special thanks to Heiko Muller²,³ for suggesting the initial idea from which we started the development of MyWEST, to Natalia Meani²,³ for providing the experimental data used to test MyWEST, to Massimo Mazzarani¹ for helping to develop this web site, and to Giovanni E. Carrara¹ for testing the MyWEST prototype.

¹ Bioengineering Department, Politecnico di Milano, Milano, Italy
² IFOM - FIRC Institute of Molecular Oncology, Milano, Italy
³ IEO - European Institute of Oncology, Milano, Italy


© Marco Masseroli, PhD - E-mail: masseroli@biomed.polimi.it - Last update on June 10, 2004 - 17:37:18