Malaysian Journal of Science 28 (2): 113– 125 (2009)
Communication Architecture for Biodiversity Information Retrieval (CABIR)
Sarinder K. K. S.1*, Lim L. H. S.1, Dimyati K.2 and Merican A. F.1
1 Institute of Biological Sciences, Faculty of Science; 2 Department of Electronic and Electrical Engineering, Faculty of Engineering; University of Malaya, 50603 Kuala Lumpur, Malaysia
*[email protected] (corresponding author)
Received 11th September 2008, accepted in revised form 14th July 2009.

ABSTRACT This paper focuses on biodiversity database integration. First, the problems involved are studied and explained. They fall into four broad categories: technical, political, syntactical and semantical. Approaches to building a database integration system were then reviewed, after which the mediator-based approach was selected. A new database integration system, called CABIR (Communication Architecture for Biodiversity Information Retrieval), was built and is discussed in detail in this paper. A performance test was also conducted to measure the efficiency of the CABIR system. CABIR is expected to contribute to the field of biodiversity because it can link distributed biodiversity databases regardless of their heterogeneity and the platforms on which they were built.

ABSTRAK This article focuses on the integration of biodiversity databases. First, the problems encountered are identified and explained. These problems are divided into four categories, namely technical, political, syntactical and semantical. Next, approaches to developing a database integration system were studied, and the "mediator" approach was selected. A new database integration system, namely CABIR (Communication Architecture for Biodiversity Information Retrieval), has been developed and is discussed in detail in this article. A performance test was conducted to determine the efficiency of the CABIR system. The development of the CABIR system is expected to contribute to the field of biodiversity because the system can link distributed biodiversity databases regardless of their heterogeneity and the platforms on which those databases were built.

(Keywords: Biodiversity, database integration, heterogeneous)
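The abstract describes CABIR as a mediator-based system that links distributed, heterogeneous relational databases. As a minimal, hypothetical sketch of that general pattern (not CABIR's actual implementation, which is not described here), a mediator can hold one wrapper per source; each wrapper translates a query against a shared global view (here, a species name) into its source's local SQL and maps the returned rows to a common record. The names `Mediator`, `Wrapper`, `Occurrence` and `build_demo_mediator`, the two schemas and the sample rows are all invented for illustration, and two in-memory SQLite databases stand in for remote database servers.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


# Hypothetical global (unified) record that the mediator returns to the user.
@dataclass
class Occurrence:
    source: str
    species: str
    locality: str


# Hypothetical wrapper: knows one source's local schema and maps it to the global view.
@dataclass
class Wrapper:
    name: str
    conn: sqlite3.Connection
    sql: str  # local SQL returning (species, locality), with one '?' for the species name

    def query(self, species: str) -> List[Occurrence]:
        rows = self.conn.execute(self.sql, (species,)).fetchall()
        return [Occurrence(self.name, sp, loc) for sp, loc in rows]


class Mediator:
    """Accepts a global query and fans it out to every registered wrapper."""

    def __init__(self, wrappers: List[Wrapper]):
        self.wrappers = wrappers

    def find(self, species: str) -> List[Occurrence]:
        results: List[Occurrence] = []
        for wrapper in self.wrappers:  # sequential for simplicity
            results.extend(wrapper.query(species))
        return results


def build_demo_mediator() -> Mediator:
    # Source A (invented): one flat table holding the full scientific name.
    a = sqlite3.connect(":memory:")
    a.execute("CREATE TABLE specimen (sci_name TEXT, site TEXT)")
    a.executemany(
        "INSERT INTO specimen VALUES (?, ?)",
        [("Rafflesia arnoldii", "Locality A1"), ("Nepenthes rajah", "Locality A2")],
    )

    # Source B (invented): a normalised schema that splits genus and epithet.
    b = sqlite3.connect(":memory:")
    b.execute("CREATE TABLE taxa (id INTEGER PRIMARY KEY, genus TEXT, epithet TEXT)")
    b.execute("CREATE TABLE records (taxon_id INTEGER, locality TEXT)")
    b.execute("INSERT INTO taxa VALUES (1, 'Rafflesia', 'arnoldii')")
    b.execute("INSERT INTO records VALUES (1, 'Locality B1')")

    return Mediator([
        Wrapper("source_a", a,
                "SELECT sci_name, site FROM specimen WHERE sci_name = ?"),
        # The wrapper resolves the schema difference by rebuilding the full name.
        Wrapper("source_b", b,
                "SELECT t.genus || ' ' || t.epithet, r.locality "
                "FROM taxa AS t JOIN records AS r ON r.taxon_id = t.id "
                "WHERE t.genus || ' ' || t.epithet = ?"),
    ])


if __name__ == "__main__":
    for occ in build_demo_mediator().find("Rafflesia arnoldii"):
        print(occ)
```

The user issues one query and receives uniform `Occurrence` records from both sources, although the two schemas differ. Sources are queried one after another only to keep the sketch short; a practical mediator would more likely contact remote sources concurrently and tolerate sources that are unavailable.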
INTRODUCTION
The revolutions brought about by advanced computing power, advanced informatics and the Internet have changed the way biodiversity data are stored, located and disseminated. Because the Internet provides a unified transport infrastructure for information sharing, almost every scientist today uses it to share data, sometimes with a single close colleague and at other times through large-scale databases. The volume, complexity and heterogeneity of biodiversity data make data sharing a daunting task. According to Philippi [1], the problems of data integration in the life sciences can be grouped into four categories: technical, political, syntactical and semantical. From a syntactical point of view, flat files are still the de facto standard for data exchange, whereas technical problems arise when a database is available not as a flat file but only through dynamically generated HTML pages [1]. This paper, however, specifically addresses relational database systems rather than flat files and HTML pages, so syntactical and technical problems are beyond the scope of this research. The remaining problems are political and semantical. Political issues such as copyright and ownership affect the way data are stored, disseminated and shared among colleagues. Although data warehouses are easier to manage and manipulate because of their homogeneity, they are not well accepted by the scientific community. The reason is political: data leave the researcher's own storage for a central warehouse managed by an administrator who may be unknown to the researcher. In addition, data integrity becomes a problem as the…