Bioinformatics
Semantic & Cloud Computing
Semantic Computing addresses technologies which facilitate activities that allow users to create, manipulate or retrieve computational content based on semantics (“meaning”, “intention”), where “content” may be anything such as video, audio, text, software, hardware, network, environment, process, etc.
Our research specifically applies Semantic Computing to Cloud Computing and biomedical applications. Treating the Internet as a single platform, the CommonWeb Project was launched to provide a personalized web for “Common People”, in which (1) everyone is connected and has equal ownership of the Internet; (2) everyone is equally capable of accessing and creating resources, including information and applications; and (3) everyone is equally capable of turning an application into a business and marketing that business. Here “everyone” includes any person, company, organization or group, and “Common People” refers to literally everyone, including those who are not computer scientists.
Integral components of the project are WebID and WebSpace, both of which are free to the public. With a WebID you can easily build your own “web”: a collection of interconnected spaces, each of which may host a different application with its own users, business model and revenue. It is designed for EVERYONE to create, protect and capitalize on their resources, including data, information, knowledge, expertise, tools, connections and applications.
The WeaveSphere Semantic Computing Consortium (SCC) runs on top of, and supports, the CommonWeb WeaveSpace Internet OS infrastructure, with a focus on web services and problem solving. Designed to connect all biomedical services, SCC is driven by user problems expressed in natural language (NL) and matches those problems with the capabilities of services at high resolution.
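How SCC matches natural-language problems to service capabilities is not specified here. As a rough illustration only, the sketch below ranks hypothetical service descriptions by bag-of-words cosine similarity to a user's problem statement. This naive keyword matching is a stand-in, not SCC's actual matching method, and the service names and descriptions are invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_services(problem, services):
    """Rank services (name -> description) by similarity to a natural-language problem."""
    query = Counter(tokenize(problem))
    scores = [(name, cosine(query, Counter(tokenize(desc)))) for name, desc in services.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical service catalogue; names and descriptions are illustrative only.
services = {
    "blast_search": "align a protein or dna sequence against a sequence database",
    "deg_finder": "find differentially expressed genes in microarray expression data",
    "ppi_lookup": "look up protein protein interaction partners of a protein",
}
ranking = rank_services("Which genes are differentially expressed in my microarray data?", services)
print(ranking[0])
```

A realistic matcher would need much more than word overlap (synonyms, ontologies, parsing of the problem's intent); the sketch only shows the shape of a problem-to-capability ranking.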
Prof. Phillip C-Y. Sheu
Systems Biology
Systems biology is an interdisciplinary area of research that aims to understand biology at the system level through data analysis, modeling and wet-lab experiments. The components of a system are not treated as independent units; instead, the interactions between components are studied, and new biological properties emerge at another level of organization. Systems biology requires the integration of several fields, such as molecular biology, computer science, mathematics, physics and chemistry. The approach is applicable to different levels of organization, for instance from pathway to cell, tissue, organ, organism, population or ecosystem.
Owing to high-throughput experimental techniques, large sets of genomic, transcriptomic, proteomic and metabolomic data are available for an increasing number of organisms. New algorithms are designed to analyze such large amounts of information. In addition, mathematical models are proposed, developed or modified to describe the dynamical behavior of a model system, and predictions derived from the model are compared with experimental results.
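As a minimal example of such a dynamical model, assume a single gene product x that is produced at a constant rate k and degraded at rate d, so dx/dt = k - d·x. The sketch integrates this hypothetical model with forward Euler, checks it against the closed-form solution, and provides a sum-of-squared-errors helper for comparing predictions with measurements. The parameter values are arbitrary.

```python
import math


def simulate(k, d, x0, dt, steps):
    """Forward-Euler integration of dx/dt = k - d*x (constant production, first-order decay)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x + dt * (k - d * x))
    return xs


def analytic(k, d, x0, t):
    """Closed-form solution x(t) = k/d + (x0 - k/d) * exp(-d*t)."""
    return k / d + (x0 - k / d) * math.exp(-d * t)


def sse(predicted, observed):
    """Sum of squared errors between model predictions and measurements."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed))


# Hypothetical parameters (arbitrary units): production rate k, degradation rate d.
k, d, dt, steps = 2.0, 0.5, 0.01, 2000
traj = simulate(k, d, 0.0, dt, steps)
print(traj[-1], analytic(k, d, 0.0, dt * steps), k / d)  # simulated, exact, steady state
```

In a real study, sse would be evaluated at the measurement time points, and k and d would be fitted by minimizing it.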
Our research activities mainly focus on: (1) predicting cancer-related proteins through the study of protein-protein interactions, (2) identifying cancer-related microRNAs based on microarray data analysis, (3) studying host-pathogen interactions in plant systems, and (4) identifying network motifs in cancer pathways.
One strategy for gaining a better understanding of disease proteins is to consider their interactions, using protein-protein interaction (PPI) data. Proteins are known to be composed of multiple functional domains, so we introduce domain information into the problem. A domain-domain interaction (DDI) model is proposed to obtain specific sets of DDIs for cancer-related proteins: a specific set of DDIs, namely those between two cancer proteins, is derived from PPI data. This provides a potential approach to predicting cancer proteins based on DDIs, which can contribute to biomedical research.
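The DDI model itself is not detailed above. The sketch below shows one simple way, assumed here rather than taken from the original work, to derive candidate cancer-specific DDIs: every domain pair spanning a PPI between two cancer proteins is counted as a candidate, and an unlabeled protein is then scored by how many of its interactions carry such a domain pair. Protein and domain identifiers are placeholders.

```python
from collections import Counter
from itertools import product


def domain_pairs(p, q, domains):
    """All unordered domain pairs (a, b) with a in p's domains and b in q's domains."""
    return {tuple(sorted((a, b))) for a, b in product(domains.get(p, ()), domains.get(q, ()))}


def cancer_ddi(ppis, domains, cancer):
    """Count candidate DDIs supported by PPIs in which both partners are cancer proteins."""
    counts = Counter()
    for p, q in ppis:
        if p in cancer and q in cancer:
            counts.update(domain_pairs(p, q, domains))
    return counts


def score_protein(protein, ppis, domains, ddi_counts):
    """Score a protein by how many of its interactions carry a cancer-specific DDI."""
    score = 0
    for p, q in ppis:
        if protein in (p, q):
            score += sum(1 for pair in domain_pairs(p, q, domains) if pair in ddi_counts)
    return score


# Hypothetical PPI list, domain annotations and cancer-protein set.
ppis = [("P1", "P2"), ("P2", "P3"), ("P4", "P5"), ("P3", "P6")]
domains = {"P1": ["DA"], "P2": ["DB"], "P3": ["DA", "DC"], "P4": ["DD"], "P5": ["DD"], "P6": ["DB"]}
cancer = {"P1", "P2", "P3"}

counts = cancer_ddi(ppis, domains, cancer)
print(counts.most_common())
print(score_protein("P6", ppis, domains, counts), score_protein("P4", ppis, domains, counts))
```

Counting every domain pair per PPI is the crudest association approach; a real model would also weigh how often each domain pair appears in non-cancer PPIs.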
Microarray experiments allow us to record the expression levels of thousands of genes at once and to identify differentially expressed genes (DEGs). In this study, DEGs in cancer microarray data are obtained using the R language and Bioconductor packages. By integrating the Tumor Associated Gene (TAG) database, the non-coding RNA protein-protein interaction database (ncRNAppi) and the disease-related microRNA database (miR2Disease), we identify cancer-related microRNAs as well as the pathways they regulate.
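The R/Bioconductor pipeline used in the study is not specified further. As a language-neutral illustration, the sketch below substitutes a simple Welch t-statistic plus log2 fold-change filter for a dedicated Bioconductor method (for example, limma's moderated t-statistics), then intersects the resulting differentially expressed microRNAs with a disease-associated microRNA set that stands in for a miR2Disease lookup. All identifiers, thresholds and values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se else 0.0


def find_degs(expr, n_tumor, min_log2fc=1.0, min_abs_t=3.0):
    """Return {id: (log2FC, t)} for features passing both thresholds.

    expr maps a feature ID to log2-scale expression values; the first n_tumor
    values are assumed to be tumor samples and the rest normal samples.
    """
    degs = {}
    for feature, values in expr.items():
        tumor, normal = values[:n_tumor], values[n_tumor:]
        log2fc = mean(tumor) - mean(normal)  # difference of means on the log2 scale
        t = welch_t(tumor, normal)
        if abs(log2fc) >= min_log2fc and abs(t) >= min_abs_t:
            degs[feature] = (round(log2fc, 3), round(t, 3))
    return degs


# Hypothetical log2 expression values: 3 tumor samples followed by 3 normal samples.
expr = {
    "miR-A": [8.1, 8.3, 7.9, 5.0, 5.2, 4.9],
    "miR-B": [6.0, 6.2, 5.9, 6.1, 6.0, 5.8],
}
disease_mirnas = {"miR-A", "miR-C"}  # stand-in for a miR2Disease lookup
degs = find_degs(expr, n_tumor=3)
candidates = set(degs) & disease_mirnas
print(degs, candidates)
```

With thousands of probes, p-values and a multiple-testing correction would be needed instead of a fixed |t| cutoff; the sketch omits both for brevity.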
Plants are continuously subjected to infection by fungi, bacteria, viruses and nematodes. In this study, microarray data will be analyzed to identify the genes and pathways induced by pathogens such as Xanthomonas campestris pv. campestris (Xcc) and Agrobacterium. These studies will set the stage for understanding the molecular mechanisms that enable plants to respond to pathogen attack. In this project, a bioinformatics platform will be set up for genome-scale study of the miRNA regulatory networks involved in development and in defense responses to pathogen attack in Arabidopsis. The aims of this proposal are:
- Identify the specific groups of miRNAs and their target genes, and the coupling between miRNAs and PPI networks, that are involved in developmental gene regulatory networks (GRNs),
- Study the biological roles of network motifs composed of miRNAs and transcription factors (TFs), in particular both feedback loop (FBL) and feed-forward loop (FFL) motifs, and
- Understand the molecular mechanisms that enable Arabidopsis to respond to pathogen attack.
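To make the motif types in the second aim concrete, the sketch below searches a small, hypothetical regulatory network for TF-miRNA feedback loops (the TF regulates the miRNA and the miRNA targets the TF) and for feed-forward loops in which a TF and a miRNA, one regulating the other, share a target gene. The node labels and edges are invented, and the search is a plain enumeration rather than a statistical motif-significance test.

```python
from collections import defaultdict


def successors(edges):
    """Map each source node to the set of nodes it regulates."""
    succ = defaultdict(set)
    for src, dst in edges:
        succ[src].add(dst)
    return succ


def find_fbls(edges, kind):
    """TF-miRNA feedback loops: TF -> miRNA and miRNA -> TF are both present."""
    return sorted((a, b) for a, b in edges
                  if kind.get(a) == "TF" and kind.get(b) == "miRNA" and (b, a) in edges)


def find_ffls(edges, kind):
    """FFLs where a TF and a miRNA (the first regulating the second) share a target."""
    succ = successors(edges)
    loops = []
    for a, b in edges:
        if {kind.get(a), kind.get(b)} != {"TF", "miRNA"}:
            continue
        for c in succ[a] & succ[b]:
            if c not in (a, b):
                label = "TF-FFL" if kind[a] == "TF" else "miRNA-FFL"
                loops.append((a, b, c, label))
    return sorted(loops)


# Hypothetical network: node types and directed regulatory edges (regulator, target).
kind = {"TF1": "TF", "mir1": "miRNA", "G1": "gene", "G2": "gene"}
edges = {("TF1", "mir1"), ("mir1", "TF1"), ("TF1", "G1"), ("mir1", "G1"), ("mir1", "G2")}

print(find_fbls(edges, kind))
print(find_ffls(edges, kind))
```

Because TF1 and mir1 regulate each other, the shared target G1 appears once as a TF-led FFL and once as a miRNA-led FFL.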
In the post-genome era, it is more productive to investigate how biomolecules regulate one another or cooperate at the system level. Graph theory is a powerful tool for investigating the underlying global topological structure of molecular networks, such as the yeast protein interaction network and metabolic network. Recurring topological structures within these networks are called network motifs or modules. Examples of such motifs are auto-regulation (either activation or repression), the coherent feed-forward loop (cFFL), the single-input module (SIM) and the bi-fan. Given a collection of raw regulatory relations, we propose to identify several major types of network motifs, such as auto-regulation, cFFL, the incoherent feed-forward loop (iFFL), SIM, the multiple-input module (MIM), bi-parallel, three-chain and bi-fan. Network motifs do not perform biological functions independently; instead, motifs are interconnected, which leads to the observed phenotypic changes. Another task is therefore to identify possible motif-motif interaction pairs. A globally interacting network can then be reconstructed by integrating this interaction information. It is generally believed that the global topological architecture of biological networks is hierarchical in nature.
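The classification of feed-forward loops into coherent and incoherent types can be sketched as follows, assuming each regulatory relation is stored as a signed edge (+1 for activation, -1 for repression). Following the usual convention, an FFL X→Y→Z with a direct X→Z edge is coherent when the sign of the direct path equals the product of the signs along the indirect path. A bi-fan counter is included as a second example. The node names are hypothetical, and no randomized-network comparison is performed, so the code counts occurrences rather than establishing that they are over-represented motifs.

```python
from collections import defaultdict
from itertools import combinations
from math import comb


def classify_ffls(signed):
    """Enumerate FFLs X->Y, X->Z, Y->Z and label them coherent (cFFL) or incoherent (iFFL)."""
    succ = defaultdict(set)
    for src, dst in signed:
        succ[src].add(dst)
    result = []
    for (x, y), s_xy in signed.items():
        if x == y:  # skip auto-regulation edges
            continue
        for z in succ[x] & succ[y]:
            if z in (x, y):
                continue
            direct = signed[(x, z)]
            indirect = s_xy * signed[(y, z)]
            result.append((x, y, z, "cFFL" if direct == indirect else "iFFL"))
    return sorted(result)


def count_bifans(signed):
    """Count bi-fans: two regulators that both regulate the same two distinct targets."""
    targets = defaultdict(set)
    for src, dst in signed:
        if src != dst:
            targets[src].add(dst)
    total = 0
    for r1, r2 in combinations(sorted(targets), 2):
        shared = (targets[r1] & targets[r2]) - {r1, r2}
        total += comb(len(shared), 2)
    return total


# Hypothetical signed regulatory edges: +1 activation, -1 repression.
signed = {("X", "Y"): 1, ("X", "Z"): 1, ("Y", "Z"): 1,
          ("A", "B"): 1, ("A", "C"): 1, ("B", "C"): -1}
bifan = {("R1", "T1"): 1, ("R1", "T2"): 1, ("R2", "T1"): 1, ("R2", "T2"): -1}

print(classify_ffls(signed))
print(count_bifans(signed), count_bifans(bifan))
```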
Prof. Ka-Lok Ng
Omics
Omics is a broad discipline in biological science concerned with integrating data and inferring the correlations within it. Its subjects include the genome, proteome, transcriptome, metabolome, physiome, interactome and so on. The core aims of omics research are 1) to identify and annotate the complete set of omic objects, 2) to find interaction relationships among the objects by experimental observation or artificial definition, 3) to map information objects such as genes, proteins and ligands for different biological states, 4) to present the objects in a network structure in order to understand and manipulate the mechanisms of a specific biological state, and 5) to integrate various omes and omics subfields. For instance, in cancer genomics, we integrated microarray data from 148 cancer subtypes to study the human cancer genome, covering the full set of genes and mutations, both inherited and sporadic, that contribute to the development of a cancer cell and its progression from a localized cancer to one that metastasizes. The gene interaction relationships were inferred by novel methods and visualized as networks, which provide important clues for anti-cancer drug design.
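The novel inference methods mentioned above are not described here. A common baseline, shown below purely as an illustration, is a co-expression network: two genes are linked when the absolute Pearson correlation of their expression profiles across samples meets a threshold. The gene names, profiles and the 0.9 threshold are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def coexpression_edges(profiles, threshold=0.9):
    """Link gene pairs whose |Pearson r| across samples meets the threshold."""
    edges = []
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges.append((g1, g2, round(r, 3)))
    return edges


# Hypothetical expression profiles over four samples.
profiles = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.1, 6.0, 8.2],
    "G3": [4.0, 3.0, 2.0, 1.0],
    "G4": [1.0, 3.0, 1.0, 3.0],
}
edges = coexpression_edges(profiles)
print(edges)
```

Correlation links capture co-regulation rather than direct physical interaction, which is one reason more specialized inference methods are used in practice.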
The main omics subfields at present are described in the following table:
Term | Description | Object | Year*
Genome | The full complement of genetic information, both coding and non-coding, in the organism | Gene | 1932
Proteome | The protein-coding regions of the genome and the expressed proteins | Protein | 1995
Transcriptome | The population of mRNA transcripts in the cell, weighted by their expression levels | Transcript | 1997
Physiome | Quantitative description of the physiological dynamics or functions of the whole organism | Phenotype and genotype | 1997
Metabolome | The quantitative complement of all the metabolite molecules present in a cell in a specific physiological state | Metabolite | 1998
Interactome | List of interactions between all macromolecules in a cell | Biological molecules | 1999
Glycome | The population of carbohydrate molecules in the cell | Carbohydrates | 2000
*Year of first PubMed citation
Prof. Pei-Chun Chang
Disease Co-morbidity
The term comorbidity refers to the association of distinct diseases in the same individual at a rate higher than expected by chance. Disease comorbidity is an increasingly important issue because it can affect the time of detection, prognosis, therapy and outcome, and even government health policy.
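One common way to quantify “higher than expected by chance” is the observed-to-expected ratio, often called relative risk: RR_ij = C_ij·N / (P_i·P_j), where N is the number of patients, P_i and P_j are the numbers of patients diagnosed with diseases i and j, and C_ij is the number diagnosed with both. The sketch below computes it from per-patient diagnosis sets. It is an illustrative choice rather than a method stated in this section, and the records are toy data (the ICD-9-CM codes 250, 401 and 493 denote diabetes mellitus, essential hypertension and asthma).

```python
from collections import Counter
from itertools import combinations


def comorbidity_rr(patients):
    """Observed/expected co-occurrence ratio RR_ij = C_ij * N / (P_i * P_j).

    patients: list of diagnosis-code sets, one set per patient.
    RR > 1 means diseases i and j co-occur more often than if they were independent.
    """
    n = len(patients)
    prevalence = Counter()
    pair_counts = Counter()
    for diagnoses in patients:
        prevalence.update(diagnoses)
        pair_counts.update(combinations(sorted(diagnoses), 2))
    return {(i, j): c * n / (prevalence[i] * prevalence[j]) for (i, j), c in pair_counts.items()}


# Toy records: one set of ICD-9-CM codes per (hypothetical) patient.
patients = [{"250", "401"}, {"250", "401"}, {"250"}, {"401"},
            {"493"}, {"493", "401"}, set(), set()]
rr = comorbidity_rr(patients)
print(rr)
```

An RR above 1 only suggests a positive association; a significance test or confidence interval is still needed before calling a pair comorbid.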
The National Health Insurance (NHI) Program was established in Taiwan in 1995, and all citizens with a registered domicile in Taiwan are required to join it. By June 2003, more than 99% of Taiwan's population of 23 million had enrolled in the NHI Program, so the NHI Bureau has accumulated a huge volume of medical records, which form the NHI Research Database (NHIRD). The NHIRD is maintained jointly by the National Health Research Institutes (NHRI) and the NHI Bureau, and researchers can apply to use it for academic research.
The NHIRD covers both Western and Chinese medicine records and contains registration files and original claim data. The registration files include:
- registry for contracted medical facilities (HOSB)
- supplementary registry for contracted medical facilities (HOSX)
- registry for medical services (HOX)
- registry for contracted specialty services (DETA)
- registry for contracted beds (BED)
- registry for medical personnel (PER)
- registry for board-certified specialists (DOC)
- registry for beneficiaries (ID)
- registry for drug prescriptions (DRUG)
- registry for catastrophic illness patients (HV)
The original claim data include:
- monthly claim summary for ambulatory care claims (CT)
- monthly claim summary for inpatient claims (DT)
- inpatient expenditures by admissions (DD)
- details of inpatient orders (DO)
- ambulatory care expenditures by visits (CD)
- details of ambulatory care orders (OO)
- expenditures for prescriptions dispensed at contracted pharmacies (GD)
- details of prescriptions dispensed at contracted pharmacies (GO)
By integrating the data in different NHIRD files, we can obtain information such as the patient's coded (de-identified) ID, gender, age, diagnoses (ICD-9-CM codes), drugs and dosage. This information, together with statistical analysis, can help us explore disease comorbidity based on national population-level data.
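The integration step can be sketched as a join on the coded patient ID. The record layouts below are simplified and hypothetical (the field names pid, sex, birth_year, icd9, drug and dose are not the actual NHIRD column names): a beneficiary registry supplies gender and birth year, and visit claims supply diagnoses and prescriptions, giving one profile per patient.

```python
from collections import defaultdict


def build_profiles(beneficiaries, visits, ref_year):
    """Join a beneficiary registry with visit claims on the coded patient ID.

    beneficiaries: dicts with hypothetical keys 'pid', 'sex', 'birth_year'.
    visits: dicts with hypothetical keys 'pid', 'icd9' (list of codes),
            and optionally 'drug' and 'dose'.
    """
    diagnoses = defaultdict(set)
    drugs = defaultdict(set)
    for v in visits:
        diagnoses[v["pid"]].update(v["icd9"])
        if v.get("drug"):
            drugs[v["pid"]].add((v["drug"], v.get("dose", "")))
    profiles = {}
    for b in beneficiaries:
        pid = b["pid"]
        profiles[pid] = {
            "sex": b["sex"],
            "age": ref_year - b["birth_year"],  # age in the reference year
            "icd9": sorted(diagnoses[pid]),
            "drugs": sorted(drugs[pid]),
        }
    return profiles


# Toy records with invented IDs and a hypothetical drug name.
beneficiaries = [{"pid": "A01", "sex": "F", "birth_year": 1950},
                 {"pid": "B02", "sex": "M", "birth_year": 1972}]
visits = [{"pid": "A01", "icd9": ["250", "401"], "drug": "drug_a", "dose": "500 mg"},
          {"pid": "A01", "icd9": ["401"]},
          {"pid": "B02", "icd9": ["493"]}]
profiles = build_profiles(beneficiaries, visits, ref_year=2003)
print(profiles)
```

The resulting per-patient diagnosis lists are exactly the input that pairwise co-occurrence statistics need.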
Besides using NHIRD data to demonstrate comorbidity, there are many other methods for detecting disease comorbidity, for example constructing metabolic networks, protein-protein interaction networks, gene networks, disease networks and so on.
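As an illustration of the disease-network approach, the sketch below links two diseases when their associated gene sets overlap and records the shared genes on the edge. The disease-gene associations are hypothetical; in practice they would come from a curated association database.

```python
from itertools import combinations


def disease_network(disease_genes, min_shared=1):
    """Link two diseases when their gene sets share at least min_shared genes."""
    edges = {}
    for d1, d2 in combinations(sorted(disease_genes), 2):
        shared = disease_genes[d1] & disease_genes[d2]
        if len(shared) >= min_shared:
            edges[(d1, d2)] = sorted(shared)
    return edges


# Hypothetical disease-gene associations.
disease_genes = {
    "disease_A": {"G1", "G2"},
    "disease_B": {"G2", "G3"},
    "disease_C": {"G4"},
}
network = disease_network(disease_genes)
print(network)
```

Pairs linked in such a network are candidate comorbidities that can then be checked against population-level co-occurrence data.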
Prof. Yu-Ching Chen