Abstract
Recently, the focus of cancer drug discovery has shifted towards developing targeted drugs that act specifically on deregulated proteins in cancer tissues. Despite extensive efforts to sequence cancer genomes and identify potential drug targets, the efficacy of targeted drugs in clinical trials has often been disappointing due to inconsistent treatment responses. This can be attributed to an incomplete understanding of drug-target interactions (DTIs) and how they contribute to treatment efficacy and adverse effects. This thesis aims to address this gap by FAIRifying drug screening experiments and using text mining techniques to build a comprehensive knowledge base of drug targets, which is significant for enhancing precision medicine.
The high-level objective of this research is divided into three main tasks. Firstly, we have developed the Minimal Information for Chemosensitivity Assays (MICHA) pipeline, which enables the FAIRification (Findable, Accessible, Interoperable, and Reusable) of drug screening experiments. MICHA provides a web server and database that integrate compound annotation, including chemical structures, targets, and disease indications. It also facilitates the annotation of cell line samples, assay protocols, and literature references through curated catalogues.
Secondly, we tackle the challenge of handling the massive number of scientific articles published in drug discovery research. While text mining techniques have been widely applied to extract other types of relationships, such as protein–protein interactions (PPIs) and disease-gene interactions, few studies have addressed the automatic identification of articles that report DTIs. To this end, we employed Bidirectional Encoder Representations from Transformers (BERT) to classify articles that potentially contain DTIs. Furthermore, we aim to predict the assay format, as DTI data is closely tied to the specific assay used for its generation. Our novel method identifies a large number of articles (0.6 million) not previously included in public DTI databases. We achieved high accuracy in identifying articles with quantitative drug-target profiles, while the prediction of assay formats leaves room for improvement.
Finally, we address the extraction of drug-target interactions (DTIs), framed as an entity-relationship extraction task, using advanced pre-trained transformer models such as BERT. To enhance the extraction accuracy, we incorporate two distinct ensemble strategies. The first strategy combines a pre-trained language model with Convolutional Neural Networks (CNNs) to discern the relationships more effectively. The second strategy combines gene descriptions derived from the Entrez Gene database with chemical descriptions obtained from the Comparative Toxicogenomics Database (CTD). The ensemble model that uses these descriptions performs best, achieving an F1 score of 80.6 on the DrugProt hidden test set and outperforming other competing models. Furthermore, our analysis comparing gene textual descriptions from the Entrez Gene and UniProt databases provides valuable insights into their influence on extraction performance.
The importance of this research extends beyond its technical contributions. By enhancing the accuracy and depth of drug-target interaction data, the research has potential implications for improving the prediction and understanding of drug efficacy and adverse reactions in cancer therapy. It sets the stage for more precise and individualized therapeutic strategies, which are the cornerstone of personalized medicine. Ultimately, the methods and findings of this research have the potential to impact the successful development of new drugs and the re-purposing of existing ones, underscoring its significance in the ongoing battle against cancer.
Original language | English |
---|---|
Awarding Institution | |
Supervisors/Advisors | |
Award date | 24 May 2024 |
Place of Publication | Helsinki |
Publisher | |
Print ISBNs | 978-952-84-0127-8 |
Electronic ISBNs | 978-952-84-0128-5 |
Publication status | Published - May 2024 |
MoE publication type | G5 Doctoral dissertation (article) |
Fields of Science
- 113 Computer and information sciences
Activities
- Computational and Structural Biotechnology Journal (Journal)
  Aldahdooh, J. M. F. (Reviewer)
  1 May 2023 → 31 May 2023
  Activity: Publication peer-review and editorial work types › Peer review of manuscripts
- Using BERT to identify drug-target interactions from whole PubMed
  Aldahdooh, J. M. F. (Speaker)
  2 Nov 2021
  Activity: Talk or presentation types › Oral presentation
- R-BERT-CNN: Drug-target interactions extraction from biomedical literature
  Aldahdooh, J. M. F. (Speaker)
  9 Nov 2021
  Activity: Talk or presentation types › Oral presentation