Title: Image Understanding of Figures in Biomedical Literature
Time: June 6, 2019, 9:00 AM
Venue: Room A521
Speaker: Prof. Dong Xu (許東)
Speaker Bio:
Dong Xu is the Shumaker Endowed Professor in the Department of Electrical Engineering and Computer Science and Director of the Information Technology Program at the University of Missouri-Columbia, with appointments in the Christopher S. Bond Life Sciences Center and the Informatics Institute. He received his PhD from the University of Illinois at Urbana-Champaign in 1995 and completed two years of postdoctoral work at the US National Cancer Institute. He was a Staff Scientist at Oak Ridge National Laboratory until 2003, when he joined the University of Missouri, where he served as Chair of the Department of Computer Science from 2007 to 2016. His research is in computational biology and bioinformatics. His topics include machine-learning applications in bioinformatics, protein structure prediction, post-translational modification prediction, and high-throughput biological data analysis. He also works on in silico studies of plants, microbes, and cancers, biological information systems, and mobile app development for healthcare. He has published more than 300 papers. He was elected a Fellow of the American Association for the Advancement of Science (AAAS) in 2015.
Abstract:
Figures in the scientific literature contain rich information. For example, many new molecular mechanisms in genomics, pharmacogenomics, immunology, and other fields are presented in pathway figures. These mechanisms need to be curated for various applications, especially in precision medicine. Current manual curation cannot keep pace with the growth of the biomedical literature. Compared with textual descriptions, pathway figures often represent the mechanisms more directly. However, no systematic method exists for curating pathway figures from publications. Here, we propose a pathway curation pipeline that integrates a deep learning model with an optical character recognition (OCR) method and an image processing strategy. Together, these capture the locations, names, and interactions of the pathway entities in a figure. We evaluated the pipeline on figures from PubMed publications. The results show that our model can effectively retrieve molecular entities and their interactions from pathway figures at a large scale. The proposed pipeline offers an alternative to text-mining approaches in biological literature mining. In future work, we will combine our method with text-mining tools to enrich the extracted information and fully reconstruct pathway mechanisms.
Organizers:
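College of Software
Institute of Computer Science and Technology
Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education
National Experimental Teaching Demonstration Center for Computer Science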