Publications

Google Scholar has the most up-to-date list.

The UCF Systems for the LoResMT 2021 Machine Translation Shared Task

William Chen and Brett Fazio

Published in The 4th Workshop on Technologies for MT of Low Resource Languages, 2021

We present our systems for the LoResMT 2021 Shared Task, participating in the English-Irish and English-Marathi translation pairs. We focused our efforts on the constrained track of the task, using transfer learning and subword segmentation to enhance our models given small amounts of training data. Our models achieved the highest BLEU scores on the fully constrained tracks of English-Irish, Irish-English, and Marathi-English, with scores of 13.5, 21.3, and 17.9, respectively.

Download here

Morphologically-Guided Segmentation For Translation of Agglutinative Low-Resource Languages

William Chen and Brett Fazio

Published in The 4th Workshop on Technologies for MT of Low Resource Languages, 2021

Neural Machine Translation (NMT) for Low-Resource Languages (LRLs) is often limited by the lack of available training data, making it necessary to explore additional techniques to improve translation quality. We propose the use of the Prefix-Root-Postfix-Encoding (PRPE) subword segmentation algorithm to improve translation quality for LRLs, using two agglutinative languages as case studies: Quechua and Indonesian. We achieve state-of-the-art results for both languages, obtaining higher BLEU scores than large pre-trained models while using much smaller amounts of data.

Download here

In Silico Model for miRNA-mediated Regulatory Network in Cancer

Khandakar Tanvir Ahmed, Jiao Sun, William Chen, Irene Martinez, Sze Cheng, Wencai Zhang, Jeongsik Yong, and Wei Zhang

Published in Briefings in Bioinformatics, 2021

Current analyses of gene expression data focus mostly on differential gene/transcript expression in big data-driven studies. However, a poor connection to changes in the proteome is a widespread problem in these analyses. In this study, we address this limitation and introduce a graph-based learning model, PTNet, which simulates in silico the microRNAs (miRNAs) that regulate gene expression post-transcriptionally.

Download here