Abstract: Predicting the pathogenicity of missense variants is important in genetic research and clinical applications. Among existing approaches, computational methods have been widely applied and have shown outstanding performance. Most of them rely on the functional impact of the alteration or on sequence conservation. Because natural language processing (NLP) methods have been applied to biological sequence tasks through transfer learning, treating DNA sequences as a kind of biological language is a promising way to predict the pathogenicity of genetic variants. Building on a pretrained NLP model and DNA sequences carrying the altered allele, we propose a deep learning model named MissenseBert to predict the pathogenicity of missense variants. Trained and evaluated on multiple datasets, MissenseBert achieves promising performance and illustrates the feasibility of predicting pathogenicity from DNA sequence.