Research on Neural Machine Translation Integrating Sentiment Analysis
Liang Rui, You Guoqiang, Cai Yuanyuan
College of Engineering and Technology, Xi'an FanYi University, Xi'an, Shaanxi 710105; Xi'an iFLYTEK H
Abstract: In order to solve the problem of the scarcity of Chinese emotional corpora, pre-trained BERT and GPT models are grafted to construct a Chinese neural machine translation model with an encoder-decoder architecture, which is used to translate English sentiment corpora into Chinese, thereby expanding the Chinese sentiment corpus. A special grafting method ensures that the decoder can focus on the context of the source sentence, while ensuring that the language knowledge already learned in BERT and GPT is not compromised.
Keywords: sentiment analysis; neural machine translation; artificial intelligence translation; GPT
1. Introduction
1.1 Research Background and Significance
Since the birth of the Internet, the amount of information faced by mankind has exceeded the raw data produced over the past thousands of years. In today's highly information-based society, everyone, whether old or young, can access the Internet, and everyone has the opportunity to express their opinions online. As a result, there is a large amount of personal commentary on the Internet, including product reviews left by customers in online shopping malls, discussions of hot topics on social media, and comments on government policies by ordinary people on news websites. How to use these comments to generate benefits for society has become a new research topic. By analyzing the product reviews in shopping malls, e-commerce platforms can understand the popularity and quality issues of products and make corresponding decisions. By analyzing the distribution of public sentiment on hot topics and public comments on government policies, government departments can monitor the direction of public opinion and respond quickly, better serving the public. Therefore, a technique is needed to analyze the emotional tendencies of these texts, which is called text sentiment analysis.
According to the granularity of text processing, text sentiment analysis can be roughly divided into document-level sentiment analysis, sentence-level sentiment analysis, and aspect-level sentiment analysis. Document-level sentiment analysis judges whether the emotional attitude of an article is positive, negative, or neutral based on the entire document. Sentence-level sentiment analysis takes a single sentence as the basic unit and judges whether its sentiment is positive, negative, or neutral. Aspect-level sentiment analysis takes the evaluated object (aspect) in a sentence as the basic unit and judges whether the sentiment expressed toward it is positive, negative, or neutral. In application scenarios such as public opinion monitoring and e-commerce, the industry needs fine-grained aspect-level sentiment analysis more than coarse-grained document-level and sentence-level sentiment analysis.
2. Pretrained Chinese-English Neural Machine Translation Model
2.1 Design of the Chinese Neural Machine Translation Model
In order to address the scarcity of emotional language data in Chinese, this section constructs a neural machine translation model for translating English into Chinese, which is used to translate the rich emotional corpora available in English into Chinese, thereby expanding the Chinese emotional corpus. Firstly, the BERT model is pre-trained using monolingual corpora from multiple languages, and the GPT model is pre-trained using a Chinese monolingual corpus. Then, the pre-trained BERT model and the Chinese GPT model are grafted together to form a translation model with an encoder-decoder structure. Finally, the bilingual parallel corpus available in the laboratory is used to fine-tune the model on the English-to-Chinese translation task, transferring the language knowledge learned from the monolingual corpora to the English-to-Chinese translation task.
The process of neural machine translation can usually be divided into three parts: representation, transformation, and generation, which are implemented by the encoder, cross-attention, and decoder, respectively. The GPT model only predicts and generates the next word based on the internal context, while the decoder in a translation task must predict and generate the target-language words sequentially based on the semantic context of the source sentence. The decoder used for translation tasks therefore includes a cross-attention sublayer that captures the context of the source sentence and is responsible for converting the semantic representation of the source sentence into the generation of the target-language sentence. However, the GPT model does not include the function of capturing the semantic context of the source sentence, so it cannot be directly used as a decoder for neural machine translation. To address this issue, relevant studies have proposed manually inserting a cross-attention sublayer into the GPT model [40], enabling it to capture the semantic context of the source sentence. However, as shown in Figure 3-2, this direct insertion of cross-attention layers breaks the connection between the trained model parameters and structure in the decoder and changes the original information flow path, so the model forgets the knowledge it has already learned through self-supervision on monolingual corpora.
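To make this structural difference concrete, the following PyTorch sketch (illustrative only, not the paper's code; class names and sizes such as d_model=512 are assumptions) contrasts a GPT-style block, which contains only masked self-attention and a feed-forward sublayer, with an NMT decoder block, which additionally contains the cross-attention sublayer that attends to the encoder output:

import torch.nn as nn

class GPTBlock(nn.Module):
    """Masked self-attention + feed-forward: no access to a source sentence."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))
        self.ln1, self.ln2 = nn.LayerNorm(d_model), nn.LayerNorm(d_model)

    def forward(self, x, causal_mask):
        h, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.ln1(x + h)
        return self.ln2(x + self.ffn(x))

class NMTDecoderBlock(GPTBlock):
    """Adds a cross-attention sublayer so generation can condition on the source context."""
    def __init__(self, d_model=512, n_heads=8):
        super().__init__(d_model, n_heads)
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln3 = nn.LayerNorm(d_model)

    def forward(self, x, causal_mask, enc_out):
        h, _ = self.self_attn(x, x, x, attn_mask=causal_mask)
        x = self.ln1(x + h)
        c, _ = self.cross_attn(x, enc_out, enc_out)  # queries from decoder, keys/values from encoder
        x = self.ln3(x + c)
        return self.ln2(x + self.ffn(x))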
2.2 English-Chinese Neural Machine Translation Model Based on Grafting BERT and GPT
Based on the issues raised above, this article uses another approach: grafting the BERT model and the GPT model to form a machine translation model with an encoder-decoder architecture [42]. This approach does not change the original model structures of BERT and GPT, but adds K layers of Transformer encoder without cross-attention on top of the BERT model and K layers of Transformer decoder with cross-attention on top of the GPT model. The model structure is shown in Figure 3-3. The model is ultimately used for the translation task from English to Chinese, so it is named the grafted English-to-Chinese translation model. For convenience in subsequent discussion, the model name is abbreviated as GratedCTM (Grafted English-to-Chinese Translation Model).
This GratedCTM model structure can effectively solve the problem of connecting BERT and GPT mentioned earlier. It introduces a cross-attention sublayer to capture the context of the source sentence and to complete the information conversion between the encoder and decoder, while ensuring that the GPT model structure is not destroyed. Specifically, the GratedCTM model does not insert cross-attention sublayers into the pre-trained GPT model, but rather attaches them to it. This approach not only preserves the already trained model structure but also captures the context of the source sentence. On the encoder side, an additional K-layer Transformer encoder (K=6 in this paper) is added on top of the BERT model pre-trained on multiple monolingual corpora to help it produce representations suitable for translation; on the decoder side, an additional K-layer Transformer decoder (including cross-attention sublayers) is added on top of the GPT model pre-trained on monolingual corpora to help it capture the context of the source sentence. The top of Figure 3-3 represents the predicted output of the entire model. This structure maintains the integrity of BERT and GPT and does not cause the model to forget the knowledge learned during pre-training. Finally, the hidden-state output of GPT is combined with the output of the additional K-layer decoder through a residual connection, and the fused contextual information is fed into the softmax layer. The purpose of using the residual connection mechanism is to utilize the pre-trained generative ability of GPT to help generate better target-language text.
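The forward computation described above can be summarized by the following hedged PyTorch sketch. It is a sketch under assumptions, not the paper's implementation: `bert` and `gpt` stand for the pre-trained stacks (assumed to return hidden states of shape batch x length x d_model), the grafted layers are freshly initialized, and the residual fusion of the GPT hidden state with the grafted decoder output before the softmax projection follows the description above:

import torch.nn as nn

class GraftedCTM(nn.Module):
    def __init__(self, bert, gpt, d_model=512, n_heads=8, k_layers=6, vocab_size=32000):
        super().__init__()
        self.bert, self.gpt = bert, gpt  # pre-trained stacks, structure left unchanged
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.extra_encoder = nn.TransformerEncoder(enc_layer, k_layers)  # K grafted layers, no cross-attention
        self.extra_decoder = nn.TransformerDecoder(dec_layer, k_layers)  # K grafted layers with cross-attention
        self.out_proj = nn.Linear(d_model, vocab_size)                   # projection fed into the softmax

    def forward(self, src_ids, tgt_ids, tgt_causal_mask):
        enc_h = self.extra_encoder(self.bert(src_ids))       # source-side semantic representation
        gpt_h = self.gpt(tgt_ids, tgt_causal_mask)            # target-side language knowledge from GPT
        dec_h = self.extra_decoder(gpt_h, enc_h, tgt_mask=tgt_causal_mask)
        return self.out_proj(gpt_h + dec_h)                   # residual fusion, then vocabulary logits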
In order to maximize the effectiveness of representation, transformation, and generation in the machine translation model, we use multiple bilingual parallel corpora and multiple monolingual corpora, as well as the English-Chinese bilingual parallel corpus annotated by our laboratory. The training of the GratedCTM model can thus be simply divided into three stages:
(1) Pre-train the encoder and decoder separately using monolingual corpora from multiple languages to obtain an independent encoder (BERT) and decoder (GPT). Among them, the encoder is used for the semantic representation of the source language, and the decoder is used to generate the target language. When training on multiple languages, byte pair encoding (BPE) is used to build a universal vocabulary shared across languages, allowing multiple languages to share the same embedding layer parameters (a minimal vocabulary-building sketch is given after this list).
(2) Connect the pre-trained BERT and GPT together as shown in Figure 3-3 to form a neural machine translation model, GratedCTM, with an encoder-decoder architecture. Then, multiple bilingual parallel corpora are used to fine-tune the translation model on multilingual translation, allowing the model to learn the conversion mechanism between the semantic representation of the encoder and the target-language generation of the decoder.
(3) Use the English-Chinese bilingual parallel corpus of 1.2 million sentence pairs to fine-tune the GratedCTM model on the English-to-Chinese translation task, transferring the translation ability learned from the multilingual translation tasks to the English-to-Chinese translation task.
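As a concrete illustration of the shared BPE vocabulary mentioned in stage (1), the sketch below uses the Hugging Face tokenizers library to train one joint vocabulary over monolingual text files from several languages; the file names and vocabulary size are hypothetical, and a whitespace pre-tokenizer is used only for simplicity:

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace

# One BPE model trained jointly on several languages -> one shared vocabulary,
# so all languages can share the same embedding layer parameters.
tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(vocab_size=32000,
                     special_tokens=["[UNK]", "[CLS]", "[SEP]", "[PAD]", "[MASK]"])
tokenizer.train(files=["news.en.txt", "news.de.txt", "news.zh.txt"], trainer=trainer)  # hypothetical files
tokenizer.save("shared_bpe.json")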
2.3 Monolingual Pre-training Loss Function
(1) Pre-training the BERT encoder on monolingual corpora of multiple languages
For the individually pre-trained BERT encoder, a Transformer encoder stacked with N layers (N=6) is used. Drawing on the pre-training ideas of earlier BERT models [11], a masked language model (MLM) is used as the training objective, with a masking probability of 15% for the input sequence. In order to enhance the universality of cross-lingual lexical representation, language-identifier tokens are not included.
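The 15% masking can be sketched as follows (a simplified illustration: the usual 80/10/10 mask/random/keep split of BERT is omitted, and -100 is used as the ignore index of the cross-entropy loss):

import torch

def mask_tokens(input_ids, mask_token_id, mask_prob=0.15):
    """Randomly mask ~15% of the tokens and build MLM labels."""
    labels = input_ids.clone()
    chosen = torch.bernoulli(torch.full(input_ids.shape, mask_prob)).bool()
    labels[~chosen] = -100                 # only masked positions contribute to the loss
    corrupted = input_ids.clone()
    corrupted[chosen] = mask_token_id      # replace chosen tokens with [MASK]
    return corrupted, labels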
(2) Pre-training the M-GPT decoder on multiple monolingual corpora
For the GPT decoder, a Transformer decoder stacked with N layers (N=6) without cross-attention sublayers is used, with an autoregressive language model (ALM) as the training objective. Since the Transformer decoder is entirely based on the self-attention mechanism, the not-yet-generated tokens in the model input must be masked to realize the autoregressive property.
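The masking of not-yet-generated tokens is the standard causal (upper-triangular) attention mask; a minimal sketch:

import torch

def causal_mask(seq_len):
    # True marks future positions that must not be attended to.
    return torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()

# For a length-4 sequence, position 0 may attend only to itself,
# while position 3 may attend to positions 0-3.
print(causal_mask(4))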
2.4 Translation Task Fine-tuning Loss Function
(1) Fine-tuning on multilingual translation tasks
After obtaining the pre-trained BERT encoder and GPT decoder, it is necessary to connect them into a machine translation (MT) model with an encoder-decoder architecture and then adjust it according to the GratedCTM model structure to better adapt it to machine translation tasks.
(2) Fine-tuning on the English-to-Chinese translation task
After fine-tuning on a multilingual parallel corpus, the model has learned how to extract semantic representations from the source language and has developed good capability for generating translations in the target language. However, the English-Chinese bilingual parallel corpus was not included in the multilingual fine-tuning stage, so the translation quality of the model from English to Chinese still needs to be improved. Therefore, it is necessary to fine-tune the translation model again with the English-Chinese bilingual parallel corpus, in the same way as the multilingual fine-tuning described above, ultimately enabling the GratedCTM model to transfer the knowledge learned from the multilingual translation tasks to the English-to-Chinese translation task and achieve the best results on it.
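A hedged sketch of one fine-tuning step used in both stages (multilingual and English-to-Chinese) is given below; `model` refers to the GraftedCTM sketch above, `pad_id` is the padding token id, and teacher forcing with a shifted target sequence and cross-entropy loss is assumed:

import torch
import torch.nn.functional as F

def finetune_step(model, optimizer, src_ids, tgt_ids, pad_id):
    model.train()
    dec_in, dec_target = tgt_ids[:, :-1], tgt_ids[:, 1:]   # teacher forcing: predict the next token
    mask = torch.triu(torch.ones(dec_in.size(1), dec_in.size(1)), diagonal=1).bool()
    logits = model(src_ids, dec_in, mask)                   # (batch, length, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           dec_target.reshape(-1), ignore_index=pad_id)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()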
3. Experimental Setup and Training
(1) Pre-training dataset
When pre-training BERT and GPT, monolingual corpora from multiple languages are used, and self-supervised pre-training is implemented by randomly masking tokens in the input sequence. In this stage, the News Crawl corpus from the WMT datasets was used. After eliminating duplicate data and labeling the data by language, there were a total of 1.4 billion sentences in 45 languages, which is one fifth of the training corpus of the mBART model.
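An illustrative preprocessing sketch for the two steps named above (exact-duplicate removal and per-sentence language labeling), here using the off-the-shelf fastText language-identification model; the model file and function names below are not from the paper:

import fasttext

lid = fasttext.load_model("lid.176.bin")   # published fastText language-ID model

def dedup_and_label(sentences):
    seen, labelled = set(), []
    for s in sentences:
        s = s.strip()
        if not s or s in seen:             # drop empty lines and exact duplicates
            continue
        seen.add(s)
        (label,), _ = lid.predict(s)       # e.g. '__label__en'
        labelled.append((label.replace("__label__", ""), s))
    return labelled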
(2) Multilingual translation fine-tuning dataset
After combining the pre-trained BERT and GPT to form the GratedCTM model, it is fine-tuned on a multilingual translation task using parallel corpora from multiple language pairs. Through training on the translation task, the unfrozen parameters in the model can fully learn to adapt to the translation task. In the multilingual fine-tuning stage, the TED dataset is used, which is a widely used multilingual neural machine translation dataset. We extracted one-way corpora from 30 languages to English, including approximately 3.18 million sentence pairs and 10.1 million bilingual parallel sentences. The languages and corresponding corpus sizes of the dataset are shown in Table 3-2.
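How much of BERT and GPT is frozen during this stage is not spelled out above; assuming the pre-trained stacks are kept frozen and only the grafted layers and the output projection are updated, the setup could look like the following sketch (again referring to the GraftedCTM sketch, where `model` is an already constructed instance):

import torch

def freeze_pretrained(model):
    # Keep the pre-trained BERT and GPT weights fixed; train only the grafted layers.
    for module in (model.bert, model.gpt):
        for p in module.parameters():
            p.requires_grad = False

freeze_pretrained(model)
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)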
(3) English-Chinese translation fine-tuning dataset
After fine-tuning GratedCTM with multiple bilingual parallel corpora, the model has learned to translate between languages. Further fine-tuning the model with the English-Chinese bilingual parallel corpus enables it to effectively transfer this translation capability to the English-to-Chinese translation task. The corpus used in the English-Chinese fine-tuning stage consists of 1.2 million parallel sentence pairs from English to Chinese manually annotated by our laboratory, containing approximately 28 million English words and 15 million Chinese words.
Funding
This work was supported by the Artificial Intelligence Translation Shaanxi Research Center.
References
[1] Lin Z, Pan X, Wang M, et al. Pre-training Multilingual Neural Machine Translation by Leveraging Alignment Information[C]//Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020: 2649-2663.
[2] Lin Z, Wu L, Wang M, et al. Learning Language Specific Sub-network for Multilingual Machine Translation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021: 293-305.
[3] Ko W J, El-Kishky A, Renduchintala A, et al. Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021: 802-812.
[4] Pan X, Wang M, Wu L, et al. Contrastive Learning for Many-to-many Multilingual Neural Machine Translation[C]//Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing. 2021: 244-258.