Peng Yufang, Chen Jianghao, He Zhiqiang
[Purpose/Significance] This study attempts to move the extraction of South China Sea evidence from the level of the document carrier, through the level of document content (full-text search), down to the level of fine-grained data. First, this can improve the retrieval performance of digital resources on South China Sea literature; second, it provides ample evidence material for professionals; and finally, it lays a foundation for building an evidence-chain association model for South China Sea rights protection.
[Method/Process] Extraction rules were formulated according to the characteristics of South China Sea rights protection evidence. Unstructured data were converted into structured data through text cleaning, text segmentation, paragraph segmentation, and word segmentation. The evidence extraction performance of Naive Bayes, SVM, Random Forest, DNN, TextCNN, Bi-LSTM, LightGBM, and XGBoost was then compared. Finally, "5W" rule filtering and manual verification were added to further improve extraction accuracy.
[Result/Conclusion] The experimental results showed that the DNN model, built on the TensorFlow deep learning framework, performed comparatively well at evidence data extraction, reaching an accuracy of 0.88. Further integrating "5W" rule filtering and manual verification significantly improved extraction accuracy. The proposed method for extracting evidence from South China Sea literature is therefore feasible to a certain extent.
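The two-stage idea in the method (a classifier flags candidate evidence paragraphs, then a "5W" rule filter discards those lacking enough key elements) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses Naive Bayes (one of the compared baselines, written from scratch to stay self-contained) rather than the DNN the study found strongest; English regex tokens stand in for Chinese word segmentation; "5W" is assumed to mean who/what/when/where/why; and the cue lists, `min_elements=3` threshold, toy training sentences, and the names `FIVE_W_CUES`, `five_w_elements`, and `extract_evidence` are all hypothetical. Manual verification is not modeled.

```python
import math
import re
from collections import Counter

# Hypothetical cue words for each "5W" element (assumed: who/what/when/where/why).
# Illustrative placeholders only, not the study's actual extraction rules.
FIVE_W_CUES = {
    "who": {"china", "chinese", "fishermen", "navy", "government"},
    "what": {"named", "surveyed", "patrolled", "administered", "discovered"},
    "when": {"dynasty", "century", "year"},
    "where": {"island", "islands", "reef", "shoal", "sea"},
    "why": {"because", "sovereignty", "jurisdiction"},
}


def tokenize(text):
    """Lowercase regex tokens; a stand-in for Chinese word segmentation."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.log_priors, self.counts, self.totals = {}, {}, {}
        vocab = set()
        for c in self.classes:
            class_docs = [tokenize(d) for d, y in zip(docs, labels) if y == c]
            self.log_priors[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(t for toks in class_docs for t in toks)
            self.totals[c] = sum(self.counts[c].values())
            vocab.update(self.counts[c])
        self.vocab_size = len(vocab)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)

        def log_posterior(c):
            # Smoothed per-token likelihoods; Counter returns 0 for unseen tokens.
            denom = self.totals[c] + self.vocab_size
            return self.log_priors[c] + sum(
                math.log((self.counts[c][t] + 1) / denom) for t in tokens
            )

        return max(self.classes, key=log_posterior)


def five_w_elements(text):
    """Return the 5W elements a paragraph mentions (cue words plus year-like numbers)."""
    tokens = set(tokenize(text))
    found = {w for w, cues in FIVE_W_CUES.items() if cues & tokens}
    if any(t.isdigit() and len(t) in (3, 4) for t in tokens):
        found.add("when")
    return found


def extract_evidence(paragraphs, model, min_elements=3):
    """Stage 1: classify; stage 2: keep paragraphs covering >= min_elements 5W items."""
    return [
        p for p in paragraphs
        if model.predict(p) == "evidence" and len(five_w_elements(p)) >= min_elements
    ]


# Toy, invented training data for illustration only.
TRAIN_DOCS = [
    "Chinese fishermen named the reef in the Ming dynasty",
    "The navy patrolled the islands in 1974 to assert sovereignty",
    "The government administered the shoal in the Qing dynasty",
    "The conference schedule was announced today",
    "This chapter reviews the research method",
    "The weather was pleasant during the meeting",
]
TRAIN_LABELS = ["evidence"] * 3 + ["other"] * 3

model = NaiveBayes().fit(TRAIN_DOCS, TRAIN_LABELS)
candidates = [
    "Chinese fishermen surveyed the islands in 1935",
    "The meeting schedule was announced",
    "The islands in the sea",
]
print(extract_evidence(candidates, model))
```

Here the third candidate is classified as evidence but mentions only a "where" element, so the rule filter drops it. This mirrors the design choice described above: the rule stage trades some recall for precision on top of whatever classifier is used.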