simplest situation, the generated text is usually mediocre
and often implausible. To improve generation quality, more
elaborate decoding strategies are generally used, such as
beam search, top-k sampling, top-p (nucleus) sampling,
repetition penalty, and length penalty. These strategies
take other factors of text generation into account, such as
fluency, richness, and consistency, and can greatly improve
the generated text.
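As an illustration of two of the sampling strategies named above, the following is a minimal sketch of top-k and top-p (nucleus) filtering over a next-token distribution. It is not the implementation used in this paper: the function names, the five-token vocabulary, and the logit values are all hypothetical, and only the Python standard library is assumed.

```python
import math
import random


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_filter(probs, k):
    """Keep the k most probable tokens and renormalize them."""
    keep = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    mass = sum(probs[i] for i in keep)
    return {i: probs[i] / mass for i in keep}


def sample(filtered, rng=random):
    """Draw one token index from a filtered, renormalized distribution."""
    tokens = list(filtered)
    weights = [filtered[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Hypothetical next-token scores over a five-token vocabulary.
logits = [2.0, 1.0, 0.5, -1.0, -3.0]
probs = softmax(logits)
top_k = top_k_filter(probs, k=2)    # only the two most likely tokens survive
top_p = top_p_filter(probs, p=0.9)  # the smallest "nucleus" covering 90% of the mass
next_token = sample(top_p, random.Random(0))
```

Both filters discard the low-probability tail before sampling, which is what keeps sampled text fluent while still allowing variety. Top-k fixes the number of candidates, whereas top-p lets that number adapt to how peaked the distribution is.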
4 CONCLUSIONS
Domain knowledge discovery generally involves discovery,
search, induction, and extraction. The content sought is
often not obvious: it is hidden in the text, or spread over
too large a body of material for people to find and
summarize directly. Extracting it requires combining domain
knowledge (such as the common knowledge of poetry used in
this paper) with a variety of analytical methods (such as
the NLU and NLG methods used in this paper), and sometimes
requires reverse thinking (such as the poetry generation in
this paper). Ideally, the individual analyses form a
sequential, complementary whole, so that the exploration of
text data can be completed as efficiently as possible.
Machine learning algorithms provide powerful tools for text
mining and knowledge management. With a reasonable choice
of model and feature representation, valuable knowledge and
information can be obtained quickly and accurately.
However, attention must also be paid to training data
quality, feature selection, and model evaluation. As the
technology continues to develop, machine learning
algorithms will play an increasingly important role in text
mining and knowledge management and enable more
applications and discoveries.
ACKNOWLEDGEMENTS
This work was supported by the Science and Technology
Research Project of the Education Committee of Hubei
Province (Grant No. B2022062).