entity not in the system's lexicon was encountered, the
system failed as expected. As an illustration, the
following query containing the word "temperature",
which is not defined in the parser file, was tested:
Query D: "Show temperature sensor data for
the last 15 minutes"
In this case, the parser function returned an empty
dictionary because none of the Matcher rules matched
the query. The control mechanism detected this
empty result, and the system responded to the user
with an HTTP 400 Bad Request error: “Could not
extract both the measure and field names from your
query”.
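The failure path described above can be illustrated with a minimal sketch. It is not the prototype's implementation: a plain dictionary lookup stands in for the spaCy Matcher rules, and the names LEXICON, parse_query, handle_query, and BadRequestError, as well as the lexicon entries, are hypothetical. Only the empty-dictionary result, the 400 status, and the error message are taken from the text.

```python
import re

# Hypothetical closed vocabulary: term -> (measurement, field).
# The real lexicon is defined in the authors' parser file.
LEXICON = {
    "humidity": ("environment", "humidity"),
    "pressure": ("environment", "pressure"),
}

ERROR_DETAIL = "Could not extract both the measure and field names from your query"


class BadRequestError(Exception):
    """Stand-in for the HTTP 400 Bad Request response of the web layer."""

    status_code = 400


def parse_query(text: str) -> dict:
    """Return measurement and field for the first known term, else {}."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in LEXICON:
            measurement, field = LEXICON[token]
            return {"measurement": measurement, "field": field}
    return {}


def handle_query(text: str) -> dict:
    """Control step: reject an empty parse instead of guessing a query."""
    parsed = parse_query(text)
    if not parsed:
        raise BadRequestError(ERROR_DETAIL)
    return parsed
```

Under these assumptions, Query D yields an empty dictionary because "temperature" is absent from the lexicon, so handle_query raises the error rather than building a query from a partial match.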
7 CONCLUSION
The evaluation of the prototype's rule-based NLP
methodology reveals both distinct advantages and
inherent limitations. In particular, it exposes a
fundamental trade-off between semantic flexibility
and a strictly constrained vocabulary, which is a
natural consequence of the system's current
rule-based, closed-vocabulary design. While the
system is robust to grammatical variations in the
terms it has been taught, it cannot understand or
predict concepts outside its knowledge base. This is
the price paid for reliability and predictability: the
system fails rather than hallucinating an
interpretation of an unfamiliar term and generating
an incorrect query. This behavior is particularly
desirable for critical monitoring systems.
Future work will target enhanced semantic
understanding and analytical depth. The existing
rule-based Matcher is planned to be augmented to
improve its flexibility for synonyms and
out-of-vocabulary expressions. Concurrently, the
Query Builder layer will be extended to support
advanced Flux functions, such as aggregation, to
enable cross-source correlation analyses. A dialogue
management module is envisioned to preserve context
across sequential requests, extending interaction
beyond discrete queries.
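As an illustration of the planned aggregation support, the following minimal sketch shows how a query builder might append a Flux aggregateWindow stage to a basic query. The function build_flux_query, its parameters, and the bucket and measurement names are hypothetical and do not describe the prototype's Query Builder; only the Flux pipeline functions (from, range, filter, aggregateWindow) are standard.

```python
from typing import Optional

# Aggregate functions this sketch accepts; all are standard Flux functions.
ALLOWED_FUNCTIONS = {"mean", "min", "max", "sum", "count"}


def build_flux_query(
    bucket: str,
    measurement: str,
    field: str,
    range_start: str,
    window: Optional[str] = None,
    fn: str = "mean",
) -> str:
    """Build a Flux query string, optionally with windowed aggregation."""
    if fn not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unsupported aggregate function: {fn}")
    stages = [
        f'from(bucket: "{bucket}")',
        f"  |> range(start: {range_start})",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}"'
        f' and r._field == "{field}")',
    ]
    if window is not None:
        # Downsample into fixed windows, e.g. one mean value per minute.
        stages.append(
            f"  |> aggregateWindow(every: {window}, fn: {fn}, createEmpty: false)"
        )
    return "\n".join(stages)


# Hypothetical usage: one-minute means over the last 15 minutes.
print(build_flux_query("sensors", "environment", "humidity", "-15m", window="1m"))
```

The sketch interpolates values directly into the query string for brevity; since the values would come only from a closed vocabulary this may be acceptable, but a real builder would need to validate or escape them.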
REFERENCES
Vial, G. (2019). Understanding digital transformation: A
review and a research agenda. The Journal of Strategic
Information Systems, 28(2), 118–144.
https://doi.org/10.1016/j.jsis.2019.01.003
McAfee, A., Brynjolfsson, E., Davenport, T., Patil, D. J., &
Barton, D. (2012). Big data: The management
revolution. Harvard Business Review, 90(10), 61–67.
Woods, W., Kaplan, R., & Webber, B. (1972). The Lunar
Sciences Natural Language Information System.
Affolter, K., Stockinger, K., & Bernstein, A. (2019). A
comparative survey of recent natural language
interfaces for databases. The VLDB Journal, 28, 793–
819. https://doi.org/10.1007/s00778-019-00567-8
Jurafsky, D., & Martin, J. (2008). Speech and language
processing: An introduction to natural language
processing, computational linguistics, and speech
recognition (2nd ed.). Prentice Hall.
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to
sequence learning with neural networks. In
Proceedings of the 27th International Conference on
Neural Information Processing Systems (NIPS’14) (pp.
3104–3112).
Chang, Y., Wang, X., Wang, J., et al. (2024). A survey on
evaluation of large language models. ACM
Transactions on Intelligent Systems and Technology,
15(3), Article 39. https://doi.org/10.1145/3641289
Cai, R., Xu, B., Yang, X., Zhang, Z., & Li, Z. (2017). An
encoder-decoder framework translating natural
language to database queries. arXiv Preprint
arXiv:1711.06061.
https://doi.org/10.48550/arXiv.1711.06061
Bazaga, A., Gunwant, N., & Micklem, G. (2021).
Translating synthetic natural language to database
queries with a polyglot deep learning framework.
Scientific Reports, 11, 18462.
https://doi.org/10.1038/s41598-021-98019-3
Hornsteiner, M., Kreussel, M., Steindl, C., Ebner, F., Empl,
P., & Schönig, S. (2024). Real-time text-to-Cypher
query generation with large language models for graph
databases. Future Internet, 16(12), 438.
https://doi.org/10.3390/fi16120438
Jiang, Y., Pan, Z., Zhang, X., et al. (2024). Empowering
time series analysis with large language models: A
survey. In Proceedings of the International Joint
Conference on Artificial Intelligence (IJCAI’24) (pp.
8095–8103). https://doi.org/10.24963/ijcai.2024/895
InfluxData. (n.d.). InfluxDB time series data platform.
https://www.influxdata.com/
Honnibal, M., & Montani, I. (2017). spaCy 2: Natural
language understanding with Bloom embeddings,
convolutional neural networks and incremental
parsing. https://spacy.io/
Dubey, A., et al. (2024). The Llama 3 herd of models. arXiv
Preprint arXiv:2407.21783.
https://doi.org/10.48550/arXiv.2407.21783
Ollama. (n.d.). Ollama. https://ollama.com/
Ramírez, S. (2018). FastAPI [Computer software].
https://github.com/fastapi/fastapi
APPENDIX
The source code can be downloaded at
https://github.com/kayalaboratory/NLQ-Flux
NLP-Based Query Interface for InfluxDB: Designing a Hybrid Architecture with SpaCy and Large Language Model