Teaching machines to understand natural language is extremely complex and is considered the holy grail of Artificial Intelligence. Extracting and delivering relevant, useful information from the right sources is even more challenging, and with current approaches it is ultimately impossible to scale.
We're at the intersection of Natural Language Processing, Dialogue, and Knowledge Bases, and aim to bring them together by leveraging state-of-the-art deep learning.
Our Natural Language Processing Engine is a new, responsive approach to the unpredictable nature of real conversations. We develop our own bespoke methods because standard ones can struggle to learn long-range dependencies, even over shorter sequences. That is a serious problem in NLP, because the words that carry a sentence's meaning are not always clustered closely together.
We utilise a combination of state-of-the-art deep learning techniques based on Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs) to power our NLP Engine. With the right data, these networks can be used for different critical tasks, e.g. accurate semantic analysis of user queries and of dialogue state.
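To make the idea concrete, here is a minimal NumPy sketch of combining recurrent and convolutional features over a token sequence. All weights are random placeholders; this illustrates the feature-combination pattern only, not our actual engine.

```python
import numpy as np

def rnn_features(x, W_h, W_x):
    """Simple Elman RNN: return the final hidden state over a sequence.

    x: (seq_len, d_in) token embeddings; W_h: (d_h, d_h); W_x: (d_h, d_in).
    """
    h = np.zeros(W_h.shape[0])
    for x_t in x:
        h = np.tanh(W_h @ h + W_x @ x_t)
    return h

def conv_features(x, filters, width=2):
    """1-D convolution over the sequence, max-pooled over time per filter.

    filters: (n_filters, width * d_in) flattened convolution kernels.
    """
    seq_len, d_in = x.shape
    windows = np.stack([x[i:i + width].ravel()
                        for i in range(seq_len - width + 1)])
    return np.max(windows @ filters.T, axis=0)  # max-pool over time

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))           # 5 tokens, 8-dim embeddings
W_h, W_x = rng.normal(size=(16, 16)), rng.normal(size=(16, 8))
filters = rng.normal(size=(4, 16))      # 4 filters of width 2

combined = np.concatenate([rnn_features(seq, W_h, W_x),
                           conv_features(seq, filters)])
print(combined.shape)  # → (20,): 16 recurrent + 4 convolutional features
```

The recurrent part captures order-sensitive context while the convolutional part picks up local n-gram patterns; a downstream classifier would consume the concatenated vector.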
To get the best out of virtual assistant technology, we are pushing boundaries by finding the right balance between structured and unstructured data in order to provide more meaningful results, i.e. mapping the relationships between entities in naturally asked queries directly to the Knowledge Base.
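A toy sketch of that mapping, with a hypothetical two-entity knowledge base and hand-written query patterns standing in for a learned semantic parser:

```python
# Hypothetical toy knowledge base: entity -> {relation: value}
KB = {
    "eiffel tower": {"located_in": "Paris", "height_m": "330"},
    "paris": {"capital_of": "France"},
}

# Hand-written prefix patterns standing in for a learned parser
PATTERNS = [
    ("how tall is ", "height_m"),
    ("where is ", "located_in"),
]

def answer(query):
    """Map a natural-language query to a KB relation and look it up."""
    q = query.lower().rstrip("?")
    for prefix, relation in PATTERNS:
        if q.startswith(prefix):
            entity = q[len(prefix):].strip()
            return KB.get(entity, {}).get(relation)
    return None

print(answer("Where is Eiffel Tower?"))   # → Paris
print(answer("How tall is eiffel tower")) # → 330
```

In practice the pattern list is replaced by a trained model that extracts the entity and relation jointly from unstructured text; the lookup against structured KB triples stays the same.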
One of the biggest determinants of the quality of a neural network is the data it is trained on. We are developing large-scale, rich datasets for our learning algorithms to ensure optimal performance. These datasets are continually expanded and improved.
Our research is focused on advancing the latest concepts and techniques in relevant fields. As a result, we will publish parts of our findings in the form of research papers at conferences and in journals.
We at Wluper are open to opportunities with universities, R&D departments, or researchers in related research areas. We can collaborate through research projects, degree projects, or skills development. You are welcome to get in touch.
Dialogue systems have the potential to change how people interact with machines but are highly dependent on the quality of the data used to train them. It is therefore important to develop good dialogue annotation tools which can improve the speed and quality of dialogue data annotation. With this in mind, we introduce LIDA, an annotation tool designed specifically for conversation data. As far as we know, LIDA is the first dialogue annotation system that handles the entire dialogue annotation pipeline from raw text, as may be the output of transcription services, to structured conversation data. Furthermore, it supports the integration of arbitrary machine learning models as annotation recommenders and also has a dedicated interface to resolve inter-annotator disagreements, such as after crowdsourcing annotations for a dataset. LIDA is fully open source, documented, and publicly available.
(Links to the paper and code will follow soon.)
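The pipeline the abstract describes — raw transcript in, structured turns with recommended labels out — can be sketched as follows. The `Turn` structure, the `speaker: utterance` transcript format, and the keyword-based recommender are illustrative assumptions, not LIDA's actual API.

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str
    text: str
    labels: dict = field(default_factory=dict)

def segment(raw):
    """Split raw transcript lines of the form 'speaker: utterance' into turns."""
    turns = []
    for line in raw.strip().splitlines():
        speaker, _, text = line.partition(":")
        turns.append(Turn(speaker.strip(), text.strip()))
    return turns

def recommend(turn):
    """Stand-in for a pluggable ML recommender: a keyword heuristic here."""
    intent = "request" if "?" in turn.text else "inform"
    return {"intent": intent}

raw = "user: Where is the station?\nagent: Two blocks north."
dialogue = segment(raw)
for turn in dialogue:
    # annotators would confirm or correct the recommended labels
    turn.labels.update(recommend(turn))

print([(t.speaker, t.labels["intent"]) for t in dialogue])
# → [('user', 'request'), ('agent', 'inform')]
```

Swapping `recommend` for a trained model is what makes arbitrary ML recommenders pluggable: the annotation interface only depends on the `Turn -> labels` contract.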
Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation, but the underlying properties of datasets are discovered only on an ad-hoc basis as errors occur. However, understanding the properties of the data is crucial to perfecting models. In this paper we analyse exactly which characteristics of a dataset best determine how difficult that dataset is for the task of text classification. We then propose an intuitive measure of difficulty for text classification datasets which is simple and fast to calculate. We show that this measure generalises to unseen data by comparing it to state-of-the-art datasets and results.
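To illustrate the kind of quantity such a measure might combine — this is a toy proxy, not the measure proposed in the paper — one can score a dataset by class-balance entropy plus the vocabulary overlap between classes:

```python
import math
from collections import Counter

def difficulty(texts, labels):
    """Toy difficulty proxy for a text classification dataset.

    Combines class-balance entropy with the fraction of vocabulary
    shared across classes. Illustrative only; NOT the paper's measure.
    """
    counts = Counter(labels)
    n = len(labels)
    # Balanced classes give maximal entropy: more classes to tell apart.
    balance = -sum(c / n * math.log2(c / n) for c in counts.values())
    # Vocabulary shared across classes blurs the decision boundary.
    vocab = {lab: set() for lab in counts}
    for text, lab in zip(texts, labels):
        vocab[lab].update(text.lower().split())
    shared = set.intersection(*vocab.values())
    union = set.union(*vocab.values())
    overlap = len(shared) / len(union) if union else 0.0
    return balance + overlap

easy = difficulty(["buy now cheap", "meeting at noon"], ["spam", "ham"])
hard = difficulty(["buy now cheap", "buy cheap now please"], ["spam", "ham"])
print(easy < hard)  # → True: more shared vocabulary, higher difficulty
```

Like the measure in the paper, this needs no model training and runs in a single pass over the data, so it can be computed before committing to an architecture.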