What we are working on

Teaching machines to understand natural language is extremely complex and is considered a holy grail of Artificial Intelligence. Extracting relevant, useful information from the right sources is even more challenging, and current approaches ultimately struggle to scale.

We're at the intersection of Natural Language Processing, Dialogue, and Knowledge Bases, and aim to bring them together by leveraging state-of-the-art deep learning.

  • Natural Language Processing

    Our Natural Language Processing Engine is a new, responsive approach to the unpredictable nature of real conversations. We develop our own bespoke methods because standard approaches can struggle to learn long-range dependencies, even over relatively short sequences. That is a serious problem in NLP, because the words that determine a sentence's meaning are not always clustered closely together.
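    As an illustrative sketch (not our engine's actual code), the decay of the gradient signal in a plain recurrent model shows why long-range dependencies are hard to learn: the backpropagated signal is scaled by the recurrent weight at every time step, so it shrinks geometrically with distance.

```python
# Toy illustration: why a plain recurrent model struggles with
# long-range dependencies. With a recurrent weight |w| < 1, the
# gradient reaching a token n steps back decays like w**n.

def gradient_through_time(w: float, steps: int) -> float:
    """Magnitude of a unit gradient after `steps` recurrent steps."""
    grad = 1.0
    for _ in range(steps):
        grad *= w
    return grad

short = gradient_through_time(0.9, 5)    # dependency 5 tokens back
long = gradient_through_time(0.9, 100)   # dependency 100 tokens back
```

    For a dependency 5 tokens back the signal is still about 0.59, but 100 tokens back it has collapsed to roughly 3e-5, which is why gated architectures and other bespoke methods are needed.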

  • Deep Learning

    We utilise a combination of state-of-the-art Deep Learning techniques based on Recurrent Neural Networks and Convolutional Neural Networks to power our NLP Engine. With the right data, we can use these networks for different critical tasks, e.g. accurate semantic analysis of user queries and dialogue state tracking.
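    A minimal sketch of the basic recurrent building block behind such networks, with toy scalar weights standing in for the learned matrices of a real model:

```python
import math

# Minimal sketch of one step of an Elman-style recurrent cell:
# the new hidden state mixes the current input with the previous
# state through a tanh non-linearity. Weights here are toy scalars.

def rnn_step(x: float, h_prev: float, w_x: float = 0.5,
             w_h: float = 0.8, b: float = 0.0) -> float:
    """One recurrent update: new hidden state from input and previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)

# Encode a short token sequence (toy scalar "embeddings") into one state.
h = 0.0
for token in [0.2, -0.1, 0.7]:
    h = rnn_step(token, h)
```

    The final state `h` summarises the whole sequence; in a real encoder this summary would feed downstream tasks such as semantic analysis.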

  • Graph Databases

    Knowledge Base


    To get the best out of virtual assistant technology, we are pushing boundaries by finding the right balance between structured and unstructured data to provide more meaningful results: mapping the relationships between entities in naturally phrased queries directly onto the Knowledge Base.
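    A hypothetical sketch of the idea, with invented entity and relation names: a Knowledge Base stored as (subject, relation, object) triples, against which the relations extracted from a parsed query can be matched.

```python
# Hypothetical sketch (entity and relation names are invented):
# a tiny Knowledge Base of (subject, relation, object) triples.

TRIPLES = [
    ("London", "capital_of", "United Kingdom"),
    ("Thames", "flows_through", "London"),
    ("United Kingdom", "part_of", "Europe"),
]

def query_kb(subject: str, relation: str) -> list[str]:
    """Return all objects linked to `subject` via `relation`."""
    return [o for s, r, o in TRIPLES if s == subject and r == relation]

# e.g. a parsed query "What is London the capital of?" maps to:
query_kb("London", "capital_of")  # → ["United Kingdom"]
```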

  • Data Sets

    One of the biggest determinants of the quality of a neural network is the data it is trained on. We are developing large-scale, rich datasets for our learning algorithms to ensure optimal performance, and these datasets are continually expanded and improved.

  • Research

    Our research is focused on advancing the latest concepts and techniques in relevant fields. As a result, we publish parts of our findings in the form of research papers at conferences and in journals.

    We at Wluper are open to collaborations with universities, R&D departments, and researchers in related areas, whether through joint projects, degree projects, or skills development. You are welcome to get in touch for more information.

    September 2019

    LIDA: Lightweight Interactive Dialogue Annotator

    EMNLP 2019 - Hong Kong, China

    Dialogue systems have the potential to change how people interact with machines but are highly dependent on the quality of the data used to train them. It is therefore important to develop good dialogue annotation tools which can improve the speed and quality of dialogue data annotation. With this in mind, we introduce LIDA, an annotation tool designed specifically for conversation data. As far as we know, LIDA is the first dialogue annotation system that handles the entire dialogue annotation pipeline from raw text, as may be the output of transcription services, to structured conversation data. Furthermore, it supports the integration of arbitrary machine learning models as annotation recommenders and also has a dedicated interface to resolve inter-annotator disagreements, such as after crowdsourcing annotations for a dataset. LIDA is fully open source, documented and publicly available.

    (Links to the paper and code will follow soon.)
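    A hypothetical sketch of the kind of transformation such an annotation pipeline performs, from raw transcript lines to structured turns (the field names here are invented for illustration, not LIDA's actual schema):

```python
# Hypothetical sketch: turn raw transcribed text into structured
# conversation data. Field names are invented, not LIDA's schema.

def structure_dialogue(raw_lines: list[str]) -> list[dict]:
    """Split raw transcript lines into structured, annotatable turns."""
    turns = []
    for i, line in enumerate(raw_lines):
        turns.append({
            "turn_id": i,
            "speaker": "user" if i % 2 == 0 else "system",
            "text": line.strip(),
            "labels": [],  # filled in by annotators or model recommenders
        })
    return turns

dialogue = structure_dialogue([
    "book me a table for two",
    "Sure, for what time?",
])
```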

    October 2018

    Evolutionary Data Measures: Understanding the Difficulty of Text Classification Tasks

    CoNLL 2018 - Brussels, Belgium

    Classification tasks are usually analysed and improved through new model architectures or hyperparameter optimisation but the underlying properties of datasets are discovered on an ad-hoc basis as errors occur. However, understanding the properties of the data is crucial in perfecting models. In this paper we analyse exactly which characteristics of a dataset best determine how difficult that dataset is for the task of text classification. We then propose an intuitive measure of difficulty for text classification datasets which is simple and fast to calculate. We show that this measure generalises to unseen data by comparing it to state-of-the-art datasets and results.

    Read Publication · Open Source Data & Code
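    As an illustrative example of a simple dataset-level statistic in this spirit (only one possible ingredient of such a measure, not the paper's actual metric), the Shannon entropy of the class distribution captures how balanced a classification dataset is:

```python
import math

# Illustrative dataset-level statistic: Shannon entropy (in bits)
# of the label distribution. A perfectly balanced two-class dataset
# scores 1.0; a skewed one scores less.

def class_entropy(labels: list[str]) -> float:
    """Shannon entropy (bits) of the label distribution."""
    total = len(labels)
    counts: dict[str, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

balanced = class_entropy(["pos", "neg", "pos", "neg"])  # → 1.0
skewed = class_entropy(["pos", "pos", "pos", "neg"])    # ≈ 0.81
```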