2024-2025 Evaluation campaign - Group E

SD department - Data Science

Portfolio of team LaHDAK
Large-scale Heterogeneous DAta and Knowledge
Données et Connaissances Massives et Hétérogènes

Efficient Computation of General Modules for ALC Ontologies

Context

This work was conducted within the PSPC AIDA project [2019-2023], funded by Bpifrance. It was carried out as part of the PhD project of Hui Yang, co-supervised by Dr Yue Ma and Pr Nicole Bidoit from the LaHDAK team, in collaboration with Dr Patrick Koopmann from TUD Dresden University of Technology in Germany. It is an important contribution to "Axis 1: Knowledge refinement and automatic reasoning" of the team and provides a novel, efficient method for extracting general modules for ontologies formulated in the description logic ALC.

Contribution

In the context of knowledge acquisition, this work proposes a novel, efficient method for extracting general modules for ontologies formulated in the description logic ALC. A module for an ontology is an ontology, ideally substantially smaller, that preserves all entailments for a user-specified set of terms. As such, it has applications in ontology reuse and ontology analysis. Unlike classical modules, general modules may use axioms not explicitly present in the input ontology, which allows for additional conciseness. So far, general modules had only been investigated for lightweight description logics. We present the first work that considers the more expressive description logic ALC. The originality of the method lies in guaranteeing the conciseness of the extracted modules.

Impact

This work has been published at IJCAI 2023, one of the major conferences in Artificial Intelligence at the international level. This work has given rise to several invited talks at workshops such as the ones organised by the GDR RADIA (ex-IA). This work will be pursued in the setting of the ANR project EXPIDA (2023-2027).

Sequential Learning Algorithms for Contextual Model-Free Influence Maximization

Context

This work was carried out as part of the PhD project of Alexandra Iacob, co-supervised by Bogdan Cautis and Silviu Maniu from the LaHDAK team. It addresses the problem of information diffusion. The generic problem of Influence (i.e., spread) Maximization (IM) is one of the most studied problems in the graph mining domain due to its numerous applications (e.g., viral marketing, biological systems, electrical grids). This work is an important contribution to "Axis 2: Data Mining, Graphs and Optimization" of the team and provides original contributions to the influence maximization problem when the underlying probabilistic diffusion model is unknown, i.e. influence maximization in the dark.

Contribution

In the context of information diffusion when the underlying diffusion model is unknown, our approach exploits ongoing diffusion campaigns to learn the diffusion model parameters by relying on sequential learning. We developed an original algorithm which implements the principle of optimism in the face of uncertainty for episodic reinforcement learning with linear approximation. For each seed node in the diffusion graph, the learning agent estimates its remaining potential with a Good-Turing estimator, modified by an estimated Q-function. The algorithm has been evaluated empirically and has shown better performance than state-of-the-art methods on two real-world datasets and one synthetically generated dataset.
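To illustrate the notion of remaining potential, the following minimal sketch computes a plain Good-Turing estimate for a single seed node: the number of nodes activated in exactly one past trial, divided by the number of trials. It is only an illustration under simplifying assumptions. The function name and the toy data are hypothetical, and the modification by an estimated Q-function and the linear approximation used in the published algorithm are not reproduced here.

```python
from collections import Counter


def good_turing_remaining_potential(trials):
    """Plain Good-Turing estimate of a seed's remaining potential.

    `trials` is a list of sets; each set holds the nodes activated
    when the seed was selected in one past diffusion campaign.
    (Hypothetical data layout, chosen for illustration only.)
    """
    if not trials:
        raise ValueError("at least one trial is required")
    # Count in how many trials each node was activated.
    counts = Counter(node for activated in trials for node in set(activated))
    # Nodes seen exactly once suggest that unseen nodes remain reachable.
    seen_once = sum(1 for c in counts.values() if c == 1)
    return seen_once / len(trials)


# Toy example: three campaigns started from the same seed.
trials = [{"a", "b", "c"}, {"a", "d"}, {"a", "b"}]
estimate = good_turing_remaining_potential(trials)
print(estimate)
```

In this toy example, nodes "c" and "d" were each activated in exactly one of the three trials, so the estimate is 2/3. A seed whose campaigns keep reaching the same nodes would obtain an estimate close to zero.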

Impact

This work has been published at the A*-ranked ACM International Conference on Knowledge Discovery and Data Mining (KDD’2023), one of the major conferences in data mining at the international level. It has given rise to several invited talks and has been pursued within the international CNRS@CREATE DesCartes program, in collaboration with the National University of Singapore (2021-2026).

Artificial Intelligence for Digital Automation (AIDA)

Context

Artificial intelligence (AI) has become part of our daily lives and is having an impact on a wide range of activities. However, its potential impact on companies' operational systems is still far from being fully realized. Major issues remain unresolved today, in particular with regard to control and trust (including bias management), the suitability of technologies, the flexibility and methodology of implementation, and the skills required.

Injecting intelligence into the automation of business operations therefore remains a strategic imperative in order to reduce costs, improve performance and the use of resources, including human resources, and offer personalized, differentiating services in the face of ever-increasing competition, particularly from new, purely digital players with data and algorithms at the heart of their strategy.

The AIDA (AI for Digital Automation) project aims to develop a platform combining learning-based AI and symbolic AI, enabling businesses to improve their performance by integrating artificial intelligence into their day-to-day operations. It aims to develop a combined approach to 'injecting' AI into the heart of operational systems, with complete confidence. By combining AI and automation, it aims to enable companies to:

Contribution

The scientific work conducted in the AIDA project aimed at hybridizing AI techniques (i.e. combining machine learning and symbolic AI) by addressing several research questions that concern the dimensions of AI systems, i.e. the knowledge, the data and the decisions. The Paris Saclay University labs have contributed, in interaction with the industrial partners, to these five research questions:

Some of the results we are very proud of include:

Impact

The research work carried out in the laboratories of Université Paris Saclay has led to 9 doctoral theses and to important research results recognized by two patents and 35 publications in international conferences and journals, including very prestigious conferences in artificial intelligence such as AAAI, IJCAI, ISWC, NeurIPS and ICLR. The LaHDAK team of LISN is currently involved in discussions for future collaborations with IBM (as part of the DATAIA convergence institute) and other companies developing technologies related to the subjects addressed in the AIDA project.

AI Chair - Fraud Detection and Automated Trading

Context

The LUSIS chair [2020-2024] is a research contract between LISN, CentraleSupélec and the LUSIS company. LUSIS is mainly a software publisher, whose high-value product TANGO is a credit card transaction engine. LUSIS has top-tier world banks among its clients. Our partnership started with master-level projects for CentraleSupélec students. At the end of 2019, we brought our partnership to a new level by signing a 4-year research contract (chair) for 550k€ between LUSIS, CentraleSupélec and LISN. Fabrice Popineau from the LaHDAK team holds the chair, and the other researchers involved are Bich-Liên Doan from the A&O team and Arpad Rimmel from the GALAC team. The chair funds 3 or 4 master-level projects per year and 2 PhD students. The two lines of research are fraud detection in credit card payments and algorithmic trading.

Contribution

The work carried out as part of the LUSIS Chair focused on two themes: Fraud Detection and Automated Trading.

  1. Fraud detection: given the large amount of annual fraud (around 25 billion dollars in losses), every possible detection solution must be considered to limit its spread. This is why a great deal of research has been carried out on this issue over the last 20 years. As the publisher of a high-performance transactional platform for payment systems, LUSIS has a key interest in fraud countermeasures. The research work in the LUSIS chair has studied fraud detection both from the point of view of algorithm performance and under the constraint of realistic implementation on real data. This work led to several novel machine-learning algorithms based on anomaly detection that are able to handle unbalanced data (fewer than 0.5% of transactions being fraudulent) while avoiding false positives. Some of the developed algorithms allow online detection of fraudulent transactions.
  2. Automated trading: automated trading systems seek to place orders on the financial markets in such a way as to grow capital while limiting the inherent risk. The prices of financial products are time series that vary according to numerous parameters, not all of which are observable. Within machine learning (ML), time series are very specific data requiring the use of specific methods. Most often, the evaluation of ML-based methods relies solely on the accuracy of the predicted market direction (up or down). However, over a series of predictions, a high level of accuracy can still come with a large loss, since the amplitude of price moves is not taken into account: a highly accurate model can yield a negative gain. In the LUSIS chair, several contributions have been made on operational metrics beyond accuracy, on the use of backtesting to build better-performing models, and on reinforcement learning for building robust models.
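The point made above about accuracy versus gain can be checked with a small worked example. The sketch below is purely illustrative: the function names, the toy returns and the simplified long/short strategy (no transaction costs, simple returns summed) are assumptions made for this example and are not taken from the chair's work.

```python
def directional_accuracy(predicted_up, returns):
    """Fraction of periods where the predicted direction was right."""
    hits = sum((r > 0) == p for p, r in zip(predicted_up, returns))
    return hits / len(returns)


def strategy_gain(predicted_up, returns):
    """Sum of simple returns: long when 'up' is predicted, short otherwise."""
    return sum(r if p else -r for p, r in zip(predicted_up, returns))


# Toy series: four small rises followed by one large drop.
returns = [0.01, 0.01, 0.01, 0.01, -0.10]
# Hypothetical model that always predicts 'up'.
predicted_up = [True, True, True, True, True]

accuracy = directional_accuracy(predicted_up, returns)
gain = strategy_gain(predicted_up, returns)
print(f"accuracy = {accuracy:.0%}, gain = {gain:+.2%}")
```

Here the model is right four times out of five (80% accuracy), yet the single wrong call costs more than the four correct ones earn, so the overall gain is about -6%. This is why metrics that account for amplitude are needed alongside accuracy.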

All these contributions have led to several publications in international conferences and workshops in artificial intelligence and data mining such as the "International Joint Conference on Neural Networks" and "International Joint Conference on Knowledge Discovery, Knowledge".

Impact

Thanks to the research results obtained in the LUSIS chair, the chair has just been renewed for another 4 years with a budget of 825k€ and an additional line of research related to microbiota and health. It is planned to hire a full-time researcher to work on these lines of research and help with advising the students.

DesCartes - Intelligent Modelling for Decision-making in Critical Urban Systems

Context

The DesCartes project is one of the projects launched in 2021 under the Campus for Research Excellence and Technological Enterprise (CREATE) program, run by the National Research Foundation in Singapore. CREATE is an international collaboratory housing research centres set up by top universities. At CREATE, researchers from diverse disciplines and backgrounds work closely together to perform cutting-edge research in strategic areas of interest, for translation into practical applications leading to positive economic and societal outcomes for Singapore. The interdisciplinary research centres at CREATE focus on four thematic areas of research, namely human systems, energy systems, environmental systems and urban systems.

In this setting, DesCartes is a five-year research program [2021-2026] funded by CNRS@CREATE Singapore with a budget of 25 M€. Its main objective is to build AI-based decision-making techniques for critical urban systems in the context of smart cities. This project provided funding for 3 Ph.D. theses and 3 post-docs co-supervised with colleagues from NUS (e.g. Vincent Y. F. Tan). Yuting FENG, a former Ph.D. student of LaHDAK, has been recruited as a research fellow at NUS.

Contribution

The LaHDAK team leads WP2, which focuses on hybrid artificial intelligence (HAI). Its objective is to contribute to intelligent control by developing efficient and effective techniques for decision making under uncertainty, which are paramount in many application scenarios, and in particular in urban systems. Further, WP2 considers smart data in scenarios with limited or constrained data and resources (e.g., with throttled or streaming data, or in the presence of selection bias), based on complex (e.g., graphs) or uncertain/incomplete data, possibly in online and adaptive processes.

The work conducted in the DesCartes program has already led to several publications in very prestigious international conferences such as the Web Conference 2024.

Impact

This project has played an extremely important role in raising the international visibility of the team and in fostering numerous collaborations with both academic and industrial partners.