SOFTWARE MEETS BUSINESS:
The Conference for Software Architecture
31 January - 04 February 2022

Conference Program

Please note:
This page lists only the English-language sessions of OOP 2022 Digital. You can find all conference sessions, including the German-language ones, here.

The times given in the conference program of OOP 2022 Digital correspond to Central European Time (CET).

Applying AI Methods to Help Users in Fixing Static Analysis Violations

Adopting static analysis for C++ and Java requires that findings and errors can be prioritised efficiently. Our work shows that machine learning (ML) can support the presentation of static analysis results to end users. The ML engine learns from the codebase itself and also observes which violations the user fixes and which the user ignores. It uses this information to suggest the next best violations to fix, based on the probability that a violation is harmful or is most likely noise.

Target Audience: Developer Managers, R&D Managers, Software Architects, Software Engineers
Prerequisites: English, Software development, Coding experience, C++, Java, C#.
Level: Expert

Extended Abstract
Static code analysis is often understood as a mandatory step for checking source code compliance with government and industry regulations and with company-wide guidelines and practices. It can, however, play a more fundamental role: estimating the overall quality of the code, understanding the amount of technical debt, creating a strategy to reduce it, and helping to decide how to speed up development by creating a more maintainable, understandable, sociable codebase.

However, by its nature, static code analysis tends to produce a large amount of noise and false alerts, which can distract the team from the actual bugs in the code and prevent them from working thoroughly with the findings. One reason is the level of soundness of the static analysis tool. If we want to be sure that the analysis finds all errors in the code, the tool has to report all possible candidates. The more sound the tool is configured to be, the more possible errors it reports, which also increases the number of false positives.

To improve the user experience of working with static analysis technology, we have developed a machine learning (ML) based approach to presenting static analysis results to users. The ML engine can learn from the codebase itself, from a user's preferences, and from the interaction within the team. At the code level, our engine learns from the syntactic and semantic structure of the analyzed code to understand which violations are more likely to cause harm, which violations are more likely to be noise, and which underlying problems can be fixed to drastically reduce the number of reported violations. At the user level, the ML engine observes which violations the user fixes and which violations the user ignores. Based on these observations, the ML engine builds a model and uses it to suggest the next best violations to fix.
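As a rough illustration of the user-level idea described above (observe fixes and ignores, then suggest what to fix next), here is a minimal Python sketch. It is not the engine presented in the talk: the class `ViolationRanker`, the rule names, and the per-rule smoothed fix rate are all assumptions chosen for illustration, and a real system would presumably also use features of the code itself.

```python
from collections import defaultdict


class ViolationRanker:
    """Toy ranker (hypothetical): estimates, per rule, how likely a violation is to be fixed."""

    def __init__(self, prior_fixed=1.0, prior_ignored=1.0):
        # Pseudo-counts so that a rule with no history starts at probability 0.5.
        self.prior_fixed = prior_fixed
        self.prior_ignored = prior_ignored
        self.fixed = defaultdict(int)
        self.ignored = defaultdict(int)

    def observe(self, rule_id, was_fixed):
        """Record one user decision for a violation of the given rule."""
        if was_fixed:
            self.fixed[rule_id] += 1
        else:
            self.ignored[rule_id] += 1

    def fix_probability(self, rule_id):
        """Smoothed fraction of this rule's violations that the user fixed."""
        f = self.fixed[rule_id] + self.prior_fixed
        i = self.ignored[rule_id] + self.prior_ignored
        return f / (f + i)

    def rank(self, violations):
        """Sort open violations so those most likely to be fixed come first."""
        return sorted(
            violations,
            key=lambda v: self.fix_probability(v["rule"]),
            reverse=True,
        )


# Hypothetical interaction history: one rule is always fixed, one always ignored.
ranker = ViolationRanker()
for _ in range(8):
    ranker.observe("NULL_DEREF", was_fixed=True)
for _ in range(9):
    ranker.observe("NAMING_STYLE", was_fixed=False)

open_violations = [
    {"rule": "NAMING_STYLE", "file": "a.cpp", "line": 10},
    {"rule": "NULL_DEREF", "file": "b.cpp", "line": 42},
    {"rule": "UNSEEN_RULE", "file": "c.cpp", "line": 7},
]
ranked = ranker.rank(open_violations)
```

With this history, the `NULL_DEREF` violation is suggested first, the rule without any history lands in the middle, and the consistently ignored `NAMING_STYLE` violation comes last.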

Igor Kirilenko

Igor Kirilenko is VP of Development at Parasoft. He joined Parasoft in 2013 and is currently responsible for the technical strategy, architecture, and development of all products delivered by the company. For the past several years, he has also led Parasoft's R&D team of highly trained engineers, who research AI and machine learning technologies and create new approaches to improve the accuracy of static analysis findings.

Igor Kirilenko

Track: Artificial Intelligence Now!

09:00 - 10:45

Talk: Wed 7.1-1

Topics: Artificial Intelligence
Programming Languages
C++

Keeping a Huge Product Database up to Date With State of the Art Machine Learning

Maintaining a database containing millions of products can be very challenging, especially when the information you require of these products is subject to changes over time.

We show how we used state-of-the-art deep learning methods (such as Transformers and BERT) in connection with smart text matching to extract relevant information from free-form text.

We also explain how we leveraged the existing database to create an automatically labelled training dataset.

Our model enables us to continuously and automatically update idealo's database.

Target Audience: Decision Makers, Technical Project Leaders, Developers
Prerequisites: Basic knowledge of machine learning methodology
Level: Advanced

Extended Abstract

To maintain idealo's product database, product information, in the form of values for predefined product attributes, needs to be extracted from free-form product descriptions.

Before we used a machine learning based solution, this process required a lot of manual work to define the rules that extract this information. Keeping these rules consistent across the whole database and across different types of products also takes considerable effort, especially since both the source of this information (the product descriptions) and the required information (the product properties) change over time.

In this talk we present a machine learning solution, based on fine-tuned state-of-the-art models such as BERT, which is able to extract product information automatically from product descriptions with production-ready performance.

Our solution contains two different models, each following one of the well-known problem settings in Natural Language Processing (NLP): Semantic Segmentation of text (also known as Token Classification) and Question Answering. We will present both models in detail, discuss their advantages and disadvantages for the task at hand, and explain how we measured their performance (metrics).
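To make the Token Classification setting more concrete, the following Python sketch shows how per-token tags could be turned into attribute values. It involves no model at all: the tags are written by hand, and the BIO tagging scheme, the function name `decode_bio`, and the attribute names `STORAGE` and `COLOR` are assumptions for illustration, not details taken from the talk.

```python
def decode_bio(tokens, tags):
    """Group tokens tagged B-X / I-X into (attribute, value) pairs.

    A tag "B-X" starts a value for attribute X, "I-X" continues it,
    and "O" marks tokens outside any attribute value.
    """
    spans = []
    current_attr, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new value starts; close the previous one if there is any.
            if current_attr is not None:
                spans.append((current_attr, " ".join(current_tokens)))
            current_attr, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_attr == tag[2:]:
            current_tokens.append(token)
        else:
            # "O" or an inconsistent "I-" tag ends the current value.
            if current_attr is not None:
                spans.append((current_attr, " ".join(current_tokens)))
            current_attr, current_tokens = None, []
    if current_attr is not None:
        spans.append((current_attr, " ".join(current_tokens)))
    return spans


# Hand-written tags standing in for the output of a token classifier.
tokens = ["Smartphone", "with", "128", "GB", "storage", "in", "midnight", "blue"]
tags = ["O", "O", "B-STORAGE", "I-STORAGE", "O", "O", "B-COLOR", "I-COLOR"]
extracted = decode_bio(tokens, tags)
```

In the Question Answering setting, the same description would instead be paired with one question per attribute (for example, a question about the storage capacity), and the model would return the answer span directly.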

We will also emphasize how important it is, before curating your dataset, to identify the aspects of your data that ensure the developed model can actually fulfill your business needs.

This highlights another benefit of implementing a Machine Learning model for a huge database: You will get sanity checks of your existing data “for free”, as consistent data is a prerequisite for a successful Machine Learning project.

One problem that is very common in large organisations is that little or no training data, in the form of labels for specific text sections, is available. We show how we mitigated this problem by leveraging the existing database to generate a large artificial training dataset. This allowed us to reach sufficient performance with only a few thousand manually labelled examples for training and testing.
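One simple way to picture such automatic labelling is to search each product description for the attribute values already stored in the database and tag the matching tokens. The Python sketch below assumes exact, case-insensitive matching on whitespace tokens; the function `auto_label` and the example attributes are hypothetical, and the text matching used in the actual project is likely more sophisticated.

```python
def auto_label(description, known_attributes):
    """Create BIO tags for a description from attribute values already in the database.

    known_attributes maps an attribute name to its stored value,
    e.g. {"STORAGE": "128 GB"}. Only the first occurrence of each value is tagged.
    """
    tokens = description.split()
    lowered = [t.lower() for t in tokens]
    tags = ["O"] * len(tokens)
    for attr, value in known_attributes.items():
        value_tokens = value.lower().split()
        n = len(value_tokens)
        if n == 0:
            continue
        for start in range(len(tokens) - n + 1):
            window = lowered[start:start + n]
            already_tagged = any(t != "O" for t in tags[start:start + n])
            if window == value_tokens and not already_tagged:
                tags[start] = f"B-{attr}"
                for k in range(start + 1, start + n):
                    tags[k] = f"I-{attr}"
                break  # label only the first occurrence
    return tokens, tags


# Hypothetical database record and its free-form description.
record = {"STORAGE": "128 GB", "COLOR": "Midnight Blue"}
tokens, tags = auto_label("Smartphone with 128 GB storage in midnight blue", record)
```

Labels produced this way are noisy, since a value may be missing from the text or phrased differently, which is one reason a smaller set of manually labelled examples remains useful for testing.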

Jan Anderssen

Jan Anderssen (PhD, Linguistics) is Domain Lead Inventory Business at idealo internet GmbH. He has more than 10 years of experience in various product development and leadership roles in e-commerce.

Jona Welsch

Jona Welsch is Machine Learning Project Lead at dida, where he is responsible for the development of machine learning solutions in the areas of Natural Language Processing and Computer Vision.

Jan Anderssen, Jona Welsch

Track: Artificial Intelligence Now!

09:00 - 10:45

Talk: Wed 7.1-2

Topics: Artificial Intelligence
Programming Languages
C++
