Automatic Data Explorer

Open data API in a single place

Provided by Ministry of Administration and Digitization of Poland

Get early access to the Automatic Data Explorer API!

Let us know and we will figure it out for you.

Dataset information

Country of origin
Updated
2023.03.13 01:11
Created
2023.03.10
Available languages
Polish
Keywords
Deep Feature Synthesis, data cleaning, meta-learning, machine learning, deep learning, word embedding, predictive algorithm, Featuretools, data type detection, classification algorithms, text conversion, Sato model, statistical analysis of data attributes, word occurrence matrix, learning vector encoding
Quality scoring
305

Dataset description

<p>Industrial research:<br /> Task No. 1. Research on data refinement and feature engineering algorithms</p>
<p>Stage tasks:<br /> Task 1: Development of algorithms for the statistical analysis of attribute values for data purification.<br /> The aim of this task was to develop an algorithm that identifies the type of each attribute (scalar, discrete) and, depending on the value format (text, number, date, text label, etc.), deduces which values can be considered correct and which are incorrect and introduce noise into the dataset, which in turn degrades the quality of the ML model.</p>
<p>Task 2: Development of algorithms for the statistical analysis of data attributes in terms of the optimal encoding of learning vectors.<br /> The aim of this task was to develop an algorithm that proposes the optimal encoding of the learning vector to be used in the ML process and performs the appropriate conversion for each attribute type (scalar, discrete) and value format (text, number, date, text label, etc.), for example converting text into a word occurrence matrix. It was necessary to anticipate the several conversion scenarios most often used in practice, drawn from the heuristic knowledge of experts.</p>
<p>Task 3: Development of a prototype of an automatic data cleaning and encoding environment, and testing of the solution on samples of production data.</p>
<p>Industrial research: Task No. 2. Research on the meta-learning algorithm</p>
<p>Task 1: Review of existing meta-learning concepts and selection of algorithms for further development.<br /> The aim of this task was to analyze the state of knowledge on meta-learning in terms of the possibility of reusing existing research results in the project; the task was carried out as subcontracted work by a scientific unit.</p>
<p>Task 2: Review and development of the most commonly used ML algorithms in terms of their suitability for hyperparameter meta-learning and the practical usefulness of the resulting models.<br /> The aim of this task was to develop a pool of base algorithms to serve as production algorithms, i.e. algorithms performing the actual predictions, whose hyperparameters are tuned by meta-learning. It was therefore necessary to develop a model of interaction between the main algorithm and the individual production algorithms. This task was also carried out as subcontracted work by a scientific unit.</p>
<p>Task 3: Development of a meta-learning algorithm for selected types of ML models.<br /> The aim of this task was to develop the main algorithm implementing the optimization of the hyperparameters of the production models. Since the hyperparameters have a different structure for each production model, the appropriate solution was in fact to use a separate optimization algorithm for each model.</p>
<p>Task 4: Development of a prototype of the algorithm and testing of the obtained models on production data.</p>
<p>Experimental development work: Task No. 3. Research on a prototype architecture of the platform's implementation environment</p>
<p>Task 1: Development of the architecture of the data acquisition and storage module.<br /> The aim of this task was to develop an architecture for a scalable ETL (Extract, Transform, Load) solution enabling efficient implementation of the source data acquisition (data ingest) process. Appropriate parsing algorithms and standardization of the encoding of data of various types (e.g. dates, numbers) were considered with a view to efficient further processing.</p>
<p>Task 2: Development of a module for configuring and executing data processing pipelines in a distributed architecture.<br /> Due to the high complexity of the implemented algorithms, it was necessary to develop an architecture that allows the individual data processing steps to run as a pipeline across different machines, with the option of using a distributed architecture in a cloud and/or virtual environment. Existing concepts of distributed processing, such as MapReduce, were considered here.</p>
<p>Task 3: Development of a user interface enabling intuitive control of data processing.</p>
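The attribute analysis described in Task 1 of the first research task (detecting an attribute's type and flagging values that introduce noise) can be illustrated with a minimal sketch. The majority-vote type inference and the specific date formats below are illustrative assumptions, not the project's actual algorithm:

```python
from datetime import datetime

def infer_type(value: str) -> str:
    """Classify a raw string value as a number, date, or text label."""
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    for fmt in ("%Y-%m-%d", "%Y.%m.%d", "%d.%m.%Y"):
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            pass
    return "text"

def analyse_attribute(values: list[str]) -> dict:
    """Infer the dominant type of an attribute and flag values that do
    not conform to it -- candidates for cleaning before ML training."""
    types = [infer_type(v) for v in values]
    dominant = max(set(types), key=types.count)
    noisy = [v for v, t in zip(values, types) if t != dominant]
    return {"type": dominant, "noisy_values": noisy}

col = ["12.5", "7", "n/a", "19.2", "3.1"]
print(analyse_attribute(col))
# {'type': 'number', 'noisy_values': ['n/a']}
```

A real implementation would also distinguish scalar from discrete attributes and use distributional statistics rather than a simple majority vote.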
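One of the encoding scenarios named in Task 2, converting text to a word occurrence matrix, can be sketched in pure Python as follows; a production system would more likely use a library vectorizer, and the whitespace tokenization here is a simplifying assumption:

```python
def word_occurrence_matrix(documents: list[str]):
    """Convert free-text values into a word occurrence matrix:
    one row per document, one column per vocabulary word."""
    vocab = sorted({w for doc in documents for w in doc.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = []
    for doc in documents:
        row = [0] * len(vocab)
        for w in doc.lower().split():
            row[index[w]] += 1
        matrix.append(row)
    return vocab, matrix

vocab, m = word_occurrence_matrix(["open data", "open api data"])
print(vocab)  # ['api', 'data', 'open']
print(m)      # [[0, 1, 1], [1, 1, 1]]
```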
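The second research task describes a main algorithm that optimizes the hyperparameters of each production model, with a separate search per model. A minimal sketch of that idea, using random search over a per-model search space (the `toy_score` objective is a hypothetical stand-in for training a model and returning a validation score):

```python
import random

def random_search(train_and_score, space: dict, n_trials: int = 50, seed: int = 0):
    """Generic hyperparameter optimizer: samples configurations from a
    per-model search space and keeps the best-scoring one. Each
    production model supplies its own `space`, reflecting the point that
    hyperparameter structure differs between models."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(choices) for name, choices in space.items()}
        score = train_and_score(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Hypothetical objective: peaks at depth=6, lr=0.1; a real one would
# train a production model and return its validation accuracy.
def toy_score(cfg):
    return -abs(cfg["depth"] - 6) - 0.1 * abs(cfg["lr"] - 0.1)

space = {"depth": [2, 4, 6, 8, 10], "lr": [0.01, 0.1, 0.3]}
best, score = random_search(toy_score, space, n_trials=100)
print(best, score)
```

Swapping in a smarter search (e.g. Bayesian optimization) only requires replacing the sampling step; the per-model interface stays the same.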
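The pipeline module of the third task, configuring and executing successive data processing steps, can be sketched as a simple composition of step functions; in the distributed setting each step could run on a different machine, while here they run sequentially for illustration:

```python
from functools import reduce

def make_pipeline(*steps):
    """Compose data processing steps into a single callable pipeline;
    each step takes the output of the previous one."""
    def run(data):
        return reduce(lambda d, step: step(d), steps, data)
    return run

pipeline = make_pipeline(
    lambda rows: [r.strip() for r in rows],   # parse / normalize
    lambda rows: [r for r in rows if r],      # clean empty records
    lambda rows: [r.upper() for r in rows],   # encode
)
print(pipeline([" a ", "", "b"]))  # ['A', 'B']
```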
Build on reliable and scalable technology
Revolgy · Amazon Web Services · Google Cloud
FAQ

Frequently Asked Questions

Some basic information about API Store®.

Operation and development of the APIs are currently fully funded by Apitalks, and usage is free.
Yes, you can.
All important information, such as the time of the last update, the license, and other details, is included in the response of each API call.
In case of a major update that is not compatible with the previous version of the API, we keep both versions for 30 days so you have enough time to migrate to the new version. We will inform you about the changes in advance by e-mail.

Didn't find the API you need?

Let us know and we will figure it out for you.

API Store provides access to European Open Data via a scalable and reliable REST API.
Copyright © 2024. Made with ♥ by Apitalks