Classification models (and its results) trained on certain CSV resources of data.gouv.fr

Provided by etalab

Dataset information

Country of origin:
Updated: 2020.06.24 18:24
Created: 2020.06.24
Available languages: French
Keywords: classification, datascience, machine-learning
Quality scoring: 155

Dataset description

## Context

Data.gouv.fr (DGF) contains thousands of CSV resources. Determining by hand which of them are useful for Machine Learning (ML) explanations or tutorials built on open data would be a monumental task. Moreover, using open data for this purpose is a good opportunity to familiarise users with the open data approach while promoting the reuse of this data.

## Methodology

To speed up the selection of datasets relevant to ML, this dataset presents, for each of the 5,479 analysed CSV files catalogued on data.gouv.fr, the list of models trained on each categorical variable detected in that CSV. For now, we focus only on supervised classification models.

In short, the analysis detects the categorical columns of each dataset, then tests several classification models, using each of these columns in turn as the target variable. Finally, we save the details of each tested model together with its performance results.

```
For each CSV:
  1. Determine the categorical columns.
  2. For each categorical variable (categorical column):
     a. Run a set of "baseline" learning models (GaussianNaiveBayes, LogisticRegression, DecisionTrees, ...).
     b. Retrieve their validation performance for the metrics: accuracy, recall_macro, precision_macro, f1_macro, roc_auc.
  3. Save this information in a CSV.
```

This methodology is entirely based on the library [dabl: The data analysis baseline library](https://dabl.github.io/dev/).

## Output

The CSVs of this dataset are organised by dataset producer. Each CSV file name follows the format `id-dataset-id-resource.csv`.
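The per-CSV procedure from the Methodology section can be sketched with the Python standard library. This is a simplified, hypothetical illustration, not the project's code: the real analysis relies on dabl and scikit-learn models (GaussianNaiveBayes, LogisticRegression, decision trees, ...), whereas here a majority-class predictor stands in for them, the categorical-column heuristic (`max_classes`, `max_unique_ratio`) is an assumption, and a fixed holdout split replaces the validation used in the original analysis. The sketch computes accuracy and the macro-averaged precision, recall and F1; `roc_auc` is left out because a constant predictor produces no ranking scores.

```python
import csv
import io
from collections import Counter


def categorical_columns(rows, max_classes=10, max_unique_ratio=0.5):
    """Hypothetical heuristic: a column is categorical when it has between 2 and
    max_classes distinct values and those values repeat (at most
    max_unique_ratio * number of rows distinct values). dabl's own type
    detection is more elaborate; this is only a stand-in."""
    result = []
    for col in rows[0]:
        n_unique = len({row[col] for row in rows})
        if 2 <= n_unique <= max_classes and n_unique <= max_unique_ratio * len(rows):
            result.append(col)
    return result


def macro_scores(y_true, y_pred):
    """Accuracy plus precision, recall and F1 averaged over classes (macro)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(t == p for t, p in pairs) / len(pairs),
        "precision_macro": sum(precisions) / n,
        "recall_macro": sum(recalls) / n,
        "f1_macro": sum(f1s) / n,
    }


def evaluate_csv(text):
    """Per-CSV loop: detect categorical columns, then score one baseline per target."""
    rows = list(csv.DictReader(io.StringIO(text)))
    # Deterministic holdout: every third row is kept for validation.
    test = rows[::3]
    train = [row for i, row in enumerate(rows) if i % 3]
    results = []
    for target in categorical_columns(rows):
        # Stand-in baseline (not one of dabl's models): always predict the
        # most frequent class of the training part.
        majority = Counter(row[target] for row in train).most_common(1)[0][0]
        y_true = [row[target] for row in test]
        y_pred = [majority] * len(y_true)
        results.append({"target_col": target, "Algorithm": "MajorityClass",
                        **macro_scores(y_true, y_pred)})
    return results


# Toy rows invented for illustration only.
SAMPLE = """city,size,kind
Paris,big,A
Lyon,big,A
Nice,small,B
Lille,big,A
Nantes,small,B
Brest,small,A
"""

if __name__ == "__main__":
    for result in evaluate_csv(SAMPLE):
        print(result)
```

With the toy `SAMPLE`, `city` is skipped because every value is unique, while `size` and `kind` are each used once as the target column.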
Each generated CSV can contain the following columns:

- `csv_id`: ID of the DGF dataset followed by the resource ID, separated by `--`;
- `task`: ML task (only classification for now);
- `Algorithm`: name of the tested algorithm and its initial configuration;
- `target_col`: name of the categorical column used as the target;
- `nb_features`: number of features used in the model;
- `features_names`: names of the features used;
- `classes`: names of the predicted classes;
- `nb_classes`: number of predicted classes;
- `nb_lines`: number of rows in the original dataset;
- `nb_samples`: number of rows in the tested sample;
- `date`: date of the analysis;
- `accuracy`, `recall_macro`, `precision_macro`, `f1_macro`, `average_precision`, `roc_auc`: score obtained for each performance metric;
- `avg_scores`: average of the calculated scores.

## Code

The code used to produce this dataset is available [on GitHub](https://github.com/psorianom/mlearnable-datasets-detective/tree/master).

## TODO

1. Launch the same analysis for regression (continuous targets).
2. Standardise the columns of all generated CSVs (the same header for all CSVs).
3. Add a variable showing the correlation between columns.
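To explore one of the generated CSVs described in the Output section, a standard-library sketch like the following splits `csv_id` on the documented `--` separator and keeps, for each `target_col`, the row with the highest `avg_scores`. The sample rows, algorithm strings and helper names (`split_csv_id`, `best_model_per_target`) are hypothetical. Because the column set can still vary between files, the sketch only relies on `csv_id`, `target_col`, `Algorithm` and `avg_scores`, and skips rows without a score; splitting at the first `--` assumes neither ID itself contains `--`.

```python
import csv
import io

# Hypothetical toy rows using a subset of the documented columns; the IDs,
# algorithm strings and scores are placeholders, not published results.
SAMPLE = """csv_id,task,Algorithm,target_col,avg_scores
dataset-a--resource-1,classification,GaussianNB(),genre,0.61
dataset-a--resource-1,classification,DecisionTreeClassifier(max_depth=1),genre,0.74
dataset-a--resource-1,classification,LogisticRegression(),region,0.52
dataset-a--resource-1,classification,GaussianNB(),region,
"""


def split_csv_id(csv_id):
    """Split csv_id into (dataset_id, resource_id) at the first '--'."""
    dataset_id, resource_id = csv_id.split("--", 1)
    return dataset_id, resource_id


def best_model_per_target(text):
    """For each target column, keep the row with the highest avg_scores."""
    best = {}
    for row in csv.DictReader(io.StringIO(text)):
        # Headers are not yet standardised across files, so tolerate a
        # missing or empty avg_scores value.
        raw = row.get("avg_scores")
        if not raw:
            continue
        score = float(raw)
        target = row["target_col"]
        if target not in best or score > float(best[target]["avg_scores"]):
            best[target] = row
    return best


if __name__ == "__main__":
    for target, row in best_model_per_target(SAMPLE).items():
        dataset_id, resource_id = split_csv_id(row["csv_id"])
        print(dataset_id, resource_id, target, row["Algorithm"], row["avg_scores"])
```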

API Store provides access to European open data via a scalable and reliable REST API.
Copyright © 2024. Made with ♥ by Apitalks