The material consists of TF-IDF data matrices for machine learning, compiled from the “Ask a librarian” question/answer corpus. The corpus is in Finnish. The data matrices are particularly suitable for so-called Extreme Multi-label Text Classification (XMTC) machine learning models, i.e. models that classify text into a very large number of classes.
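As a rough illustration of what a TF-IDF data matrix is, the following minimal sketch builds one from a few whitespace-tokenized strings using only the Python standard library. The weighting formula (relative term frequency times log(N / document frequency)), the tokenization and the function name `tfidf_matrix` are assumptions chosen for illustration; the description does not state which TF-IDF variant or tooling was used to produce the published matrices.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocabulary, rows); rows[i][j] is the TF-IDF weight of
    vocabulary[j] in docs[i].

    Assumed weighting (illustrative only):
    tf = term count / document length, idf = log(N / document frequency).
    """
    # Naive lowercase + whitespace tokenization (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    vocabulary = sorted({term for doc in tokenized for term in doc})
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in tokenized for term in set(doc))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}

    rows = []
    for doc in tokenized:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        rows.append([counts[term] / length * idf[term] for term in vocabulary])
    return vocabulary, rows


# Hypothetical toy documents, not taken from the corpus.
vocabulary, rows = tfidf_matrix(["kirja on uusi", "kirja on vanha"])
```

Terms that occur in every document (here "kirja" and "on") receive weight 0 under this formula, while terms unique to one document receive a positive weight.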
The original corpus contains 3150 short documents from the Finnish-language Ask a librarian service. Each document is a question from the public that a librarian has answered.
The corpus was selected from a collection of more than 25,000 question/answer pairs, with the restriction that each document must have at least 4 subjects.
The corpus is divided into the following directories:
all: includes all documents (N = 3150)
train: includes questions from before 2016 (N = 2625) for training
Maui-train: random sample set (N = 200) from the train directory for training the Maui model
validate: includes questions from 2016 (N = 213) for validation (e.g. selection of hyperparameters for classifier)
test: includes questions from 2017 (N = 312) for final evaluation
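The year-based division above can be sketched as follows. This is a minimal illustration, assuming each question carries a known year; the function name `assign_split` and the example years are hypothetical and not part of the corpus tooling. The Maui-train directory is a random sample drawn from train and is therefore not modeled here.

```python
from collections import Counter


def assign_split(year):
    """Map a question's year to the directory it would belong to.

    Follows the division described above: before 2016 -> train,
    2016 -> validate, 2017 -> test. Later years are not covered
    by the description, so they are rejected.
    """
    if year < 2016:
        return "train"
    if year == 2016:
        return "validate"
    if year == 2017:
        return "test"
    raise ValueError(f"year {year} is outside the described corpus")


# Hypothetical example years, not taken from the corpus.
example_years = [2012, 2015, 2016, 2017, 2017]
split_sizes = Counter(assign_split(year) for year in example_years)
```

Splitting by time rather than at random means the validation and test questions are always newer than the training questions. The directory sizes given above add up to the full corpus: 2625 + 213 + 312 = 3150.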
The original corpus is available at https://github.com/NatLibFi/Annif-corpora/tree/master/fulltext/kirjastonhoitaja
The actual Ask a librarian service can be found at https://www.kirjastot.fi/kysy. The Kirjastot.fi editorial team is responsible for the development and maintenance of the service.