AI Sauna/Resources

Add a new resource

Edit resources

– Click here for instructions

Add a new resource:

Click on Add a new resource.
Fill in the basic information and click on Publish changes... to save the page.
Return to this page and edit the entry to add more information.

Edit an existing resource:

Click on Edit resources above.
Select the resource you wish to edit. Click Edit
In the popup window, make the changes you like. You can add more input options from the left panel by ticking one of the blue checkboxes. Confirm by clicking on Apply changes.
Finally, click on Publish changes... to save the page.

Finna API

API The API provides a way to perform searches to the material provided by the organizations (Finnish libraries, archives and museums) participating in Finna.fi.

Osma Suominen

Resource links - Swagger - Python API client library

National Library of Finland on Hugging Face

Collaboration platform A collection of datasets and AI models (Annif and fine-tuned LLMs) published by the National Library of Finland.

Osma Suominen

Resource links Organization

A Generated Family of Man

Publication An exploratory publication made by the Flickr Foundation in 2023 to investigate and reveal the state of the art of machine-generated captions and imagery.

George Oates

Resource links →A Generated Family of Man

Avoin data – tarjolla Ylen sisältöjä ja metatietoa

Usein kysyttyä

Micke Hindsberg

Resource links Avoin data – tarjolla Ylen sisältöjä ja metatietoa

Word vectors based on Yle's article corpus

Data

Micke Hindsberg

Resource links Word vectors based on Yle's article corpus

LUMI Supercomputer

Computing Access to Jupyter Notebook on LUMI supercomputer.

Mats Sjöberg

Resource links LUMI access

Future Audiences / List of experiment ideas

Idea list This page outlines experiments for using new technology, like generative AI tools, in the Wikimedia movement to innovate knowledge sharing. These experiments are small-scale and can be quickly executed in hackathons or by volunteer developers. Generated by the Future Audiences team, they aim to inspire Wikimedia community members and others to contribute, discuss, and try out these ideas.

Johan Jönsson

Resource links List of experiment ideas

National Archives of Finland on Hugging Face

Collaboration platform

Mikko Lipsanen

Resource links Hugging Face Organization

FinBERT-NER - A Finnish named entity recognition model trained to recognize named entities from OCR'd archival data.

Senate Department of Justice records from Finland - A dataset containing the HTR'd text content of a collection of documents produced by the Finnish Senate's Department of Justice between 1900 and 1918.

Early 20th century court records from Finland - A dataset containing the HTR'd text content of a sample of early 20th century court records from Finland.

Harmonized Finnish National Bibliography - Fennica

Data Fennica encompasses metadata for over a million documents, including books, newspapers, maps, etc., with records spanning from 1488 to the present. This dataset includes bibliographic details such as author information, titles, publication years, publication locations, publishers, content types, bibliographic levels, and call numbers (signum) indicating the books’ locations within a library.

Julia Matveeva

Resource links fennica [1]

Photographs from Helsinki City Museum

Data A collection of ca 6000 old photographs (until 1917) from the collections of the Helsinki City Museum along with metadata such as captions, keywords, location and photographer. Intended for e.g. generating descriptions or colorizing.

Osma Suominen

Resource links dataset on HuggingFace Hub

Photographs from Journalistic Picture Archive JOKA

Data A collection of ca 5000 old photographs (until 1940) from the collections of the Journalistic Picture Archive JOKA along with metadata such as captions, keywords, location and photographer. Intended for e.g. generating descriptions or colorizing.

Osma Suominen

Resource links dataset on Hugging Face Hub

Finna metadata

Data A dataset consisting of ca 30M metadata records from the Finna service.

Juho Inkinen

Resource links dataset on Hugging Face Hub

Qdrant vector database

API Qdrant is a database for storing vectors along with other data items. It can be used for similarity search, multi-modal search, recommendations engines, retrieval-augmented generation (RAG), etc. See a list of example applications with end-to-end codes.

Juho Inkinen

Resource links TBA

Help and instructions URL and keys for the database API will be provided in person, contact Juho via email or Telegram

OpenAI GPT3.5-turbo

API Access to GPT3.5-turbo version 1106 for text generation etc. See Azure documentation on GPT models.

Juho Inkinen

Resource links TBA

Help and instructions URL and keys for the service API will be provided in person, contact Juho via email or Telegram

Finto AI API

API Finto AI — a service based on Annif for automated subject indexing. Finto AI suggests subject headings for texts from a vocabulary to support information retrieval.

Osma Suominen

Resource links - Swagger-UI API documentation - Python API client library

OpenAI text embedding model Ada-002

API Access to text embedding model Ada-002. Text embeddings are representations of texts as a numerical vectors that encode the meaning of the text. This way the texts that are close in the vector space are expected to be similar in meaning. See Azure tutorial on embeddings.

Juho Inkinen

Resource links TBA

Help and instructions URL and keys for the service API will be provided in person, contact Juho via email or Telegram

Translocalis clippings

Data Translocalis is a digital database for reader letters written in different locations and published in Finnish papers up to the year 1885. The Translocalis database contains 72 000 reader letters from Finland and abroad.

Tuula Pääkkönen, Heikki Kokko

Resource links - Dataset description - download link

Linked Data Finland

Data A collection of Linked Data from many Finnish cultural heritage organizations and Sampo systems

Resource links LDF.fi