AI Sauna/Resources
< AI Sauna
– Click here for instructions
Add a new resource:
- Click on Add a new resource.
- Fill in the basic information and click on Publish changes... to save the page.
- Return to this page and edit the entry to add more information.
Edit an existing resource:
- Click on Edit resources above.
- Select the resource you wish to edit. Click Edit
- In the popup window, make the changes you like. You can add more input options from the left panel by ticking one of the blue checkboxes. Confirm by clicking on Apply changes.
- Finally, click on Publish changes... to save the page.
Finna API
API The API provides a way to perform searches to the material provided by the organizations (Finnish libraries, archives and museums) participating in Finna.fi.
Resource links - Swagger
- Python API client library
National Library of Finland on Hugging Face
Collaboration platform A collection of datasets and AI models (Annif and fine-tuned LLMs) published by the National Library of Finland.
Resource links Organization
A Generated Family of Man
Publication An exploratory publication made by the Flickr Foundation in 2023 to investigate and reveal the state of the art of machine-generated captions and imagery.
Resource links →A Generated Family of Man
Avoin data – tarjolla Ylen sisältöjä ja metatietoa
Usein kysyttyä
Resource links Avoin data – tarjolla Ylen sisältöjä ja metatietoa
Word vectors based on Yle's article corpus
Data
Resource links Word vectors based on Yle's article corpus
LUMI Supercomputer
Computing Access to Jupyter Notebook on LUMI supercomputer.
Resource links LUMI access
Future Audiences / List of experiment ideas
Idea list This page outlines experiments for using new technology, like generative AI tools, in the Wikimedia movement to innovate knowledge sharing. These experiments are small-scale and can be quickly executed in hackathons or by volunteer developers. Generated by the Future Audiences team, they aim to inspire Wikimedia community members and others to contribute, discuss, and try out these ideas.
Resource links List of experiment ideas
National Archives of Finland on Hugging Face
Collaboration platform
Resource links Hugging Face Organization
FinBERT-NER - A Finnish named entity recognition model trained to recognize named entities from OCR'd archival data.
Senate Department of Justice records from Finland - A dataset containing the HTR'd text content of a collection of documents produced by the Finnish Senate's Department of Justice between 1900 and 1918.
Early 20th century court records from Finland - A dataset containing the HTR'd text content of a sample of early 20th century court records from Finland.Harmonized Finnish National Bibliography - Fennica
Data Fennica encompasses metadata for over a million documents, including books, newspapers, maps, etc., with records spanning from 1488 to the present. This dataset includes bibliographic details such as author information, titles, publication years, publication locations, publishers, content types, bibliographic levels, and call numbers (signum) indicating the books’ locations within a library.
Resource links fennica [1]
Photographs from Helsinki City Museum
Data A collection of ca 6000 old photographs (until 1917) from the collections of the Helsinki City Museum along with metadata such as captions, keywords, location and photographer. Intended for e.g. generating descriptions or colorizing.
Resource links dataset on HuggingFace Hub
Photographs from Journalistic Picture Archive JOKA
Data A collection of ca 5000 old photographs (until 1940) from the collections of the Journalistic Picture Archive JOKA along with metadata such as captions, keywords, location and photographer. Intended for e.g. generating descriptions or colorizing.
Resource links dataset on Hugging Face Hub
Finna metadata
Data A dataset consisting of ca 30M metadata records from the Finna service.
Resource links dataset on Hugging Face Hub
Qdrant vector database
API Qdrant is a database for storing vectors along with other data items. It can be used for similarity search, multi-modal search, recommendations engines, retrieval-augmented generation (RAG), etc. See a list of example applications with end-to-end codes.
Resource links TBA
Help and instructions URL and keys for the database API will be provided in person, contact Juho via email or Telegram
OpenAI GPT3.5-turbo
API Access to GPT3.5-turbo version 1106 for text generation etc. See Azure documentation on GPT models.
Resource links TBA
Help and instructions URL and keys for the service API will be provided in person, contact Juho via email or Telegram
Finto AI API
API Finto AI — a service based on Annif for automated subject indexing. Finto AI suggests subject headings for texts from a vocabulary to support information retrieval.
Resource links - Swagger-UI API documentation - Python API client library
OpenAI text embedding model Ada-002
API Access to text embedding model Ada-002. Text embeddings are representations of texts as a numerical vectors that encode the meaning of the text. This way the texts that are close in the vector space are expected to be similar in meaning. See Azure tutorial on embeddings.
Resource links TBA
Help and instructions URL and keys for the service API will be provided in person, contact Juho via email or Telegram
Translocalis clippings
Data Translocalis is a digital database for reader letters written in different locations and published in Finnish papers up to the year 1885. The Translocalis database contains 72 000 reader letters from Finland and abroad.
Resource links - Dataset description - download link
Linked Data Finland
Data A collection of Linked Data from many Finnish cultural heritage organizations and Sampo systems
Resource links LDF.fi