Wikimedia Italia/Web app Wiki Loves Monuments/Backend
Overview
This document describes how to configure the web application "app.wikilovesmonuments" and the related data scraping processes. The administration interface is based on the "Django" web framework, and in particular on the "django-admin" component, which provides a web-based interface for searching, viewing, and modifying the entities managed within the project.
Access to the administration platform is available at the URL:
https://wlm-it-visual.wmcloud.org/admin/
The wikilovesmonuments app allows the management of WLM contests in various geographical contexts, managing the related scraping and configuration data separately.
To manage a WLM contest, it will be necessary to perform a series of configurations related to the specific geographical context (i.e., the countries for which you want to manage the contests). Each country can be configured independently both in terms of data and contests and their respective dates.
In the following sections, all the entities that need to be configured to manage the contest in a country will be illustrated.
Workflow
The WLM application is based on a server-side web application that:
- Defines a data model and the SQL database structure
- Schedules and runs a data scraping process that periodically updates WLM data
- Provides a REST API to access data from the web frontend
The administration interface described in this document controls the contents of the SQL database and the scheduling of the scraping processes and is contained in the server application.
Structure of the editing interface
The editing interface, based on the "django-admin" component, manages the configuration by filling in records in the database. The interface is organized with a sidebar on the left that lists the various manageable data models. Clicking on a data model opens a page listing the instances of that data model, which allows:
- listing of instances, through which it is possible to manage the single instance
- searching (through the filter bar above the list). This feature is enabled only for some data models
- filtering (through the right sidebar). This feature is enabled only for some data models.
Country definition (GeoContext)
The definition of the characteristics of countries, or more generally "geographical contexts," is the basis for managing a contest related to the country itself.
The configuration involves defining an instance of the "GeoContext" data model, selectable from the left menu of the administrative panel. In the rest of the documentation, the concept of "Country" or "GeoContext" will be referred to equivalently.
As with all entities managed by the interface, access to the list of defined geographical contexts is done using the left sidebar.
To create the country, the following fields must be populated:
Fields for basic configuration
- label: label of the GeoContext within the system (reference from other entities, filters) - mandatory
- description: optional description of the GeoContext.
- country code: two-letter country code
- monument definition: definition of the monument (e.g., for Italy: "Italian monument")
- app domain: web domain (e.g., for Italy: "WLM.it")
- language code: language code used when proposing the "upload wizard" mode within the web app. This code is used to create the correct link to the wizard.
- commons category label: label to be used in commons categorization when images are uploaded.
- flag: unicode symbol of the GeoContext flag
- timezone name: name of the time zone (field "TZ Identifier" from this table: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones). This data is used to correctly determine the start date and time of the contests and activate them in the web app.
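As a rough illustration of how the timezone name could be used, the sketch below interprets a contest's start and end as local times in the GeoContext time zone and checks whether the contest is active at a given instant. The function `is_contest_active` and its parameters are hypothetical; the actual backend may store and compare dates differently.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def is_contest_active(start_local, end_local, tz_name, now_utc):
    """Return True if now_utc falls within the contest window.

    start_local and end_local are naive datetimes expressed in the
    GeoContext time zone; tz_name is a "TZ Identifier" such as "Europe/Rome".
    """
    tz = ZoneInfo(tz_name)
    start = start_local.replace(tzinfo=tz)
    end = end_local.replace(tzinfo=tz)
    # Aware datetimes are compared as absolute instants.
    return start <= now_utc <= end


start = datetime(2024, 9, 1, 0, 0)
end = datetime(2024, 9, 30, 23, 59, 59)
# 22:30 UTC on 31 August is 00:30 on 1 September in Rome (UTC+2 in summer).
now = datetime(2024, 8, 31, 22, 30, tzinfo=timezone.utc)
print(is_contest_active(start, end, "Europe/Rome", now))
print(is_contest_active(start, end, "UTC", now))
```

With the same configured dates, the contest is already active for a GeoContext in the "Europe/Rome" time zone but not yet for one configured as "UTC", which is why the time zone has to be set per country.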
Fields for configuring the map on the public web app
- centroid: point for the initial positioning of the map center on the public web app
- zoom level: initial map zoom level on the public web app (from 5 to 20)
Fields for configuring geographical entities
Geographical entities are defined at 3 levels:
- region
- province
- municipality
These entities are populated by a procedure that queries the OpenStreetMap database using the "Overpass" query language (https://wiki.openstreetmap.org/wiki/Overpass_API).
- regions overpass query: Overpass query for selecting entities at the region level
- regions name tag: tag in the regions Overpass query result from which the label is taken
- provinces overpass query: Overpass query for selecting entities at the province level
- provinces name tag: tag in the provinces Overpass query result from which the label is taken
- municipalities overpass query: Overpass query for selecting entities at the municipality level
- municipalities name tag: tag in the municipalities Overpass query result from which the label is taken
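To make the relationship between a query and its name tag concrete, here is a minimal sketch. The query text is a hypothetical value for the "regions overpass query" field (it is not taken from the actual configuration and assumes regions are mapped at admin_level 4), and `extract_labels` is an illustrative helper, not the backend's own code. It reads the label of each returned element from the configured tag in an Overpass JSON response.

```python
# Hypothetical value for the "regions overpass query" field:
# administrative relations at admin_level 4 inside Italy.
REGIONS_QUERY = """
[out:json];
area["ISO3166-1"="IT"][admin_level=2]->.country;
relation(area.country)["boundary"="administrative"]["admin_level"="4"];
out tags;
"""


def extract_labels(overpass_json, name_tag):
    """Map each OSM element id to the value of the configured name tag."""
    labels = {}
    for element in overpass_json.get("elements", []):
        tags = element.get("tags", {})
        if name_tag in tags:
            labels[element["id"]] = tags[name_tag]
    return labels


# Made-up response fragment in the Overpass JSON output format.
sample = {
    "elements": [
        {"type": "relation", "id": 1, "tags": {"name": "Toscana", "name:en": "Tuscany"}},
        {"type": "relation", "id": 2, "tags": {"name": "Lazio"}},
    ]
}
print(extract_labels(sample, "name"))
print(extract_labels(sample, "name:en"))
```

Choosing a different name tag (for example "name:en" instead of "name") changes which label is stored, and elements that lack the tag get no label in this sketch.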
Fields related to the configuration of the donation popup
The donation popup is a popup window that can appear on the web app to invite users to make a donation. The activation of this functionality and its operating parameters are specific to each country and are managed by the following fields:
- enable donations text: flag that enables the donation popup mechanism
- donations popup probability: probability of displaying the donation popup after selecting a monument in the web app. It is a number from 0 to 100 that indicates the probability that the popup will be shown.
- Geo context donation texts: HTML texts to be displayed in various languages within the donation popup
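The interplay between the enabling flag and the probability can be sketched as follows. This is only an assumed reading of the two fields; `should_show_donation_popup` is a hypothetical name, and the web app may implement the draw differently.

```python
import random


def should_show_donation_popup(enabled, probability, rng=random):
    """Decide whether to show the popup after a monument is selected.

    probability is a number from 0 to 100, as in the GeoContext field.
    """
    if not enabled:
        return False
    # rng.random() returns a float in [0.0, 1.0), so a probability of 0
    # never shows the popup and a probability of 100 always shows it.
    return rng.random() * 100 < probability


print(should_show_donation_popup(True, 100))
print(should_show_donation_popup(False, 100))
```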
Geographical contexts administrators
From the "Geo context admins" section, it is possible to create new administrators for geographical contexts.
To do this, it is necessary to create a new record specifying the reference GeoContext and the user.
The user must already be registered in the system and have a password.
If the user has registered through the web app, i.e., by logging in on Commons, they will not have a password to access the administration site. In this case, the system will generate a temporary password that will be shown after enabling the user. This password must be communicated to the user who can subsequently change it, once they have logged in for the first time, through the page:
https://wlm-it-visual.wmcloud.org/admin/password_change/
It is also possible to force the generation of a new password (even if one is already defined) through the "Generate or regenerate password" flag present in the administrator creation form. It should be noted that this password is not linked to access to the web app, but only to the administration interface.
The generated password is shown only once, after saving. If the password needs to be recovered, edit the record and save it with the "Generate or regenerate password" checkbox selected, so that a new password is generated.
Note that each user who is an administrator of one or more geographical contexts will only be able to create administrators for those contexts.
Icons
Icon management generates the set of icons that can be associated with each category of the WLM contest. These icons are used in the contest web application to display monuments on maps and lists.
The icons are associated with each geographical context, allowing independent management.
Specifically, for each category of monument, a graphic symbol is defined, which is used to identify the type, plus several "themes":
- number of photos: symbol on a colored background, with 3 colors indicating the absence of photos, the presence of 1 to 10 photos, or the presence of more than 10 photos
- contest: symbol filled with white if the monument is in the contest and with black if the monument is out of the contest
The combinations of these two themes lead to the pre-generation of 6 icons to be used on maps, plus other "partial" renderings, such as the symbol in white, in black, and in the primary color of the web app theme on a transparent background.
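The count of 6 follows from combining the 3 photo-count states with the 2 contest states. The sketch below enumerates them; the theme names and the file-name pattern are invented for illustration and do not reflect how the backend actually names the generated icons.

```python
from itertools import product

# Hypothetical identifiers for the two themes.
PHOTO_THEMES = ["no_photos", "1_to_10_photos", "more_than_10_photos"]
CONTEST_THEMES = ["in_contest", "out_of_contest"]


def photo_theme(photo_count):
    """Pick the photo-count theme for a monument."""
    if photo_count == 0:
        return PHOTO_THEMES[0]
    if photo_count <= 10:
        return PHOTO_THEMES[1]
    return PHOTO_THEMES[2]


def icon_variants(symbol_name):
    """List every photo/contest combination for one symbol."""
    return [
        f"{symbol_name}-{photos}-{contest}"
        for photos, contest in product(PHOTO_THEMES, CONTEST_THEMES)
    ]


print(len(icon_variants("castle")))
print(photo_theme(7))
```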
The generation of these icons starts from uploading an .svg file containing the symbol. This file must contain a symbol with black fill on a transparent background. For most of the icons configured for the 2024 contest, icons from the "Maki" icon set released under the "Creative Commons CC0 Public Domain Dedication" license were used, but any icon in svg format can be used.
Management is done through the "Icons" section of the administration panel with the usual interface for listing, creating, and modifying/deleting.
App categories
App categories are defined for each geographical context and are applied to monuments during the scraping phase. Specifically, each monument will be associated with a single app category.
For each category, the following fields must be defined:
- geo context: selection of the geographical context for which the category is defined
- name: name of the category (both in the web app and in the administration interface)
- sector: optional string related to the management of local contests (see the relevant section)
- order: display order of the category in the web app
- icon: icon to be associated with the category (see the section on icons)
- is municipality: special flag indicating that the category is related to the "overview" of municipalities
- is other monuments: special flag indicating that the category is generic and indicates non-belonging to a specific category. If a category with this flag exists, it is used by the scraping process to assign this category if it is not possible to assign another one based on the configured categorization rules.
App categories, in addition to enabling filtering and theming on the public web app, are used in the scraping process along with "Category Rules".
Contests
Through the "Contests" section, it is possible to configure the contests for the various managed countries.
Managing a contest involves defining the following fields:
- label: label used within the administration interface
- start date: start date of the contest
- end date: end date of the contest
The dates determine whether a given contest is active or not in the web app. For each GeoContext, it is not possible to enter two overlapping contests in terms of dates (only one contest can be active).
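The non-overlap constraint can be expressed with a standard interval check. The following is a minimal sketch that assumes inclusive start and end dates and plain `date` values; the helper names are hypothetical, and the actual validation in the administration interface may differ.

```python
from datetime import date


def overlaps(start_a, end_a, start_b, end_b):
    """Two inclusive date ranges overlap if each starts before the other ends."""
    return start_a <= end_b and start_b <= end_a


def can_add_contest(existing, new_start, new_end):
    """existing is a list of (start, end) pairs for one GeoContext."""
    return not any(overlaps(s, e, new_start, new_end) for s, e in existing)


existing = [(date(2024, 9, 1), date(2024, 9, 30))]
print(can_add_contest(existing, date(2024, 9, 15), date(2024, 10, 15)))
print(can_add_contest(existing, date(2025, 9, 1), date(2025, 9, 30)))
```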
The following additional fields are not currently used within the web app:
- description
- link
Local contests
The concept of "Local Contest" was defined so that, when a user submits images during the contest, they are also categorized for participation in local contests.
Defining "Local contests" requires filling in the following fields:
- contest: the contest to which the local contest refers
- label: a label that will be used in the categorization of images in case of a match with the local contest
- has award: indicates whether the local contest offers a prize
- sparql: optional SPARQL query to determine Q numbers for which the local contest is active
- regions: any regions for which the local contest is defined
- provinces: any provinces for which the local contest is defined
- municipalities: any municipalities for which the local contest is defined
Note that for the local contest to be significant, at least one of the fields sparql, regions, provinces, or municipalities must be populated.
It is also possible to define a series of exceptions by entering a series of Q numbers, for which, even if there is a match for the previously defined parameters, the monument is excluded from the local contest.
At the time of image submission, the inclusion of a monument in a local contest is determined by analyzing the "Local contests" configured for the reference country.
Specifically, the procedure is as follows: for each defined local contest associated with the current contest, at the time of selecting the monument for image submission, the following conditions are evaluated:
- whether the monument's Q number is in the local contest's exclusion list. If so, the monument is not part of that local contest, and the next local contest is evaluated.
- whether the monument falls within any of the geographical entities defined for the local contest. Additionally, if a SPARQL query is defined for the local contest, the monument's Q number is compared with the query results. If the monument belongs to the geographical areas or its Q number is among the SPARQL query results of the local contest, the commons category constructed with the following template is added: "Images from Wiki Loves Monuments ||year|| in ||country|| - ||local_contest.label||". Additionally, if the monument is associated with an "App category" and this app category has a defined "sector" property, the commons category generated by the template "Images from Wiki Loves Monuments ||year|| in ||country|| - ||local_contest.label|| - ||monument.app_category.sector||" is also added.
- if the monument does not fall into any local contest or falls into local contests that do not have the "has award" flag set to "True", the commons category "Images from Wiki Loves Monuments ||year|| in ||country|| - without local award" is added.
In the previous paragraphs, when referring to the "country" placeholder within category templates, the "commons_category_label" field defined on the referenced GeoContext will be used.
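The procedure above can be sketched as follows. The classes and field names are simplified stand-ins invented for this example (for instance, the SPARQL results are assumed to be available as a precomputed set of Q numbers), so this is an illustration of the described logic rather than the backend's code.

```python
from dataclasses import dataclass, field


@dataclass
class LocalContest:
    label: str
    has_award: bool = False
    excluded_q_numbers: set = field(default_factory=set)
    sparql_q_numbers: set = field(default_factory=set)  # assumed precomputed
    regions: set = field(default_factory=set)
    provinces: set = field(default_factory=set)
    municipalities: set = field(default_factory=set)


@dataclass
class Monument:
    q_number: str
    region: str = ""
    province: str = ""
    municipality: str = ""
    sector: str = ""  # "sector" of the monument's app category, if any


def matches(contest, monument):
    in_area = (
        monument.region in contest.regions
        or monument.province in contest.provinces
        or monument.municipality in contest.municipalities
    )
    return in_area or monument.q_number in contest.sparql_q_numbers


def local_contest_categories(monument, local_contests, year, country):
    prefix = f"Images from Wiki Loves Monuments {year} in {country}"
    categories = []
    has_awarded_match = False
    for contest in local_contests:
        if monument.q_number in contest.excluded_q_numbers:
            continue  # excluded: move on to the next local contest
        if not matches(contest, monument):
            continue
        categories.append(f"{prefix} - {contest.label}")
        if monument.sector:
            categories.append(f"{prefix} - {contest.label} - {monument.sector}")
        has_awarded_match = has_awarded_match or contest.has_award
    if not has_awarded_match:
        categories.append(f"{prefix} - without local award")
    return categories


tuscany = LocalContest(label="Tuscany", has_award=True, regions={"Toscana"},
                       excluded_q_numbers={"Q2"})
castle = Monument(q_number="Q1", region="Toscana", sector="Castles")
excluded = Monument(q_number="Q2", region="Toscana")
print(local_contest_categories(castle, [tuscany], 2024, "Italy"))
print(local_contest_categories(excluded, [tuscany], 2024, "Italy"))
```

Here `country` stands for the "commons_category_label" of the GeoContext, and the labels, Q numbers, and sector in the usage lines are made up.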
Queries
The "queries" section defines the queries that are executed during the scraping phase. When periodic scraping is performed for a country (GeoContext), all queries are processed.
As defined by the fields described below, some are used for contest-related categorization (i.e., assignment of "App category"), while others are processed without performing this categorization. In the latter case, if defined, the app category with the "is other monuments" flag is associated.
Queries are defined for each GeoContext and are essentially SPARQL queries executed against the "Wikidata Query Service" (https://query.wikidata.org/).
The fields to be filled in to define a query are as follows:
- geo context: the GeoContext for which the query is defined
- label: query label in the administration system and in the web app scraping panel
- sparql: SPARQL code to be executed, see the following sections of this page for a more detailed explanation
- description: free description field of the query
- categorize for app: boolean flag to indicate that the query results should be evaluated through the "Categorization Rules" to classify the monuments according to the "App categories" and related rules defined in the system
- data categories: "Data categories" to be associated with the monuments. The selected categories are assigned to all monuments resulting from the query
- placeholder: any placeholder to be expanded (see the next section)
SPARQL Syntax and "Query Placeholders"
To avoid timeouts on particularly complex queries, both in terms of search and of result serialization, a partial-run feature has been implemented. It repeats the same query based on a placeholder, which is replaced at run time by each value in a list. One query is executed for each value defined on the placeholder, and the results are then concatenated.
In the following example, this strategy was used to execute the "WLM" query on Italy by actually performing one query for each Italian region. The placeholder in this case is the string ITA_REGION.
To function correctly, this placeholder must be present in the system and selected among the query configuration fields (field "placeholder").
Example of a query with placeholders:
SELECT DISTINCT ?mon ?monLabel ?locationLabel ?article ?commonsCat ?geo ?wlm
  (group_concat(DISTINCT ?instanceOf; separator=";") as ?instanceOf_n)
  (group_concat(DISTINCT ?parent; separator=";") as ?parent_n)
  (group_concat(DISTINCT ?children; separator=";") as ?children_n)
  (group_concat(DISTINCT ?place; separator=";") as ?place_n)
  (group_concat(DISTINCT ?start; separator=";") as ?start_n)
  (group_concat(DISTINCT ?end; separator=";") as ?end_n)
  (group_concat(DISTINCT ?approvedBy; separator=";") as ?approvedBy_n)
  (group_concat(DISTINCT ?endorsedBy; separator=";") as ?endorsedBy_n)
  (group_concat(DISTINCT ?accreditedBy; separator=";") as ?accreditedBy_n)
  (group_concat(DISTINCT ?relevantImage; separator=";") as ?relevantImage_n)
  (SAMPLE(?address) as ?address)
  (SAMPLE(?place) as ?adminEntity)
  (SAMPLE(?location) as ?location)
WHERE {
  SERVICE wikibase:label { bd:serviceParam wikibase:language "it, [AUTO_LANGUAGE]". }
  #-FILTER-MONUMENT-PLACEHOLDER-#
  # select monuments participating in WLM in Italy
  ?mon wdt:P17 wd:Q38;
    #region selection parameterized at run time;
    wdt:P131* wd:ITA_REGION;
    p:P2186 ?wlms.
  ?wlms ps:P2186 ?wlm.
  ?mon wdt:P31 ?instanceOf .
  OPTIONAL { ?wlms pq:P580 ?start . }
  OPTIONAL { ?wlms pq:P582 ?end . }
  OPTIONAL { ?wlms pq:P790 ?approvedBy. }
  OPTIONAL { ?wlms pq:P8001 ?endorsedBy. }
  OPTIONAL { ?wlms pq:P5514 ?accreditedBy. }
  OPTIONAL { ?mon wdt:P625 ?geo . }
  OPTIONAL { ?mon wdt:P131 ?place . }
  OPTIONAL { ?mon wdt:P361 ?parent }
  OPTIONAL { ?mon wdt:P527 ?children }
  OPTIONAL { ?mon wdt:P18 ?relevantImage . }
  OPTIONAL { ?mon wdt:P373 ?commonsCat . }
  OPTIONAL { ?mon wdt:P276 ?location . }
  OPTIONAL { ?article schema:about ?mon ; schema:isPartOf <https://it.wikipedia.org/> . }
  OPTIONAL { ?mon wdt:P6375 ?address . }
}
GROUP BY ?mon ?monLabel ?locationLabel ?article ?commonsCat ?geo ?wlm
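The expansion of a placeholder into one query per value can be sketched as below. The function names and the `run_query` callable are hypothetical (the real scraper sends each query to the Wikidata Query Service), and the Q numbers are illustrative placeholder values.

```python
def expand_placeholder(sparql, symbol, values):
    """Return one query per placeholder value, with the symbol substituted."""
    return [sparql.replace(symbol, value) for value in values]


def run_partialized(sparql, symbol, values, run_query):
    """Run each expanded query and concatenate the results.

    run_query is any callable that takes SPARQL text and returns a list of rows.
    """
    results = []
    for query in expand_placeholder(sparql, symbol, values):
        results.extend(run_query(query))
    return results


template = "SELECT ?mon WHERE { ?mon wdt:P131* wd:ITA_REGION . }"
region_values = ["Q1273", "Q1282"]  # illustrative placeholder values


def fake_run_query(query):
    # Stand-in for a real HTTP call: returns one row echoing the query text.
    return [{"query": query}]


rows = run_partialized(template, "ITA_REGION", region_values, fake_run_query)
print(len(rows))
print(rows[0]["query"])
```

Because each value produces a smaller query, no single request has to search and serialize the whole country at once.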
Configuration of "Query Placeholders"
The "Query placeholders" introduced in the previous section are entities managed on the administration site in the appropriate section.
Each placeholder is characterized by the following fields:
- geo context: the GeoContext for which it is defined
- symbol: the text string that will be used in the SPARQL queries to call this placeholder
- query placeholder values: the list of values that will be expanded in place of the placeholder in the SPARQL query. For each value, it is possible to add a comment that helps in management.
Below is an example of defining placeholders for the query mentioned earlier:
Placeholder for monument update
The web application has an additional function that allows scraping a single monument to update its data. To perform this update, the system looks, among the queries defined for the relevant GeoContext, for one containing the special placeholder:
#-FILTER-MONUMENT-PLACEHOLDER-#
The single monument update procedure requires that within the query code, this special placeholder is replaced with the code
FILTER(?mon = wd:|q_number|).
where |q_number| represents the Q number of the monument for which the update is requested.
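A minimal sketch of this substitution follows. The helper names are hypothetical, and the Q number validation is an extra safeguard added for the example rather than something the documentation describes.

```python
import re

MONUMENT_PLACEHOLDER = "#-FILTER-MONUMENT-PLACEHOLDER-#"


def find_update_query(queries):
    """Return the first query text containing the special placeholder, if any."""
    for sparql in queries:
        if MONUMENT_PLACEHOLDER in sparql:
            return sparql
    return None


def single_monument_query(sparql, q_number):
    """Replace the special placeholder with a filter on one monument."""
    if not re.fullmatch(r"Q[1-9][0-9]*", q_number):
        raise ValueError(f"not a valid Q number: {q_number!r}")
    return sparql.replace(MONUMENT_PLACEHOLDER, f"FILTER(?mon = wd:{q_number}).")


queries = [
    "SELECT ?mon WHERE { ?mon wdt:P17 wd:Q38 . }",
    "SELECT ?mon WHERE { #-FILTER-MONUMENT-PLACEHOLDER-#\n ?mon wdt:P17 wd:Q38 . }",
]
update_query = find_update_query(queries)
print(single_monument_query(update_query, "Q12345"))
```

In the stored query the placeholder begins with "#", so it is read as a SPARQL comment and has no effect during normal periodic scraping.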
WLM Category Rules
The "WLM Category Rules" are entities used to categorize SPARQL query results, so that the correct web app category is applied to the monuments returned by the query.
As indicated in the query configuration section, not all queries are subject to this categorization process. The process may be "opted in" by activating the flag "Categorize for app" on the query.
Categorization is specific to each GeoContext and is managed by the "WLM Category Rules" section of the administration site.
The definition of a categorization rule is done through the following fields:
- Geo context: the GeoContext for which it is defined
- App category: the category among the "App Category" entities defined in the system for which the rule is defined
- order: order of rule evaluation
- WLM category rules predicates: a list in which it is possible to indicate a set of properties (selectable from a predefined set) and their respective values, which are used to evaluate whether the monument subject to categorization satisfies the rule.
The procedure for categorizing a monument is as follows:
- the categorization rules in the system for the GeoContext on which the scraping is being performed are evaluated in order (based on the "order" field defined above)
- if all the predicates defined for a rule are satisfied, the category associated with the rule is selected, and the evaluation ends
- if no rule is satisfied, the monument is categorized with the category that has the is other monuments flag, if defined for the GeoContext
This mechanism is based on the fact that the SPARQL queries candidate for categorization provide an output set of fields compatible with those evaluated by the predicates specified for each rule.
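The evaluation order and the fallback can be sketched as follows. The data shapes are assumptions made for this example: a monument is modeled as a mapping from property name to a set of values, and a predicate is taken to hold when the required value is among them. The real predicate semantics may be richer.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    order: int
    app_category: str
    predicates: dict  # property name -> required value (assumed semantics)


def rule_matches(rule, monument):
    """All predicates must hold; a rule with no predicates matches everything."""
    return all(
        value in monument.get(prop, set())
        for prop, value in rule.predicates.items()
    )


def categorize(monument, rules, other_category=None):
    """Return the category of the first matching rule, in "order" sequence.

    other_category is the category with the "is other monuments" flag,
    or None if the GeoContext does not define one.
    """
    for rule in sorted(rules, key=lambda r: r.order):
        if rule_matches(rule, monument):
            return rule.app_category
    return other_category


rules = [
    Rule(order=2, app_category="Religious buildings", predicates={"instanceOf": "Q16970"}),
    Rule(order=1, app_category="Castles", predicates={"instanceOf": "Q23413"}),
]
monument = {"instanceOf": {"Q23413", "Q16970"}}
print(categorize(monument, rules, other_category="Other monuments"))
print(categorize({"instanceOf": {"Q1"}}, rules, other_category="Other monuments"))
```

The category names and Q numbers are illustrative. Because evaluation stops at the first match, a monument that satisfies several rules gets the category of the rule with the lowest "order".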
Scraping
Languages
Languages and their related settings can only be managed by super users, not by ordinary country administrators.
In the web app, the concept of language is separate from that of the country where the contest is managed.
In the web app, the current browser language is initially selected. If the browser language is not among those configured in the system, the English language is selected. The user can change the current language at any time.
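The initial language selection can be sketched as below. The function is hypothetical, and reducing a browser locale such as "it-IT" to its primary subtag is an assumption of this example, not documented behavior.

```python
def select_language(browser_language, configured_codes, default="en"):
    """Pick the browser language if it is configured, otherwise English."""
    # Assumption: only the primary subtag is compared ("it-IT" -> "it").
    code = browser_language.split("-")[0].lower()
    return code if code in configured_codes else default


configured = {"it", "en"}
print(select_language("it-IT", configured))
print(select_language("fr", configured))
```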
To manage languages, it is necessary to use the "Languages" section of the administration panel. Managing a language record involves filling in the following fields:
- code: language code (For example, "it" for Italian and "en" for English)
- name: name of the language
Both fields are mandatory.