Web2Cit/Docs/Core
Web2Cit Core is a JavaScript library that implements the Web2Cit translation features described in the How Web2Cit works section of our Basics documentation.
Installation
Web2Cit is available as an npm package, so it can be installed in any npm project with:
npm install web2cit
The package includes TypeScript .d.ts declaration files, which provide type information for TypeScript projects.
Basic usage
The Web2Cit server imports the Web2Cit core library to expose some of its capabilities as a web service. See our Server documentation for more information.
To use Web2Cit core in a JavaScript (or TypeScript) project, briefly:
- Import Web2Cit.
- Begin by creating a Domain object for the domain you want Web2Cit to return translation results for.
- Use the fetchAndLoadConfigs instance method (of the Domain class) to fetch and load configuration files from the Web2Cit storage.
- Use the translate instance method to get translation results for one or more target paths.
Usage examples
In this section we explain some basic usage of Web2Cit core via examples from the Web2Cit server's codebase, in the order in which they appear in the server's code. At the time of writing, the Web2Cit server is at version 1.1, which uses Web2Cit core v2.
Check the Web2Cit integrated editor source code for other usage examples (for example, those related to editing domain configuration), and the core library's source code for the full API (see the Development section).
Importing from the library
import { Domain, Webpage } from "web2cit";
As explained in our Basics documentation, Web2Cit translation happens on a per-domain basis. Accordingly, the Domain class is usually the starting point of anything related to Web2Cit (see T302588 for a proposal to use a top-level Web2Cit class instead).
The Webpage class provides features for fetching and caching target-specific HTML and Citoid responses (see below).
Webpage objects
const target = new Webpage(url);
domainName = target.domain;
targetPath = target.path;
In the particular case quoted here, the Webpage constructor is used to validate the target URL that has been passed to the server.
However, Webpage objects implement additional features, mainly fetching and caching HTML responses from the target server and the corresponding Citoid responses, as discussed below.
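The validation role described above can be pictured with the standard URL class. This is only a minimal sketch, not the library's Webpage implementation: splitTargetUrl is a hypothetical helper, and treating the path as pathname plus query string is an assumption.

```typescript
// Hypothetical helper (not part of web2cit): validates a target URL and
// splits it into a domain and a path, similar in spirit to reading
// `target.domain` and `target.path` from a Webpage object.
interface TargetParts {
  domain: string;
  path: string;
}

function splitTargetUrl(url: string): TargetParts {
  // The URL constructor throws a TypeError on malformed input,
  // so construction doubles as validation.
  const parsed = new URL(url);
  return {
    domain: parsed.hostname,
    // Assumption: the path includes the query string.
    path: parsed.pathname + parsed.search,
  };
}
```

A caller would wrap the call in try/catch and reject the request when construction fails, which is the same pattern as constructing a Webpage only to check the URL.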
The Domain object
const window = new JSDOM().window;
domain = new Domain(domainName, window, {
  userAgentPrefix,
});
The Domain constructor creates a Domain object, which is the base for most Web2Cit operations. The constructor takes the following positional arguments:
- domain (string): the domain or hostname (e.g., www.example.com). This currently does not support schemes (assumed to be https) or ports (see T315020).
- windowContext (Window): a Window object that will be used for XPath operations.
- options (DomainOptions): a series of optional parameters.
DomainOptions |
---|
|
Changing storage location
let storageRoot = domain.templates.storage.root;
storageRoot = `User:${user}/` + storageRoot;
domain.templates.storage.root = storageRoot;
domain.patterns.storage.root = storageRoot;
domain.tests.storage.root = storageRoot;
By default, the core library fetches configuration files from the main Web2Cit storage on Meta-Wiki at Web2Cit/data/. The code snippet above shows how the Web2Cit server changes this default location to support using configuration files from a user's sandbox storage (see Server documentation).
Note that this is a hack until passing custom storage locations to the Domain constructor is supported (see T306553).
domain.templates, domain.patterns and domain.tests refer to DomainConfiguration objects. Check the source code for other properties that may be used to further customize the storage location, such as the MediaWiki instance (set to https://meta.wikimedia.org by default), or the corresponding file names.
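The three assignments in the server snippet can be folded into one helper. The sketch below is an illustration only: useSandboxStorage, ConfigLike and DomainLike are hypothetical names, and plain objects stand in for the real Domain and DomainConfiguration instances.

```typescript
// Stand-in shapes for illustration only; the real DomainConfiguration
// objects in web2cit have many more members.
interface ConfigLike {
  storage: { root: string };
}
interface DomainLike {
  templates: ConfigLike;
  patterns: ConfigLike;
  tests: ConfigLike;
}

// Hypothetical helper: points all three configuration objects at a
// user's sandbox by prefixing the current templates storage root.
function useSandboxStorage(domain: DomainLike, user: string): string {
  const storageRoot = `User:${user}/` + domain.templates.storage.root;
  for (const config of [domain.templates, domain.patterns, domain.tests]) {
    config.storage.root = storageRoot;
  }
  return storageRoot;
}

// Usage with a plain object standing in for a Domain instance.
const sandboxedDomain: DomainLike = {
  templates: { storage: { root: "Web2Cit/data/" } },
  patterns: { storage: { root: "Web2Cit/data/" } },
  tests: { storage: { root: "Web2Cit/data/" } },
};
const sandboxRoot = useSandboxStorage(sandboxedDomain, "Example");
```

Computing the prefixed root once and assigning it to all three objects mirrors the server snippet and keeps the three storage locations in sync.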
Fetching and loading configuration
The code below fetches the configuration files from the storage repository and loads them (i.e., parses template/pattern/test definitions and sets them as the current configuration):
await domain.fetchAndLoadConfigs(options.tests);
configsFetched = true;
targetPaths = domain.getPaths();
The Domain object's getPaths() method simply returns all target paths that have been configured in translation templates or tests. In the server snippet here, it is used when no specific target has been specified (i.e., domain parameter set, path unset; see Server documentation), to get a translation result for all configured paths.
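Conceptually, collecting the configured paths amounts to taking the union of template paths and test paths. The following is a minimal sketch under that assumption; collectPaths and PathConfigured are hypothetical names, and whether the library deduplicates or orders paths this way is not confirmed here.

```typescript
// Hypothetical stand-in: only the `path` field matters for this sketch.
interface PathConfigured {
  path: string;
}

// Returns every path mentioned by a template or a test, once each,
// in order of first appearance (an assumption of this sketch).
function collectPaths(
  templates: PathConfigured[],
  tests: PathConfigured[]
): string[] {
  const seen = new Set<string>();
  const paths: string[] = [];
  for (const { path } of [...templates, ...tests]) {
    if (!seen.has(path)) {
      seen.add(path);
      paths.push(path);
    }
  }
  return paths;
}
```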
The Domain's Webpage factory
if (options.citoid) {
  for (const targetPath of validTargetPaths) {
    const target = domain.webpages.getWebpage(targetPath);
    // Make the citoid cache fetch its data
    // regardless of whether it is needed or not by one of the translation procedures.
    target.cache.citoid.getData();
  }
}
As explained in the Server documentation, the citoid parameter controls whether we want to prepend the raw citation as returned by Citoid. To prevent fetching the Citoid response for a target twice (once to get the raw citation requested by the citoid parameter, and once for any Citoid selection step that may be needed further down during translation), we make use of both the Domain's Webpage factory and the Webpage object capabilities.
First, domain.webpages refers to a Webpage factory, which stores Webpage objects previously created via the getWebpage method and returns them if they are requested again, instead of creating new ones. This is used here from the server, but it is also widely used within the core code.
On the other hand, as mentioned above, the Webpage objects handle fetching and caching of the HTML and Citoid responses for a target. Here, the getData method of the citoid cache will fetch the Citoid data and cache it, so it doesn't have to be fetched again if it is needed later in the process.
For example, note how these functions may be called again in lines 400 and 415 down below in the source code. They will return the same Webpage object and prevent Citoid from being called again (if already called).
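The two mechanisms described above, a factory that hands back the same object per path and a cache that fetches at most once, can be sketched as follows. All class names here (ResponseCacheSketch, WebpageSketch, WebpageFactorySketch) are hypothetical stand-ins, not the web2cit classes, and the real caches may behave differently (for example, on failed requests).

```typescript
// Illustrative stand-ins, not the web2cit classes.
type Fetcher<T> = () => Promise<T>;

class ResponseCacheSketch<T> {
  private pending: Promise<T> | undefined;
  private readonly fetcher: Fetcher<T>;

  constructor(fetcher: Fetcher<T>) {
    this.fetcher = fetcher;
  }

  // Starts the fetch on the first call; later calls reuse the same promise.
  getData(): Promise<T> {
    if (this.pending === undefined) {
      this.pending = this.fetcher();
    }
    return this.pending;
  }
}

class WebpageSketch {
  readonly path: string;
  readonly cache: { citoid: ResponseCacheSketch<string> };

  constructor(path: string, citoidFetcher: Fetcher<string>) {
    this.path = path;
    this.cache = { citoid: new ResponseCacheSketch(citoidFetcher) };
  }
}

class WebpageFactorySketch {
  private readonly webpages = new Map<string, WebpageSketch>();
  private readonly makeFetcher: (path: string) => Fetcher<string>;

  constructor(makeFetcher: (path: string) => Fetcher<string>) {
    this.makeFetcher = makeFetcher;
  }

  // Returns the stored object for a path, creating it only once.
  getWebpage(path: string): WebpageSketch {
    let webpage = this.webpages.get(path);
    if (webpage === undefined) {
      webpage = new WebpageSketch(path, this.makeFetcher(path));
      this.webpages.set(path, webpage);
    }
    return webpage;
  }
}
```

Caching the promise rather than the resolved value means that a second caller arriving while the first request is still in flight also reuses it, which is what makes the early getData() call in the server snippet worthwhile.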
Finally, translation
So far, we have prepared the Domain object to run its most important function: the translate function.
const targetOutputs = await domain.translate(validTargetPaths, {
  // if debug enabled, return non-applicable template outputs
  onlyApplicable: options.debug ? false : true,
});
The translate method accepts the following parameters:
- paths (string | string[]): one or more paths, corresponding to webpages to be translated.
- options object (TranslateOptions):
  - allTemplates? (boolean; default: false): if true, translation won't stop on the first applicable template, but rather all candidate templates will be tried.
  - onlyApplicable? (boolean; default: true): if true, only results for applicable templates will be returned.
  - fillWithCitoid? (boolean; default: false): if true, undefined citation fields will be populated with values from the Citoid response (pending implementation; see T302019).
  - forceTemplatePaths? (string[]): a list of paths corresponding to translation templates to try, instead of using the list of candidate templates from the same URL path pattern group.
  - forcePattern? (string): used to force a specific URL path pattern translation group. This option will be ignored if the forceTemplatePaths option has been set.
The return value is a promise that resolves to an array of TargetOutput objects, one per translation target.
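The interplay between allTemplates and onlyApplicable, as documented above, can be sketched for a single target. This is an illustration of the described semantics only, not the library's code: runTemplates, tryTemplate and the result shape are hypothetical, and details such as fallback templates are not modeled.

```typescript
interface TemplateResultSketch {
  templatePath: string;
  applicable: boolean;
}

interface SelectionOptionsSketch {
  allTemplates?: boolean;
  onlyApplicable?: boolean;
}

// `tryTemplate` stands in for running one translation template against
// a target and reporting whether it was applicable.
function runTemplates(
  candidates: string[],
  tryTemplate: (templatePath: string) => boolean,
  options: SelectionOptionsSketch = {}
): TemplateResultSketch[] {
  // Defaults follow the documented ones.
  const { allTemplates = false, onlyApplicable = true } = options;
  const results: TemplateResultSketch[] = [];
  for (const templatePath of candidates) {
    const applicable = tryTemplate(templatePath);
    results.push({ templatePath, applicable });
    // Unless all templates were requested, stop at the first applicable one.
    if (applicable && !allTemplates) break;
  }
  return onlyApplicable ? results.filter((r) => r.applicable) : results;
}
```

With the defaults, at most one result is returned per target. Setting onlyApplicable to false, as the server does in debug mode, also exposes the templates that were tried and found non-applicable before the first applicable one.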
TargetOutput objects
Note that Web2Cit core's output interfaces may be normalized in the future; see T302431.
Detailed information
This section includes detailed information on some parts of the library.
Domain configuration objects
The DomainConfiguration abstract class is extended by the PatternConfiguration, TemplateConfiguration and TestConfiguration subclasses.
Objects of these classes are configured with a domain name and with storage settings, and they "know" how to fetch the corresponding configurations from the storage, for which they have a series of methods.
In addition, domain configuration subclasses implement specific parse and loadConfiguration instance methods that "know" how to parse revision content into configuration values.
Configuration objects have a private values property holding an array of configuration values (templates, patterns or tests), either added manually or loaded from a revision.
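The class layout described above can be sketched as an abstract base class with one concrete subclass. This is a simplified illustration, not the library's code: the names carry a Sketch suffix to mark them as hypothetical, loadConfiguration lives in the base class here for brevity, and the JSON shape of a patterns file (an array of objects with a pattern string) is an assumption.

```typescript
abstract class ConfigurationSketch<T> {
  readonly domain: string;
  readonly storageRoot: string;
  // Configuration values, either added manually or loaded from a revision.
  private values: T[] = [];

  constructor(domain: string, storageRoot: string) {
    this.domain = domain;
    this.storageRoot = storageRoot;
  }

  // Each subclass knows how to turn revision content into values.
  abstract parse(content: string): T[];

  // Parses the content and sets the result as the current configuration.
  loadConfiguration(content: string): void {
    this.values = this.parse(content);
  }

  // Returns a copy so callers cannot mutate the private array.
  get(): T[] {
    return [...this.values];
  }
}

interface PatternSketch {
  pattern: string;
}

class PatternConfigurationSketch extends ConfigurationSketch<PatternSketch> {
  parse(content: string): PatternSketch[] {
    const parsed: unknown = JSON.parse(content);
    if (!Array.isArray(parsed)) throw new Error("Expected a JSON array");
    return parsed.map((item: unknown) => {
      if (typeof item !== "object" || item === null) {
        throw new Error("Expected an object");
      }
      const pattern = (item as { pattern?: unknown }).pattern;
      if (typeof pattern !== "string") throw new Error("Missing pattern string");
      return { pattern };
    });
  }
}
```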
Redirects
Domain configuration objects' fetch methods follow MediaWiki redirects (see T304772). This is useful for domain aliases; for example, if www.example.com is an alias of example.com, configuration files of the former may be redirected to those of the latter. See the Domain aliases section of the Editing documentation for further information.
Fetch wrapper
Web2Cit core uses a fetch wrapper to set custom user agents.
The wrapper may also be used to supply custom fetch functions (via the Domain constructor's originFetch option; see above) for specific origins, a feature needed to circumvent CORS restrictions in the Web2Cit integrated editor.
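The idea of a fetch wrapper that sets a user agent and dispatches to per-origin fetch functions can be sketched as below. This is not the library's wrapper: makeFetchWrapper, the simplified FetchSketch type, the header name, and the shape of originFetch (a map keyed by origin) are all assumptions made for illustration.

```typescript
// Minimal request shape so the sketch does not depend on any runtime's
// fetch types; responses are reduced to strings.
interface RequestInitSketch {
  headers?: Record<string, string>;
}
type FetchSketch = (url: string, init?: RequestInitSketch) => Promise<string>;

interface FetchWrapperOptions {
  userAgent: string;
  // Hypothetical shape: custom fetch functions keyed by origin,
  // e.g. "https://meta.wikimedia.org".
  originFetch?: Record<string, FetchSketch | undefined>;
}

function makeFetchWrapper(
  defaultFetch: FetchSketch,
  options: FetchWrapperOptions
): FetchSketch {
  return (url, init = {}) => {
    const origin = new URL(url).origin;
    // Prefer an origin-specific function when one has been registered.
    const fetchFn = options.originFetch?.[origin] ?? defaultFetch;
    // Merge caller headers with the custom user agent.
    const headers = { ...init.headers, "User-Agent": options.userAgent };
    return fetchFn(url, { ...init, headers });
  };
}
```

Routing by origin lets a browser-based caller substitute a proxied or otherwise CORS-friendly function for selected hosts while everything else goes through the default function.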
JSON schema files
These JSON schema files indicate the shape of the Web2Cit configuration files saved to the Web2Cit storage repository on Meta-Wiki. They are used by the Web2Cit server to create custom JSON editor forms for specific configuration file types (see the Editing documentation).
Ideally, these files should be generated automatically from the TypeScript types (or vice versa), as described in T308347.
These files are currently served via the GitHub mirror due to Wikimedia GitLab's CORS restrictions (see T305700). See T318352 for a proposal to serve them from the Web2Cit server instead.
Development
Source code
The project's source code is hosted on Wikimedia's GitLab here, and mirrored to GitHub here.
Source code is written in TypeScript, a JavaScript superset with type checking support.
Development environment
This project uses npm for installing and managing dependencies. To set up the development environment:
- Clone the git repository, or your fork of it. For example: git clone https://gitlab.wikimedia.org/diegodlh/w2c-core.git.
- cd into the cloned repository and run npm install. This will:
Software design
Building
TypeScript code is compiled into JavaScript using the TypeScript tsc compiler. This compiler's behavior can be controlled via the ./tsconfig.json file.
To build, simply run npm run build. Alternatively, you may run npm run build:watch to automatically rebuild upon changes.
Publishing a new version
- Run git clean -xdf to clean the git repository. Make sure to stash any uncommitted changes before doing so.
- Run npm install (this is needed to install genversion, which is needed below).
- Run npm version --no-git-tag-version with the corresponding newversion argument (e.g., npm version ... patch). This will:
  - update package.json and package-lock.json with the increased version number;
  - run genversion to update ./src/version.ts and stage it.
- Move changes from the "Unreleased" section of the changelog to the new version's section.
- Stage all changes and commit with commit message "Bump vX.Y.Z".
- Tag as "vX.Y.Z" using git tag.
- Publish to npm using npm publish. You may use --tag next to tag the new npm version as next, instead of the default tag latest.
- If published successfully, run git push and git push --tags.
Automatic tests
Some automatic tests have been defined using the Jest framework.
To run them, simply run npm run test.
To run individual tests, run npm run test followed by a test name pattern. For example, npm run test selection should run the tests defined in the ./src/templates/selection.test.ts file only.
Debugging tests
If you use Microsoft's Visual Studio Code editor, the repository's .vscode/launch.json file will automatically configure test debugging for you.
To debug tests, open the Run and Debug panel, choose the "Debug Jest Tests" option (provided by the launch.json file) from the drop-down, and hit Start Debugging. Test execution will pause at any breakpoints you define, in both test and regular source code files.
If you want to debug specific tests, modify the launch.json file by adding the test name pattern (as used to run specific tests, above) as an argument to the args array of the Debug Jest Tests configuration.