File:T-SNE visualisation of word embeddings generated using 19th century literature.png

Original file (1,592 × 1,080 pixels, file size: 913 KB, MIME type: image/png)

This is a file from the Wikimedia Commons. The description on its description page there is copied below.

Summary

Description
English: Word embedding algorithms derive a set of real-valued vectors representing the vocabulary of a text corpus in a new embedded space. This provides a useful means of measuring the underlying similarity between words.
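
As an illustration of how embedding vectors can be used to measure similarity between words, the following is a minimal sketch using cosine similarity. Cosine similarity is a common choice, but the description does not state which measure was used for this image, so treat it as an assumption. The toy vectors and the helper names `cosine_similarity` and `most_similar` are hypothetical and are not taken from the author's data or code.

```python
import math


def cosine_similarity(u, v):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-dimensional embeddings, invented purely for illustration.
# Real embeddings typically have tens to hundreds of dimensions.
toy_embeddings = {
    "she": [0.9, 0.1, 0.3],
    "he": [0.8, 0.2, 0.3],
    "farmer": [0.1, 0.9, 0.4],
}


def most_similar(word, embeddings):
    """Rank every other word by its cosine similarity to `word`, highest first."""
    target = embeddings[word]
    scores = [
        (other, cosine_similarity(target, vec))
        for other, vec in embeddings.items()
        if other != word
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("she", toy_embeddings):
        print(f"{other}: {score:.3f}")
```

With these invented vectors, `most_similar("she", toy_embeddings)` ranks "he" above "farmer", because the first two vectors point in nearly the same direction.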

This image shows a projection of word embeddings generated from 19th-century literature. Gender-encoded unigrams, such as ‘she’ and ‘he’, written by female authors are depicted as large pink circles, while the corresponding male-authored unigrams are depicted as large grey circles. Gender-encoded embeddings occupy four different regions of this projection, annotated A–D.

A: Female- and male-authored plural nouns {fellows, women, men, ...} surrounded by past-participle verbs. No family-related nouns such as {daughters, sisters, brothers} by female authors appear here, despite the presence of their male-authored counterparts.

B: Singular gender-encoded nouns by both female and male authors, nested within nouns referring to (typically male) occupations {priest, clerk, magistrate, farmer, ...}. All male-authored pronouns appear here, but only one female-authored pronoun, "himself".

C: Family-related nouns (singular and plural) by female authors only, nested within a cluster of characters predominantly from Jane Austen’s novels.

D: Female-authored pronouns next to past participles and past-tense verbs. This provides an interesting counterpoint to Argamon et al. [1], who found differences in how women and men use words, particularly personal pronouns.

[1] Argamon, S., Koppel, M., Fine, J., Shimoni, A.R.: Gender, genre, and writing style in formal written texts. TEXT 23, 321–346 (2003)
Date
Source Own work
Author Siobhán Grayson

Licensing

Siobhán Grayson, the copyright holder of this work, hereby publishes it under the following license:
This file is licensed under the Creative Commons Attribution-Share Alike 4.0 International license.
Attribution: Siobhán Grayson
You are free:
  • to share – to copy, distribute and transmit the work
  • to remix – to adapt the work
Under the following conditions:
  • attribution – You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
  • share alike – If you remix, transform, or build upon the material, you must distribute your contributions under the same or compatible license as the original.


Items portrayed in this file

depicts

19 June 2017

File history

Date/Time: 23:44, 2 December 2017 (current)
Dimensions: 1,592 × 1,080 (913 KB)
User: Ras67
Comment: =={{int:filedesc}}== {{Information |description={{en|1=Word embedding algorithms derive a set of real-valued vectors representing the vocabulary of a text corpus in a new embedded space. This provides a useful means of measuring the underlying similari...
