Japanese-English Legal Parallel Corpus

A parallel corpus for machine translation systems, information extraction and other language processing techniques

The Japanese-English Legal Parallel Corpus was created by crawling the Japanese Law Translation Database System. It contains over 250,000 Japanese-English pairs of law text and over 4,000 pairs of legal terms.

The "ContentElements" field offers four options: "LawData", "DictionaryData", "LawDataset" and "DictionaryDataset". "LawData" and "DictionaryData" are structured as associations, while "LawDataset" and "DictionaryDataset" are structured as Dataset objects.
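
To check these options before loading any data, a sketch along the following lines can list the content elements and inspect the top-level structure of one of them. It assumes the resource can be retrieved by name through ResourceObject; the variable names ro and dict are illustrative, and the Keys call is only meaningful if "DictionaryData" is an association, as described above.

```wolfram
(* Assumption: the resource is reachable by name from the Wolfram Data Repository *)
ro = ResourceObject["Japanese-English Legal Parallel Corpus"];

(* List the names of the available content elements *)
ro["ContentElements"]

(* Load the association form of the legal terms and inspect its head and keys *)
dict = ResourceData[ro, "DictionaryData"];
{Head[dict], Keys[dict]}
```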

Examples

Basic Examples

Obtain the first three examples of law text:

In[1]:=

Out[1]=

Obtain the first three examples of legal terms:

In[2]:=

Out[2]=
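
The two retrievals above could be sketched as follows. This is not necessarily the input used on the original page. It assumes that each association maps a language name to a list of strings aligned by position, which is the shape suggested by the level-2 Map in the Analysis example; firstThree is a hypothetical helper introduced only for illustration.

```wolfram
(* Hypothetical helper: keep at most the first three entries of every list
   in the chosen content element. Assumes an association of aligned lists. *)
firstThree[elem_String] :=
 Map[Take[#, UpTo[3]] &,
  ResourceData["Japanese-English Legal Parallel Corpus", elem]]

(* First three examples of law text *)
firstThree["LawData"]

(* First three examples of legal terms *)
firstThree["DictionaryData"]
```

UpTo keeps the call from failing if a list has fewer than three entries.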

Dataset Form

Obtain five random pairs from the set of laws in Dataset form:

In[3]:=

Out[3]=

Obtain five random pairs from the set of legal terms in Dataset form:

In[4]:=

Out[4]=
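
A minimal sketch of the random sampling described in this section, assuming that each Dataset element is a list of rows with at least five entries:

```wolfram
(* Five random pairs from the law text, in Dataset form *)
RandomSample[
 ResourceData["Japanese-English Legal Parallel Corpus", "LawDataset"], 5]

(* Five random pairs from the legal terms, in Dataset form *)
RandomSample[
 ResourceData["Japanese-English Legal Parallel Corpus", "DictionaryDataset"], 5]
```

Because the rows are drawn at random, the output differs between evaluations; evaluating SeedRandom with a fixed seed beforehand makes the sample repeatable.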

Analysis

Obtain a character-level histogram of legal term lengths:

In[5]:=

Histogram[
 Map[StringLength,
  ResourceData["Japanese-English Legal Parallel Corpus", "DictionaryData"], {2}],
 ChartLegends -> Automatic, LegendAppearance -> "Column"]

Out[5]=

Bibliographic Citation

Wolfram Research, "Japanese-English Legal Parallel Corpus" from the Wolfram Data Repository (2018)

License Information

Japanese Law Translation Database System Standard Terms of Use

Data Resource History

Date Created: 8 June 2018

Source Metadata

Title: Japanese-English Legal Parallel Corpus
Creator: Graham Neubig
Publisher: http://www.phontron.com/jaen-law
Date: 23 July 2014
Language: English, Japanese
Source: Japanese Law Translation Database System

Publisher Information

Publisher of Record: Wolfram Research