Japanese-English Legal Parallel Corpus

A parallel corpus for machine translation, information extraction and other language processing tasks

The Japanese-English Legal Parallel Corpus was created by crawling the Japanese Law Translation Database System. It contains parallel text of over 250,000 Japanese laws and over 4,000 legal terms.

The "ContentElements" property lists four content elements: "LawData", "DictionaryData", "LawDataset" and "DictionaryDataset". "LawData" and "DictionaryData" are associations, while "LawDataset" and "DictionaryDataset" hold the law text and the legal terms as Dataset objects.
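
The element names and their structure can be checked directly from a session. The following is a minimal sketch, assuming the resource can be retrieved in the current session; the variable name ro is introduced here only for illustration:

```wolfram
(* ro is an illustrative variable name, not part of the resource *)
ro = ResourceObject["Japanese-English Legal Parallel Corpus"];

(* List the names of the available content elements *)
ro["ContentElements"]

(* Compare the heads of the association and Dataset forms of the law text *)
Head /@ {ResourceData[ro, "LawData"], ResourceData[ro, "LawDataset"]}
```

Passing an element name as the second argument of ResourceData selects that element, as the examples in this page do for "DictionaryData", "LawDataset" and "DictionaryDataset".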

Examples

Basic Examples

Obtain the first three examples of law text:

In[1]:=
ResourceData["Japanese-English Legal Parallel Corpus"][[All, ;; 3]]
Out[1]=

Obtain the first three examples of legal terms:

In[2]:=
ResourceData["Japanese-English Legal Parallel Corpus", "DictionaryData"][[All, ;; 3]]
Out[2]=

Dataset Form

Obtain five random pairs from the set of laws in Dataset form:

In[3]:=
RandomSample[
 ResourceData["Japanese-English Legal Parallel Corpus", "LawDataset"], 5]
Out[3]=

Obtain five random pairs from the set of legal terms in Dataset form:

In[4]:=
RandomSample[
 ResourceData["Japanese-English Legal Parallel Corpus", "DictionaryDataset"], 5]
Out[4]=

Analysis

Plot a histogram of legal term lengths, measured in characters:

In[5]:=
Histogram[
 Map[StringLength,
  ResourceData["Japanese-English Legal Parallel Corpus", "DictionaryData"], {2}],
 ChartLegends -> Automatic, LegendAppearance -> "Column"]
Out[5]=

Wolfram Research, "Japanese-English Legal Parallel Corpus" from the Wolfram Data Repository (2018)

License Information

Japanese Law Translation Database System Standard Terms of Use
