July 16, 2025
Slide overview
DH2025:
Constructing and Integrating Knowledge Graphs for the Koji-Ruien and Waka Databases
いんたーねっと
Constructing and Integrating Knowledge Graphs for the Koji-Ruien and Waka Databases
Hiroki UEMATSU (1,2), Hideaki TAKEDA (2,1), Shoji YAMADA (3,1), Mitsuru AIDA (4)
(1) The Graduate University for Advanced Studies, SOKENDAI
(2) National Institute of Informatics
(3) International Research Center for Japanese Studies
(4) Japan Women's University
Please ask questions here (sli.do): sli.do#3167229
Table of Contents
• Introduction
• About Koji-Ruien
• Structuring Koji-Ruien
  – Structure of Koji-Ruien
  – Data modeling
• Structuring Waka Databases
  – Structure of Waka and Waka Collections
  – Data modeling
• Knowledge Graphs for Koji-Ruien and Waka
• Future Work
• Conclusion
Introduction
• Data structuring is important
  – Creating models that express semantic and data structures
  – Describing data in RDF and publishing it as Linked Data
  – Discovering relationships between data
• Research topic:
  – Creating knowledge graphs in various fields
    • Earthquakes, corporate data, medical institutions, war-related materials, etc.
  – Discovering relationships in historical materials such as classical texts
    • Koji-Ruien, and Waka collections such as Man'yōshū and Kokin Wakashū
Koji-Ruien (古事類苑)
What is Koji-Ruien?
• An encyclopedic dictionary of historical materials
  – Compiled under the direction of the Japanese government from the Meiji to Taisho periods
    • Compilation began in 1879 (Meiji 12)
    • Published from 1896 (Meiji 29) to 1914 (Taisho 3)
• Contents
  – Main text: 1,000 volumes
    • 350 Japanese-style bound books, 51 Western-style bound books
  – Number of entries: 40,354 items (30 sections total)
  – Follows the format of classified books
    • A systematic overview of various historical materials in Japan
Main 13 Sections of Koji-Ruien
• 13 sections digitized by Nichibunken (Section: contents)
  – Ten (Heaven): Celestial bodies, climate
  – Saiji (Seasonal Events): Annual events
  – Chi (Earth): Geography, topography
  – Teiō (Emperor): Emperor, Imperial family
  – Hōroku (Stipends): Stipend fields, ranks
  – Shōryō (Weights and Measures): Weights and measures
  – Hōgi (Techniques): Onmyōdō, calendar, medicine
  – Jin (People): Relatives, body, emotions, master-disciple, friendship
  – Seimei (Names): Family names, surnames, family crests
  – Inshoku (Food and Drink): Cooking, rice, noodles, sweets, alcohol, seasonings
  – Kiyō (Utensils): Tableware, decorations, bedding, lighting, vehicles
  – Dōbutsu (Animals): Beasts, birds, insects, fish, shellfish
  – Shokubutsu (Plants): Trees, grass, fungi
Structure of Koji-Ruien
• Hierarchical headings
  – 部: "Bu" (Section)
  – 門: "Mon" (Field)
  – 項: "Kō" (Article)
  – 目: "Moku" (Entry)
  – 細目: "Saimoku" (Sub-entry)
Table 1: A part of Heaven Section
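The five heading levels can be pictured as a simple tree. The following is a minimal Python sketch under stated assumptions, not the authors' implementation: the class name `Heading`, the `LEVELS` tuple, and the `path` helper are hypothetical, and the sample labels are the romanized headings from the Heaven Section example on the slides.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Heading levels of Koji-Ruien, from broadest to narrowest.
LEVELS = ("Bu", "Mon", "Kō", "Moku", "Saimoku")


@dataclass(eq=False)
class Heading:
    """One node in the heading hierarchy (hypothetical class, for illustration)."""
    label: str
    level: str
    # repr=False avoids infinite recursion between parent and children.
    parent: Optional["Heading"] = field(default=None, repr=False)
    children: List["Heading"] = field(default_factory=list)

    def add_child(self, label: str) -> "Heading":
        """Create a child one level below this heading and return it."""
        index = LEVELS.index(self.level)
        if index + 1 >= len(LEVELS):
            raise ValueError("Saimoku is the lowest level")
        child = Heading(label, LEVELS[index + 1], parent=self)
        self.children.append(child)
        return child

    def path(self) -> str:
        """Return the labels from the root down to this heading."""
        node, labels = self, []
        while node is not None:
            labels.append(node.label)
            node = node.parent
        return " > ".join(reversed(labels))


ten_bu = Heading("Ten-bu", "Bu")
ten_mon = ten_bu.add_child("Ten-mon")
hogaku = ten_mon.add_child("Hōgaku")
print(hogaku.level, "|", hogaku.path())
```

Deriving the child's level from the parent's level keeps the Bu > Mon > Kō > Moku > Saimoku order consistent without having to pass the level by hand.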
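Koji-Ruien Book
• Ten-bu (Heaven Section)
  – Ten-mon (Heaven Field)
    • Article: Hōgaku (Directions)
      – Explanation: explanation of directions
        » 方角トハ四方四角ノ稱ニシテ、四方ハ東、西、南、北ヲ云ヒ、四角ハ艮、巽、坤、乾ヲ云フ、
          (Hōgaku is the collective name for the four directions and the four corners: the four directions are east, west, south, and north; the four corners are northeast, southeast, southwest, and northwest.)
      – Cited work: Shūgaishō (Collection of Gleanings)
        » 下末方角
        » 五方 東木位 西金位 南火位 北水位 中央土位……
          (The five directions: east is the position of wood, west of metal, south of fire, north of water, and the center of earth ...)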
Digitized Koji-Ruien
• Koji-Ruien Page Search System
  – https://lapis.nichibun.ac.jp/kojiruien/
  – Search and display PDF and IIIF images of the Koji-Ruien books
• Koji-Ruien Full-text Database
  – https://ys.nichibun.ac.jp/kojiruien/
  – Full text written in Wiki format
  – 22,477 pages published (as of May 2025)
Searchable as a dictionary for terms and images
Characteristics of Koji-Ruien
• For pre-modern cultural concepts, usage examples from all literature before the Meiji period are included
  – List of terms
  – Meanings of terms
  – Usage examples from cited works
  – Supplementary meanings from reference materials
Dictionary structure and relationships with cited works and reference materials
Data Model of Koji-Ruien
Entry in RDF
• SPARQL Query example
SELECT DISTINCT * WHERE {
  <http://kojiruien.kgraph.jp/collection/天部一/方角> ?p ?o .
  OPTIONAL { ?o ?pp ?oo . }
}
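To send a query like this to an endpoint, it can be encoded as a SPARQL 1.1 Protocol GET request. The following is a minimal Python sketch using only the standard library; the endpoint address `https://example.org/sparql` is a placeholder (the slides do not give the endpoint URL), and the helper names `describe_query` and `request_url` are hypothetical.

```python
from urllib.parse import urlencode

# Placeholder endpoint: the slides do not state the real address.
ENDPOINT = "https://example.org/sparql"

# Entry IRI taken from the query on the slide.
ENTRY_IRI = "http://kojiruien.kgraph.jp/collection/天部一/方角"


def describe_query(iri: str) -> str:
    """Build the entry query from the slide for a given entry IRI."""
    return (
        "SELECT DISTINCT * WHERE {\n"
        f"  <{iri}> ?p ?o .\n"
        "  OPTIONAL { ?o ?pp ?oo . }\n"
        "}"
    )


def request_url(endpoint: str, query: str) -> str:
    """Encode the query as the `query` parameter of a GET request URL."""
    return endpoint + "?" + urlencode({"query": query})


url = request_url(ENDPOINT, describe_query(ENTRY_IRI))
print(url)
```

The sketch only builds the URL and does not contact any server; `urlencode` percent-encodes the kanji in the IRI as UTF-8, so the query survives the round trip unchanged.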
Data Model of Cited Works
Mochizuki (Full Moon)
Citation from Kokin Waka Rokujō-shū
Example of Cited Work Description
  Naniwagata, as the tide rushes in,
  The moon rising above the mountains
  Also seems to be full.
Connection with Cited Works and Reference Materials
• Cited works and reference materials from Ten-bu (Heaven Section)
Table 2: Connection with Cited Works and Reference Materials (columns: Category, Item, Cited Work, Reference)
• Many PDF images are published, but few have full-text data
• Utilization of Wikipedia/Wikisource and their bibliographic information
  – Descriptions in Chinese
  – Waka poetry
    • Utilization of Nichibunken's Waka Database
Nichibunken Waka Database
• Waka collections published by Nichibunken
  – 496 waka collections are included
  – Waka and authors are assigned IDs and can be referenced by URL
Reference Relationships in Waka
• Ten-bu (Heaven Section)
  – 望月 (Mochizuki, Full Moon)
    • Kokin Waka Rokujō-shū, "1st, Seasonal Event"
      – なにはかた−しほみちくれは−やまのはに−いつるつきさへ−みちにけるかな
        Naniwagata, as the tide rushes in,
        The moon rising above the mountains
        Also seems to be full.
• Waka Databases
Structuring Nichibunken Waka Database
• 496 waka collections are included
  – Collection name (title and reading)
  – Volume name
  – Poems
    • Poem number
    • Variant material poem number
    • Headnote
    • Author (name and reading)
    • Author number
    • Poem (kanji and kana)
Example: 古今集 巻五:秋下 (Kokinshū, Volume 5: Autumn, Part 2)
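The record fields listed above map naturally onto a small data class. The following is a minimal Python sketch, not the database's actual schema: the class `Poem` and its field names are hypothetical, and the phrase separator (U+2212) is inferred from the kana text of the Naniwagata poem shown on the slides. The 5-7-5-7-7 check is naive, counting one kana character as one unit.

```python
from dataclasses import dataclass
from typing import List, Optional

# Separator seen between phrases in the kana text on the slides (assumed U+2212).
PHRASE_SEPARATOR = "\u2212"
TANKA_PATTERN = [5, 7, 5, 7, 7]


@dataclass
class Poem:
    """One poem record (hypothetical field names mirroring the list above)."""
    collection: str
    volume: str
    kana: str
    poem_number: Optional[int] = None
    variant_poem_number: Optional[str] = None
    headnote: Optional[str] = None
    author: Optional[str] = None
    author_number: Optional[int] = None
    kanji: Optional[str] = None

    def phrases(self) -> List[str]:
        """Split the kana text into its phrases."""
        return self.kana.split(PHRASE_SEPARATOR)

    def is_tanka(self) -> bool:
        """Naive 5-7-5-7-7 check: one kana character counts as one unit."""
        return [len(p) for p in self.phrases()] == TANKA_PATTERN


naniwagata = Poem(
    collection="Kokin Waka Rokujō-shū",
    volume="1st, Seasonal Event",
    kana=PHRASE_SEPARATOR.join(
        ["なにはかた", "しほみちくれは", "やまのはに", "いつるつきさへ", "みちにけるかな"]
    ),
)
print(naniwagata.phrases(), naniwagata.is_tanka())
```

Fields that the slides do not give for this poem (poem number, author, kanji text) are left as `None` rather than filled in.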
Data Model of Nichibunken Waka Database
Structuring Waka Collections
• Multiple versions of waka and waka collections
  – Multiple versions exist because of manuscripts, etc.
  – Versions differ in transcription and kana usage
• Goal: indicate which version of a waka is being referenced
Structuring waka collections with versions taken into account
Expressing Manuscripts and Printed Editions
• Using FRBR from the IFLA FR family of models (consolidated in IFLA LRM)
  – Functional Requirements for Bibliographic Records
  – Concepts of four bibliographic entities
    • Work – An abstract creation without physical form
    • Expression – A realization of a work
    • Manifestation – An expression embodied as a concrete entity
    • Item – A single exemplar of a manifestation
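The four entities nest as Work > Expression > Manifestation > Item, which makes it possible to say which version of a poem a given copy carries. The following is a minimal Python sketch of that nesting, not the authors' data model: the class fields, the `locate` helper, and the sample values ("Library A", "Library B", the transcription labels) are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Item:
    """A single copy, identified here only by its (hypothetical) holder."""
    holder: str


@dataclass
class Manifestation:
    """A concrete embodiment, such as a manuscript or a printed edition."""
    description: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Expression:
    """One realization of the work, such as a particular transcription."""
    label: str
    manifestations: List[Manifestation] = field(default_factory=list)


@dataclass
class Work:
    """The abstract poem, independent of any version."""
    title: str
    expressions: List[Expression] = field(default_factory=list)


def locate(work: Work, holder: str) -> Optional[Tuple[str, str]]:
    """Return (expression label, manifestation description) for a held copy."""
    for expression in work.expressions:
        for manifestation in expression.manifestations:
            if any(item.holder == holder for item in manifestation.items):
                return expression.label, manifestation.description
    return None


# Hypothetical example: one poem with two versions.
poem = Work(
    title="Naniwagata poem",
    expressions=[
        Expression("kana transcription", [Manifestation("manuscript", [Item("Library A")])]),
        Expression("kanji-kana transcription", [Manifestation("printed edition", [Item("Library B")])]),
    ],
)
print(locate(poem, "Library B"))
```

A citation in Koji-Ruien could then point at an Expression rather than at the Work, which records the version being referenced.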
Works and Expressions of Waka
Knowledge Graph for Koji-Ruien and Waka Databases
Statistical Data of Koji-Ruien Knowledge Graph
• Knowledge graph created from 13 sections
  – Ten-bu, Saiji-bu, Chi-bu, Teiō-bu, Hōroku-bu, ...
  – Targeting data already included in the Koji-Ruien Full-text Database
Table 3: Statistical Data of Koji-Ruien Knowledge Graph (column groups: Headings [Section, Field, Article, Entry], Items [Description], Links [Cited Work])
Koji-Ruien LOD
• Web version of the Koji-Ruien Knowledge Graph
  – Can display Section > Field > Article > Entry
  – Search for articles citing specific works; SPARQL endpoint
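List of Articles Referencing The Tale of Genji
# Prefixes added so the query is self-contained.
# The schema: namespace is an assumption; adjust it to match the dataset.
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX schema: <http://schema.org/>
SELECT DISTINCT * WHERE {
  # Walk down the hierarchy: Section (bu) > Field (mon) > Article (kou)
  <http://kojiruien.kgraph.jp/collection/古事類苑> dcterms:hasPart ?bu .
  ?bu rdfs:label ?bu_name ;
      dcterms:hasPart ?mon .
  ?mon rdfs:label ?mon_name ;
       skos:narrower ?kou .
  ?kou rdfs:label ?kou_name .
  # Works cited at the Article level
  OPTIONAL {
    ?kou dcterms:references ?ref_kou .
    ?ref_kou schema:Text ?kou_text .
  }
  # Entries (moku) under the Article, with the works they cite
  OPTIONAL {
    ?kou skos:narrower ?moku .
    OPTIONAL {
      ?moku dcterms:references ?ref_moku .
      ?ref_moku schema:Text ?moku_text .
    }
  }
  # Keep only Articles whose cited-work IRI contains "源氏物語" (The Tale of Genji)
  FILTER( CONTAINS( str(?ref_kou), "源氏物語") )
} ORDER BY ?bu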
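The same search can be repeated for other cited works by turning the title into a parameter. The following is a minimal Python sketch that builds a simplified, Article-level variant of the query above (it omits the Entry level and the `schema:Text` pattern); the function names `escape_sparql_string` and `articles_citing` are hypothetical, and the sketch only produces the query text without sending it anywhere.

```python
ROOT = "http://kojiruien.kgraph.jp/collection/古事類苑"


def escape_sparql_string(value: str) -> str:
    """Escape backslashes, double quotes, and newlines for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def articles_citing(title: str) -> str:
    """Build a query for Articles whose cited-work IRI contains `title`."""
    literal = escape_sparql_string(title)
    return f"""PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?bu ?bu_name ?mon_name ?kou_name ?ref_kou WHERE {{
  <{ROOT}> dcterms:hasPart ?bu .
  ?bu rdfs:label ?bu_name ;
      dcterms:hasPart ?mon .
  ?mon rdfs:label ?mon_name ;
       skos:narrower ?kou .
  ?kou rdfs:label ?kou_name ;
       dcterms:references ?ref_kou .
  FILTER( CONTAINS( STR(?ref_kou), "{literal}" ) )
}} ORDER BY ?bu"""


print(articles_citing("源氏物語"))
```

Escaping the title before inserting it keeps a stray quotation mark in a work's name from breaking the query.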
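Connect Koji-Ruien and DBpedia
# Prefixes added so the query is self-contained.
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT * WHERE {
  # Articles (kou) and Entries (moku) from the Koji-Ruien graph
  GRAPH <http://kojiruien.kgraph.jp> {
    <http://kojiruien.kgraph.jp/collection/古事類苑> dcterms:hasPart ?bu .
    ?bu rdfs:label ?bu_name ;
        dcterms:hasPart ?mon .
    ?mon rdfs:label ?mon_name ;
         skos:narrower ?kou .
    ?kou rdfs:label ?kou_name .
    OPTIONAL {
      ?kou skos:narrower ?moku .
      ?moku rdfs:label ?moku_name .
    }
    # Restrict to the Chi-bu (Earth Section)
    FILTER( CONTAINS( str(?bu_name), "地部") )
  }
  # DBpedia places whose label and abstract both contain the Article name
  SERVICE <https://ja.dbpedia.org/sparql> {
    ?db_s a <http://dbpedia.org/ontology/Place> ;
          rdfs:label ?db_s_name ;
          <http://dbpedia.org/ontology/abstract> ?db_s_abs .
    FILTER( CONTAINS(?db_s_name, ?kou_name) && CONTAINS(?db_s_abs, ?kou_name) )
  }
} ORDER BY ?bu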
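The matching rule in the `SERVICE` block (a place is linked when both its label and its abstract contain the Article name) can be tried out locally on a few records before running the federated query. The following is a minimal Python sketch of that rule only; the `Place` type, the `match_places` function, and all sample records (including the `example.org` IRIs and abstracts) are made up for illustration and are not DBpedia data.

```python
from typing import Dict, List, NamedTuple


class Place(NamedTuple):
    """A place record with the three values the query reads from DBpedia."""
    iri: str
    label: str
    abstract: str


def match_places(article_names: List[str], places: List[Place]) -> Dict[str, List[str]]:
    """Mirror the FILTER: keep places whose label and abstract both contain the name."""
    matches: Dict[str, List[str]] = {}
    for name in article_names:
        hits = [p.iri for p in places if name in p.label and name in p.abstract]
        if hits:
            matches[name] = hits
    return matches


# Made-up sample records, for illustration only.
places = [
    Place("http://example.org/place/yamashiro", "Yamashiro Province",
          "Yamashiro Province was a province of Japan."),
    Place("http://example.org/place/osaka", "Osaka",
          "Osaka lies in the area of the former Settsu Province."),
]
print(match_places(["Yamashiro", "Settsu"], places))
```

Because the rule is plain substring containment, short Article names can match unrelated places, so the results are candidate links that still need checking.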
Future Work
• Improving the accuracy of Koji-Ruien and waka anthology data
  – Conversion of non-standard characters, such as old kanji forms (kyūjitai)
  – Standardization of the names of waka anthologies and their reciters/authors (unifying variant spellings)
• Discovery and structuring of external resources
  – Discovery of digitized waka anthologies
  – Connecting with data beyond waka anthologies
• Description of relationships between manuscripts, printed editions, etc.
  – Structuring of multiple versions of waka data and other related information
• Continuous data management and operation
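The two normalization tasks above (old kanji forms and variant spellings of names) can both be approached with lookup tables. The following is a minimal Python sketch under stated assumptions, not the authors' planned method: the mapping covers only five characters, the alias table is illustrative, and the names `KYUJITAI_TO_SHINJITAI`, `TITLE_ALIASES`, `modernize_kanji`, and `canonical_title` are hypothetical.

```python
# Small illustrative mapping from old kanji forms (kyūjitai) to modern forms.
# A real table would need to be far larger.
KYUJITAI_TO_SHINJITAI = {
    "稱": "称",
    "國": "国",
    "學": "学",
    "舊": "旧",
    "體": "体",
}
_KANJI_TABLE = str.maketrans(KYUJITAI_TO_SHINJITAI)

# Illustrative alias table: variant spellings mapped to one canonical title.
TITLE_ALIASES = {
    "Kokin Waka Rokujo Shu": "Kokin Waka Rokujō-shū",
    "Kokin Waka Rokujō-shū": "Kokin Waka Rokujō-shū",
}


def modernize_kanji(text: str) -> str:
    """Replace each listed old-form kanji with its modern form."""
    return text.translate(_KANJI_TABLE)


def canonical_title(title: str) -> str:
    """Return the canonical spelling, or the input unchanged if it is unknown."""
    return TITLE_ALIASES.get(title.strip(), title.strip())


print(modernize_kanji("方角トハ四方四角ノ稱ニシテ"))
print(canonical_title("Kokin Waka Rokujo Shu"))
```

Keeping the original text alongside the normalized form is advisable, since the old character forms are themselves part of the historical record.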
Conclusion
• Knowledge graph construction of Koji-Ruien
  – Data structure as an encyclopedia
  – Display of articles and entries on the Web and SPARQL search
• Structuring of waka database
  – Knowledge graph construction of the Nichibunken waka database
• Statistical use and relationship discovery of historical materials
• Future work
  – Citation analysis by section and field
  – Analysis of cited materials using the waka database
  – Knowledge graph construction of additional materials