June 4, 2026
Slide overview
Seminar at Academia Sinica
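Kitahara Laboratory, Department of Information Science, College of Humanities and Sciences, Nihon University. With the motto "Technology Makes Music More Fun", we research and develop technologies that help advance entertainment, especially music.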
Music as a Material for Information Science Education and Research Tetsuro Kitahara Professor, Nihon University, Japan Specially Appointed Professor, Shiga University, Japan Visiting Scholar, Academia Sinica, Taiwan (Apr. 2026—Jan. 2027)
Self introduction • Name: 北原 鉄朗 (きたはら てつろう / KITAHARA Tetsuro) • Affiliation: 日本大学 (Nihon University) 文理学部 (College of Humanities and Sciences) 情報科学科 (Dept. of Information Science) • Career: PhD from Kyoto Univ. (Musical instrument recognition) PostDoc at Kwansei Gakuin Univ. (Music generation) Assist., Assoc. & Full Professor at Nihon Univ. • Interests: All topics related to music computing (in particular, symbolic music generation technologies)
Kitahara Lab at Nihon University • Established in 2010 • Catch-phrase: Technology Makes Music More Fun • The lab typically consists of: 1 faculty member, 0–2 part-time staff members, 0–1 PhD students, 0–6 MSc students, 14–18 Bachelor students • Almost all students are engaged in music computing
Today’s talk • Why we focus on music • Examples of music-related information science research in our laboratory • Attempts to teach machine learning through music
Why we focus on music
Music • Music is familiar: large business market; popular as a hobby (incl. playing instruments); everyone learns it at school (to some extent) → everyone has personal musical experience • Music has various aspects: a form of art/entertainment, a signal, a spectrogram, a sequence of notes
What we should teach • Data representation • How we represent various types of content digitally • Programming • How we implement computational processing as executable programs • Signal processing • How we extract meaningful information from signals • Machine learning • How we build intelligent systems from data • Human-computer interaction • How we design smooth interactions between humans and computers
Various representations of music: signal, symbolic time series, 2D image, event series (e.g. (note_on, 60), 0.50, (note_on, 64), 0.00, (note_on, 67), 0.50, (note_off, 60), 0.00, (note_on, 62), 0.50, ...), hierarchical structure → We can learn programming and machine learning for various types of data
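To make the event-series representation concrete, here is a minimal sketch that converts such a series into (pitch, onset, offset) notes. It assumes, since the slide does not say, that the numbers between events are time shifts (e.g. in beats) and that notes still sounding at the end are closed there; `events_to_notes` is a hypothetical helper, not code from the lab.

```python
from typing import List, Tuple, Union

Event = Tuple[str, int]        # (event type, MIDI note number)
Token = Union[Event, float]    # either an event or a time shift (assumed unit: beats)


def events_to_notes(tokens: List[Token]) -> List[Tuple[int, float, float]]:
    """Convert an interleaved event/time-shift series into (pitch, onset, offset) notes."""
    t = 0.0
    active = {}   # pitch -> onset time of a currently sounding note
    notes = []
    for tok in tokens:
        if isinstance(tok, tuple):
            kind, pitch = tok
            if kind == "note_on":
                active[pitch] = t
            elif kind == "note_off" and pitch in active:
                notes.append((pitch, active.pop(pitch), t))
        else:
            t += float(tok)   # a number advances the current time
    # Close any notes that are still sounding at the end of the series
    for pitch, onset in active.items():
        notes.append((pitch, onset, t))
    return sorted(notes, key=lambda n: (n[1], n[0]))
```

Applied to the series on the slide, C4 (60) lasts from 0.0 to 1.0, E4 and G4 start at 0.5, and D4 starts at 1.0; the same music is now a list of notes that can be rasterized into a 2D pianoroll image.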
Other reasons • Music is real-time (listening, interaction); for some applications, real-time processing is mandatory → good material for exercising real-time, low-latency, multi-threaded processing • Everyone has musical experience (playing an instrument as a hobby, club activities at school, music classes at school, etc.) → easy to find research topics based on personal experience
Examples of music-related information science research in our laboratory
Typical process of Bachelor thesis projects • Grade 3, Semester 1: join the lab; discuss personal research interests; roughly decide the research topic • Grade 3, Semester 2: study basic knowledge (e.g. basics of machine learning); start preliminary analysis of data related to the topic • Grade 4, Semester 1: decide the details of the research topic; start developing a system, model, etc. • Grade 4, Semester 2: complete the system, model, etc.; conduct experiments; write a thesis
Research topics • Music generation: four-part harmonization (2014, symbolic); guitar tablature generation (2025, symbolic); drum loop morphing (2023, audio) • Music analysis: phrase tendencies of a particular bassist (2017, symbolic) • Music interaction: drum velocity control (2024, symbolic); drawing-based improvisation system (2023, symbolic)
Music generation • Four-part harmonization (2014) • Guitar tablature generation (2025) • Drum loop morphing (2023)
Case 1: Four-part harmonization [S. Suzuki & T. Kitahara, JNMR, 2014] Difficulty: we have to consider both continuity (within each voice) and simultaneity (across voices). Existing approaches • Learning-based: - Neural net (Hild '91) - HMM (Allan '05) - Weighted finite-state transducer (Buys '12) • Non-learning-based: - Expert system (Ebcioglu '90) - Constraint satisfaction problem (Pachet '98) - GA (Phon '99)
Problem with chord nodes • Most existing studies use nodes representing chords or harmonic functions (e.g. C, Am, G) • In practice, using chord nodes is not easy: - If chord symbols distinguish voicings (e.g. C vs. C6 vs. C6 on G), there are too many elements, so it is difficult to train models with a limited amount of data - If not, the symbols are too ambiguous: one symbol corresponds to various sounds → Is it better not to use chord nodes?
Model [figure; annotation: nodes determined before inference at time i]
Training data • 254 four-part pieces from a hymnal • Transposed to C major. Example [figure]: input (soprano melody) and the harmonizations generated by the chord model and the non-chord model
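The trade-off between model complexity and data size can be illustrated with the two preprocessing/learning steps implied here: transposing every piece to C major, then estimating a conditional probability table from counts. This is a simplified sketch, not the model from the JNMR paper; the helper names, the pitch-class encoding, and the add-alpha smoothing are all assumptions. Each extra parent value (e.g. finer chord symbols) adds a row that must be estimated from the same 254 pieces.

```python
from collections import Counter, defaultdict


def transpose_to_c(pitches, tonic_pc):
    """Shift MIDI pitches so that the tonic pitch class (0=C, ..., 11=B) becomes C,
    using the smallest shift in the range -5..+6 semitones."""
    shift = (-tonic_pc) % 12
    if shift > 6:
        shift -= 12
    return [p + shift for p in pitches]


def fit_cpt(pairs, n_values=12, alpha=1.0):
    """Estimate P(child | parent) from (parent, child) pairs with add-alpha smoothing.

    Hypothetical example: parent = soprano pitch class, child = bass pitch class.
    """
    counts = defaultdict(Counter)
    for parent, child in pairs:
        counts[parent][child] += 1

    def prob(child, parent):
        c = counts[parent]
        return (c[child] + alpha) / (sum(c.values()) + alpha * n_values)

    return prob
```

With sparse data, most rows of such a table are dominated by the smoothing term, which is one way to see why a node with many fine-grained values is hard to train.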
Case 2: Guitar tablature generation [S. Sakai et al., SMC 2024] Motivation: in finger-style solo guitar, a player often plays both a melody and chords on a single guitar, and it is difficult to find how to play both within physical restrictions. Our goal: automatically generate a tablature for playing both a melody and chords from a given lead sheet. Input: a lead sheet. Output: a tablature that includes chord voicings playable with the melody.
What's the difficulty • Many possible chord voicings • Physically playable ones must be found together with the melody (example: Dm7 with A in the melody; some voicings are not playable) Key idea for a solution • Search for the minimum-cost state transitions (an HMM-like idea): state = fingering form on the fretboard, cost = performing difficulty • To get easily playable tablatures, introduce typical forms, e.g. F's typical form = (1, 1, 2, 3, 3, 1), C's typical form = (0, 1, 0, 2, 3, -1) • States are restricted to typical forms and their modified forms
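The 6-dimensional form vectors above are consistent with one fret number per string, listed from string 1 (high E) to string 6 (low E) in standard tuning, with -1 for a string that is not played: the C form yields C-E-G and the F form yields F-A-C. The sketch below decodes a form into MIDI pitches and checks a hypothetical emission condition based on the state definition in the formulation (highest note = melody note, lowest note = chord root). The helper names and the exact check are assumptions, not the paper's code.

```python
# Standard tuning, string 1 (high E) to string 6 (low E): E4, B3, G3, D3, A2, E2
OPEN_STRINGS = (64, 59, 55, 50, 45, 40)


def form_to_pitches(form):
    """MIDI pitches (string 1 -> 6) sounded by a fingering form; -1 = string not played."""
    return [open_p + fret for open_p, fret in zip(OPEN_STRINGS, form) if fret >= 0]


def chord_pitch_classes(form):
    """Set of pitch classes (0=C, ..., 11=B) sounded by a form."""
    return {p % 12 for p in form_to_pitches(form)}


def can_emit(form, melody_note, root_pc, chord_pcs):
    """Hypothetical emission check: the highest note is the melody note,
    the lowest note is the chord root, and every other note is a chord tone."""
    pitches = form_to_pitches(form)
    if not pitches:
        return False
    return (max(pitches) == melody_note
            and min(pitches) % 12 == root_pc
            and all(p == melody_note or p % 12 in chord_pcs for p in pitches))
```

For example, the C form sounds [64, 60, 55, 52, 48], so it can carry the melody note E4 (64) over a C chord, but not G4 (67).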
Basic formulation
Input: $\{(x_1, c_1), (x_2, c_2), \ldots, (x_N, c_N)\}$, where $x_n$ is the melody note and $c_n$ is the chord
Output (states): $Q = \{q_1, q_2, \ldots, q_N\}$, where $q_n$ is a fingering form (6-dim vector, e.g. (0, 1, 0, 2, 3, -1)); typical forms and their modified forms are added to the state set
State definition: the highest note is the melody note; the lowest note is the root note of the chord
3 types of costs • Initial cost $C(q_1)$: neck-side positions are preferred • Transition cost $C(q_n \mid q_{n-1})$: smaller position changes are better • Emission cost $C((x_n, c_n) \mid q_n)$: the melody note and chord tones must be emitted
Minimize: $C(Q) = C(q_1) + C((x_1, c_1) \mid q_1) + \sum_{n=2}^{N} \left[ C(q_n \mid q_{n-1}) + C((x_n, c_n) \mid q_n) \right]$
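Because $C(Q)$ is a sum of an initial term plus per-step transition and emission terms, the minimum can be found with Viterbi-style dynamic programming (the "HMM-like idea", with costs in place of negative log-probabilities). Below is a generic sketch; the cost functions are passed in as callables, and the toy costs in the usage example (integer "positions", distance as transition cost, infinite cost for unplayable states) are hypothetical stand-ins for the paper's actual cost design.

```python
import math


def min_cost_path(observations, states, init_cost, trans_cost, emit_cost):
    """Viterbi-style DP minimizing
    C(q1) + C(o1|q1) + sum_{n>=2} [C(qn|q(n-1)) + C(on|qn)].

    trans_cost(cur, prev) plays the role of C(q_n | q_{n-1});
    emit_cost(obs, state) plays the role of C((x_n, c_n) | q_n).
    Use math.inf for forbidden (e.g. unplayable) states.
    """
    # best[s]: minimal cost of any partial path ending in state s
    best = {s: init_cost(s) + emit_cost(observations[0], s) for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_best, ptr = {}, {}
        for s in states:
            prev = min(states, key=lambda p: best[p] + trans_cost(s, p))
            new_best[s] = best[prev] + trans_cost(s, prev) + emit_cost(obs, s)
            ptr[s] = prev
        best = new_best
        backpointers.append(ptr)
    # Trace back from the cheapest final state
    last = min(states, key=lambda s: best[s])
    path = [last]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1], best[last]


# Toy usage (hypothetical): states are hand positions; each observation is the
# set of positions from which the required notes are playable.
obs = [{0, 3}, {3, 5}, {0, 5}]
path, cost = min_cost_path(
    obs, [0, 3, 5],
    init_cost=lambda s: s,                      # lower positions preferred
    trans_cost=lambda s, p: abs(s - p),         # smaller position changes preferred
    emit_cost=lambda o, s: 0 if s in o else math.inf,
)
```

The run time is O(N·|S|²), which is why restricting the state set to typical forms and their modified forms also keeps the search cheap.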
Example [figure] Evaluation (evaluator: one professional classical guitarist) • Comments: voicing richness should depend on metrical positions; many simultaneous notes
Case 3: Drum loop morphing [M. Kawahara et al., CMMR 2023 (demo)] Motivation: loop sequencers need many sound loops to let users compose various kinds of music. We focus on morphing as a method for generating new loops: (New loop) = α × (Loop A) + (1 − α) × (Loop B)
VAE-based model [figure: spectrograms of Loop A and Loop B → convolution (encoder) → deconvolution (decoder) → spectrogram of the new loop] Dataset: 224 loops taken from "Techno & Trance" of "Sound PooL" (drums, 2 bars, BPM = 135)
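With a VAE, the morphing formula is typically applied to latent vectors rather than to the spectrograms themselves, since mixing spectrograms directly would just overlay the two loops. The sketch below assumes that reading of the slide: it encodes both loops, interpolates the latent codes, and decodes the result. `encode` and `decode` are placeholders for the trained convolutional encoder and deconvolutional decoder; the toy linear functions in the check are purely illustrative.

```python
import numpy as np


def morph(encode, decode, spec_a, spec_b, alpha):
    """Hypothetical latent-space morphing:
    z = alpha * z_A + (1 - alpha) * z_B, then decode z into a spectrogram."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    z_a = np.asarray(encode(spec_a))   # e.g. the posterior mean for Loop A
    z_b = np.asarray(encode(spec_b))
    z = alpha * z_a + (1.0 - alpha) * z_b
    return decode(z)
```

With a real model, sweeping α from 0 to 1 produces a sequence of loops that moves from Loop B to Loop A; the decoded spectrogram still has to be converted back to audio (e.g. by phase reconstruction).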
Example [audio: Loop A, Loop B, generated loop] Subjective evaluation: 1. Listen to Loops X and Y (one is generated; the other is a reconstruction of an existing loop). 2. Answer which one is ML-generated. Ratio of participants who answered correctly: mean 0.374, SD 0.180 → Participants could not distinguish generated loops from existing ones
What they learned through the research Four-part harmonization • Basic knowledge of probabilistic models • Designing a practical model (model complexity vs. data size) • Dataset construction (incl. managing the data input team) Guitar tablature generation • Formulating the task as a mathematical optimization problem • Implementing an HMM-like optimization algorithm Drum loop morphing • Basic knowledge of CNN, VAE, etc. • Designing and conducting experiments
Music analysis & interaction • Phrase tendencies of a particular bassist (2017) • Drum velocity control (2024) • Drawing-based improvisation system (2023)
Case 4: Evolution of phrase tendencies in a particular bassist [Matsuura et al., CSMC 2017] Motivation: musicians' individuality often changes for various reasons (changes in personal preference, changes of band members, etc.) → Analyze changes in the phrase-level individuality of a particular musician. Target player: Flea (the bassist of Red Hot Chili Peppers). "(As John returned in 1999,) Flea's bass play drastically changed; he plays thoroughly simply, focusing on root notes." (originally in Japanese; translated by us) Example bass parts [figure]: "Higher Ground" (1989) vs. "Parallel Universe" (1999)
Research questions • How can we confirm that the phrase tendency changed in 1999? • What are the differences between the phrases before and after 1999? Solution: a pattern recognition approach using MIDI transcriptions of Flea's bass phrases ① Before/after classification: if classifying phrases as before/after 1999 achieves higher accuracy than before/after another year (e.g. 2002), the tendency changed more remarkably in 1999 ② Feature selection: if high accuracy is achieved for the 1999 split using only features A, B and C, then A, B and C are the main differences
Results
① Before/after classification accuracy (10-fold cross-validation)
Split year   J48   IBk   Bayes Net   MLP
1999         76%   78%   73%         84%
2002         61%   54%   61%         63%
2006         65%   55%   62%         50%
→ Classification between pre-1999 and post-1999 phrases achieved the highest accuracy: the tendency changed most remarkably in 1999
② Selected features • Mean pitch • Ratio of successive notes with a pitch difference of 0 • Number of successive notes with a pitch difference of 3 • Ratio of notes with the top-5 note numbers → Accuracy: 82% with only these 4 features → Pitch and simplicity are the main differences
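The four selected features are simple enough to compute directly from a phrase's MIDI note numbers. The sketch below is one possible reading of them, not the study's implementation (the student used Weka): it assumes pitch differences are absolute intervals in semitones and that "top-5 note numbers" means the five most frequent note numbers in the phrase. The resulting dictionary could be fed to any classifier such as J48 or an MLP.

```python
from collections import Counter
from statistics import mean


def phrase_features(pitches):
    """Four phrase-level features from a list of MIDI note numbers.
    The interpretations of the feature names are assumptions."""
    diffs = [abs(b - a) for a, b in zip(pitches, pitches[1:])]
    n_pairs = max(len(diffs), 1)   # avoid division by zero for 1-note phrases
    top5 = {p for p, _ in Counter(pitches).most_common(5)}
    return {
        "mean_pitch": mean(pitches),
        "ratio_succ_diff0": sum(d == 0 for d in diffs) / n_pairs,
        "num_succ_diff3": sum(d == 3 for d in diffs),
        "ratio_top5_notes": sum(p in top5 for p in pitches) / len(pitches),
    }
```

Under this reading, a "simple" root-focused line shows a high ratio of repeated notes and a high concentration on a few note numbers, which matches the quoted impression.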
Case 5: Drum velocity control based on human piano performance [S. Seki et al., GCCE 2024] Motivation: a band should share global changes in dynamics (e.g. start the intro with low dynamics, play the bridge with high dynamics) → Control the drum velocity according to the human piano performance [figure: velocity over time for the piano (human) and the drums (system); global velocity changes are shared]
Proposed method • Actual velocity = global velocity change (measure-wise mean of velocity) + local velocity change (deviation from the global velocity change, mean 0) • Global velocity change: predicted for upcoming measures from the preceding measures (…, m−2, m−1, m) by linear regression, reflecting the prediction from the piano (human) with weight α and that from the drums (system) with weight 1 − α • Local velocity change: randomly determined following a normal distribution, $\Delta v^{(i)}_{m,n} \sim \mathcal{N}\big(\mu^{(i)}_c, (\sigma^{(i)}_c)^2\big)$, where μ and σ² are learned from data
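The decomposition into a global trend plus a zero-mean local deviation can be sketched as follows. This is an illustration under stated assumptions, not the GCCE implementation: the "linear regression" is assumed to be a straight line fitted to the last few measure-wise mean velocities and extrapolated one measure ahead, a single (μ, σ) pair stands in for the learned per-instrument, per-condition parameters, and velocities are clipped to the MIDI range 1–127.

```python
import numpy as np


def predict_next_mean(measure_means, window=2):
    """Hypothetical global-trend predictor: fit a line to the last `window`
    measure-wise mean velocities and extrapolate one measure ahead."""
    y = np.asarray(measure_means[-window:], dtype=float)
    if len(y) < 2:
        return float(y[-1])
    x = np.arange(len(y))
    slope, intercept = np.polyfit(x, y, 1)
    return float(slope * len(y) + intercept)


def drum_velocities(piano_means, drum_means, n_notes, alpha, mu, sigma, rng):
    """Velocities for the drum notes of the next measure.

    Global change: alpha * (piano prediction) + (1 - alpha) * (drum prediction).
    Local change: per-note deviations drawn from N(mu, sigma^2).
    """
    global_v = (alpha * predict_next_mean(piano_means)
                + (1 - alpha) * predict_next_mean(drum_means))
    local = rng.normal(mu, sigma, size=n_notes)
    return np.clip(np.rint(global_v + local), 1, 127).astype(int)
```

A larger α makes the drums follow the human pianist's dynamics more closely; α = 0 lets the drums continue their own trend.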
Demo (sorry, the difference between high- and low-velocity sounds is hard to hear)
Extra case: Improvisation system based on user-drawn melodic outlines [Kitahara et al., ACM MM Asia 2022 (demo)] ※ This is my own project (not a student's) • Improvisation is difficult because it requires creating melodies while playing: creating a melody over chords such as Cmaj7 and Am7 draws on harmony theory, musical scales, and learned melodies, and the melody must then be played • Once the user draws a melodic outline, the system generates a melody in real time
Key idea • Use a dataset of symbolic transcriptions of professional improvisations • Make pseudo outlines by smoothing the pitch trajectories of melodies • Make a model that estimates the (before-smoothing) melodies from the outlines [figure: transcribed melody (Weimar Jazz DB), a sequence of notes → smoothing → pseudo melodic outline → estimate the before-smoothing sequence with a CNN]
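A pseudo melodic outline can be produced from a transcribed melody with a simple low-pass operation on its frame-wise pitch trajectory. The slide says only "smoothing", so the moving average, the odd window length, and the rule of filling rests with the previous pitch are all assumptions in this sketch.

```python
import numpy as np


def pseudo_outline(pitch_frames, window=9):
    """Moving-average smoothing of a frame-wise pitch trajectory.

    Rests (None or NaN) are filled with the most recent pitch before smoothing
    (leading rests take the first pitch); this rule is an assumption.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    filled, last = [], None
    for v in pitch_frames:
        if v is not None and not np.isnan(v):
            last = float(v)
        filled.append(last)
    first = next((v for v in filled if v is not None), None)
    if first is None:
        raise ValueError("the melody contains no notes")
    p = np.array([first if v is None else v for v in filled])
    pad = window // 2
    padded = np.pad(p, pad, mode="edge")   # repeat edge values to keep the length
    kernel = np.ones(window) / window
    return np.convolve(padded, kernel, mode="valid")
```

Pairs of (outline, original melody) obtained this way form the training data: the CNN learns the inverse mapping, from an outline a user might draw back to a plausible note sequence.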
Model [figure]: input over time = melodic outline and chords → conv. → conv. → deconv. → deconv. → output over time = melody notes (onset), melody notes (cont'd), rest. Dataset: 96 blues melodies from the Weimar Jazz DB (half for training). Let's see a live demo
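The three output groups on the slide (onset, continuation, rest) suggest a frame-wise target in which each frame has exactly one active cell. A sketch of building such a target from a monophonic melody follows; the one-hot layout, the pitch range 48–83, and the frame indexing are assumptions for illustration only.

```python
import numpy as np


def melody_to_targets(notes, n_frames, low=48, high=84):
    """Frame-wise targets [onset | continuation | rest] for a monophonic melody.

    notes: iterable of (midi_pitch, start_frame, end_frame), end exclusive.
    Returns an int8 array of shape (n_frames, 2 * (high - low) + 1).
    """
    n_pitch = high - low
    onset = np.zeros((n_frames, n_pitch), dtype=np.int8)
    cont = np.zeros((n_frames, n_pitch), dtype=np.int8)
    rest = np.ones((n_frames, 1), dtype=np.int8)
    for pitch, start, end in notes:
        if not low <= pitch < high:
            raise ValueError(f"pitch {pitch} outside [{low}, {high})")
        k = pitch - low
        onset[start, k] = 1           # first frame of the note
        cont[start + 1:end, k] = 1    # remaining frames of the note
        rest[start:end, 0] = 0        # not a rest while the note sounds
    return np.concatenate([onset, cont, rest], axis=1)
```

Separating onsets from continuations lets the model distinguish a repeated note from one long note, which a plain binary pianoroll cannot.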
What they learned through the research Phrase analysis of a bassist • Basic knowledge of pattern recognition techniques (But he didn’t learn how to implement them; he used Weka) • How to analyze musicians’ intuitive impressions quantitatively Drum velocity control • Basic knowledge of probabilistic models and statistics • Formulating a time-series prediction problem using regression • Implementing a real-time system
Discussions: what they learned Computational thinking • Formulating tablature generation as an optimization problem • Analyzing bass phrase tendencies as a pattern recognition problem • Modeling ensemble interaction as a time-series prediction problem Basic knowledge of specific areas in information science • Probabilistic models & machine learning (Bayesian networks, VAE, etc.) Programming • How to use libraries (e.g. TensorFlow) • Some projects involve real-time processing They didn’t learn to implement ML algorithms from scratch
Discussions: how they chose topics • Students who play instruments: most chose topics related to the instruments they play (guitar, bass, drums, etc.); they tended to come up with topics from personal experience (e.g. a student who wanted to play solo guitar pieces but could not find suitable tablatures) • Students who do not play instruments: they tended to choose topics not related to specific instruments; some of them avoided symbolic music generation because it requires music knowledge (harmony, chords, scales)
Attempts to teach machine learning through music
One of my classes at Nihon University Objective To learn “deep learning” through exercises in music analysis and generation Details of content 1. Let’s learn MLP through major/minor key classification 2. Let’s learn RNN through two-part harmonization 3. Let’s learn VAE through melody morphing 4. Let’s learn CNN through polyphonic melody generation 5. Let’s learn GAN through polyphonic melody generation
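For the first exercise, an MLP needs a fixed-length input vector per piece. One common choice for key or mode classification is a normalized 12-bin pitch-class histogram; whether the class uses exactly this feature is an assumption (a flattened pianoroll would also work). A minimal sketch:

```python
import numpy as np


def pitch_class_histogram(pitches):
    """Normalized 12-dim pitch-class histogram (C=0, ..., B=11) of a melody."""
    pcs = np.asarray(pitches, dtype=int) % 12
    hist = np.bincount(pcs, minlength=12).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

After transposing pieces to a common tonic, a major piece tends to weight pitch class 4 (E) and a minor piece pitch class 3 (E♭), which is the kind of pattern a small MLP can learn.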
Overview of exercises • Data: four-part harmonized pieces from the Infinite Bach dataset (about 250 pieces); code is provided for converting MIDI data to 4- or 8-bar pianoroll matrices (8th-note grid), both polyphonic and partwise • Environments & libraries: Python on Google Colab, TensorFlow, PrettyMIDI • Basic code is provided
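The conversion to pianoroll matrices on an 8th-note grid can be sketched without any MIDI library, starting from notes whose start and end times are already expressed in beats (PrettyMIDI gives note times in seconds, so a tempo-based conversion to beats is assumed to happen beforehand). The function names, the 4/4 meter, and the pitch range 36–83 are hypothetical choices, not the course's provided code.

```python
import numpy as np


def notes_to_pianoroll(notes, n_bars=4, beats_per_bar=4, steps_per_beat=2,
                       low=36, high=84):
    """Binary pianoroll of shape (time steps, pitches) on an 8th-note grid.

    notes: iterable of (midi_pitch, start_beat, end_beat).
    steps_per_beat=2 gives 8th notes in 4/4.
    """
    n_steps = n_bars * beats_per_bar * steps_per_beat
    roll = np.zeros((n_steps, high - low), dtype=np.int8)
    for pitch, start, end in notes:
        if not low <= pitch < high:
            continue   # ignore notes outside the modeled pitch range
        s = max(int(round(start * steps_per_beat)), 0)
        e = min(int(round(end * steps_per_beat)), n_steps)
        roll[s:e, pitch - low] = 1
    return roll


def partwise_pianorolls(parts, **kwargs):
    """One pianoroll per part, e.g. {"soprano": [...], "alto": [...], ...}."""
    return {name: notes_to_pianoroll(notes, **kwargs) for name, notes in parts.items()}
```

A polyphonic pianoroll is then the element-wise maximum of the partwise ones, while the partwise form keeps the voices separate for tasks such as predicting the alto from the soprano.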
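• Learn RNN through two-part harmonization: predict the alto pianoroll matrix from the soprano pianoroll matrix • Learn VAE through melody morphing: encode melodies into the latent space and decode points between them • Learn GAN through polyphonic melody generation: a generator maps z to a pianoroll; a discriminator outputs 1/0 (data from the dataset vs. generated)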
Textbook • “Learning Deep Learning with Music” • Written by T. Kitahara (me) • Published by Ohmsha in 2023
Results (personal observations) Pros • We could learn RNN, CNN, GAN, etc. with a unified type of data • Listening to generated melodies was enjoyable • Practically useful for students planning a music-related thesis Cons • Learning music-related knowledge (incl. MIDI) required overhead for students not planning a music-related thesis • Evaluating generated content was difficult for most students (they lacked musical knowledge)
Conclusion • Music has multiple computational representations (signals, images, event sequences, hierarchical structures) • Music naturally involves many areas of information science (machine learning, signal processing, optimization, HCI, etc.) • Students learned computational thinking through music-related research • Personal musical experience strongly motivated students’ topic selection • Music also worked well as a teaching material for machine learning Music provides an intuitive and engaging gateway to information science