
Yet another speaker contraption this time playing AI song from modded samples

Posted: Tue May 21, 2024 10:25 pm
by mmmPI
Context:
What follows is a detailed description of a long process, related to music more than Factorio (but really both), that led to a dumb machine. It starts with a simple observation: making music is easy, making it sound pleasing not so much.

The difference is difficult to quantify and very subjective, which makes it difficult to train an AI to do so, yet it is sometimes trendy. Music is a super old art, and so are automatons! So what's new? I think the AI companies that make the most noise in the public sphere are not doing it well, ethically, which may be a marketing strategy built on controversy; I can understand both, yet dislike both. This could be the conclusion of this reflection (TL;DR).
Disclaimer:
This post still contains some AI-generated "music" or sound, played by robots. I think part of the adaptation required from artists is to document the process of creation (and inspiration), because that is something AI doesn't or can't do. It also allows other humans to understand the (lack of) work that led to such a creation, and whether they can expect similar things from an artist in the future, or if it was just a lucky seed in a service that generates products ready to consume (like a song or a picture).

Part of this reflection is the continuation of previous experiments in Factorio, with machines allowing players to "technically" play some "music". Or rather "compose". From a very minimalist point of view, the technical side comes down to "playing sounds at the right time". This doesn't require a computer; clapping hands in unison is enough to fit the definition. This is doable in Factorio, with the drum kit sounds from the programmable speaker and some clocks. This leaves the musician in full control of the music created. Following tutorials on the internet, it is possible to learn a lot about one simple thing, "the rhythm", from many genres of music, which helps broaden perspectives; I would recommend it :)

To build on that, the next step felt to me like playing "notes", because that's kind of a big deal in music, right? I didn't ask ChatGPT for confirmation; at this point still no AI was involved. It is possible in Factorio to use combinators to play completely random notes at a correct tempo. That doesn't sound very good, but by learning some musical rules it is possible to learn "what to avoid". That, too, can be done in Factorio. This makes robots that play according to their programming: it is possible to configure the range of notes, which notes, and the signature, and to press buttons to have one or the other randomized. This is similar to what can be found in a Digital Audio Workstation, software to make music or edit sound for YouTube videos and video games :) No "AI" is involved yet.

One shortcoming of this approach is that if the randomness is very high, then, not so surprisingly, it sounds very random: not like music, more like a cat walking on a piano. If the randomness is very low, then it sounds as if you were playing the music yourself, except super slowly, because you are writing rules instead of playing notes (when it's me, it's pretty bad). It already requires a little knowledge about music, or some time experimenting with "does it sound good if I do this?" or "what do I like in this sound?" to get a pleasing result.

This is a big marketing argument for AI music "as a service": "you won't need to learn anything" to create. It solves the previous dilemma, in words. But if the AI chooses everything, you didn't create anything. And if you want to "guide" the AI, or do anything that would give a feeling of "ownership" over a creation, it means calling oneself a "prompt architect" or something and asking the machine to play things that are sometimes stupid. (If you ask a drawing AI for a sheep, or for a sheep in ultra HD 16K, most commercial AIs think ultra HD 16K means "add tons of details and a wide palette of colors".) The same kind of thing works for music AI. You can ask for a green or a blue melody, even though that makes no intelligible sense.

Experiments were run on many services, but those I used the most are Limewire AI, Suno AI and the OpenVINO plugin for Audacity. This last one runs on the local machine after downloading the model weights, and you can do funny experiments like asking the AI to continue a sound you made up with percussion and an old phone's high-pitched tone. Done a lot, it can end up producing a recognizable piece of a song that was most likely used to feed such AIs, greatly limiting the feeling of "creation". The others require combining rudimentary musical vocabulary and syntax like [chorus] [bridge] [verse] "A major", 120 BPM, "summer hit", "holiday song" and hoping the machine does not read them out or sing them but understands them as instructions.

Since the dominant business model of the cloud ones seems to be asking for money for every attempt at "rolling the dice", there is no incentive to create documentation explaining how the machine was trained, which words it recognizes, or which artists were "copied". It feels to me like ripping off artists and lying to clients about what they are doing. Luckily the local AI (the free Audacity plugin) can be used to "continue" the music from the paid ones. This is what I used to create the sounds for Factorio purposes. It helped to generate samples that have a very, very, very precise duration, or to adjust the tempo. It also means the local AI is not updated with new music and cannot receive feedback; its training has stopped.
So it's just a machine that plays a bad song in several pieces?
Yes, and it's even worse than that: it will not play properly when the UPS isn't 60 or during an autosave. The same goes for the first time it tries to play a song (when the samples are not loaded yet). Any of those will mess up the timing, which relies on samples precisely measured to last an integer number of ticks. The sound is generated by my computer at 32000 Hz, 16-bit PCM. This means 1 second of sound is represented by 32000 16-bit numbers; that's what the AI generates after the convolutions. Then in the workstation (Audacity) it is converted to 48000 Hz 32-bit float for working purposes, like applying a filter or an equalizer. At 48000 Hz, 1 tick of sound, 1/60 of a second, is represented by 48000/60 = 800 binary numbers (sampling points). I am no expert on this, just learning, so I may be oversimplifying when summing up my research.
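
The tick arithmetic above can be checked with a few lines of Python. This is only a rough sketch, assuming the nominal 60 ticks per second; the function names are made up for illustration and are not part of the mod or of Audacity. It also shows that at 32000 Hz one tick is not a whole number of sampling points (32000/60 is about 533.33), while at 48000 Hz it is exactly 800.

```python
from fractions import Fraction

TICKS_PER_SECOND = 60  # assumption: the game runs at its nominal 60 UPS


def samples_per_tick(sample_rate_hz: int) -> Fraction:
    """Number of sampling points that make up one game tick."""
    return Fraction(sample_rate_hz, TICKS_PER_SECOND)


def length_in_ticks(num_samples: int, sample_rate_hz: int) -> Fraction:
    """Duration of an audio clip expressed in game ticks (exact, no rounding)."""
    return Fraction(num_samples) / samples_per_tick(sample_rate_hz)


def is_whole_ticks(num_samples: int, sample_rate_hz: int) -> bool:
    """True when the clip lasts an integer number of ticks."""
    return length_in_ticks(num_samples, sample_rate_hz).denominator == 1


if __name__ == "__main__":
    print(samples_per_tick(48000))          # 800
    print(samples_per_tick(32000))          # 1600/3, not a whole number
    print(is_whole_ticks(266 * 800, 48000))  # True
```

A clip whose length in sampling points is a multiple of 800 at 48000 Hz passes the check; one sampling point more or less and it no longer lines up with the tick grid.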

The durations of the samples I used are multiples of 800 sampling points. Sometimes though, that wouldn't fit the actual music, the melody or the rhythm; you can't really speed up a sound by 0.0000000613333333 %. What you can do is make samples that last 27 26 27 26 27 26 27 ticks when the duration should have been 26.5. The consequence is that two 26-tick samples, or two 27-tick samples, can't be played in a row, and the rules get more complex when the ratios are different. I'm still experimenting with this for future songs.
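
For other ratios, one way to get such a pattern is to round every sample boundary to the nearest tick, so the drift never builds up. The sketch below is only an illustration of that idea, not the method used to cut the samples for the mod; `tick_durations` is a hypothetical helper name, and rounding halves upward is an arbitrary choice that happens to reproduce the 27 26 27 26 pattern.

```python
from fractions import Fraction
from math import floor


def tick_durations(ideal_ticks: Fraction, count: int) -> list[int]:
    """Split `count` samples of a fractional ideal length into whole-tick lengths.

    Each cumulative boundary k * ideal_ticks is rounded to the nearest tick
    (halves round up), so the error against the ideal timeline stays within
    half a tick instead of accumulating.
    """
    durations = []
    previous_boundary = 0
    for k in range(1, count + 1):
        boundary = floor(k * ideal_ticks + Fraction(1, 2))
        durations.append(boundary - previous_boundary)
        previous_boundary = boundary
    return durations


if __name__ == "__main__":
    # Ideal length 26.5 ticks, written as the exact fraction 53/2.
    print(tick_durations(Fraction(53, 2), 7))  # [27, 26, 27, 26, 27, 26, 27]
```

Using `Fraction` instead of floats keeps the half-tick cases exact, which matters here because the whole point is which way a value like 26.5 gets rounded.
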
This is what it looks like:
fanbot.png (594.11 KiB)


The attached blueprint requires 3 mods to function. The first is the pushbutton mod, used to mute the robot and to go forward or backward; otherwise there is only an on/off mode, by turning on/off the constant combinator on top of the head.

DJRobotMIXTAPE1: this is the mod I made that contains the samples that make up the song.
Robotic Voice Sample For Speaker: I also made this one. It's not used here, but if it's not present, the blueprint tries to use instruments whose number is too high. Another hurdle on the road of music... It also means that if you have other mods installed that add sounds to the programmable speaker, it may not work.

Given the popularity of the previous demo, I thought I'd attach a map, to spare the couple of players who would want to hear it the trouble of searching for the mods in the portal:
Demo djmod.zip (2.47 MiB)
It also features some simpler contraptions that can play the 3 other songs in the mod:

This one is the same song as the smiley robot's, except not in smiley robot form, so it is much more readable:
mm1.jpg (333.15 KiB)
This is another song, with better lyrics:
mm2.jpg (427.03 KiB)
Its timings are simpler.
This one is even better, maybe because it has fewer lyrics:
positive.jpg (121.28 KiB)
The timing is the same for all samples, making it the easiest to build.

Last one for now:
Peace.jpg (90.02 KiB)
It has a longer intro compared to the other parts of the song, but is similar to the previous one for the easy part of the song. The second part is not really a song; it's more a set of loops made available to extend the first part or to attempt an extended mix, plus some experiments showcasing the various timing problems that sometimes occur.



I think this part helped me learn about structure in a song, and which musical genres use which. When sampling the song outside of Factorio it is easier to do because the structure is visible, and when playing it is necessary to keep such things in mind, because the samples supposedly respect the proper signature of the song where possible, with the variations that occur as the song goes on. I don't think the result is good music, but it also helped me connect the dots between various things I learned when younger about sound, binary numbers, and music, in a fun environment. As such I would recommend it, and I also give thanks and express appreciation for the "content" and tools used and for the people who made them or inspired me :)

Re: Yet another speaker contraption this time playing AI song from modded samples

Posted: Tue Jun 11, 2024 10:48 pm
by mmmPI
Added 2 funky instrumentals to the mod, in case someone doesn't like techno; that would be sad, but it is still possible to play with djrobot x). They are very repetitive and electronic and have no lyrics, which also covers the case where you are not a fan of MiMo or of electronic voices, which would also be sad. The songs are supposedly joyful and relaxing, so they can be listened to for pleasure even if the purpose is not making a robot DJ. They are still cut into samples for easier use by robots, though.

Here is a blueprint that will play the first song:
FG-1.jpg (88.56 KiB)



And one for the second song:
FG-2.jpg (98.14 KiB)


Here I updated the player so that it's easier to switch from song 1 to song 2: the duration of the samples is stored in ticks as the "D signal" in a constant combinator, and their number as the "N signal".

From song 1 to song 2 the differences are: the duration is not 266 ticks but 300 per sample, and the number of samples needed to play the song is not 28 but 24. This means updating only those 2 numbers and the 2 speakers is enough to switch between songs whose samples all have the same duration.
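
To make the role of D and N concrete, here is a small Python model of a player driven by those two numbers. It is a sketch under assumptions: the actual combinator wiring is not shown in the post, so the looping clock, the 1-based sample index and the names `Song`, `sample_at` and `starts_sample` are all hypothetical. Only the values 266/28 and 300/24 come from the post.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Song:
    d: int  # "D signal": duration of every sample, in ticks
    n: int  # "N signal": number of samples in the song

    @property
    def total_ticks(self) -> int:
        """Length of one full playthrough, in ticks."""
        return self.d * self.n

    def sample_at(self, tick: int) -> int:
        """1-based index of the sample playing at `tick` (assumes the song loops)."""
        return (tick % self.total_ticks) // self.d + 1

    def starts_sample(self, tick: int) -> bool:
        """True on the ticks where a speaker would have to fire a new sample."""
        return tick % self.d == 0


SONG_1 = Song(d=266, n=28)
SONG_2 = Song(d=300, n=24)

if __name__ == "__main__":
    print(SONG_1.total_ticks)     # 7448 ticks, a bit over 124 s at 60 UPS
    print(SONG_2.total_ticks)     # 7200 ticks, exactly 120 s at 60 UPS
    print(SONG_1.sample_at(266))  # 2
```

In this model, switching songs means changing only `d` and `n`, which mirrors the point above that two numbers (plus the speakers) are all that differs between songs with uniform sample durations.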