email sol follow sol rss feed of the blog wishlist Sol::News
S

Important note:

When linking to these pages, please use the URL:

www.iki.fi/sol/ - it's permanent.

RSS
(1,0) (1,3) (-2,-1) (3,-1) (3,-3) (-2,-3)

Musing On 8 Bit Audio Compression

August 1st, 2021 #

While doing some video compression on the ZX Spectrum Next, I found it ironic that raw 8 bit audio (even at a low sample rate) ended up taking as much space as the accompanied video, so I started pondering on what to do about it.

Looking at the schemes that exist, there's no really good clear option. On Amiga, samples could be delta encoded and then RLE compressed. Since my target is a 8 bit micro and I'm also doing video decoding, I really don't have the processing power to do generic decompression. And the compression ratios, even with delta encoding, aren't too hot.

Schemes like ADPCM target 16 bit audio, but we can look at what they're based on. A couple simple DPCM (without A) schemes come to mind: for each input, compare it with the previous output and try to modify the value to be as close to the input as possible, and send it out. Repeat.

Two 4-bit schemes come to mind: first would be to use a 16 value table of values to add; these could either be evenly spaced (0, 17, 34, 51, 68, 85, 102, 119, 137, 154, 171, 188, 205, 222, 239), or not - it would take some experimentation to find what kinds of values work well.

The second scheme would use 1 bit to say whether a bit should be set or cleared, and 3 to say which bit (as 2*2*2=8). The obvious negative side of this scheme is that half of the potential opcodes are NOPs, so the first scheme probably transfers more data.

Both of the 4 bit schemes yield lossy 1:2 compression with (relatively) low processing power. Whether the results are better or worse than simply halving the sample rate requires testing.

To make things worse for these schemes, they need to sound better than a simpler alternative: we could just interpolate every second sample. Or interpolate even more of the samples for higher "compression".

Another, completely different scheme would be to generate 256 grains out of the source data and just store the grains plus indexes to the grains. This scheme has rather high compression potential, but I don't know if it will work at all. In addition to the higher compression potential, being byte based it should be even faster to decompress than the 4-bit schemes.

To find the grains to store, I thought of the following scheme:

Each potential grain exists as a point in N dimensional space; N being the number of samples in the grain. Find the dimension with the highest difference between smallest and largest value, and split the space in two along that axis. Repeat splitting subspaces until the desired number of subspaces - 256 - is generated.

Once this is done, we average the samples in each subspace. This, unfortunately, tends towards 0.5 when the number of samples and/or dimensions is high. One solution is to just pick a random grain from the subspace, or to normalize the averaged values to get back to near the original dynamic range.

One potential addition which may or may not help is to rotate the samples in the grain so that the smallest (or biggest) value comes first.

Whether this scheme works at all remains to be seen. It makes sense, logically, but in practice it may blow up the audio so badly that it's unusable. One fun fact for this scheme is that there's no real reason why it wouldn't work for stereo samples too.

The compression ratio depends on the grain size (gs):


output = gs * 256 + input / gs

So for 64k samples and 4 sample grains, the compression ratio is 26%. Going up to 16 sample grains, the compression goes to 13%. For 1 megabyte of samples and 16 sample grains, 7%. Going the other way, for 8k samples and 2 sample grains, the compression ratio is 56%.

The likelihood that single set of 256 grains would sound good over 1 megabyte of samples is quite low, though, so it might make sense to split input sample into smaller chunks and compress them separately. On the other hand, switching from one set of grains to another may cause an audible difference, so this may not be a great idea. I guess one could switch grains gradually over time, but that gets rather complicated.

For granular synthesis to sound nice we'd need to overlap the grains with an envelope, but this would take way too much processing power. We're talking 8 bits, after all.

Finally - how to know which scheme is the best? I could calculate absolute difference with the source data, but with audio you never know if that matters. You can literally inverse the whole source and it will sound exactly the same. Or you can shift by a single sample, for instance. Or one could generate waterfall FFT images and compare those - that would at least show if signals are lost or gained.. but I suppose I should just play it by ear.

I'm pretty sure all of the results will be horrible.

Sassy Audio Spreadsheet

May 31st, 2021 #

Sooo.. after the previous blog post I kept thinking about that virtual modular thingy, and even played around with a graph editor for a bit.

Then I recalled an idea I had a long time ago: the audio generation of modular synths kind of remind me of spreadsheets. You have cells with values or formulas, and refer to other cells.

After a bit of tinkering..

I don't really know if the project is really going to go anywhere, but I put it on itch.io for the heck of it, maybe a lot of people want to buy it. Who knows? (It hasn't, so far)

After getting the first versions off the ground a friend and much more of a music person than me, Antti Tiihonen made a few test spreadsheets that really opened my eyes to what it can do.

Once the 1.0 beta was built I made a video which attracted some attention, with mentions in music industry sites all over the place. The mentions were generally positive, and reactions were mostly humorous. You can check the trailer out here:

Like I realized some time ago, a lot of synth stuff is actually pretty simple, and it's more about user interfaces. I'm not sure if spreadsheet is the worst possible user interface when it comes to synthesizers, but that's what the MIDI integration is for - you can very easily connect your spreadsheet creation to physical encoders and pots of your MIDI gear.

I'm currently slowly working on beta 1.8 - hopefully with ASIO support - but the 1.7 beta that's available to download off itch.io (widget above) is already pretty powerful, with tons of additions after the 1.0 beta.

Some internals that may be of interest;

Ocornut's IMGUI library (relatively) recently added grid support which is the basis of the editor. Each of the grid cells contains a formula, basically what you might find on a single line of a programming language. This can get pretty complicated.

The formula is parsed into a string of tokens, so we get things like "2", "+", "3". At this point simple constant folding is performed, by trying to evaluate each operation or function; if all components are constant, we can do the calculation here and replace the formula (or part of the formula) with a constant value.

The constant folding is very limited, meaning that if you have, say, 2*x*3, a smart constant folder could say that's 6*x, but since we're not reordering (or even building a tree), we can't. It's not perfect, but it's something.

Next, the tokens are re-arranged to reverse polish notation, so "2", "+", "3" turns into "2", "3", "+"; in this form the tokens can already be executed via a simple algorithm: if token is a constant, push it to stack. Otherwise, pop the number of arguments, perform calculation and push the result. There's further small rules for parentheses and functions, but that's the gist of it.

This works, but isn't exactly fast. Enter JIT compiling - using a library called xbyak. Xbyak generates x64 opcodes and handles things like memory protection changes, but you still need to know x64 assembly and calling conventions and stuff.

Using xbyak the opcode array turns into a single huge x64 function with zero loops, but a lot of function calls. Some things I wrote out, so if you add two variables together, it will not produce a function call, but a call to low pass filtering definitely does. Just leaving out the need to loop through all of the spreadsheet cells sped things up a lot.

Compared to running the opcodes directly, the stack uses the actual machine stack instead of a virtual one, and after tinkering with it for a while I realized that I should just keep the top of the stack in a register, which means that if you're only calling single parameter functions, I don't need to push or pop the stack at all.

After the initial interest for Sassy started to wane, I also started to feel a bit tired on working on it, and thus the release rate has dropped. I hope I'll get the 1.8 beta out this summer, but I'm not holding my breath.

In other news, here's a z80 assembly tutorial where I write a complete game for the zx spectrum from scratch in 100% z80 assembler, along with a rather thorough look at the AY sound chip.

Synthalizer thoughts

February 14th, 2021 #

When the Atanua (logic sim) project was nearing its end I added some audio capabilities to it, but that didn't really work out due to Atanua's low simulation clock of 1kHz. Later on I started thinking about audio generation again, pondering if a spreadsheet would be a nice approach. Again, Atanua's node graph made more sense, as with a spreadsheet you can't really see the connections between things.

What I was reinventing was basically euro rack style modular synthesis in software. This is definitely not a new idea, there are free and commercial implementations out there. But for some reason I felt like I still wanted to write my own.

Today, instead of trying to massage Atanua's editor into something I could use, I'd use Ocornut's Dear Imgui and some implementation of node editor on top of it. There are several, I just linked thedmd's one, as that looks promising.

To be usable, the synth should be able to output at least 44k samples per second, so let's say it should be at least 50 times faster than Atanua was. Primary reason for Atanua's slowness is that its logic was actually pretty heavy; instead of simply saying a signal is low or high, Atanua also did stuff like not connected wires, error propagation, weak signals, etc. Each block had to have logic that dealt with various states, so even if you had a simple "and" operation, you had to deal with all of that.

I don't see any reason why the synth would have to deal with anything that complex. Let's think of a simple attenuator block (i.e, volume control);

    .-----.
    |c    |
    |  |  |
    |i + o|
    |  |  |
    |     |
    '-----'        

There's two inputs: input signal and control voltage, and one output. The control voltage is also biased based on the user input slider, so that might be considered a third input, but in practice it would be member of the node itself. The code might look something like:


void attenuate(float control, float input, float &output)
{
    output = input * (control + mBias);
}

That's pretty lightweight compared to what Atanua had to do. Of course, to call that function there needs to be a pile of code that goes through all the nodes, deals with what output goes to what input, and there needs to be some logic stating what the signals should be if some inputs are not connected; i.e, if the attenuator doesn't get a control voltage, the "bias" will be the only thing that affects the result, so the control would default to zero. That could be dealt by having an implicit "zero" wire that's connected to everything if there's no wire, so no additional logic is needed.

This by itself might be fast enough, but what could be done to go faster? The first idea is to pass buffers along instead of single samples. Unfortunately that would mess up feedback effects, which would be a bummer. I'm pretty sure we could get away with passing a small number of samples though, which would allow the use of SSE instructions...

One option would be to let the modules specify the granularity of samples they accept, and make the framework add buffers in the inputs as needed. All of the modules add some latency to the system, some more than others, so that wouldn't be a total nightmare. Maybe.

Another thing I thought of when pondering how to make Atanua run faster would be to turn the graph into code. There's a few options that I can find. One would be to write the modules in Lua and use Luajit, which would already run on various platforms, but would make things like handling 4 samples at a time via SSE a bit difficult. I'm sure there are other scripting languages with jit out there, but I haven't studied them too deeply.

Second option would be to roll my own language, which is kind of tempting, but when we get to even slightly more complicated routines that would be needed, like noise (aka pseudorandom number generator), the required language features would get quickly out of hand. (And don't get me started on fft..)

At least the language would require the ability to call pre-existing c routines. Also, if I wanted anyone else to add modules to the system, forcing them to learn a new language might be a bit too much to ask. Which would largely rule out the scripting language approach too, I guess.

Third option would be to use an actual c/c++ compiler, either bundled one or a system dependency. This would be rather complicated, and would add a noticeable pause whenever the code compiled. The positive side would be that the system could output synth .dll files (or, at least, compileable c/c++ files) that could be used outside the system, which would be neat. There's also things like cling out there which make the c++ compiler online, but I really don't want to have llvm as a dependency.

Of course, if the compilation only happened if the user specifically asks for it, or during boxing if the system supported that, that wouldn't be an overkill..

Additional positive side of using c/c++ compiler is that it should be pretty trivial to make a build that doesn't use the online compilation at all, and just take the performance hit.

MMXXI

January 17th, 2021 #

Year 2020, huh. That was. Well. Something.

There's a pretty thorough breakdown, and rather disappointingly empty pouet page.

So year in review. Everybody knows the covid messed up just about everything, but since my goals are mostly such that I can do them at home, that shouldn't be a reason.. but there we go.

A year ago I stated that I'd try to get some more hobby programming done. I can't really say that I'd made any great strides there. Music wise I did play around with making music and even streamed playing with my new Korg Wavestate a few times to massive audience of maybe 3 people =)

I'll just go ahead and give up on making goals. I have a bunch of things I'd like to concentrate on, and I might, but it's possible something else comes up.

Stuff, in no particular order:

SoLoud. I only managed one update last year, let's try to get to at least one update this year. If nothing else, there's an accumulated bunch of pull requests and 3rd party libraries to integrate.

DialogTree and by extension MuCho built on top of it. This is pretty much in the shape I left it a year ago. The idea here would be to get it to run on the Spectrum Next to allow people to create games. Maybe I'll whip a visual novel engine using DialogTree. First step would be to get a minimal DialogTree engine running on the next, though.

Music. I'll try to continue to play around with the synths I have, maybe even stream once in a while for people who want to torture themselves listening to someone fumble around. The long-term goal that I have is not to learn to play, but to learn to jam. It's all just for fun.

Finishing a game and getting it on Steam. I had this as a goal a couple years ago, and I still think it's a realistic goal if I just get around to it. I'm not looking to make the next greatest hit, but just to see what the process looks like and do it for the experience. Funnily enough, last time I made this a goal, 3drealms did it for me.

Finishing a hardware project. I have a bunch of them in some half- or even less finished states, including a usb racing wheel thingy and turning raspberry pi into a synth. These wait for major inspirations to advance. At least I managed to finish the hardware project that was required for the new year demo.

Getting in shape. I've found that moving snow is much harder for me, particularly my back, than it has been before. I should do something about it. I'm unlikely to do anything about it, but I should. I've heard good things about hula hooping, maybe I should learn how to. We'll see. Not holding my breath here. Round is a shape, right?

Things I'm more likely to achieve include working through my Steam backlog and watching Netflix. Both of which I could, in theory, manage while using an exercise bike, so there might be some way of getting the fit thing in there. Again, not too optimistic about it.

After 2020, it's pretty hard to feel optimistic about goals, really.

 

Older news have been archived here: 2020 2019 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 even older

Site design & Copyright © 2021 Jari Komppa
Possibly modified around: August 01 2021