
Important note:

When linking to these pages, please use the URL:

www.iki.fi/sol/ - it's permanent.

NextSync Revisited

May 24th, 2020 #

Well, I thought I was done with NextSync. I released it, as version 0.5, and to my surprise got a bunch of reports that it wasn't working. Come on! It works on my machine(tm)!

Turns out there are a lot of variables. And I apparently got really lucky that it (seemed to?) work for me.

I've made and released three versions since, each one (hopefully) more robust than the last. I've added two-dimensional hashing, packet numbering, retry and restart commands in the protocol, a lot of error handling, and still it manages to blow up sometimes. But at least the data should be safe. Or at least safer.

The problem is that instead of talking with the network, the program has to talk with uart, which talks (without handshaking) with the wifi module, which then talks with the network. The uart has a 512 byte i/o buffer, which, since there's no handshaking, may overflow and data is lost. The wifi module itself is a bit temperamental, and sometimes just decides that it's too busy to talk with us. Sometimes a random byte may be lost because of timing issues and possible race conditions. It's all rather fragile.

Add to that the fact that there are different versions of the wifi module, different folk have slightly different screen timings (which affects how to talk with the uart - I have no idea why these things are connected), and naturally everyone's actual network environment varies. So there's a lot of small timing differences, if nothing else.

Draining the uart buffer as fast as possible is critical. At maximum speed (2 megabits) you only get around 100 z80 instructions per byte, which isn't a lot if you're writing in C and want to do something fancy with it. After speeding that up we got to version 0.6.

While I was sane enough to include at least some kind of checksum in the first version, that wasn't sufficient. 0.6 included two bytes of checksum per sent packet; first byte is xor sum of all bytes, and second byte is the sum of the running sum of xor. This is easier to understand in code, so:

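    def checksum(data):
        c0 = 0  # xor of all bytes so far
        c1 = 0  # running sum of c0, kept to one byte
        for x in data:
            c0 = c0 ^ x
            c1 = (c1 + c0) & 0xFF
        return c0, c1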

The packets also included the data length, which helped make the protocol more robust. Instead of relying on timeouts, code would look at the incoming data and figure out how much data is still to be expected. The timeouts are still there, though, because something is always bound to go wrong.
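To illustrate how a length field lets the receiver work out how much data is still to come, here's a minimal Python sketch. The layout (a two-byte little-endian length, the payload, then the two checksum bytes) and the names `bytes_still_expected` and `receive_packet` are assumptions made up for this example, not the actual NextSync wire format.

```python
import struct

HEADER_SIZE = 2    # hypothetical: 2-byte little-endian payload length
CHECKSUM_SIZE = 2  # the two checksum bytes described above

def bytes_still_expected(buf):
    """How many more bytes are needed before the packet in buf is complete."""
    if len(buf) < HEADER_SIZE:
        return HEADER_SIZE - len(buf)      # still waiting for the length field
    (length,) = struct.unpack_from("<H", buf, 0)
    total = HEADER_SIZE + length + CHECKSUM_SIZE
    return max(0, total - len(buf))

def receive_packet(read, timeout_reads=1000):
    """Pull bytes from read(n) until a whole packet has arrived.

    read(n) is a stand-in for the real input routine; it may return fewer
    than n bytes, or none at all. Returns None if we give up (the timeout).
    """
    buf = b""
    for _ in range(timeout_reads):
        buf += read(bytes_still_expected(buf))
        if bytes_still_expected(buf) == 0:
            return buf
    return None
```

The read count doubles as a crude timeout here: if the data never shows up, the loop gives up instead of hanging.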

I was also aggressively emptying the uart between operations in case the wifi module felt talkative. More about this later.

A large part of the transfer rate is handshaking. The whole protocol is driven by the next - it asks for data, receives it, checks that it's ok, saves it to disk, and then asks for the next bit. All those checks and saving take time, so the longer we spend doing checks, the slower the overall transfer rate gets. Thanks again to Ped7g for optimizing my assembler version of the checksum calculator! I've considered doing the checksum calculation while receiving data, but that's a timing critical bit which I'd rather not mess around with. It would, still, speed things up. The disk write delay I can't do much about - I have toyed with the thought of using the timer interrupt to multitask disk writes and the network transfers, but that's insanity.

Since 0.6 started working for other people, I quickly learned about various use cases which I had not thought about. People also ran the python server on linux and mac, and found that closing of the sockets didn't work. That led me to the SO_LINGER option, which can be configured to drop the connection on the floor on close, so I set it to that and things started working for those people.
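For reference, this is roughly how SO_LINGER is set from Python to get the "drop it on the floor" behaviour. It's a sketch of the usual approach, not necessarily the exact call the NextSync server makes; the helper name `drop_on_close` is made up for this example.

```python
import socket
import struct
import sys

def drop_on_close(sock):
    """Make close() abort the connection right away instead of lingering."""
    # struct linger is two ints on Linux/macOS and two shorts on Windows.
    fmt = "hh" if sys.platform == "win32" else "ii"
    # l_onoff=1 turns lingering on, l_linger=0 sets the timeout to zero,
    # so close() discards any unsent data and resets the connection.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER, struct.pack(fmt, 1, 0))
```

Typical use would be to call `drop_on_close(conn)` on the socket returned by `accept()` before talking to the client.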

Some people were curious about the transfer rate, so after pondering how to put that in the spectrum program and realizing that getting a real time clock value would require me to set up an interrupt service, which might make the already rather fragile system more fragile, I opted to place that calculation in the server. It's not as accurate as it would be on the client, but it's much better than nothing.
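Measuring the rate on the server side can be as simple as timing the sends. A minimal sketch, where `timed_send` and `transfer_rate_kbps` are hypothetical names and `send` stands in for whatever actually pushes bytes to the client:

```python
import time

def transfer_rate_kbps(byte_count, seconds):
    """Average rate in kilobytes per second (1 kB = 1024 bytes here)."""
    if seconds <= 0:
        return 0.0
    return byte_count / 1024 / seconds

def timed_send(send, chunks):
    """Send each chunk with send(chunk) and return (bytes_sent, seconds)."""
    start = time.monotonic()
    total = 0
    for chunk in chunks:
        send(chunk)
        total += len(chunk)
    return total, time.monotonic() - start
```

The server only sees when it handed data over, not when the client finished with it, which is one reason the number is less accurate than a client-side measurement would be.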

The version 0.7 also added checking whether the filename packet is valid, after I got a bunch of REALLY interesting looking files on my sd card by accident; the 0.7 version also attempts to create directories if they don't exist (which seems to fail for some people), as well as a bunch of optimizations which boosted the transfer rate a bit.

At this point I figured I'm done (again) and oh boy was I wrong.

Since more people started using it and reported their uses and problems, I had to keep going. The server synced files based on the file date, which was a problem for some people who wanted to sync old files (like .tap files from archives). I added known file tracking to the server so while it still checks the file dates, it also syncs files it didn't know about before. I added options to sync all files all the time, as well as some other options.
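The selection rule described above boils down to something like the following sketch. The function name, its parameters and the in-memory data structures are assumptions for illustration; the real server may keep track of known files differently.

```python
def files_to_sync(files, known, last_sync, sync_all=False):
    """Pick which files to send.

    files:     dict mapping path -> modification time (seconds since epoch)
    known:     set of paths the server has seen on earlier runs
    last_sync: timestamp of the previous sync
    sync_all:  send everything regardless of dates
    """
    selected = []
    for path, mtime in sorted(files.items()):
        # Send files that changed since last time, plus files never seen
        # before, even if their dates are old (e.g. .tap files from archives).
        if sync_all or mtime > last_sync or path not in known:
            selected.append(path)
    return selected
```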

On the client side, I kept seeing some hangs. I kept investigating this, and I actually rewrote the whole client using the "transparent" wifi module mode. (Source code for this is on github for reference.) While I did get that working I found that the transfer rates with it were rather bad, because it only sent data every 20ms regardless of how much (or little) data there was.

While working on that version I did what I should have done in the first place, and wrote my text output as a "console i/o" thing, meaning I could just output everything the program received on screen. And I learned a lot. This was too slow for some problems, though, so I also added a mode which logs the transfers on disk, and that helped a lot too.

Turns out the wifi module might not answer "OK" or "ERROR", but just about any operation might respond with "busy". And that might go on for a very, very long time. I added handling of these cases, making the client retry until the command succeeds.
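The retry logic amounts to a loop like this one. It's written in Python for readability (the real client runs on the next), and `send_command` is a hypothetical stand-in that sends a command and returns the module's reply as a string.

```python
def send_until_done(send_command, command, max_tries=1000):
    """Repeat a command until the module answers something other than busy.

    Returns the final reply ("OK" or "ERROR"), or None if the module
    stayed busy for max_tries attempts.
    """
    for _ in range(max_tries):
        reply = send_command(command)
        if "busy" in reply:
            continue          # module is not ready; try again
        return reply
    return None
```

The `max_tries` cap is an addition for the sketch, so that a module that stays busy forever can't hang the loop.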

Unfortunately there's still some corner cases where the 'busy' handling might eat part of an incoming packet, or we might send both "get" and "retry" requests out, messing up the result. I added packet number tracking for these cases: if the packet number does not match what we expect, we start the file over.
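A minimal sketch of the packet number check, again in Python rather than the client's own language. `get_packet` is a hypothetical stand-in that requests packet n and returns whatever `(packet_number, payload)` actually arrived; the restart limit is also an assumption.

```python
def receive_file(get_packet, packet_count, max_restarts=5):
    """Collect packets 0..packet_count-1, starting over on a number mismatch."""
    for _ in range(max_restarts + 1):
        parts = []
        for expected in range(packet_count):
            number, payload = get_packet(expected)
            if number != expected:
                break                 # out of step: start the file over
            parts.append(payload)
        else:
            return b"".join(parts)    # every packet number matched
    return None                       # gave up after too many restarts
```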

Even with all of this, sometimes the transfer just hangs. This is so rare that logging it is pretty difficult, so I don't know what happens. But since it's an open mystery, it's quite likely that I'll end up making yet another version, even though I honestly feel I'm done with it. Again.

The 0.8 version also comes with a few options the users can play with if they're having problems: there's three versions of the actual sync application (slow, normal and fast), and the server can be set to send safe 256 byte packets (or unsafe ~1.5k packets). The 256 bytes is so small it fits completely in the uart buffer, so overruns are very unlikely. That does pretty much decimate the transfer rate though - the average transfer rate of 23kBps with the normal sync drops down to about 16kBps with the "safe" packets. The combination of slow sync and safe packets drops down to 6kBps.

While all of this was going on, every release improved on the documentation. Some misconceptions were fixed in the documentation, more troubleshooting items were added, some important bits were written so that people who don't actually read, but just glance at the docs will find the important bits. I hope that effort shows.

The release thread is here on the specnext forums. I've yet to create a specnext page on this site..

SpecNext Adventures

May 19th, 2020 #

The ZX Spectrum Next finally shipped, and I've been playing with it for a while. And when I say "playing" I mean making some software for it. I've ordered a joypad so I can also play some games on it, but in all honesty, I bought it as a toy to code stuff for.

The next is a quite different beast from a 48k speccy. Some things are familiar, others much less so. In some ways it's quite constrained (like the 48k was), and in others it's almost too powerful. When small devices get powerful enough, you have to ask why not write programs for a desktop machine (or heck, as a web page) instead.

The most strict limitation for the 48k speccy was storage. You get around 40k of RAM, and that's it. That's your work memory and storage. On the next you have over half a meg to play with, which can be banked at 8k granularity. And you can load more stuff in from disk whenever you want. Or off the network, if you really want. So storage limits do not, in practice, exist on the next.

Second limitation of the 48k was audio - you only had the beeper which had to be busy-loop bit-banged to make noise. The 128k speccy introduced a 3-voice AY sound chip. Next has three of those. If that's not enough, you can play samples too. No real-time mixing as of yet, though, so it's slightly limited still, but a far cry from the 48k limits.

Third limitation of the 48k was speed. A 3.5MHz z80 can only do so much in a frame. The next can boost this up to 28MHz (with some small caveats) which sounds slow in the modern gigahertz era, but feels rather fast when you're used to the 3.5MHz speed. There's also some additional opcodes in the next's z80, including multiplication (which is ironically as fast as addition). The C compiler I use doesn't support that as of yet, but I think it's only a matter of time.

Finally, fourth limitation was the graphics mode, which is a 256x192 bitmap with a 32x24 two-color overlay (i.e., you can pick the two colors for each 8x8 cell of the bitmap) with tight limitations. The graphics mode is what makes zx spectrum games instantly recognizable, because, well, it's just bad.

The next introduces a bunch of new graphics modes, including paletted 256 color ones, as well as hardware sprites. So that limitation doesn't really exist either.

Not that I'm complaining; there's still enough limits to be creative. The biggest negative for developing for the next is that your target audience is relatively small, and even the emulators are still a work in progress.

In addition to supporting the vast majority of old spectrum and spectrum-variant software, the next has two primary application formats: dot and nex. The nex format is meant for applications that completely take over the machine, which in most cases means games. The dot format is for tiny tools that behave nicely with the system and exit cleanly.

The dot commands can also be called from basic programs, which is interesting. They have a bunch of constraints, like size limit of 8k (which can be worked around if you really need to).

The first thing I wanted was to see if the zak file format I designed works on an actual machine, and I felt the dot application format was what it needs. I wrote tools to create these applications using SDCC, and made the player.

The next order of business was to create a similar toolchain for the nex format, but in all honesty I hated the way I had to swap the SD card back and forth between PC and the next. In addition to the slowness of the physical motion, the next always noticed that I took the card out and had to be reset, and the PC sometimes noticed it too and I had to unplug the sd card reader. The movement was also risky because the sd card might get corrupted, which actually did happen once.

So instead of working on the nex format, I wrote another dot command: sync.

The dot command, in addition to a python script on the pc side, lets me copy files to the next using the next's wifi module. This was the first time I've written a program that uses the network, so I'm rather happy to see that it at least seems to work, and rather fast, too.

Python was the perfect choice for the server implementation. The whole server is about 130 lines of python, and it includes file date checking, recursion to find the files to send, ignore list file (using file masks), as well as the actual file sending protocol handling.
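To give an idea of why this is so compact in python, here's a sketch of the file-finding part: recursion, an ignore list with file masks, and a date check. This is not the NextSync server code; `find_files` and its parameters are made up for illustration.

```python
import fnmatch
import os

def find_files(root, ignore_masks=(), newer_than=0.0):
    """Walk root recursively; return relative paths that should be sent.

    ignore_masks: file masks such as "*.bak", matched against the file name
    newer_than:   only files modified after this timestamp are returned
    """
    result = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if any(fnmatch.fnmatch(name, mask) for mask in ignore_masks):
                continue              # file is on the ignore list
            full = os.path.join(dirpath, name)
            if os.path.getmtime(full) > newer_than:
                result.append(os.path.relpath(full, root))
    return sorted(result)
```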

I'm not saying it's perfect, but it at least seems to work fine. After I got the file transfer going I continued development using the tool itself and it was SO much nicer than moving the sd card back and forth. The release thread with binary downloads link on next forums is here.

The Twenty-Eight

March 25th, 2020 #

When I was a kid in the 80's, it was in vogue to do holiday trips to southern Europe. We'd mostly do trips to Rhodos in Greece, but we also did a few trips elsewhere. We visited Spain and Italy. One of those trips was a typical "hotel" vacation where we'd spend most of the time in the hotel and its pool. I think it was in Italy, but it might have been in Spain. It's also possible I'm mixing the two trips. I was a kid, after all.

Anyway, I roamed the close by streets with my big brother, and we found a sandwich place which we called the "twenty eight" because that was the street number. It's likely that the place had a proper name, but I don't remember it. We'd eat there with my brother on several days, and one day we forgot to pay, and went back to do so, and the person who ran the place had forgotten that we had not paid either. Most of the times we'd sit at the sandwich place to eat, instead of taking the sandwiches with us. There was a closed English restaurant across the street - that makes me think it was probably Spain, but again, there may be English pubs in 80's Italy too.

We'd also spend time in some bar or another, when we were going around with our parents. I remember that some fancy drink my parents were having had a translucent plastic sword in it, and I said I'd want one - not the drink, but the sword. I think I ended up getting something like twenty of those. In retrospect I think I would have valued a single one more than the pile I got, so it goes to show that less is more, even when it comes to kids. Other times I'd use straws to build things to spend the time while the older people talked about things.

There were also coin-ops. I remember spending a lot of time playing one which I think was a wonder boy game, but I haven't found it afterwards. I'd get so good at it that I'd play for hours on one set of coins, which means it wasn't very good as a coin-op game (i.e., profits are low) but it was a nice game. I got good enough to find some secrets in it even - jumping on top of clouds to find a better sword (Excalibur?). There was some other Finnish kid there who wanted to play, so he'd come stand behind me and start saying things like "what do you think you're doing?" and "that's not the way to play this game" etc trying to unnerve me so I'd make mistakes and die. We thought that was hilarious, and that became one of our running jokes with my brother for years to come.

As an aside - I remember playing Yie Ar Kung-Fu as a coin-op on a ferry once, playing it through with one coin, then turning around to realize I was surrounded by a crowd of kids who had gathered to watch me play.

Things were really different back then. Who'd let 10 year old kids roam around freely in a foreign country these days? Those memories came to me when I started pondering about this pandemic we're experiencing and when - or whether - we'd get to make a holiday trip next time.

SSE and Memory Alignment

February 28th, 2020 #

Working towards a second update of SoLoud this year, fixing a bunch of omissions from the last one and adding a few little things here and there, I also looked at profiling and figured I'd add some SSE magic to the panAndExpand function which was taking about 40% of runtime from my simple benchmark loop.

The function is a prime SSE target, because it reads from one buffer, does some multiplications and writes to another buffer. Both of the buffers are already memory-aligned, so it's a breeze. The result was a dramatic speed increase, and I even considered doing an AVX version of the same, because it would be trivial, but having several code paths for various CPU feature bits isn't, so I didn't bother.

And then someone complains that there's a crash in the new code. What gives?

Well, as it turns out, while the buffers are aligned, accesses to them weren't. The way audio is stored internally inside SoLoud is that if you have 512 stereo samples, they're stored as 512 left samples followed by 512 right samples. So, to get to the right samples, you'd get the base pointer (aligned) and add 512 * samplesize to it.

Now, what happens if it's not 512 samples, but 431? Boom.

But I had tested the code, with those kinds of buffer sizes too, and had no problem.

First I suspected it was older CPUs, but these were ryzen5 and pretty recent corei7 systems.

Digging up further info about it, it seems misaligned access isn't such a big deal. On x64, the alignment check is disabled by default, and on x86, the operating system deals with it. And to enable the check, you need to flip a CPU configuration register that's only accessible to the operating system. And even if you did enable it, ALL misaligned reads would crash, including trivial ones like reading 32bit ints from non-32bit aligned addresses. Which includes stdlib calls.

I did find some code snippet that should enable the alignment check but it didn't do anything for me, so I guess the operating system doesn't have it enabled.

The common factor in the two systems where the crash occurs is gcc on mingw, so what I'm guessing is that it's been written to support only aligned calls and it tries to set the alignment check on, which then succeeds or fails based on what drivers you happen to have on your system and what state the CPU config register ends up in with said drivers.

In the end I did align those buffers by introducing a stride parameter that is passed around. It's better to have everything aligned for cache reasons anyway.
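The arithmetic behind the problem and the stride fix is easy to check. The sketch below assumes 4-byte float samples and the 16-byte alignment that aligned SSE loads and stores want; the function names are made up, and this is just the offset math, not SoLoud code.

```python
SAMPLE_SIZE = 4   # assumption: 32-bit float samples
ALIGNMENT = 16    # aligned SSE loads/stores want 16-byte aligned addresses

def channel_offset(channel, stride):
    """Byte offset of a channel's first sample from the (aligned) base pointer."""
    return channel * stride * SAMPLE_SIZE

def aligned_stride(samples):
    """Round the per-channel sample count up so every channel starts aligned."""
    per_block = ALIGNMENT // SAMPLE_SIZE          # 4 samples per 16 bytes
    return (samples + per_block - 1) // per_block * per_block

print(channel_offset(1, 512) % ALIGNMENT)                  # 0: aligned
print(channel_offset(1, 431) % ALIGNMENT)                  # 12: misaligned
print(channel_offset(1, aligned_stride(431)) % ALIGNMENT)  # 0: aligned again
```

With 431 samples the right channel starts 1724 bytes in, which is 12 bytes past a 16-byte boundary. Rounding the stride up to 432 puts it at 1728, back on a boundary, at the cost of one unused sample per channel.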

If there's an easy way to turn the alignment check on, I haven't found it.

Regrets

February 23rd, 2020 #

While in bed trying to get back to sleep in the middle of the night, I started thinking of maybe making some kind of list of things I'd wish I could have told the younger me. Not that I'd have really learned anything from being told, because the fun thing about life lessons is, while you can condense them into a few words, it's the experience that led to those words that is what actually made you learn it.

Anyway, for some reason I gave this project the name "hundred thousand", maybe because there's so many things to tell, or maybe it's a hundred things to tell a thousand people, or vice versa, or something. Doesn't matter, I was half asleep.

Most of the things I could think of aren't things to avoid, but things that help - like keeping scope small, redirecting procrastination, not being afraid to experiment, not being afraid to give out source code - but there's one thing that has led to regret several times:

Thinking I'm in a hurry when I'm actually not.

In the army whenever we left for vacation, we had a few options on how to get to the train station from the garrison. One could walk, but that would take half an hour, or we could hitch a ride from one of our car-owning fellows, or one could take a taxi or a bus, but those cost money and are in short supply. So hitching a ride was the preferred option, for me, at least. To do so, you'd have to be quick enough so everybody hasn't left yet. There were a few people who had given me a lift, but I was on pretty good terms with most people, and even some people who never gave a lift to anyone might give a lift to me.

I was a scribe, working closer with the higher ups than most people there. There were two of us, working as a pair. When the day came for them to release us from the service, there were a few hectic days of paperwork, and then suddenly we were sitting in a room with everybody else, and people were dispatched in random order. We waited anxiously for our name to be called, but it never was - the higher ups held our passes back so they could ask us to go for a coffee.

In retrospect, if they had told us this instead of letting us wait for our names to be called, things may have turned out differently.

Anyway, as you might guess by this point, I felt I was in too much of a hurry to go for coffee, took my pass (as did my partner-scribe) and left. Why was I in such a hurry? To save myself a 30 minute walk? I would probably have gotten a ride from the higher ups, or they would have organized something. I'll never know what would have happened if I had stayed.

As it turns out my partner-scribe was in such a hurry to leave that he didn't wait to give me a ride. Luckily someone else wasn't, and I got to the train station, where, if I remember correctly, I had to wait about an hour for the next train anyway.

Happier New Year Again

February 15th, 2020 #

Last year I wrote that I have three goals for 2019: release a game, track my blood pressure and play with brilliant.org to get my head working in the mornings.

Well, all of those failed, but more of that a bit later. Here's the new year's demo:

Speaking of new year demos, it's interesting that youtube claims that tAAt 2016, released on January 1st of that year, steals from a song released two years later. The whole copyright enforcement thing is a farce.

I've been a bit lazy updating this blog lately, as you may have noticed (new year's post 1.5 months late?). No promises on updates on that front. I did manage to get a new SoLoud release out.. which means I only managed one release of that last year. I'll try to get at least two releases out this year, but... no promises.

Talking of promises.

I never really got into the habit of tracking my blood pressure. To do so, I'd really have to integrate it into my morning or evening routines, otherwise it just won't happen. Whether I will manage to do that remains to be seen, but I'm not too optimistic, having failed last year.

Waking up with brilliant.org was a nice idea in theory, considering that advent of code worked so well. Unfortunately, brilliant just ended up feeling more and more like busywork instead of interesting; on one hand most of the content worked fine by just doing the work on the phone, but others then basically required doing all the work on paper. I don't have the stats, but I think I gave up on it in a couple months or so. In short, it felt like a waste of my time.

When it comes to releasing that game, well... life, and deaths, intervened, and I've been more or less in stress management mode most of 2019. Which meant I mostly worked on my Steam backlog. I've gotten a bit more done recently (working on DialogTree and SoLoud, mostly), so I'm a bit more hopeful for this year. But we'll see what happens.

Maybe it was a mistake to make new year's goals last year. As Veritasium puts it, using themes is better than goals, and when I look back to the resolutions I've made in the past, general themes have worked better than exact goals.

So.

This year, I'll try to get more hobby programming done. I have plenty of projects open, and we'll see how much those progress, or if I stumble on a completely new thing that will consume my time. The projects I'm talking about are: SoLoud updates, getting DialogTree together (it's in usable state now, but some bits are missing, including documentation update), and a demo I've been speccing for a while. I don't know if that demo will make it in time for any demo party, so I may end up releasing it just for the heck of it.

Second thing is music. I recently acquired a Yamaha Reface DX, which is a fun little thing - its dimensions are almost exactly the same as a child's toy piano, except that its keys are super nice, velocity-sensitive and polyphonic, and it contains a powerful synth engine that's more or less comparable with the classic DX7. It's not perfect (if used with batteries, the batteries die in around 5 hours, which could be helped if the rather crisp and bright screen could be turned off.. and it sometimes just resets to factory defaults, forgetting the stored patches, which is somewhat frustrating), but it's way easier to just grab and jam for a while than regular-sized keyboards.

 


Site design & Copyright © 2020 Jari Komppa
Possibly modified around: May 24 2020