Mike's dev journal

More thoughts on foreign function interfaces

Fri, 23 Aug 2024 16:04:09 -0300

I have been thinking some more about Using a C library from Java, especially the Translate to JVM approach.

When I tried this, I mostly got caught up in compiling the target library to MIPS in the first place. I tried building a GCC cross compiler for MIPS, but ran into some build errors and kinda gave up.

Since then, I’ve looked at using clang to target a MIPS binary, and this looks feasible, but I haven’t tried it. There’s also an option to build to LLVM IR and interpret that. This seems like a good idea, but I’ve heard, via other projects that work off LLVM IR, that it’s something of a moving target; also, it seems like a more complex process than it would be to integrate a MIPS core.

I’ll need to compile a C library with enough functionality to support the target library, and the library has C++ components so I’ll need to deal with that too. At least the library exposes a C interface so I don’t need to build C++ FFI glue.

I identified all of these problems earlier on, but despite all this I wasn’t quite sure how I would actually call the target library’s functions from outside the MIPS core. I knew I could use interrupts to switch out of the MIPS context, like syscalls do to switch from user to kernel context, but I didn’t quite know where to fit that machinery in.

So I came up with some options:

I could write some glue, in C, that I package with the library, to provide a syscall-like interface between the MIPS guest and the host (Java) code. The Java code acts as a supervisor and has access to the MIPS core’s registers, memory (& stack), etc., so would be able to use that for passing arguments & context around. But if possible, I’d like to do it without writing C glue code.
I can have some host (Java) code that sets up the stack & environment etc. required to call a function, and then starts executing at the right location. It’ll have to capture the return somehow, which I can either intercept with something akin to a debugger breakpoint, or place some code in memory to invoke an interrupt (as in option 1).
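The second option can be sketched with a toy machine. This is only an illustration, assuming a made-up two-instruction guest rather than real MIPS; `ToyCore`, `call_guest`, and `RETURN_TRAP` are hypothetical names. The host loads the arguments into registers, plants a trap address as the return address, and runs the guest until the program counter lands on the trap.

```python
# Toy illustration only: a made-up guest machine, not a real MIPS core.
RETURN_TRAP = 0xFFFFFFF0  # hypothetical address that never holds guest code


class ToyCore:
    def __init__(self, program):
        self.program = program  # maps address -> instruction tuple
        self.regs = {"a0": 0, "a1": 0, "v0": 0, "ra": 0}
        self.pc = 0

    def step(self):
        op, *args = self.program[self.pc]
        if op == "add":  # add dst, src1, src2
            dst, src1, src2 = args
            self.regs[dst] = self.regs[src1] + self.regs[src2]
            self.pc += 4
        elif op == "jr":  # jump to the address held in a register
            self.pc = self.regs[args[0]]
        else:
            raise ValueError(f"unknown instruction: {op}")


def call_guest(core, address, a0, a1):
    """Host-side call: load arguments, plant the trap, run until the guest returns."""
    core.regs["a0"], core.regs["a1"] = a0, a1
    core.regs["ra"] = RETURN_TRAP  # the guest's return jumps here
    core.pc = address
    while core.pc != RETURN_TRAP:  # acts like a breakpoint on the return
        core.step()
    return core.regs["v0"]


# Guest function at 0x1000: v0 = a0 + a1, then return.
program = {
    0x1000: ("add", "v0", "a0", "a1"),
    0x1004: ("jr", "ra"),
}
result = call_guest(ToyCore(program), 0x1000, 2, 40)
print(result)
```

A real core would also need a stack pointer, guest memory for pointer arguments, and the platform's actual calling convention.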

It occurred to me while I was considering this option, that even though I’m running the code in a virtual host environment that’s local to my own program, it’s still C code and I still have to think about all the usual FFI stuff: Type signatures, struct packing, memory management etc. That means that even if I run the library in an emulated core, I’ll still end up building an interface similar to Java Native Access or the newer Java Foreign Function and Memory (FFM) API. I guess I kind of hoped that part of the process would be simpler.

I still think it’s a viable solution with real benefits, especially if I need to target multiple platforms. I also wonder if it might be feasible to compile or transpile the C library directly to Java, whether to source code or bytecode. That sounds far more difficult to me because it depends on compiler internals, but maybe building a new backend onto an existing compiler could be practical?

On generating structured data from templates

Mon, 17 Jun 2024 22:08:32 -0300

Recently I chatted with a friend about generating structured data from templates. Specifically, he observed that Jekyll’s atom feed is generated from an XML template. He posted about his experience.

I’ve long felt that it’s extremely challenging to correctly use templates to generate structured data, meaning files like source code - HTML/XML/C/etc. I don’t particularly like templating, but I acknowledge its value, especially in HTML templating where you’re mostly writing static markup to represent the page layout. In fact, template-engine substitutions in HTML templates are only a small part of most HTML template files.

The problem I have is that the template engine doesn’t actually know the context for how it’s making substitutions. No HTML template engine parses the HTML and decides what escaping rules are appropriate for each piece of data being placed into the generated output.

This post is kinda rambly, but I have some lightly-organized thoughts about templating that I wanted to put into words.

A typical problem

Let me give an example:

<div id="name">Hello, {{ email }}!</div>
<script>
    database.saveEmail('{{ email }}');
</script>

If email is 'Baz' <foo.bar@example.com>, what happens?

I see two typical options, both of which produce the wrong output:

The value of email is substituted directly into the template with no modification. The resulting output is incorrect:

  <!-- WRONG! <foo...> is not a valid HTML tag! -->
  <div id="name">Hello, 'Baz' <foo.bar@example.com>!</div>
  <script>
      // WRONG! Incorrectly quoted; leads to syntax error!
      database.saveEmail(''Baz' <foo.bar@example.com>');
  </script>

The value of email is escaped for HTML. Django and many other template engines will do this for regular strings.

  <!-- Correct: The string is escaped for inclusion as plain text in HTML. -->
  <div id="name">Hello, 'Baz' &lt;foo.bar@example.com&gt;!</div>
  <script>
      // WRONG! Still incorrectly quoted, and now we're incorrectly
      // storing HTML entities in the database instead of the
      // original string!
      database.saveEmail(''Baz' &lt;foo.bar@example.com&gt;');
  </script>

Comparison to SQL

The problem is similar to SQL injection, although the stakes are usually a lot lower. The typical way to avoid SQL injection bugs when writing SQL is to use the SQL-aware interpolation functions provided by our SQL libraries:

db.execute('SELECT * FROM foo WHERE email = ?', email)

In SQL, this works because SQL allows every value in a query to be represented as a string literal with consistent quoting & escaping rules, regardless of the data type of the value being represented:

select '3'::integer + '4'::integer;
-- Result: 7

But SQL-aware interpolation only works for data. Other SQL syntax and identifiers cannot be represented as string data, so this code:

# This won't work!
table_name = "user_table"
db.execute("SELECT COUNT(*) FROM ?", table_name)

results in an invalid query:

SELECT COUNT(*) FROM 'user_table';
-- ERROR:  syntax error at or near "'user_table'"

In SQL, you often avoid this by marking the interpolated string as “safe”, which indicates that you’ve already verified that it won’t lead to problems if it’s substituted raw:

table_name = "user_table"
db.execute("SELECT COUNT(*) FROM ?", AsIs(table_name))

You do have to be really careful that table_name doesn’t include anything malicious, since its contents will be interpreted as valid SQL syntax. I might even suggest that we should be able to tag it as a different kind of identifier, like TableName(table_name), so the interpolating code can validate/quote/escape it for use ONLY as a table name.
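Here's a rough sketch of what a table-name-only helper could look like, using Python's built-in sqlite3 module as a stand-in for whatever database library is in use. `quote_table_name` is a hypothetical helper, and the allow-list pattern is an assumption; a real project might accept more characters.

```python
import re
import sqlite3

# Assumption: table names are plain ASCII identifiers.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_table_name(name: str) -> str:
    """Validate and quote a value for use ONLY as a table name."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"not a plain table name: {name!r}")
    return '"' + name + '"'


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_table (email TEXT)")
conn.execute("INSERT INTO user_table VALUES (?)", ("foo@example.com",))

table_name = "user_table"
# The identifier is validated and quoted; data values would still use placeholders.
query = f"SELECT COUNT(*) FROM {quote_table_name(table_name)}"
count = conn.execute(query).fetchone()[0]
print(count)
```

Anything that isn't a plain identifier is rejected outright instead of being escaped, which keeps the helper easy to reason about.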

Usage contexts

The main problem I see with HTML and other languages is that there are way more different kinds of contexts that variables get substituted into.

Above, I showed that one escaping rule isn’t sufficient when a variable gets substituted into both HTML and JavaScript.

A common solution here is to indicate the context to your template engine - perhaps using a filter:

<div id="name">Hello, {{ email | html }}!</div>
<script>
    database.saveEmail('{{ email | javascript_string }}');
</script>
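As a rough idea of what those two filters might do, here is a minimal Python sketch. `html_filter` and `javascript_string_filter` are hypothetical names, not any particular template engine's API, and the JavaScript filter assumes the value lands inside a single-quoted string literal like the one in the template above.

```python
import html
import json


def html_filter(value: str) -> str:
    """Escape text for inclusion as plain text or an attribute value in HTML."""
    return html.escape(value, quote=True)


def javascript_string_filter(value: str) -> str:
    """Escape text for the inside of a single-quoted JavaScript string literal."""
    # json.dumps produces a double-quoted literal with backslashes, double
    # quotes, control characters and non-ASCII text escaped. Drop the outer
    # quotes, then escape the single quotes that JSON leaves alone.
    body = json.dumps(value)[1:-1]
    return body.replace("'", "\\'")


email = "'Baz' <foo.bar@example.com>"
page = (
    f'<div id="name">Hello, {html_filter(email)}!</div>\n'
    "<script>\n"
    f"    database.saveEmail('{javascript_string_filter(email)}');\n"
    "</script>"
)
print(page)
```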

This works OK when the number of different usage contexts is small, like in this example, but I don’t like that you have to remember to use the right filter every time you code in a substitution. If the default behavior is to escape for HTML, you’ll start omitting the | html part, and then it’s easy to accidentally miss the | javascript_string filter because it’s used so much less frequently.

And if you do miss it, will you even notice? You’ll only see problems with strings that contain syntax that’s meaningful to JavaScript. So it becomes a bug that happens infrequently, which makes it harder to find later on. This is actually also a problem that SQL interpolation would suffer from:

# Don't do this! It's SO UNSAFE!
# But it results in working code, and doesn't even break for most typical
# inputs, and that's almost worse!

email = request.GET['email']     # eg. foo@example.com
db.execute(f"SELECT * FROM foo WHERE email = '{email}'")
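To make the "works for typical inputs" point concrete, here is a small sketch using Python's built-in sqlite3 module as a stand-in database. The function names and the sample address are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (email TEXT)")
conn.execute("INSERT INTO foo VALUES (?)", ("o'brien@example.com",))


def find_unsafe(email):
    # String interpolation: whether the SQL is even valid depends on the input.
    return conn.execute(f"SELECT * FROM foo WHERE email = '{email}'").fetchall()


def find_safe(email):
    # Placeholder: the value is passed separately from the SQL text.
    return conn.execute("SELECT * FROM foo WHERE email = ?", (email,)).fetchall()


# A typical address runs without error, so the bug stays hidden.
print(find_unsafe("foo@example.com"))

# An apostrophe ends the string literal early and breaks the query.
try:
    find_unsafe("o'brien@example.com")
    unsafe_failed = False
except sqlite3.Error:
    unsafe_failed = True
print(unsafe_failed)

print(find_safe("o'brien@example.com"))
```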

Too many contexts

If you have a lot of different contexts that substitutions need to be placed into, it can be arduous to make sure they’re all correct:

# NOTE: Function {{ fn_name | for_comment }} is generated from a template.
def {{ fn_name | for_identifier }}():
    num1 = {{ num1 | for_number }}
    num_squared = num1 * num1

    logging.debug(r'{{ name | for_raw_string }}')

    print(f'Hey {{ name | for_string }}, your squared number {{ num1 | for_string }} is {num_squared}!')

The idea here is that you need to escape/quote/etc. values depending on how they’re being used. Like, in a string, \ should be escaped to \\, but that would be inappropriate in a raw string. Numbers shouldn’t have spaces or anything in them.
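A few of those filters could look something like this in Python. This is a sketch under assumptions: the filter names come from the template above, but the rules inside them are my guesses at reasonable behaviour, and `for_string` only targets a plain single-quoted string (not a raw string or an f-string, which would also need braces handled).

```python
import keyword
import math


def for_identifier(value: str) -> str:
    """Allow only text that is a valid, non-keyword Python identifier."""
    if not value.isidentifier() or keyword.iskeyword(value):
        raise ValueError(f"not a usable identifier: {value!r}")
    return value


def for_number(value) -> str:
    """Allow only ints and finite floats, rendered as a Python literal."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise ValueError(f"not a number: {value!r}")
    if isinstance(value, float) and not math.isfinite(value):
        raise ValueError(f"no literal form for: {value!r}")
    return repr(value)


def for_string(value: str) -> str:
    """Escape text for the inside of a single-quoted, non-raw, non-f string."""
    escaped = value.encode("unicode_escape").decode("ascii")
    return escaped.replace("'", "\\'")


# Generate a tiny function and check that the string survives the round trip.
name = "It's a \\ test\n"
source = (
    f"def {for_identifier('greet')}():\n"
    f"    return '{for_string(name)}' * {for_number(2)}\n"
)
namespace = {}
exec(source, namespace)
print(namespace["greet"]() == name * 2)
```

Rejecting bad identifiers and numbers, rather than trying to repair them, seems like the safer default for generated code.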

Not using templates

There are libraries that allow you to generate HTML by writing code in your host language. This is comparable to generating JSON using a JSON library, or XML using an XML library, and it can also be a pain:

doc = htmltag('html')
body = htmltag('body').add_to(doc)

# Static elements are too much work to create.
div = htmltag('div', {
    'id': 'outermost'
}).add_to(body)

# Attribute values are too much work, but they'll be properly escaped on output.
textinput = htmltag('input', {
    'type': 'text',
    'name': 'user_email',
    'value': email_address
}).add_to(div)

# When htmltext is rendered to HTML, its contents are escaped.
htmltext(f'Hello, {email}!').add_to(div)

response.send(doc.render_to_html())
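The htmltag/htmltext functions above are made up, but the same idea can be tried with Python's standard xml.etree.ElementTree. This is only a stand-in: it serializes as XML rather than HTML, and I've wrapped the greeting in a p element (my addition) to keep the tree simple.

```python
import xml.etree.ElementTree as ET

email = "'Baz' <foo.bar@example.com>"

doc = ET.Element("html")
body = ET.SubElement(doc, "body")
div = ET.SubElement(body, "div", {"id": "outermost"})

# Attribute values are escaped when the tree is serialized.
ET.SubElement(div, "input", {"type": "text", "name": "user_email", "value": email})

# Text content is escaped on output too.
greeting = ET.SubElement(div, "p")
greeting.text = f"Hello, {email}!"

markup = ET.tostring(doc, encoding="unicode")
print(markup)
```

The library knows whether it's writing an attribute or text, so the caller never picks an escaping rule.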

In my experience, nobody wants to write HTML in anything other than an HTML file. Totally understandable - editors have good syntax highlighting, feedback, etc. for HTML, and you’re mostly writing static HTML anyway with only a few substitutions here and there.

What do I want anyway

I don’t really know. I think templating has substantial problems.

I feel like frameworks usually have default settings that “just work” for most cases, so it’s easy to get complacent in less-common situations. Or, people fail to gain an understanding of how templating works, and of the gotchas when they need to substitute into different contexts, like JavaScript code.

Even if you’re well-aware of the limitations and gotchas, it’s also easy to make a mistake and not notice until some unusual text shows up and breaks your output.

By the way:

<script>
    database.saveEmail('{{ email | javascript_string }}');
</script>

It’s inappropriate to replace < and > with HTML entities in JavaScript strings, so they’ll get substituted verbatim in the string. What happens if email contains the text </script>?
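One possible way to handle that case, as a sketch: keep the value a JavaScript string, but write every < as the JavaScript escape \u003c so the HTML parser never sees a closing script tag. `javascript_string_in_script` is a hypothetical filter name, and it assumes a single-quoted string literal inside an inline script block.

```python
import json


def javascript_string_in_script(value: str) -> str:
    """Escape text for a single-quoted JS string inside an inline <script>."""
    body = json.dumps(value)[1:-1].replace("'", "\\'")
    # JavaScript reads \u003c as "<", but the HTML parser only sees a backslash
    # sequence, so the text can no longer close the script element early.
    return body.replace("<", "\\u003c")


email = "</script><script>alert(1)</script>"
line = f"database.saveEmail('{javascript_string_in_script(email)}');"
print(line)
```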

Using a C library from Java

Tue, 28 May 2024 23:51:02 -0300

Recently I’ve been considering making Java bindings to an open-source C library.

It’s such a pain though.

Update: On 2024-08-23, I wrote another post about this topic.

Native binding to C library

Traditionally you’d do this with JNI:

Compile the C library,
Write some JNI glue in C and Java,
Package it all up into a JAR

Writing JNI isn’t trivial. I experimented with it and there are a lot of gotchas around memory management and string handling. I’m confident I could manage it but it’s a lot of work even to just move a UTF-8 string from C into Java without leaking memory or mis-handling exceptions.

Java 22 reached General Availability recently (March 2024) and it includes the first non-preview release of the Java Foreign Function and Memory (FFM) API, which is like a libffi or Python ctypes mechanism for Java - which Java Native Access (JNA) also already provided.

With that approach, you don’t write any glue code in C: Instead, you describe the C library’s exports in Java and use FFM/JNA to access them.

So then, your process looks like:

Compile the C library,
Write FFM/JNA glue in Java,
Package it all up into a JAR

It’s still not perfect, though. C allows platform- and implementation-dependent definitions of primitive types, standard library types, typedefs, etc. These are resolved at compile time, so JNI glue, being compiled C itself, gives you the opportunity to adapt to the platform’s specifications in a general way. With FFM/JNA, you have to describe those types yourself in Java.

I wrote about this problem before: Using setjmp/longjmp from Java.

setjmp is a pretty obscure example, though. Here’s an easier example: long int is 64 bits on Linux x86_64, but 32 bits on Windows x86_64, and also on both Linux & Windows x86_32. So if you want to call unsigned long strtoul(...), you need to know how big unsigned long is at runtime when you’re describing strtoul to FFM/JNA.
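Python's ctypes has the same shape of problem, so it makes for a quick illustration. This sketch prints the size of unsigned long on whatever platform runs it, then describes and calls strtoul by hand. The call is guarded to POSIX systems because `ctypes.CDLL(None)` (symbols already loaded into the process) isn't available on Windows.

```python
import ctypes
import os

# Size of C's unsigned long according to this platform's ABI.
ulong_bits = ctypes.sizeof(ctypes.c_ulong) * 8
print(f"unsigned long is {ulong_bits} bits here")

parsed = None
if os.name == "posix":
    libc = ctypes.CDLL(None)  # symbols already loaded into this process
    # Describe strtoul by hand: the return type must match the platform's
    # unsigned long, which ctypes resolves for us through c_ulong.
    libc.strtoul.restype = ctypes.c_ulong
    libc.strtoul.argtypes = [ctypes.c_char_p, ctypes.c_void_p, ctypes.c_int]
    parsed = libc.strtoul(b"4294967295", None, 10)
    print(parsed)
```

ctypes ships a c_ulong that already matches the platform; when you describe the function yourself in Java, picking the right width is your job.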

In theory, types and sizes will vary depending on:

Operating system (Linux, Windows, macOS, …)
C library (glibc, musl, MSVC, …)
Compiler (gcc, clang, Visual C++, …)

The above typically choose different behaviour depending on CPU architecture (x86_32, x86_64, 64-bit ARM, …)

Compile the C library

The C library you’re wrapping also needs to be compiled to match all of the above too.

Most Linux distributions standardize on glibc, but musl is also common (like on Alpine Linux, which is extremely common in Docker images).

Practically speaking, you’ll need to link dynamically against the same C library that’s being used on the system. If you bring another C library in (like through static linking, or including it as a pre-packaged dynamic library), you’re likely to encounter conflicts with the system-installed library.

You can avoid that problem with statically-linked executables, but libraries are not executables, so they have less control over their immediate execution environment. That is, they need to stay compatible with other libraries that are also linked into the same project, which are certainly using the system C library.

More precisely, you need to link against a version of the C library that is ABI-compatible with the one that’s present at runtime: If I compile my library against glibc 2.13 x86_64 Linux, I can be pretty confident it’ll run on glibc 2.15 x86_64 Linux, because glibc is backward-compatible. However, glibc is not forward-compatible, so it won’t run on glibc 2.10 x86_64 Linux. And of course, it won’t work on x86_32, musl, etc.

This doesn’t apply to JUST the C library, but any dependency you need to link against. That could include a C++ standard library or other more exotic dependencies, depending on the library you’re trying to wrap.
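The glibc rule above (runs on the same or a newer version, not an older one) can be written down as a tiny check. This is a sketch: `parse_version` and `can_run` are hypothetical helpers, and it ignores everything except the version number (architecture, musl vs glibc, other dependencies).

```python
import platform


def parse_version(text: str) -> tuple:
    """Turn '2.13' into (2, 13) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def can_run(built_against: str, runtime: str) -> bool:
    """True when the runtime glibc is the same version or newer than the build's."""
    return parse_version(runtime) >= parse_version(built_against)


print(can_run("2.13", "2.15"))
print(can_run("2.13", "2.10"))

# What this Python build itself was linked against, where it can be detected.
lib, version = platform.libc_ver()
print(lib or "unknown libc", version or "unknown version")
```

Comparing tuples instead of strings matters here: as text, "2.9" sorts after "2.10".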

And since Java is used on so many different platforms…

… you end up having to compile your library for every possible combination you’re willing to support.

Look at the matrix of SQLite-JDBC supported operating systems:

They compile the SQLite C library separately for each of those targets! You can peek at the platform targets in the Makefile for a hint on how they’re cross-compiling. I find their approach very impressive, but it sure seems like a lot of work to maintain!

Translate to JVM

Emscripten compiles C code and libraries to JavaScript. At a very high level, it does this by providing C standard library functionality (primarily, the functionality provided by the OS kernel), and using JS/WASM as the compilation target.

In JS, you basically have no choice: You can’t run native code in the JavaScript sandbox, so you have to provide everything as JS code.

You’d expect a performance hit for this, but that’s OK for a lot of libraries. This is true for me, too: The library I want to provide in Java provides unique functionality, and doesn’t necessarily need to run fast.

So I’d like to use a similar approach in Java:

Compile a C library to JVM bytecode (or even plain Java code),
Write glue to provide a better Java-style interface,
Package it all up as a JAR

This is totally feasible. I’ve found two projects that use translation to achieve this:

LLJVM translates LLVM IR (bitcode) to Java bytecode and provides C library via newlib & custom Java. Inactive since ~2010.
NestedVM translates MIPS binaries to Java bytecode. GCC can create MIPS binaries. Inactive since ~2009 with some more recent updates available on a fork.

A lot of the discourse I’ve read focuses on how people want to call native C libraries from Java because the native code is expected to perform better, so this kind of “translation” approach typically gets dismissed: Why write your high-performance code in C in the first place if you’re going to run it in the JVM?

The library I want to wrap:

Doesn’t require high performance
Wasn’t written by me, so I didn’t have the choice to write it in Java vs. C
Has unique and specialized functionality that’s hard to replicate
Already works in Emscripten

So I’d love to try a translation approach and see how it works. It has some pretty significant advantages over using a native library:

No difficulty compiling for all platform, OS, CPU architecture, compiler, and standard library configurations
No difficulty porting, running, and testing on obscure platforms/configurations
Low maintenance: Java code, even compiled, typically ages well and works unmodified for years/decades

It sounds absolutely delightful!

Uncommon zsh shell techniques, part 1

Mon, 08 Apr 2024 11:47:02 -0300

Some of these work in other shells too, but I only use zsh these days.

Anonymous functions

() { cp "$1" /tmp/ } filename

This works the same as cp filename /tmp/, but it’s more convenient in some cases:

When you’re running the same command on many filenames, and want to use command history (up + enter) to modify the filename. You don’t have to position the cursor onto the filename mid-command - it’s just at the end.

When you use the argument multiple times, or need to use variable modifiers on the input:

# Print the directory containing the passed file
() { echo "${1:A:h}" } file.csv

# Transcode to mp3, unless the source is already mp3
() { newfn="${1:r}.mp3"; if [[ "$1" != "$newfn" ]]; then echo "$1 -> $newfn"; ffmpeg -i "$1" "$newfn"; fi } foo.flac

It’s also safe to use with spaces & other sensitive characters.

-c with variables

# Print every file's path with its extension removed
find . -type f -exec zsh -c 'printf "%s\n" "${1:r}"' . '{}' ';'

It’s tempting and common to place {} directly into the argument passed to zsh -c, like this:

# DON'T DO THIS!
find . -type f -exec zsh -c 'printf "%s\n" "{}"' ';'

This will cause problems for filenames that contain special characters like ", because find (and many other programs) won’t escape them. How could it? It doesn’t know what escaping strategy to use, because it depends on the command you’re invoking. For example, we’re using zsh here, but if you were writing inline Python code you’d need to escape the string following Python rules instead of zsh.

By passing the argument to zsh -c instead, you can use $1 in zsh as a variable with all the safety that comes along with that. You also get to use variable modifiers like :r.

Note also:

I passed . to act as the $0 argument to the command-line script. I’m not using the value of it in the script, but I need to pass it so that the filename is passed as $1.
I used printf instead of echo because echo will try to interpret filenames like -n as an option.
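The same rule applies when the thing launching the shell is another program. Here's a small Python sketch of it, using sh instead of zsh only because sh is more likely to be installed; the filename is a made-up worst case. The filename travels as its own argument and shows up in the script as $1, so nothing needs escaping.

```python
import subprocess

# A deliberately awkward filename: quotes, spaces, and a dollar sign.
filename = 'odd "name" with $HOME.txt'

# The script text is fixed; "." fills $0 and the filename arrives as $1.
script = 'printf "%s\\n" "$1"'
completed = subprocess.run(
    ["sh", "-c", script, ".", filename],
    capture_output=True,
    text=True,
    check=True,
)
print(completed.stdout, end="")
```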

Globbing flags and qualifiers

I find the zsh documentation on filename generation pretty hard to read, but here are some examples I use that might help:

Globbing flags

Globbing flags appear right before the part of the glob you want to apply them on. I usually apply them to the whole pattern so I put them right at the start.

These require extended_glob (see docs) to be set.

# Match all .jpg files, matched case-insensitively (so it also includes
# *.JPG, *.Jpg, etc.), like the option nocaseglob.
setopt extended_glob
echo (#i)*.jpg

Glob qualifiers

Glob qualifiers are suffixes that modify how the glob works.

# List all jpg and gif files. No matches = no arguments.
echo *.jpg(N) *.gif(N)

Adding (N) to a glob string makes it expand to no arguments if there are no matches (same as the null_glob option). Without this, you’ll typically either pass the raw argument *.jpg(N) if there are no matches, or zsh won’t run the command and will raise an error instead.

The exact default behaviour depends on the setting of options null_glob, nomatch, and csh_null_glob.

Together

You can use them together:

# Match GIF, JPG, JPEG, HEIF, AVIF, and PNG extensions
# with case-insensitive matching,
# and run with no arguments if no files match.
setopt extended_glob
echo "Image files:" (#i)*.(gif|jpe#g|heif|avif|png)(N)

Bluetooth codec scripts for pulseaudio

Thu, 21 Mar 2024 19:58:19 -0300

I made some scripts to help me see what codecs are supported by my Bluetooth audio devices, and select the one I want.

My devices were coming up with the sbc codec which is the most basic codec, but they support higher-bitrate codecs. Selection is a little clumsy on my headless, ssh-access-only Linux box that I’m playing audio from.

My devices support for example:

sbc: SBC
sbc_xq_453: SBC XQ 453kbps
sbc_xq_512: SBC XQ 512kbps
sbc_xq_552: SBC XQ 552kbps

I’m surprised my devices only support sbc codecs and not aac/mp3/whatever else. Actually, I don’t know what’s even typical! Do other operating systems use other codecs? I don’t know! Maybe I’ll try to find out what codecs these devices use on macOS or Windows someday.

It’s also possible I’m not seeing other options here because pulseaudio only supports sbc for Bluetooth. I have read that pipewire has better Bluetooth codec support, but I’m not currently willing to swap a working audio setup (pulseaudio) for one that might need tweaking (pipewire).

The scripts are available on GitHub.

Current mood: 😀 accomplished
Current music: Big Wreck - Hey Mama

My first 4K monitor, on Windows

Thu, 07 Mar 2024 15:58:19 -0400

I just got a pair of 4K monitors - one for a Mac Mini, and one for Windows.

The Mac is hooked up over HDMI and I use it purely for desktop applications. It works fine.

But on Windows, I’ve encountered a surprising number of issues.

Problem 1: No display during boot

I connected the monitor using DisplayPort because it seemed most appropriate to my video card, a GeForce GTX 960. It has 3 DisplayPort ports, and only one HDMI port; and I didn’t know if the HDMI port supported 4K at 60 Hz (it does), but I knew the DisplayPort ports would do it.

I swapped the monitor while the computer was on, and everything was fine… but when I rebooted, I had no display.

FIX: It wasn’t super easy to find information about this but eventually I found a post that pointed me towards an NVIDIA firmware update tool for DisplayPort 1.3 and 1.4 displays that fixes the issue:

Without the update, systems that are connected to a DisplayPort 1.3 / 1.4 monitor could experience blank screens on boot until the OS loads, or could experience a hang on boot.

Problem 2: Euro Truck Simulator 2 stuck minimized

UPDATE: Fixed: My Epson scanner software includes a tray icon. If I kill the process, this problem goes away. I guess it’s stealing focus when the resolution & scale change? Even though it isn’t actually showing a window? 🙄

My video card can’t handle 4K resolutions at a reasonable framerate, so I’m running games at 1080p. Also, I often stream games to my living room TV using Steam, and it’s a 1080p TV so it fits better.

When I launch Euro Truck Simulator 2, it immediately minimizes into the background, and any attempt to restore it brings it up for a brief moment but then it goes minimized again.

It doesn’t happen if one of the following is true:

ETS2 is run at the desktop resolution—but at 4K it takes a severe framerate hit… or
Windows display scale is set to 100%—but at 27” 4K, 150% is far more usable. This is the workaround I’m using but I wish I didn’t have to!

I don’t know if this is an ETS2 problem specifically, or a Windows problem. I assume other games will be affected too, but I’ve only tried Cities: Skylines and it has no such issue. ETS2 actually changes the desktop resolution for fullscreen, while Cities: Skylines uses a borderless mode that leaves the desktop resolution unchanged; this might explain the difference.

Problem 3: 1080p not pixel perfect

A 4K monitor can theoretically upscale 1080p using pixel doubling, where each 1080p pixel is displayed with four 4K pixels (doubled in both X and Y axes). I want this because it looks clear and perfect, as though I’m using a 1080p monitor…

… but my particular monitor (LG 27UL550-W) doesn’t do this - it performs smoothing/interpolation of some sort on the upscale, and as a result it looks blurry.

I feel that my GPU drivers should be able to render at 1080p but output at 4K, but if they can, I haven’t found out how.

UPDATE: Integer scaling is available in the NVIDIA control panel for Turing-architecture GPUs (GeForce 16xx, GeForce 20xx and up). I have a 960 so outta luck!!
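For reference, the pixel doubling described above amounts to something like this. It's a plain-Python sketch on a 2D list of pixel values, with `pixel_double` as a hypothetical name; real scaling would happen in the GPU or the monitor's scaler.

```python
def pixel_double(image):
    """Upscale a 2D list of pixels 2x in both axes, with no interpolation."""
    doubled = []
    for row in image:
        # Repeat every pixel horizontally...
        wide_row = [pixel for pixel in row for _ in range(2)]
        # ...then emit the widened row twice to double it vertically.
        doubled.append(wide_row)
        doubled.append(list(wide_row))
    return doubled


small = [
    [1, 2],
    [3, 4],
]
for row in pixel_double(small):
    print(row)
```

Each source pixel ends up as a 2x2 block of identical pixels, so a 1920x1080 frame fills 3840x2160 exactly.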

Problem 4: DisplayPort disconnects when monitor off

When I turn off the monitor, the computer sees that as a disconnected display. This is a well-known hotplug detection feature.

This isn’t really a problem when I’m sitting in front of it, but I like to stream games from the computer to my living room TV.

When I do that, I want to turn off the locally-attached display. If I do, then games basically don’t work - they see no display connected and aren’t able to select a display resolution because there’s no display to switch on. So streaming just doesn’t work at all.

Even if I’m not streaming, I prefer to have direct control over the power of my display, instead of having to use the display sleep timer to shut it off.

WORKAROUND: Use HDMI, but long-term I’m probably just gonna have to live with this problem because I understand some features require DisplayPort, like FreeSync. Some monitors have an option to turn off while appearing connected to the computer, but mine doesn’t!

Other thoughts

These are all small-ish problems. Some of them have workarounds or whatever, but like, they’re all surprising issues that I feel shouldn’t happen at all. And I’ve only had the monitor for one day!

Hardware

MSI NVIDIA GeForce GTX 960
MSI Z270-A Pro motherboard
Windows 10 up-to-date
NVIDIA drivers up-to-date

My old monitor is a 2560x1440 panel connected over dual-link DVI. It exhibited none of the above problems, but I used it at 100% scale, at native resolution, and without DisplayPort.

Un-mangling some mangled unicode

Sun, 27 Aug 2023 13:56:26 -0300

Recently I got some data from an external source that I’m to review and correct prior to use. One of the things I’ve been addressing is weird Unicode encoding stuff.

For example:

b'O\xc3\x82\xc2\x80\xc2\x99SAMPLA'

Clearly this is supposed to have an apostrophe ’, but how on earth did it get turned into \xc3\x82\xc2\x80\xc2\x99?

After poking at it with different coding systems for a while, I finally figured it out:

# Mangle input
('’'
    .encode('utf-8')    # b'\xe2\x80\x99'
    .decode('latin-1')  # 'â\x80\x99'
    .upper()            # 'Â\x80\x99'
    .encode('utf-8')    # b'\xc3\x82\xc2\x80\xc2\x99'
)

I’ve never seen mangled Unicode get passed through .upper() before. I wasn’t around to see this data get created in the first place, but my guess is something like this happened:

Software A accepted the input O’SAMPLA
Software A exported the data using UTF-8 encoding
Software B imported the data but incorrectly interpreted it using Latin-1 encoding
Software B uppercased the data (typical for this software)
Software B exported the data using UTF-8 encoding

Here’s the reverse, to restore the original data:

# Fix mangled input
(b'O\xc3\x82\xc2\x80\xc2\x99SAMPLA'
    .decode('utf-8')    # 'OÂ\x80\x99SAMPLA'
    .lower()            # 'oâ\x80\x99sampla'
    .encode('latin-1')  # b'o\xe2\x80\x99sampla'
    .decode('utf-8')    # 'o’sampla'
    .upper()            # 'O’SAMPLA'
)

This works for this particular input because Â needs to become â before the latin-1/utf-8 interpretation steps, but I don’t consider it appropriate to assume this will work for all inputs. Some inputs may not have been affected at all by upper(), and it would be incorrect to apply lower() to them.

Unfortunately I can’t predict with total confidence whether applying lower() is appropriate for each input, so this data is gonna require manual review.
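To help with that review, a helper along these lines could list the plausible originals for each value. This is a sketch, not a general Unicode repair tool: `candidate_fixes` is a hypothetical name, and it assumes the exact mangling chain described above, with the final output always uppercased.

```python
def candidate_fixes(raw: bytes) -> list:
    """Return the distinct plausible originals for one possibly-mangled value."""
    text = raw.decode("utf-8")
    candidates = []
    # Try undoing the Latin-1 misreading both with and without lower() first.
    for variant in (text, text.lower()):
        try:
            fixed = variant.encode("latin-1").decode("utf-8").upper()
        except UnicodeError:
            continue  # this variant can't be a Latin-1 misreading of UTF-8
        if fixed not in candidates:
            candidates.append(fixed)
    return candidates


print(candidate_fixes(b"O\xc3\x82\xc2\x80\xc2\x99SAMPLA"))
print(candidate_fixes(b"SMITH"))
```

One candidate suggests a likely fix, several candidates need a human to choose, and an empty list means the value doesn't fit this mangling pattern at all.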

Hurdles to making a multitasking environment on the NES

Wed, 23 Aug 2023 18:35:32 -0300

I’ve been thinking about what would be required to make a multitasking environment/platform on the NES.

Requirements:

Can load applications on-demand as independent processes
Can launch multiple instances of each application
Uses cooperative multitasking

Realistically you’ll want the cartridge to have some RAM and allow bank switching for both RAM and ROM in order to increase the memory & storage available to programs.

The 6502 has a single stack that’s fixed to live from $100 to $1ff. This is mapped to console RAM and can’t be bank-switched. Each process wants its own stack, so they’ll either have to share this very limited space, or you’ll have to swap the contents of the stack when switching tasks.
- Compared to x86, where you can update SS to any segment & SP to any location within the segment.
Similarly, process memory stored in system RAM will need to be swapped out on task switch. Memory located above $4020 could be bank-switched instead.
Accessing data in different banks is desirable so we can jump to code in a currently-unloaded bank, or simply access data from one.
- We can add code to perform this work and store it in a fixed bank that’s always available, and make the compiler use that instead of a plain JSR.
- Pointers will need to include bank information as well.
- Compared to x86, where you can jump to a different segment directly without losing access to the caller’s segment.
Graphics/PPU state also needs to be associated with each process.
- It’s probably easiest to give the active process the full screen, instead of allowing background processes to share the display (eg. overlapping/tiled windows). The CHR ROM (or other video data) for a background application probably needs to get swapped out for the active process so it won’t be available to show properly, and there are challenges around sharing the current palette.
  - It might be possible to switch banks/contexts between scanlines, which could require windows to be screen-width but let them successfully stack vertically.
- A system menu UI could be handled using code & data in reserved always-available banks, like the code we use to handle moves and jumps across banks.
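The stack-swapping idea can be modelled in a few lines of Python. This is a simulation sketch only: `Task` and `switch_task` are hypothetical names, the saved copies would really live in cartridge RAM, and actual 6502 code would have to copy the page byte by byte.

```python
STACK_START, STACK_END = 0x0100, 0x0200  # the 6502 stack page, $0100-$01FF


class Task:
    def __init__(self, name):
        self.name = name
        self.saved_stack = bytes(STACK_END - STACK_START)
        self.saved_sp = 0xFF  # empty stack: SP points at the top of the page


def switch_task(ram, sp, current, upcoming):
    """Save the current task's stack page and SP, then restore the next task's."""
    current.saved_stack = bytes(ram[STACK_START:STACK_END])
    current.saved_sp = sp
    ram[STACK_START:STACK_END] = upcoming.saved_stack
    return upcoming.saved_sp


ram = bytearray(0x0800)  # the NES has 2 KiB of console RAM
task_a, task_b = Task("a"), Task("b")

sp = 0xFF
ram[STACK_START + sp] = 0x42  # task A pushes one byte
sp -= 1

sp = switch_task(ram, sp, task_a, task_b)  # now running task B
sp_in_b, top_in_b = sp, ram[0x01FF]

sp = switch_task(ram, sp, task_b, task_a)  # back to task A
sp_in_a, top_in_a = sp, ram[0x01FF]
print(hex(sp_in_b), hex(top_in_b), hex(sp_in_a), hex(top_in_a))
```

Copying a full 256-byte page on every switch is expensive, so copying only the used part of the stack (from SP+1 up to $01FF) might be a practical refinement.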

Using setjmp/longjmp from JNA

Sun, 13 Aug 2023 20:00:00 -0300

TLDR: I didn’t think it would work, and it didn’t.

Today I had a goal to call an established C library from Java, but it uses setjmp and longjmp for error reporting.

I had been hoping/planning to use Java Native Access to interact with the libraries. This is just a simple hobby project, so I want to keep it as simple as I realistically can. That means I don’t want to add a C build step to my project at all, not to mention having the build target multiple OS platforms and CPU architectures.

But I didn’t really expect setjmp and longjmp to work in Java. I have no idea what the JVM does with the execution environment and I expected longjmp would interfere with it in a way that would very probably corrupt the JVM’s state.

I tried it anyway. It didn’t work. The program crashed with SIGABRT after longjmp (running on Linux).

I encountered some things I found a little more interesting than just “it doesn’t work”, though:

jmp_buf’s size isn’t predictable

setjmp requires that you allocate a jmp_buf to store the environment in.

jmp_buf is defined in the system setjmp.h. On my 64-bit Linux system, sizeof(jmp_buf) == 200, and it’s defined as a 1-element array containing a struct, so it can be allocated easily then passed by reference.

I dug into setjmp.h first to understand it more, and realized the size of jmp_buf isn’t really predictable:

It varies by architecture even with the same C library, and
It’s not specified by the standard that it even has to be a struct or anything. It could just be a handle or whatever.

setjmp could be a macro

The standard doesn’t specify whether setjmp is a function or macro. JNA can only call functions, since macros are inlined by the compiler at build time.

(I didn’t check how it’s implemented in other C libraries, like MSVCRT on Windows or libSystem on macOS.)

Not exactly related, but I also happened to call fflush(stdout) from Java. It turns out that stdout is actually specified in C89/C99 to be a macro. In glibc, it’s also exported as extern FILE *stdout so I was able to use that, but then my code would not conform to the standard.

I guess I’m gonna have to write a C adapter library that’s more Java-friendly.

ChatGPT does canvas

Sun, 30 Jul 2023 17:06:00 -0300

I asked ChatGPT to draw a bunch of things using a Javascript canvas. I created a gallery of the results.