Stardust | Starbeamrainbowlabs

Using whiptail for text-based user interfaces

One of my ongoing projects is to implement a Bash-based raspberry pi provisioning system for hosts in my raspberry pi cluster. This is particularly important given that Debian 11 bullseye was released a number of months ago, and while it is technically possible to upgrade a host in-place from Debian 10 buster to Debian 11 bullseye, this is a lot of work that I'd rather avoid.

In implementing a Bash-based provisioning system, I'll have a system that allows me to rapidly provision a brand-new DietPi (or potentially other OSes in the future, but that's out-of-scope of version 1) automatically. Once the provisioning process is complete, I need only reboot it and potentially set a static IP address on my router and I'll then have a fully functional cluster host that requires no additional intervention (except to update it regularly of course).

The difficulty here is I don't yet have enough hosts in my cluster that I can have a clear server / worker division, since my Hashicorp Nomad and Consul clusters both have 3 server nodes for redundancy rather than 1. It is for this reason I need a system in my provisioning system that can ask me what configuration I want the new host to have.

To do this, I rediscovered the whiptail command, which is installed by default on pretty much every system I've encountered so far, and it allows you to develop surprisingly flexible text-based user interfaces with relatively little effort, so I wanted to share it here.

Unfortunately, while it's very cool and also relatively easy to use, it also has a lot of options and can result in command invocations like this:

whiptail --title "Some title" --inputbox "Enter a hostname:" 10 40 "default_value" 3>&1 1>&2 2>&3;

...and it only gets more complicated from here. In particular the 3>&1 1>&2 2>&3 bit there is a fancy way of flipping the standard output and standard error.
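To see why the swap is needed, here is a minimal sketch that can be run without whiptail. The fake_dialog function is a made-up stand-in that behaves like whiptail does: it draws on the standard output and writes its answer to the standard error.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for whiptail: "draws" on stdout, answers on stderr.
fake_dialog() {
    echo "drawing the UI";
    echo "the answer" >&2;
}

# Inside $(...), stdout is the capture pipe. The redirections then apply in order:
#   3>&1  file descriptor 3 becomes a copy of stdout (the capture pipe)
#   1>&2  stdout now points to where stderr goes (the terminal)
#   2>&3  stderr now points to the capture pipe saved in descriptor 3
answer="$(fake_dialog 3>&1 1>&2 2>&3)";

echo "Captured: ${answer}";
```

Only the text written to the standard error ends up in the variable, while the "UI" still reaches the terminal.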

I thought to myself that surely there must be a way that I can simplify this down to make it easier to use, so I implemented a number of wrapper functions:

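# Asks the user a yes/no question.
# $1    The question to ask.
# Expects step_current and step_max to be set by the calling script.
# Returns 0 if the user answered yes, and non-zero otherwise.
ask_yesno() {
    local question="$1";

    # whiptail takes the box height first, then the width
    whiptail --title "Step ${step_current} / ${step_max}" --yesno "${question}" 8 40;
    return "$?"; # Not actually needed, but best to be explicit
}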

This first one asks a simple yes/no question. Use it like this:

if ask_yesno "Some question here"; then
    echo "Yep!";
else
    echo "Nope :-/";
fi

Next up, to ask the user for a string of text:

# Asks the user for a string of text.
# $1    The window title.
# $2    The question to ask.
# $3    The default text value.
# Returns the answer as a string on the standard output.
ask_text() {
    local title="$1";
    local question="$2";
    local default_text="$3";
    whiptail --title "${title}" --inputbox "${question}" 10 40 "${default_text}" 3>&1 1>&2 2>&3;
    return "$?"; # Not actually needed, but best to be explicit
}

# Asks the user for a password.
# $1    The window title.
# $2    The question to ask.
# $3    The default text value.
# Returns the answer as a string on the standard output.
ask_password() {
    local title="$1";
    local question="$2";
    local default_text="$3";
    whiptail --title "${title}" --passwordbox "${question}" 10 40 "${default_text}" 3>&1 1>&2 2>&3;
    return "$?"; # Not actually needed, but best to be explicit
}

These both work in the same way - it's just that with ask_password it uses asterisks instead of the actual characters the user is typing to hide what they are typing. Use them like this:

new_hostname="$(ask_text "Provisioning step 1 / 4" "Enter a hostname:" "${HOSTNAME}")";
sekret="$(ask_password "Provisioning step 2 / 4" "Enter a sekret:")";

The default value there is of course optional, since in Bash if a variable does not hold a value it is simply considered to be empty.

Finally, I needed a mechanism to ask the user to choose at most 1 value from a predefined list:

# Asks the user to choose at most 1 item from a list of items.
# $1        The window title.
# $2..$n    The items that the user must choose between.
# Returns the chosen item as a string on the standard output.
ask_multichoice() {
    local title="$1"; shift;
    local args=();
    while [[ "$#" -gt 0 ]]; do
        args+=("$1");
        args+=("$1");
        shift;
    done
    whiptail --nocancel --notags --menu "$title" 15 40 5 "${args[@]}" 3>&1 1>&2 2>&3;
    return "$?"; # Not actually needed, but best to be explicit
}

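This one is a bit special, as it stores the items in an array before passing it to whiptail (each item is added twice because --menu expects a tag and a label for every entry, and --notags hides the tag). This works because of how the shell expands variables: it substitutes a variable with its contents before splitting the arguments up, which is known as word splitting. Here's how you'd use it: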

choice="$(ask_multichoice "How should I install Consul?" "Don't install" "Client mode" "Server mode")";

As an aside, the underlying mechanics as to why this works is best explained by example. Consider the following:

oops="a value with spaces";

node src/index.mjs --text $oops;

Here, we store the value we want to pass to the --text argument in a variable. Unfortunately, we didn't quote $oops when we passed it to our fictional Node.js script, so the shell actually interprets that Node.js call like this:

node src/index.mjs --text a value with spaces;

That's not right at all! Without the quotes around a value with spaces there, process.argv will actually look like this:

[
    '/usr/local/lib/node/bin/node',
    '/tmp/test/src/index.mjs',
    '--text',
    'a',
    'value',
    'with',
    'spaces'
]

The a value with spaces there has been considered by the Node.js subprocess as 4 different values!

Now, if we include the quotes there instead like so:

oops="a value with spaces";

node src/index.mjs --text "$oops";

...the shell will correctly expand it to look like this:

node src/index.mjs --text "a value with spaces";

... which then looks like this to our Node.js subprocess:

[
    '/usr/local/lib/node/bin/node',
    '/tmp/test/src/index.mjs',
    '--text',
    'a value with spaces'
]

Much better! This is important to understand, as when we start talking about arrays in Bash things start to work a little differently. Consider this example:

items=("an apple" "a banana" "an orange")

/tmp/test.mjs --text "${items[@]}"

Can you guess what process.argv will look like? The result might surprise you:

[
    '/usr/local/lib/node/bin/node',
    '/tmp/test.mjs',
    '--text',
    'an apple',
    'a banana',
    'an orange'
]

Each element of the Bash array has been turned into a separate item - even when we quoted it and the items themselves contain spaces! What's going on here?

In this case, we used [@] when addressing our items Bash array, which causes Bash to expand it like this:

/tmp/test.mjs --text "an apple" "a banana" "an orange"

...so it quotes each item in the array separately. If we forgot the quotes instead like this:

/tmp/test.mjs --text ${items[@]}

...we would get this in process.argv:

[
    '/usr/local/lib/node/bin/node',
    '/tmp/test.mjs',
    '--text',
    'an',
    'apple',
    'a',
    'banana',
    'an',
    'orange'
]

Here, Bash still expands each element separately, but does not quote each item. Because each item isn't quoted, when the command is actually executed, it splits everything a second time!

As a side note, if you want all the items in a Bash array in a single quoted item, you need to use an asterisk * instead of an at-sign @ like so:

/tmp/test.mjs --text "${items[*]}";

...which would yield the following process.argv:

[
    '/usr/local/lib/node/bin/node',
    '/tmp/test.mjs',
    '--text',
    'an apple a banana an orange'
]
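If you don't have Node.js to hand, the 3 forms of array expansion can also be compared in pure Bash. This is just a sketch: count_args is a made-up helper that prints how many arguments it received.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the number of arguments it was given.
count_args() {
    echo "$#";
}

items=("an apple" "a banana" "an orange");

count_args "${items[@]}";   # quoted [@]: 1 argument per array element
count_args ${items[@]};     # unquoted: every element is split again on spaces
count_args "${items[*]}";   # quoted [*]: all elements joined into 1 argument
```

With the default IFS, this prints 3, 6, and 1 respectively, matching the process.argv outputs above.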

With that, we have a set of functions that make whiptail much easier to use. Once it's finished, I'll write a post on my Bash-based cluster host provisioning script and explain my design philosophy behind it and how it works.

Switching from XFCE4 to KDE Plasma

While I use Unity (7.5) and Ubuntu on my main laptop, on my travel laptop I instead use Artix Linux. Recently, I've been experiencing an issue where when I log in at the lock screen after resuming the device from sleep, I get a black screen.

Rather than digging around endlessly attempting to fix the issue (I didn't even know where to start), I've been meaning to try out KDE Plasma, which is 1 of a number of popular desktop environments available. To this end, I switched from XFCE (version 4) to KDE Plasma (5.24 as of the time of typing). This ultimately did end up fixing my issue (my travel laptop would win a prize for the most unusual software setup, as it originated as a Manjaro OpenRC machine).

Now that I've completed that switch (I'm typing this now in Atom running in the KDE Plasma desktop environment!), I thought I'd write up a quick post about the two desktop environments and my first impressions of KDE as compared to XFCE.

(Above: My KDE desktop environment, complete with a desktop background taken from CrossCode. The taskbar is at the top because this is how I had it configured in XFCE.)

The best way I suppose to describe the difference between XFCE and KDE is jumping from your garden pond into the local canal. While XFCE is fairly customisable, KDE is much more so - especially when it comes to desktop effects and the look and feel. I really appreciate the ability to customise the desktop effects to tune them to match what I've previously been used to in Unity (though I still use Unity on both my main laptop and my Lab PC at University) and XFCE.

One such example of this is the workspaces feature. You can customise the number of workspaces and also have them in a grid (just like Unity), which the GNOME desktop that comes with Ubuntu by default doesn't allow for. You can even tune the slide animation between desktops which I found helpful as the default animation was too slow for me.

It also has an enormous library of applications that complement the KDE desktop environment, with everything from your staples such as the terminal, an image viewer, and a file manager to more niche and specialised use-cases like a graph calculator and a colour contrast checker. While these can of course be installed in other desktop environments, it's cool to see such an expansive suite of programs for every conceivable use-case right there.

Related to this, there also appears to be a substantial number of widgets that you can add to your desktop. Like XFCE, KDE has a concept of panels which can hold 1 or more widgets in a line. This is helpful for monitoring system resources for example. While these are for the most part just as customisable as the main desktop environment, I wish that their dependencies were more clearly defined. On more than 1 occasion I found I was missing a dependency for some widget to work that wasn't mentioned in the documentation. upowerd is required for the battery indicator to work (which wasn't running due to a bug caused by a package name change from the great migration of Manjaro back in 2017), and the plasma-nm pacman package is required for the network / WiFi indicator to work, but isn't specified as a dependency when you install the plasma-desktop package. Clearly some work is needed in this area (though, to be fair, as I mentioned earlier I have a very strange setup indeed).

I'm continuing to find 1000 little issues with it that I'm fixing 1 by 1 - just while writing this post I found that dolphin doesn't support jumping to the address bar if you start typing a forward slash / (or maybe it was another related issue? I can't remember), which is really annoying as I do this all the time - but this is a normal experience when switching desktop environments (or, indeed, machines) - at least for me.

On the whole though, KDE feels like a more modern take on XFCE. With fancier graphics and desktop effects and what appears to be a larger community (measuring such things can be subjective though), I'm glad that I made the switch from XFCE to KDE - even if it was just to fix a bug at first (I would never have considered switching otherwise). As a desktop environment, I think it's comfortable enough that I'll be using KDE on a permanent basis on my travel laptop from now on.

Hackathon in AI for Sustainability 2022

The other week, I took part in the Hackathon in AI for Sustainability 2022. While this was notable because it was my first hackathon, what was more important was that it was partially based on my research! For those who aren't aware, I'm currently doing a PhD at the University of Hull with the project title "Using Big Data and AI to Dynamically Predict Flood Risk". While part of it really hasn't gone according to plan (I do have a plan to fix it, I just need to find time to implement it), the second half of my project on social media has been coming together much more easily.

To this end, my supervisor asked me about a month ago whether I wanted to help organise a hackathon, so I took the plunge and said yes. The hackathon has 3 projects for attendees to choose from:

Project 1: Hedge identification from earth observation data with interpretable computer vision algorithms
Project 2: Monopile fatigue estimation from nonlinear waves using deep learning
Project 3: Live sentiment tracking during floods from social media data (my project!)

When doing research, I've found that there are often many more avenues to explore than there is time to explore them. To this end, a hackathon is an ideal time to explore these avenues that I have not had the time to explore previously.

To prepare, I put together some datasets of tweets and associated images - some from the models I've actually trained, and others (such as one based on the hashtag #StormFranklin) that I downloaded specially for the occasion. Alongside this, I also trained and prepared a model and some sample code for students to use as a starting point.

On the first day of the event, the leaders of the 3 projects presented the background and objectives of the 3 projects available for students to choose from, and then we headed to the lab to get started. While unfortunate technical issues were a problem for all 3 projects, we managed to find ways to work around them.

Over the next few days, the students participating in the hackathon tackled the 3 projects and explored different directions. At first, I wasn't really sure about what to do or how to help the students, but I soon started to figure out how I could assist students by explaining things, helping them with their problems, fetching and organising more data, and other such things.

While I can't speak for the other projects, the outputs of the hackathon for my project are fascinating insights into things I haven't had time to look into myself - and I anticipate that we may be able to draw them together into something more formal.

Just some of the approaches taken in my project include:

Automatically captioning images to extract additional information
Using other sentiment classification models to compare performance
- VADER: A rule-based model that classifies to positive/negative/neutral
- BART: A variant of BERT
Resolving and inferring geolocations of tweets and plotting them on a map, with the goal of increasing relevance of tweets

The outputs of the hackathon have been beyond my wildest dreams, so I'm hugely thankful to all who participated in my project as part of the hackathon!

While I don't have many fancy visuals to show right now, I'll definitely keep you updated with progress on drawing it all together in my PhD Update blog post series.

Creating a 3D Grid of points in Blender 3.0

In my spare time, one of the things I like to play with is rendering stuff in Blender. While I'm very much a beginner and not learning Blender professionally, it is a lot of fun to play around with it!

Recently, Blender has added geometry nodes (which I alluded to in a previous post), which are an extremely powerful way of describing and creating geometry using a node-based system.

While playing around with this feature, I wanted to create a 3D grid of points to instance an object onto. When I discovered that this wasn't really possible, I set to work creating my own node group to do the job, and I thought I'd quickly share it here.

First, here's a render I threw together demonstrating what you can do with this technique:

(Above: Coloured spheres surrounded by sparkles)

The above is actually just the default cube, just with a geometry shader applied!

The core of the technique is a node group I call Grid3D. By instancing a grid at a 90° angle on another grid, we can create a grid of points:

(Above: The Grid3D node group)

The complicated bit at the beginning is me breaking out the parameters in a way that makes it easier to understand on the outside of the node - abstracting a lot of the head scratching away!

Since instancing objects onto the grid is by far my most common use-case, I wrapped the Grid3D node group in a second node group called Grid3D Instance:

This node group transfers all the parameters of the inner Grid3D node group, but also adds a new position randomness vector parameter that controls by how much each instance is translated (since I couldn't find a way to translate the points directly - only instances on those points) on all 3 axes.

(Above: instanced cubes growing and shrinking)

Now that Blender 3.1 has just come out, I'm excited to see what more can be done with the new volumetric point cloud functions in geometry nodes - which may (or may not, I have yet to check it out) obsolete this method. Still, I wanted to post about it anyway for my own future reference.

Another new feature of Blender 3.1 is that node groups can now be marked as assets, so here's a sample blender file you can put in your assets folder that contains my Grid3D and Grid3D Instance node groups:

https://starbeamrainbowlabs.com/blog/images/20220326-Grid3D.blend

A learning experience | AAAI-22 in review

Hey there! As you might have guessed, it's time for my review of the AAAI-22 conference(?) (Association for the Advancement of Artificial Intelligence) I attended recently. It's definitely been a learning experience, so I think I've got my thoughts in order in a way that means I can now write about them here.

Attending a conference has always been on the cards - right from the very beginning of my PhD - but it's only recently that I have had something substantial enough that it would be worth attending one. To this end, I wrote a 2 page paper last year and submitted it to the Doctoral Consortium, which is a satellite event that takes place slightly before the actual AAAI-22 conference. To my surprise I got accepted!

Unfortunately in January AAAI-22 was switched from being an in-person conference to being a virtual conference instead. While I appreciate and understand the reasons why they made that decision (safety must come first, after all), it made some things rather awkward. For example, the registration form didn't mention a timezone, so I had to reach out to the helpdesk to ask about it.

For some reason, the Doctoral Consortium wanted me to give a talk. While I was nervous beforehand, the talk itself seemed to go ok (even though I forgot to create a slide somewhere in the middle) - people seemed to find the subject interesting. They also assigned a virtual mentor to me as well, who was very helpful in checking my slide deck for me.

The other Doctoral Consortium talks were also really interesting. I think the one that stood out to me was "AI-Driven Road Condition Monitoring Across Multiple Nations" by Deeksha Arya, in which the presenter was using CNNs to detect damage to roads - and found that a model trained on data from 1 country didn't work so well in another - and talked about ways in which they were going to combat the issue. The talk on "Creating Interpretable Data-Driven Approaches for Tropical Cyclones Forecasting" by Fan Meng also sounded fascinating, but I didn't get a chance to attend on account of their session being when I was asleep.

As part of the conference, I also submitted a poster. I've actually done a poster session before, so I sort of knew what to expect with this one. After a brief hiccup and rescheduling of the poster session I was part of, I got a 35 minute slot to present my poster, and had some interesting conversations with people.

Technical issues were a constant theme throughout the event. While the Doctoral Consortium went well on Zoom (there was a last minute software change - I'm glad I took the night before to install and check multiple different video conferencing programs, otherwise I wouldn't have made it), the rest of the conference wasn't so lucky. AAAI-22 was held on something called VirtualChair / Gather.town, which as it turned out was not suited to the scale of the conference in question (200 people in each room? yikes). I found myself with the seemingly impossible task of using a website that was so laggy it was barely usable - even on my i7-10750H I bought back in 2020. While the helpdesk were helpful and suggested some things I could try, nothing seemed to help. This severely limited the benefit I could gain from the conference.

At times, there were also a number of communication issues that made the experience a stressful one. Some emails contradicted each other, and others were unclear - so I had to email the organisers at multiple points to request clarification. The wording on some of the forms (especially the registration form) left a lot to be desired. All in all, this led to a very large number of wasted hours figuring things out and going back and forth to resolve confusion.

It also seemed as though everyone appeared to assume that I knew how a big conference like this worked and what each event was about, when this was not the case. For example, after the start of the conference I received an email saying that they hoped I'd been enjoying the plenary sessions, when I didn't know that plenary sessions existed, let alone what they were about. Perhaps in future it would be a good idea to distribute a beginner's guide to the conference - perhaps by email or something.

For future reference, my current understanding of the different events in a conference is as follows:

Doctoral Consortium: A series of talks - perhaps over several sessions - in which PhD students submit a 2 page paper in advance and then present their projects.
Workshop: A themed event in which a bunch of presenters submit longer papers and talk about their work
Tutorial: In which the organisers deliver content centred around a specific theme with the aim of educating the audience on a particular topic
Plenary session: While workshops and tutorials may run in parallel, plenary sessions are talks at a time when everyone can attend. They are designed to be general enough that they are applicable to the entire audience.
Poster session: A bunch of people create a poster about their research, and all of these posters are put up in a room. Then, researchers are designated specific sessions in which they stand by their poster and people come by and chat with them about their research. At other times, researchers are free to browse other researchers' posters.

Conclusion

Even though the benefit from talks, workshops, and other activities at the conference directly has been extremely limited due to technical, communication, and timezoning issues, the experience of attending this conference has been a beneficial one. I've learnt about how a conference is structured, and also had the chance to present my research to a global audience for the first time!

In the future, I hope that I get the chance to attend my first actual conference as I feel I'm much better prepared, and have a better understanding as to what I'm getting myself in for.

systemquery, part 2: replay attack

Hey there! As promised I'll have my writeup about AAAI-22, but in the meantime I wanted to make a quick post about a replay attack I found in my systemquery encryption protocol, and how I fixed it. I commented quickly about this on the last post in this series, but I thought that it warranted a full blog post.

In this post, I'm going to explain the replay attack in question I discovered, how replay attacks work, and how I fixed the replay attack in question. It should be noted though that at this time my project systemquery is not being used in production (it's still under development), so there is no real-world impact to this particular bug. However, it can still serve as a useful reminder as to why implementing your own crypto / encryption protocols is a really bad idea.

As I explained in the first blog post in this series, the systemquery protocol is based on JSON messages. These messages are not just sent in the clear though (much though that would simplify things!), as I want to ensure they are encrypted with authenticated encryption. To this end, I have devised a 3 layer protocol:

Objects are stringified to JSON, before being encrypted (with a cryptographically secure random IV that's different for every message) and then finally packaged into what I call a framed transport - in essence a 4 byte unsigned integer which represents the length in bytes of the block of data that immediately follows.
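To make the framing layer a little more concrete, here is a rough sketch in Bash that prints the 4 byte length header for a payload as hex digits. This is not the systemquery implementation: frame_header_hex is a made-up name, the big-endian byte order is an assumption (it isn't stated above), and the JSON string is only a stand-in for what would really be the encrypted block.

```bash
#!/usr/bin/env bash
# Hypothetical helper: prints the 4 byte length header of a payload as 8 hex digits.
# Assumption: big-endian byte order.
frame_header_hex() {
    local payload="$1";
    local length;
    # wc -c counts bytes rather than characters; the arithmetic expansion
    # strips the padding that some wc implementations add.
    length=$(( $(printf '%s' "${payload}" | wc -c) ));
    printf '%08x\n' "${length}";
}

# Stand-in payload; in the real protocol this would be the encrypted data.
payload='{"event":"ping"}';
echo "header:  $(frame_header_hex "${payload}")";
echo "payload: ${payload}";
```

The receiver reads the 4 header bytes first, and then knows exactly how many more bytes to read before it has a complete message to decrypt.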

The encryption algorithm itself is provided by tweetnacl's secretbox() function, which provides authenticated encryption. It's also been independently audited and has 16 million weekly downloads, so it should be a good choice here.

While this protocol I've devised looks secure at first glance, all is not as it seems. As I alluded to at the beginning of this post, it's vulnerable to a replay attack. This attack is perhaps best explained with the aid of a diagram:

Let's imagine that Alice has an open connection to Bob, and is sending some messages. To simplify things, we will only consider 1 direction - but remember that in reality such a connection is bidirectional.

Now let's assume that there's an attacker with the ability to listen to our connection and insert bogus messages into our message stream. Since the messages are encrypted, our attacker can't read their contents - but they can copy and store messages and then insert them into the message stream at a later date.

When Bob receives a message, they will decrypt it and then parse the JSON message contained within. Should Bob receive a bogus copy of a message that Alice sent earlier, Bob will still be able to decrypt it as a normal message, and won't be able to tell it apart from a genuine message! Should our attacker figure out what a message's function is, they could do all kinds of unpleasant things.

Not to worry though, as there are multiple solutions to this problem:

Include a timestamp in the message, which is then checked later
Add a sequence counter to keep track of the ordering of messages

In my case, I've decided to go with the latter option here, as given that I'm using TCP I can guarantee that the order I receive messages in is the order in which I sent them. Let's take a look at what happens if we implement such a sequence counter:

When sending a message, Alice adds a sequence counter field that increments by 1 for each message sent. At the other end, Bob increments their sequence counter by 1 every time they receive a message. In this way, Bob can detect if our attacker attempts a replay attack, because the sequence number on the message they copied will be out of order.

To ensure there aren't any leaks here, should the sequence counter overflow (unlikely), we need to also re-exchange the session key that's used to encrypt messages. In doing so, we can avoid a situation where the sequence number has rolled over but the session key is the same, which would give an attacker an opportunity to replay a message.

With that, we can prevent replay attacks. The other thing worth mentioning here is that the sequence numbering needs to be done in both directions - so Alice and Bob will have both a read sequence number and a write sequence number which are incremented independently of one another whenever they receive and send a message respectively.
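The receiving side of the sequence counter can be sketched in a few lines of Bash. This is only an illustration of the idea and not the systemquery code: receive_message and read_seq are made-up names, and it assumes the sequence number travels inside the encrypted message so that an attacker can't alter it.

```bash
#!/usr/bin/env bash
# Bob's read sequence counter: the number he expects on the next message.
read_seq=0;

# Hypothetical receive handler.
# $1    The sequence number carried inside the decrypted message.
# $2    The message body.
# Returns 0 if the message was accepted, and 1 if it was rejected.
receive_message() {
    local seq="$1";
    local body="$2";
    if [[ "${seq}" -ne "${read_seq}" ]]; then
        echo "Rejected (expected ${read_seq}, got ${seq}): ${body}" >&2;
        return 1;
    fi
    read_seq=$((read_seq + 1));
    echo "Accepted: ${body}";
    return 0;
}

receive_message 0 "hello";
receive_message 1 "restart service";
# The attacker inserts a copy of the second message:
receive_message 1 "restart service" || echo "Replay detected";
```

Alice would keep a matching write counter that she increments for every message sent, and each side needs 1 counter per direction. Session key re-exchange on overflow is left out here to keep the sketch short.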

Conclusion

In this post, we've gone on a little bit of a tangent to explore replay attacks and how to mitigate them. In the next post in this series, I'd like to talk about the peer-to-peer swarming algorithm I've devised - both the parts thereof I've implemented, and those that I have yet to implement.

Sources and further reading

Authenticated Encryption
tweetnacl on npm
Characters in cryptography
My first published paper at AAAI-22 Doctoral Consortium!
- More on this in the next blog post :D

PhD Update 12: Is it enough?

Hey there! It's another PhD update blog post! Sorry for the lack of posts here, the reason why will become apparent below. In the last post, I talked about the AAAI-22 Doctoral Consortium conference I'll be attending, and also about sentiment analysis of both tweets and images. Before we talk about progress since then, here's a list of all the posts in this series so far:

As in all the posts preceding this one, none of the things I present here are finalised and are subject to significant change as I double check everything. I can think of no better example of this than the image classification model I talked about last time - the accuracy of which has dropped from 75.6% to 60.7% after I fixed a bug....

AAAI-22 Doctoral Consortium

The most major thing in my calendar in the next few weeks is surely the AAAI-22 Doctoral Consortium. I've been given a complimentary registration to the main conference after I had a paper accepted, so as it turns out the next few weeks are going to be rather busy - as have been the last few weeks preparing for this conference.

Since the last post when I mentioned that AAAI-22 has been moved to be fully virtual, there have been a number of developments. Firstly, the specific nature of the doctoral consortium (and the wider conference) is starting to become clear. Not having been to a conference before (a theme which will come up a lot in this post), I'm not entirely sure what to expect, but as it turns out I was asked to create a poster that I'll be presenting in a poster session (whether this is part of AAAI-22 or the AAAI-22 Doctoral Consortium is unclear).

I've created one with baposter (or a variant thereof given to me by my supervisor some time ago) which I've now submitted. The thought of presenting a poster in a poster session at a conference to lots of people I don't know has been a rather terrifying thought though, so I've been increasingly anxious over this over the past few weeks.

This isn't the end of the story though, as I've also been asked to do a 20 minute presentation with 15 minutes for questions / discussion, the preparations for which have taken perhaps longer than I anticipated. Thankfully I've had prior experience presenting at my department's PGR seminars (which have also been online recently), so it's not as daunting as it would otherwise be, but as with the poster there's still the fear of presenting to lots of people I don't know - most of whom probably know more about AI than I do!

Despite these fears and other complications that you can expect from an international conference (timezones are such a pain sometimes), I'm looking forward to seeing what other people have done, and also hoping that the feedback I get from my presentation and poster isn't all negative :P

Sentiment analysis: a different perspective

Since the last post, a number of things have changed in my approach to sentiment analysis. The reason for this is that - as I alluded to earlier in this post - I fixed a bug in my model that predicts the sentiment of images from twitter, causing the validation accuracy to drop from 75.6% to 60.7%. As of now, I have the following models implemented for sentiment analysis:

1. Text sentiment prediction
   - Status: Implemented, and works rather well actually - top accuracy is currently 79.6%
   - Input: Tweet text
   - Output: Positive/negative sentiment
   - Architecture: Transformer encoder (previously: LSTM)
   - Labels: Emojis from text [manually sorted into 2 categories]
2. Image sentiment prediction
   - Status: Implemented, but doesn't work very well
   - Input: Images attached to tweets
   - Output: Positive/negative sentiment
   - Architecture: ResNet50 (previously: CCT, but I couldn't get it to work)
   - Labels: Positive/negative sentiment predictions from model #1
3. Combined sentiment prediction
   - Status: Under construction
   - Input: Tweet text, associated image
   - Output: Positive/negative sentiment
   - Architecture: CLIP → a few dense layers [provisionally]
   - Labels: Emojis from text, as in model #1

Not mentioned here of course is my rainfall radar model, but that's still waiting for me to return to it and implement a plan I came up with some months ago to fix it.

Of particular note here is the new CLIP-based model. After analysing model #2 (image sentiment prediction) and sampling some images from each category on a per-flood basis, it soon became clear that the output was very noisy and wasn't particularly useful. Combined with its rather low performance, this means that I'm pretty much considering it a failed attempt.

With this in mind, my supervisor suggested I look into CLIP. While I mentioned it in the paper I've submitted to AAAI-22, until recently I hadn't had a clear picture of how it would actually be useful. As a model, CLIP trains on text-image pairs, and learns to identify which image belongs to which text caption. To do this, it has 2 encoders - 1 for the text, and 1 for the associated image.

You can probably guess where this is going, but my new plan here is to combine the text and images of the tweets I have downloaded into 1 single model rather than 2 separate ones. I figure that I can potentially take advantage of the encoders trained by the CLIP models to make a prediction. If I recall correctly, I've read a paper that has done this before with tweets from twitter - just not with CLIP - but unfortunately I can't locate the paper at this time (I'll edit this post if I do find it in the future - if you know which paper I'm talking about please do leave a DOI link in the comments below).

At the moment, I'm busy implementing the code to wrap the CLIP model and make it suitable for my specific learning task, but this is as of yet incomplete. As noted in the architecture above, I plan on concatenating the output from the CLIP encoders and passing it through a few dense layers (as the [ batch_size, concat, dim ] tensor is not compatible with a transformer, which operates on sequences), but I'm open to suggestions - please do comment below.
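To make the planned architecture a little more concrete, here is a minimal numpy sketch of the "concatenate, then a few dense layers" idea. It is not the actual model code: the embeddings are random stand-ins for the outputs of CLIP's text and image encoders, the sizes (512-dimensional embeddings, 64 hidden units) and the `dense` / `combined_sentiment_head` helpers are made up for illustration, and joining along the feature axis is just one assumed way of doing the concatenation.

```python
import numpy as np


def dense(x, weights, bias, activation=None):
    """Apply a fully connected layer: x @ weights + bias."""
    out = x @ weights + bias
    if activation == "relu":
        out = np.maximum(out, 0.0)
    elif activation == "sigmoid":
        out = 1.0 / (1.0 + np.exp(-out))
    return out


def combined_sentiment_head(text_emb, image_emb, params):
    """Concatenate both embeddings and pass them through 2 dense layers."""
    # [batch_size, dim] + [batch_size, dim] -> [batch_size, 2 * dim]
    joined = np.concatenate([text_emb, image_emb], axis=-1)
    hidden = dense(joined, params["w1"], params["b1"], activation="relu")
    # A single sigmoid unit gives a positive/negative probability per tweet
    return dense(hidden, params["w2"], params["b2"], activation="sigmoid")


rng = np.random.default_rng(42)
batch_size, dim, hidden_units = 4, 512, 64  # illustrative sizes only

# Random stand-ins for the outputs of the 2 CLIP encoders (assumption)
text_emb = rng.normal(size=(batch_size, dim))
image_emb = rng.normal(size=(batch_size, dim))

# Untrained, randomly initialised weights
params = {
    "w1": rng.normal(scale=0.05, size=(2 * dim, hidden_units)),
    "b1": np.zeros(hidden_units),
    "w2": rng.normal(scale=0.05, size=(hidden_units, 1)),
    "b2": np.zeros(1),
}

probabilities = combined_sentiment_head(text_emb, image_emb, params)
print(probabilities.shape)  # (4, 1)
```

In a real implementation the dense layers would be trained (for example in Tensorflow) against the emoji-derived labels, while the CLIP encoders supply the embeddings.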

My biggest concern here is that the tweet text will not be enough like an image caption for CLIP to produce useful results. If this is the case, I have some tricks up my sleeve:

Extract any alt text associated with images (yes, twitter does let you do this for images) and use that instead - though I can't imagine that people have captioned images especially often.
Train a new CLIP model from scratch on my dataset - potentially using this model architecture - this may require quite a time investment to get the model working as intended (Tensorflow can be confusing and difficult to debug with complex model architectures).

Topic analysis

Another thing I've explored is topic analysis using gensim's LDAModel. Essentially, as far as I can tell LDA is an unsupervised algorithm that groups the words found in the source input documents into a fixed number of related groups, each of which contains words which are often found in close proximity to one another.

As an example, if I train an LDA model for 20 groups, I get something like the following for some of those categories:

place   death   north   evacu   rise    drive   hng coverag toll    cumbria
come    us  ye  issu    london  set line    offic   old mean
rescu   dead    miss    uttarakhand leav    bbc texa    victim  kerala  defenc
flashflood  awai    world   car train   make    best    turn    school  boat
need    town    problem includ  effect  head    assist  sign    philippin   condit

(Full results may be available upon request, if context is provided)

As you can tell, the results of this are mixed. While it has grouped the words, the groups themselves aren't really what I was hoping for. The groups here seem to be quite generic, rather than being about specific things (words like rescu, dead, miss, and victim in a category), and also seem to be rather noisy (words like "north", "old", "set", "come", "us").

My first thought here was that instead of training the model on all the tweets I have, I might get better results if I train it on just a single flood at once - since some categories are dominated by flood-specific words (hurrican, hurricaneeta, hurricaneiota for example). Here's an extract from the results of that for 10 categories for the #StormDennis hashtag:

warn    flood   stormdenni  met issu    offic   water   tree    high    risk
good    hope    look    love    i’m game    morn    dai yellow  nice
ye  lol head    enjoi   tell    book    got chanc   definit valentin
mph gust    ireland coast   wors    wave    brace   wow doesn’t photo

(Again, full results may be available on request if context is provided)

Again, mixed results here - and still very noisy (though I'd expect nothing else from social media data), though it is nice to see something like gust, ireland, and coast there in the bottom category.

The goal here was to attempt to extract some useful information about which places have been affected and by how much by combining this with the sentiment analysis (see above), but my approach here doesn't seem to have captured what I intended. Still, it does seem that (for all tweets) there is some difference in sentiment on a per-category basis:

(Above: A chart of the sentiment of each LDA topic, using the 20 topic model that was trained on all available tweets.)

It seems as if there's definitely something going on here. I speculate that positive words are more often than not used near other positive words, and so categories are more likely to skew to 1 extreme or another - though acquiring proof of this theory would likely require a significant time investment.
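For reference, a per-topic sentiment comparison like this boils down to a simple aggregation. The sketch below is not the code behind the chart: it assumes that each tweet has already been assigned a dominant LDA topic and a binary sentiment prediction, and the `sentiment_by_topic` helper and the sample data are hypothetical.

```python
from collections import defaultdict


def sentiment_by_topic(tweets):
    """Return the fraction of positive tweets for each topic id."""
    counts = defaultdict(lambda: [0, 0])  # topic id -> [positive, total]
    for topic, is_positive in tweets:
        counts[topic][0] += int(is_positive)
        counts[topic][1] += 1
    return {topic: pos / total for topic, (pos, total) in sorted(counts.items())}


# Hypothetical (dominant_topic, predicted_positive) pairs, 1 per tweet
tweets = [(0, True), (0, True), (0, False), (1, False), (1, False), (2, True)]

for topic, fraction in sentiment_by_topic(tweets).items():
    print(f"topic {topic}: {fraction:.0%} positive")
```

A topic whose fraction sits far from the overall average is one that skews towards 1 extreme or the other.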

While LDA topic analysis was an interesting diversion, I'm not sure how useful it is in this context. Still, it's a useful thing to have in my growing natural language processing toolkit (how did this happen) - and perhaps future research problems will benefit from it more.

Moving forwards, I could imagine that instead of doing LDA topic analysis it might be beneficial to group tweets by place and perhaps run some sentiment analysis on that instead. This comes with its own set of problems of course (especially pinning down / inferring the location of the tweets in question), but this is not an insurmountable problem given strategies such as named entity recognition (which the twitter Academic API does for you, believe it or not).

Conclusion

While most of my time has been spent on preparing for the AAAI-22 conference, I have managed to do some investigating into the twitter data I've downloaded. Unfortunately, not all my methods have been a success (image sentiment analysis, LDA topic analysis), but these have served as useful exercises for both learning new techniques and understanding the dataset better.

Moving forwards, I'm going to implement a new CLIP-based model that I'm hoping will improve accuracy over my existing models - though I'm somewhat apprehensive that tweet text won't be descriptive enough for CLIP to produce a useful output.

With the AAAI-22 conference happening next week, I'll be sure to write up my experiences and post about the event here. I'm just hoping that I've done enough to make attending a conference like this worth it, and that what I have done is actually interesting to people :-)

Sources and further reading

gensim
AAAI-22 Doctoral Consortium (event is timezoned to UTC-8, registration is required)
AAAI-22
twitter-academic-downloader
Twitter Academic Research track
Deep Residual Learning for Image Recognition - the ResNet paper
- Open Access: https://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/He_Deep_Residual_Learning_CVPR_2016_paper.pdf
CLIP
- code

mutate-a-word!

As a programmer, one of the things that I find most inspiring about programming is that when I have an idea for a digital thing, chances are I have the programming skills to make my dream a reality.

Such is the story behind my latest quick creation: mutate-a-word! I often find naming things difficult, so a number of years ago I built a thing that combines 1 or more words in different ways. I think I've lost it now (it was a long time ago before I started using git), but the other day I had an idea for a similar but different thing that iteratively mutates a given starting word using user input.

With the idea in hand, it didn't take me long to put together a quick web-based project, and mutate-a-word was born!

You can find it here: https://starbeamrainbowlabs.com/labs/mutate-a-word/

A screenshot of mutate-a-word in action

Enter a word in the box, and 3 suggestions will show below it. Then, click on the suggestion that you like best and a new row based on the word you liked will appear beneath it.

When mutating, some basic rules are currently followed:

10% chance to add a random letter
10% chance to remove a random letter
80% chance to mutate a letter

When mutating a letter, vowels are only ever replaced with other vowels and consonants are only ever replaced with other consonants. In the future, I'd like to implement a number of other features:

A linguistic drift algorithm to make mutations easier to pronounce
The ability to manually edit and correct the suggested words to avoid suggestions from getting too crazy
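The current mutation rules described above can be sketched in a few lines of Python. This is an illustration rather than the code that runs on the page (which is web-based): the `mutate_word` and `suggestions` functions are hypothetical, it assumes a non-empty lowercase a-z word, and treating only "aeiou" as vowels is an assumption too.

```python
import random
import string

VOWELS = "aeiou"
CONSONANTS = "".join(c for c in string.ascii_lowercase if c not in VOWELS)


def mutate_word(word, rng=random):
    """Apply 1 random mutation to a non-empty lowercase word."""
    roll = rng.random()
    if roll < 0.1:
        # 10% chance: insert a random letter at a random position
        pos = rng.randrange(len(word) + 1)
        return word[:pos] + rng.choice(string.ascii_lowercase) + word[pos:]
    if roll < 0.2 and len(word) > 1:
        # 10% chance: remove a random letter (1-letter words are never
        # emptied; they fall through to the mutation below instead)
        pos = rng.randrange(len(word))
        return word[:pos] + word[pos + 1:]
    # Otherwise: replace 1 letter with a different letter of the same kind,
    # so vowels stay vowels and consonants stay consonants
    pos = rng.randrange(len(word))
    pool = VOWELS if word[pos] in VOWELS else CONSONANTS
    replacement = rng.choice([c for c in pool if c != word[pos]])
    return word[:pos] + replacement + word[pos + 1:]


def suggestions(word, count=3, seed=None):
    """Generate a row of mutated suggestions for the given word."""
    rng = random.Random(seed)
    return [mutate_word(word, rng) for _ in range(count)]


print(suggestions("stardust"))
```

Clicking a suggestion then corresponds to calling `suggestions()` again with the chosen word as the new starting point.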

A special mention is due here to Haikei, the generator I used for the waves you see in the background. While it looks like they may end up going freemium at some point in the future, as of now they are completely free and have loads of generators and options for generating blobs, doodads, waves and more for use in the background of your webpages, and the web interface is pretty snazzy too! I'll definitely be using them again for future projects I think.

If you try out the generator and have some feedback, do leave a comment here. Your comments are both motivating and also help me to improve and make it better!

A review of graph / node based logic declaration through Blender

Recently, Blender started their Everything Nodes project. The first output of this project is their fantastic geometry nodes system (debuted in Blender 2.9, and still under development), which allows the geometry of a mesh (and the materials it uses) to be dynamically modified to apply procedural effects - or even declare a new geometry altogether!

I've been playing around with and learning Blender a bit recently for fun, and as soon as I saw the new geometry nodes system in Blender I knew it would enable powerful new techniques to be applied. In this post, I want to talk more generally about node / graph-based logic declaration, and why it can sometimes make a complex concept like modifying geometry much easier to understand and work with efficiently.

(Above: Blender's geometry nodes at work.)

Manipulating 3d geometry carries more inherent complexity than its 2d counterpart - programs such as Inkscape and GIMP have that pretty much sorted. To this end, Blender supplies a number of tools for editing 3d geometry, like edit mode and a sculpting system. These are powerful in their own right, but what if we want to do some procedural generation? Suddenly these feel far from the right tools for the job.

One solution here is to provide an API reference and allow scripts to be written to manipulate geometry. While Blender does this already, it's not only inaccessible to those who aren't proficient programmers, but large APIs also often come with a steep learning curve (and higher cognitive load) - and it can often be a challenge to "think in 3d" while programming (I know when I was doing the 3d graphics module at University this took some getting used to!).

In a sense, node based programming systems feel a bit like a functional programming style. Their strength is composability, in that you can quickly throw together a bunch of different functions (or nodes in this case) to get the desired effect. This reduces cognitive load (especially when there's an instantly updating preview available) as I mentioned earlier - which also has the side effect of reducing the barrier to entry.

Blender's implementation

There's a lot to like about Blender's implementation of a node-based editor. The visual cues for both the nodes themselves and the sockets are great. Nodes are colour coded to group them by related functionality, and sockets are coloured according to data type. I would be slightly wary of issues with colourblind users though - while it looks like this has been discussed already, it doesn't seem like an easy solution has been implemented yet.

This minor issue aside, in Blender's new geometry nodes feature they have also made use of shape for the sockets to distinguish between single values and values that can change for each instance - which feels intuitive to understand.

When implementing a UI like this - as in API design - the design of the user interface needs to be carefully considered and polished. This is the case for Blender's implementation - and this only became apparent when I tried Material Maker's node implementation. While Material Maker is cool, I encountered a few minor issues which made the UI feel "clunky" when compared to Blender's implementation. For example:

Blender automatically wraps your cursor around the screen when you're scrubbing a value
Material Maker's preview didn't stack correctly underneath the node graph, leading to visual artefacts

Improvements

Blender's implementation of a node-based editor isn't all perfect though. Now that I've used it a while, I've observed a few frustrations I (and I assume others) have had - starting with the names of nodes. When you're first starting out, it can be a challenge to guess the name of the node you want.

For example, the switch node functions like an if statement, but I didn't immediately think of calling it a switch node - so I had to do a web search to discover this. To remedy this issue, each node could have a number of hidden alias names that are also searched, or perhaps each node has a short description in the selection menu that is also searched.
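The alias idea is simple enough to sketch. The following is purely illustrative and has nothing to do with how Blender's node search is actually implemented: the `NodeType` class, the `search_nodes` function, and the aliases and descriptions attached to each node are all made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class NodeType:
    name: str
    description: str = ""
    aliases: list = field(default_factory=list)  # hidden, but searched


def search_nodes(nodes, query):
    """Return the names of nodes whose name, aliases, or description match."""
    needle = query.strip().lower()
    matches = []
    for node in nodes:
        haystack = [node.name, node.description, *node.aliases]
        if any(needle in text.lower() for text in haystack):
            matches.append(node.name)
    return matches


# Hypothetical registry: aliases and descriptions are invented for illustration
nodes = [
    NodeType("Switch", "Pick one of two inputs based on a condition",
             ["if", "conditional", "branch"]),
    NodeType("Join Geometry", "Merge several geometries into one",
             ["merge", "combine"]),
]

print(search_nodes(nodes, "if"))  # ['Switch']
```

With something like this, searching for "if" finds the switch node even though "if" appears nowhere in its visible name.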

Another related issue is that nodes don't always do what you expect them to, or you're completely baffled as to what their purpose is in the first place. This is where great documentation is essential. Blender has documentation on every node in all their node editors (shader, compositor, and now geometry), but they don't always give examples as to how each node could be used. It would also be nice to see a short tooltip when I hover over a node's header explaining what it does.

In the same vein, it's also important to ensure a measure of consistency if you have multiple node editors. While this is mostly the case with Blender, I have noticed that a few nodes have different names across the compositing, shading, and geometry nodes workspaces (the switch node), and some straight up don't exist in other workspaces (the curve nodes). This can be the source of both confusion and frustration.

Conclusion

In conclusion, node-based editors are cool, and a good way to present a complex set of options in an easy to understand interface. While we've looked at Blender's implementation of a node-based editor, others do exist such as Material Maker.

Node-based interfaces have limitless possibilities - for example the Web Audio API is graph-based already, so I can only imagine how neat a graphical node-based audio editor could be - or indeed other ideas I've had including a node-based SVG generator (which I probably won't get around to experimenting with for a while).

As a final thought, a node-based flowchart could potentially be a good first introduction to logic and programming. For example, something a bit like Scratch or some other robotics control project - I'm sure something like this exists already.

If you know of a cool node-based interface, do leave a comment below.

systemquery, part 1: encryption protocols

Unfortunately, my autoplant project is taking longer than I anticipated to setup and debug. In the meantime, I'm going to talk about systemquery - another (not so) little project I've been working on in my spare time.

As I've acquired more servers of various kinds (mostly consisting of Raspberry Pis), I've found myself with an increasing need to get a high-level overview of the status of all the servers I manage. At the moment, this need is satisfied by my monitoring system's (collectd, which while I haven't blogged about my setup directly, I have posted about it here and here) web-based dashboard called Collectd Graph Panel (sadly now abandonware, but still very useful):

This is great and valuable, but if I want to ask questions like "are all apt updates installed", or "what's the status of this service on all hosts?", or "which host haven't I upgraded to Debian bullseye yet?", or "is this mount still working", I currently have to SSH into every host to find the information I'm looking for.

To solve this problem, I discovered the tool osquery. Osquery is a tool to extract information from a network of hosts with SQL-like queries. This is just what I'm looking for, but unfortunately it does not support the armv7l architecture - which most of my cluster currently runs on - thereby making it rather useless to me.

Additionally, from looking at the docs it seems to be extremely complicated to set up. Finally, it does not seem to have a web interface. While not essential, it's a nice-to-have.

To this end, I decided to implement my own system inspired by osquery, and I'm calling it systemquery. I have the following goals:

Allow querying all the hosts in the swarm at once
Make it dead-easy to install and use (just like Pepperminty Wiki)
Make it peer-to-peer and decentralised
Make it tolerate random failures of nodes participating in the systemquery swarm
Make it secure, such that any given node must first know a password before it is allowed to join the swarm, and all network traffic is encrypted

As a stretch goal, I'd also like to implement a mesh message routing system too, so that it's easy to connect multiple hosts in different networks and monitor them all at once.

Another stretch goal I want to work towards is implementing a nice web interface that provides an overview of all the hosts in a given swarm.

Encryption Protocols

With all this in mind, the first place to start is to pick a language and platform (Javascript + Node.js) and devise a peer-to-peer protocol by which all the hosts in a given swarm can communicate. My vision here is to encrypt everything using a join secret. Such a secret would lend itself rather well to a symmetrical encryption scheme, as it could act as a pre-shared key.

A number of issues stood in the way of actually implementing this though. At first, I thought it best to use Node.js' built-in TLS-PSK (stands for Transport Layer Security - Pre-Shared Key) implementation. Unlike regular TLS which uses asymmetric cryptography (which works best in client-server situations), TLS-PSK uses a pre-shared key and symmetrical cryptography.

Unfortunately, although Node.js advertises support for TLS-PSK, it isn't actually implemented or is otherwise buggy. This not only leaves me with the issue of designing an encryption protocol, but also:

1. The problem of transferring binary data
2. The problem of perfect forward secrecy
3. The problem of actually encrypting the data

Problem #1 here turned out to be relatively simple. I ended up abstracting away a raw TCP socket into a FramedTransport class, which implements a simple protocol that sends and receives messages in the form <length_in_bytes><data....>, where <length_in_bytes> is a 32 bit unsigned integer.
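To illustrate the framing scheme, here is a minimal sketch of length-prefixed framing. It is not the FramedTransport class itself (which is written in Javascript and wraps a real TCP socket): this is a Python illustration with hypothetical `encode_frame` and `FrameDecoder` names, and the big-endian byte order for the length prefix is an assumption, since the format above only specifies a 32 bit unsigned integer.

```python
import struct

# 32 bit unsigned length prefix; big-endian byte order is an assumption
HEADER = struct.Struct(">I")


def encode_frame(payload: bytes) -> bytes:
    """Prefix a message with its length in bytes."""
    return HEADER.pack(len(payload)) + payload


class FrameDecoder:
    """Reassemble complete frames from arbitrarily-sized chunks of a stream."""

    def __init__(self):
        self._buffer = bytearray()

    def feed(self, chunk: bytes):
        """Add received bytes and return any frames that are now complete."""
        self._buffer.extend(chunk)
        frames = []
        while len(self._buffer) >= HEADER.size:
            (length,) = HEADER.unpack_from(self._buffer)
            end = HEADER.size + length
            if len(self._buffer) < end:
                break  # wait for the rest of this frame to arrive
            frames.append(bytes(self._buffer[HEADER.size:end]))
            del self._buffer[:end]
        return frames


# Simulate a TCP stream that delivers data in awkward 4 byte chunks
stream = encode_frame(b"hello") + encode_frame(b"\x00\x01binary")
decoder = FrameDecoder()
received = []
for i in range(0, len(stream), 4):
    received.extend(decoder.feed(stream[i:i + 4]))

print(received)  # [b'hello', b'\x00\x01binary']
```

The important property is that message boundaries survive regardless of how the underlying stream happens to split the data up.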

With that sorted and the nasty buffer manipulation safely abstracted away, I could turn my attention to problems 2 and 3. Let's start with problem 3 here. There's a saying when programming things relating to cryptography: never roll your own. Existing implementations have often been much more rigorously checked for security flaws.

In the spirit of this, I sought out an existing implementation of a symmetric encryption algorithm, and found tweetnacl. Security audited, it provides what looks to be a secure symmetric encryption API, which is the perfect foundation upon which to build my encryption protocol. My hope is that by simply exchanging messages I've encrypted with a secure existing algorithm, I can reduce the risk of a security flaw.

This is a good start, but there's still the problem of forward secrecy to tackle. To explain, perfect forward secrecy is where should an attacker be listening to your conversation and later learn your encryption key (in this case the join secret), they still are unable to decrypt your data.

This is achieved by using session keys and a key exchange algorithm. Instead of encrypting the data with the join secret directly, we use it only to encrypt the initial key-exchange process, which then allows 2 communicating parties to exchange a session key, which is used to encrypt all data from then on. By re-running the key-exchange process and generating new session keys at regular intervals, forward secrecy can be achieved: even if the attacker learns a session key, it does not help them to obtain any other session keys, because even knowledge of the key exchange algorithm messages is not enough to derive the resulting session key.

Actually implementing this in practice is another question entirely however. I did some research though and located a pre-existing implementation of JPAKE on npm: jpake.

With this in hand, the problem of forward secrecy was solved for now. The jpake package provides a simple API by which a key exchange can be done, so then it was just a case of plugging it into the existing system.

Where next?

After implementing an encryption protocol as above (please do comment below if you have any suggestions), the next order of business was to implement a peer-to-peer swarm system where agents connect to the network and share peers with one another. I have the basics of this implemented already: I just need to test it a bit more to verify it works as I intend.

It would also be nice to refactor this system into a standalone library for others to use, as it's taken quite a bit of effort to implement. I'll be holding off on doing this until it's more stable though, as refactoring it now would just slow down development.

On top of this system, the plan is to implement a protocol by which any peer can query any other peer for system information, and then create a command-line interface for easily querying it.

To make querying flexible, I plan on utilising some form of in-memory database that is populated with queries to other hosts based on the tables mentioned in the user's query. SQLite3 is the obvious choice here, but I'm reluctant to choose it as it requires compilation upon installation - and given that I have experienced issues with this in the past, I feel this has the potential to limit compatibility with some system configurations. I'm going to investigate some other in-memory database libraries for Javascript - giving preference to those which are both light and devoid of complex installation requirements (pure JS is best if I can manage it I think). If you know of a pure Javascript in-memory database that has a query syntax, do let me know in the comments below!
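As a rough illustration of the "populate an in-memory table, then query it" idea, here is a small Python sketch. It uses SQLite only because Python ships with it in the standard library, not because that's the choice for systemquery. The `os_info` table, its columns, the host names, and the `query_hosts` function are all hypothetical, and the rows are hard-coded where the real system would fetch them from other peers.

```python
import sqlite3


def query_hosts(rows, sql):
    """Load per-host rows into a throwaway in-memory table and query it."""
    db = sqlite3.connect(":memory:")
    try:
        db.execute(
            "CREATE TABLE os_info ("
            "host TEXT, distro TEXT, codename TEXT, pending_updates INTEGER)"
        )
        db.executemany("INSERT INTO os_info VALUES (?, ?, ?, ?)", rows)
        return db.execute(sql).fetchall()
    finally:
        db.close()


# Hypothetical answers collected from 3 peers in the swarm
rows = [
    ("pi-1", "Debian", "bullseye", 0),
    ("pi-2", "Debian", "buster", 12),
    ("pi-3", "Debian", "bullseye", 3),
]

# "Which host haven't I upgraded to Debian bullseye yet?"
print(query_hosts(rows, "SELECT host FROM os_info WHERE codename != 'bullseye'"))
# [('pi-2',)]
```

Only the tables mentioned in a query would need to be populated, which keeps the number of requests sent to other hosts down.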

As for querying system information directly, that's an easy one. I've previously found systeminformation - which seems to have an API to fetch pretty much anything you'd ever want to know about the host system!

Sources and further reading

JPAKE
npm packages
- tweetnacl
- jpake
