PhD Update 17: Light at the end of the tunnel
Wow..... it's been what, 5 months since I last wrote one of these? Oops. I'll do my best to write them at the proper frequency in the future! Things have been busy. Before I talk about what's been happening, here's the ever-lengthening list of posts in this series:
- PhD Update 1: Directions
- PhD Update 2: The experiment, the data, and the supercomputers
- PhD Update 3: Simulating simulations with some success
- PhD Update 4: Ginormous Data
- PhD Update 5: Hyper optimisation and frustration
- PhD Update 6: The road ahead
- PhD Update 7: Just out of reach
- PhD Update 8: Eggs in Baskets
- PhD Update 9: Results?
- PhD Update 10: Sharing with the world
- PhD Update 11: Answers to our questions
- PhD Update 12: Is it enough?
- PhD Update 13: A half complete
- PhD Update 14: An old enemy
- PhD Update 15: Finding what works when
- PhD Update 16: Realising the possibilities of the past
As I sit here at the very bitter end of the very last day of a long but fulfilling semester, I'm feeling quite reflective about the past year and how things have gone on my PhD. One of these posts is definitely long overdue.
Timescales
Naturally the first question here is about timescales. "What happened?" I hear you ask. "I thought you said you were aiming for intent to submit September 2023 for December 2023 finish?"
Well, about that.......
As it turns out, spending half of one's week working as Experimental Officer throws off one's estimation of how much work they do. To this end, it's looking more likely that I will be submitting my thesis in early-mid semester 2 this year. In other words, that's around about March 2024 time - give or take a month or two.
After submission the next step will be my viva. Hoping I pass, it's then likely followed by corrections that must be completed based on the feedback from the viva.
What is a viva though? From what I understand, it is an oral exam in which you, your primary supervisor, and 2 examiners comb through your thesis with a fine toothcomb and ask you lots of questions. I've heard it can take several hours to complete. While the standard is to have 1 examiner be chosen internally from your department / institute and one to be chosen externally (chosen by your primary supervisor), in my case I will be having both chosen from external sources as I am now a (part-time) staff member in the Department of Computer Science at the University of Hull (my home institution).
While it's still a little ways out yet, I can't deny that the thought of my viva is making me rather nervous - having everything I've done over the past 4.5 years scrutinised by completely unknown people. In a sense, it feels like once it is time for my viva, there will be nothing more I can do. I will either know the answers to their questions.... or I will not.
Writing
As you might have guessed by now, writing has been the name - and, indeed, aim - of the game since the last post in this series. Everything is coming together rather nicely. It's looking like I'm going to end up with the following structure:
- Introduction (not written*)
- Background (almost there! currently working on this)
- Rainfall radar for 2d flood forecasting (needs expanding)
- Social media sentiment analysis (done!)
- Conclusion
- Acknowledgements, Appendices, etc
- Dictionary of terms; List of acronyms (grows organically as I write - I need to go through and make sure I
\glsall the terms I've added later) - Bibliography (currently 27 pages and counting O.o)
- Technically I have written it, it's just outdated and very bad and needs throwing out the window of the tallest building I can find. Rewrite is pending - see below.

(Above: A sneak preview of my thesis PDF. I'm writing in LaTeX - check out my templates with the University of Hull reference style here! Evidently the pictured section needs some work.....)
I've finished the chapter on social media work, barring some minor adjustments I need to apply to ensure consistency. My current focus is the background chapter. This is most of the way there, but I need some more detail in several sections so I'm working my way through them one at a time. This is resulting a bunch more reading (especiall for vision-based water detection via satellite data), so this is taking some time.
Once I've wrapped up the background section, it will be time to turn my attention to the content chapter #2: Rainfall radar for 2d flood forecasting. Currently, it sits at halfway between a conference paper (check it out! You can read it now, though a DOI is pending and should be available after the conference) and a thesis chapter - so I need to push (pull? drag?) it the rest of the way to the finish line. This will primarily entail 2 things:
- Filling out the chapter-specific related works, which are currently rather brief given space and time limitations in a conference paper
- Elaborating on things like the data preprocessing, experiments, discussion, etc.
This will also take some time, which together with the background section explains the uncertaincy I still have in my finish date. Once these are both complete, I will be submitting my intent to submit! This will start a 3 month timer, by the end of which I must have submitted my thesis. During this timer period, I will be working on the introduction and conclusion chapters, which I do not expect to take nearly as long as any of the other chapters.
Once I am done writing and have submitted my thesis, I will do everything I can to ensure it is available under an open source licence for everyone to read. I believe strongly in the power of open source (and, open science) to benefit everyone, and want to share everything I've learned with all of you reading this.
At 102 pages A4 single space so far and counting though (not including the aforementioned bibliography), it's a big time investment to read. To this end, I have various publications I've written and posted about here previous that cover most of the stuff I've done (namely the rainfall radar conference paper and social media journal article), and I also want to somehow condense the content of my thesis down into a 'mini-thesis' that's about 3-6 pages ish and post that alongside my main thesis here on my website. I hope that this should provide the broad strokes and a navigation aid for the main document.
Predicting Persuasive Posts
All this writing is going to drive me crazy if I don't do something practical alongside it. Unfortuantely I have long since run out of exuses to run more experiments on my PhD work, so a good friend of mine who is also doing a PhD (they've published this paper) came along at the perfect time the other day asking for some help with a challenge competition submission they want to do. Of course, I had to agree to help out in a support role as the project sounds really interesting1.
The official title of the challenge is thus: Multilingual Detection of Persuasion Techniques in Memes
The challenge is part of SemEval-2024 and it's basically about classifying memes from some social media network (it's unclear which one they are from) as to which persuasion tactic they are employing to manipulate the reader's opinions / beliefs.
The full challenge page is can be found here: https://propaganda.math.unipd.it/semeval2024task4/index.html
We had a meeting earlier this week to discuss, and one of the key problems we identified was that to score challengers they be using posts in multiple unseen languages. To this end, it strikes me that it is important to have multiple languages embedded in the same space for optimal results.
This is not what GloVe does (it embeds them to different 'spaces', so a model trained data in 1 language won't necessarily work well with another) - as I discovered in my demo for the Hull Science Festival - definitely want to write about this in the final post in that series - so as my role in the team I'm going to push a number of different word embeddings through the system I have developed for the aforementioned science demo to identify which one is best for embedding multilingual text. Expect some additional entries to be added to the demo and an associated blog post on my findings very soon!
Currently, I have the following word embedding systems on my list:
- Word2vec
- FastText
- CLIP
- BERT/mBERT
- XLM/XLM-RoBERTa
If you know of any other good word embedding models / algorithms, please do leave a comment below.
It also occurs to me while writing this that I'll have to make sure the multilingual dataset I used for the online demo has the same or similar words translated to every language to rule out any difference in embeddings there.
A nice challenge for the Christmas holidays! My experience of collaborating with other researchers is rather limited at the moment, so I'm looking forward to working in a team to achieve a goal much faster than would otherwise be possible.
Beyond the edge
Something that has been constant nagging presence in my mind and steadily growing is the question of what happens next after my thesis. While the details have not been confirmed yet, once everything PhD-related is wrapped up I will most likely be increasing my hours by some amount such that I work Monday - Friday rather than just Monday - Wednesday lunchtime as I have been doing so far.
This extra time will consist of 2 main activities. To the best of my current understanding, this will include some additional teaching responsibilities - I will probably be teaching a module that lies squarely within 1 of my strong points. It will also, crucially, include some dedicated time for research.
This time for research I believe I will be able to spend on research related activities, including for example collaborating with other researchers, reading papers, designing and running experiments, and writing up results into publication form. Essentially what I've been doing on my PhD, just minus the thesis writing!
Of course, the things I talk about here are not set in stone, and me talking about them here is not a declaration of such.
Either way, I do feel that the technical is a strong point of mine that I am rather passsionate about, so I do desire very much to continue dedicating a significant portion of my energy towards doing practical research tasks.
I'm not sure how much I am allowed to talk about the teaching I will be doing, but do expect some updates on that here on my blog too - however high-level and broad strokesy they happen to be. What kind of teaching-related things would you be interested in being updated about here? Please do leave a comment below.
Talking more specifically, I do have a number of research ideas - one of which I have alluded to above - that I want to explore after my PhD. Most of these are based on what I have learnt from doing my PhD and the logical next steps to analyse complex real-time data sources with a view to extracting and processing information to increase situational awareness in natural disaster scenarios. When I get around to this, I will be blogging about my progress in detail here on my blog.
It should probably be mentioned that I am still quite a long way off actually putting any of these ideas into practice (I would definitely not recommend trusting any predictions my current rainfall radar → binarised water depth model makes in the real world yet!), but if you or someone you know works in the field of managing natural disasters, I would be fascinated to know what you would find most useful related to this - please leave a comment below.
Conclusion
This post has ended up being a lot longer than I expected! I've talked about my current writing progress, a rather interesting side-project (more details in a future blog post!), and initial conceptual future plans - both researchy and otherwise.
While my thesis is drawing close to completion (relatively, at least), I hope you will join me here beyond the end of this long journey that is almost at an end. As one book closes, so does another one open. A new journey is / will be only just beginning - one I can't wait to share with everyone here in future blog posts.
If you've got any thoughts, it would be cool if you could share them below.
-
It goes without saying, but I won't let it impact my writing progress. I divide my day up into multiple slices - one of which is dedicated to focused PhD work - and I'll be pulling from a different slice of time other than the one for my PhD writing to help out with this project. ↩
Reddit






