Reimplementing touchpad gliding, the pleb way: The first attempt

I’ve done the long ass rant in my previous post so I can focus on my road to solution in this one. Long story short:

  • I got a touchpad
  • I have multiple monitors, which means I have to drag my finger across the touchpad a lot
  • Cursor gliding is a solution
  • (Linux) drivers don’t do cursor gliding. They do coasting, but they’re shit at it.
  • I want to have that feature, but nobody’s going to write it for me.

We have a challenge, then.

What is cursor gliding?

Coasting a feature that was around as far as 10 years ago, but seems to have disappeared. The gist of it is: if you move the cursor fast enough with your touchpad, the cursor will continue moving even after you have lifted your finger. Since picure is worth a thousand words (and video a thousand more), here’s a quick demonstration:

Notice how cursor remains in movement even after the finger no longer touches the touchpad.

Let’s get it on

I am, of course, not very familiar with C. I know some basics, but that’s about it. Writing proper drivers is beyond my skillset, and it sounds complicated. I don’t like complicated, nor do I want to spend time learning stuff for just one projects. I want a quick solution, so “proper driver” way route is … it can’t go.

I have had some previous experience with xev and xinput, though (back from when I tried to hack together support for certain keyboard keys that weren’t supported under Linux). As it turns out, xinput has a debug mode that will print all events emitted by a given input device — and we’ll be using that.

I have tried to find a proper-er way to intercept movement events, but I ultimately haven’t found any. There was one C library that appears to be broken for the past few years, and then there’s Xlib. I’m sure that if I spent some time reading the documentation, I could implement this proper with it. I just don’t want to lose a week doing so.

I don’t get it either.

Piping xinput output to our program it is, then.

Scouting ahead

We still need Xlib to actually move the pointer (and get pointer position, since xinput tells us only how far the cursor has moved, but not its position). Scouting ahead, the pointer can be moved with XWarpPointer(). That function requires absolute coordinates of where the mouse cursor is to be moved, and this will be important later.

Parsing the data

As it turns out, piping output of one program to the output of another is fairly easy in C (although finding some examples of how to do that turned out to be a bit harder). You fork the program and in one of the forks. In the first fork, it takes about five lines of code to set everything up:

In the main thread, receiving said stream in something that can be parsed with getline() isn’t hard, either:

Things get a little bit more complicated from this point forward. We have to write a parser that will parse whatever xinput gives us. This is less easy, because goodies that other languages give us — such as trim() and startsWith() — do not exist in C. Fortunately for us, stackoverflow exists and provides us with copy-pasteable solutions which are ready for the taking.

With that in mind, we can continue with the main loop that will parse our events. Before we do anything at all, though, we need to look at the output we’re going to parse:

EVENT type 17 (RawMotion)
     device: 12 (12)
     detail: 0
     flags: 
     valuators:
           0: -1.93 (-2.00)
           1: -1.93 (-2.00)

The first line tells which event xinput gave us, so we ensure it matches EVENT type 17 (RawMotion). Then we ignore the next three lines. The data we want is below the valuators: line, which we have to parse. It’s a fairly straightforward affair. Once the valuators section is over, we start moving the cursor.

The code so far.

Don’t t(h)read on me

Now that we have a place where we can start moving the cursor, we want to start think about moving it. We don’t actually want to start cursor on every movement, just when we lift the finger — which means that we need to define what “lifting a finger” means for us.

Turns out that the only way to detect that finger was lifted is to wait a bit and see if any other movement events happen while we wait. This is a problem: if we just throw a usleep(<50 ms>) into our code, then our program will first wait 50 milliseconds, decide that it’s been more than 50 milliseconds since the last event, move the cursor and only then process the next event. Solution? Threads.

Threads are one of the few things that are actually easier in C than in other languages (cough javascript1okay I know, javascript can technically run in multiple threads via workers. Let’s not talk about them, cough), and they’re a pretty nice. Having multiple threads access to all the same threads is just what we need. We add some locks on parts where different threads access the same variables to avoid any unplanned behavior and we’re off! I even managed to luck out on avoiding any issues that careless usage of threads or locks may cause — a nice feat for someone who hasn’t touched C in a while.

With parsing and threads sorted out, we can finally start moving the cursor.

I like to move it

This is not a rocket science, either, but it did come with some trial and error attached. You start by calculating the speed at which the pointer moves. There’s two ways to do handle speed:

  • track speed and angle for the pointer
  • track speed of pointer along each axis

We don’t need to know absolute speed of the pointer, neither do we need to know the angle. More importantly, moving the mouse requires us to know x/y coordinates of where we want to move our mouse, which makes the second option the more sensible one.

We can’t just determine the speed from the final amount of movement we received from xinput. Most notably, since we don’t know when the finger was lifted, we cannot rely on the last measurement of movement to be accurate. For this reason, we keep a record of the last few measurements and — when end of finger movement was detected — average them. This will also help with the fact that touchpad may produce noisy data: calculating average speed from last few samples will help us to ensure that the speed at which the pointer continues to glide across the screen about as fast as we’d expect.

Moving the pointer is easy enough — every 1/60th of a second, we update the cursor position. This is simple enough: we store the last position (preferably in a float or double) and increment it by the movement speed. We also don’t want the mouse pointer to scroll forever, so we decrease it by some amount after every update. After the cursor speed has been reduced to near-zero, we can also stop moving the mouse and exit the thread since the speed is now too insignificant to actually matter.

Tada we’re done. Time to push it to github and call it a day. Thxbai we’re done here.

But then thing stops working 20 minutes down the line.

I hate C.

And memory leaks aren’t all — as I tweak around with touchpad sensitivity settings, it quickly realize that, at the sensitivity I find ideal, the diagonal movement of finger across the touchpad results in a very pronounced staircase-shaped movement of the cursor.

I guess it’s time to stop coding and start doing warranty returns.

Reimplementing touchpad coasting/gliding, the pleb way: Introduction

Quick note: I initially tried to keep the intro short, but quick history of touchpads ended being not … so quick. This post thus mostly talks about quick history of touchpads, available options and why you might want to get one.

Despite being in just about every laptop, touchpads seem to be a very neglected piece of hardware that nobody seems to use if they don’t absolutely have to. And I sorta get it: even as recently-ish as 10 years ago, touchpads were outright terrible. They often didn’t support gestures beyond two finger scroll (or dedicated scroll area at the edge) — and forget about double tapping to right click, or triple-tapping to middle click. They also tended to be small. This meant that if you had to move your mouse cursor from one end of the screen to the other … well, you’ve got roughly two choices:

  • Move the cursor 10% of the way across the screen, lift the finger and move it back to the other side of the touchpad where you started, move the cursor a little bit again, repeat
  • Set mouse sensitivity and/or mouse acceleration so impractically high you wouldn’t be able to hit those buttons.

Synaptics — one of the more famous names in the touchpad business back then — was aware that both options sucked, big time. Thus, a handy feature was born: coasting. If you moved your finger across on the touchpad, the cursor would continue to move even after you lifted the finger — until you stopped the slide by tapping or moving your finger.

Fortunately, the multi-touch innovation in the world of mobile phones (started with the first iPhone in 2007) eventually ended up trickling down to touchpads as well. We started getting bigger touchpads with multi-touch support — big enough that cursor coasting was no longer necessary for most people.

Depending on what make of laptop and operating system you have, touchpads are still cancer to use. Asus, for example, did not include a way to disable “natural scrolling1which is a retarded feature that should have never existed — but just like every step back Apple “innovated,” others were soon to copy. Removal of 3.5mm audio jacks is possibly the worst trend to date. ” You could, of course, download an updated touchpad driver which provided more options — but the next Windows update will just revert to the previous, useless driver.

This — as well as the fact that most people are used to using a mouse — probably contributes a lot to why touchpads are rarely seen on desktop. A year and a half ago, there were only two-ish viable touchpad models for the desktop that you could throw your money at. One was Apple’s trackpad for waaay too much. The second, more reasonably priced option, was some JellyComb model for about €50 from German amazon.

This thing was huge, by the way. About 7″ diagonal.

Back at the time, I was struggling with wrist pain while using the mouse for about two weeks. I quickly learned that touchpads are great for avoiding wrist strain. I promptly ordered two, as I really don’t want to develop a carpal tunnel.

As you look at the picture above, you might notice a few potential problems with that touchpad models. Most notably, there’s some unneded buttons that you could press by accident. You could disable those, but the button that toggled extra buttons on or off was itself very easy to press by accident. USB-micro connector on the touchpad was also pretty fragile, so both touchpads got broken in less than two years. I attempted to re-soldier, but failed miserably. I had to get new touchpads.

The new models are better: the entire touchpad is a button that you can click and it doesn’t have any soft buttons. But as I started to re-adapt myself to the touchpad life, I really started to miss cursor coasting (and scroll coasting). My main monitor is 34″ (ultrawide), with an extra monitor on either side … Needless to say, there’s no touchpad big enough for that.

State of drivers on Linux

The first thing that I tried to do was to turn on coasting in touchpad settings. But I was in for a rude surprise: there is no coasting option in the driver.

Why isn’t there a coasting option in the driver?

Because Synaptics driver has been deprecated for years and replaced by libinput, and libinput is hot trash. Of multi-touch gestures, it only supports two-finger scrolling, two-finger tap and three-finger tap. If your touchpad supports click, your choices for right-click strategy are “two finger click” or “click in the lower right corner” — pick one. (If I had the chance, I’d enable both of these at once. In some situations, two-finger click is better than clicking in the lower right corner. In others clicking in lower-right corner is preferable — e.g. if you try to drag something with right click).

There is a third alternative: mtrack. Mtrack is great. It mitigates a lot of xinput shittines, but it comes with its own set of downsides. The original repository appears to be unmaintained for the past five years. There’s plenty of unsolved issues and pull requests that await merge or any sort of decision. If you want the most features out of it, it’s best bet is to find an alternative fork. Of those, p2rkw’s branch seems to be the most promising one, with most features and best documentations.

Then there’s configuration: you pretty much have to configure the driver with xinput and a xorg.conf.d file, which is … less than ideal.

But for the price of being a major pain in the ass to setup, you do get some benefits. If your touchpad can be clicked like a button, then you can set mtrack to ignore finger movement when you’re clicking the touchpad on the bottom end. It allows you to both reserve bottom right corner for right click, while still maintaining ‘click with two fingers elsewhere to right click’ functionality. It even allows for scroll coasting (but not cursor gliding), but … it kinda sucks. Scrolling appears to go haywire if you start scrolling before the scroll coast has ended. Duration of the scroll coast is also fixed: it always coasts for 5 seconds, regardless of how fast you scrolled.

I obviously don’t like that. Unfortunately, nobody seems to want to do anything about it. The common mantra in open source world is that “if you don’t like it, do something about it” — and given you’re getting software for free, that seems a reasonably fair deal. The problem is, of course, that not everyone can write software. Even among programmers, not everyone has the experience to write everything. I know jack shit about drivers and the last time I used C (which is what the drivers use) was 4 years ago at the uni.

This means that I don’t really have the qualifications for writing a driver, so I can forget about doing things with mtrack. But with a very limited knowledge of C, I might be able to cobble something up in a different way …

Dealing with keyboards is mini-hell

It’s been a while since I’ve written about programming, but lately I’ve gotten back to working on Ultrawidify. With no major bugs or problems that require immediate fix, I can finally get to work on bugs and features that I’ve been kicking down the line for a while.

Problem of the week are, of course, keyboards (or keyboard layouts).

In javascript, you have roughly two ways of identifying the key the user has pressed. There’s event.keyCode, which gives you some number that’s associated with the key. Pro of using keyCode is that it’s going to be the same for a given key, regardless of keyboard layout. Con of this method is that keyCode is going to be the same for a given key, regardless of keyboard layout (+ a few other downsides).

To elaborate a bit further on the problem: a significant chunk of Europe uses QWERTZ layout. QWERTZ is much like QWERTY layout Americans (and the rest of Europe except France) are using, except ‘Z’ and ‘Y’ trade places. French weren’t content with swapping only two letters and came up with with AZERTY layout instead. Then you also have things like Dvorak and Colemak layouts, because some people insist it makes their typing much faster. As letters trade places around the keyboard with each different layout, the keycodes don’t.

As a result, you can never be sure what letter the end user get from pressing what key. On some layouts, keycode 90 will give you Y, on others Z. This means that if you base your keyboard shortcuts on a QWERTZ layout and have ‘Z’ as a shortcut for anything, French and QWERTY users will wonder why they have to press the Y key, and Dvorak users will just shout at you that keyboard shortcuts don’t work.

If your goal is to have keyboard shortcuts that won’t be flat out wrong for people using a different keyboard layout than your own, then keyCode is not the way. Handling different keyboard layouts is an easy road to code spaghettification. Most importantly, it’s a major pain in the ass. You’ll spend a lot of time on it, but with very little gain.

Fortunately for us, modern javascript provides a solution to that. Events for key handling come with a property event.key, which gives us the character we pressed.

Neat. Now I can just use this property for all my keyboard shortcuts. Since event.key gives us a specific character, I no longer have to pay attention. I can just say “press Z to zoom.” After all, the ‘z’ that event.key gives me is exactly the same, regardless of whether the user uses QWERTZ, QWERTY or something more exotic. Foolproof, isn’t it?

image
Cyka blyat1and yes, I know ‘cyka blyat’ is just a DotA2/CSGO meme and doesn’t actually make sense in russian.

Wrong, sir, wrong.

True, event.key will return the same letter regardless of what keyboard layout you pressed said letter on. However, some letters are unpressable on certain keyboard layouts. If you’re using cyrillic (or anything non-latin), you’ll quickly find that keyboard shortcuts using even the standard ASCII letters no longer work.

Certainly a mild oversight on my part, but in my defence: the only reason I’ve started developing this extension is because at the time, there was no extension for fixing aspect ratio available for Firefox (Chrome did have a fair share of aspect ratio fixers such as Ultrawide Video, but those hadn’t been ported to Firefox until way later) and I really wanted the functionality. And when all you want is a swingset, why build a rollercoaster? The good old days.

But let’s get back to the topic at hand. If we want to fix the Russian problem, we’re in a bit of a tough spot. event.keyCode is starting to look better and better by the minute … except it doesn’t, really.

What can be done?

The options roughly boil down to the following:

1. Use event.keyCode to determine keys.

This option brings a lot of problems. Not only will there be issues with people using non-QWERTZ layouts (unless I spend unreasonable amount of time working on getting around that), using event.keycode would mean I have to rewrite lots of the existing code. More importantly — since all keys have been fully rebindable for a while in extension settings, I would have to decide between writing something that will correctly preserve keyboard shortcuts for existing users (annoying, quick StackOverflow recon didn’t give encouraging results), or reset keyboard shortcuts to default for everyone (easy but rather unacceptable. I don’t want another Nosedive).

2. Use event.keyCode to determine keys for new users, event.key for existing ones

This one offers some benefits over purely keycode solution — I don’t have to write code to port keyboard shortcuts to the new system, I don’t have to wipe settings of existing users. Still has some drawbacks that I don’t like, though — namely, the fact that I’ll have to deal with displayed keyboard shortcuts being wrong for non-QWERTZ keyboard layouts.

3. Keep using event.key and fall back to event.keycode if event.key doesn’t contain an ASCII character

Hey look, this is the quick and lazy solution we’ve been looking for. It’s also dirty, but it’s going to work. Maybe not on custom shortcuts, but we’ll see.

And all will be fine.

Ultrawidify and the Improper Cropping

Yet another day, yet another post about stuff going wrong. This time, I’ve got a bug report that “videos are jumping around” on Facebook and some other pages. I tried to verify the problem … and everything worked fine for me. Then I decided to boot up Windows and there it was — the problem as described. So nice — we have a problem that happens on some operating systems and doesn’t on others, even though that shouldn’t be the case in theory.

But eventually, the issue was reproduced and that’s all that matters. The issue appears very familiar — it has been observed on reddit before.

A video and a player

In a very ELI5 way, every webpage is made out of a bunch of rectangles (layers, elements), one within another. In order to properly crop a video, we must know which of these elements is actually the player (‘player’ element is to our video what picture frame is to a picture), and we need to know which element is the player element. Picking the wrong element can result in extension cropping to little, too much, or moving the video out of the picture altogether.

We can’t just assume that the first element above the video is a player, either: sometimes sites put addiitonal elements between the two. This is why we need ‘guess’ the player element by looking at the size.

Side note: not all extensions use that approach. Some seem to just assume you use a 21:9 monitor and slap a ‘enlarge this element by 1.3’ on the video element. Great and foolproof strategy for fullscreen. Less great for youtube’s theater mode, twitch with chat opened at the side, or non-fullscreen Netflix.

Legacy and technical debt

The code for determining which element is the player element has some weird quirks thanks to the history of the addon. Most notably, the extension used to work by determining how tall and how wide the video should be back in the day when it was only focused on Youtube and Netflix. This method has a few drawbacks, with most notable ones being:

  1. If you ask browser to tell you the size of the video, it’ll tell you the dimensions you specified
  2. It worked for youtube and netflix, but not for everything else

In general, we can assume that initial size of the player will be exactly as wide as the video or exactly as tall as the video. However, since we actually changed the size of the video (as opposed to telling browser to just enlarge the video by some factor), we couldn’t check for that as if the video was cropped, browser would tell us the post-crop size (and post-crop size is useless for that purpose). Some wonky code was written to deal with this issue and it worked well enough for Youtube and Netflix and sometimes even other sites. However, said code is — in retrospect — pretty bad. Looking at it invokes a few questions that every programmer sometimes asks themselves: “the hell was I trying to do with this shit” and “how the fuck did this even work at all?”

You could farm karma at /r/badcode with this.

Due to problems with #2, a better solution to resizing the video needed to be implemented, and eventually it was in the form of transform: scale(x,y). Using this to crop video (as opposed to modifying width and height attributes of the video) has some nifty advantages: it’s possible to get the size of the video without taking transform into account. This allows us to rewrite the player detect loop in a way that will correctly detect the player element.

Dealing with duplicates

Another thing worth addressing is “duplicates” — that is, what happens when more than one element on our way from video element to the root of the page has the same size. I haven’t figured out what to do in this case, since the correctness of picking innermost over outermost element for player may differ from site to site. In absence of better options, I decided to score every element that could be our player. Rules of the game:

  • Every element that matches our criteria gets 100 base points
  • Elements with 'player' in their ID get 75 bonus points
  • Elements with 'player' in their classlist get 50 bonus points
  • The farther the element is from our video, the more penalty points it gets. First match gets 0 penalty points, second gets one, third gets two and so on.

I haven’t had the chance to test this thoroughly, so results may vary.

That’s it for the day.

Ultrawidify: The Twitchy Twitch Problem

Ultrawidify has been seeing some issues with constant aspect ratio readjustments. This post examines and explains why and how these issues happened.

I don’t think I’ve boasted about developing Ultrawidify much on this blog. Maybe I should have, but then again: the audience of this blog is a) people who know me and b) people who use Ultrawidify and were bored enough to click that link in update notes. The point is — you’re all familiar with this extension.

A while back, I’ve noticed an issue on Twitch. It turned out that the video was a bit … well, twitchy. However, the issue seemed to be fairly uncommon, so I made a note in my test videos file and decided to kick the can a little farther down the road. “No big deal, it surely can wait.”

Well, turns out that the issue was a bigger deal than I thought. I’ve recently accidentally visited my facebook feed, and the twitching issue appeared — except worse. I tend to avoid twatter as well, but long story short: I stumbled on a tweet with embedded video.

What a good time for a #literallyshaking joke.

On the plus side, the issue happens very often (more often than auto-detection interval), and it happens equally often even when the video is paused. Here we get the first (and perhaps the only) bit of good news for the day: auto-detection isn’t to blame — and since there’s exactly one other thing that could cause this behaviour, this means I already know where the problem is.

In Search of the Problem

In order to understand whats and whys of the problem, we have to take a quick look at how Ultrawidify crops the video. It’s very simple: it finds the video element and basically tells the browser: “Make sure the video is this wide and this tall and then enlarge it by this much,” where “this much” is whatever number auto-detection script (or user intervention) spat out. In programmer jargon, that’s called setting style string.

For technical reasons,1Blame the video alignment feature for that! Ultrawidify’s auto-detection will also “correct” the aspect ratio when there’s no need — in cases like this, the video would be enlarged by a factor of 1: same size as before.

This should, in theory, do the job just fine. In practice though, Ultrawidify isn’t the only thing doing that. Some sites will also tell the browser to make the video element that big because something on the page changed (example: switching between normal and theater mode on youtube). This effectively undoes any changes Ultrawidify has made to the page. And we really don’t want that, since that has the potential to uncrop the unnecessary letterbox.

Solution to this problem is easy enough at the first glance: we’re just gonna tell Ultrawidify to watch for sites trying to meddle with the video size. If the website tries to change anything, Ultrawidify will undo that change immediately. I think you can see where this is going.

Yes. Twitter is also watching for anything that would meddle with video sizes. If it detects that something changed how big the video element is supposed to be, it will undo that change.

Twitter’s behaviour would also break crop and stretch functionalities, though I don’t think improperly cropped videos are common on the platform. I have noticed improperly cropped videos on facebook, though.

Developer tools seem to agree with this assessment. In inspector view, video element is blinking like there’s no tomorrow while in the console, Ultrawidify is seen setting the same style string over and over again, and the zoom factor is always one. At this point you may wonder why the twitchy video if ultrawidify sets the zoom factor to one, and the answer is simple: twitter doesn’t.

Through the magic of “inspect element” we can see that Twitter is zooming the video ever so slightly (by a factor of 1.005).

With Twitter insisting that the video should be zoomed by a factor of 1.005, Ultrawidify wanting a zoom factor of 1, and neither being very keen on letting go. And this spells trouble for us.

Ultrawidify vs. twitter, 2019 (colorized)

In Search of Solution

If the site will just undo our changes, what can we do? Well, it turns out that there’s a way. As it turns out, there’s actually two sorts of CSS styles: author styles — which is CSS defined by the website you’re visiting; and user styles.

Through the magic of user styles, you — the user — have the final say over how the browser will display the site. If Facebook says the background of the page needs to be white, and you have user style that says the page background should be whatever meme is popular this week … well, tough luck Facebook. Nothing can override user styles, which makes them the perfect “fuck you, you’ll do what I tell you” card. We’ll take it.

There’s another piece of good news: you don’t have to define the styles in advance — you can make them up on the spot and tell the browser to use that. WebExtension API allows us to do that. There is a few caveats, though. Besides making up the style, you also have to make up a class name, attach it to the element and hope that the site won’t remove it. If you want to edit your style, you have to throw the old style at the browser and tell it that you want it removed. You also have to create a brand new style, tell the browser to use that.

Fortunately for us, it currently seems that the sites aren’t as thorough with removing “unauthorized” CSS classes from elements as they are with “unauthorized” styles, but that’s all it is for now: a rule of thumb that everyone seems to follow, until someone doesn’t.

Well, that was a fairly easy fix, wasn’t it? After all, it required very little work from us (the user-style injecting is already implemented — as a part of dealing with vimeo and its special snowflakey bullshit). Performance seems comparable to what it was before (albeit a tiny bit slower to react to changes) and all is well.

Further testing reveals that tie twitching image issue was fixed on Twitter, probably on Facebook as well. At least as far as Firefox is concerned.

Now it’s time to test in Chrome.

Hotel California

So I whip up my “variable aspect ratio” video example on youtube and start sweating profusely. Will the extension work?

Because there is one problem with my solution: Chrome doesn’t support tabs.removeCss(), which turns programmatically generated styles into a bunch of unsuspecting fellows checking into a hotel California. Sure, you can insert the style at any time you want, but it can never leave.

Chrome and chromium pls. The worst thing is that the “feature request” for this was opened in 2016, and the patch has existed for over a year and a half at this point (since February 2018).

Not being able to yank the previous style when adding a new one is problematic the same reason you telling your kids to wear a white t-shirt where your partner told them to wear a red one a minute ago is problematic. It’s also problematic the same reason putting a new highway tolling sticker on your windshield without removing the old one.

It can get messy real fast.

Fortunately for us, Chrome takes the “common sense” (and standard-compliant) approach to handling the first problem: in case you have multiple conflicting definitions, it’s gonna respect the last one. This makes it a bit easier to ignore the second problem, though having tens or in the very worst case (frequent aspect ratio changes, frequent resizings of the player and browser window and watching a video in the same time for long periods of time) even hundreds of conflicting styles shouldn’t be too much of a problem.

This is a bigger mess than How To Train Your Dragon: The Hidden World, even though that theoretically shouldn’t be possible.

Nonetheless, we’ll have to settle for this very terrible practice when in Google Chrome. I’ve ran out of fucks to give and the new ones are expected to arrive only in about a month.

Now, I could probably spend another day trying to deal with Chrome shittiness and try to invent new workarounds, it’s easier to just sit back, REEEEE at Chrome, use my userbase as a glorified guinea pigs and hope that people who watch youtube without closing/refreshing/opening video in a different tab won’t have their performance degraded.

And that’s a very dangerous game. I should know: the first update where autodetection was introduced had major performance issues if you watched videos in a tab continuously for long amount of time (Chrome was a complete lagfest in ~5 minutes of watching video, while performance in Firefox was somewhat better (and very dependent on your hardware) — just good enough that it escaped my testing. It’s been two years now and my ratings on Chrome Web Store still haven’t recovered.

I’ll still take the gamble though, against my better judgement.

At this point, I suppose it’s time for a PSA:

Public service announcement: Google Chrome is garbage (and so are all other Chromium-based browsers). If you aren’t already, you should really use Firefox instead.

Thanks for coming to my TEDx talk.

“You’re just like that coffee machine, y’know … from bean to cup, you fuck up.”

The Case of Twitchy Twitch

So the extension stopped doing twitch things on twitter and the new system for resizing video works. But after a quick visit to the out of season April fools joke (a.k.a. the immortal Blizzcon stream from hell) it turns out that the video still appears … twitchy, which means that twitching on twitch was a result of a different problem.

The ball is now back in the court of automatic aspect ratio detection. To be fair, that twitch issue is a problem with autodetection was known for a wihle. After all, most streams don’t have this problem, and the fact that this doesn’t happen when autodetection detects proper letterbox (or that aspect ratio correction works at all — on Twitter, it wouldn’t!) should be plenty of evidence to support that:

Look at how twitching stops as soon as WoW cinematic begins about 12 seconds into this loop.

This issue smells like a video with a very tiny, nigh unnoticeable black border. Not wide enough for you to notice, but just enough to trigger autodetection. 30 seconds of (local only) defacing, it turns out that our intuition was correct:

Notice this thin black line at the top of the video? It’s only about a pixel or two wide on the source stream, but with video and page background set to white and gray it quickly becomes apparent. Video element is outlined with red.

Turns out that this is a very interesting edge case, but to understand how things went wrong we’re gonna need a quick crash-course in how auto-detection works. The key steps are:

  1. Start counting rows that contain nothing else than black pixels, from top and bottom to the middle. Each row has a number: top row is row 0, last row has a number that’s one less than the number of rows. Due to performance reasons, we only pick a few columns of each row that we actually check for the presence of a (non-)black pixel.
  2. Remember the first row that contains black pixel (on both ends)
  3. Calculate the height of the black bars
  4. Check whether top and bottom black bar are roughly of equal thickness.
    We don’t require black bars to match exactly because:
    1. in theory, the “not black bars” portion of the video is not guaranteed to be of even height. In cases like this, the top and bottom black bar could differ by 1px
    2. in practice, not all pre-letterboxed videos are exactly centered. Case in point: first few seconds of this video.
  5. Calculate and apply aspect ratio. If top and bottom thickness are different, you’ve got two options. Either you over-cut or you end up with a thin black border on either top to bottom. I’m not exactly 100% sure which strategy I’m using: twitch issue suggests I’m using strategy A (overcut), but the previous video suggests black edges are preferable.

That’s the basic idea behind aspect ratio detection, but reality is far more complicated in order to crack down on false positives. This means that we want to avoid unnecessary checks if possible.

Once we determine the correct aspect ratio, we can take a shortcut. Since we know where black bars end and the image begins, we can just check these four rows:

Where ‘n px’ and ‘X%’ are some adjustable thresholds.

The reason why we aren’t checking the rows exactly on either side of the edge is to dodge compression artifacts and blurred edges.

Naming and shaming: Disney (Star Wars: Solo trailer).

So fine, we save numbers of these four rows. Top outer row is the last row in which we failed to detect a non-black pixel minus a pixel or two for safety, top inner row is the first row in which a non-black pixel was detected plus a pixel or two of safety margin.

There is, of course, a problem. If the top black border is one pixel thick and bottom black border doesn’t exist, the extension would have us check rows that don’t actually exist. Since the rows we need to check don’t exist, we can assume that black bars either don’t exist or are too thin to actually annoy anyone. When that problem happens, the extension resets the aspect ratio back to what it was originally.

Here’s a fun fact tho: this step — saving numbers of the four rows — happened only after extension already corrected aspect ratio. Normally that’s not a problem, because videos don’t have one pixel thick black bars on the top and no black bar at the bottom. They would either have substantial letterbox, or none at all.

In this particular case, though, things were a bit different. Ultrawidify would detect the one pixel letterbox and correct it. Then it would check where the black bar edges are and apply the safety margins. Safety margins would be out of bounds.

Would say Ultrawidify and undo the correction right away, and this cycle would repeat forever (or at least as long as one-pixel and zero-pixel letterbox was present).

Fortunately enough for us, the fix for that is easy enough: we try to save the edge rows, and only issue aspect ratio correction if the result isn’t bogus.

And that’s about it for today, really.

How to make speech bubbles draw themselves: a story of pain (part 3)

In the part two of this saga, we’ve discovered that not using brute force is not gonna be an option. After our first attempt at brute force failed miserably, we’ll have to find another way.

Me orc, me smash: Brute force revisited

Upon review, it turns out that our first solution was actually kinda close to working. Our algorithm kinda looked like this:

  1. Find the center of the ellipse
  2. Find aspect ratio of the text
  3. Assume that the aspect ratio we found is the same as the aspect ratio of our ellipse
  4. Squash text into the square and draw a circle around it
  5. Calculate the radius of the circle using maths
  6. Unsquash the circle, get ellipse (or two radii that define it)

The obvious problems with this approach were:

  • assumption under point #3 is incorrect
  • this means the radius we calculated was also wrong, and often too tight

We can work around that. We only really need to add very little on top of what we already have:

  1. Calculate radius of the circle using maths
  2. **Increase that radius by some factor in order to give us some breathing room**
  3. Unsquash the circle, get ellipse (or two radii that define it)
  4. **Try to make ellipse smaller, one small step at a time**
  5. Repeat #8 a few times, use the best result you get

Steps 8/9 can be done using a kind of binary search, and it goes roughly like this:

  1. Calculate whether all points are in the ellipse for given radii.
  2. If all points are inside the ellipse, decrease radii for half of a step
  3. If any of the points are outside the ellipse, increase radii for half of a step
  4. Decrease step by half
  5. If step is sufficiently small, you can stop
  6. If step isn’t sufficiently small, go back to #1

In our case, we have two ‘steps’ (each for one of the radii). Before starting our algorith, we determine two values for each of the radii:

  • “Radius can’t be bigger than this” number (e.g. width of a given line of text)
  • “Radius can’t be smaller than this” number (e.g. half of the width of a given line of text)

Step is the difference between the two (different for each radius). We use the bigger of the two numbers as our radius.

This works great, but we can improve it even further. Inside every step, we can try decreasing size of one radius while keeping the other radius the same. This can help us find smaller, better-fitting ellipses at the expense of some extra calculations (to the tune of O(n²)).

Trying it out

Now that we wrote the code, it’s time to try it out. Results ended up being slightly disappointing:

Yikes. That’s worse than our previous test. We even managed to fail to draw a rectangle properly.

What is worse: there doesn’t seem to be neither reason nor rhyme to why things don’t work. Our brute force approach was used at least five times and has two failures (where bubble is way too small). Multiline texts — the ones that very probably don’t use the bruteforce approach at all — are often off-center and sometimes even too small (all off-center bubbles are too small, so text doesn’t fit in the ellipse):

Those four texts must be related to Cinderella.

What gives?

Upon closer examination, it turns out that some ellipses that should be drawn with brute force method weren’t drawn by brute-force method. Secondly, despite determining ellipse’s radius by brute force, we still always use our matrix to determine one or both center coordinates. Our ‘high maths’ approach is giving us slightly suspect — and possibly even wrong — results.

The one thing to consider when writing code is that computers are generally fairly terrible with very tiny (and very big) numbers. The tinier your number (we’re talking anything past fifth decimal), the less “accurate” it is. The values the matrix calculation spat out contained some very tiny numbers. When those numbers were used to perform further calculations, the errors grew and we got garbage results.

Let’s forget the ‘high maths’ approach for a while and try to brute force everything.

Giving brute force another try

Step 1: determining the center of the ellipse

This one is going to be easy. We still only work with 4 points at the time. To get the ‘center point’, we find the average of the four coordinates. Then, we try to find the intersection between the lines connecting the points at the opposite sides of the quadrangle the four points form. If the point of intersection is mirrored over the ‘center point,’ it gives us the center of our ellipse.

Step 2: apply smash, remove smart

We feed the center of the ellipse we got from step 1 into the brute force algorithm we discussed earlier, run some test and …

So close, yet so far. Something crashed the script on the last bubble.

Good news is: brute force does find better results than attempt at using high maths. No bubble is drawn off-center, all bubbles are correctly sized. The reason for the crash is a weird point selection: one of the diagonals (in at least one of the 4-point subsets of the last bubble) is vertical. In order to calculate the center of our to-be-ellipse, we need to know the slope of thes line between the opposing sides. There’s a bit of a problem though: if you’re trying to calculate the slope of a vertical line in a coordinate system, you’re going to faceplant into a division by zero.

Computers don’t like dividng by zero.

Trying to find an intersection in this point combination is going to be problematic.

We can quickly see another thing: if we try to determine the center for our ellipse using these four points, we’ll just get less-than-optimal solution. That means we can skip it — all it would do is waste our time.

Let’s see if that fixed our probem:

And all is right. As far as performance goes, “stopwatch test” says brute-force approach isn’t really that much slower than the “proper”, “high maths” approach.

This leads us to the conclusion:

Lots of people love the “work smart, not hard” approach to things. Sure, using a little bit of brainpower to avoid using a lot of force is nice. But let’s not forget that sometimes, the opposite applies: a little bit of brute force can save us from using lots of brainpower.

Moral of the story: don’t overthink your programs, I guess? Sometimes, simple is better and ‘approximate’ is good enough.

You’re looking at one or two weeks of my free time going down the drain. Just like that.

How to make speech bubbles draw themselves: a story of pain (part 2)

When we left off last time, I still thought this wasn’t going to be that bad. I was wrong.

Matrix: reloaded

So we’re back to square one and I’m looking at the first answer about my problem that I’ve found over at math stackexchange. When I first saw it, I thought I haven’t watched enough Rick&Morty to understand it, but after playing around with it for way longer than I should have I started figuring shit out.

If you plop 4 points into the following equation:

ax² + by² + cx + dy + e = 0

And solve the system of those four equations, you should notice that the variables ‘a’, ‘b’, ‘c’ and ‘d’ are all some fraction or multiplier of ‘e’. Even without knowing the value ‘e’, we can calculate the center of the ellipsis. If we want radius, we need to know ‘e’. Assuming ‘e’ is -1 seems to do the job the way we wanted, though.

Since our convex hull often contains more than 4 points, we try to calculate the ellipse for every combination of 4 points (out of all points on the hull). Note that because we’re lazy, we don’t calculate the hull. We just take upper two corners from each row in the upper half of text and the bottom two corners from each row in the bottom half of text. If the text has odd number of rows, we include all four corners in the convex hull.

All edge points are marked with a circle.
— point not on the ‘convex hull’, therefore we ignore it.
— Points we assume are on the convex hull. No biggie if they aren’t as we throw out garbage results later

We calculate radii for each combination of 4 points that we have. We check if the new ellipse contains all the points of the hull (either inside or on the edge). We reject the result if it doesn’t. If we don’t reject the result, we compare it with the last result we didn’t reject. If the ellipse is bigger than the last result, we reject it. If not, this is our new result. Repeat until you’ve ran out of combinations.

It fits, more or less. There appears to be some rounding/off-by-one error down there at the bottom, but nothing that’s terribly problematic.

Sooner or later, though, we run into a problem. What if ‘c’ and ‘d’ are free variables instead of ‘e’? This happens if any of the four points are horizontally or vertically symmetrical relatively to the center of the ellipse. For example, if you plug points 0,0; 4,0; 0,2; 4,2 into the equation, you’ll find that ‘a’ is some fraction of ‘c’ and ‘b’ is some fraction of ‘d’. You can still calculate center from that, but you can’t get the radii. Even worse is the case of 0,2; 2,0; 4,0; 6,2: symmetry across only one axis means you can only get one coordinate of the center because there’s infinitely many solutions.

Since we’re after only one specific solution, for us ‘infinitely many solutions’ equals ‘no solution.

Forsaken cheats

In the second example from above, we could do some additional maths in order to find the center. We’re only interested in the smallest possible ellipse, so we could calculate some limits in order to get the other center coordinate. But word on the streets is that cheating is easier and could get us results that are close enough to what we want.

Turns out breaking symmetry isn’t too hard — we just need to ensure that points in the left and right halves don’t share the same ‘y’ coordinate, and points in top and bottom halves don’t share the same ‘x’ coordinate.

The first bit of this task can be easily achieved: since every row has two points with same ‘y’ coordinate (and since ‘y’ coordinate can’t repeat across multiple rows), we just need to shift one point a bit up and the other a bit down (or keep it at the same place).

Shifting ‘x’ coordinates is a bit more problematic: two different rows can absolutely contain same ‘x’ coordinate — and what is worse, if we shift the ‘x’ coordinate in any way, we may create the situation we’re trying to avoid. However, we know the following things:

  • point coordinates are whole numbers only (We get points by reading pixels, and there’s no such thing as ‘half a pixel’)
  • only points in the opposite rows are required to have different x and y coordinates


— point not on the ‘convex hull’, therefore we ignore it.
— Points of the same color aren’t allowed to share any coordinate.

If two neighbouring rows (e.g. top two rows in text with more than 3 rows) have points that share the same ‘x’ coordinate, it doesn’t really matter because ellipse containing those points would be rejected for being too small anyway.

This allows us to easily apply required offsets:
* upper left: x -= 0.5, y -= 0.5
* upper right: x -= 0.5
* bottom-left: y += 0.5

So I thought I could do it by shifting points a little outward. Turns out that two-line and one-line rows disagree. At the end of the day, “losing” half pixel of space won’t really be noticed when bubble is more than hundred pixels across most of the time — and when it’s not, the radius is still in high double digits. So let’s change that a bit:

  • upper left: x -= 0.5, y -= 0.5
  • upper right: x -= 0.5
  • bottom-left: y += 0.5

Not quite what we want, but better than what we had before.

Let’s try that out a bit more:

Oopsie whoopsie. Fucky wucky.

Turns out I spoke too soon.

The Right Way™

Sometimes, we don’t have enough data to determine where the center is. Sometimes, we will only get enough data to determine one of the center coordinates. For example, if you only have four points to go off, and if said points would form a symmetrical (acute) trapezoid they were to be connected with lines — and since our text is centrally aligned, that happens almost every time we want to draw a bubble around two lines of text — you can still determine horizontal center.

Turns out this comes out handy: we can determine one of the coordinates using the kind of “cheating” approach we used before, but with some twists:

  • We split points in two groups: those to the left and those to the right of the horizontal center (top and bottom if we have the vertical center of the ellipse)
  • Find the vertical center of the text (the spot halfway through the topmost and lowermost text edge)
  • Find the longest diagonal (upper left to lower left or lower left to upper right)
  • Find where the diagonal crosses from one half to the other, and flip that point over the vertical center of the text

The last step is important because longer lines will move the center of the ellipse away from center of text, towards themselves — but the point where the diagonal intersects the horizontal center is going to be offset in the opposite direction.

We need to account for the fact that center of the ellipse is going to gravitate toward longer lines.

Once we have both coordinates of the center, we still have some leftover data from the matrix that we can use to determine both radii.

However, this approach doesn’t cover all the cases (it fails at least one), and the ‘offset corners by half a pixel’ way of dealing with things doesn’t seem to harm it, so we’ll just keep both in for the time being.

All in all, we’re progressing somewhat nicely. There’s some work to be done when determining the anti-jag parameter of the rectangular selection (but that — as well as padding — will be user-provided arguments/options). The only thing we have to deal with now are the bubbles with one or two lines, where our current tactics for determining the bubble size fails.

Testing grounds. Things are mostly fine, except the bit where my script won’t handle the red bits.

Turns out that — spoiler alert — the brute force approach is the only approach that will consistently work. Maybe it should be revisited.

How to make speech bubbles draw themselves: a story of pain (part 1)

Every now and then, I sit down in front of my computer and start making a comic. It follows a very similar premise to DM of the Rings and Darths&Droids — that is, take a movie (in my case, How To Train Your Dragon) and pretend it’s a D&D session — except my work is much less original and not that good, probably. Oh well.

If you’re making any sort of comics, you’ll probably have to draw a ton of speech bubbles. Mind, drawing basic speech bubbles isn’t that hard: you use the oval selection tool, select an area and fill it with color. But boy does it take time. The more of them, the longer it takes — and boy do some pages feature a lot of them.

Just to illustrate how bad it can get. Now admittedly an episode normally won’t have this many bubbles, but that doesn’t mean drawing speech bubbles isn’t a colossal waste of time.

Because we like to work smarter, not harder, the question pops up: why don’t we make a script that would draw speech bublbes for us? I’m using GIMP anyway, and GIMP has plugin support. ‘‘This shouldn’t be too hard,’’ were the famous last words as I opened visual studio code on that day about two weeks ago and got to work.

Drawing rectangles is easy. Drawing ellipses is hard.

Comics often use two kinds of bubbles: there’s bubbles used for narration, which are often rectangular; and there’s bubbles used for speech, which are generally elliptical (more or less).

Determining borders

If we want to draw any kind of bubble (be it rectangular or elliptical), we must first figure out where to place it. This is the easy part, because:

  1. we use GIMP and put the text for each bubble on its own, separate, transparent layer. That’s one of the important assumptions that we’ll make (especially for elliptical bubbles): we don’t reuse text layers for more than one bubble, and the layer is always transparent except for text.
  2. images and layers are, in general, grids of squares (pixels), where each square is colored with a different color. This allows us to borrow some tricks from Ultrawidify: we’ll check every pixel of every row, and every pixel of every column for the presence of non-transparent pixels.

If we keep track of where the first non-transparent pixel was found for every row and column, we can figure out where the text starts and where it ends. If rows of text don’t overlap each other, we get the bounds of every row of text in the layer. And when we know that a row of text starts at line 32, ends at line 64, starts at column 3 and ends at column 125. That gives us 4 corner points per line of text, and that’s enough data to draw a rectangle. The vertices are at 3, 32; 125, 32; 3, 64; 125, 64. The procedure is so simple it doesn’t even deserve its own heading (although there is a few minor, seemingly simple improvements you can make to that). At least compared to what’s about to come.

Ellipses can honestly piss off

Do we even know what we’re trying to do?

Yes, actually. We want to draw an ellipse around a set of points, where:

  1. All the corner points must be inside or on the edge of said ellipse
  2. Points must be as close to ellipse edge as possible

In order to draw an ellise, we need to know:

  • width and height of the ellipse
  • center of the ellipse (or more accurately, upper left corner¹, but we can get that from width and height of the ellipse)

¹In computer graphics, the coordinate system is flipped over horizontal axis compared to the coordinate system used most everywhere else.

because those are the parameters GIMP’s ellipse selection tool takes. We already know the points that represent the edges of text. What we need to do is to take those points (bunch of pairs of x,y coordinates) and somehow convert that to parameters of the ellipse that will satisfy the two rules outlined above.

Computers are very good at two things: logic and maths. This means we need to tell it how to calculate the center, width and height from those points. Because I don’t recall that lesson from my math classes, I went to every programmer’s best friend (stackoverflow) and discovered that — spoiler alert — this is either a) more or less impossible or b) requires diploma or master’s degree in maths.

I studied computer science. We had maths, but not that kind of maths.

Time to cheat

First of all, we subtly change our second requirement. We don’t require that the corner points are as close to the edge of the ellipse anymore, instead we focus on trying to draw the smallest ellipse possible. It’s less than ideal, but we’ll have to live with that.

It’s a subtle difference that only matters in edge cases like this. ‘Points as close to the edge as possible’ will give us a bigger ellipse overall, which is sometimes desired (if the upper portion of the speech bubble were to be covered with something). ‘Smallest ellipse’ produces lots of wasted space around the bottom line.
Dots serve as a rough illustration of what the script would consider to be a corner point of the text.

After we’re done revising our requirements, we head back to google and look up the equations for the ellipse.

x = a * cos(t)
y = b * sin(t)

Where x, y are coordinates of a point on the ellipse; a, b are the length of the primary and secondary radii. We can do something with this. We can calculate where the center of ellipse will be. We just calculate where the middle point of every opposing pair of corner points is (e.g. the opposite point of upper left point in the top line of text is the lower right point in the bottom row of text). Since each pair of corner points may give a slightly different center, we take the average of those centers. Now that we know where the center is, we can use some high school maths to calculate the angle between straight line from center to a given point and the horizontal axis, and we do that for every point. Biggest ellipse will contain all the points, and we will have our answer.

Except t is not the angle to our point. tis the angle between horizontal axis and the point where our point would be, if the ellipse was squished (or stretched) into a circle. We can squish (or stretch) ellipse into a circle by multiplying (or dividing) one of the coordinates with the aspect ratio of the ellipse.

Do we know the aspect ratio of the ellipse? No.

We know the topmost point of the top line of text, we know bottommost point of the bottom line of text, we know leftmost and rightmost point of the longest line of text. We can get width and height from that. We can use that to calculate an aspect ratio.

Is that aspect ratio the same as the aspect ratio of the ellipse we want? Tests concluded that no. This is not the aspect ratio we’re looking for. Our final result looks something like this:

White ellipse is what the script did. The selection has same dimensions than ellipse, but is placed roughly where the ellipse is supposed to be. Red lines represent what our script considers to be a line of text.
Red line should be entirely within the ellipse (with at least 3 pixels to spare in the vertical direction, and 7 pixels to spare in horizontal). But the red seems to poke out for some reason.
And not only does the ellipse fail to contain the lines completely, it’s also offset to the left and a little bit to the right.

Can we fix the aspect ratio with some brute force? I could write another paragraph on how this was attempted, but long story short: turns out we can’t. We’d need more data to draw the ellipse we want.

Oddshot with sound here.

Bummer. Turns out cheating doesn’t pay (yet). I guess we’ll have to do things the proper way™.