Spectacle is KDE’s screenshot utility. Since the beginning of time (but only reported years ago), it has had a bug that annoys people using multiple high-PPI or mixed-PPI monitors. For example, consider my desktop. It looks like this:
5120×2160 at 34″ gives us a PPI (pixels per inch) of about 160. This has its benefits, but also some drawbacks, especially on Linux. Wayland solves some of these issues, as it lets you set the scaling factor on a per-display basis. Unfortunately, there are show-stoppers that prevent me from going that way.
This leaves us with Xorg. Xorg doesn’t do mixed-PPI setups at all, but there are workarounds. You can tell Xorg to pretend your low-PPI monitors run at a higher resolution than they actually do. This has downsides (a blurry image), but it beats the alternative. So you set your scaling factor to match your highest-PPI monitor and “supersample” the other monitors appropriately.
The ‘scaling factor’ is set by your desktop environment; in my case, KDE (which uses Qt). If you set this scaling factor above 1, you will sometimes run into a bug (in Qt). It pops up in a few different places, the most annoying of which is when you want to take a screenshot of a rectangular region.
The unwelcome offset and duplication of the screenshot area renders some parts of the screen impossible to screenshot using rectangular selection. This is annoying. Let’s fix it.
KDE’s window manager, kwin, is a pretty nice thing. Among other things, it lets us tell windows and programs how to behave. We can force windows to stay on top, force them to open on a specific monitor, or prevent them from being closed. We can make windows transparent and do tons of other things, but most importantly: we can make windows appear at the exact location we want them.
The still frame of our screen on which we draw the selection is technically a fullscreen window covering all three screens. We only need to tell kwin how to recognize that window and move it where we want. In order to do that, we need to know two things: the window class and the window title.
We don’t know them yet, but the ‘Detect Window Properties’ button looks inviting — except it doesn’t work when we try to detect properties of our “draw rectangle for screenshot here” window.
Okay then, we’ll run xprop in the terminal while the rectangular selection window is active, and click it.
Fine, clicking the window is not the only way to coax the data we want out of xprop. We can give it the window id instead, like so:
This works. We run this command while Spectacle’s rectangular selection is active, and we get what we want:
Window class: spectacle. Window title: Spectacle (as opposed to filename — Spectacle of the regular window). This is enough for us to only move the rectangular selection window, while keeping the spectacle window that appears after we’ve taken a screenshot where it is.
Kwin rule is thus written. Another screenshot is taken. The issue persists.
How to waste an hour.
Where there’s a whip, there’s a way
But we’re not giving up quite yet. The good thing about Linux is that you can do a lot of things from the terminal, such as moving windows around. xdotool can do that. The command looks like this:
xdotool windowmove <window id> <x> <y>
And we already know how to get the window id. We put what we know together and paste our command into the terminal:
During the five seconds of grace, we try to screenshot a rectangular selection of the screen. We get the first real success of the evening — after the five seconds are up, the rectangular selection window moves to where it should be.
Now, let’s try to launch spectacle’s rectangular selection and the code that corrects its position at the same time:
… and now it suddenly doesn’t work anymore. Spectacle launches, but the rectangular selection window remains offset. Of course, we can swap the order of spectacle and the commands that correct the offset of the rectangular selection window, like this:
Wouldn’t it be nice if this happened when I pressed ‘Print Screen’?
Yes, actually, yes it would. So we open the custom shortcuts settings, set a key shortcut and paste our command into the command box.
We click okay. We try to take a screenshot. Nothing happens.
But let’s not give up. A file containing a bash script also counts as a command. So we paste our command into a file (taking care not to forget the #!/bin/bash line at the very top), save it, make it executable and put the path to the file in our custom keyboard shortcut.
And it finally works — after about two hours of experimenting.
It’s been a while since I’ve written about programming, but lately I’ve gotten back to working on Ultrawidify. With no major bugs or problems that require an immediate fix, I can finally work on the bugs and features I’ve been kicking down the road for a while.
The problem of the week is, of course, keyboards (or rather, keyboard layouts).
To elaborate a bit further on the problem: a significant chunk of Europe uses the QWERTZ layout. QWERTZ is much like the QWERTY layout that Americans (and the rest of Europe except France) use, except ‘Z’ and ‘Y’ trade places. The French weren’t content with swapping only two letters and came up with the AZERTY layout instead. Then there are also layouts like Dvorak and Colemak, because some people insist they make their typing much faster. While letters trade places around the keyboard with each layout, the keycodes don’t.
As a result, you can never be sure which letter the end user gets from pressing which key. On some layouts, keycode 90 will give you Y; on others, Z. This means that if you base your keyboard shortcuts on a QWERTZ layout and use ‘Z’ as a shortcut for anything, French and QWERTY users will wonder why they have to press the Y key, and Dvorak users will just shout at you that keyboard shortcuts don’t work.
If your goal is to have keyboard shortcuts that won’t be flat out wrong for people using a different keyboard layout than your own, then keyCode is not the way. Handling different keyboard layouts is an easy road to code spaghettification. Most importantly, it’s a major pain in the ass. You’ll spend a lot of time on it, but with very little gain.
Neat. Now I can just use this property for all my keyboard shortcuts. Since event.key gives us a specific character, I no longer have to pay attention to the layout. I can just say “press Z to zoom.” After all, the ‘z’ that event.key gives me is exactly the same regardless of whether the user uses QWERTZ, QWERTY or something more exotic. Foolproof, isn’t it?
Wrong, sir, wrong.
True, event.key will return the same letter regardless of which keyboard layout you pressed it on. However, some letters can’t be typed at all on certain keyboard layouts. If you’re using Cyrillic (or any other non-Latin script), you’ll quickly find that keyboard shortcuts using even the standard ASCII letters no longer work.
Certainly a mild oversight on my part, but in my defence: the only reason I started developing this extension is that at the time, there was no extension for fixing aspect ratio available for Firefox (Chrome did have its fair share of aspect ratio fixers such as Ultrawide Video, but those weren’t ported to Firefox until much later), and I really wanted the functionality. And when all you want is a swing set, why build a rollercoaster? The good old days.
But let’s get back to the topic at hand. If we want to fix the Russian problem, we’re in a bit of a tough spot. event.keyCode is starting to look better and better by the minute … except it doesn’t, really.
What can be done?
The options roughly boil down to the following:
1. Use event.keyCode to determine keys.
This option brings a lot of problems. Not only will there be issues with people using non-QWERTZ layouts (unless I spend an unreasonable amount of time working around that), using event.keyCode would also mean rewriting lots of existing code. More importantly, since all keys have been fully rebindable in the extension settings for a while, I would have to choose between writing something that correctly preserves keyboard shortcuts for existing users (annoying; a quick StackOverflow recon didn’t give encouraging results) and resetting keyboard shortcuts to default for everyone (easy, but rather unacceptable. I don’t want another Nosedive).
2. Use event.keyCode to determine keys for new users, event.key for existing ones
This one offers some benefits over a purely keyCode-based solution: I don’t have to write code to port keyboard shortcuts to the new system, and I don’t have to wipe the settings of existing users. It still has some drawbacks I don’t like, though, namely that I’ll have to deal with displayed keyboard shortcuts being wrong for non-QWERTZ keyboard layouts.
3. Keep using event.key and fall back to event.keyCode if event.key doesn’t contain an ASCII character
Hey look, this is the quick and lazy solution we’ve been looking for. It’s also dirty, but it’s going to work. Maybe not on custom shortcuts, but we’ll see.
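A minimal TypeScript sketch of option 3, assuming that browsers report the US-layout keyCode for letter keys (65 to 90) even when a non-Latin layout such as Russian is active. `shortcutChar`, `actionFor` and the one-entry shortcut table are hypothetical names for illustration, not Ultrawidify’s actual code.

```typescript
// Only the two KeyboardEvent fields this sketch reads; kept as a plain
// interface so the code also runs outside a browser.
interface KeyEventLike {
  key: string;
  keyCode: number;
}

// True if `s` is exactly one printable ASCII character (space to tilde).
function isPrintableAscii(s: string): boolean {
  if (s.length !== 1) return false;
  const code = s.charCodeAt(0);
  return code >= 0x20 && code <= 0x7e;
}

// Returns the lowercase character a shortcut should be matched against,
// or null if neither event.key nor event.keyCode gives us one.
function shortcutChar(e: KeyEventLike): string | null {
  // Preferred path: the character the active layout actually produced.
  if (isPrintableAscii(e.key)) {
    return e.key.toLowerCase();
  }
  // Fallback: for letter and digit keys, keyCode values 65-90 and 48-57
  // are the same as the ASCII codes of 'A'-'Z' and '0'-'9'.
  const k = e.keyCode;
  if ((k >= 65 && k <= 90) || (k >= 48 && k <= 57)) {
    return String.fromCharCode(k).toLowerCase();
  }
  return null;
}

// Hypothetical shortcut table: character -> action name.
const shortcuts: { [ch: string]: string } = { z: "zoom" };

function actionFor(e: KeyEventLike): string | undefined {
  const ch = shortcutChar(e);
  return ch === null ? undefined : shortcuts[ch];
}
```

On a Russian layout, the key labelled Z produces `event.key === "я"`, so the first check fails and the keyCode fallback (90) maps it back to ‘z’. Shortcuts bound to non-letter, non-digit characters get no such fallback, which fits the “maybe not on custom shortcuts” caveat.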
Yet another day, yet another post about stuff going wrong. This time, I got a bug report that “videos are jumping around” on Facebook and some other pages. I tried to verify the problem … and everything worked fine for me. Then I decided to boot up Windows and there it was: the problem as described. How nice: a problem that happens on some operating systems and not on others, even though in theory that shouldn’t be the case.
But eventually, the issue was reproduced, and that’s all that matters. The issue looks very familiar; it has been observed on reddit before.
A video and a player
In a very ELI5 way, every webpage is made out of a bunch of rectangles (layers, elements), one within another. In order to properly crop a video, we must know which of these elements is actually the player (the ‘player’ element is to our video what a picture frame is to a picture). Picking the wrong element can result in the extension cropping too little, too much, or moving the video out of the picture altogether.
We can’t just assume that the first element above the video is the player, either: sometimes sites put additional elements between the two. This is why we need to ‘guess’ the player element by looking at its size.
Side note: not all extensions use that approach. Some seem to just assume you use a 21:9 monitor and slap an ‘enlarge this element by 1.3’ on the video element. A great and foolproof strategy for fullscreen. Less great for YouTube’s theater mode, Twitch with chat open at the side, or non-fullscreen Netflix.
Legacy and technical debt
The code for determining which element is the player has some weird quirks thanks to the history of the addon. Most notably, back when the extension only focused on YouTube and Netflix, it used to work by determining how tall and how wide the video should be. This method has a few drawbacks, the most notable being:
1. If you ask the browser for the size of the video, it’ll tell you the dimensions you specified
2. It worked for YouTube and Netflix, but not for everything else
In general, we can assume that the initial size of the player will be exactly as wide or exactly as tall as the video. However, since we actually changed the size of the video (as opposed to telling the browser to just enlarge it by some factor), we couldn’t check for that: if the video was cropped, the browser would tell us the post-crop size, which is useless for that purpose. Some wonky code was written to deal with this issue, and it worked well enough for YouTube and Netflix and sometimes even other sites. However, said code is, in retrospect, pretty bad. Looking at it invokes a few questions every programmer sometimes asks themselves: “the hell was I trying to do with this shit” and “how the fuck did this even work at all?”
Due to problems with #2, a better way of resizing the video needed to be implemented, and eventually it was, in the form of transform: scale(x,y). Using this to crop video (as opposed to modifying the width and height attributes of the video) has a nifty advantage: it’s possible to get the size of the video without taking the transform into account. This allows us to rewrite the player detection loop so that it correctly detects the player element.
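The rewritten detection loop could look roughly like the sketch below. It assumes a simplified element model (`ElementLike`, a hypothetical type whose `width`/`height` stand in for layout sizes such as `offsetWidth`/`offsetHeight`, which ignore CSS transforms) and a hypothetical 1px tolerance. It collects every ancestor that is as wide or as tall as the video, innermost first.

```typescript
// Minimal stand-in for a DOM element. In a browser, width/height would come
// from layout sizes such as offsetWidth/offsetHeight, which are not affected
// by CSS transforms like scale().
interface ElementLike {
  id: string;
  classList: string[];
  width: number;
  height: number;
  parent: ElementLike | null;
}

// Walks from the video's parent up to the root and collects every ancestor
// that is as wide or as tall as the (untransformed) video, within `tolerance` px.
// The result is ordered innermost first.
function findPlayerCandidates(video: ElementLike, tolerance = 1): ElementLike[] {
  const candidates: ElementLike[] = [];
  for (let el = video.parent; el !== null; el = el.parent) {
    const sameWidth = Math.abs(el.width - video.width) <= tolerance;
    const sameHeight = Math.abs(el.height - video.height) <= tolerance;
    if (sameWidth || sameHeight) {
      candidates.push(el);
    }
  }
  return candidates;
}
```

More than one candidate is common (a wrapper and the player can share the video’s size), which is exactly the duplicate problem discussed next.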
Dealing with duplicates
Another thing worth addressing is “duplicates”, that is, what happens when more than one element on the way from the video element to the root of the page has the same size. I haven’t figured out what to do in this case, since whether the innermost or the outermost element is the right pick may differ from site to site. In the absence of better options, I decided to score every element that could be our player. Rules of the game:
Every element that matches our criteria gets 100 base points
Elements with 'player' in their ID get 75 bonus points
Elements with 'player' in their classlist get 50 bonus points
The farther the element is from our video, the more penalty points it gets. The first match gets 0 penalty points, the second gets one, the third gets two and so on.
I haven’t had the chance to test this thoroughly, so results may vary.
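A sketch of the scoring rules above, assuming the candidates arrive innermost first and that ‘player’ is matched as a case-insensitive substring of the ID and of each class name (the post doesn’t say exactly how the match is done). `pickPlayer` and `Scored` are hypothetical names.

```typescript
// Fields the scoring rules look at.
interface Candidate {
  id: string;
  classList: string[];
}

interface Scored<T> {
  element: T;
  score: number;
}

// `candidates` must be ordered from innermost (closest to the video) to
// outermost. Returns the highest-scoring candidate, or null if there are none.
// On a tie the innermost candidate wins, because we only replace the current
// best on a strictly higher score.
function pickPlayer<T extends Candidate>(candidates: T[]): Scored<T> | null {
  let best: Scored<T> | null = null;
  let distance = 0;
  for (const el of candidates) {
    let score = 100; // base points for matching the size criteria
    if (el.id.toLowerCase().indexOf("player") !== -1) score += 75;
    if (el.classList.some((c) => c.toLowerCase().indexOf("player") !== -1)) score += 50;
    score -= distance; // 0 for the first match, 1 for the second, ...
    if (best === null || score > best.score) {
      best = { element: el, score };
    }
    distance++;
  }
  return best;
}
```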
Ultrawidify has been seeing some issues with constant aspect ratio readjustments. This post examines and explains why and how these issues happened.
I don’t think I’ve boasted about developing Ultrawidify much on this blog. Maybe I should have, but then again: the audience of this blog is a) people who know me and b) people who use Ultrawidify and were bored enough to click that link in update notes. The point is — you’re all familiar with this extension.
A while back, I noticed an issue on Twitch. It turned out that the video was a bit … well, twitchy. However, the issue seemed to be fairly uncommon, so I made a note in my test videos file and decided to kick the can a little farther down the road. “No big deal, it surely can wait.”
Well, it turns out that the issue was a bigger deal than I thought. I recently visited my Facebook feed by accident, and the twitching issue appeared, except worse. I tend to avoid twatter as well, but long story short: I stumbled on a tweet with an embedded video.
On the plus side, the issue happens very often (more often than the auto-detection interval), and it happens just as often when the video is paused. Here we get the first (and perhaps the only) bit of good news for the day: auto-detection isn’t to blame, and since there’s exactly one other thing that could cause this behaviour, I already know where the problem is.
In Search of the Problem
In order to understand the whats and whys of the problem, we have to take a quick look at how Ultrawidify crops the video. It’s very simple: it finds the video element and basically tells the browser: “Make sure the video is this wide and this tall and then enlarge it by this much,” where “this much” is whatever number the auto-detection script (or user intervention) spat out. In programmer jargon, that’s called setting the style string.
For technical reasons (blame the video alignment feature for that!), Ultrawidify’s auto-detection will also “correct” the aspect ratio when there’s no need. In cases like this, the video is enlarged by a factor of 1: the same size as before.
This should, in theory, do the job just fine. In practice, though, Ultrawidify isn’t the only thing doing that. Some sites will also tell the browser how big to make the video element because something on the page changed (example: switching between normal and theater mode on YouTube). This effectively undoes any changes Ultrawidify has made to the page. And we really don’t want that, since it has the potential to uncrop the unnecessary letterbox.
The solution to this problem is easy enough at first glance: we just tell Ultrawidify to watch for sites trying to meddle with the video size. If the website tries to change anything, Ultrawidify undoes that change immediately. I think you can see where this is going.
Yes. Twitter is also watching for anything that would meddle with video sizes. If it detects that something changed how big the video element is supposed to be, it will undo that change.
Developer tools seem to agree with this assessment. In the inspector view, the video element is blinking like there’s no tomorrow, while in the console, Ultrawidify is seen setting the same style string over and over again, with the zoom factor always at one. At this point you may wonder why the video twitches if Ultrawidify sets the zoom factor to one, and the answer is simple: Twitter doesn’t.
Twitter insists that the video should be zoomed by a factor of 1.005, Ultrawidify wants a zoom factor of 1, and neither is very keen on letting go. This spells trouble for us.
In Search of Solution
If the site will just undo our changes, what can we do? Well, it turns out there’s a way. There are actually two sorts of CSS styles: author styles, which are the CSS defined by the website you’re visiting, and user styles.
Through the magic of user styles, you (the user) have the final say over how the browser will display the site. If Facebook says the background of the page needs to be white, and you have a user style that says the page background should be whatever meme is popular this week … well, tough luck, Facebook. As long as a user style is declared !important, nothing the site does can override it, which makes user styles the perfect “fuck you, you’ll do what I tell you” card. We’ll take it.
There’s another piece of good news: you don’t have to define the styles in advance; you can make them up on the spot and tell the browser to use them. The WebExtension API allows us to do that. There are a few caveats, though. Besides making up the style, you also have to make up a class name, attach it to the element and hope that the site won’t remove it. If you want to edit your style, you have to hand the old style back to the browser and tell it you want it removed, then create a brand new style and tell the browser to use that.
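A sketch of the remove-then-insert routine, written against two injected callbacks instead of the real API so it stays self-contained. In a WebExtension, those callbacks would presumably wrap `tabs.insertCSS` and `tabs.removeCSS` with `cssOrigin: "user"`; the `UserStyleManager` class and the `uw-video` class name are hypothetical.

```typescript
// Stand-ins for the browser calls. In a WebExtension these would presumably
// wrap browser.tabs.insertCSS(tabId, { code, cssOrigin: "user" }) and
// browser.tabs.removeCSS(tabId, { code, cssOrigin: "user" }); the real calls
// return promises, which this synchronous sketch ignores.
type CssCall = (code: string) => void;

class UserStyleManager {
  private current: string | null = null;
  private readonly insert: CssCall;
  private readonly remove: CssCall;

  constructor(insert: CssCall, remove: CssCall) {
    this.insert = insert;
    this.remove = remove;
  }

  // Builds one rule for our own class name. !important matters: a user style
  // only beats the page's (author) styles when it is declared !important.
  static buildRule(className: string, declarations: { [prop: string]: string }): string {
    const body = Object.keys(declarations)
      .map((prop) => `${prop}: ${declarations[prop]} !important;`)
      .join(" ");
    return `.${className} { ${body} }`;
  }

  // Swaps the injected style: the old CSS is handed back verbatim so it can
  // be removed, then the new CSS is injected.
  apply(code: string): void {
    if (code === this.current) return; // nothing changed
    if (this.current !== null) this.remove(this.current);
    this.insert(code);
    this.current = code;
  }
}
```

Keeping the exact previous CSS string around is the whole point of the class: removal only works if you can hand back what you inserted.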
Fortunately for us, it currently seems that the sites aren’t as thorough with removing “unauthorized” CSS classes from elements as they are with “unauthorized” styles, but that’s all it is for now: a rule of thumb that everyone seems to follow, until someone doesn’t.
Well, that was a fairly easy fix, wasn’t it? After all, it required very little work from us (the user-style injecting is already implemented — as a part of dealing with vimeo and its special snowflakey bullshit). Performance seems comparable to what it was before (albeit a tiny bit slower to react to changes) and all is well.
Further testing reveals that the twitching image issue was fixed on Twitter, and probably on Facebook as well. At least as far as Firefox is concerned.
Because there is one problem with my solution: Chrome doesn’t support tabs.removeCSS(), which turns programmatically generated styles into a bunch of unsuspecting fellows checking into the Hotel California. Sure, you can insert a style any time you want, but it can never leave.
Not being able to yank the previous style when adding a new one is problematic for the same reason that telling your kids to wear a white t-shirt a minute after your partner told them to wear a red one is problematic. It’s also problematic for the same reason as putting a new highway toll sticker on your windshield without removing the old one.
Fortunately for us, Chrome takes the “common sense” (and standard-compliant) approach to the first problem: if you have multiple conflicting definitions, it respects the last one. This makes it a bit easier to ignore the second problem: having tens, or in the very worst case (frequent aspect ratio changes, frequent resizing of the player and browser window, and watching a video for long periods of time) even hundreds, of conflicting styles shouldn’t be too much of a problem.
Now, I could probably spend another day dealing with Chrome’s shittiness and inventing new workarounds, but it’s easier to just sit back, REEEEE at Chrome, use my userbase as glorified guinea pigs and hope that people who watch YouTube for long stretches without closing, refreshing or opening videos in a different tab won’t have their performance degraded.
And that’s a very dangerous game. I should know: the first update that introduced autodetection had major performance issues if you watched videos in a tab continuously for a long time (Chrome was a complete lagfest after ~5 minutes of watching video, while performance in Firefox was somewhat better and very dependent on your hardware, just good enough that it escaped my testing). It’s been two years now and my ratings on the Chrome Web Store still haven’t recovered.
I’ll still take the gamble though, against my better judgement.
At this point, I suppose it’s time for a PSA:
Public service announcement: Google Chrome is garbage (and so are all other Chromium-based browsers). If you aren’t already, you should really use Firefox instead.
Thanks for coming to my TEDx talk.
The Case of Twitchy Twitch
So the extension stopped doing twitchy things on Twitter, and the new system for resizing video works. But after a quick visit to the out-of-season April Fools’ joke (a.k.a. the immortal Blizzcon stream from hell), it turns out that the video still appears … twitchy, which means that the twitching on Twitch was the result of a different problem.
The ball is now back in the court of automatic aspect ratio detection. To be fair, it had been known for a while that the Twitch issue was an autodetection problem. After all, most streams don’t have this problem, and the fact that it doesn’t happen when autodetection detects a proper letterbox (or that aspect ratio correction works at all; on Twitter, it wouldn’t!) should be plenty of evidence to support that:
This issue smells like a video with a very tiny, nigh unnoticeable black border: not wide enough for you to notice, but just enough to trigger autodetection. After 30 seconds of (local only) defacing, it turns out that our intuition was correct:
It turns out that this is a very interesting edge case, but to understand how things went wrong, we’re gonna need a quick crash course in how auto-detection works. The key steps are:
Start counting rows that contain nothing but black pixels, from the top and the bottom towards the middle. Each row has a number: the top row is row 0, and the last row has a number one less than the number of rows. For performance reasons, we only check a few columns of each row for the presence of a non-black pixel.
Remember the first row that contains a non-black pixel (on both ends)
Calculate the height of the black bars
Check whether the top and bottom black bars are roughly of equal thickness. We don’t require the black bars to match exactly because:
in theory, the “not black bars” portion of the video is not guaranteed to be of even height. In cases like this, the top and bottom black bars could differ by 1px
Calculate and apply the aspect ratio. If the top and bottom thicknesses differ, you’ve got two options: either you over-cut, or you end up with a thin black border at either the top or the bottom. I’m not exactly 100% sure which strategy I’m using: the Twitch issue suggests I’m using strategy A (over-cut), but the previous video suggests black edges are preferred.
That’s the basic idea behind aspect ratio detection, but in reality it is far more complicated, largely in order to crack down on false positives. All that extra work isn’t free, so we want to avoid unnecessary checks if possible.
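A simplified sketch of the key steps above, under stated assumptions: the frame is given as rows of luminance values, ‘black’ means a value at or below a hypothetical threshold of 16, only a few sampled columns are checked, the top and bottom bars may differ by at most one row, and pixels are square. It leaves out everything the real detector does against false positives.

```typescript
// One frame as rows of luminance values (0-255); row 0 is the top row.
type Frame = number[][];

interface BarResult {
  top: number;                // fully black rows at the top
  bottom: number;             // fully black rows at the bottom
  aspectRatio: number | null; // width / height of the picture, null if bars are uneven
}

// Hypothetical thresholds; the post doesn't give the real values.
const BLACK_THRESHOLD = 16; // luminance at or below this counts as black
const MAX_BAR_DIFF = 1;     // allowed top/bottom mismatch in rows (odd picture heights)

// Only the sampled columns are checked, for performance.
function rowIsBlack(row: number[], sampleColumns: number[]): boolean {
  return sampleColumns.every((x) => row[x] <= BLACK_THRESHOLD);
}

function detectBars(frame: Frame, sampleColumns: number[]): BarResult {
  const height = frame.length;
  const width = height > 0 ? frame[0].length : 0;
  // Count black rows from the top down to the first row with a non-black pixel.
  let top = 0;
  while (top < height && rowIsBlack(frame[top], sampleColumns)) top++;
  if (top === height) return { top, bottom: 0, aspectRatio: null }; // all black
  // Count black rows from the bottom up, stopping at the first non-black row.
  let last = height - 1;
  while (last > top && rowIsBlack(frame[last], sampleColumns)) last--;
  const bottom = height - 1 - last;
  if (Math.abs(top - bottom) > MAX_BAR_DIFF) {
    return { top, bottom, aspectRatio: null };
  }
  return { top, bottom, aspectRatio: width / (height - top - bottom) };
}
```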
Once we determine the correct aspect ratio, we can take a shortcut. Since we know where black bars end and the image begins, we can just check these four rows:
So fine, we save the numbers of these four rows. The top outer row is the last row in which we failed to detect a non-black pixel, minus a pixel or two for safety; the top inner row is the first row in which a non-black pixel was detected, plus a pixel or two of safety margin.
There is, of course, a problem. If the top black border is one pixel thick and the bottom black border doesn’t exist, the extension would have us check rows that don’t actually exist. Since the rows we need to check don’t exist, we can assume that the black bars either don’t exist or are too thin to actually annoy anyone. When that happens, the extension resets the aspect ratio back to what it was originally.
Here’s a fun fact, though: this step (saving the numbers of the four rows) happened only after the extension had already corrected the aspect ratio. Normally that’s not a problem, because videos don’t have a one-pixel-thick black bar at the top and no black bar at the bottom. They have either a substantial letterbox, or none at all.
In this particular case, though, things were a bit different. Ultrawidify would detect the one pixel letterbox and correct it. Then it would check where the black bar edges are and apply the safety margins. Safety margins would be out of bounds.
Ultrawidify would then undo the correction right away, and this cycle would repeat forever (or at least as long as the one-pixel/zero-pixel letterbox was present).
Fortunately for us, the fix is easy enough: we first try to save the edge rows, and only apply the aspect ratio correction if the result isn’t bogus.
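A sketch of the edge-row calculation and of the fix, assuming a hypothetical safety margin of 2 rows. `edgeRows` returns null when any of the four rows would fall outside the frame, and `correctIfValid` only applies the correction when they are all valid, so the one-pixel case is never corrected in the first place.

```typescript
// Row numbers re-checked on later frames once an aspect ratio has been applied.
interface EdgeRows {
  topOuter: number;
  topInner: number;
  bottomInner: number;
  bottomOuter: number;
}

const SAFETY_MARGIN = 2; // hypothetical value for "a pixel or two"

// `top` and `bottom` are bar thicknesses in rows, `height` is the frame height.
// Returns null if any of the four rows would fall outside the frame.
function edgeRows(top: number, bottom: number, height: number): EdgeRows | null {
  const rows: EdgeRows = {
    topOuter: top - 1 - SAFETY_MARGIN,                // last black row at the top, moved into the bar
    topInner: top + SAFETY_MARGIN,                    // first picture row, moved into the picture
    bottomInner: height - bottom - 1 - SAFETY_MARGIN, // last picture row, moved into the picture
    bottomOuter: height - bottom + SAFETY_MARGIN,     // first black row at the bottom, moved into the bar
  };
  const all = [rows.topOuter, rows.topInner, rows.bottomInner, rows.bottomOuter];
  return all.every((r) => r >= 0 && r < height) ? rows : null;
}

// The fix: work out the edge rows first and only correct the aspect ratio
// if they are valid. Returns whether a correction was applied.
function correctIfValid(
  top: number,
  bottom: number,
  height: number,
  applyCorrection: (rows: EdgeRows) => void,
): boolean {
  const rows = edgeRows(top, bottom, height);
  if (rows === null) return false; // bars too thin to matter: leave the video alone
  applyCorrection(rows);
  return true;
}
```

With a 1px top bar and no bottom bar, `topOuter` comes out negative, so nothing is applied and there is nothing to undo on the next pass.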
Upon review, it turns out that our first solution was actually close to working. Our algorithm looked roughly like this:
1. Find the center of the ellipse
2. Find the aspect ratio of the text
3. Assume that the aspect ratio we found is the same as the aspect ratio of our ellipse
4. Squash the text into a square and draw a circle around it
5. Calculate the radius of the circle using maths
6. Unsquash the circle, get an ellipse (or the two radii that define it)
The obvious problems with this approach were:
the assumption in point #3 is incorrect
this means the radius we calculated was also wrong, and often too tight
We can work around that. We only need to add a little on top of what we already have (steps 1 to 4 stay the same):
5. Calculate the radius of the circle using maths
6. **Increase that radius by some factor in order to give us some breathing room**
7. Unsquash the circle, get an ellipse (or the two radii that define it)
8. **Try to make the ellipse smaller, one small step at a time**
9. Repeat #8 a few times, use the best result you get
Steps 8/9 can be done using a kind of binary search, and it goes roughly like this:
1. Check whether all points are inside the ellipse for the given radii.
2. If all points are inside the ellipse, decrease the radii by half a step.
3. If any of the points are outside the ellipse, increase the radii by half a step.
4. Halve the step.
5. If the step is sufficiently small, you can stop.
6. If the step isn’t sufficiently small, go back to #1.
In our case, we have two ‘steps’ (one for each of the radii). Before starting our algorithm, we determine two values for each radius:
“Radius can’t be bigger than this” number (e.g. width of a given line of text)
“Radius can’t be smaller than this” number (e.g. half of the width of a given line of text)
The step is the difference between the two (different for each radius). We use the bigger of the two numbers as our starting radius.
This works great, but we can improve it even further. Inside every step, we can try decreasing the size of one radius while keeping the other the same. This can help us find smaller, better-fitting ellipses at the expense of some extra calculations (to the tune of O(n²)).
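A sketch of the binary search described above, assuming an axis-aligned ellipse with a known center and both radii moved together, each by its own step. `allInside`, `shrinkEllipse` and the `minStep` stopping threshold are hypothetical names and values. The function returns the smallest-area ellipse it visited that still contains every point.

```typescript
interface Point {
  x: number;
  y: number;
}

// True if every point lies inside or on the axis-aligned ellipse
// centered at (cx, cy) with radii rx and ry.
function allInside(cx: number, cy: number, rx: number, ry: number, pts: Point[]): boolean {
  return pts.every((p) => {
    const dx = (p.x - cx) / rx;
    const dy = (p.y - cy) / ry;
    return dx * dx + dy * dy <= 1 + 1e-9; // small tolerance for points on the edge
  });
}

// Binary search between the "can't be bigger" and "can't be smaller" bounds.
// Starts at the upper bounds; each round halves both steps, then moves the radii
// down by the new steps if all points fit, up otherwise.
// Returns the smallest-area fitting ellipse seen, or null if none fit.
function shrinkEllipse(
  cx: number,
  cy: number,
  pts: Point[],
  rxMin: number,
  rxMax: number,
  ryMin: number,
  ryMax: number,
  minStep = 0.01,
): { rx: number; ry: number } | null {
  let rx = rxMax;
  let ry = ryMax;
  let stepX = rxMax - rxMin;
  let stepY = ryMax - ryMin;
  let inside = allInside(cx, cy, rx, ry, pts);
  let best = inside ? { rx, ry } : null;
  while (stepX > minStep || stepY > minStep) {
    stepX /= 2;
    stepY /= 2;
    if (inside) {
      rx -= stepX;
      ry -= stepY;
    } else {
      rx += stepX;
      ry += stepY;
    }
    inside = allInside(cx, cy, rx, ry, pts);
    if (inside && (best === null || rx * ry < best.rx * best.ry)) {
      best = { rx, ry };
    }
  }
  return best;
}
```

For the corners (±3, ±1) with rx in [2, 6] and ry in [1, 3], both radii keep a 2:1 ratio throughout, so the search homes in on the smallest 2:1 ellipse around the corners, whose area rx·ry is 6.5.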
Trying it out
Now that we’ve written the code, it’s time to try it out. The results were slightly disappointing:
What is worse: there seems to be neither rhyme nor reason to why things don’t work. Our brute-force approach was used at least five times and has two failures (where the bubble is way too small). Multiline texts, the ones that very probably don’t use the brute-force approach at all, are often off-center and sometimes even too small (all off-center bubbles are too small, so the text doesn’t fit in the ellipse):
Upon closer examination, it turns out that some ellipses that should have been drawn with the brute-force method weren’t. Secondly, despite determining the ellipse’s radius by brute force, we still always use our matrix to determine one or both center coordinates. Our ‘high maths’ approach is giving us slightly suspect, and possibly even wrong, results.
The one thing to consider when writing code is that computers are generally fairly bad at calculations involving very tiny (and very big) numbers. Floating-point numbers only carry about 15 to 16 significant digits, so a tiny value that comes out of subtracting much larger, nearly equal values can consist mostly of rounding error. The values the matrix calculation spat out contained some very tiny numbers. When those numbers were used to perform further calculations, the errors grew and we got garbage results.
Let’s forget the ‘high maths’ approach for a while and try to brute force everything.
Giving brute force another try
Step 1: determining the center of the ellipse
This one is going to be easy. We still only work with 4 points at a time. To get the ‘center point’, we average the four coordinates. Then, we find the intersection of the lines connecting opposite corners of the quadrilateral the four points form (its diagonals). Mirroring that intersection over the ‘center point’ gives us the center of our ellipse.
Step 2: apply smash, remove smart
We feed the center of the ellipse we got from step 1 into the brute-force algorithm we discussed earlier, run some tests and …
The good news is that brute force does find better results than the attempt at using high maths. No bubble is drawn off-center, and all bubbles are correctly sized. The reason for the crash is a weird point selection: one of the diagonals (in at least one of the 4-point subsets of the last bubble) is vertical. In order to calculate the center of our to-be ellipse, we need to know the slopes of the diagonals. There’s a bit of a problem, though: if you try to calculate the slope of a vertical line in a coordinate system, you’re going to faceplant into a division by zero.
Computers don’t like dividing by zero.
We can quickly see another thing: if we tried to determine the center of our ellipse using these four points, we’d just get a less-than-optimal solution. That means we can skip this subset; all it would do is waste our time.
Let’s see if that fixed our problem:
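Step 1 could be sketched as below, assuming the four points are ordered around the quadrilateral so that its diagonals run from p[0] to p[2] and from p[1] to p[3]. Like the post, it uses slope-intercept form and simply skips a subset (returns null) when a diagonal is vertical; parallel diagonals are rejected too. `lineThrough` and `ellipseCenter` are hypothetical names.

```typescript
interface Point {
  x: number;
  y: number;
}

// Line through a and b as y = slope * x + intercept, or null if the line is
// vertical (computing its slope would divide by zero).
function lineThrough(a: Point, b: Point): { slope: number; intercept: number } | null {
  if (a.x === b.x) return null;
  const slope = (b.y - a.y) / (b.x - a.x);
  return { slope, intercept: a.y - slope * a.x };
}

// Assumes the points are ordered around the quadrilateral, so its diagonals
// are p[0]-p[2] and p[1]-p[3]. Returns null when the subset should be skipped.
function ellipseCenter(p: [Point, Point, Point, Point]): Point | null {
  const avg = {
    x: (p[0].x + p[1].x + p[2].x + p[3].x) / 4,
    y: (p[0].y + p[1].y + p[2].y + p[3].y) / 4,
  };
  const d1 = lineThrough(p[0], p[2]);
  const d2 = lineThrough(p[1], p[3]);
  if (d1 === null || d2 === null) return null; // vertical diagonal: skip this subset
  if (d1.slope === d2.slope) return null; // parallel diagonals never intersect
  const ix = (d2.intercept - d1.intercept) / (d1.slope - d2.slope);
  const iy = d1.slope * ix + d1.intercept;
  // Mirror the diagonal intersection over the average of the four points.
  return { x: 2 * avg.x - ix, y: 2 * avg.y - iy };
}
```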
And all is right. As far as performance goes, “stopwatch test” says brute-force approach isn’t really that much slower than the “proper”, “high maths” approach.
This leads us to the conclusion:
Lots of people love the “work smart, not hard” approach to things. Sure, using a little bit of brainpower to avoid using a lot of force is nice. But let’s not forget that sometimes, the opposite applies: a little bit of brute force can save us from using lots of brainpower.
Moral of the story: don’t overthink your programs, I guess? Sometimes, simple is better and ‘approximate’ is good enough.
When we left off last time, I still thought this wasn’t going to be that bad. I was wrong.
So we’re back to square one, and I’m looking at the first answer to my problem that I found over at math stackexchange. When I first saw it, I thought I hadn’t watched enough Rick and Morty to understand it, but after playing around with it for way longer than I should have, I started figuring shit out.
If you plop 4 points into the following equation:
ax² + by² + cx + dy + e = 0
And solve the system of those four equations, you should notice that the variables ‘a’, ‘b’, ‘c’ and ‘d’ are all some fraction or multiplier of ‘e’. Even without knowing the value ‘e’, we can calculate the center of the ellipsis. If we want radius, we need to know ‘e’. Assuming ‘e’ is -1 seems to do the job the way we wanted, though.
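A sketch of that calculation: with e fixed at -1, the four equations become a 4×4 linear system in a, b, c and d, solved here with plain Gaussian elimination (a generic, swapped-in solver, not necessarily what the original code used). Completing the square then gives the center (-c/2a, -d/2b) and the radii. One caveat of the e = -1 trick: it cannot describe an ellipse passing through the origin, where e would have to be 0. `solveLinear` and `ellipseFromFourPoints` are hypothetical names.

```typescript
interface Point {
  x: number;
  y: number;
}

interface Ellipse {
  cx: number;
  cy: number;
  rx: number;
  ry: number;
}

// Solves m·v = rhs by Gaussian elimination with partial pivoting.
// Returns null if the system has no unique solution (numerically singular).
function solveLinear(m: number[][], rhs: number[]): number[] | null {
  const n = rhs.length;
  const a = m.map((row, i) => row.concat([rhs[i]])); // augmented copy
  for (let col = 0; col < n; col++) {
    let pivot = col;
    for (let r = col + 1; r < n; r++) {
      if (Math.abs(a[r][col]) > Math.abs(a[pivot][col])) pivot = r;
    }
    if (Math.abs(a[pivot][col]) < 1e-9) return null;
    const tmp = a[col];
    a[col] = a[pivot];
    a[pivot] = tmp;
    for (let r = col + 1; r < n; r++) {
      const f = a[r][col] / a[col][col];
      for (let c = col; c <= n; c++) a[r][c] -= f * a[col][c];
    }
  }
  const v: number[] = [];
  for (let r = n - 1; r >= 0; r--) {
    let s = a[r][n];
    for (let c = r + 1; c < n; c++) s -= a[r][c] * v[c];
    v[r] = s / a[r][r];
  }
  return v;
}

// Fits a·x² + b·y² + c·x + d·y + e = 0 through four points with e fixed at -1,
// then completes the square to get the center and the two radii.
function ellipseFromFourPoints(pts: Point[]): Ellipse | null {
  const m = pts.map((p) => [p.x * p.x, p.y * p.y, p.x, p.y]);
  const sol = solveLinear(m, [1, 1, 1, 1]); // e = -1 moved to the right-hand side
  if (sol === null) return null;
  const a = sol[0], b = sol[1], c = sol[2], d = sol[3];
  if (a === 0 || b === 0) return null;
  const cx = -c / (2 * a);
  const cy = -d / (2 * b);
  // a(x - cx)² + b(y - cy)² = c²/(4a) + d²/(4b) - e, with e = -1
  const k = (c * c) / (4 * a) + (d * d) / (4 * b) + 1;
  const rx2 = k / a;
  const ry2 = k / b;
  if (!(rx2 > 0 && ry2 > 0 && isFinite(rx2) && isFinite(ry2))) return null; // not an ellipse
  return { cx, cy, rx: Math.sqrt(rx2), ry: Math.sqrt(ry2) };
}
```

For the rectangle corners (1,1), (5,1), (1,3), (5,3), the system is singular and the function returns null: that is the symmetric case the post runs into next.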
Since our convex hull often contains more than 4 points, we try to calculate the ellipse for every combination of 4 points (out of all points on the hull). Note that because we’re lazy, we don’t calculate the hull. We just take upper two corners from each row in the upper half of text and the bottom two corners from each row in the bottom half of text. If the text has odd number of rows, we include all four corners in the convex hull.
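The lazy hull could be sketched like this, assuming each text row is given as an axis-aligned rectangle (`Rect`, a hypothetical type) and the rows are ordered top to bottom. `hullPoints` is likewise a hypothetical name.

```typescript
interface Rect {
  left: number;
  right: number;
  top: number;
  bottom: number;
}

interface Point {
  x: number;
  y: number;
}

// Approximates the convex hull of a block of text rows (ordered top to bottom):
// top corners of rows in the upper half, bottom corners of rows in the lower
// half, and all four corners of the middle row when the row count is odd.
function hullPoints(rows: Rect[]): Point[] {
  const pts: Point[] = [];
  const n = rows.length;
  const half = Math.floor(n / 2);
  rows.forEach((r, i) => {
    const topCorners: Point[] = [{ x: r.left, y: r.top }, { x: r.right, y: r.top }];
    const bottomCorners: Point[] = [{ x: r.left, y: r.bottom }, { x: r.right, y: r.bottom }];
    if (i < half) {
      pts.push(...topCorners);
    } else if (n % 2 === 1 && i === half) {
      pts.push(...topCorners, ...bottomCorners);
    } else {
      pts.push(...bottomCorners);
    }
  });
  return pts;
}
```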
We calculate radii for each combination of 4 points that we have. We check if the new ellipse contains all the points of the hull (either inside or on the edge). We reject the result if it doesn’t. If we don’t reject the result, we compare it with the last result we didn’t reject. If the ellipse is bigger than the last result, we reject it. If not, this is our new result. Repeat until you’ve ran out of combinations.
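Put together, the brute-force search looks roughly like this. It’s a sketch, not the plugin’s actual code: the names are made up, the 4×4 system is solved with a small hand-written Gauss-Jordan elimination to avoid dependencies, “bigger” is assumed to mean larger area (π·rx·ry), and a combination is skipped whenever its system has no unique solution:

```python
from itertools import combinations


def solve(matrix, rhs):
    """Solve a square linear system (Gauss-Jordan with partial pivoting).
    Returns None when the system is singular (no unique solution)."""
    n = len(matrix)
    m = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ellipse(four_points):
    """Axis-aligned ellipse (cx, cy, rx, ry) through four points, or None.
    Uses a*x^2 + b*y^2 + c*x + d*y + e = 0 with e fixed to -1."""
    rows = [[x * x, y * y, x, y] for x, y in four_points]
    coeffs = solve(rows, [1.0] * 4)     # e = -1 moved to the right-hand side
    if coeffs is None:
        return None
    a, b, c, d = coeffs
    e = -1.0
    if a < 0 and b < 0:                 # same curve with every coefficient negated
        a, b, c, d, e = -a, -b, -c, -d, -e
    if a <= 1e-12 or b <= 1e-12:
        return None                     # hyperbola, parabola or a pair of lines
    k = c * c / (4 * a) + d * d / (4 * b) - e
    if k <= 0:
        return None                     # no real points on the curve
    return -c / (2 * a), -d / (2 * b), (k / a) ** 0.5, (k / b) ** 0.5


def contains(ellipse, point, eps=1e-9):
    """True if the point is inside the ellipse or on its edge."""
    cx, cy, rx, ry = ellipse
    x, y = point
    return ((x - cx) / rx) ** 2 + ((y - cy) / ry) ** 2 <= 1 + eps


def smallest_ellipse(hull):
    """Try every combination of 4 hull points; keep the smallest ellipse containing all of them."""
    best = None
    for four in combinations(hull, 4):
        ellipse = fit_ellipse(four)
        if ellipse is None or not all(contains(ellipse, p) for p in hull):
            continue                    # reject: no ellipse, or a hull point sticks out
        if best is None or ellipse[2] * ellipse[3] < best[2] * best[3]:
            best = ellipse              # smaller area than the best result so far
    return best
```

For example, the four extreme points of an ellipse centered at 5, 3 with radii 4 and 2 come back as `(5, 3, 4, 2)` (up to floating-point noise), and adding an extra point inside doesn’t change the answer. The number of combinations grows quickly with the hull size, but text bubbles rarely have more than a handful of rows.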
Sooner or later, though, we run into a problem. What if ‘c’ and ‘d’ are the free variables instead of ‘e’? This happens when some of the four points are horizontally or vertically symmetrical relative to the center of the ellipse. For example, if you plug the points 0,0; 4,0; 0,2; 4,2 into the equation, you’ll find that ‘a’ is some fraction of ‘c’ and ‘b’ is some fraction of ‘d’. You can still calculate the center from that, but you can’t get the radii. Even worse is the case of 0,2; 2,0; 4,0; 6,2: symmetry across only one axis means you can only get one coordinate of the center, because there are infinitely many solutions.
Since we’re after only one specific solution, for us ‘infinitely many solutions’ equals ‘no solution’.
In the second example from above, we could do some additional maths in order to find the center. We’re only interested in the smallest possible ellipse, so we could calculate some limits in order to get the other center coordinate. But word on the street is that cheating is easier and could get us results that are close enough to what we want.
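A quick way to spot these cases before trying to solve anything: once ‘e’ is fixed, the left-hand side of the four equations is a 4×4 matrix with one row `[x², y², x, y]` per point. If its determinant is zero, the system has either no solution or infinitely many. A small self-contained check (the helper names are made up for illustration):

```python
def det(m):
    """Determinant via Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def system_matrix(points):
    """One row [x^2, y^2, x, y] per point: the left-hand side once e is fixed."""
    return [[x * x, y * y, x, y] for x, y in points]


rectangle = [(0, 0), (4, 0), (0, 2), (4, 2)]
trapezoid = [(0, 2), (2, 0), (4, 0), (6, 2)]
```

Both `det(system_matrix(rectangle))` and `det(system_matrix(trapezoid))` come out as exactly 0, so neither set of points pins down a single ellipse.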
Turns out breaking symmetry isn’t too hard — we just need to ensure that points in the left and right halves don’t share the same ‘y’ coordinate, and points in top and bottom halves don’t share the same ‘x’ coordinate.
The first bit of this task is easy: since every row has two points with the same ‘y’ coordinate (and since a ‘y’ coordinate can’t repeat across multiple rows), we just need to shift one point a bit up and the other a bit down (or keep it where it is).
Shifting ‘x’ coordinates is a bit more problematic: two different rows can absolutely contain the same ‘x’ coordinate — and what’s worse, if we shift the ‘x’ coordinate in any way, we may create the very situation we’re trying to avoid. However, we know the following things:
* point coordinates are whole numbers only (we get points by reading pixels, and there’s no such thing as ‘half a pixel’)
* only points in opposite rows are required to have different x and y coordinates
If two neighbouring rows (e.g. the top two rows in text with more than 3 rows) have points that share the same ‘x’ coordinate, it doesn’t really matter, because an ellipse containing those points would be rejected for being too small anyway.
This allows us to easily apply the required offsets:
This allows us to easily apply required offsets:
* upper left: x -= 0.5, y -= 0.5
* upper right: x -= 0.5
* bottom-left: y += 0.5
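The reason these offsets work is parity: all original coordinates are whole numbers, so after the shift every upper-half point has an ‘x’ ending in .5 while lower-half points keep whole ‘x’ values, and left-side points get ‘y’ values ending in .5 while right-side points keep whole ones. Opposite points can then never line up. A sketch, assuming (since the list doesn’t mention it) that the bottom-right corner stays where it is, and reusing the same determinant check as before to confirm the rectangle example is no longer degenerate:

```python
def det(m):
    """Determinant via Laplace expansion along the first row (fine for 4x4)."""
    if len(m) == 1:
        return m[0][0]
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))


def break_symmetry(point, is_top, is_left):
    """Apply the half-pixel offsets from the list above to one corner point.
    Assumption: bottom-right corners are left unchanged."""
    x, y = point
    if is_top:
        x -= 0.5                      # upper half: x ends in .5, lower half keeps whole numbers
    if is_left:
        y += -0.5 if is_top else 0.5  # left side: y ends in .5, right side keeps whole numbers
    return x, y


# the rectangle example: (point, is_top, is_left), with y growing downwards
corners = [((0, 0), True, True), ((4, 0), True, False),
           ((0, 2), False, True), ((4, 2), False, False)]
shifted = [break_symmetry(p, top, left) for p, top, left in corners]
matrix = [[x * x, y * y, x, y] for x, y in shifted]
```

`shifted` becomes `[(-0.5, -0.5), (3.5, 0), (0, 2.5), (4, 2)]`, and `det(matrix)` is no longer zero, so the four equations now have a single solution.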
So I thought I could do it by shifting points a little outward. Turns out that two-line and one-line rows disagree. At the end of the day, “losing” half a pixel of space won’t really be noticed when the bubble is more than a hundred pixels across, which it is most of the time — and when it’s not, the radius is still in the high double digits. So let’s change that a bit:
* upper left: x -= 0.5, y -= 0.5
* upper right: x -= 0.5
* bottom-left: y += 0.5
Not quite what we want, but better than what we had before.
Let’s try that out a bit more:
Oopsie whoopsie. Fucky wucky.
Turns out I spoke too soon.
The Right Way™
Sometimes, we don’t have enough data to determine where the center is. Sometimes, we only get enough data to determine one of the center coordinates. For example, if you only have four points to go off, and said points would form a symmetrical (acute) trapezoid if they were connected with lines — and since our text is centrally aligned, that happens almost every time we want to draw a bubble around two lines of text — you can still determine the horizontal center.
Turns out this comes in handy: we can determine the missing coordinate using the kind of “cheating” approach we used before, but with some twists:
* We split the points into two groups: those to the left and those to the right of the horizontal center (top and bottom, if we have the vertical center of the ellipse instead)
* Find the vertical center of the text (the spot halfway between the topmost and the lowermost text edge)
* Find the longest diagonal (upper left to lower right, or lower left to upper right)
* Find where the diagonal crosses from one half to the other, and flip that point over the vertical center of the text
The last step is important because longer lines will move the center of the ellipse away from the center of the text, towards themselves — but the point where the diagonal intersects the horizontal center is going to be offset in the opposite direction.
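The steps above can be sketched like this, assuming the horizontal center `cx` is already known and that the diagonals run between the outer corners of the top and bottom rows (both the function name and those corner arguments are illustrative, not taken from the plugin):

```python
import math


def center_y(cx, upper_left, upper_right, lower_left, lower_right):
    """Vertical center of the ellipse from its horizontal center cx and the outer
    text corners (upper corners of the top row, lower corners of the bottom row)."""
    top = min(upper_left[1], upper_right[1])
    bottom = max(lower_left[1], lower_right[1])
    text_middle = (top + bottom) / 2   # halfway between topmost and lowermost edge
    diagonals = [(upper_left, lower_right), (lower_left, upper_right)]
    (x1, y1), (x2, y2) = max(diagonals, key=lambda d: math.dist(d[0], d[1]))
    if x1 == x2:
        return text_middle             # degenerate case (assumption): fall back to the text's middle
    y_cross = y1 + (cx - x1) * (y2 - y1) / (x2 - x1)   # where the diagonal crosses x = cx
    return 2 * text_middle - y_cross   # flip it over the text's vertical center
```

With a long top line from 0 to 100 and a short bottom line from 30 to 70 (text spanning y = 0 to 40), this gives about 11.43, which is above the text’s middle at 20, towards the longer line, as the paragraph above says it should be.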
Once we have both coordinates of the center, we still have some leftover data from the matrix that we can use to determine both radii.
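One way to get the radii once the center is fixed: the equation shrinks to a·(x − cx)² + b·(y − cy)² = 1, which has only two unknowns, so any pair of points gives a 2×2 system. The sketch below swaps in the same brute-force idea as before rather than reproducing the exact matrix bookkeeping: try every pair, keep only ellipses that contain every point, and pick the one with the largest a·b (the smallest area). All names are hypothetical:

```python
import math
from itertools import combinations


def radii_for_center(center, points, eps=1e-9):
    """Smallest axis-aligned ellipse around a fixed center that contains all points.
    Returns (rx, ry), or None if no pair of points yields a valid ellipse."""
    cx, cy = center
    shifted = [(x - cx, y - cy) for x, y in points]
    best = None                        # (a, b) with the largest a * b so far
    for (x1, y1), (x2, y2) in combinations(shifted, 2):
        det = x1 * x1 * y2 * y2 - x2 * x2 * y1 * y1
        if abs(det) < 1e-12:
            continue                   # this pair doesn't determine a and b
        a = (y2 * y2 - y1 * y1) / det  # Cramer's rule for a*X^2 + b*Y^2 = 1
        b = (x1 * x1 - x2 * x2) / det
        if a <= 0 or b <= 0:
            continue                   # not an ellipse
        if any(a * x * x + b * y * y > 1 + eps for x, y in shifted):
            continue                   # some point sticks out
        if best is None or a * b > best[0] * best[1]:
            best = (a, b)
    if best is None:
        return None
    a, b = best
    return 1 / math.sqrt(a), 1 / math.sqrt(b)
```

Called with the center 5, 3 and the points 9,3; 5,5; 1,3; 5,1; 6,3, it returns radii of 4 and 2.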
However, this approach doesn’t cover all the cases (it fails in at least one), and the ‘offset corners by half a pixel’ way of dealing with things doesn’t seem to harm it, so we’ll just keep both in for the time being.
All in all, we’re progressing somewhat nicely. There’s some work to be done when determining the anti-jag parameter of the rectangular selection (but that — as well as padding — will be a user-provided argument/option). The only thing left to deal with now is bubbles with one or two lines, where our current tactics for determining the bubble size fail.
Turns out that — spoiler alert — the brute force approach is the only approach that will consistently work. Maybe it should be revisited.
Every now and then, I sit down in front of my computer and start making a comic. It follows a very similar premise to DM of the Rings and Darths & Droids — that is, take a movie (in my case, How To Train Your Dragon) and pretend it’s a D&D session — except my work is much less original and probably not that good. Oh well.
If you’re making any sort of comics, you’ll probably have to draw a ton of speech bubbles. Mind, drawing basic speech bubbles isn’t that hard: you use the oval selection tool, select an area and fill it with color. But boy does it take time. The more of them, the longer it takes — and boy do some pages feature a lot of them.
Because we like to work smarter, not harder, the question pops up: why don’t we make a script that would draw speech bubbles for us? I’m using GIMP anyway, and GIMP has plugin support. “This shouldn’t be too hard,” were the famous last words as I opened Visual Studio Code on that day about two weeks ago and got to work.
Drawing rectangles is easy. Drawing ellipses is hard.
Comics often use two kinds of bubbles: there are bubbles used for narration, which are often rectangular, and there are bubbles used for speech, which are generally elliptical (more or less).
If we want to draw any kind of bubble (be it rectangular or elliptical), we must first figure out where to place it. This is the easy part, because:
* we use GIMP and put the text for each bubble on its own, separate, transparent layer. That’s one of the important assumptions that we’ll make (especially for elliptical bubbles): we don’t reuse text layers for more than one bubble, and the layer is always transparent except for the text.
* images and layers are, in general, grids of squares (pixels), where each square has a single color. This allows us to borrow some tricks from Ultrawidify: we’ll check every pixel of every row, and every pixel of every column, for the presence of non-transparent pixels.
If we keep track of where the first and last non-transparent pixels were found for every row and column, we can figure out where the text starts and where it ends. If rows of text don’t overlap each other, we get the bounds of every row of text in the layer. Say we know that a row of text starts at line 32, ends at line 64, starts at column 3 and ends at column 125. That gives us 4 corner points per line of text, and that’s enough data to draw a rectangle: the vertices are at 3, 32; 125, 32; 3, 64; 125, 64. The procedure is so simple it doesn’t even deserve its own heading (although there are a few minor, seemingly simple improvements you can make to it). At least compared to what’s about to come.
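A minimal sketch of that scan, assuming the layer’s alpha channel has already been read into a plain 2D list (how you get pixels out of a GIMP layer is left out here) and that rows of text are separated by at least one fully transparent pixel row. Instead of scanning columns separately, it tracks the leftmost and rightmost opaque pixel within each band of rows, which gives the same bounds:

```python
def text_rows(alpha):
    """alpha: 2D list of alpha values (one inner list per pixel row); 0 = transparent.
    Returns (left, top, right, bottom) for every row of text, top to bottom."""
    rows, start = [], None
    for y, line in enumerate(alpha):
        opaque = [x for x, value in enumerate(line) if value > 0]
        if opaque and start is None:          # a new row of text begins
            start, left, right = y, opaque[0], opaque[-1]
        elif opaque:                          # still inside the same row of text
            left, right = min(left, opaque[0]), max(right, opaque[-1])
        elif start is not None:               # first empty pixel row after some text
            rows.append((left, start, right, y - 1))
            start = None
    if start is not None:                     # text touches the bottom edge
        rows.append((left, start, right, len(alpha) - 1))
    return rows


def rectangle_corners(row):
    """The four vertices of a rectangular bubble around one row of text."""
    left, top, right, bottom = row
    return [(left, top), (right, top), (left, bottom), (right, bottom)]
```

`rectangle_corners((3, 32, 125, 64))` gives exactly the vertices from the example above.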
Ellipses can honestly piss off
Do we even know what we’re trying to do?
Yes, actually. We want to draw an ellipse around a set of points, where:
* All the corner points must be inside or on the edge of said ellipse
* Points must be as close to the ellipse edge as possible
In order to draw an ellipse, we need to know:
* the width and height of the ellipse
* the center of the ellipse (or more accurately, the upper left corner¹ of its bounding box, but we can get that from the center and the width and height of the ellipse)
because those are the parameters GIMP’s ellipse selection tool takes. We already know the points that represent the edges of the text. What we need to do is take those points (a bunch of x, y coordinate pairs) and somehow convert them into the parameters of an ellipse that satisfies the two rules outlined above.
¹In computer graphics, the coordinate system is flipped over the horizontal axis compared to the coordinate system used almost everywhere else: the y coordinate grows downwards, so the upper left corner is the one with the smallest coordinates.
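Going from a center and radii to the upper-left corner, width and height is simple arithmetic. A tiny sketch (the function name is made up, and how the result is passed to GIMP isn’t shown):

```python
def ellipse_selection_box(cx, cy, rx, ry):
    """Convert an ellipse's center and radii into (x, y, width, height) of its
    bounding box. (x, y) is the upper-left corner in image coordinates, where
    y grows downwards, so the upper-left corner has the smallest x and y."""
    return cx - rx, cy - ry, 2 * rx, 2 * ry
```

An ellipse centered at 5, 3 with radii 4 and 2 becomes a 8×4 box whose upper-left corner is at 1, 1.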
Computers are very good at two things: logic and maths. This means we need to tell them how to calculate the center, width and height from those points. Because I don’t recall that lesson from my maths classes, I went to every programmer’s best friend (stackoverflow) and discovered that — spoiler alert — this is either a) more or less impossible or b) requires a diploma or a master’s degree in maths.
I studied computer science. We had maths, but not that kind of maths.
Time to cheat
First of all, we subtly change our second requirement. We no longer require that the corner points are as close as possible to the edge of the ellipse; instead, we focus on trying to draw the smallest ellipse possible. It’s less than ideal, but we’ll have to live with that.
After we’re done revising our requirements, we head back to Google and look up the equations for the ellipse.
x = a * cos(t)
y = b * sin(t)
Where x, y are the coordinates of a point on the ellipse (centered at the origin), and a, b are the lengths of the primary and secondary radii. We can do something with this. First, we calculate where the center of the ellipse will be: we find the middle point of every opposing pair of corner points (e.g. the point opposite the upper left point in the top line of text is the lower right point in the bottom line of text). Since each pair of corner points may give a slightly different center, we take the average of those centers. Now that we know where the center is, we can use some high school maths to calculate the angle between the horizontal axis and the straight line from the center to a given point, and we do that for every point. The biggest ellipse will contain all the points, and we will have our answer.
Except t is not the angle to our point. t is the angle between the horizontal axis and the point where our point would be if the ellipse were squished (or stretched) into a circle. We can squish (or stretch) the ellipse into a circle by multiplying (or dividing) one of the coordinates by the aspect ratio of the ellipse.
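The center estimate from the paragraph above, as a sketch. It assumes rows are given as hypothetical `(left, top, right, bottom)` tuples ordered top to bottom, and pairs each row in the upper half (plus the middle row, if there is one) with the row mirrored across the text:

```python
def estimate_center(rows):
    """Average the midpoints of opposing corner pairs.
    rows: (left, top, right, bottom) per line of text, ordered top to bottom."""
    n = len(rows)
    midpoints = []
    for i in range((n + 1) // 2):              # upper half, plus the middle row if n is odd
        left, top, right, _ = rows[i]
        opp_left, _, opp_right, opp_bottom = rows[n - 1 - i]   # row mirrored across the text
        mid_y = (top + opp_bottom) / 2
        midpoints.append(((left + opp_right) / 2, mid_y))      # upper left <-> lower right
        midpoints.append(((right + opp_left) / 2, mid_y))      # upper right <-> lower left
    cx = sum(x for x, _ in midpoints) / len(midpoints)
    cy = sum(y for _, y in midpoints) / len(midpoints)
    return cx, cy
```

For two rows spanning columns 10 to 90 and 0 to 100, the two midpoints land at x = 55 and x = 45, and averaging them puts the center at x = 50.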
Do we know the aspect ratio of the ellipse? No.
We know the topmost point of the top line of text, the bottommost point of the bottom line of text, and the leftmost and rightmost points of the longest line of text. We can get the width and height from that, and use them to calculate an aspect ratio.
Is that aspect ratio the same as the aspect ratio of the ellipse we want? Tests say no. This is not the aspect ratio we’re looking for. Our final result looks something like this:
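Once you commit to an aspect ratio, the “squish it into a circle” idea gives the smallest ellipse of that shape directly: stretch every point’s vertical offset by the aspect ratio, and the ellipse becomes a circle whose radius is the largest stretched distance. In terms of the equations above, t for a point is `atan2(dy * aspect, dx)`, not `atan2(dy, dx)`. A sketch with made-up names, using the text’s bounding box as the aspect ratio just like the paragraph does:

```python
import math


def bounding_box_aspect(points):
    """Width / height of the box around all points (the text's extent)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))


def ellipse_with_aspect(center, points, aspect):
    """Smallest ellipse with rx / ry == aspect, centered on `center`, containing
    all points. Stretching y by the aspect ratio turns it into a circle of radius rx."""
    cx, cy = center
    rx = max(math.hypot(x - cx, (y - cy) * aspect) for x, y in points)
    return rx, rx / aspect
```

For the points 9,3; 1,3; 5,5; 5,1 around the center 5, 3, the box is 8×4, so the aspect ratio is 2 and the result is radii of 4 and 2. That only works out this neatly because these points already sit on an ellipse with that exact shape; as the next paragraph says, for real text the box’s aspect ratio isn’t the one we need.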
Can we fix the aspect ratio with some brute force? I could write another paragraph on how this was attempted, but long story short: turns out we can’t. We’d need more data to draw the ellipse we want.