I think I should blog more about random troubleshooting sessions. If nothing else, it will remind me what steps I took when it inevitably happens again!
Okay, here is the first one – why is my xterm opening so slowly?
I have two similarly specced machines at my desk – my primary workstation running Fedora Linux, and a Windows 11 machine. They share the same monitor and input devices, and I switch between them with an iogear KVM.
I do the bulk of my work in either a browser or a terminal. This is true even on Windows, where I rely heavily on WSL.
This works well for me, and I’m happy enough with the setup.
I have the shortcut Super+1 bound to xterm on both machines, and I probably use this hundreds of times per day.
Here is how that looks on Fedora:
It takes about 300ms from key activation to a terminal being ready. This is fine, I’ve never noticed any problem.
However, let’s compare that to Windows:
That’s about 1600ms before I can type, over 5 times slower! This is slow enough that it bothers me, and I use this shortcut so often that I want to solve it.
I don’t think many people care about xterm performance on Windows, so I guess that means it’s up to me to solve this 🙂
Hey, wait a minute… ENHANCE! 👀
Why does the window fade in like that? It looks like the window is ready when the effect starts, but I can’t interact with it until it completes. If I count those frames, this animation must be costing me ~200ms! 🤬
I always disable anything like animation or compositing effects, so I’m confused where this is coming from.
I tested with some native Windows programs like notepad and calc – they just appear instantly… so what is causing that?
I experimented with it a bit, and I can see other windows behind it as it fades in, so I think something must be changing the opacity. Searching MSDN, it looks like the function for that is SetLayeredWindowAttributes().
Could something be calling that, is my X server betraying me?
$ dumpbin /imports X410.exe | grep SetLayeredWindowAttributes
33A SetLayeredWindowAttributes
This looks like the culprit!! I’m using a server called X410, and it seems to be adding its own animation effects, with no way to disable them. I’m reluctant to switch to an alternative – that could just replace this issue with a different issue to troubleshoot.
Is it possible I can just stop it from doing that with a debugger?
$ cdb -p 6624
(19e0.2ad0): Break instruction exception - code 80000003 (first chance)
ntdll!DbgBreakPoint:
00007ff9`1f9b3c90 cc int 3
0:014> eb win32u!NtUserSetLayeredWindowAttributes c3
0:014> .detach
Detached
NoTarget> q
quit:
Ah-ha, that actually worked!!!
I’ve added that cdb command into my xinit initialization, and it looks a lot snappier. That saved nearly 300ms, so we’re down to just 4 times slower than Fedora!
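For reference, a startup hook along those lines could look roughly like the sketch below. The function name `disable_fade` and the `CDB` override variable are placeholders, not part of any tool. It assumes cdb's `-pn` option (attach by process name) and `-c` option (run initial commands), and it uses `qd` to detach and quit in one step instead of the separate `.detach` and `q` typed above. Adjust the debugger path for your own setup.

```shell
# Hypothetical xinit helper: overwrite the first byte of
# win32u!NtUserSetLayeredWindowAttributes with 0xc3 (ret) in the running
# X server, then detach and leave the server running.
# CDB selects the debugger binary; the default name "cdb" is an assumption.
disable_fade() {
    "${CDB:-cdb}" -pn "$1" -c 'eb win32u!NtUserSetLayeredWindowAttributes c3; qd'
}

# Example (not run here): disable_fade X410.exe
```

The patch only lives in that process's memory, so it has to be reapplied each time the X server starts, which is why it belongs in the startup script.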
Okay, let’s try to get some real numbers. I like the tool hyperfine for this; here are the initial results:
If we run it under optimal conditions, it takes about 900ms on Windows, and about 100ms on Fedora.
Now that I can reproduce the delay reliably, I can start exploring some theories…
My first thought is that filesystem performance under WSL can be very slow, could that explain the difference?
taviso@WORKSTATION:~$ strace -wc -efile xterm -e true
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
43.93 0.014870 56 261 56 openat
23.79 0.008053 31 257 33 access
20.48 0.006932 32 211 12 newfstatat
7.07 0.002394 23 100 54 readlink
4.72 0.001596 1596 1 execve
------ ----------- ----------- --------- --------- ----------------
100.00 0.033845 40 830 155 total
Nope, it’s actually a bit faster on Windows! If I browse the logs, it looks related to fonts, and I do have fewer fonts installed on Windows. I suspect that causes fontconfig to query fewer files on initialization.
Whatever the reason, I concluded it wasn’t a big proportion of startup time, so it doesn’t seem worth worrying about.
The issue must be the X server, how fast is a very simple X client?
taviso@fedora:~$ hyperfine xdpyinfo
Benchmark 1: xdpyinfo
Time (mean ± σ): 4.6 ms ± 0.8 ms [User: 1.9 ms, System: 1.6 ms]
Range (min … max): 3.1 ms … 9.4 ms 317 runs
That does run slower on Windows, but not significantly slower – perhaps I actually need to create a window to see a difference?
taviso@fedora:~$ x11perf -repeat 1 -subs 8 -popup
5600000 reps @ 0.0010 msec (1040000.0/sec): Hide/expose window via popup (8 kids)
That is also slower on Windows – this is understandable, it has to translate from X11 to win32 – but not so slow that it adequately explains the problem alone.
Could it be a FreeType or FontConfig issue?
taviso@WORKSTATION:~$ ftbench -p consola.ttf
executing tests:
Load 2.491 us/op 809010 done
Load_Advances (Normal) 2.437 us/op 827190 done
Load_Advances (Fast) 0.022 us/op 88575990 done
Load_Advances (Unscaled) 0.013 us/op 147448890 done
Render 2.039 us/op 390870 done
Get_Glyph 0.921 us/op 509040 done
Get_Char_Index 0.018 us/op 111551968 done
Iterate CMap 21.539 us/op 88503 done
New_Face 6.271 us/op 284029 done
Embolden 2.491 us/op 357540 done
Stroke 24.663 us/op 69690 done
Get_BBox 0.865 us/op 487830 done
Get_CBox 0.679 us/op 506010 done
Loading fonts is slightly slower, but the other numbers seem fine, and it’s not that much slower. I don’t think it’s this.
Maybe I can get some clues from ltrace?
$ ltrace -c xterm -e true
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
26.74 1.977389 988694 2 XtRealizeWidget
22.18 1.640132 14139 116 XtSetValues
13.39 0.989966 197993 5 XtVaCreateManagedWidget
6.42 0.474410 200 2361 strlen
5.51 0.407617 203808 2 read
4.21 0.311086 197 1572 FcCharSetHasChar
2.92 0.215975 2571 84 XtCreateManagedWidget
2.80 0.207397 69132 3 XtVaCreatePopupShell
2.39 0.176458 218 808 XftTextExtents32
1.51 0.111306 111306 1 XpmReadFileToPixmap
1.48 0.109486 109486 1 XtOpenApplication
1.44 0.106786 203 524 malloc
This XtRealizeWidget call does seem slow, and I don’t see that on Fedora. What is calling that?
$ gdb --args ./xterm -e true
Reading symbols from ./xterm...
(gdb) b XtRealizeWidget
Breakpoint 1 at 0x2db20
(gdb) r
Breakpoint 1, 0x00007fffff43d940 in XtRealizeWidget () from /lib/x86_64-linux-gnu/libXt.so.6
(gdb) bt
#0 0x00007fffff43d940 in XtRealizeWidget () from /lib/x86_64-linux-gnu/libXt.so.6
...
#4 0x00007fffff44c4f6 in XtSetValues () from /lib/x86_64-linux-gnu/libXt.so.6
#5 0x0000000008086683 in UpdateMenuItem (menu=0x811c5c0 <mainMenuEntries>, which=0, val=1)
at ./menu.c:1026
#6 0x000000000808a763 in update_toolbar () at ./menu.c:3366
Ah-ha – it’s the toolbar feature. It’s disabled at compile time on Fedora, but I quite like it and enable it on Windows.
If I disable that, startup is a little faster, I wonder if there are any other features that are slowing down initialization…?
The hyperfine utility has a feature called parameter scan, where it will try a bunch of settings for you and tell you which one is fastest.
Let’s ask XTerm what features are available, and toggle each one on and off.
$ xterm -report-xres -e true
activeIcon : default
allowBoldFonts : true
allowC1Printable : false
allowColorOps : true
allowFontOps : false
allowMouseOps : true
...
I’ll start by extracting all the settings that are booleans.
$ xterm -report-xres -e true | grep -Po '^\S+(?=\s+: (true|false))' | tr '\n' ','
allowBoldFonts,allowColorOps,allowMouseOps...
Note: xterm has a lot of features, I’m truncating the list for brevity!
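The `tr` at the end leaves a trailing comma, which could end up as an empty entry in a comma-separated list. Here is a small variant of the extraction as a sketch. The function name `bool_resources` is made up for illustration; it reads `-report-xres` style output on stdin and joins the boolean resource names with `paste`.

```shell
# Hypothetical helper: print the names of boolean resources as one
# comma-separated line, with no trailing comma.
# Input is expected in the "name : value" layout shown in the listing above.
bool_resources() {
    awk -F'[ \t]*:[ \t]*' '$2 == "true" || $2 == "false" { print $1 }' | paste -sd, -
}

# Sample input copied from the listing above; normally you would pipe in
# the output of `xterm -report-xres -e true` instead.
bool_resources <<'EOF'
activeIcon : default
allowBoldFonts : true
allowC1Printable : false
allowColorOps : true
allowFontOps : false
allowMouseOps : true
EOF
# prints: allowBoldFonts,allowC1Printable,allowColorOps,allowFontOps,allowMouseOps
```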
Now we can give each of those to hyperfine, and let it figure out which settings have the most noticeable effect.
That took about 20 minutes to run, and reports:
$ hyperfine --parameter-list res allowBoldFonts,allow... \
--parameter-list bool true,false \
"xterm -xrm 'XTerm*{res}: {bool}' -e true"
...
Benchmark 240: xterm -xrm 'XTerm*xftTrackMemUsage: false' -e true
Time (mean ± σ): 140.1 ms ± 7.4 ms [User: 30.8 ms, System: 23.4 ms]
Range (min … max): 129.2 ms … 153.4 ms 21 runs
Summary
xterm -xrm 'XTerm*tekInhibit: true' -e true ran
1.01 ± 0.08 times faster than xterm -xrm 'XTerm*allowSendEvents: false' -e true
...
This helped a little, I found a combination of options that saved around 200ms total. One example was tekInhibit, which disables the Tektronix emulation. That’s usually used as a graphing mode – it’s actually pretty cool.
Still, it isn’t a big enough difference, and this is as far as I was able to get through tweaking settings.
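With a couple of hundred benchmarks, the summary is easier to digest if each resource's true and false timings are put side by side. A possible sketch, assuming the scan was run with hyperfine's `--export-csv` option and that the CSV ends with one `parameter_…` column per parameter list (here `res`, then `bool`). The function name `rank_resources` and the file name are placeholders.

```shell
# Hypothetical helper: for each resource, print
#   (mean with true) - (mean with false)   in milliseconds,
# most negative first, i.e. where setting "true" starts fastest.
# Assumed CSV columns:
#   command,mean,stddev,median,user,system,min,max,parameter_res,parameter_bool
# Fields are counted from the end so commas inside the command column are harmless.
rank_resources() {
    awk -F, 'NR > 1 {
        sub(/\r$/, "")                     # tolerate CRLF line endings
        mean[$(NF-1), $NF] = $(NF-8)       # mean column, in seconds
        seen[$(NF-1)] = 1
    }
    END {
        for (r in seen)
            printf "%.1f %s\n", (mean[r, "true"] - mean[r, "false"]) * 1000, r
    }' "$1" | sort -n
}

# Example (not run here): rank_resources results.csv
```

Differences smaller than the reported σ are probably noise, so only the entries at the extremes of the list are worth a closer look.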
I’m starting to think that this is just death by a thousand cuts: everything has some small overhead on Windows, and it adds up…
There’s a simple generic solution to slow startup performance: server mode.
The idea is to cache a few processes in the background, then all the slow stuff will already be done, ready for you to start working immediately.
XTerm doesn’t have this feature natively, but it’s not complicated, I can add it.
To do this, I will use deferred mapping – that just means that a program is running, but the window is not visible yet.
I tried a few solutions and found one that works well, an LD_PRELOAD library. All it does is intercept any toplevel XMapWindow() calls, then pause execution until it receives a signal.
It’s a bit hacky, but my code is here, if you’re interested.
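The real library is C and hooks Xlib, but the pause-until-signal part is easy to picture with a shell stand-in. This is only an analogy, not the actual defermap code: `deferred_start` is a made-up name, the first `echo` stands in for the slow initialization, and the second stands in for the eventual `XMapWindow()` call.

```shell
# Shell analogy of deferred mapping: do the slow work immediately, then
# block until SIGUSR1 arrives before doing the visible step.
deferred_start() {
    ready=0
    trap 'ready=1' USR1          # install the handler before announcing readiness
    echo "initialized"           # stand-in for the slow startup work
    while [ "$ready" -eq 0 ]; do
        sleep 0.1 &              # background sleep so `wait` can be interrupted
        wait $!                  # returns early when the trapped signal arrives
    done
    echo "mapped"                # stand-in for finally showing the window
}

# Example (not run here):
#   deferred_start &             # start it in the background
#   kill -USR1 $!                # later: release it
```

The short polling sleep is there so that a signal landing between the loop test and the `wait` delays the release by at most a tenth of a second instead of being missed.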
To use it, you need something to manage the cache for you in the background, xargs will work!
$ xargs --null --arg-file=/dev/zero --max-procs=3 --replace -- \
env LD_PRELOAD=defermap.so xterm -display :0 [PARAMS...]
This will keep three xterms running in the background.
Note: If you often start terminals in quick succession, increase max-procs.
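If the xargs trick looks opaque: /dev/zero is an endless stream of NUL bytes, so with `--null` every byte reads as one empty input item, each item launches one command, and `--max-procs` caps how many run at once. When one exits, xargs reads the next item and starts a replacement. As a sketch, here is the same thing with a finite input and `echo` standing in for xterm (GNU xargs assumed):

```shell
# Three NUL bytes act as three empty input items, so the command is launched
# three times, at most two at a time. With /dev/zero the items never run out.
printf '\0\0\0' | xargs --null --max-procs=2 --replace -- echo "spawned one placeholder process"
```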
When you want a new terminal, do this instead of running xterm as you normally would:
$ pkill --oldest --signal SIGUSR1 xtermserver
A terminal should appear near-instantly. You can now execute that instead of xterm, and startup performance should be solved.
This whole process took a while! Now I need to adjust my shortcuts to run pkill instead of xterm, and I can compare the results.
Counting the frames in that video, it’s down to about 366ms, only about 60ms slower than Fedora. This is totally acceptable!
I’ve been using this configuration for a few days, so far it’s working great. I haven’t noticed any issues running it this way.
I highly doubt anyone else will find this useful, who else is using XTerm on Windows? 😆
Nevertheless, if you have a better solution, or can think of something else I can try, let me know!