This is a guest article written by Simon Bond, Senior IT Consultant with Ultima Business Solutions. In this article he details his efforts in overcoming a slow App-V 5 publishing and streaming problem he encountered in a large desktop application upgrade project.
The article details the customer requirement, Simon’s meticulous investigation and the code he wrote to improve publishing and streaming performance. He has refined the original code over time with subsequent customer engagements.
As Simon says “This is a very detailed and lengthy blog (more like a web report than a web log) but I promise you the time taken to read it is a 1,000th of the time it took me to come up with this solution and write about it! I find it reads better with a strength 5 or 6 Taylors ground coffee.”
Customer requirement and issues
In early 2013 I was implementing an App-V 5 and AppSense Environment Manager solution for a healthcare customer in London. I had used AppSense Environment Manager and App-V 4.x for many years, but had only implemented small to medium (<1000 users) App-V 5 solutions by that time as it had only been released in late 2012 and, like everyone else, was still trying to learn this completely new beast.
This customer had approximately 10,000 users who regularly roamed between sites and expected to be able to logon to any Windows 7 desktop computer and use their applications without delay. There were over 400 applications which were regularly updated (and quite a few exceeded 1GB in size) although any particular user might only use 20 applications depending on their role within healthcare. Many users were doctors or medical consultants who expected everything to be immediate.
This type of environment is a challenge for application delivery – a large number of demanding users and applications which keep moving! But this is the kind of environment where App-V can really shine and I’d successfully implemented App-V 4.x in similar sized environments, so I forged ahead with App-V 5 not quite prepared / totally unprepared for the challenges ahead!
App-V 5 Full Infrastructure
I chose App-V 5 Full Infrastructure with user publishing for a number of reasons including:
- I need the applications to follow the users and for the user to only see applications assigned to them – I can’t have each user seeing 400 applications, most for different medical fields, so global publishing wasn’t an option.
- In my experience, App-V 5 Full Infrastructure can react faster than SCCM when assigning applications to users on the move. I’m waiting for the flames…… In this environment it is very common for a user to log into a computer they’ve never used before, where the applications won’t be in the cache because they have never been used on that computer. The users (often doctors, nurses and consultants) are not prepared to wait for applications to arrive.
- The customer already had experience with App-V 4.6 Full Infrastructure so were familiar with all the concepts.
- Many desktops didn’t have the disk space to have all applications published to them all at once, again making global publishing a problematic option, whilst with user publishing the disk will only fill up slowly over time and I could write scripts to remove applications which hadn’t been used for a while.
Given the choice, a VDI solution with a Shared Content Store (SCS) would be my preferred solution, but VDI wasn’t an option for this customer who wanted to stick with their existing desktop environment. Unbeknown to me at the time, if I had gone for a SCS solution I wouldn’t have encountered this problem at all.
Slow publishing and streaming problem overview
After the App-V 5 Full Infrastructure environment had been built and user testing commenced, two slow performance problems were encountered:
- If a user had a lot of applications assigned to them, App-V 5 publishing could take over two minutes if they logged onto a computer they hadn’t used before / didn’t have a recent cached profile for that user.
- Large applications took a long time to “stream” to the computer – about 40 times slower than my calculations suggested it should take. This seems to affect all of the App-V 5 applications, although it obviously affects the larger ones more.
Once an App-V 5 app is 100% cached locally, it runs fine and with decent performance.
The customer’s existing App-V 4.6 environment did not exhibit these problems with similar sized applications. I initially thought this would be a simple networking problem to overcome – how wrong I was!
I have an awful habit of mixing the words “streaming”, “mounting” and “downloading” when it comes to App-V 5 packages, so I thought I’d better make myself clearer. In these cases I’m referring to the transfer of an App-V 5 application from a network location (typically a SMB share or HTTP content server) to the App-V 5 client.
The environments I’m referring to are using a local App-V 5 disk cache as I’m not using the Shared Content Store (SCS) on desktop computers. So regardless of the word I use, the App-V 5 package (“.appv” file) is technically being downloaded to the desktop client and expanded (decompressed) into a local App-V 5 file cache (typically in “C:\ProgramData\App-V”).
When a user launches an App-V 5 application, this procedure takes place behind the scenes and the App-V 5 application will generally launch before the background streaming has completed (depending on Feature Block 1), although if using a hard disk, the disk IO is generally so high that the application won’t perform well until streaming has completed.
I use the phrase “mounting” a lot because the PowerShell command to 100% cache an App-V 5 application is “Mount-AppVClientPackage” although it performs the same streaming / download procedure that would take place if a user simply launched an application.
App-V 5 Full Infrastructure server performance
The performance of the App-V 5 Full Infrastructure servers was almost immediately eliminated as a cause of the problem, as they were high performance servers and all components of App-V 5 had been split onto separate servers:
- 2 load balanced App-V 5 management servers.
- Separate SQL cluster to hold the databases.
- 2 load balanced App-V 5 publishing servers per site (6 in total).
- 2 load balanced IIS content servers per site (6 in total).
- Separate App-V 5 reporting / SSRS server.
All servers were 2 core with the exception of the publishing servers which were 4 core. No servers had exceeded 10% CPU during the tests. All servers had 6GB RAM and were hardly using it. Typically I assign more RAM to the IIS content servers these days and make use of IIS caching, but I didn’t know about that at the time.
It didn’t matter which publishing or content server the clients used, performance was always the same. All clients at all sites exhibited the problem.
I also had the same problem if I simply manually published and mounted an App-V 5 application from a local disk, bypassing the servers entirely (and yes I checked that the “PackageSourceRoot” registry value was blank at that time).
So the servers didn’t appear to be the cause of the problem, although I’m keeping my beady eye on them.
App-V 5 streaming performance is approximately the same at all times of the day, equally bad on clients connected to different edge switches and on a client desktop connected to the core switches, so if there is a network switch problem then it is at the core.
Network utilisation and CPU usage of the core switches is confirmed to be below 25% so it isn’t looking like a network switching problem, but we’ll keep monitoring the stats.
A virtual client desktop on the same Hyper-V server hosting a publishing and content server also experienced the same problems, so that pretty much eliminates any physical switches. Yes, I had reconfigured the client so that it was definitely getting its content from the content server on the same Hyper-V host.
I also had some publishing and content servers hosted on VMware ESXi so I could eliminate Hyper-V from my investigation (as the problem was equally bad on VMware).
I confirmed that I could use Internet Explorer to HTTP connect to the publishing server TCP port and get an XML list of packages available to the user and that this connection was fast. All publishing servers responded quickly. So there doesn’t appear to be a problem with communications to the publishing servers and this additionally confirms that the performance of the publishing servers appears to be adequate.
I confirmed that I could HTTP download the “.appv” file content from the IIS content servers to the clients very quickly, at approximately 400Mb/s which isn’t bad considering the physical size of the site and the number of network switches involved. So there doesn’t appear to be a network performance problem with the content servers or the core switches.
Streaming performance via HTTP and SMB2.1 is approximately the same (tested by changing the value of PackageSourceRoot between http://xxxx and \\server\AppVContent). I used Wireshark to confirm we really are using the protocols we think we are using. So the problem doesn’t lie with IIS or my choice of HTTP streaming (which had already been questioned by a few on-lookers).
The table below gives an indication of how bad the App-V 5 streaming is compared with simple file copy and a similarly sized App-V 4.6 app. A 149MB application was used in this example, but similar results were obtained for other sized packages. Each test was run 5 times at different locations and the variations between each timing were low (so I had confidence in the figures – although see “Hot and cold timings” later on):
|Method|Time taken in seconds|
|Manual SMB copy using Windows Explorer|3|
|Manual HTTP copy using Internet Explorer|3|
|App-V 5 100% cache using Mount-AppVClientPackage|150|
|App-V 4.6 100% cache|10|
Blimey! App-V 5 is taking 15 times longer than App-V 4.6 for this package.
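To put the table in perspective, the timings can be turned into effective throughput. The short Python sketch below is only arithmetic on the figures above. It assumes 1MB = 8Mb and applies the 149MB size to every row (the App-V 4.6 package was only "similarly sized"), so the output is approximate.

```python
PACKAGE_MB = 149  # size of the test package, from the table above

# Timings in seconds, copied from the table above.
timings = {
    "SMB copy (Windows Explorer)": 3,
    "HTTP copy (Internet Explorer)": 3,
    "App-V 5 Mount-AppVClientPackage": 150,
    "App-V 4.6 100% cache": 10,
}


def throughput_mbit(size_mb, seconds):
    """Effective throughput in megabits per second (assumes 1 MB = 8 Mb)."""
    return size_mb * 8 / seconds


for method, seconds in timings.items():
    rate = throughput_mbit(PACKAGE_MB, seconds)
    print(f"{method:35s} {seconds:4d}s  {rate:6.1f} Mb/s")

# How much slower the App-V 5 mount is than the two reference points.
slowdown_vs_copy = timings["App-V 5 Mount-AppVClientPackage"] / timings["HTTP copy (Internet Explorer)"]
slowdown_vs_46 = timings["App-V 5 Mount-AppVClientPackage"] / timings["App-V 4.6 100% cache"]
print(f"App-V 5 mount vs plain copy: {slowdown_vs_copy:.0f}x slower")
print(f"App-V 5 mount vs App-V 4.6:  {slowdown_vs_46:.0f}x slower")
```

The plain copies work out at roughly 397Mb/s, in line with the approximately 400Mb/s measured earlier, while the App-V 5 mount averages under 8Mb/s.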
At this point I’m having a bad day. I’ve blown a whole (long) day investigating the problem and have eliminated all of the simple explanations and still don’t know what is going wrong. The phone is starting to ring a lot more as well 🙁
Client and application analysis
The customer has a number of different types of client hardware and a few different “builds” and all exhibit the problem.
I remove AntiVirus software and still get the problem. Damn, I was hoping….. (actually I wasn’t – I should have checked if it was an AV problem right at the start and would have looked like an idiot if it was. Whew! but still Damn because I still have the problem!)
I try the original App-V 5.0 client instead of the SP1 client I’ve been using up till now. Nope. Still got the problem. I would have been SO happy at this point if it had been an App-V 5.0 sp1 bug.
I build a clean Windows 7 sp1 PC and put it in an AD OU with inheritance blocked so I don’t get any GPOs. I still get the problem.
I try both Windows 7 x86 and x64 and still get the problem. I was hopeful on this one because the customer was using x86 almost exclusively due to medical hardware driver requirements / limitations, whilst I generally only use x64.
I build a Windows 8 x64 client and manage to shave about 2 seconds off the time when using SMB3 as opposed to SMB 2.1 or HTTP. Yay! 150 seconds drops to 148! For some reason nobody is impressed….. 🙁
I can’t see how it can be a client network card / network stack problem because a manual “.appv” file copy is so quick, but I try anyway because if I don’t do it then somebody will ask if I’ve tried:
I turned off TCP auto-tuning on the client with:
netsh interface tcp set global autotuninglevel=disabled
but this doesn’t fix the problem.
I try turning off TCP chimney offloading with:
netsh int tcp set global chimney=disabled
and nothing has improved.
NOTE: I didn’t expect it to anyway because firstly the NIC doesn’t support it (checked on the Vendor website) and Netstat -t output shows “InHost” for offload state for all connections so that confirms that no offloading is being performed. Yes I tried rebooting between each test….
Hot and cold timings
I discovered that I would get different (faster) timings if I mounted (100% cached) an App-V 5 application, then removed that package, confirmed it was indeed deleted from the cache and then published and mounted the package again. These faster timings would remain until I rebooted.
I had initially thought it was the Antivirus software, but I removed it (I learnt long ago to remove the AV software rather than simply disabling it) and the pattern didn’t change.
It wasn’t due to the computer simply being busy after a reboot because I would boot the computer, logon and wait 5 minutes before performing any tests, always waiting for the hard disk to settle before starting any timings. This is a good practice for consistency, ensuring that all my timings were from similar situations.
My best guess at this point is that some of the package must be cached in RAM. Something very similar takes place when using the App-V 5 Shared Content Store (SCS) which is typically used in VDI environments, so I’ve at the very least convinced myself that this is a possibility!
I have called these timings “cold” (from a restart or power-on) and “hot” (a package mount performed after a previous package mount without rebooting). “hot” timings were typically 50% faster than “cold” timings although it could vary. Any timing tests I perform now have C or H marked next to them, so I know the situation in which the timing was performed.
All values provided in this blog are “cold” timings since I feel this is the most likely scenario: a package being streamed into the local App-V 5 disk cache is unlikely to have been in that cache before and since removed, because App-V 5 doesn’t have an automatic facility to clear the cache out.
So my App-V 5 package mount testing now includes the following test:
Is the package I want to test already cached?
If yes, un-publish and remove the package and reboot. Logon and wait 5 minutes before performing any testing.
If no, proceed with testing without a reboot.
Hard drive still busy after mounting
I also discovered that even when the App-V 5 package mounting is reported as complete, the hard drive would still be very busy for some time afterwards (sometimes up to a minute, sometimes only for 10 seconds). During this time, the computer would be very sluggish.
I think that this happens much more frequently and for longer since App-V 5 SP2 HF4 but I’m not certain. If I get time I need to go back and test this theory.
I now include the hard disk “busy” time in my timing calculations since the App-V 5 package is clearly still mounting, even though the UI or PowerShell command reports completion. I actually found my results to be even more consistent once I’d started including this time.
Wireshark analysis of App-V 5 streaming
I’ve run out of simple tests to perform so brew myself a strong coffee and download the latest version of Wireshark.
I run a few Wireshark traces while using the PowerShell command “mount-appvclientpackage” to fully cache various App-V 5 applications using HTTP content (since SMB and HTTP are both similarly affected and I find it a LOT easier to debug HTTP TCP traffic than SMB).
I don’t have to look far fortunately: From the IO graphs you can clearly see that the slow performance is due to the transfer stopping and starting a lot. You only notice this when you zoom into the IO graph a fair bit:
Each time the HTTP content server stops sending traffic, it doesn’t start again until the client sends a “TCP Window update”. Each “stop” is of a different length, but just taking a few from the middle I get 0.06s, 0.11s, 0.13s wasted etc.
I can see that it’s the client stopping the transfer by reducing its advertised TCP Window size. I’ll provide an example:
Server sends 11 x 1514 bytes. Client responds with an ACK and a Window size of 37888 bytes (256×148)
Server sends 10 x 1514 bytes. Client responds with an ACK and a Window size of 23296 bytes (256×91)
Server sends 15 x 1514 bytes. Client responds with an ACK and a Window size of 1280 bytes (256×5)
Server stops sending completely (because the client advertised TCP window size is too low).
Client sends a “TCP Window Update” re-advertising a TCP window size of 65536 (256×256).
Server starts transmitting again.
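The window sizes in this example are all multiples of 256, which would be consistent with a TCP window scale factor of 256 (a shift count of 8) having been negotiated. That is an inference from the figures, not something stated in the capture. The Python sketch below reproduces the arithmetic, assuming each 1514-byte Ethernet frame carries at most 1460 bytes of TCP payload (no TCP options).

```python
WINDOW_SCALE_SHIFT = 8  # assumed: scale factor 2**8 = 256
MSS = 1460              # assumed TCP payload per 1514-byte Ethernet frame


def advertised_window(raw_field, shift=WINDOW_SCALE_SHIFT):
    """Effective receive window in bytes for a raw window field value."""
    return raw_field << shift


# Raw window values taken from the example above (148, 91, 5, then 256).
raw_fields = [148, 91, 5, 256]

for raw in raw_fields:
    window = advertised_window(raw)
    full_segments = window // MSS
    if full_segments == 0:
        note = "less than one full segment"
    else:
        note = f"room for {full_segments} full segments"
    print(f"raw {raw:3d} -> {window:5d} bytes, {note}")
```

At 1280 bytes the advertised window is smaller than a single full segment, which fits with the server pausing until the client re-advertises 65536 bytes.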
I capture the same transfer in Wireshark, this time using Internet Explorer to download the “.appv” file, and get a similar trace (after all, it is the same transfer), but the TCP Window size remains more or less constant at 65536 throughout, the whole transfer takes 3 seconds (matching earlier tests) and the Wireshark IO graph doesn’t show a load of “gaps” in the transfer.
So the App-V 5 client is basically using standard TCP Window flow control to limit the transfer speed of the “.appv” file from the IIS content servers. This isn’t a server or network problem at all (as earlier investigations had already confirmed), but appears to be an App-V 5 client “thing”.
So the summary so far is: Day one was bad. Day two was really really bad. I still have no solution and I’m getting calls from the customer and various Project Managers and getting fed up explaining the problem. Yes I’ve tried rebooting. No it’s nothing to do with the firewall (it’s off). Yes I’ve tried Windows updates. No, putting more RAM in the client won’t help. Please leave me alone to work on the problem…. There seems to be something odd going on with the App-V 5 client and I’m going to need Chuck Norris to help me out with this one. He’s not answering my calls unfortunately.
The customer takes my original report and posts parts of it in a forum. I just had a look and found the article:
I didn’t get involved with this thread or answer any replies because I was still in a dark dank server room sweating over the problem.
I realise that mounting an App-V 5 package isn’t just about the network IO but also about decompressing the “.appv” file into the client cache (in “C:\ProgramData\App-V”) so I spend some time finding out how quickly the client is capable of extracting the “.appv” file (which is basically a ZIP file – more on this later).
I use the same 149MB file as before and put it on a separate SATA2 internal drive and test extracting it to “C:\ProgramData”. This ensures I’m reading from a separate drive and hence only writing to C:
|Decompression test:|Time taken in seconds|
|Windows explorer extract to C:|75|
|7-ZIP extract to C:|9|
Hmm. Interesting. The network file transfer of the 149MB file only took 3 seconds (see previous table), but it takes 3 times longer at best to extract it (via 7-ZIP) and 25 times longer (via Explorer). Admittedly 3 seconds (network) + 75 seconds (decompress) still doesn’t amount to 150 seconds (“mount-appvclientpackage” time I reported earlier) but I might be onto something here: If the App-V 5 client is “downloading” and “extracting” in a single thread and using the same slow ZIP engine that Explorer uses, the “download” would appear to be slow because the client is holding up the “download” (using TCP flow control) because of having to wait for the extraction.
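Because a “.appv” file is basically a ZIP file, a rough extraction baseline can also be scripted. The Python sketch below uses the standard-library zipfile module, which is a third ZIP engine (neither Explorer’s nor 7-ZIP’s), so its timings are not comparable with the table above. The function name and the paths in the usage note are hypothetical, and it assumes the package opens as a standard ZIP archive.

```python
import time
import zipfile
from pathlib import Path


def time_extract(archive_path, target_dir):
    """Extract a ZIP-format archive and return (seconds_taken, expanded_bytes)."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    start = time.perf_counter()
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        # Sum of uncompressed sizes recorded in the archive's directory.
        expanded = sum(info.file_size for info in archive.infolist())
    elapsed = time.perf_counter() - start
    return elapsed, expanded
```

A call such as `time_extract(r"D:\Test\package.appv", r"C:\Temp\appv-extract")` would read from one drive and write to another, as in the test above. As with the other timings, a “cold” run after a reboot is the fairer measurement.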
I spent some time investigating if there is an App-V 5 sequencer option to NOT compress the “.appv” file (since I have spare network capacity and clearly need to reduce decompression time as much as possible), but there doesn’t appear to be one. Feature request!
Since I’ve got a “result”, I’m pressurised to demonstrate this “finding” to a few interested parties (the ones who are paying for my time and starting to get a bit “edgy”) and do so with a completely different “.appv” file as I’m asked to demonstrate using an important customer application and one which the customer already uses in App-V 4.6:
|Application name:|4D client|
|App-V 5 “.appv” file size:|278MB|
|App-V 4.6 “.sft” file size:|260MB|
|App-V 4.6 client full “download” time via HTTP streaming with AV|9 seconds|
|App-V 5.0 client full “download” time via HTTP streaming with AV|380 seconds (gulp)|
|App-V 5.0 client full “download” time via HTTP streaming without AV|380 seconds|
|App-V 5 .appv file SMB 2.1 copy time from content server to client using Windows Explorer|6 seconds (370Mb/s)|
|App-V 5 .appv file HTTP download time from content server to client using Internet Explorer|10 seconds|
|Time for Windows Explorer to unzip the “.appv” file with AV|105 seconds|
|Time for Windows Explorer to unzip the “.appv” file without AV|60 seconds|
|Time for 7ZIP to unzip the “.appv” file with AV|47 seconds|
|Time for 7ZIP to unzip the “.appv” file without AV|34 seconds|
Whilst these results are consistent with previous findings, there are a few interesting items to note:
- 10 seconds (network, using the Internet Explorer HTTP download figure) + 105 seconds (decompress) is still far short of 380 seconds, so there is definitely something else going on in addition to a simple decompress.
- The App-V 5.0 client full “download” time (using the PowerShell “mount-appvclientpackage”) doesn’t seem to be affected by AV whilst the Windows Explorer extract of a “.appv” file definitely is. This could be because AV is scanning the “.appv” file as I’m reading it (to decompress it) as well as writing to C: or because I’m not on the right track at all.
At least these figures show me what the client is capable of: An HTTP network copy and decompress using a decent ZIP engine (7-ZIP) without AV is capable of being performed in 44 seconds (10+34) whilst the App-V 5 client is taking 380 seconds (over 8 times longer).
These all pale in comparison to the App-V 4.6 client which takes 9 seconds, although admittedly the App-V 4.6 client isn’t decompressing the “.sft” file.
Disk IO performance analysis
I think I’m on the right track but I feel there is more to be learnt about the client disk IO and decompression performed by the App-V 5 client so I experiment with different types of drive. The customer’s PCs use 5400 rpm 2.5” drives and I’m starting to think / worry that these are contributing to the problem, even though the figures above show they are capable of much more.
I’m no longer on-site at this point, so I experiment using a Hyper-V Virtual Machine in my test lab which I can easily migrate from a 5400rpm 2.5” HDD to a 7200rpm 2.5″ HDD to a Sandforce 2281 controller based SSD whilst keeping the CPU, RAM and other components the same.
I don’t have access to the same “.appv” files as before, so I used a Skype 6.7 App-V 5 package which I’d created the day before for another customer. The “.appv” file was 41MB.
Results with no AV are below:
|Test performed:|Disk type:|Time in seconds:|
|App-V 5.0 client full “download” time via HTTP streaming|5400rpm HDD|75|
|App-V 5.0 client full “download” time via HTTP streaming|7200rpm HDD|60|
|App-V 5.0 client full “download” time via HTTP streaming|SSD|6|
|App-V 5 “.appv” file HTTP download time using IE|7200rpm HDD|1|
|Windows Explorer unzip the “.appv” file|7200rpm HDD|10|
|7ZIP unzip the “.appv” file|7200rpm HDD|3|
|App-V 5 “.appv” file HTTP download time using IE|SSD|1|
|Windows Explorer unzip the “.appv” file|SSD|4|
|7ZIP unzip the “.appv” file|SSD|1.5|
OK. This is a result. When using a SSD, the problem seems to completely go away. This explains why I’ve never seen these App-V 5 performance problems in my lab before – I pretty much run everything off SSD these days.
Looking at the 7200rpm HDD result:
The App-V 5 client takes 60 seconds when technically it could be done in 4 seconds (1 + 3) or 11 seconds (1 + 10) if you use the slow Windows Explorer ZIP engine. So on a 7200rpm HDD the App-V 5 client is taking at least 5 times too long, and 15 times too long when measured against 7-ZIP.
Looking at the SSD result:
The App-V 5 client takes 6 seconds when technically it could be done in 2.5 seconds (1 + 1.5) or 5 seconds (1 + 4) if you use the slow Windows Explorer ZIP engine. So on a SSD the App-V 5 client is performing much more closely to the expected result.
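The comparison for both disk types boils down to dividing the App-V 5 mount time by the sum of a plain download and a plain unzip. A short Python sketch of that arithmetic, using only the figures from the table above (the dictionary layout and function name are illustrative):

```python
# Timings in seconds, copied from the results table above.
results = {
    "7200rpm HDD": {"mount": 60, "download": 1, "explorer_unzip": 10, "7zip_unzip": 3},
    "SSD": {"mount": 6, "download": 1, "explorer_unzip": 4, "7zip_unzip": 1.5},
}


def overhead_factor(mount, download, unzip):
    """How many times longer the mount takes than download + unzip."""
    return mount / (download + unzip)


for disk, t in results.items():
    vs_explorer = overhead_factor(t["mount"], t["download"], t["explorer_unzip"])
    vs_7zip = overhead_factor(t["mount"], t["download"], t["7zip_unzip"])
    print(f"{disk}: {vs_explorer:.1f}x vs Explorer unzip, {vs_7zip:.1f}x vs 7-ZIP unzip")
```

On the 7200rpm HDD the factor is about 5.5 against Explorer and 15 against 7-ZIP. On the SSD it drops to 1.2 and 2.4.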
I have now half removed the noose from around my neck: I tell the customer about the SSD result and they buy some SSD drives for their test lab and confirm my result. So the people blaming the servers, the network, the wallpaper, have at least backed off (and have been replaced by people saying “I told you those Small-Form-Factor PCs were a bad idea, but didn’t say anything at the time” or, my favourite one: “I could have told you the 5400rpm drives were the cause, if only you’d asked”).
To get a clearer picture of disk IO, I thought I’d use Microsoft’s XPerf.
For those of you who are unaware of this tool, it’s included in the Windows Performance Toolkit (WPT) which these days is part of the Windows Assessment and Deployment Kit (Windows ADK) or the Windows Software Development Kit (SDK) for Windows 8.1
An excellent article on XPerf can be found here:
At the time, I downloaded it from the Windows Software Development Kit (SDK) for Windows 8.0 and installed as follows:
- Run “sdksetup.exe”
- Choose “Download Windows SDK for installation on a separate computer”
- Choose “Windows Performance Toolkit”
Note: WPTx64-x86_en-us.msi is downloaded into “StandaloneSDK\Installers\”
- Run “WPTx64-x86_en-us.msi” to install Windows Software Development Kit for Windows 8 which installs “xperf.exe” amongst other things.
I captured an XPerf trace of an App-V 5 package mount on the 7200rpm hard disk as follows:
- I started xperf tracing by running the command: xperf -on diageasy
- I ran the following PowerShell command to fully mount the Skype package: mount-appvclientpackage Skype*
- I stopped XPerf tracing by running the command: xperf -d %TEMP%\MountAppVClientPackage_Skype.etl which creates an “.etl” file in the filename I specified on the command-line.
I viewed the trace by running the command below to analyse the “.etl” file created earlier:
From the screenshot above you can see that while the Mount-AppvClientPackage command is running, the CPU usage is very low but the hard disk is being hammered for about 58 seconds (which is the entire time the App-V 5 package was being mounted). The middle graph shows that the operations are mainly IO writes (as expected) and disk flushes (not expected).
I wasn’t expecting the disk flushing throughout the entire procedure and suspect this is the cause of the problem since this would seriously hamper disk write performance as the system would have to keep waiting for the data to be committed to disk before moving onto the next block of data (as I’ve pretty much concluded by now that it’s all taking place in a single thread).
To confirm that the disk flushing is unusual, I also captured an XPerf trace of a Windows Explorer unzip of the same “.appv” file. I didn’t bother with a separate hard disk (to hold the ZIP file) and simply unzipped from C: to C:. The results are in the screenshot below:
For the Explorer unzip of the .appv file from C: to C: you can see that the extraction takes about 10 seconds (from 22s to 32s), pushes the CPU harder (presumably because disk IO is less of a bottleneck) and there are more reads (reading the .appv file) than writes (AV scanning the file on read?) and almost no disk flushes at all.
I’ll stress that I do understand that streaming is different to simply unzipping a file, but the simple unzip at least gives an indication of what the client is capable of in terms of CPU and IO:
App-V 5 client takes 60 seconds (and flushes the disk throughout). Explorer takes 10 seconds. 7ZIP takes 3 seconds.
I’m pretty certain by now that the disk flushing is the problem but I don’t know why the App-V 5 client is flushing the disk.
I count 859 flushes during the 60 seconds taken to 100% cache the App-V 5 package.
That’s more or less 1 flush for every 64KB block (the expanded App-V 5 package is just over 50MB and if there was a disk flush every 64KB block then you’d need just over 800 flushes).
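The flush arithmetic can be checked in a few lines of Python. The sketch uses the figures quoted above. The 50MB expanded size is "just over 50MB", so it is treated as a lower bound, and the per-flush time assumes the flushes account for the whole 60 seconds.

```python
FLUSHES = 859        # flushes counted in the XPerf trace
BLOCK_KB = 64        # assumed block size per flush
EXPANDED_MB = 50     # "just over 50MB", so a lower bound
MOUNT_SECONDS = 60   # time to 100% cache the package on the 7200rpm HDD

# Flushes needed if every 64KB block written were followed by a flush.
expected_flushes = EXPANDED_MB * 1024 // BLOCK_KB

# Amount of data the observed flush count would cover at one flush per block.
data_covered_mb = FLUSHES * BLOCK_KB / 1024

# Average cost of each block if the flushes dominate the mount time.
flushes_per_second = FLUSHES / MOUNT_SECONDS
ms_per_flush = MOUNT_SECONDS * 1000 / FLUSHES

print(f"expected flushes for {EXPANDED_MB}MB: {expected_flushes}")
print(f"data covered by {FLUSHES} flushes: {data_covered_mb:.1f}MB")
print(f"{flushes_per_second:.1f} flushes/s, about {ms_per_flush:.0f}ms per 64KB block")
```

859 flushes at 64KB each cover about 53.7MB, which matches a package of "just over 50MB", and each block costs roughly 70ms on average.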
Windows write-cache buffer
I thought I’d experiment with modifying the disk flushing settings in device manager. I ticked “Turn off Windows write-cache buffer flushing on the device”:
The result: the App-V 5 client now takes about 8 seconds to 100% cache the Skype sequence on the 7200rpm hard disk (instead of 60 seconds)!!
The xperf results can be seen below:
Whilst a fantastic result, this still isn’t a solution because it would be madness to apply this setting to 10,000 client computers which don’t have a UPS, but it at least shows that it’s the disk flushing causing the problem.
Reporting the issue to Microsoft
I report my findings to Microsoft and eventually am put in contact with some members of the App-V 5 development team who confirm my results. I was hoping that this would result in a code fix for the problem but it is explained to me that it’s a bit more complicated than that.
My understanding of what I’m told by Microsoft is as follows (this is not a quote):
The App-V 5 package format is in an Open Packaging Conventions (OPC) format which is also used for Windows Store apps (APPX file format) and Microsoft Office documents (e.g. “.docx”).
There is an excellent article about this in the GLADIATOR@MSFT blog: App-V 5: On App-V Package Modernization with the OPC (Open Package Container) or: One Package Container to rule them all!
It was explained that App-V 5 uses some of the Windows Store APPX delivery APIs to deliver App-V 5 packages and it is the APPX delivery code which is performing the disk flushing, so it wasn’t a simple case of modifying some App-V 5 code to fix the problem. Disclaimer: I should point out that this is my understanding of the Microsoft engineer’s explanation and I may have misunderstood or unintentionally misrepresented them.
So at least my findings have been confirmed and various people claiming I’ve built the system incorrectly (but not offering solutions – or offering stupid ones) have been shooed away, but I still don’t have a solution.
Programmatic solution to slow streaming problem
Since I cannot just leave the option “Turn off Windows write-cache buffer flushing on the device” enabled at all times, I decide to attempt to write a program to control it on demand. I can envisage using this code in multiple ways:
- Enabling the option as publishing refresh starts, to speed up publishing performance since this also writes into the App-V 5 cache (writing Feature Block 0 data) and then turning it off again when publishing refresh completes.
- Being used within a GUI tool which allows users to 100% cache chosen App-V 5 applications, in a similar way to the existing Microsoft App-V 5 UI, but much faster because it will make use of the write-cache buffer flushing option.
- On logon, to automatically 100% cache important App-V 5 applications so they launch faster for the user when they logon to a computer they’ve not used before. My plan is to allow both the administrator and the user to define what applications are important.
But I’ve got two major hurdles to jump first:
The only API function I can find to control Windows write-cache buffer flushing is the DeviceIoControl() function in Kernel32.dll and using this API is non-trivial.
Only users with local administrative rights can modify the Windows write-cache buffer flushing setting (via the GUI or via DeviceIoControl), but I need to control this for normal users.
Anyway, I like a challenge so I brew my 6th coffee of the day, start Visual Studio and read a lot of MSDN documentation.
A brief summary of how to programmatically control Windows write-cache buffer flushing is as follows. This summary covers the basic mechanism but omits all the details of structure memory allocation and pointer handling largely because it will send everyone to sleep but also because this isn’t intended as a coding blog:
- Use the CreateFile() function in kernel32.dll to obtain a handle to “\\.\C:” (or whatever drive the App-V 5 cache is held on).
- Call the DeviceIoControl() function in kernel32.dll with the handle obtained in the first step and the control code “IOCTL_STORAGE_QUERY_PROPERTY” to obtain a STORAGE_WRITE_CACHE_PROPERTY structure, which describes the write cache on the specified drive, including whether the write cache is enabled and whether it is changeable.
- Call the DeviceIoControl() function in kernel32.dll with the same handle and the control code “IOCTL_DISK_GET_CACHE_SETTING” to obtain a DISK_CACHE_SETTING structure, which describes the disk cache on the specified drive, including “IsPowerProtected”, which specifies whether the cache is currently power protected. This is the setting I want to manipulate.
- If the above “IsPowerProtected” setting is 0 then switch it to 1.
- Call the DeviceIoControl() function in kernel32.dll with the same handle, the control code “IOCTL_DISK_SET_CACHE_SETTING” and the above modified DISK_CACHE_SETTING structure to enable the Power Protected Write Cache (PPWC), which is presented via the GUI setting “Windows write-cache buffer flushing”. From now on I’m going to use the phrase PPWC to represent this setting (as every time I write the word “buffer” I keep typing something else by mistake).
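The get and set steps above can be sketched in Python with ctypes. This is only a minimal illustration and not the code of ControlDiskWriteCaching.exe: the function name set_power_protected is made up for this example, the control code values and the structure layout are my reading of winioctl.h and should be checked against the Windows SDK, and the IOCTL_STORAGE_QUERY_PROPERTY check is left out for brevity.

```python
import ctypes
import sys

# Constants as I read them in winioctl.h; verify against your SDK headers.
METHOD_BUFFERED = 0
FILE_READ_ACCESS = 0x0001
FILE_WRITE_ACCESS = 0x0002
IOCTL_DISK_BASE = 0x00000007


def ctl_code(device_type, function, method, access):
    """Python equivalent of the CTL_CODE macro."""
    return (device_type << 16) | (access << 14) | (function << 2) | method


IOCTL_DISK_GET_CACHE_SETTING = ctl_code(
    IOCTL_DISK_BASE, 0x0038, METHOD_BUFFERED, FILE_READ_ACCESS)
IOCTL_DISK_SET_CACHE_SETTING = ctl_code(
    IOCTL_DISK_BASE, 0x0039, METHOD_BUFFERED,
    FILE_READ_ACCESS | FILE_WRITE_ACCESS)


class DISK_CACHE_SETTING(ctypes.Structure):
    """Assumed layout of the Win32 DISK_CACHE_SETTING structure."""
    _fields_ = [
        ("Version", ctypes.c_uint32),          # sizeof(DISK_CACHE_SETTING)
        ("State", ctypes.c_int),               # DISK_CACHE_STATE enum
        ("IsPowerProtected", ctypes.c_ubyte),  # BOOLEAN: 1 = PPWC enabled
    ]


def set_power_protected(volume, enabled):
    """Hypothetical helper: switch the PPWC on or off (Windows, admin only)."""
    if sys.platform != "win32":
        raise OSError("DeviceIoControl is only available on Windows")
    from ctypes import wintypes

    k32 = ctypes.WinDLL("kernel32", use_last_error=True)
    k32.CreateFileW.restype = wintypes.HANDLE
    k32.CreateFileW.argtypes = [
        wintypes.LPCWSTR, wintypes.DWORD, wintypes.DWORD, wintypes.LPVOID,
        wintypes.DWORD, wintypes.DWORD, wintypes.HANDLE]
    k32.DeviceIoControl.restype = wintypes.BOOL
    k32.DeviceIoControl.argtypes = [
        wintypes.HANDLE, wintypes.DWORD, wintypes.LPVOID, wintypes.DWORD,
        wintypes.LPVOID, wintypes.DWORD, ctypes.POINTER(wintypes.DWORD),
        wintypes.LPVOID]
    k32.CloseHandle.argtypes = [wintypes.HANDLE]

    # GENERIC_READ | GENERIC_WRITE, FILE_SHARE_READ | FILE_SHARE_WRITE,
    # OPEN_EXISTING
    handle = k32.CreateFileW(
        volume, 0x80000000 | 0x40000000, 0x1 | 0x2, None, 3, 0, None)
    if handle is None or handle == wintypes.HANDLE(-1).value:
        raise ctypes.WinError(ctypes.get_last_error())
    try:
        setting = DISK_CACHE_SETTING()
        size = ctypes.sizeof(setting)
        setting.Version = size
        returned = wintypes.DWORD(0)
        # Read the current cache setting from the disk.
        if not k32.DeviceIoControl(
                handle, IOCTL_DISK_GET_CACHE_SETTING, None, 0,
                ctypes.byref(setting), size, ctypes.byref(returned), None):
            raise ctypes.WinError(ctypes.get_last_error())
        # Flip the flag and write the modified structure back.
        setting.Version = size
        setting.IsPowerProtected = 1 if enabled else 0
        if not k32.DeviceIoControl(
                handle, IOCTL_DISK_SET_CACHE_SETTING,
                ctypes.byref(setting), size, None, 0,
                ctypes.byref(returned), None):
            raise ctypes.WinError(ctypes.get_last_error())
    finally:
        k32.CloseHandle(handle)
```

On a Windows machine, from an elevated prompt, a call such as set_power_protected(r"\\.\C:", True) would be the equivalent of ticking the box in Device Manager, assuming the disk allows the setting to be changed.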
My resulting EXE has been named “ControlDiskWriteCaching.exe”.
Oh I wish it had been that simple. It took a lot of coffee drinking and just short of 1000 lines of code to achieve this goal, although a lot of that code included API definitions and error handling.
This only works if you have local administrative rights so I’ve only jumped the first of my hurdles.
Elevate EXE rights with AppSense Application Manager
Luckily this customer was using AppSense Environment Manager and AppSense Application Manager so I had the ability to elevate an EXE running in the user’s context to local administrative rights whilst the user session itself remained non-administrative.
In AppSense Application Manager this is as easy as pie: simply specify the signature of the EXE you wish to elevate and assign the built-in policy “Built-in elevate” to ensure that this EXE runs with local admin rights even when running in the context of a non-administrative user. Totally cool:
Why use a signature? Well I want to ensure that only my program is elevated and not some other EXE renamed to the same name as mine. So the user can’t be dodgy and copy PowerShell.exe to %TEMP%\ControlDiskWriteCaching.exe and get a free pass to being a local admin.
ControlDiskWriteCachingService App-V 5 package
I’ve since used this solution at a number of customer sites and most of them weren’t lucky enough to be using AppSense Application Manager, so I had to come up with an alternative means of getting around the local administrative rights requirement for controlling the PPWC.
I didn’t want my solution to require any additional software on the client device because I prefer to come up with agentless solutions as I often find myself in a situation where the customer doesn’t want any additional client software installed (or doesn’t have a mechanism to deploy software to all client computers).
I made use of a very handy side-effect / feature of services in an App-V 5 package: A service within an App-V 5 package will automatically start as the App-V 5 virtual environment gets created and will automatically run as SYSTEM (assuming you configured the service to run as SYSTEM during sequencing) even though the App-V 5 virtual environment is started by the user. So I can write a service and have it run with SYSTEM rights, on demand, without installing any software on the client.
So I now have two programs in my App-V 5 package:
|ControlDiskWriteCaching.exe||This is the program written earlier to enable or disable the PPWC|
|ControlDiskWriteCachingService.exe||This is a service which runs as SYSTEM and automatically starts when the virtual environment is created (i.e. as ControlDiskWriteCaching.exe runs in the user’s session)|
The PPWC is now controlled as follows:
- “ControlDiskWriteCaching.exe” is run in the user’s session and told to enable the PPWC (options specified via command-line arguments)
- The App-V 5 virtual environment gets created (because “ControlDiskWriteCaching.exe” has started) and hence the service “ControlDiskWriteCachingService.exe” automatically starts and runs as SYSTEM
- The service sets up a named pipe listener
- “ControlDiskWriteCaching.exe” attempts to control the PPWC via the DeviceIoControl() function but fails with an access denied error (assuming the user is not a local administrator)
- “ControlDiskWriteCaching.exe” communicates with the service via a named pipe and asks the service to control the PPWC
- The service calls DeviceIoControl() to enable the PPWC and returns a success code via the named pipe back to “ControlDiskWriteCaching.exe”
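The “try directly, then fall back to the service” flow above can be sketched as follows. This is an illustration of the pattern only, not the real implementation: it uses Python’s multiprocessing.connection module as a stand-in for the named pipe (on Windows its default transport is a named pipe), the function names and the "OK" reply are invented for the example, and a real SYSTEM service would also need to set the pipe’s permissions and validate every request it receives.

```python
import threading
from multiprocessing.connection import Client, Listener


def serve_one_request(listener, privileged_action):
    """Service side: accept one client, run its request, reply with a status."""
    with listener.accept() as conn:
        request = conn.recv()
        try:
            privileged_action(request)
            conn.send("OK")
        except Exception as exc:
            conn.send(f"ERROR: {exc}")


def control_ppwc(request, direct_action, pipe_address):
    """Client side: try the call directly, fall back to the service."""
    try:
        direct_action(request)
        return "OK (direct)"
    except PermissionError:
        # Access denied for a normal user: ask the privileged service instead.
        with Client(pipe_address) as conn:
            conn.send(request)
            return conn.recv()


def demo():
    """Run both sides in one process with stand-in actions."""
    handled = []

    def denied(request):
        raise PermissionError("access denied")

    listener = Listener()  # picks a platform-specific address automatically
    worker = threading.Thread(
        target=serve_one_request, args=(listener, handled.append))
    worker.start()
    reply = control_ppwc("enable", denied, listener.address)
    worker.join()
    listener.close()
    return reply, handled
```

Calling demo() simulates a non-administrative client whose direct attempt is refused and whose request is then carried out by the listener thread, which plays the part of the service.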
That’s it! Wow I can see this solution being pretty useful / dangerous in the future. It basically allows an EXE launched by a non-administrative user to (indirectly) exercise SYSTEM rights by passing any administrative request via a named pipe to a SYSTEM service, which runs on demand via App-V 5 without having to be installed on the client computer.
You can see the two processes running under different accounts in task manager:
When the App-V 5 virtual environment closes (when ControlDiskWriteCaching.exe closes), the service unfortunately gets killed rather than shut down neatly (seems to be an App-V 5 “feature”) but luckily the named pipe listener seems to close down neatly.
You can see the named pipe in Microsoft Sysinternals Process Explorer (I wasn’t in an artistic mood when I named the pipe):
App-V 5 fast streaming tool
The next step was to write a GUI to allow a user to manually 100% cache any App-V 5 application in a similar way to the existing App-V 5 UI but with the added performance provided by manipulating the PPWC. This GUI was named “App-V 5 fast streaming tool” and would have the functionality of “ControlDiskWriteCaching.exe” within it, so when a user clicks “Load into cache”, the GUI would perform the following tasks:
- Enable the PPWC via a named pipe request to the App-V 5 service “ControlDiskWriteCachingService.exe”
- Run “Mount-AppVClientPackage” via the PowerShell automation API
- Disable the PPWC via a named pipe request to the App-V 5 service “ControlDiskWriteCachingService.exe”
The PowerShell automation API is very useful. It allows me to launch PowerShell commands without having to shell out to PowerShell.exe, gives me a handle to the running state of the command, and neatly presents the command output in a PowerShell data collection.
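One detail worth guarding in the enable, mount, disable sequence is that the PPWC should be switched off again even if the mount fails. A minimal Python sketch of one way to do that is shown below. The names ppwc_enabled and load_into_cache are hypothetical, and the enable, disable and mount callables stand in for the named pipe requests and the “Mount-AppVClientPackage” call; this is not necessarily how the GUI tool itself is structured.

```python
from contextlib import contextmanager


@contextmanager
def ppwc_enabled(enable, disable):
    """Enable the PPWC for the duration of a block, then always disable it."""
    enable()
    try:
        yield
    finally:
        # Runs on success and on failure, so the PPWC is never left enabled.
        disable()


def load_into_cache(package, enable, disable, mount):
    """Mount (100% cache) one package while the PPWC is enabled."""
    with ppwc_enabled(enable, disable):
        mount(package)
```

With this shape, an exception raised by the mount step still propagates to the caller, but only after the disable request has been sent.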
Enabling PPWC during publishing refresh
Now I need to enable the Power Protected Write Cache (PPWC) as App-V 5 publishing starts and disable it again when App-V 5 publishing ends. This will speed up the App-V 5 publishing process.
I achieve this via two scheduled tasks:
“Preload App-V 5 apps fast pre publishing” scheduled task
The scheduled task “Preload App-V 5 apps fast pre publishing” is triggered as the App-V 5 publish refresh event ID 19001 is generated:
This event log entry is generated right at the start of App-V 5 user publishing refresh.
The scheduled task trigger looks like this:
The scheduled task action runs the command:
"ControlDiskWriteCaching.exe" /Volume \\.\C: /EnablePowerProtectedWriteCache /FallbackToControlDiskWriteCachingService
which enables the PPWC.
Note: The path to the EXE is a bit more complicated since it’s in the App-V 5 disk cache, but I have written code to quickly detect its location (without having to run any App-V 5 PowerShell commands).
“Preload App-V 5 apps fast post publishing” scheduled task
The scheduled task “Preload App-V 5 apps fast post publishing” is triggered as the App-V 5 user publishing scheduled task completes:
The App-V 5 client scheduled task for Full Infrastructure user publishing is called “1_user_logon” and is located in “Task Scheduler Library \ Microsoft \ AppV \ Publishing”:
This scheduled task is automatically created when an App-V 5 publishing server is defined (via the “Add-AppVPublishingServer” PowerShell command or via the App-V 5 GPO).
An event log entry is generated when this scheduled task completes (i.e. when user publishing refresh completes):
I configured the trigger for the “Preload App-V 5 apps fast post publishing” scheduled task to look for the above “Scheduled task complete” event log entry. I had to create a custom trigger to describe what I was looking for:
The scheduled task action runs the command:
"ControlDiskWriteCaching.exe" /Volume \\.\C: /DisablePowerProtectedWriteCache /FallbackToControlDiskWriteCachingService
which disables the PPWC (via the service).
Creating the “Preload App-V 5 apps fast” scheduled tasks
I don’t want to have to manually create these scheduled tasks on all client computers and I only want the scheduled task to exist when the “ControlDiskWriteCaching” App-V 5 package is published to the client desktops (I’m publishing it globally).
So I set up an App-V 5 script to run on “PublishPackage” by editing the “DeploymentConfig.xml” for the ControlDiskWriteCaching App-V 5 package. The XML for this script is shown below:
<Arguments>"Create PreloadAppV5AppsFast scheduled task.vbs"</Arguments>
<Wait Timeout="30" RollbackOnError="true"/>
<Arguments>"Delete PreloadAppV5AppsFast scheduled task.vbs"</Arguments>
<Wait Timeout="30" RollbackOnError="true"/>
Note that you need to use straight double-quotes and not the fancy curly quotes.
This runs a VBScript I wrote to create the scheduled tasks as the App-V 5 package is globally published. If the App-V 5 package is unpublished at a later time then another VBScript is run to delete the scheduled tasks.
The working directory for a script run during a “PublishPackage” event is the “Scripts” folder in the App-V 5 package, so I don’t have to provide a path.
Pre-loading important App-V 5 packages
I also need to be able to fast pre-load “important” App-V 5 packages by running the PowerShell command “Mount-AppVClientPackage” whilst the PPWC is enabled. This ensures that important apps are ready to launch without delay on computers the user hasn’t used before.
I would like to give the administrators the ability to define what applications are “important” but also for the users to be able to extend this list.
So I extend the “Preload App-V 5 apps fast post publishing” scheduled task to run the PowerShell command “Mount-AppVClientPackage” against certain App-V 5 applications before it disables the PPWC (which was enabled earlier by the “Preload App-V 5 apps fast pre publishing” scheduled task).
The scheduled task reads an INI file which can be created by an administrator and stored in a central location (I store it on the App-V 5 content servers – the scheduled task accesses the INI file via HTTP or SMB). An example INI file is shown below:
Priority 1 apps are pre-loaded first, then priority 2, and so on up to priority N. So in this example, “WinSCP” and “Paint DotNet” will be pre-loaded first and “VLC” and “Handbrake” will be loaded next. No other apps will be pre-loaded.
The user has the ability to extend this list by using the “App-V 5 fast streaming tool” GUI:
This writes a similar INI file into the user’s home directory (or some other location – the location is controllable via the registry or the central INI file) which is also read by the scheduled task when the user logs on to a desktop.
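The priority ordering and the merging of the administrator and user lists can be sketched in Python as follows. The INI layout shown (a “PreloadApps” section mapping an app name to its priority) is purely hypothetical, as are the function names, and the rule that the administrator’s priority wins when both files list the same app is my assumption rather than something the tool is known to do.

```python
import configparser

# Hypothetical layout: one section, "app name = priority".
ADMIN_INI = """
[PreloadApps]
WinSCP = 1
Paint DotNet = 1
VLC = 2
Handbrake = 2
"""


def read_priorities(ini_text):
    """Return {app name: priority} from INI text in the assumed layout."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep the case of app names
    parser.read_string(ini_text)
    if not parser.has_section("PreloadApps"):
        return {}
    return {app: int(prio) for app, prio in parser.items("PreloadApps")}


def preload_order(admin_ini, user_ini=""):
    """Merge both lists and return app names, lowest priority number first."""
    merged = read_priorities(admin_ini)
    for app, prio in read_priorities(user_ini).items():
        merged.setdefault(app, prio)  # assumption: admin entry wins
    # sorted() is stable, so apps of equal priority keep their listed order.
    return [app for app, _ in sorted(merged.items(), key=lambda item: item[1])]


# preload_order(ADMIN_INI)
# -> ['WinSCP', 'Paint DotNet', 'VLC', 'Handbrake']
```

Each name in the returned list would then be passed in turn to the mount step while the PPWC is still enabled.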
The final App-V 5 fast streaming package
So, after this mammoth investigation and software development side-project, I’ve finally produced a working App-V 5 package which improves App-V 5 Full Infrastructure user publishing and streaming performance to desktops using local hard disks.
I’ve unimaginatively called it “ControlDiskWriteCachingService” although I regularly call it the “App-V 5 fast streaming tool”. I really need to come up with a cool name, and my only suggestion so far, the “App-V 5 fast streaming disk flusheruppa”, met with blank looks.
It is published globally (to AD groups containing computers) in App-V 5 Full Infrastructure:
When published globally to an App-V 5 desktop, the custom script defined in the “DeploymentConfig.XML” runs on the client desktop as SYSTEM and creates the two scheduled tasks described earlier.
When a user logs on and App-V 5 Full Infrastructure user publishing starts, the first scheduled task runs and enables the Power Protected Write Cache (PPWC) via the ControlDiskWriteCachingService in the same App-V 5 package which runs as SYSTEM.
When the App-V 5 Full Infrastructure publishing finishes, the second scheduled task runs and mounts (100% caches) any App-V 5 apps which have been marked as important either by the administrator or the user (via the App-V 5 fast streaming GUI at some earlier point). The PPWC is then disabled again.
The remainder of the user session runs with the PPWC disabled as normal.
Inside the App-V 5 fast streaming package
Here’s a quick look inside the App-V 5 package (as viewed in a desktop client’s App-V 5 cache):
These scripts (above) are mainly used to create the scheduled tasks or are run by the scheduled tasks. They are written in VBScript so you don’t see them flash up on the screen when they run (as PowerShell tends to do). In one case I have a VBS whose only function is to run a PowerShell script hidden.
In the PVAD (above) you can see the AppV5FastStreamingTool GUI (which has a shortcut in the App-V 5 package), the command-line tool to control the PPWC and the service which controls the PPWC (when asked to via a named pipe).
This solution isn’t usable in all environments:
- It is only usable when either manually mounting App-V 5 packages or using App-V 5 Full Infrastructure
- It is not usable when using SCCM to deliver App-V 5 packages, although it might be possible to insert into the SCCM delivery process in the future – I’d need to investigate further
- It is only of use when using a local App-V 5 cache on a hard disk
- If using an SSD for the OS drive (the drive usually used for the App-V 5 cache) then there is no performance gain (or at least none on the SSDs I’ve tried)
- There is no performance gain if using the App-V 5 Shared Content Store (e.g. with VDI or Terminal Services environments) since there is no local disk cache (and IO in such environments is typically no longer HDD backed)
So this pretty much boils down to one environment in which you’ll see a performance gain:
- App-V 5 apps delivered via App-V 5 Full Infrastructure to desktops which have hard disks
Luckily for me, this is an environment I commonly find myself in.
Recent test with App-V 5 SP3
I recently implemented an App-V 5 SP3 Full Infrastructure solution for a customer with about 250 apps and a few thousand desktops, all using hard disks.
This is a perfect environment for the App-V 5 fast streaming tool solution, so I added this App-V 5 package to their system and ran some timing tests.
- I used an Adobe Lightroom CC App-V 5 package for my test (a 1.9GB package)
- I used “cold” tests where this was the first time this package had been mounted since the computer had been booted (see “Hot and cold timings”)
- I included the time it takes for the hard disk to settle after mounting. The disk often stays at 100% utilisation for between 30s and 60s after App-V 5 reports that mounting is complete (when it clearly isn’t)
- The desktops were running Windows 8.1 x64
- The results are an average of 4 individual timings
- App-V 5 apps were HTTP streamed from an IIS content server (but SMB3 didn’t make much difference)
The results are below:
|Test performed:||Disk type:||Time in seconds:|
|App-V 5.0 client full “mount” time as normal||7200rpm HDD||222|
|App-V 5.0 client full “mount” time using fast streaming tool||7200rpm HDD||115|
So when using the fast streaming tool, the time taken to mount an App-V 5 package when using App-V 5 SP3 on a desktop with a 7200rpm HDD is approximately halved (222s down to 115s, a reduction of about 48%). This isn’t quite the gain seen back when using App-V 5 SP1, but it is still a significant gain.
I suspect that it was actually App-V 5 SP2 HF4 which improved the native performance of App-V 5 streaming and not SP3. I plan to run an additional set of tests with each version of the App-V 5 desktop client and will report my findings.
Will this method work for you?
If you have problems with slow App-V 5 streaming then try ticking the box “Turn off Windows write-cache buffer flushing on the device” (double-click your hard drive under “Disk Drives” in Device Manager). If the option isn’t there then you cannot control the PPWC on this disk and this method won’t work.
Does this improve your App-V 5 streaming performance by a respectable amount? If not then this method won’t help you.
Don’t forget to turn the option off when you’ve finished testing.
ABOUT THE AUTHOR
Simon is Senior IT Consultant at Ultima Business Solutions and has worked as a consultant for IT solution providers for over 20 years. He specialises in End User Computing and software development and was awarded AppSense EMEA technical consultant of the year in 2015.