Replace WixSharp's MSI parser with a pure C# parser based on OpenMcdf #154
I like that you found a way to gather MSI product codes on other platforms, but executing a process on the machine does not really provide a solution that also works in the cloud. If this is possible in another language (as it seems), would it also be possible in C#? |
I can think of implementing libmsi as an FFI'd library, but reimplementing the entire thing in C# would be an entire project by itself; there are multiple thousand lines of code in libmsi. Which kind of cloud workflow are you thinking of where shell commands (well, in this case there is no shell, it's direct program execution with an argument) would not be allowed? As far as I know, most cloud platforms would let you ship msitools as part of the runtime container. |
I spent a great deal of time making this fully cross-platform, and I want to remove everything that depends on a binary that can only run on Windows. That is why I'm quite hesitant to introduce another platform dependency. I'm thinking of Azure Functions. I know you can run those from a container and could modify the build process to include another binary, but that will still cause a lot of headaches. This is just a one-man project (for now), and I'm not sure I have the time to support this. May I propose something else? What if you could specify a command format that would be executed to extract details from the MSI? Then this library would not call any binaries itself, and you would still be able to extract MSI details on Linux. A separate discussion is that this information should be correct in the Winget manifest.... |
That command argument would probably be fine, although it would change nothing (you would still need to call external binaries on Linux or it would break, and you would not need to on Windows thanks to WixSharp). I'm trying to see if https://github.com/ironfede/openmcdf can be used to extract the version information; hopefully that can also remove WixSharp, as it's pure C#. |
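For illustration, the "command format" idea proposed above could look roughly like the sketch below. Everything here is hypothetical: the `ExternalMsiReader` class, the `{msi}` placeholder, and the assumption that the configured tool prints tab-separated `name<TAB>value` lines (which is what I would expect from something like `msiinfo export <file> Property`, but that output shape is an assumption, not something verified here).

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class ExternalMsiReader
{
    // Hypothetical helper: runs a user-configured tool directly (no shell)
    // and returns its standard output. "{msi}" in an argument is replaced
    // with the path of the MSI file.
    public static string Run(string executable, IEnumerable<string> argumentTemplate, string msiPath)
    {
        var psi = new ProcessStartInfo(executable)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
        };
        foreach (var arg in argumentTemplate)
            psi.ArgumentList.Add(arg.Replace("{msi}", msiPath));

        using var process = Process.Start(psi)
            ?? throw new InvalidOperationException($"Could not start '{executable}'.");
        var output = process.StandardOutput.ReadToEnd();
        process.WaitForExit();
        if (process.ExitCode != 0)
            throw new InvalidOperationException($"'{executable}' exited with code {process.ExitCode}.");
        return output;
    }

    // Parses lines of the assumed form "name<TAB>value" into a dictionary.
    // Lines that do not have exactly two tab-separated fields are ignored.
    public static Dictionary<string, string> ParseProperties(string output)
    {
        var properties = new Dictionary<string, string>(StringComparer.Ordinal);
        foreach (var line in output.Split('\n'))
        {
            var fields = line.TrimEnd('\r').Split('\t');
            if (fields.Length == 2)
                properties[fields[0]] = fields[1];
        }
        return properties;
    }
}
```

With this shape the library only knows about a configurable executable and argument list; which binary is used stays the caller's decision.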
This would change that I mark that as |
I definitely believe OpenMcdf is an option. I've already implemented decoding the string names and tables from an MSI; all I have to do now is figure out which string is the version and which is the code, but it's almost 2 AM, so that'll be tomorrow.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using OpenMcdf;

namespace WingetIntune.Internal.Msi;

internal class MsiDecoder
{
    public string GetCode()
    {
        throw new NotImplementedException();
    }

    public string GetVersion()
    {
        throw new NotImplementedException();
    }

    public MsiDecoder(string filePath)
    {
        using (var cf = new CompoundFile(filePath))
        {
            var pool = LoadStringPool(cf);
        }
    }

    // references for the next lines:
    // https://stackoverflow.com/questions/9734978/view-msi-strings-in-binary
    private char BaseMSIDecode(char c)
    {
        // 0-0x3F converted to '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._'
        // all values higher than 0x3F are also converted to '_'
        int result;
        if (c < 10)
            result = c + '0'; // 0-9 (0x0-0x9) -> '0123456789'
        else if (c < (10 + 26))
            result = c - 10 + 'A'; // 10-35 (0xA-0x23) -> 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        else if (c < (10 + 26 + 26))
            result = c - 10 - 26 + 'a'; // 36-61 (0x24-0x3D) -> 'abcdefghijklmnopqrstuvwxyz'
        else if (c == (10 + 26 + 26)) // 62 (0x3E) -> '.'
            result = '.';
        else
            result = '_'; // 63-0xffffffff (0x3F-0xFFFFFFFF) -> '_'
        return (char)result;
    }

    string DecodeStreamName(string name)
    {
        var result = new List<char>();
        var source = name.ToCharArray();
        foreach (char c in source)
        {
            var reduced = 0;
            if (c == 0x4840)
            {
                result.Add('$');
            }
            else if ((c >= 0x3800) && (c < 0x4840))
            {
                if (c >= 0x4800)
                {
                    // A single 6-bit character.
                    reduced = c - 0x4800;
                    result.Add(BaseMSIDecode((char)(reduced)));
                }
                else
                {
                    // Two 6-bit characters packed into one UTF-16 code unit.
                    reduced = c - 0x3800;
                    result.Add(BaseMSIDecode((char)(reduced & 0x3F)));
                    result.Add(BaseMSIDecode((char)((reduced >> 6) & 0x3F)));
                }
            }
            else
            {
                result.Add(c);
            }
        }
        return new string(result.ToArray());
    }

    private char BaseMSIEncode(char c)
    {
        // only '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._' are allowed and converted to 0-0x3F
        int result;
        if ((c >= '0') && (c <= '9')) // '0123456789' -> 0-9 (0x0-0x9)
            result = c - '0';
        else if ((c >= 'A') && (c <= 'Z')) // 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' -> 10-35 (26 chars) - (0xA-0x23)
            result = c - 'A' + 10;
        else if ((c >= 'a') && (c <= 'z')) // 'abcdefghijklmnopqrstuvwxyz' -> 36-61 (26 chars) - (0x24-0x3D)
            result = c - 'a' + 10 + 26;
        else if (c == '.')
            result = 10 + 26 + 26; // '.' -> 62 (0x3E)
        else if (c == '_')
            result = 10 + 26 + 26 + 1; // '_' -> 63 (0x3F) - 6 bits
        else
            result = -1; // other -> 0xFFFF after the cast to char, which fails the <= 0x3F check
        return (char)result;
    }

    string EncodeStreamName(string name)
    {
        var result = new List<char>();
        for (int i = 0; i < name.Length; i++)
        {
            var c = name[i];
            if (c == '$')
            {
                result.Add((char)0x4840);
            }
            else if (c < 0x80 && BaseMSIEncode(c) <= 0x3F && i + 1 != name.Length)
            {
                // Pack this character and the next one into a single code unit.
                i++;
                var first = BaseMSIEncode(c);
                var second = BaseMSIEncode(name[i]);
                result.Add((char)(first + (second << 6) + 0x3800));
            }
            else
            {
                result.Add((char)(BaseMSIEncode(c) + 0x4800));
            }
        }
        return new string(result.ToArray());
    }

    Dictionary<int, string> LoadStringPool(CompoundFile cf)
    {
        // Stream names are stored encoded, so encode the name before looking it up.
        var encodedStringPool = EncodeStreamName("$_StringPool");
        var streamStringPool = cf.RootStorage.GetStream(encodedStringPool);
        var stringPoolBytes = streamStringPool.GetData();
        var poolLength = streamStringPool.Size;
        var poolWLength = BitConverter.ToUInt16(stringPoolBytes, 0);
        var poolRefCount = BitConverter.ToUInt16(stringPoolBytes, 2);
        var encodedStringData = EncodeStreamName("$_StringData");
        var streamStringData = cf.RootStorage.GetStream(encodedStringData);
        var stringDataBytes = streamStringData.GetData();
        var strings = new Dictionary<int, string>();
        // The first 4 bytes are the header; each following entry is 4 bytes (length, ref count).
        for (int src = 4, stringId = 1, offset = 0; src < poolLength; src += 4)
        {
            Console.WriteLine("Starting decode");
            // Lengths and ref counts are unsigned 16-bit values.
            var entryLength = (int)BitConverter.ToUInt16(stringPoolBytes, src);
            var entryRef = (int)BitConverter.ToUInt16(stringPoolBytes, src + 2);
            Console.WriteLine($"Of entry {entryLength} {entryRef}");
            if (entryLength == 0 && entryRef == 0)
            {
                // Empty entry, skip.
                Console.WriteLine("Skipping");
                stringId++;
                continue;
            }
            else if (entryLength == 0 && entryRef != 0)
            {
                // wide entry over 64kb
                Console.WriteLine("Wide Entry");
                continue;
            }
            if (src != 4)
            {
                var previousEntryLength = BitConverter.ToUInt16(stringPoolBytes, src - 4);
                var previousEntryRef = BitConverter.ToUInt16(stringPoolBytes, src - 2);
                Console.WriteLine($"Previous entry {previousEntryLength} {previousEntryRef}");
                if (previousEntryLength == 0 && previousEntryRef != 0)
                {
                    // The preceding wide-entry marker carries the high word of the
                    // length in its ref-count field (its length field is always 0).
                    entryLength += previousEntryRef << 16;
                    Console.WriteLine($"New Size {entryLength}");
                }
            }
            Console.WriteLine($"Adding {Encoding.UTF8.GetString(stringDataBytes.Skip(offset).Take(entryLength).ToArray())}");
            strings.Add(stringId, Encoding.UTF8.GetString(stringDataBytes.Skip(offset).Take(entryLength).ToArray()));
            offset += entryLength;
            stringId++;
        }
        return strings;
    }
}
```

I want to put a special lowlight to whoever thought GLib was a good idea. msitools is unreadable. |
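For anyone following along, the string-pool layout that `LoadStringPool` walks can be tried without an MSI file or OpenMcdf. The sketch below is a simplified, hypothetical `StringPoolSketch` helper, not the code from this PR: it assumes a 4-byte header followed by 4-byte `(length, ref count)` entries, UTF-8 string data, and a little-endian machine (that is what `BitConverter` reads on common platforms), and it refuses strings over 64 KB instead of handling them.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class StringPoolSketch
{
    // pool: raw bytes of the pool stream; data: raw bytes of the string data stream.
    public static Dictionary<int, string> Read(byte[] pool, byte[] data)
    {
        var strings = new Dictionary<int, string>();
        int offset = 0;   // current position inside the string data
        int stringId = 1; // string IDs start at 1; 0 means "no string"

        // Bytes 0..3 are the pool header, so entries start at byte 4.
        for (int src = 4; src + 4 <= pool.Length; src += 4, stringId++)
        {
            int length = BitConverter.ToUInt16(pool, src);
            int refCount = BitConverter.ToUInt16(pool, src + 2);

            if (length == 0 && refCount == 0)
                continue; // unused slot: the ID is consumed but maps to nothing

            if (length == 0)
                throw new NotSupportedException("Strings over 64 KB are not handled in this sketch.");

            strings[stringId] = Encoding.UTF8.GetString(data, offset, length);
            offset += length;
        }
        return strings;
    }
}
```

Feeding it a pool with two real entries and one empty slot in between shows why empty entries still have to advance the string ID: the second string ends up with ID 3, not 2.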
Welp, looks like addiction won in the end, it works.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using OpenMcdf;

namespace WingetIntune.Internal.Msi;

internal class MsiDecoder
{
    // Size in bytes of a string reference inside table streams (2 or 3).
    private int stringSize = 2;
    private Dictionary<int, string> intToString;
    private Dictionary<string, int> stringToInt;

    public string GetCode()
    {
        throw new NotImplementedException();
    }

    public string GetVersion()
    {
        throw new NotImplementedException();
    }

    public MsiDecoder(string filePath)
    {
        using (var cf = new CompoundFile(filePath))
        {
            intToString = LoadStringPool(cf);
            stringToInt = intToString.ToDictionary(x => x.Value, x => x.Key);
            foreach (var entry in intToString)
            {
                Console.WriteLine($"{entry.Key}: {entry.Value}");
            }
            var encodedPropertyName = EncodeStreamName("$Property");
            var streamProperty = cf.RootStorage.GetStream(encodedPropertyName);
            var propertyBytes = streamProperty.GetData();
            // Reads every cell as a 2-byte string ID (this ignores stringSize == 3).
            var enubytes = Enumerable.Range(0, propertyBytes.Length / 2).Select(i => BitConverter.ToUInt16(propertyBytes, i * 2)).ToArray();
            var cells = new List<string>();
            foreach (var b in enubytes)
            {
                cells.Add(intToString[b]);
            }
            // The table is stored column by column: all property names first, then all values.
            var tableSize = cells.Count() / 2;
            var code = cells[cells.IndexOf("ProductCode") + tableSize];
            var version = cells[cells.IndexOf("ProductVersion") + tableSize];
            Console.WriteLine($"Code: {code}, Version: {version}");
        }
    }

    // references for the next lines:
    // https://stackoverflow.com/questions/9734978/view-msi-strings-in-binary
    private char BaseMSIDecode(char c)
    {
        // 0-0x3F converted to '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._'
        // all values higher than 0x3F are also converted to '_'
        int result;
        if (c < 10)
            result = c + '0'; // 0-9 (0x0-0x9) -> '0123456789'
        else if (c < (10 + 26))
            result = c - 10 + 'A'; // 10-35 (0xA-0x23) -> 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
        else if (c < (10 + 26 + 26))
            result = c - 10 - 26 + 'a'; // 36-61 (0x24-0x3D) -> 'abcdefghijklmnopqrstuvwxyz'
        else if (c == (10 + 26 + 26)) // 62 (0x3E) -> '.'
            result = '.';
        else
            result = '_'; // 63-0xffffffff (0x3F-0xFFFFFFFF) -> '_'
        return (char)result;
    }

    string DecodeStreamName(string name)
    {
        var result = new List<char>();
        var source = name.ToCharArray();
        foreach (char c in source)
        {
            var reduced = 0;
            if (c == 0x4840)
            {
                result.Add('$');
            }
            else if ((c >= 0x3800) && (c < 0x4840))
            {
                if (c >= 0x4800)
                {
                    // A single 6-bit character.
                    reduced = c - 0x4800;
                    result.Add(BaseMSIDecode((char)(reduced)));
                }
                else
                {
                    // Two 6-bit characters packed into one UTF-16 code unit.
                    reduced = c - 0x3800;
                    result.Add(BaseMSIDecode((char)(reduced & 0x3F)));
                    result.Add(BaseMSIDecode((char)((reduced >> 6) & 0x3F)));
                }
            }
            else
            {
                result.Add(c);
            }
        }
        return new string(result.ToArray());
    }

    private char BaseMSIEncode(char c)
    {
        // only '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz._' are allowed and converted to 0-0x3F
        int result;
        if ((c >= '0') && (c <= '9')) // '0123456789' -> 0-9 (0x0-0x9)
            result = c - '0';
        else if ((c >= 'A') && (c <= 'Z')) // 'ABCDEFGHIJKLMNOPQRSTUVWXYZ' -> 10-35 (26 chars) - (0xA-0x23)
            result = c - 'A' + 10;
        else if ((c >= 'a') && (c <= 'z')) // 'abcdefghijklmnopqrstuvwxyz' -> 36-61 (26 chars) - (0x24-0x3D)
            result = c - 'a' + 10 + 26;
        else if (c == '.')
            result = 10 + 26 + 26; // '.' -> 62 (0x3E)
        else if (c == '_')
            result = 10 + 26 + 26 + 1; // '_' -> 63 (0x3F) - 6 bits
        else
            result = -1; // other -> 0xFFFF after the cast to char, which fails the <= 0x3F check
        return (char)result;
    }

    string EncodeStreamName(string name)
    {
        var result = new List<char>();
        for (int i = 0; i < name.Length; i++)
        {
            var c = name[i];
            if (c == '$')
            {
                result.Add((char)0x4840);
            }
            else if (c < 0x80 && BaseMSIEncode(c) <= 0x3F && i + 1 != name.Length)
            {
                // Pack this character and the next one into a single code unit.
                i++;
                var first = BaseMSIEncode(c);
                var second = BaseMSIEncode(name[i]);
                result.Add((char)(first + (second << 6) + 0x3800));
            }
            else
            {
                result.Add((char)(BaseMSIEncode(c) + 0x4800));
            }
        }
        return new string(result.ToArray());
    }

    Dictionary<int, string> LoadStringPool(CompoundFile cf)
    {
        // Stream names are stored encoded, so encode the name before looking it up.
        var encodedStringPool = EncodeStreamName("$_StringPool");
        var streamStringPool = cf.RootStorage.GetStream(encodedStringPool);
        var stringPoolBytes = streamStringPool.GetData();
        var poolLength = streamStringPool.Size;
        var poolWLength = BitConverter.ToUInt16(stringPoolBytes, 0);
        var poolRefCount = BitConverter.ToUInt16(stringPoolBytes, 2);
        if (poolRefCount == 0)
            stringSize = 2;
        else if (poolRefCount == 0x8000)
            stringSize = 3;
        var encodedStringData = EncodeStreamName("$_StringData");
        var streamStringData = cf.RootStorage.GetStream(encodedStringData);
        var stringDataBytes = streamStringData.GetData();
        var strings = new Dictionary<int, string>();
        // The first 4 bytes are the header; each following entry is 4 bytes (length, ref count).
        for (int src = 4, stringId = 1, offset = 0; src < poolLength; src += 4)
        {
            Console.WriteLine("Starting decode");
            var entryLength = (int)BitConverter.ToUInt16(stringPoolBytes, src);
            var entryRef = (int)BitConverter.ToUInt16(stringPoolBytes, src + 2);
            Console.WriteLine($"Of entry {entryLength} {entryRef}");
            if (entryLength == 0 && entryRef == 0)
            {
                // Empty entry, skip.
                Console.WriteLine("Skipping");
                stringId++;
                continue;
            }
            else if (entryLength == 0 && entryRef != 0)
            {
                // wide entry over 64kb
                Console.WriteLine("Wide Entry");
                continue;
            }
            if (src != 4)
            {
                var previousEntryLength = BitConverter.ToUInt16(stringPoolBytes, src - 4);
                var previousEntryRef = BitConverter.ToUInt16(stringPoolBytes, src - 2);
                Console.WriteLine($"Previous entry {previousEntryLength} {previousEntryRef}");
                if (previousEntryLength == 0 && previousEntryRef != 0)
                {
                    // The preceding wide-entry marker carries the high word of the
                    // length in its ref-count field (its length field is always 0).
                    entryLength += previousEntryRef << 16;
                    Console.WriteLine($"New Size {entryLength}");
                }
            }
            Console.WriteLine($"Adding {Encoding.UTF8.GetString(stringDataBytes.Skip(offset).Take(entryLength).ToArray())}");
            strings.Add(stringId, Encoding.UTF8.GetString(stringDataBytes.Skip(offset).Take(entryLength).ToArray()));
            offset += entryLength;
            stringId++;
        }
        return strings;
    }
}
```

This is overly simplistic, as it can only get the product code and product version, but if you want other data from the MSI, I can get that too; there are a few tables that could be interesting
|
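To make the `+ tableSize` lookup in the comment above easier to follow: the Property table stream is read as all property names first and then all values, so the value for row `r` sits `rowCount` cells after its name. A minimal sketch of that lookup on plain strings follows; the `PropertyTableSketch` name is hypothetical, and it assumes a two-column table whose cells have already been resolved through the string pool.

```csharp
using System;
using System.Collections.Generic;

public static class PropertyTableSketch
{
    // cells: every cell of a two-column table in stored (column-major) order,
    // i.e. [name0, name1, ..., nameN, value0, value1, ..., valueN].
    // Returns the value for propertyName, or null when the property is absent.
    public static string Lookup(IReadOnlyList<string> cells, string propertyName)
    {
        int rowCount = cells.Count / 2;

        // Only search the first column, so a value that happens to equal
        // the property name can never be mistaken for a key.
        for (int row = 0; row < rowCount; row++)
        {
            if (cells[row] == propertyName)
                return cells[row + rowCount];
        }
        return null;
    }
}
```

Restricting the search to the name column also avoids the `IndexOf` returning `-1` case, where adding the row count would silently index the wrong cell.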
Alright, pushed my changes. This version is not perfect yet, as I'm parsing large numbers (i4) incorrectly; otherwise, number parsing is sufficient, and queries can be added in the same way as GetCode and GetVersion. It probably deserves a cleanup before it's accepted, too. EDIT: Also tried it on Windows with WixSharp disabled, and it seemingly works just fine! EDIT2: If anybody stumbles upon this because you're trying to find info on MSI databases, here you go, lil' writeup
|
If this all works as expected, I'm also in favor of removing the WixSharp code that is included here: https://github.com/svrooij/WingetIntune/tree/main/src/WingetIntune/Internal/Msi The idea is that it will use this implementation on all platforms, right? |
Yeah, since this is pure C#, I don't see a reason to keep WixSharp around. I haven't disabled it yet, but it's probably prudent to keep it until it's battle-tested? Up to you. |
WixSharp's gone, tests pass on my end. |
I think it might be a good idea to add a test that uses this new code. |
Alright, do you think it's better to embed an MSI file, or redownload one for the test? I'll write the test(s) tonight (GMT+1) otherwise. |
If you can find a really small MSI somewhere, I would add that to the tests folder and use that. Otherwise just take a URL from somewhere. |
Also had to implement a second constructor based on a stream, so the MSI file doesn't have to be written to disk.
There we go, I'm using a small MSI from Microsoft's servers (LAPS 6.2.0.0, 1.1 MB), so I'm reasonably confident it'll stay in place for a while. I'm only testing the publicly exposed interface, since that's what's in use, and I don't see the point in testing it from the IntuneManager side, since it'd basically be the same code, just with more effort, as OpenMcdf reads the files itself otherwise. |
No clue why the tests failed; there shouldn't be an impact on them, but I just imported MSAL directly. Also ran the formatter (dammit, I keep forgetting that one). |
That did not go as planned, let me try again. I always squash to clean up the history.... |
The current version of WinTuner uses WixSharp.UI.MsiParser to read the embedded database inside MSI files to get the product's code and version, and that parser is not available on 'nixes.
While the product code is easily available via winget, it appears that the product version field is not available, or not available anymore. Since the product version is required for MSI deployment, and it is not available otherwise, this means MSI packages cannot be created from 'nixes, like Linux or macOS.
This pull request implements a fallback to msitools, a Red Hat/GNOME-maintained project that implements support for reading MSI files.
This package is available in all Linux distros I have checked (including smaller ones like Alpine and Void) as well as in Homebrew on macOS.
I have run the Pester test suite on Linux, as well as manually tested New-WtWingetPackage, and it does manage to package MSI files with no issues.
Do note: this is the first C# code I've written in a very long time, so it may be better to cherry-pick parts of this implementation if the code quality is insufficient, especially around error handling.
Closes #79