diff --git a/Docs/GettingStarted.md b/Docs/GettingStarted.md
index 4ce29bc11..568f7944c 100644
--- a/Docs/GettingStarted.md
+++ b/Docs/GettingStarted.md
@@ -42,7 +42,7 @@ These would be your first steps on starting to work with Hastlayer:
 2. Set up a Vivado and Xilinx SDK project in the Hardware Framework project as documented there, power up and program a compatible FPGA board.
 3. Open the Visual Studio solution corresponding to your flavor of Hastlayer.
-4. Set the `Hast.Samples.Consumer` project (under the *Samples* folder) as the startup project here. If you're working in the *client* flavor then you'll need to configure your credentials, see the that project's documentation.
+4. Set the `Hast.Samples.Consumer` project (under the *Samples* folder) as the startup project here. If you're working in the *client* flavor then you'll need to configure your credentials, see that project's documentation.
 5. Start the sample project. That will by default run the sample that is also added by default to the Hardware project.
 6. You should be able to see the results of the sample in its console window.
diff --git a/Docs/ReleaseNotes.md b/Docs/ReleaseNotes.md
index 20cdd3133..148a1b092 100644
--- a/Docs/ReleaseNotes.md
+++ b/Docs/ReleaseNotes.md
@@ -5,6 +5,13 @@
 Note that the hardware framework projects have their own release cycle and release notes.
+## 1.0.10, 08.06.2017
+
+- Updating and fixing hardware timing values, making hardware execution more reliable, but in certain cases slightly slower, however also causing lower FPGA resource usage.
+- Fixing that hardware description caching didn't work with certain programs.
+- Improved documentation.
+
+
 ## 1.0.9, 18.03.2017
 - New Loopback sample to test FPGA connectivity and Hastlayer Hardware Framework resource usage.
@@ -23,6 +30,7 @@ Note that running Hastlayer now requires Visual Studio 2017 or greater (any edit
 - Adding support for `ref` and `out` parameters, see the [issue](https://github.com/Lombiq/Hastlayer-SDK/issues/15).
 - `Fix64` fixed-point number type added for computations with fractions.
 - Simplified configuration of parallelized code: no need to manually specify the degree of parallelism any more in most cases (see `ParallelAlgorithmSampleRunner` for example: `Configure()` is just one line now).
+- Improving the speed of hardware generation by a few percent.
 - Various smaller bugfixes and improvements.
 For all publicly tracked issues resolved with this release [see the corresponding milestone](https://github.com/Lombiq/Hastlayer-SDK/milestone/1?closed=1).
diff --git a/Docs/WorkingWithHastlayer.md b/Docs/WorkingWithHastlayer.md
index e3de2e2fc..478a24702 100644
--- a/Docs/WorkingWithHastlayer.md
+++ b/Docs/WorkingWithHastlayer.md
@@ -30,7 +30,7 @@ Some general constraints you have to keep in mind:
 - Always use the smallest data type necessary, e.g. `short` instead of `int` if 16b is enough (or even `byte`), and unsigned types like `uint` if you don't need negative numbers.
 - Supported primitive types: `byte`, `sbyte`, `short`, `ushort`, `int`, `uint`, `long`, `ulong`, `char`, `bool`. Floating-point numbers like `float` and `double` and numbers bigger than 64b are not yet supported, however you can use fixed-point math: multiply up your floats before handing them over to Hastlayer-executed code, then divide them back when receiving the results. If this is not enough you can use the `Fix64` 64b fixed-point number type included in the `Hast.Algorithms` library, see the `Fix64Calculator` sample.
 - The most important language constructs like `if` and `else` statements, `while` and `for` loops, type casting, binary operations (e.g. arithmetic, in/equality operators...), conditional expressions (ternary operator) on allowed types are supported.
Also, `ref` and `out` parameters in method invocations are supported.
-- Algorithms can use a fixed-size (determined at runtime) memory space modeled as a `byte` array in the class `SimpleMemory`. For inputs that should be passed to hardware implementations and outputs that should be sent back this memory space is to be used. For internal method arguments (i.e. for data that isn't coming from the host computer or should be sent back) normal method arguments can be used but you can utilize `SimpleMemory` for any other dynamic memory allocation internally too. Note that there shouldn't be concurrent access to a `SimpleMemory` instance, it's **not** thread-safe (neither in software nor on hardware)!
+- Algorithms can use a fixed-size (determined at runtime) memory space modeled as an array of 32b values("cells") in the class `SimpleMemory`. For inputs that should be passed to hardware implementations and outputs that should be sent back this memory space is to be used. For internal method arguments (i.e. for data that isn't coming from the host computer or should be sent back) normal method arguments can be used but you can utilize `SimpleMemory` for any other dynamic memory allocation internally too. Note that there shouldn't be concurrent access to a `SimpleMemory` instance, it's **not** thread-safe (neither in software nor on hardware)!
 - Single-dimensional arrays having their size possible to determine compile-time are supported. So apart from instantiating arrays with their sizes specified as constants you can also use variables, fields, properties for array sizes, as well as expressions (and a combination of these), just in the end the size of the array needs to be resolvable at compile-time. If Hastlayer can't figure out the array size for some reason you can configure it manually, see the `UnumCalculator` sample.
 - To a limited degree `Array.Copy()` is also supported: only the `Copy(Array sourceArray, Array destinationArray, int length)` override and only with a `length` that can be determined at compile-time. Furthermore, `ImmutableArray` is also supported to a limited degree by converting objects of that type to standard arrays in the background (see the `Lombiq.Unum` project for examples).
 - Using objects created of custom classes and structs are supported.
@@ -70,7 +70,7 @@ So to write fast code with Hastlayer you need implement massively parallel algor
 - Method invocation and access to custom properties (i.e. properties that have a custom getter or setter, so not auto-properties) cost multiple clock cycles as a baseline. Try to avoid having many small methods and custom properties (or methods you can also inline, see the "Writing Hastlayer-compatible .NET code" section).
 - Arithmetic operations take longer with larger number types so always use the smallest data type necessary (e.g. use `int` instead of `long` if its range is enough). This only applies to data types larger than 32b since smaller number types will be cast to `int` any way. However smaller data types lower the resource usage on the FPGA, so it's still beneficial to use them.
-- Use constants where applicable to the constant values can be substituted instead of keeping read-only variables.
+- Use constants where applicable so the constant values can be substituted instead of keeping read-only variables.
 - Memory access with `SimpleMemory` is relatively slow, so keep memory access to the minimum (use local variables and objects as temporary storage instead).
 - Loops with a large number of iterations but with some very basic computation inside them: this is because every iteration is at least one clock cycle, so again multiple operations can't be packed into a single clock cycle.
Until Hastlayer does [loop unrolling](https://github.com/Lombiq/Hastlayer-SDK/issues/14) manual unrolling [can help](https://stackoverflow.com/questions/2349211/when-if-ever-is-loop-unrolling-still-useful).
@@ -84,6 +84,8 @@ The `ParallelAlgorithm` sample does exactly this.
 Note that FPGAs have a finite amount of resources that you can utilize, and the more complex your algorithm, the more resources it will take. With simpler algorithms you can achieve a higher degree of parallelism on a given FPGA, since more copies of it will fit. So you can either have more complex pieces of logic parallelized to a lower degree, or simpler logic parallelized to a higher degree.
+Very broadly speaking if you performance-optimize your .NET code and it executes faster as software then most possibly it will also execute faster as hardware. But do measure if your optimizations have the desired effect.
+
 ## Troubleshooting
diff --git a/Hast.Abstractions/Hast.Synthesis.Abstractions/IDeviceManifestProvider.cs b/Hast.Abstractions/Hast.Synthesis.Abstractions/IDeviceManifestProvider.cs
index b81d4d4b0..48c626a65 100644
--- a/Hast.Abstractions/Hast.Synthesis.Abstractions/IDeviceManifestProvider.cs
+++ b/Hast.Abstractions/Hast.Synthesis.Abstractions/IDeviceManifestProvider.cs
@@ -3,7 +3,7 @@
 namespace Hast.Synthesis.Abstractions
 {
-    public interface IDeviceManifestProvider : IDependency
+    public interface IDeviceManifestProvider : ISingletonDependency
     {
         IDeviceManifest DeviceManifest { get; }
     }
diff --git a/Hast.Abstractions/Hast.Transformer.Abstractions/Configuration/TransformerConfiguration.cs b/Hast.Abstractions/Hast.Transformer.Abstractions/Configuration/TransformerConfiguration.cs
index 269c77301..bc7c1caab 100644
--- a/Hast.Abstractions/Hast.Transformer.Abstractions/Configuration/TransformerConfiguration.cs
+++ b/Hast.Abstractions/Hast.Transformer.Abstractions/Configuration/TransformerConfiguration.cs
@@ -57,6 +57,13 @@ private set
         ///
         public IDictionary ArrayLengths { get; set; } =
             new Dictionary();
+        ///
+        /// Gets or sets whether interfaces that are implemented by transformed types are processed. Currently such
+        /// interfaces don't affect the resulting hardware implementation, but the assemblies of all referenced
+        /// interfaces need to be loaded. If set to false such loading is not necessary. Defaults to false.
+        ///
+        public bool ProcessImplementedInterfaces { get; set; } = false;
+
         public void AddMemberInvocationInstanceCountConfiguration(MemberInvocationInstanceCountConfiguration configuration)
         {
diff --git a/Hast.Algorithms/Fix64.cs b/Hast.Algorithms/Fix64.cs
index 9296c82cb..fb5f38f4b 100644
--- a/Hast.Algorithms/Fix64.cs
+++ b/Hast.Algorithms/Fix64.cs
@@ -10,6 +10,7 @@ namespace Hast.Algorithms
     ///
     /// Represents a Q31.32 fixed-point number.
     ///
+    ///
     /// Taken from https://github.com/asik/FixedMath.Net and modified to be Hastlayer-compatible. See the original
     /// license below:
     ///
@@ -36,9 +37,8 @@ namespace Hast.Algorithms
     /// The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
     ///
     /// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
-    /// , IComparable
+    ///
+    public struct Fix64 : IEquatable, IComparable
     {
         private readonly long _rawValue;
diff --git a/Hast.Common/Extensions/MethodInfoExtensions.cs b/Hast.Common/Extensions/MethodInfoExtensions.cs
index 2986feaa9..bbe2f7eb2 100644
--- a/Hast.Common/Extensions/MethodInfoExtensions.cs
+++ b/Hast.Common/Extensions/MethodInfoExtensions.cs
@@ -8,13 +8,10 @@ public static class MethodInfoExtensions
         /// Gets the full name of the method, including the full namespace of the parent type(s) as well as their return
         /// type and the types of their (type) arguments.
         ///
-        public static string GetFullName(this MethodInfo method)
-        {
-            return
-                method.ReturnType.FullName + " " +
-                method.ReflectedType.FullName + "::" +
-                method.Name +
-                "(" + string.Join(",", method.GetParameters().Select(parameter => parameter.ParameterType.FullName)) + ")";
-        }
+        public static string GetFullName(this MethodInfo method) =>
+            method.ReturnType.FullName + " " +
+            method.ReflectedType.FullName + "::" +
+            method.Name +
+            "(" + string.Join(",", method.GetParameters().Select(parameter => parameter.ParameterType.FullName)) + ")";
     }
 }
diff --git a/Hast.Communication/Readme.md b/Hast.Communication/Readme.md
index d465efc85..997173391 100644
--- a/Hast.Communication/Readme.md
+++ b/Hast.Communication/Readme.md
@@ -33,4 +33,6 @@ Also tried but don't recommend [Tftpd32](http://tftpd32.jounin.net/) (needs manu
 Serial is set as the default communication channel so Hastlayer will use it if you don't change anything. But when generating proxies for you hardware-accelerated objects you can also set `"Serial"` as the `CommunicationChannelName` to select this communication channel.
-Connect the device(s) to the host PC with an USB cable to use USB UART as the communication channel.
\ No newline at end of file
+Connect the device(s) to the host PC with an USB cable to use USB UART as the communication channel.
+
+Be aware that for the serial communication to work it might be necessary to run the application (or Visual Studio if you're running it from source) as administrator, otherwise it won't be able to access the serial port. Also if other applications have COM ports open (like a Bluetooth dongle) then you may need to switch them temporarily off for the serial port detection to work. Alternatively you can specify the name of the COM port to use by hand in `SerialPortCommunicationService`.
\ No newline at end of file
diff --git a/Hast.Communication/Services/EthernetCommunicationService.cs b/Hast.Communication/Services/EthernetCommunicationService.cs
index f41d2652c..6f66d3c3b 100644
--- a/Hast.Communication/Services/EthernetCommunicationService.cs
+++ b/Hast.Communication/Services/EthernetCommunicationService.cs
@@ -43,7 +43,8 @@ public EthernetCommunicationService(
         }
-        public override async Task Execute(SimpleMemory simpleMemory,
+        public override async Task Execute(
+            SimpleMemory simpleMemory,
             int memberId,
             IHardwareExecutionContext executionContext)
         {
diff --git a/Hast.Communication/Services/SerialPortCommunicationService.cs b/Hast.Communication/Services/SerialPortCommunicationService.cs
index ecc80fccf..85d4a3a41 100644
--- a/Hast.Communication/Services/SerialPortCommunicationService.cs
+++ b/Hast.Communication/Services/SerialPortCommunicationService.cs
@@ -42,7 +42,8 @@ public SerialPortCommunicationService(
         }
-        public override async Task Execute(SimpleMemory simpleMemory,
+        public override async Task Execute(
+            SimpleMemory simpleMemory,
             int memberId,
             IHardwareExecutionContext executionContext)
         {
diff --git a/Samples/Hast.Samples.Consumer/Readme.md b/Samples/Hast.Samples.Consumer/Readme.md
index c8bb3279e..77094cd4f 100644
--- a/Samples/Hast.Samples.Consumer/Readme.md
+++ b/Samples/Hast.Samples.Consumer/Readme.md
@@ -1,4 +1,6 @@
 # Hastlayer consumer sample readme
-A simple console application that showcases how an app can utilize Hastlayer.
First head to the *Program.cs* file.
\ No newline at end of file
+A simple console application that showcases how an app can utilize Hastlayer. First head to the *Program.cs* file.
+
+This is a complete, thoroughly documented sample. If you'd like to see a more stripped-down version of a minimal Hastlayer-using application, check out the *Hast.Samples.Demo* project instead.
\ No newline at end of file
diff --git a/Samples/Hast.Samples.Consumer/SampleRunners/Fix64CalculatorSampleRunner.cs b/Samples/Hast.Samples.Consumer/SampleRunners/Fix64CalculatorSampleRunner.cs
index 860030d8e..3645928cc 100644
--- a/Samples/Hast.Samples.Consumer/SampleRunners/Fix64CalculatorSampleRunner.cs
+++ b/Samples/Hast.Samples.Consumer/SampleRunners/Fix64CalculatorSampleRunner.cs
@@ -22,10 +22,8 @@ public static async Task Run(IHastlayer hastlayer, IHardwareRepresentation hardw
             var sum = fixed64Calculator.CalculateIntegerSumUpToNumber(10000000);
-            // This takes about 252ms on an i7 processor with 4 physical (8 logical) cores and 1300ms on an FPGA (with
-            // a MaxDegreeOfParallelism of 10 while the device is about 58% utilized). With a higher degree of
-            // parallelism it won't produce correct results on the Nexys 4 DDR board's FPGA (although it will fit with
-            // a parallelism degree of 13 with 80% resource usage too).
+            // This takes about 274ms on an i7 processor with 4 physical (8 logical) cores and 1300ms on an FPGA (with
+            // a MaxDegreeOfParallelism of 13 while the device is about 76% utilized).
             // Since this basically does what the single-threaded sample but in multiple copies on multiple threads
             // the single-threaded sample takes the same amount of time on the FPGA.
diff --git a/Samples/Hast.Samples.Demo/Program.cs b/Samples/Hast.Samples.Demo/Program.cs
index a6c976ef1..ed8206205 100644
--- a/Samples/Hast.Samples.Demo/Program.cs
+++ b/Samples/Hast.Samples.Demo/Program.cs
@@ -2,7 +2,6 @@
 using System.Threading.Tasks;
 using Hast.Layer;
 using Hast.Samples.SampleAssembly;
-using Hast.Transformer.Abstractions.Configuration;
 using Hast.Transformer.Vhdl.Abstractions.Configuration;
 namespace Hast.Samples.Demo
diff --git a/Samples/Hast.Samples.SampleAssembly/Fix64Calculator.cs b/Samples/Hast.Samples.SampleAssembly/Fix64Calculator.cs
index 594d13eef..6b9f36c9f 100644
--- a/Samples/Hast.Samples.SampleAssembly/Fix64Calculator.cs
+++ b/Samples/Hast.Samples.SampleAssembly/Fix64Calculator.cs
@@ -20,7 +20,7 @@ public class Fix64Calculator
         public const int ParallelizedCalculateLargeIntegerSum_Int32NumbersStartIndex = 0;
         public const int ParallelizedCalculateLargeIntegerSum_OutputInt32sStartIndex = 0;
-        public const int MaxDegreeOfParallelism = 10;
+        public const int MaxDegreeOfParallelism = 13;
         public virtual void CalculateIntegerSumUpToNumber(SimpleMemory memory)
diff --git a/SharedAssemblyInfo.cs b/SharedAssemblyInfo.cs
index a2a998d6c..79ba48c5a 100644
--- a/SharedAssemblyInfo.cs
+++ b/SharedAssemblyInfo.cs
@@ -26,5 +26,5 @@
 // You can specify all the values or you can default the Build and Revision Numbers
 // by using the '*' as shown below:
 // [assembly: AssemblyVersion("1.0.*")]
-[assembly: AssemblyVersion("1.0.9.0")]
-[assembly: AssemblyFileVersion("1.0.9.0")]
\ No newline at end of file
+[assembly: AssemblyVersion("1.0.10.0")]
+[assembly: AssemblyFileVersion("1.0.10.0")]
\ No newline at end of file