Holger #14

Open
wants to merge 6 commits into
base: master
9 changes: 9 additions & 0 deletions _posts/2018-12-14_Fortran_Files.md
@@ -0,0 +1,9 @@
---
title: Fortran Binary Files
layout: notebook
notebook: 2018-12-14_Fortran_Files.html
author: Holger Wolff
excerpt: >-
A quick introduction to how Fortran stores binary files and how to read them
using Python
---
8 changes: 8 additions & 0 deletions _posts/2019-09-03-Python_Animation.md
@@ -0,0 +1,8 @@
---
title: Animating fields with Python
layout: notebook
notebook: 2019-09-03_Python_Animation.ipynb
author: Holger Wolff
excerpt: >-
Quick guide on creating animated gifs and mp4 videos of datasets
---
333 changes: 333 additions & 0 deletions notebooks/2018-12-14_Fortran_Files.ipynb
@@ -0,0 +1,333 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# How Fortran stores binary files"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction\n",
"\n",
"Fortran is still the go-to language for number crunching."
Contributor

This could perhaps be either expanded or removed entirely

]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Types of Fortran Files\n",
"\n",
"There are three different native ways for Fortran to store data in files:\n",
"\n",
"1. Formatted\n",
"2. Unformatted\n",
"3. Stream\n",
"\n",
"Then, there are libraries to store the data in specific formats, for example NetCDF.\n",
"\n",
"If you want to store complex data sets for a long time, I strongly recommend NetCDF or another dedicated data format. \n",
"We have detailed on this blog before how to write NetCDF files with Fortran and Python, and they have features like compression and documentation that are very beneficial.\n",
"\n",
"But what if the data isn't very complex and NetCDF would be an overkill? \n",
"Or if you received the data from someone else and they didn't bother with this?\n",
"\n",
"This blog post will help you with your task of reading the data."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## A few notes on best practices\n",
"\n",
"There are some good practices on how to deal with files in Fortran. \n",
"These make it easier to port data, because they will work on different systems.\n",
"\n",
"### Ensure that you know the kind of the variable\n",
"\n",
"If you write something like\n",
"\n",
"```fortran\n",
"integer :: ii\n",
"```\n",
"\n",
"you don't really know what kind of integer `ii` will be. \n",
"Often, you can set the default integer and real kind with compiler options, but it's far better to explicitly declare the kind in the code itself.\n",
"\n",
"Since Fortran 2003 -- and all compilers we use today are compatible with this -- you can use the intrinsic `iso_fortran_env` module to get the proper kinds:\n",
"\n",
"```fortran\n",
"use iso_fortran_env, only: int32, real64\n",
"implicit none\n",
"integer(kind=int32) :: ii\n",
"real(kind=real64) :: x(10, 100)\n",
"```\n",
"\n",
"In old code, you might find statements like:\n",
"\n",
"```fortran\n",
"integer*4 ii ! DO NOT DO THIS\n",
"```\n",
"\n",
"This syntax has *never* been standard, and I strongly discourage you from using it.\n",
"Slightly better, but still wrong, is this:\n",
"\n",
"```fortran\n",
"integer(kind=4) :: ii ! Still not good\n",
"```\n",
"\n",
"There is no guarantee that every compiler will use the same kind values for the same variable types.\n",
Contributor

This is a bit overdramatic, kind=4 and kind=8 will be 32 and 64 bit respectively.

"If for some reason you can not use `iso_fortran_env`, use the `selected_int_kind` and `selected_real_kind` methods instead:\n",
Contributor

You could explain what the selected_*_kind() intrinsics do
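
One way to picture `selected_int_kind(r)`: it asks for the smallest integer kind that can hold every value strictly between -10^r and 10^r. The Python sketch below models that rule in terms of byte sizes. The function name `smallest_int_bytes` and the list of available sizes are assumptions for illustration only; a real compiler returns a kind number, and which kinds exist depends on the compiler.

```python
def smallest_int_bytes(r, available=(1, 2, 4, 8, 16)):
    """Return the smallest byte size whose signed range covers -10**r .. 10**r (exclusive).

    `available` is an assumed list of integer sizes; real compilers differ.
    """
    for nbytes in available:
        largest = 2 ** (8 * nbytes - 1) - 1  # largest two's-complement value
        if largest >= 10 ** r - 1:
            return nbytes
    return -1  # Fortran also returns -1 when no kind is large enough


for r in (3, 5, 10, 19):
    print(r, smallest_int_bytes(r))
```

Under these assumptions the requests 3, 5, 10 and 19 map to 2, 4, 8 and 16 bytes, which is the pattern used in the table in the post.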

"\n",
"```fortran\n",
"integer, parameter :: real64 = selected_real_kind(15, 307)\n",
"real(kind=real64) :: x(10, 100)\n",
"```\n",
"\n",
"See the table below for which type you need\n",
"\n",
"| bytes | int name | integer kind | integer max | | | real name | real kind |\n",
"|------|-------|------|------|-----| |-------|-------|-------|\n",
"| 2 | `int16` | `selected_int_kind(3)` | 127 | | | |\n",
"| 4 | `int32` | `selected_int_kind(5)` | > 2*10^9 | | | `real32` | `selected_real_kind(6, 37)` |\n",
"| 8 | `int64` | `selected_int_kind(10)` | > 9*10^18 | | | `real64` | `selected_real_kind(15, 307)` |\n",
"| 16 | ---- | `selected_int_kind(19)` | > 10^38 | | | `real128` | `selected_real_kind(33, 4931)` |\n",
"\n",
"Note that `iso_fortran_env` does not have a named type `int128`, though your compiler might have it. \n",
"Some compilers also have a 10-byte real kind.\n",
"\n",
"### newunit\n",
"\n",
"Whenever you interact with a file, you need a unit, an integer value that references a specific open file.\n",
"Some I/O streams, specifically Standard Input, Standard Output, and Standard Error have compiler dependent values for these units, which unfortunately are not standardised.\n",
Contributor

Not sure why you're talking about stdout etc here, but they also have defined constants in iso_fortran_env

"\n",
"Keeping track of these values while remaining compiler-agnostic is getting a bit confusing.\n",
"Fortunately, there's an option for that: `newunit`.\n",
"\n",
"Instead of using a hardcoded integer value, declare an integer variable with a meaningful name, then open the file with `newunit=` instead of `unit=` parameter:\n",
"\n",
"```fortran\n",
"integer :: output_handle\n",
"...\n",
"open(newunit=output_handle, file='data.dat', ...)\n",
"...\n",
"write(output_handle, *) values(:, i)\n",
"...\n",
"close(output_handle)\n",
"```\n",
"\n",
"A new, unused value is assigned every time you open the file, and you don't have to worry about interfering file handles any more."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Stream\n",
"\n",
"Stream output has been part of Fortran 2003 and later. \n",
"\n",
"The binary representation of the data is written directly to the file, without any metadata.\n",
"\n",
"### Fortran writing stream data\n",
"\n",
"```fortran\n",
"program write_stream\n",
" use iso_fortran_env, only: int16\n",
Contributor

Probably better to use int32, int16 isn't commonly used

" implicit none\n",
" integer(kind=int16) :: ii\n",
" integer :: output_handle\n",
" open(newunit=output_handle, file='stream_data.dat', action='write', &\n",
" status='replace', access='stream', format='unformatted')\n",
" write(output_handle) [(ii, ii=1, 10)]\n",
" write(output_handle) \"Hello World\"\n",
" close(output_handle)\n",
"end program write_stream\n",
"``` "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"00000000 01 00 02 00 03 00 04 00 05 00 06 00 07 00 08 00 |................|\r\n",
Contributor

Might need to explain what we're looking at here, and why 1 is represented as 01 00
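
A short sketch of why 1 shows up as `01 00` in the hexdump: the file stores each 16-bit integer with its least significant byte first (little-endian). The example below uses Python's standard `struct` module to pack the same value in both byte orders; the variable names are only illustrative.

```python
import struct

# Pack the integer 1 as a 2-byte signed integer in both byte orders.
little = struct.pack("<h", 1)  # least significant byte first
big = struct.pack(">h", 1)     # most significant byte first

# Format the bytes the same way hexdump does.
little_hex = " ".join(f"{b:02x}" for b in little)
big_hex = " ".join(f"{b:02x}" for b in big)

print("little-endian:", little_hex)
print("big-endian:   ", big_hex)
```

The little-endian form matches the first two bytes of the dump, which is why the notebook later reads the file with the `<i2` data type.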

"00000010 09 00 0a 00 48 65 6c 6c 6f 20 57 6f 72 6c 64 |....Hello World|\r\n",
"0000001f\r\n"
]
}
],
"source": [
"!hexdump -C stream_data.dat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This has written the 16-bit values from 1 to 10, followed by the ascii values for \"Hello World\".\n",
"\n",
"### Reading it into Python\n",
"\n",
"If it were purely one large array, it would be very easy to read it into Python:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1, 2, 3, 4, 5, 6, 7, 8, 9,\n",
" 10, 25928, 27756, 8303, 28503, 27762], dtype=int16)"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import numpy as np\n",
"np.fromfile('stream_data.dat', '<i2')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `np.fromfile` method reads the data stream in as-is, and iterprets the values according to the datatype you gave it, in the above case little-endian 2-byte integer.\n",
"\n",
"For an overview of possible data types, see [here](https://docs.scipy.org/doc/numpy/reference/arrays.interface.html#python-side).\n",
"\n",
"The integer values are correctly read in, but of course the 'H' and 'e' get mashed into a single integer value of 25928, 'll' becomes 27759, and so forth.\n",
Contributor

If I was unfamiliar with binary files I don't think this would be an 'of course' statement
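
To make that step less of a leap, here is a minimal sketch of what happens to the text bytes, and of reading the two parts of the stream separately. It uses only the standard library rather than NumPy, and it recreates the 31 bytes from the hexdump in a temporary file, so the path and variable names are assumptions for illustration.

```python
import os
import struct
import tempfile

# Recreate the stream file from the hexdump: ten 2-byte integers, then the text.
payload = struct.pack("<10h", *range(1, 11)) + b"Hello World"
path = os.path.join(tempfile.mkdtemp(), "stream_data.dat")
with open(path, "wb") as f:
    f.write(payload)

# Read each part with the type it was written with.
with open(path, "rb") as f:
    numbers = struct.unpack("<10h", f.read(20))  # 10 integers * 2 bytes
    text = f.read(11).decode("ascii")            # the remaining 11 characters

# Interpreting two characters as one little-endian integer gives the odd values.
he_as_int = int.from_bytes(b"He", "little")
ll_as_int = int.from_bytes(b"ll", "little")

print(numbers)
print(text)
print(he_as_int, ll_as_int)
```

A stream file carries no metadata, so the reader has to know the layout (here: 20 bytes of integers, then 11 bytes of text) in advance.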

"\n",
"Still, this might be the simplest way to transfer a single array bit-correct to python."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Unformatted sequential\n",
"\n",
"There is no standardised method to store unformatted sequential data, and the exact format might vary between different compilers and platforms.\n",
"\n",
"That said, most compilers seem to store it in a similar way by now.\n",
"\n",
"### Fortran Write\n",
"\n",
"```fortran\n",
"program write_unformatted\n",
" use iso_fortran_env\n",
" implicit none\n",
" integer(kind=int16) :: ii\n",
" integer :: output_handle\n",
" open(newunit=output_handle, file='unformatted_data.dat', form='unformatted', &\n",
" status='replace', action='write', access='sequential')\n",
" write(output_handle) [(ii, ii=1, 10)]\n",
" write(output_handle) \"Hello World\"\n",
" close(output_handle)\n",
"end program write_unformatted\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"00000000 14 00 00 00 01 00 02 00 03 00 04 00 05 00 06 00 |................|\r\n",
"00000010 07 00 08 00 09 00 0a 00 14 00 00 00 0b 00 00 00 |................|\r\n",
"00000020 48 65 6c 6c 6f 20 57 6f 72 6c 64 0b 00 00 00 |Hello World....|\r\n",
"0000002f\r\n"
]
}
],
"source": [
"!hexdump -C unformatted_data.dat"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"You can still see the values of 1 through 10 (`01 00` through `0a 00`), but you can also see that it's no longer the first value. \n",
"It starts with `14 00 00 00`, or 20, which is the number of bytes that make this list up. After the array, the 20 is repeated, in case you read in reverse.\n",
"\n",
"Next comes `0b 00 00 00`, or 11 -- exactly the number of bytes in \"Hello World\", again followed by a repeat of the record header 11.\n",
"\n",
"### Python read\n",
"\n",
"To read this data in Python, you need to know the data type of the header, almost always an unsigned int, and usually 4 bytes in length:"
Contributor

Can be a 32 or 64 bit value, depending on compiler options. Also be careful about big vs little endian values. It may be useful to write a known header at the start of the file so this can be identified, but then if you're going that in depth you should be using netcdf
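
If the marker size and byte order are unknown, one option is to try each candidate and keep the first one for which every record's leading and trailing length agree. The sketch below does this with the standard `struct` module on the 47 bytes shown in the hexdump. The helper names `split_records` and `guess_marker_format` are hypothetical, and the approach assumes the plain "length, payload, length" layout seen above; it is a heuristic, not a guarantee.

```python
import struct


def split_records(data, marker_fmt="<i"):
    """Split raw bytes into records framed by equal leading/trailing length markers."""
    size = struct.calcsize(marker_fmt)
    records, pos = [], 0
    while pos < len(data):
        if pos + size > len(data):
            raise ValueError("truncated leading marker")
        (length,) = struct.unpack_from(marker_fmt, data, pos)
        start, end = pos + size, pos + size + length
        if length < 0 or end + size > len(data):
            raise ValueError("record length does not fit the data")
        (trailer,) = struct.unpack_from(marker_fmt, data, end)
        if trailer != length:
            raise ValueError("leading and trailing markers differ")
        records.append(data[start:end])
        pos = end + size
    return records


def guess_marker_format(data, candidates=("<i", ">i", "<q", ">q")):
    """Return the first candidate marker format that parses the whole buffer."""
    for fmt in candidates:
        try:
            split_records(data, fmt)
            return fmt
        except ValueError:
            continue
    return None


# The bytes from the hexdump: two records with 4-byte little-endian markers.
data = (struct.pack("<i", 20) + struct.pack("<10h", *range(1, 11)) + struct.pack("<i", 20)
        + struct.pack("<i", 11) + b"Hello World" + struct.pack("<i", 11))

fmt = guess_marker_format(data)
records = split_records(data, fmt)
print(fmt, [len(r) for r in records])
```

Once the format is known, the matching data type (for example `<u4` for a 4-byte little-endian marker) can be passed to `FortranFile` as in the notebook cell.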

]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[ 1 2 3 4 5 6 7 8 9 10]\n",
"b'Hello World'\n"
]
}
],
"source": [
"from scipy.io import FortranFile\n",
"ff=FortranFile('unformatted_data.dat', 'r', '<u4')\n",
"print(ff.read_record('<i2'))\n",
"print(b''.join(ff.read_record('S1')))\n",
"ff.close()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}