Merge pull request #10 from 3dem/devel
0.0.9
azazellochg authored Mar 18, 2022
2 parents d311746 + 60e6737 commit b32b7f2
Showing 12 changed files with 37,928 additions and 37,473 deletions.
2 changes: 2 additions & 0 deletions CHANGES.txt
@@ -1,2 +1,4 @@
0.0.9: add optional column types dict, update readme and tests for relion 4
0.0.8: fix appending a row to existing table
0.0.7: exported Column via Table class
0.0.6: first version for pip
84 changes: 84 additions & 0 deletions README.rst
@@ -15,3 +15,87 @@ Authors
* Jose Miguel de la Rosa-Trevín, Department of Biochemistry and Biophysics, Science for Life Laboratory, Stockholm University, Stockholm, Sweden
* Grigory Sharov, MRC Laboratory of Molecular Biology, Cambridge Biomedical Campus, England

Testing
-------

``python3 -m unittest discover emtable/tests``

Examples
--------

Each table in a STAR file usually has a *data\_* prefix. You don't need to specify it; only the rest of the table name is required. You can use either of the options below:

* option 1: ``Table(fileName=modelStar, tableName='perframe_bfactors')``
* option 2: ``Table("perframe_bfactors@" + modelStar)``
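
The second option packs the table name and the file name into a single string. The sketch below shows one way such a string could be split, assuming the same ``'@'`` convention that ``iterRows`` relies on. ``split_table_spec`` is a hypothetical helper written for this example and is not part of emtable; unlike the library code, it splits on the first ``'@'`` only.

```python
def split_table_spec(spec, default_table=None):
    """Split 'tableName@fileName' into (tableName, fileName).

    Hypothetical helper for illustration only; not an emtable function.
    """
    if '@' in spec:
        # Split on the first '@' only, so the rest is kept as the file name
        table_name, file_name = spec.split('@', 1)
    else:
        # No '@' present: fall back to an explicitly given table name
        table_name, file_name = default_table, spec
    return table_name, file_name


print(split_table_spec('perframe_bfactors@model.star'))
print(split_table_spec('model.star', default_table='particles'))
```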

Be aware that since Relion 3.1 the particles table has been renamed from "data_Particles" to "data_particles".

To start using the package, simply do:

.. code-block:: python

    from emtable import Table

Reading
#######

For example, suppose we want to read the whole *rlnMovieFrameNumber* column from the *data_perframe_bfactors* table of the modelStar file.

The code below will return a list of column values from all rows:

.. code-block:: python

    table = Table(fileName=modelStar, tableName='perframe_bfactors')
    frame = table.getColumnValues('rlnMovieFrameNumber')

We can also iterate over rows from "data_particles" Table:

.. code-block:: python

    table = Table(fileName=dataStar, tableName='particles')
    for row in table:
        print(row.rlnRandomSubset, row.rlnClassNumber)

Alternatively, you can use the **iterRows** method, which also supports sorting by a column:

.. code-block:: python

    mdIter = Table.iterRows('particles@' + fnStar, key='rlnImageId')

If you need to clear all rows but keep the table structure, call the **clearRows()** method on the table.
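
As a rough picture of what the sorting in **iterRows** means, the sketch below orders rows by either a column name or a key function. It uses only the standard library; the ``Row`` namedtuple and ``iter_sorted`` are stand-ins invented for this example and should not be read as emtable's implementation.

```python
from collections import namedtuple
from operator import attrgetter

# Stand-in for the row objects a table yields (illustrative only)
Row = namedtuple('Row', ['rlnImageId', 'rlnClassNumber'])


def iter_sorted(rows, key=None, reverse=False):
    """Yield rows in input order, or sorted by a column name or key function."""
    if key is None:
        yield from rows
        return
    # A string is treated as a column name; anything else as a callable
    key_func = attrgetter(key) if isinstance(key, str) else key
    yield from sorted(rows, key=key_func, reverse=reverse)


rows = [Row(3, 1), Row(1, 2), Row(2, 1)]
print([r.rlnImageId for r in iter_sorted(rows, key='rlnImageId')])
```

Note that sorting needs every row before the first one can be yielded, so a sorted iteration over a very large table holds all of its rows in memory.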

Writing
#######

To create a new table with 4 pre-defined columns, add rows to it, and save it to a new file:

.. code-block:: python

    tableShifts = Table(columns=['rlnCoordinateX',
                                 'rlnCoordinateY',
                                 'rlnAutopickFigureOfMerit',
                                 'rlnClassNumber'])
    tableShifts.addRow(1024.54, 2944.54, 0.234, 3)
    tableShifts.addRow(445.45, 2345.54, 0.266, 3)
    tableShifts.write(f, tableName="test", singleRow=False)

*singleRow* is **False** by default. If *singleRow* is **True**, we don't write a *loop_*, just label-value pairs. This is used for single-row tables, which appear in the file as one column of label-value pairs, such as the one below:


.. code-block:: bash

    data_general
    _rlnImageSizeX 3710
    _rlnImageSizeY 3838
    _rlnImageSizeZ 24
    _rlnMicrographMovieName Movies/20170629_00026_frameImage.tiff
    _rlnMicrographGainName Movies/gain.mrc
    _rlnMicrographBinning 1.000000
    _rlnMicrographOriginalPixelSize 0.885000
    _rlnMicrographDoseRate 1.277000
    _rlnMicrographPreExposure 0.000000
    _rlnVoltage 200.000000
    _rlnMicrographStartFrame 1
    _rlnMotionModelVersion 1
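
To make the difference between the two layouts concrete, here is a standard-library sketch that writes the same kind of data either as label-value pairs or as a *loop_* block. ``write_star_block`` is a hypothetical function for illustration; the exact whitespace and number formatting that emtable's writer emits are not shown in this README, so the output details here are assumptions.

```python
import io


def write_star_block(out, table_name, columns, rows, single_row=False):
    """Write one table as label-value pairs or as a loop_ block (sketch)."""
    out.write("\ndata_%s\n\n" % table_name)
    if single_row:
        # One label-value pair per line, taken from the first row
        for name, value in zip(columns, rows[0]):
            out.write("_%s %s\n" % (name, value))
    else:
        out.write("loop_\n")
        # Labels are numbered from 1, followed by one line per data row
        for i, name in enumerate(columns, start=1):
            out.write("_%s #%d\n" % (name, i))
        for row in rows:
            out.write(" ".join(str(v) for v in row) + "\n")
    out.write("\n")


buf = io.StringIO()
write_star_block(buf, "general", ["rlnImageSizeX", "rlnImageSizeY"],
                 [(3710, 3838)], single_row=True)
print(buf.getvalue())
```

Calling the same function with ``single_row=False`` and several rows produces the looped form, with the column labels listed once and each row on its own line.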
96 changes: 60 additions & 36 deletions emtable/metadata.py
@@ -19,8 +19,8 @@
# *
# **************************************************************************

__version__ = '0.0.8'
__author__ = 'Jose Miguel de la Rosa Trevin'
__version__ = '0.0.9'
__author__ = 'Jose Miguel de la Rosa Trevin, Grigory Sharov'


import os
@@ -32,7 +32,6 @@
class _Column:
    def __init__(self, name, type=None):
        self._name = name
        # Get the type from the LABELS dict, assume str by default
        self._type = type or str

    def __str__(self):
@@ -43,7 +42,7 @@ def __cmp__(self, other):
                and self.getType() == other.getType())

    def __eq__(self, other):
        return self.__cmp__(other)
        return self.__cmp__(other)

    def getName(self):
        return self._name
@@ -87,22 +86,32 @@ def getColumnNames(self):
        return [c.getName() for c in self.getColumns()]

    # ---------------------- Internal Methods ----------------------------------
    def _createColumns(self, columnList, line=None, guessType=False):
        """ Create the columns, optionally, a data line can be passed
        to infer the Column type.
    def _createColumns(self, columnList, values=None, guessType=False, types=None):
        """ Create the columns.
        Args:
            columnList: it can be either a list of Column objects or just
                strings representing the column names
            values: values of a given line to guess type from
            guessType: If True the type of a given column (if not passed in
                types) will be guessed from the line of values
            types: It can be a dictionary {columnName: columnType} pairs that
                allows to specify types for certain columns.
        """
        self._columns.clear()

        if isinstance(columnList[0], _Column):
            for col in columnList:
                self._columns[col.getName()] = col
        else:
            if line and guessType:
                typeList = _guessTypesFromLine(line)
            else:
                typeList = [str] * len(columnList)

            for colName, colType in zip(columnList, typeList):
            values = values or []
            types = types or {}
            for i, colName in enumerate(columnList):
                if colName in types:
                    colType = types[colName]
                elif guessType and values:
                    colType = _guessType(values[i])
                else:
                    colType = str
                self._columns[colName] = _Column(colName, colType)

        self._createRowClass()
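
The new `_createColumns` body picks each column type in a fixed order: an entry in `types` wins, otherwise the type is guessed from the first row when `guessType` is set, otherwise it falls back to `str`. The standalone sketch below reproduces that precedence. `resolve_types` is a hypothetical name, and `guess_type` is an assumed int/float/str heuristic, since the body of `_guessType` is not part of this diff.

```python
def guess_type(value):
    """Guess int, float or str from a string value (assumed heuristic)."""
    for candidate in (int, float):
        try:
            candidate(value)
            return candidate
        except ValueError:
            pass
    return str


def resolve_types(column_names, values=None, guess=False, types=None):
    """Return {columnName: type} with the same precedence as the diff."""
    values = values or []
    types = types or {}
    resolved = {}
    for i, name in enumerate(column_names):
        if name in types:
            # 1. An explicit override always wins
            resolved[name] = types[name]
        elif guess and values:
            # 2. Otherwise guess from the value in the first row
            resolved[name] = guess_type(values[i])
        else:
            # 3. Fall back to str
            resolved[name] = str
    return resolved


names = ['rlnImageName', 'rlnClassNumber', 'rlnDefocusU']
first_row = ['000001@stack.mrcs', '3', '10500.5']
print(resolve_types(names, first_row, guess=True,
                    types={'rlnClassNumber': str}))
```

An override is mainly useful when the first row is not representative, for example a column whose first value looks like an integer while later rows hold floats or text.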
@@ -134,12 +143,14 @@ def get(self, key, default=None):
class _Reader(_ColumnsList):
    """ Internal class to handling reading table data. """

    def __init__(self, inputFile, tableName='', guessType=True):
    def __init__(self, inputFile, tableName='', guessType=True, types=None):
        """ Create a new Reader given a filename or file as input.
        Args:
            inputFile: can be either an string (filename) or file object.
            tableName: name of the data that will be read.
            guessType: if True, the columns type is guessed from the first row.
            types: It can be a dictionary {columnName: columnType} pairs that
                allows to specify types for certain columns.
        """
        _ColumnsList.__init__(self)

@@ -163,14 +174,20 @@ def __init__(self, inputFile, tableName='', guessType=True):
                values.append(parts[1])
            line = self._file.readline().strip()

        self._createColumns(colNames, line, guessType)
        self._types = [c.getType() for c in self.getColumns()]
        self._singleRow = not foundLoop

        if foundLoop:
            values = line.split() if line else []

        self._createColumns(colNames,
                            values=values, guessType=guessType, types=types)
        self._types = [c.getType() for c in self.getColumns()]


        if self._singleRow:
            self._row = self.__rowFromValues(values)
        else:
            self._row = self.__rowFromValues(line.split()) if line else None
            self._row = self.__rowFromValues(values) if values else None

    def __rowFromValues(self, values):

@@ -328,16 +345,20 @@ def clearRows(self):
    def addRow(self, *args, **kwargs):
        self._rows.append(self.Row(*args, **kwargs))

    def readStar(self, inputFile, tableName=None, guessType=True):
        """
        :param inputFile: Provide the input file from where to read the data.
            The file pointer will be moved until the last data line of the
            requested table.
        :param tableName: star table name
        :return:
    def readStar(self, inputFile, tableName=None, guessType=True, types=None):
        """ Parse a given table from the input star file.
        Args:
            inputFile: Provide the input file from where to read the data.
                The file pointer will be moved until the last data line of the
                requested table.
            tableName: star table name
            guessType: if True, the columns type is guessed from the first row.
            types: It can be a dictionary {columnName: columnType} pairs that
                allows to specify types for certain columns.
        """
        self.clear()
        reader = _Reader(inputFile, tableName=tableName, guessType=guessType)
        reader = _Reader(inputFile,
                         tableName=tableName, guessType=guessType, types=types)
        self._columns = reader._columns
        self._rows = reader.readAll()
        self.Row = reader.Row
@@ -347,13 +368,12 @@ def read(self, fileName, tableName=None):
            self.readStar(f, tableName)

    def writeStar(self, outputFile, tableName=None, singleRow=False):
        """
        Write a Table in Star format to the given file.
        :param outputFile: File handler that should be already opened and
            in the position to write.
        :param tableName: The name of the table to write.
        :param singleRow: If True, don't write loop_, just label - value pairs.
        :param writeRows: write data rows
        """ Write a Table in Star format to the given file.
        Args:
            outputFile: File handler that should be already opened and
                in the position to write.
            tableName: The name of the table to write.
            singleRow: If True, don't write loop_, just label - value pairs.
        """
        writer = _Writer(outputFile)
        writer.writeTableName(tableName)
@@ -392,7 +412,7 @@ def addColumns(self, *args):
        Examples:
            table.addColumns('rlnDefocusU=rlnDefocusV', 'rlnDefocusAngle=0.0')
        """
        #TODO:
        # TODO:
        # Maybe implement more complex value expression,
        # e.g some basic arithmetic operations or functions

@@ -480,20 +500,24 @@ def iterRows(fileName, key=None, reverse=False, **kwargs):
        Convenience method to iterate over the rows of a given table.
        Args:
            fileName: the input star filename, it migth contain the '@'
            fileName: the input star filename, it might contain the '@'
                to specify the tableName
            key: key function to sort elements, it can also be an string that
                will be used to retrieve the value of the column with that name.
            reverse: If true reverse the sort order.
            **kwargs:
                tableName: can be used explicit instead of @ in the filename.
                types: It can be a dictionary {columnName: columnType} pairs that
                    allows to specify types for certain columns in the internal reader
        """
        if '@' in fileName:
            tableName, fileName = fileName.split('@')
        else:
            tableName = kwargs.get('tableName', None)
            tableName = kwargs.pop('tableName', None)

        # Create a table iterator
        with open(fileName) as f:
            reader = _Reader(f, tableName)
            reader = _Reader(f, tableName, **kwargs)
            if key is None:
                for row in reader:
                    yield row
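
The switch from `kwargs.get` to `kwargs.pop` matters once `**kwargs` is forwarded to `_Reader`: if `tableName` stayed in `kwargs`, it would reach the reader twice, once positionally and once by keyword. The sketch below reproduces that situation with stand-in functions. `make_reader`, `open_with_get` and `open_with_pop` are hypothetical names that only mirror the parameter list shown in the diff.

```python
def make_reader(f, tableName='', guessType=True, types=None):
    """Stand-in with the same parameter names as _Reader.__init__."""
    return {'tableName': tableName, 'guessType': guessType, 'types': types}


def open_with_get(**kwargs):
    # get() leaves 'tableName' in kwargs, so it is passed twice below
    tableName = kwargs.get('tableName', None)
    return make_reader(None, tableName, **kwargs)


def open_with_pop(**kwargs):
    # pop() removes 'tableName', so only the remaining options are forwarded
    tableName = kwargs.pop('tableName', None)
    return make_reader(None, tableName, **kwargs)


try:
    open_with_get(tableName='particles')
except TypeError as err:
    print('get():', err)

print('pop():', open_with_pop(tableName='particles',
                              types={'rlnClassNumber': str}))
```

With `pop`, the remaining keyword arguments, such as `types`, pass through to the reader unchanged.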