Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proper database schema #224

Open
jklmnn opened this issue Jan 10, 2021 · 5 comments
Open

Proper database schema #224

jklmnn opened this issue Jan 10, 2021 · 5 comments

Comments

@jklmnn
Copy link
Member

jklmnn commented Jan 10, 2021

The current database schema consists only of a timestamp together with the city and the full JSON dump for this city and timestamp. This results in many problems, mainly bad performance and a waste of space (never changing attributes, e.g. lot coordinates, are saved each time even though they never change). A new database scheme should have the following properties:

  • save static data (e.g. coordinates, total size...) only once
  • allow location based access (see Lot pooling + coordinate query #130)
  • allow city based access to provide the current API

This is my concept of how the new database could look like:

Screenshot_20210110_142629

@kiliankoe @hbruch what are your thoughts on this?

<?xml version="1.0" encoding="utf-8" ?>
<!-- SQL XML created by WWW SQL Designer, https://github.com/ondras/wwwsqldesigner/ -->
<!-- Active URL: http://127.0.0.1:8080/ -->
<sql>
<datatypes db="mysql">
	<group label="Numeric" color="rgb(238,238,170)">
		<type label="Integer" length="0" sql="INTEGER" quote=""/>
	 	<type label="TINYINT" length="0" sql="TINYINT" quote=""/>
	 	<type label="SMALLINT" length="0" sql="SMALLINT" quote=""/>
	 	<type label="MEDIUMINT" length="0" sql="MEDIUMINT" quote=""/>
	 	<type label="INT" length="0" sql="INT" quote=""/>
		<type label="BIGINT" length="0" sql="BIGINT" quote=""/>
		<type label="Decimal" length="1" sql="DECIMAL" re="DEC" quote=""/>
		<type label="Single precision" length="0" sql="FLOAT" quote=""/>
		<type label="Double precision" length="0" sql="DOUBLE" re="DOUBLE" quote=""/>
	</group>

	<group label="Character" color="rgb(255,200,200)">
		<type label="Char" length="1" sql="CHAR" quote="'"/>
		<type label="Varchar" length="1" sql="VARCHAR" quote="'"/>
		<type label="Text" length="0" sql="MEDIUMTEXT" re="TEXT" quote="'"/>
		<type label="Binary" length="1" sql="BINARY" quote="'"/>
		<type label="Varbinary" length="1" sql="VARBINARY" quote="'"/>
		<type label="BLOB" length="0" sql="BLOB" re="BLOB" quote="'"/>
	</group>

	<group label="Date &amp; Time" color="rgb(200,255,200)">
		<type label="Date" length="0" sql="DATE" quote="'"/>
		<type label="Time" length="0" sql="TIME" quote="'"/>
		<type label="Datetime" length="0" sql="DATETIME" quote="'"/>
		<type label="Year" length="0" sql="YEAR" quote=""/>
		<type label="Timestamp" length="0" sql="TIMESTAMP" quote="'"/>
	</group>
	
	<group label="Miscellaneous" color="rgb(200,200,255)">
		<type label="ENUM" length="1" sql="ENUM" quote=""/>
		<type label="SET" length="1" sql="SET" quote=""/>
		<type label="Bit" length="0" sql="bit" quote=""/>
	</group>
</datatypes><table x="512" y="221" name="Pools">
<row name="Name" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
</row>
<row name="Source" null="0" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
</row>
<row name="Provider" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
<default>NULL</default></row>
<row name="URL" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
<default>NULL</default></row>
<row name="License" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
<default>NULL</default></row>
<key type="PRIMARY" name="Name">
<part>Name</part>
</key>
</table>
<table x="664" y="224" name="Lots">
<row name="ID" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
</row>
<row name="Pool" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
<relation table="Pools" row="Name" />
</row>
<row name="Type" null="0" autoincrement="0">
<datatype>ENUM</datatype>
</row>
<row name="Name" null="0" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
</row>
<row name="Address" null="1" autoincrement="0">
<datatype>MEDIUMTEXT</datatype>
</row>
<row name="Latitute" null="1" autoincrement="0">
<datatype>FLOAT</datatype>
<default>NULL</default></row>
<row name="Longitute" null="1" autoincrement="0">
<datatype>FLOAT</datatype>
<default>NULL</default></row>
<key type="PRIMARY" name="Lot_ID">
<part>ID</part>
</key>
</table>
<table x="801" y="226" name="Occupancies">
<row name="Timestamp" null="0" autoincrement="0">
<datatype>TIMESTAMP</datatype>
</row>
<row name="Lot" null="0" autoincrement="0">
<datatype>VARCHAR(255)</datatype>
<relation table="Lots" row="ID" />
</row>
<row name="Total" null="0" autoincrement="0">
<datatype>BIGINT</datatype>
<default>0</default></row>
<row name="Free" null="0" autoincrement="0">
<datatype>BIGINT</datatype>
<default>0</default></row>
<row name="State" null="0" autoincrement="0">
<datatype>ENUM</datatype>
</row>
<key type="PRIMARY" name="Live_Key">
<part>Timestamp</part>
<part>Lot</part>
</key>
</table>
</sql>
@kiliankoe
Copy link
Member

kiliankoe commented Jan 12, 2021

Looks good 👍

A couple minor things:

  • I'd suggest adding a table for datasources with their attribution info, source data and possibly an optional location as well? They don't necessarily have to match the pools, do they? Or will we just have pools for each datasource and additional pools with no further metadata? Would that require marking one pool as the canonical pool for a lot if one means to access the source metadata?
  • Lots should have a bit more metadata like a detail URL, pricing info (just as a string), opening hours (same) and possibly more address fields like region (optional), city and country? Also I think it would make sense to add a field for additional information (possibly again using postgres' json type) to support those things we've wanted to cover in the past, but overlooked. Stuff like entrance height if something like that is available. Apps would basically just show a list of key-value pairs if anything is available. Oh and there's typos in latitude and longitude, just fyi.
  • I think it would make sense to split up the timestamp for occupancies into two values, one for when the data was downloaded/fetched and one for when the data was updated (in case that information is available - which it mostly is).

@jklmnn
Copy link
Member Author

jklmnn commented Jan 12, 2021

Or will we just have pools for each datasource and additional pools with no further metadata? Would that require marking one pool as the canonical pool for a lot if one means to access the source metadata?

A pool is a data source. Pools are not meant to be reflected in the public API (except for the current legacy one).

Lots should have a bit more metadata like a detail URL, pricing info (just as a string), opening hours (same) and possibly more address fields like region (optional), city and country? Also I think it would make sense to add a field for additional information (possibly again using postgres' json type) to support those things we've wanted to cover in the past, but overlooked.

I was thinking about that. How easy/hard is it to add additional (optional) fields later to a table? Instead of adding fields in case we might need them and adding a json field that then serves as a dump for anything we forgot I'd like to add fields as we need them and alter the database accordingly. These fields all have to be optional (otherwise old data would break).

@kiliankoe
Copy link
Member

If pools are not meant to be publicly accessed, how would one see the license information?

Also I don't think it's feasible to add lot fields for every possible bit of data. There are many things to specifically add support for, but there'll always be things that are only relevant for a few lots. Having a fallback for that seems like a good solution imho.

@jklmnn
Copy link
Member Author

jklmnn commented Jan 12, 2021

If pools are not meant to be publicly accessed, how would one see the license information?

This data would be added per lot.

@kiliankoe
Copy link
Member

Ah, that makes sense 👍 But querying by pool would be possible, right?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants