# openresync

Open Real Estate Sync (openresync) is a node application that syncs (replicates) MLS data from one or more sources via the RESO Web API (such as Trestle or Bridge Interactive) to one or more destinations (only MySQL is supported so far), and allows you to see the sync status via a local website.

It is meant to be used by developers.

It does the syncing for you, and helps you answer these questions:

* When did the last sync occur and was it successful?
* What is the most recent record (e.g. listing)?

## Project status

### Screenshots

Sync from multiple sources (MLSs):

![Sync from multiple sources (MLSs)](https://user-images.githubusercontent.com/366538/114815106-65815100-9d6a-11eb-8cf2-7ae0dd78146f.png)

See details per source, such as the cron schedule, how many records are in the MLS vs in your destinations, and the sync (and purge) histories:

![See details per source](https://user-images.githubusercontent.com/366538/114815112-69ad6e80-9d6a-11eb-8e3e-89f828c9ecab.png)

See all the cron schedules at once, which makes it easier to avoid overlapping them:

![See all the cron schedules at once](https://user-images.githubusercontent.com/366538/114815117-6c0fc880-9d6a-11eb-8751-6683b8569238.png)

## How do I use it?

Install the app, configure it, start the back-end server, start the web server, and visit the local website that runs.

See the heavily commented `config/config.example.js`. Copy it to `config/config.js` and edit it according to your needs.
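
As a rough sketch only (every key name below is invented for illustration; the real, documented structure is in `config/config.example.js`), a config describes your sources and their destinations:

```js
// config/config.js -- a hypothetical sketch, not the project's actual schema.
// See the heavily commented config/config.example.js for the real structure.
module.exports = {
  sources: [
    {
      name: 'aborTrestle', // one MLS source, e.g. Austin Board of Realtors via Trestle
      cronSchedule: '*/15 * * * *', // hypothetical: run the ongoing sync every 15 minutes
      destinations: [
        {
          type: 'mysql',
          connectionString: 'mysql://user:password@localhost:3306/openresync',
        },
      ],
    },
  ],
}
```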

There is an internal configuration file you should be aware of, which is described in the "How does it work?" section.

### .env

It's recommended to put secrets in a .env file. These will be read automatically using the `dotenv` library and available for your config file in `process.env` values.

There's no `example.env` type of file because there are no standard fields you should configure. For example, in a project that uses the Austin Board of Realtors sample dataset, you might use environment variables like `ABOR_CLIENT_ID` and `ABOR_CLIENT_SECRET` to store your OAuth credentials, and then you could reference them with e.g. `process.env.ABOR_CLIENT_ID` in your `config/config.js` file. There's no particular recommendation other than that you keep your secrets out of the git repository.
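
For instance, with the Austin Board of Realtors example above, the two pieces might fit together like this (the values and the surrounding config keys are placeholders):

```js
// .env (kept out of git; loaded automatically by the dotenv library):
//
//   ABOR_CLIENT_ID=your-client-id
//   ABOR_CLIENT_SECRET=your-client-secret

// config/config.js -- the key names here are hypothetical; the
// process.env references are the point:
module.exports = {
  clientId: process.env.ABOR_CLIENT_ID,
  clientSecret: process.env.ABOR_CLIENT_SECRET,
}
```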

## How does it work?

### Sync (aka replication) process

There is an initial sync and an ongoing sync. The initial sync could take hours depending on the platform and number of records in the MLS and if you filter out any. The ongoing sync would be expected to only take a minute or less if you run it say every 15 minutes.

At a high level, the data is first downloaded from the MLS and written to files, in a separate directory per resource. Once all files are successfully downloaded, the sync process goes through them and syncs the data to each destination. If there is an error, it is logged and retried on the next run.
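
In sketch form, a single run looks roughly like this (all function names below are invented for illustration; the real implementation lives under `lib/sync`):

```js
// A conceptual sketch of one sync run. None of these functions exist
// under these names in the project; they just mirror the description above.
async function runSync(source) {
  // Phase 1: download from the MLS into files, one directory per resource.
  for (const resource of source.resources) {
    await downloadToFiles(source, resource)
  }

  // Phase 2: only after all downloads succeed, replay the files into
  // each destination. Errors are logged and retried on the next run.
  for (const resource of source.resources) {
    for (const file of listDownloadedFiles(source, resource)) {
      for (const destination of source.destinations) {
        await syncFileToDestination(file, destination)
      }
    }
  }
}
```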

## Customizing

It is not recommended to change any code, or if you do, do so in a new branch. Otherwise it will be difficult for you to upgrade when new versions are released. If you need behavior that doesn't exist, it would be best to create a feature request issue on GitHub. We need samples from the wild to know what features would be useful.

## Q&A

**Q:** Why would I use this tool and sync data locally, rather than querying the RESO Web API directly?
**A:** It's true that the RESO Web API is generally superior to RETS, and one reason is that it allows you to efficiently query the API for specific results that could then e.g. be shown on a website. However, there are a number of use cases for syncing the data locally. If you don't fit into any of the cases listed below, then you will probably be better off querying the MLS platform directly.

The following list includes ideas that go beyond what this application does on its own, but with the data synced locally you'd have the power to take things a step further and accomplish things the RESO Web API can't.

* Aggregates like "What's the median price?", or "What's the average number of pictures per listing?" (see the sketch after this list)
* Massage data
* E.g. in Phoenix, Ahwatukee is not a city, but people treat it like one. You could make searches done by your users automatically turn requests for the village (not city) of Ahwatukee into a search for the 3 zip codes representing Ahwatukee.
* Make your own fields. For example, there is no address field, but you could make your own. This could simplify your code.
* Full-text search, e.g. searching the public remarks field using full stemming. This would likely require an extra destination not currently offered, such as Elasticsearch. But the point is that this can't currently be done via the RESO Web API.
* Reference other fields
* E.g. say I want to do a query to see where ModificationTimestamp != MediaModificationTimestamp on the Media resource. But you can't do such a complex query in RESO Web API.
* Basically anything the RESO Web API doesn't offer. For example, some platforms offer polygon searches. But you can't e.g. search with a simple centroid and radius. If you build your own API using the data synced by this tool, you could do such a thing.
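
As an example of the aggregate idea above, here's a minimal sketch using the `mysql2` library (the project itself may use a different client, and the `Media` table and `ResourceRecordKey` column names are assumptions based on standard RESO resource names):

```js
// Average number of pictures per listing, computed from locally synced
// data. Table and column names are assumptions; adjust to your schema.
const mysql = require('mysql2/promise')

async function averagePhotosPerListing() {
  const connection = await mysql.createConnection({
    host: 'localhost',
    user: 'user',
    password: process.env.MYSQL_PASSWORD,
    database: 'openresync',
  })
  const [rows] = await connection.execute(`
    SELECT AVG(t.photo_count) AS avg_photos
    FROM (
      SELECT ResourceRecordKey, COUNT(*) AS photo_count
      FROM Media
      GROUP BY ResourceRecordKey
    ) AS t
  `)
  await connection.end()
  return rows[0].avg_photos
}
```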

**Q:** So it just syncs the data? Is that useful? Can I e.g. show the data on a website?
**A:** Yes, it just syncs the data. But this is the mission of this project and should be a large chunk of any work needed to produce a project that uses the data. You'll still have work left to do such as field mapping (especially if you use multiple MLS sources and intend to harmonize their data and show it in one place consistently). Of course whether you're allowed to show the data publicly is a legal concern you'll need to talk with each MLS about.

**Q:** How many sources can I realistically sync at once?
**A:** Not sure. I haven't tried more than one at a time. Because a lot of the work can be offloaded from node (e.g. downloading files, writing JSON files to disk, sending data to MySQL, etc.), it's likely quite a few. I would still recommend offsetting the cron schedules from one another. Another factor is whether you'll be writing to the same tables or different ones. For example, if you sync Property records from different MLSs into a single Property table, you might get lock problems. But if you use different MySQL databases per source, or use the `makeTableName` concept to prefix your table names so that two sync processes aren't writing to the same table, MySQL will probably be able to handle it just fine.
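
For example (a hypothetical fragment; the actual `makeTableName` signature is documented in `config/config.example.js`), each source could prefix its table names so two sync processes never write to the same table:

```js
// Hypothetical per-source config fragments. makeTableName is assumed
// here to receive a resource name and return the table name to use.
const aborSource = {
  name: 'abor',
  makeTableName: name => 'abor_' + name, // Property -> abor_Property
}

const armlsSource = {
  name: 'armls',
  makeTableName: name => 'armls_' + name, // Property -> armls_Property
}
```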

**Q:** Do I have to use the web server?
**A:** No. You could use the code in the `lib/sync` dir as a library and run the download, sync, and purge processes as you see fit. See `lib/sync/go.js` as an example. I intend to turn the sync code into its own npm module.
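
A loose sketch of that idea (the `lib/sync` path comes from the answer above, but the exported names here are invented; see `lib/sync/go.js` for the project's actual example):

```js
// Hypothetical library-style usage; these exports are illustrative only.
const { download, sync, purge } = require('./lib/sync')
const config = require('./config/config')

async function main() {
  for (const source of config.sources) {
    await download(source)
    await sync(source)
    await purge(source)
  }
}

main().catch(console.error)
```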

## Known limitations

1. One of the main value propositions of this application is to make it robust in error handling. It is desired that the application not crash and wisely show error situations to the user. However, this has not been tested thoroughly. Some errors might be swallowed altogether. Some errors are quite verbose and we don't shorten these yet. It would definitely be great to catch 502 and 504 errors from the platforms and retry downloads, but this is not done yet.

## Roadmap
