Building blocks for IoT Architecture and Technology

Jani Karhunen © 2016

## Focus

This presentation gives a few alternative building blocks for IoT / Industrial Internet solutions. It progresses chronologically from data collection through data storage to data utilization.

*This presentation is by no means the only truth on this vast subject. I simply present components, methods and services that I am familiar with and have used.*
# Section 1: Architecture
## Paradigm 1: Utilization of Cloud Technology and Services
## Paradigm 2: Scalability, independence, fault tolerance, reactivity & flexibility (also cost wise)

[**Microservices**](https://en.wikipedia.org/wiki/Microservices) and **Containers**, e.g. [Docker](https://www.docker.com/)
## Paradigm 3: [*Serverless Architecture*](https://github.com/serverless/serverless)
## Paradigm 4: Utilization of **APIs** (*Application Programming Interfaces*)

Both as a consumer and as a provider.
## Paradigm 5: Web browser based interfaces for management and use, which scale to all devices, from desktop to mobile
## Paradigm 6: Management and Information Security
# Section 2: Data collection & transfer
## Sensors and meters

A vast number of options exist, depending on the use case and requirements, including:

- environment (e.g. temperature, humidity, pressure, luminance)
- energy, current, voltage
- flow of gas/liquid
## Data transfer

A vast number of choices exist, depending on the use case and requirements, e.g.:

TCP/IP based solutions

- WLAN
- Ethernet
- 2G/3G/4G mobile networks

Bus based solutions

- [Modbus](https://en.wikipedia.org/wiki/Modbus)
- [Profibus](https://en.wikipedia.org/wiki/Profibus)

Some are very easy to implement, some need more work.
# Section 3: Data Storage
## Data Storage alternatives

- Relational Databases, e.g. [MS SQL Server](https://www.microsoft.com/en-us/server-cloud/products/sql-server/), [Oracle DB](https://www.oracle.com/database/index.html), [MariaDB/MySQL](https://mariadb.org/), [PostgreSQL](http://www.postgresql.org/), [Amazon RDS](https://aws.amazon.com/rds/)
- NoSQL Databases, e.g. [MongoDB](https://www.mongodb.org/), [Cassandra](http://cassandra.apache.org/), [Neo4j](http://neo4j.com/), [Amazon DynamoDB](https://aws.amazon.com/dynamodb/)
- [Time Series optimized Databases](https://en.wikipedia.org/wiki/Time_series_database), e.g. [InfluxDB](https://influxdata.com/) and [OpenTSDB](http://opentsdb.net/)
# Section 4: Data Validation & Processing
## Validation

The raw data from sensors and meters must be validated either before or after storage. Processing rules determine what is done with *invalid* and *missing* values.
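
A validation rule can be as small as a plausibility range check. The sketch below is only an illustration: the limits `MIN_TEMP_C` and `MAX_TEMP_C`, the `Status` enum and the `classify` function are hypothetical names and values, not part of any solution described in this presentation.

```python
import math
from enum import Enum
from typing import Optional


class Status(Enum):
    VALID = "valid"
    INVALID = "invalid"
    MISSING = "missing"


# Hypothetical plausibility limits for an outdoor temperature sensor, in degrees Celsius.
MIN_TEMP_C = -50.0
MAX_TEMP_C = 60.0


def classify(value: Optional[float],
             low: float = MIN_TEMP_C,
             high: float = MAX_TEMP_C) -> Status:
    """Classify one raw reading with a simple range rule."""
    # A reading that never arrived, or arrived as NaN, counts as missing.
    if value is None or math.isnan(value):
        return Status.MISSING
    # A reading inside the plausible range is accepted as-is.
    if low <= value <= high:
        return Status.VALID
    # Everything else is physically implausible for this sensor.
    return Status.INVALID


readings = [21.4, None, 999.0, -3.2]
statuses = [classify(v) for v in readings]
```

Keeping the classification separate from the correction step lets the same rule run either before or after storage.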
## Missing & invalid value correction

Processing rules define which *algorithm* estimates missing or invalid values to complete the time series.
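
One possible estimation algorithm is linear interpolation between the nearest known neighbours. This is a minimal sketch under the assumption that samples are evenly spaced in time; the function name `interpolate_gaps` is hypothetical, and other algorithms may suit a given use case better.

```python
from typing import List, Optional


def interpolate_gaps(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None entries by linear interpolation between the nearest known values.

    Gaps at the very start or end of the series have only one known
    neighbour, so they are left as None.
    """
    filled = list(values)
    # Indexes of all readings that are actually present.
    known = [i for i, v in enumerate(values) if v is not None]
    # Walk over each pair of consecutive known readings.
    for left, right in zip(known, known[1:]):
        gap = right - left
        if gap > 1:
            step = (values[right] - values[left]) / gap
            for offset in range(1, gap):
                filled[left + offset] = values[left] + step * offset
    return filled
```

For example, `interpolate_gaps([10.0, None, None, 16.0])` fills the two missing samples with `12.0` and `14.0`.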
# Section 5: Integration & APIs
## Data transfer between private (own) services

1. [API](https://en.wikipedia.org/wiki/Application_programming_interface)
2. [Queue](https://en.wikipedia.org/wiki/Message_broker), e.g. [RabbitMQ](https://www.rabbitmq.com/) and [Amazon Simple Queue Service](https://aws.amazon.com/sqs/)
## Data transfer between 3rd party (public/external) services

1. [API](https://en.wikipedia.org/wiki/Application_programming_interface), individual messages
2. File based transfer (e.g. XML, JSON, CSV), mass processing
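
For file based transfer, the same batch of measurements can be serialized to JSON or CSV with the Python standard library alone. The sketch below is illustrative; the field names (`sensor`, `timestamp`, `temperature_c`) and the sample values are assumptions, not a format defined in this presentation.

```python
import csv
import io
import json
from typing import Dict, List

# Hypothetical sample batch; field names and values are made up for illustration.
measurements = [
    {"sensor": "yard", "timestamp": "2016-01-01T12:00:00Z", "temperature_c": -4.5},
    {"sensor": "living_room", "timestamp": "2016-01-01T12:00:00Z", "temperature_c": 21.0},
]


def to_json(rows: List[Dict]) -> str:
    """Serialize the whole batch as one JSON document."""
    return json.dumps(rows, indent=2)


def to_csv(rows: List[Dict]) -> str:
    """Serialize the batch as CSV, with a header taken from the first row's keys."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(measurements)
csv_text = to_csv(measurements)
```

JSON keeps the value types, while CSV turns every field into text, so the receiving side has to convert numbers back when it reads the file.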
# Section 6: Data Visualization & reporting
## Data Visualization

Data from sensors and other integrated data sources can be made useful to customers and the business through visualizations.

Data can be utilized in Business Intelligence tools, such as [Qlik](http://www.qlik.com/) or [Tableau](http://www.tableau.com/).

... or it can be part of an online service, visualized with e.g. [D3.js](https://d3js.org/) or other available tools.
## Reporting

Data can be presented in static reports, which are automatically delivered e.g. via email or to a [Slack](https://slack.com/) channel.
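
A static report can be as simple as a plain-text summary per sensor, which is then handed to whatever delivery channel is used. The sketch below covers only the report body; `build_report` and the sample data are hypothetical, and the delivery step (email, Slack) is deliberately left out.

```python
from statistics import mean
from typing import Dict, List


def build_report(title: str, readings_by_sensor: Dict[str, List[float]]) -> str:
    """Build a plain-text summary with min, mean and max for each sensor."""
    lines = [title, "=" * len(title)]
    for sensor in sorted(readings_by_sensor):
        values = readings_by_sensor[sensor]
        if not values:
            # A sensor with no readings is reported rather than silently dropped.
            lines.append(f"{sensor}: no data")
            continue
        lines.append(
            f"{sensor}: min {min(values):.1f}, "
            f"mean {mean(values):.1f}, max {max(values):.1f}"
        )
    return "\n".join(lines)


report = build_report("Daily temperatures", {"yard": [-5.0, -3.0], "attic": []})
```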
# Section 7: Advanced Data Processing
## Machine Learning and Predictive Analytics

[Machine Learning](https://en.wikipedia.org/wiki/Machine_learning) enables advanced tools, e.g. in the form of recommendation engines.

There are a few "plug-in" SaaS platforms for Machine Learning, e.g. [Amazon Machine Learning](https://aws.amazon.com/machine-learning/)
# Section 8: Platforms
## Infrastructure as Code

An entire data center can be defined as code, which can then be executed to deploy it automatically.

- [ThoughtWorks Infrastructure as Code: A Reason to Smile](https://www.thoughtworks.com/insights/blog/infrastructure-code-reason-smile)
- [Ansible](https://www.ansible.com/)
- [Vagrant](https://www.vagrantup.com/)
- [AWS CloudFormation](https://aws.amazon.com/cloudformation/)
## Amazon Web Services

[AWS](https://aws.amazon.com/) offers a vast number of IaaS and PaaS services on a *pay as you go* basis.
## Microsoft Azure

[Azure IoT Suite](https://azure.microsoft.com/en-us/solutions/iot-suite/) offers a large number of IaaS and PaaS services on a *pay as you go* basis.
## IBM Bluemix

[IBM Bluemix](http://www.ibm.com/cloud-computing/bluemix/) offers quite a large number of IaaS and PaaS services on a *pay as you go* basis.
# Section 9: Monitoring and management
## On the Cloud

Just like everything else, monitoring and management of services and resources can be implemented with cloud tools, e.g.

- [Amazon CloudWatch](https://aws.amazon.com/cloudwatch/)
# Simple example
## Temperature measurement and visualization

As an extremely simple example of the previous technologies, I present a solution which measures temperature and relative humidity in a few places in my house and yard.
## What and where?

1. Measurement and data collection with a [**RaspberryPi** based solution](https://orchidbits.fi/2015/03/10/environment-monitoring-with-sensors-and-raspberry-pi/) and wireless sensors. Measurements are taken at 1 h resolution.
2. Data is presented in real time on a (private) web site, and on **Twitter** and in a **Slack** channel via APIs.
## How, part 1

3. Each measurement event is sent to a queue, where it waits for processing and storage. Implemented with [**CloudAMQP**](https://www.cloudamqp.com/)
4. The measurement event is pulled from the queue and processed on a VPS in [Hetzner's cloud](http://www.hetzner.de/en/) with a [Python](https://www.python.org/) application. The application stores the measurement event in a PostgreSQL (relational) database, which is self-hosted on the same VPS.
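
The queue-then-store flow of steps 3 and 4 can be sketched with standard library stand-ins. This is not the actual application: `queue.Queue` stands in for the hosted AMQP queue, an in-memory SQLite database stands in for PostgreSQL, and the table layout, field names and function names are all assumptions made for illustration.

```python
import json
import queue
import sqlite3

# Stand-ins (assumptions): queue.Queue replaces the hosted AMQP queue and
# an in-memory SQLite database replaces the PostgreSQL database.
events: "queue.Queue[str]" = queue.Queue()
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE measurement ("
    " sensor TEXT NOT NULL,"
    " measured_at TEXT NOT NULL,"
    " temperature_c REAL,"
    " humidity_pct REAL)"
)


def publish(sensor: str, measured_at: str,
            temperature_c: float, humidity_pct: float) -> None:
    """Producer side: serialize one measurement event and put it on the queue."""
    events.put(json.dumps({
        "sensor": sensor,
        "measured_at": measured_at,
        "temperature_c": temperature_c,
        "humidity_pct": humidity_pct,
    }))


def drain_and_store() -> int:
    """Consumer side: pull every waiting event and insert it as one row."""
    stored = 0
    while True:
        try:
            body = events.get_nowait()
        except queue.Empty:
            break
        event = json.loads(body)
        db.execute(
            "INSERT INTO measurement VALUES (?, ?, ?, ?)",
            (event["sensor"], event["measured_at"],
             event["temperature_c"], event["humidity_pct"]),
        )
        stored += 1
    db.commit()
    return stored


publish("yard", "2016-01-01T12:00:00Z", -4.5, 80.0)
publish("living_room", "2016-01-01T12:00:00Z", 21.0, 35.0)
stored = drain_and_store()
```

Because the producer only talks to the queue, the measuring device keeps working even while the consumer or the database is temporarily unavailable.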
## How, part 2

5. A separate Python application validates the measurements and fills in missing values based on previous values in the same time series. Values that differ significantly from the previous ones are handled the same way.
6. Measurement data is visualized in real time on a private web site with D3.js.
7. Data is posted on Twitter and in a Slack channel via APIs.
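
One way to read step 5 is to carry the last accepted value forward, both for missing readings and for readings that jump too far from it. The sketch below follows that reading and is not the actual application; the threshold `MAX_JUMP_C` and the function name `clean_series` are hypothetical.

```python
from typing import List, Optional

# Hypothetical threshold: a change of more than this many degrees between
# consecutive hourly readings is treated as a measurement error.
MAX_JUMP_C = 10.0


def clean_series(values: List[Optional[float]],
                 max_jump: float = MAX_JUMP_C) -> List[Optional[float]]:
    """Replace missing and significantly differing values with the last accepted value."""
    cleaned: List[Optional[float]] = []
    last_good: Optional[float] = None
    for value in values:
        if value is None:
            # Missing reading: reuse the previous accepted value, if there is one.
            value = last_good
        elif last_good is not None and abs(value - last_good) > max_jump:
            # Reading differs too much from the previous one: treat it like a gap.
            value = last_good
        cleaned.append(value)
        if value is not None:
            last_good = value
    return cleaned
```

For example, `clean_series([20.0, None, 55.0, 21.0])` returns `[20.0, 20.0, 20.0, 21.0]`. A missing value at the very start of a series has no previous value to copy, so it stays `None`.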
## Further development

There are numerous ideas and possibilities to develop this solution further, e.g.

- expand with new sensors and measurements
- develop the visualization
- estimation of future temperature and humidity (which is obviously not going to work very well...)