Deliverable D1.1: Review of Reference Architectures for IoT Enabled Agriculture Development
Consultancy: Design of a reference architecture for an IoT sensor network
Company: International Center for Tropical Agriculture (CIAT), Km. 17 Recta Cali-Palmira, Palmira, Valle del Cauca, Colombia
Consultant: Dr. Jairo Alejandro Gómez Escobar, Eng. PhD. E-mail: jairo.alejandro.gomez@ieee.org
Date: 29th of April 2019
Revision: 1.2

1. Disclaimer

This report (Report) has been commissioned by the CGIAR Platform for Big Data in Agriculture, represented by the International Center for Tropical Agriculture (CIAT). This Report was produced independently by Dr. Jairo Alejandro Gómez (Consultant), and it provides information intended for CIAT. None of the statements contained in this Report are intended to establish any obligation, standard, or procedure by or on behalf of CIAT. By virtue of publishing this Report, CIAT is under no obligation to adopt any of the recommendations set forth in the Report. The views expressed in this Report are not necessarily the views of CIAT. The information, statements, statistics, guidelines, and commentary (together the ‘Information’) contained in this Report have been prepared by the Consultant from publicly available material and from discussions held with researchers from CIAT. While the Consultant has made every attempt to ensure that the Information contained in this Report has been obtained from reliable sources, the Consultant is not responsible for any errors or omissions, or for the results obtained from the use of this Information. All Information in this Report is provided "as is", with no guarantee of completeness, accuracy, timeliness, or of the results obtained from the use of this Information, and without warranty of any kind, express or implied, including, but not limited to, warranties of performance, merchantability, and fitness for a particular purpose. The Information contained in this Report has not been subject to an audit.
Care has been taken in the preparation of this Report, but all advice, analysis, calculations, information, forecasts, guidelines, and recommendations are supplied for the assistance of CIAT, and nothing herein shall to any extent be relied on as authoritative or as in substitution for the exercise of judgment by CIAT. The Consultant engaged in the preparation of this Report shall not have any liability whatsoever for any direct or consequential loss arising from use of this Report or its contents, and gives no warranty or representation (express or implied) as to the quality or fitness for purpose of any architecture, design, process, product, or system referred to in the Report. In no event will the Consultant be liable to CIAT or anyone else for any decision made or action taken in reliance on the information in this Report or for any consequential, special, or similar damages, even if advised of the possibility of such damages. Certain links in this Report connect to Web sites maintained by third parties over whom the Consultant has no control. The Consultant makes no representations as to the accuracy or any other aspect of information contained in those Web sites.

2. Executive summary

To feed a growing global population in light of new challenges, including climate variability, degrading ecosystems, and the loss of arable land, we must leverage the insights, agility, and precision made possible by digital tools and technologies in agriculture worldwide. To achieve this in developing-economy contexts, much remains to be learned about the appropriate technologies to collect, analyze, and act on relevant data for food security. Fortunately, in the last few decades, the data collection process has evolved. Expensive point-to-point networks gave way to distributed wide-area wireless sensor networks, and this set the stage for the Internet of Things (IoT).
The IoT is a recent paradigm under which an array of electronic devices (known collectively as “things”) provide and exchange data through a network, most often the Internet, though it can also be a local or wide-area network. IoT approaches are enabling the development of new services spanning the perception-planning-action cycle. There are many different alternatives to abstract an IoT network, and this report highlights some of the most representative architectures developed by technological leaders, open-source communities, and researchers, which are briefly summarized in the following paragraphs. Intel provides a compelling vision for end-to-end IoT solutions that includes a layered architecture for secure deployments, an example of common data flows as well as their characterization, a set of high-level software components and interfaces, and a detailed view of the possible communication technologies and protocols that can coexist in the solution. With its architecture, Intel has achieved a good balance between abstraction, complexity, and usefulness. Intel provides development boards, gateways, and neural compute sticks that can enhance IoT applications, and, in terms of software, it is contributing with other companies and the Linux Foundation to develop Zephyr [1], a scalable real-time operating system (RTOS) supporting multiple hardware platforms. As Intel doesn’t provide implementation details of the software components in its architecture, they have to be mapped to the service solutions implemented by third-party companies. Over time, Intel has partnered with Amazon Web Services, Microsoft Azure, and Google Cloud Platform to connect its hardware to the cloud [2]. Amazon Web Services (AWS) provides a simplified architecture with “things” that handle sensing and actuation, the cloud that handles the storage and the computing, and a set of services that enable the intelligence to transform logic and insights into action.
Amazon provides a set of specific software components in the cloud to ease IoT management, as well as low-level software (Amazon FreeRTOS and AWS IoT Greengrass) that can run on different hardware platforms, namely microcontrollers and gateways, respectively. The problem with Amazon’s vision is that it assumes too early that the “things” in the IoT architecture are already connected to the Internet, providing little guidance as to how to achieve this.

[1] Zephyr: a scalable real-time operating system (RTOS). https://www.zephyrproject.org
[2] Intel IoT. https://software.intel.com/iot/home

Microsoft Azure has three main components in its architecture: “things” that produce data, insights extracted from the data, and actions based on those insights. Despite its conceptual simplicity, Azure manages to provide a complete set of core and optional subsystems that support its vision without underestimating the challenges of connecting field devices to the Internet. Azure has made a remarkable effort in recent years to improve its software development kit for IoT devices to support a wide variety of hardware platforms and programming languages. At the same time, it has powered some cutting-edge IoT applications in agriculture involving TV white space technology. The Google IoT reference architecture is heavily influenced by machine learning, providing software services for handling large volumes of data once they reach the cloud. However, and similarly to AWS, it doesn’t illustrate how to transfer the data from the field to the Internet beyond simple use cases. The IBM architecture for IoT has five layers: a user layer, a proximity layer, a public network, a provider network, and an enterprise network. The most relevant aspect of IBM’s architecture is that it explicitly includes the user in the IoT abstraction process, something that most of the other architectures ignore.
Having the user (researchers, farmers, stakeholders) at the core of the entire IoT deployment should be a priority for CIAT’s pilot. After reviewing the commercial options, it is clear that there are great services for IoT development. However, they can be expensive for the small farmer. For this reason, a set of open-source architectures and software alternatives were explored. Mainflux is a modern, scalable, secure, open-source, and patent-free IoT cloud platform that can be deployed either on-premises or in the cloud. It accepts user, device, and application connections over various network protocols, thus making a seamless bridge between them. Thinger.io is an open-source platform that offers a ready-to-go and scalable cloud infrastructure for connecting millions of devices or things. SiteWhere is an industrial-strength, open-source application-enablement platform for the Internet of Things (IoT). DeviceHive is a microservice-based system built with high scalability and availability in mind, and Zetta is an open-source software platform for creating Internet of Things servers that run across geo-distributed computers and the cloud. The options provided by the open-source community are very competitive, but they do require more technical knowledge than their commercial counterparts to get a working solution. In a recent paper, Guth et al. (2018) [3] proposed a simplified architecture that summarizes the key blocks that appear in many of the IoT architectures reviewed in this report. The architecture includes sensors, actuators, devices and their drivers, gateways, a layer for the IoT integration middleware, and another for the applications. In this architecture, sensors and actuators communicate with field devices through firmware drivers. These field devices exchange information, either directly or through a gateway, with a data ingestion service called the IoT integration middleware in the data center or in the cloud to feed applications and services.
Orders, commands, or simply new data can flow back to lower elements in the architecture, effectively enabling a feedback loop for supervision and control. The simplicity of the architecture enables a direct approach for designing small IoT projects, but it makes it hard to scale them up or to conceive large deployments, mainly because the middleware abstracts most of the required services, such as the management and provisioning of the devices. As most of the architectures and services mentioned before are not specific to agriculture, two efforts were selected and examined as possible candidates in light of one concrete design effort: a network for IoT-enabled research into four crops at the campus of the International Center for Tropical Agriculture (CIAT). These initiatives include: i) an IoT architecture for agriculture proposed after a comprehensive systematic literature review by Talavera et al. (2017) [4], with a strong focus on moving field data to the Internet to maximize its impact by using four core layers: physical, communication, service, and application; and ii) a Microsoft initiative called FarmBeats [5], which includes an end-to-end IoT platform for data-driven agriculture that uses TV white space technology to connect field gateways with an intermediate, Internet-connected smart gateway on the farm. It is important to understand that architectures help to guide the development and deployment of IoT solutions, but it would be naïve to assume that any given architecture can be applied in every single scenario, particularly in agriculture.

[3] Jasmin Guth, Uwe Breitenbücher, Michael Falkenthal, Paul Fremantle, Oliver Kopp, Frank Leymann, and Lukas Reinfurt. A Detailed Analysis of IoT Platform Architectures: Concepts, Similarities, and Differences, pages 81-101. Springer, 2018.
Nevertheless, this report represents an effort to consider architectures developed by technological leaders and research groups that can be applied in more than one scenario in developing-economy contexts. The reader should note that designing an IoT sensor network for development agriculture is a hard engineering challenge. The variability in the problem space is very large. Many times, only a small subset of the solutions developed for other application domains can be applied to agriculture. The design requirements may compete against each other, and sometimes they are even mutually exclusive. This difficulty is compounded by cost considerations; in developing economies there is a lot of pressure to develop very low-cost solutions because the investment margin is very narrow or the path to cost recovery is unclear. In light of these challenges, and after a review of several reference architectures, the finding of this report is that those seeking to deploy and leverage IoT networks at CGIAR research locations should consider adopting an existing reference architecture like the ones surveyed in this document and then work to tailor it to their actual needs. For small pilots, the architecture of Guth et al. (2018) can be a good starting point, but for larger and more complex deployments in agriculture, Intel’s layered architecture could be combined both with the specific modules of Talavera et al. (2017) and with FarmBeats’ emphasis on local processing. By following best practices in terms of hardware development and software security, researchers will be able to deploy increasingly larger pilots with the most promising technologies in the space of a few months and iteratively develop, test, and integrate user feedback. Iterative design is a well-established best practice for developing and deploying IoT solutions, and CGIAR shouldn’t be the exception. The description of the recommended architecture for CIAT’s pilot will be covered in a subsequent report.
[4] Jesús Martín Talavera, Luis Eduardo Tobón, Jairo Alejandro Gómez, María Alejandra Culman, Juan Manuel Aranda, Diana Teresa Parra, Luis Alfredo Quiroz, Adolfo Hoyos, and Luis Ernesto Garreta. Review of IoT applications in agro-industrial and environmental fields. Computers and Electronics in Agriculture, 142, Part A:283-297, 2017.
[5] Deepak Vasisht, Zerina Kapetanovic, Jongho Won, Xinxin Jin, Ranveer Chandra, Sudipta Sinha, and Ashish Kapoor. FarmBeats: an IoT platform for data-driven agriculture. In Networked Systems Design and Implementation (NSDI). USENIX, March 2017.

3. Contents

1. Disclaimer
2. Executive summary
4. Introduction
5. IoT reference architectures
5.1. Intel IoT platform reference architectures
5.2. Amazon AWS IoT Architecture
5.3. Microsoft Azure IoT reference architecture
5.4. Google IoT reference architecture
5.5. IBM IoT reference architecture
5.6. Open-source IoT initiatives
5.7. IoT architectures in the literature
5.8. IoT architectures intended for agriculture
6. Conclusions
7. Glossary

4. Introduction

As part of the CGIAR Platform for Big Data in Agriculture led by CIAT, this consultancy report highlights some of the most representative Internet of Things (IoT) architectures that have been applied in agriculture and related industries in light of a concrete design effort: a sensor network for IoT-enabled research at the campus of the International Center for Tropical Agriculture (CIAT) located in Palmira, Valle del Cauca, Colombia. The objective of the IoT sensor network is to automate data collection from crops and environmental variables, as well as to integrate them with current and future data-driven analyses and services. It should also enable some of the following cross-cutting research infrastructure needs identified by the enterprise architecture design work led by CGIAR centers:

• The ability to capture and curate remote and proximal sensing data ranging from very fine to very broad scales, ensuring that measurements and image data will be interoperable.
• The ability to link spatial information with semantic data and metadata of various types.
• Appropriate storage and computational power to enable unique use cases, as well as capabilities for leveraging and contributing to curated spatial and semantic data assets.
• Tools enabling appropriate stewardship at different points in the research-data lifecycle.

As a first step towards a general IoT sensor network, CIAT seeks to create a 1-ha test pilot with four crops (rice, maize, cassava, and beans) to study intensive data-driven management as well as the suitability and cost-benefit of different sensor and telemetric technologies for automated measurement and management. Figure 1 presents a simplified block diagram of the desired pilot, where sensor data (including measurements and status), configuration data, and eventually actuator commands will flow through the network to and from the IT infrastructure available at CIAT. Even though the first part of the pilot only involves local storage and processing of sensor data, it is expected that, as the project matures, it will include access to cloud-based infrastructure and third-party services. The future IoT sensor network has two groups of users at CIAT, namely, researchers from the high-throughput phenotyping team and the data-driven agronomy team.

Figure 1. Simplified representation of the pilot that will be developed by CIAT. Blocks and arrows in dashed lines represent non-essential components for the first pilot according to CIAT’s requirements.

This report aims to assist CIAT researchers in reviewing existing IoT reference architectures, as they can provide the guiding structure for the development of the sensor network and its first pilot. The rest of this document is organized as follows. Section 5 covers IoT reference architectures proposed by some of the technology leaders worldwide.
These architectures have been included in this document because their developers attempted to accommodate the needs of many different industries and, by doing so, they identified a set of common and essential elements. In particular, most IoT architectures highlight the importance of factors such as modularity, provisioning, security, management, storage, analytics, interoperability (with third-party components and a provider's own services), as well as clear mechanisms to handle the interaction between the end user and the overall system. Section 5 also reviews the architectures behind some open-source IoT project initiatives and a selected research paper before revisiting some simplified architectures conceived to promote new deployments in agriculture. The objective is to understand how an abstract architecture can guide the design of an IoT network, as well as to comprehend the challenges involved. Some of these architectures are the result of a thorough intellectual process. They can be a stepping stone to tackle larger and more complex agriculture scenarios. However, they are by no means intended to be used in isolation, as domain context and field experience will always be required to design and deploy the desired solution. Finally, this report ends with the Conclusions in Section 6, and a Glossary in Section 7, which provides a brief description, grouped by topic, of the most relevant technical terms mentioned in this report.

5. IoT reference architectures

The following subsections cover IoT architectures proposed by companies such as Intel, Amazon Web Services (AWS), Microsoft Azure, Google, IBM, and a few selected research groups. Some of these architectures are applicable to many different industries and businesses where security, reliability, and scalability are central topics.

5.1.
Intel IoT platform reference architectures

A few years ago, Intel developed two reference architectures for IoT [6], both of which handled the need for data and device security, device discovery, provisioning (the process of setting up a new device and making it ready for use), management, data normalization, analytics, and services. Intel's reference architectures are now available under a non-disclosure agreement (NDA). The first architecture is referred to as “Version 1.0 The Intel IoT Platform Reference Architecture for Connecting the Unconnected”, and its goal is connecting legacy infrastructure, that is, legacy devices that were not built with intelligence or Internet connectivity but which can be securely connected and managed through IoT gateways. The second architecture is referred to as “Version 2.0 The Intel IoT Platform Reference Architecture for Smart and Connected Things”, and it is intended for building new infrastructure ranging from battery-powered through ultra-high-performance devices. Figure 2 shows a solution for connecting legacy and modern “things” to the network infrastructure so that they can work together. The three main actors are the things, the network devices, and the cloud, which can be either public, private, or hybrid. Modern things connect directly to the Internet, while legacy things have to connect through gateways. The solution highlights that the main actors should provide a set of modules that ensure security, management, and mechanisms for the developer and interested parties to interface with the system and extend it through libraries, application programming interfaces (APIs), or software development kits (SDKs).

[6] The Intel IoT Platform. Architecture White Paper Internet of Things (IoT). https://www.intel.la/content/www/xl/es/internet-of-things/white-papers/iot-platform-reference-architecture-paper.html

Figure 2. End-to-end IoT solution from things to network to cloud [6].

Intel’s IoT architecture is shown in Figure 3.
It has a set of well-defined layers, where the white blocks correspond to user layers, the dark-blue blocks are the major runtime layers, and the light-blue layer is intended for developers. The architecture is built up from the bottom, starting with the communications-and-connectivity layer handling the interaction with the physical devices, and ending with the business layer. In this architecture, the security layer interacts with all the runtime and user layers. Overall, this architecture identifies the essential functions for a flexible IoT solution, but depending on the specific application, some of these layers can be grouped together, moved (like the control layer), or repeated in different layers (like the analytics submodule). For agriculture applications, the communications-and-connectivity layer is probably the most expensive and challenging to implement, given the number of sensing nodes required to monitor an entire crop field.

Figure 3. Layered architecture for secure end-to-end solutions [6].

Figure 4 presents the data flow in Intel’s architecture for devices without native Internet connectivity. It follows the same three-actor solution (things-network-cloud) that was depicted in Figure 2, while highlighting the three types of flow that can occur within the network: data flow, security and management flow, and actuation and control flow. Figure 4 also shows three possible configurations of the “intelligent things” built around embedded computing platforms or gateways, in terms of the services that they can include, the ways that they can interact among themselves, and the level of autonomy that they can have. In Figure 4, the protocol-manager-and-adapter block (identified as PMA) can be understood as a low-level software driver that allows the CPU of the intelligent things (commonly a microcontroller, single-board computer, or gateway) to interact with the sensors and actuators.
This driver handles tasks such as dealing with analog-to-digital converters, digital-to-analog converters, digital communication protocols, general-purpose input/output, converting digital counts to actual measurements with specific units, and so on. The data-ingestion-and-processing block (letter D in the figure) schedules and performs the sensor sampling, takes the result of the data acquisition, and filters the sensor data. The actuation-and-control block (letter C in the figure) implements controllers that govern field actuators. These controllers can be based on simple rules (if-then-else statements), on classical control theory, or on analytics and machine learning algorithms (blocks with letter A). In all scenarios, intelligent things must include services that ensure security (blocks with letter S), configuration and management (blocks with letter M), interaction with third-party business and application agents (block with letter B), and APIs and libraries (blocks with letter L) to enable high-level features and processes. Depending on the application, some of the intelligent things can include dedicated storage to permit data-driven local services or to increase their autonomy and robustness under unreliable communication channels (e.g., storing sensor measurements until the Internet connection or the server comes back online). The cloud in Figure 4 (which can be a third-party cloud, an on-premise or off-premise data center, or a private cloud) must have a data center management and security layer at the lowest level that includes services such as monitoring, auto-scaling, logging, and event handling, under a secure environment. The first layer of services in the cloud is intended for the “things”, and it provides the means to ensure their security, attestation, and management. The second group of processes in the cloud corresponds to secure service brokers, which are responsible for the data exchange with the network.
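As an illustration of the driver and control roles just described, the sketch below shows how a PMA-style driver might convert raw ADC counts into engineering units, and how a simple if-then-else rule with hysteresis might drive an irrigation actuator. All names, calibration points, and thresholds are hypothetical assumptions for illustration; they are not taken from Intel's architecture.

```python
# Hypothetical sketch of the protocol-manager-and-adapter (PMA) and
# actuation-and-control roles described above. Names and thresholds
# are illustrative assumptions, not part of Intel's reference design.

def counts_to_moisture(counts, dry=850, wet=410):
    """Convert raw ADC counts from a capacitive soil-moisture probe
    into volumetric water content (%), clamped to [0, 100].
    The dry/wet calibration points are made-up example values."""
    pct = (dry - counts) / (dry - wet) * 100.0
    return max(0.0, min(100.0, pct))

def irrigation_command(moisture_pct, low=30.0, high=45.0, pump_on=False):
    """Simple rule-based (if-then-else) controller with hysteresis:
    turn the pump on below `low`, off above `high`, else keep state."""
    if moisture_pct < low:
        return True
    if moisture_pct > high:
        return False
    return pump_on

raw = 700                      # raw counts read from the ADC
m = counts_to_moisture(raw)    # convert to engineering units
cmd = irrigation_command(m)    # decide the actuator command
print(f"moisture={m:.1f}% pump_on={cmd}")  # prints: moisture=34.1% pump_on=False
```

The hysteresis band keeps the pump from chattering when the reading hovers around a single threshold, which is why even rule-based controllers in the actuation-and-control block usually carry a small amount of state.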
Once the data and the metadata are in the cloud, they can be processed and stored using additional cloud services. The received data can be used to feed advanced data analytics services such as stream analytics and batch analytics, whose results can then be fed to other services in the cloud or be used to perform remote control of the assets in the network. The raw, intermediate, or processed data in the cloud can feed additional services such as business logic rules, service orchestration modules, or business portals. Finally, data exchange with higher-level applications, like vertical IoT apps and other IT or business systems, happens through APIs, API libraries, and SDKs.

Figure 4. Data flow for devices without native Internet connectivity [6].

In order to enable the data flow presented in Figure 4, Intel describes the software components required to connect the devices, as well as the suggested interfaces, in Figure 5. The diagram covers on-premise software for intelligent things and gateways, as well as cloud software. The developer in an IoT solution has to deal with software at three different levels: firmware for the intelligent things, middleware for the gateways, and software services for the cloud. The function of the on-premise software components is discussed next.

• Firmware for sensors and actuators: gathers information from the environment and performs actuation. Field devices connect to the gateway or sensor hub through wired or wireless links.
• Middleware for the sensor hub: interfaces with sensors using device drivers or API libraries.
• Software for the local database: stores sensor data, logs, and configuration values from the cloud.
• Software for data agents: ingests and formats data for the cloud from field devices. It can communicate with sensors through APIs or device drivers.
• Software for edge analytics agents: learns from data in near real time.
It communicates with devices and with the cloud through APIs for rules on data streams, alerts, and local processing.
• Middleware and software for security agents: handle authentication keys and certificates for gateways, sensors, and actuators. They communicate with security management software in the cloud.
• Middleware and software for management agents: handle management for gateways, sensors, and actuators, including provisioning, error handling, alerts, and events. They communicate with device management software in the cloud.

The main software components in the cloud include:

• Cloud data ingestion: interacts with edge data agents and ingests data from edge devices. Data are made available to other cloud services through the Enterprise Service Bus (ESB), which is a communication system between mutually interacting software applications in a service-oriented architecture. The communication with the data agents can use protocols such as Message Queue Telemetry Transport (MQTT), Representational State Transfer (REST), etc.
• Cloud security management software: interacts with security agents at the edge, and configures and controls the security policies of on-premises equipment. The communication occurs with the edge security agents and the configuration database.
• Cloud device management software: interacts with the management services at the edge, and configures and controls the manageability policies of on-premises equipment. The communication occurs with the edge device management agent and the configuration database.
• Enterprise service bus: enables the communication between interacting software applications.
• Operational database: manages dynamic data end-to-end.
• Configuration database: contains all relevant information about the edge components and their relationships.
• Analytics software: performs analysis of the data collected from edge devices.
• Service orchestration software: provides automated configuration, coordination, and management of applications and services.
• Configuration management: ensures on-premises configuration management of devices and their security.

Figure 5. Software components and interfaces for Intel's IoT reference architecture [6].

The proposed communications-and-connectivity layer in Intel's architecture is shown in Figure 6. It aims to enable multi-protocol data communication between devices at the edge, as well as between the endpoint devices, gateways, network, and data center. It considers proximity networks (PAN), local area networks (LAN), as well as wide area networks (WAN) to connect the different devices in the architecture.

Figure 6. Detailed view of communications in Intel’s IoT architecture [6].

Intel’s architecture enables the distribution of analytics and control processes both in endpoint devices and in the cloud; see Figure 7. This helps the developer to optimize the system either for time-critical applications, by making all the inferences and control decisions at the edge, or for computation-intensive applications, where the inferences, visualizations, reports, and decisions are made in the cloud but control signals can be transmitted to the edge through a communication channel in the network.

Figure 7. Data layer supports distributed analytics and control [6].

The management layer in Intel’s IoT architecture allows automated discovery, registration, and provisioning of endpoint devices. It can also update applications and operating systems; manage data flows from devices, such as destination and storage policies; upload or stream data; stop or reboot selected devices; define and manage events, alarms, and notifications; and use rules defined in the cloud to initiate actions. This layer can handle scripts; manage devices from the command shell; manage organizations, users, and access rights; and can also upload and download files to or from a device.
Each managed device has a management agent that executes the management in its device and communicates with the cloud platform via messages. Figure 8. The management layer supervises endpoint devices6. 5.2. Amazon AWS IoT Architecture AWS IoT7 enables “Internet-connected devices” to interact with the AWS Cloud and lets applications in the cloud interact with Internet-connected devices. Common IoT applications either collect and process telemetry from devices or enable users to control a device remotely. The AWS IoT service suite considers two main parts shown in Figure 9 that correspond to “Things or Devices”, which can be endpoints (sensors or actuators) or gateways, and the “Cloud”, which handles storage, computing, and learning. Figure 9. AWS IoT service suite7. 7 AWS IoT services for industrial, consumer, and commercial solutions. https://aws.amazon.com/iot/ The software and services in the AWS IoT architecture are shown in Figure 10. The device software includes a real-time operating system for microcontrollers called Amazon FreeRTOS that makes small, low-power edge devices easy to program, deploy, secure, connect, and manage. Amazon FreeRTOS extends the FreeRTOS kernel, a popular open-source operating system for microcontrollers, with software libraries to securely connect small, low-power devices to AWS cloud services like AWS IoT Core, or to more powerful edge devices running AWS IoT Greengrass. AWS IoT Greengrass is software intended for gateways; it extends AWS to edge devices so they can act locally on the data they generate, while still using the cloud for management, analytics, and durable storage. With AWS IoT Greengrass, connected devices can run AWS Lambda functions (a compute service that lets you run code without provisioning or managing servers), execute predictions based on machine learning models, keep device data in sync, and communicate with other devices securely, even when not connected to the Internet.
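To make the Lambda idea concrete, a Python Lambda function is an ordinary function following the standard `handler(event, context)` convention. The event shape below (a telemetry reading with a `temperature_c` field) and the alert threshold are hypothetical; only the handler signature is the actual Lambda convention.

```python
import json

# Sketch of an AWS Lambda handler as IoT Core or Greengrass might
# invoke it. The event shape is a hypothetical telemetry reading;
# handler(event, context) is the standard Python Lambda signature.
def handler(event, context):
    reading = float(event["temperature_c"])
    status = "alert" if reading > 35.0 else "ok"
    return {"statusCode": 200, "body": json.dumps({"status": status})}

# Local invocation for testing; the context argument is unused here.
result = handler({"temperature_c": 38.2}, None)
```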
In the Cloud, AWS makes a distinction between control services and data services. As part of the control services, AWS defines an IoT Core, a managed cloud service that lets connected devices easily and securely interact with cloud applications and other devices. AWS IoT Core can support billions of devices and trillions of messages and can process and route those messages to AWS endpoints and to other devices reliably and securely. Thanks to AWS IoT Core, applications can keep track of and communicate with all deployed devices. AWS IoT Core supports communication protocols such as HTTP, WebSockets, and MQTT. Another service considered in the architecture is AWS IoT Device Management, which onboards, organizes, monitors, and remotely manages IoT devices (including sending firmware updates over-the-air) at scale. In terms of security, AWS provides the AWS IoT Device Defender, a fully managed service that helps the user secure their IoT devices. AWS IoT Device Defender continuously audits the IoT configurations to make sure that they aren’t deviating from security best practices, and it sends an alert if there are any gaps in the IoT configuration that might create a security risk, such as identity certificates being shared across multiple devices or a device with a revoked identity certificate trying to connect to AWS IoT Core. The last component of the control services is the AWS IoT Things Graph, which provides a visual drag-and-drop interface for connecting and coordinating devices and web services so that the user can build IoT applications quickly. The data services provided by AWS include AWS IoT Analytics for running analytics on large volumes of IoT data. This component automates each of the required steps to analyze data from IoT devices: it filters, transforms, and enriches IoT data before storing the results in a time-series data store for analysis.
The user can configure the service to collect specific data from deployed devices, apply mathematical transforms to process the data, and enrich the data with device-specific metadata such as device type and location before storing the results. Then, the user can analyze the data by running ad hoc or scheduled queries using the built-in Structured Query Language (SQL) query engine or perform more complex analytics and machine learning inference. A complementary service is AWS IoT SiteWise, which can monitor operations across facilities, compute common industrial performance metrics, and build applications to analyze industrial equipment data. Finally, AWS IoT Events is a fully managed IoT service that detects and responds to events from IoT sensors and applications. Events in AWS are patterns of data identifying specific circumstances. The user simply selects the relevant data sources to ingest, defines the logic for each event using ‘if-then-else’ statements, and selects the alert or custom action to trigger when an event occurs. Figure 10. Software components in AWS IoT architecture7. Figure 11 and Figure 12 show the internal processes that occur within AWS IoT and its integration in a generic IoT project8. Devices report their state by publishing messages in JavaScript Object Notation (JSON) format on MQTT topics. Each MQTT topic has a hierarchical name that identifies the device whose state is being updated. When a message is published on an MQTT topic, the message is sent to the AWS IoT MQTT message broker, which is responsible for sending all messages published on an MQTT topic to all clients subscribed to that topic. The communication between a device and AWS IoT is protected through the use of X.509 certificates, a standard defining the format of public key certificates. AWS IoT can generate a certificate for the users or they can use their own.
In either case, the certificate must be registered and activated with AWS IoT and then copied onto the device. When a device communicates with AWS IoT, it presents the certificate to AWS IoT as a credential. AWS recommends that all devices that connect to AWS IoT have an entry in the registry. The registry stores information about a device and the certificates that are used by the device to secure communication with AWS IoT. The user can then create rules that define one or more actions to perform based on the data in a message; this mechanism is called the rules engine. For example, the user can insert, update, or query a database (e.g: a DynamoDB table) or invoke a Lambda function. Rules use expressions to filter messages. When a rule matches a message, the rules engine triggers the action using the selected properties. Rules also contain an Identity and Access Management (IAM) role that grants AWS IoT permission to the AWS resources used to perform the action. In the AWS architecture, each device has a “shadow” that stores and retrieves state information. Each item in the state information has two entries: the state last reported by the device and the desired state requested by an application. An application can request the current state information for a device. The shadow responds to the request by providing a JSON document with the state information (both reported and desired), metadata, and a version number. An application can control a device by requesting a change in its state. The shadow accepts the state change request, updates its state information, and sends a message to indicate the state information has been updated. The device receives the message, changes its state, and then reports its new state. Figure 11. Interactions between AWS IoT components8. Figure 12. IoT example with AWS. 8 How AWS IoT Works. https://docs.aws.amazon.com/iot/latest/developerguide/aws-iot-how-it-works.html 5.3.
Microsoft Azure IoT reference architecture Microsoft shared online9 a recommended architecture and a set of implementation technology choices to build solutions based on Azure IoT. This architecture describes terminology, technology principles, common configuration environments, and the composition of Azure IoT services, physical devices, and Intelligent Edge Devices. For Microsoft, IoT applications can be described as “Things” (or devices) sending data or events that are used to generate “Insights”, which in turn drive “Actions” to help improve a business or process. Therefore, the end goal of the architecture can be stated as taking action on business insights found through gathering data from assets. Figure 13. Abstraction of Azure IoT architecture9. The architecture recommended by Microsoft for IoT solutions has the following characteristics: • It is cloud-native, microservice-based, and serverless (see the Glossary at the end of this report for more details about these concepts). • The IoT solution subsystems should be built as discrete services that are independently deployable and able to scale independently. • The subsystems should communicate over REST/HTTPS using JSON (as it is human readable), though binary protocols such as Avro10 can be used for high-performance needs. • The use of an orchestrator (e.g: Azure Kubernetes Service - AKS or Service Fabric) is recommended to scale individual subsystems horizontally, or alternatively a “Platform as a Service” (PaaS) offering (e.g: Azure App Services) with built-in horizontal scaling capabilities. • The architecture should support a hybrid cloud and edge computing strategy, meaning that some data processing is expected to happen on premises. For Microsoft, an IoT application includes the subsystems depicted in Figure 14, which are explained next. 1. Devices and on-premises edge gateways that have the ability to securely register with the cloud and provide connectivity options for sending and receiving data with the cloud.
In this regard, the Azure IoT Hub Software Development Kits (SDKs) enable secure device connectivity and transmission of telemetry data to the cloud. 2. A cloud gateway service or hub (e.g: Azure IoT Hub service) to securely accept data and provide device management capabilities (e.g: Azure IoT Hub Device Provisioning Service or DPS). 9 Azure IoT Reference Architecture. https://blogs.msdn.microsoft.com/wriju/2018/02/26/azure-iot-reference-architecture/ 10 Apache Avro is a data serialization system. https://avro.apache.org 3. Stream processors that consume the data (e.g: Azure Stream Analytics or Azure IoT Hub Routes with Azure Functions) and integrate them with business processes (e.g: Azure Functions and Logic Apps) before they are placed into storage. In this regard, Microsoft recommends the following databases: Azure Cosmos DB for warm-path storage, i.e: data that have to be available for reporting and visualization immediately from devices; Azure Blob Storage for cold storage, i.e: long-term data storage used for batch processing; and Azure Time Series Insights for applications with specific reporting needs related to time series. 4. User interfaces to visualize telemetry data and facilitate device management (e.g: Power BI, TSI Explorer, native applications, or custom web user-interface applications). In addition to the core subsystems described previously, some IoT applications can include additional subsystems: 5. Intelligent edge devices that allow aggregation or transformation of telemetry data, as well as on-premises processing (e.g: Azure IoT Edge). 6. Cloud telemetry data transformation helps to restructure, combine, or modify telemetry data received from devices (e.g: Azure Functions). 7. Machine learning (e.g: Azure Machine Learning) enables the execution of predictive algorithms over historical telemetry data, which facilitates predictive maintenance and other common use cases. 8.
User management allows splitting the functionality among different roles and users. Figure 14. Azure IoT architecture with core subsystems only9. Figure 15. Azure IoT architecture with core and optional subsystems9. Microsoft acknowledges that there are additional components required for general IoT applications. These additional components are illustrated in Figure 16 and discussed in the following paragraphs. 9. Security requirements, which include user management and auditing, device connectivity, in-transit telemetry protection, and at-rest security. For user management Microsoft recommends Azure Active Directory, as it supports the OAuth2 authorization protocol and the OpenID Connect authentication layer, providing audit log records of system activities. 10. Logging and monitoring of an IoT cloud application for determining its health and troubleshooting failures, both for individual subsystems and for the application as a whole (e.g: see the Azure Operations Management Suite or OMS, Application Map, and App Insights). 11. High availability and disaster recovery, allowing the solution to recover rapidly from systemic failures. For IoT applications, this requires hosting duplicate services, as well as duplicating application data across regions, depending on acceptable failover downtimes and tolerable data losses. Figure 16. Complete Azure IoT architecture9. In terms of device connectivity options for IoT solutions, Microsoft uses the conceptual representation displayed in Figure 17. The numbers in the figure correspond to four key connectivity patterns, defined as follows: 1. Direct device connectivity to the cloud gateway: for IP-capable devices that can establish secure connections via the Internet. 2.
Connectivity via a field gateway (IoT Edge Device): for devices using industry-specific standards such as the Constrained Application Protocol (CoAP) or OPC UA (a machine-to-machine communication protocol for industrial automation), short-range communication technologies (such as Bluetooth or ZigBee), as well as for resource-constrained devices not capable of hosting the Transport Layer Security (TLS) stack, or devices not exposed to the Internet. This option is also useful when stream and data aggregation is executed on a field gateway before sending the result to the cloud. 3. Connectivity via a custom cloud gateway: for devices that require protocol translation or some form of custom processing before reaching the cloud gateway communication endpoint. 4. Connectivity via a field gateway and a custom cloud gateway: similar to the previous pattern, field gateway scenarios might require some protocol adaptation or customization on the cloud side and therefore can choose to connect to a custom gateway running in the cloud. Some scenarios require integration of field and cloud gateways using isolated network tunnels, either using virtual private network (VPN) technology or using an application-level relay service. The Azure IoT Hub, a central component in Microsoft’s architecture, provides support for protocols such as the Advanced Message Queuing Protocol (AMQP) 1.0 with optional WebSocket support, MQTT 3.1.1, and native HTTP 1.1 over TLS. If the user needs support for the CoAP protocol, it can be implemented using a protocol gateway adaptation model in the cloud. However, Microsoft highlights that the use of a cloud gateway needs to be evaluated carefully because, in general, it is beneficial to ingest the data into the cloud gateway as fast as possible and then perform transformations on the cloud backend, decoupled from the ingestion.
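The protocol-translation step performed by a custom gateway can be sketched as a small conversion function: a message arriving in a compact device-specific format is rewritten into the JSON expected by the cloud gateway. The input format and the field names below are hypothetical, chosen only to illustrate the pattern.

```python
import json

# Hedged sketch of the "custom cloud gateway" translation step.
# A hypothetical compact device format "<device_id>;<metric>;<value>"
# is translated into a JSON payload for the cloud gateway.
def translate(raw: bytes) -> str:
    device_id, metric, value = raw.decode("ascii").split(";")
    return json.dumps({"deviceId": device_id, metric: float(value)})

# Example: a constrained device sends a compact humidity reading.
msg = translate(b"node-7;humidity;64.5")
```

A production gateway would also validate the input, handle malformed messages, and authenticate the sending device before forwarding the result.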
As a final note, the Azure platform supports the Internet Protocol version 4 (IPv4) externally but is not dependent on it, and it will move directly to the Internet Protocol version 6 (IPv6) once IPv6 is available for Microsoft Azure networks and network edges. The IPv4 address space is reachable from within the IPv6 address space using a number of transition mechanisms. Figure 17. Conceptual representation of device connectivity in Azure IoT architecture9. 5.4. Google IoT reference architecture Figure 18 presents the reference architecture developed by Google to build a robust, maintainable, end-to-end IoT solution that can be hosted on the Google Cloud Platform. The architecture has a strong focus on data-driven services and machine learning, both in the cloud and at the edge. Google understands IoT as the use of network-connected devices, embedded in the physical environment, to improve some existing process or to enable a new scenario not previously possible. These devices, or things, connect to the network to provide information they gather from the environment through sensors, or to allow other systems to reach out and act on the world through actuators. Figure 18. Google IoT reference architecture11. The top-level components for Google include the device, the gateway, and the cloud. A device includes hardware and software that interact directly with the world. Devices can connect to a network to communicate with each other, or to centralized applications. These devices might be directly or indirectly connected to the Internet. In this regard, a gateway enables devices that are not directly connected to the Internet to reach cloud services. In this context, and although the term gateway has a specific function in networking, it is also used to describe a class of devices that processes data on behalf of a group or cluster of devices.
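Processing data "on behalf of a group or cluster of devices" typically means buffering individual readings and forwarding a compact summary upstream. The sketch below illustrates that aggregation role; it is not tied to any Google product, and the field names are hypothetical.

```python
# Illustrative sketch of a gateway aggregating readings from a cluster
# of devices before forwarding one summary message upstream.
class Gateway:
    def __init__(self):
        self.buffer = []

    def collect(self, device_id: str, value: float) -> None:
        # Called once per incoming device reading.
        self.buffer.append((device_id, value))

    def flush(self) -> dict:
        # Produce one summary for the cloud and clear the buffer.
        summary = {"count": len(self.buffer),
                   "mean": sum(v for _, v in self.buffer) / len(self.buffer)}
        self.buffer.clear()
        return summary

gw = Gateway()
for i, v in enumerate([20.0, 22.0, 24.0]):
    gw.collect(f"sensor-{i}", v)
summary = gw.flush()
```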
At the end of the process, the data from each device reach the Cloud platform, where they are processed and combined with data from other devices and sources. Google understands that each device can provide or consume different types of information, such as: device metadata, state information, telemetry (which might be preserved as a stateful variable on the device or in the cloud), commands, and operational information (useful for maintenance). 11 Google Cloud IoT. https://cloud.google.com/solutions/iot/?hl=en https://cloud.google.com/solutions/iot-overview For the sake of comparison with other providers, the main Google Cloud services used in IoT are briefly described next: • Google Cloud IoT Core provides a secure MQTT broker for devices managed by IoT Core. The IoT Core MQTT broker directly connects with Cloud Pub/Sub (i.e: publish/subscribe). • Google Cloud Pub/Sub provides a globally durable message ingestion service. By creating topics for streams or channels, the user can enable different components of the application to subscribe to specific streams of data without needing to construct subscriber-specific channels on each device. Cloud Pub/Sub also natively connects to other Cloud Platform services, helping the user to connect ingestion, data pipelines, and storage systems. • Cloud Pub/Sub scales to handle data spikes that can occur when swarms of devices respond to events in the physical world, and buffers these spikes to help isolate them from applications monitoring the data. • Stackdriver Monitoring and Stackdriver Logging ingest operational information. • Google Cloud Dataflow provides the open Apache Beam programming model as a managed service for processing data in multiple ways, including batch operations, extract-transform-load (ETL) patterns, and continuous, streaming computation.
• Cloud Datastore and Firebase Realtime Database allow the user to make state or telemetry data available to mobile or web apps by storing processed or raw data in structured but schemaless databases, where IoT device data can be represented as domain- or application-level objects. • Google Cloud Functions allows the user to write custom logic that can be applied to each event as it arrives. This can be used to trigger alerts, filter invalid data, or invoke other APIs. Cloud Functions can operate on each published event individually. If the user needs to process data and events with more sophisticated analytics, including time-windowing techniques or converging data from multiple streams, Cloud Dataflow provides an analytics tool that can be applied to streaming and batch data. • Google BigQuery provides a fully managed data warehouse with a SQL interface, so that the user can store the IoT data alongside any of the other enterprise analytics and logs. • Cloud Datalab is an interactive tool for large-scale data exploration, analysis, and visualization. IoT data can be useful for multiple use cases, depending on how the data are combined. Cloud Datalab lets the user interactively explore, transform, analyze, and visualize the data using a hosted online data workbench environment based on the open-source Jupyter project. • TensorFlow is an open-source machine-learning framework whose models can be trained at scale with the Cloud Machine Learning Engine, a distributed and managed training service on the Cloud Platform. • Cloud Bigtable provides a low-latency and high-throughput database for NoSQL data. It provides a place to drive heavily used visualizations and queries, or to ingest or serve data at high volumes. Compared to BigQuery, Cloud Bigtable works better for queries that act on rows or groups of consecutive rows, as it stores data using a row-based format. Compared to Cloud Bigtable, BigQuery is a better choice for queries that require data aggregation.
• Cloud Storage provides a single API for both frequently accessed object storage and infrequently used archival data. If the selected IoT device captures media data (e.g: audio, images, video), then Cloud Storage is the recommended option. 5.5. IBM IoT reference architecture IBM developed an IoT reference architecture12, shown in Figure 19, to connect IoT devices and quickly build scalable apps and visualization dashboards to gain insights from data, using its cloud and AI services. This architecture has five layers: a user layer, proximity layer, public network, provider network, and the enterprise network. One relevant aspect of this architecture is that it includes the user in the IoT abstraction process. In fact, the diagram incorporates different flows for “data and control” and “user” information. As all the blocks represented in the diagram have already been discussed in the previous architectures, no further details will be provided. The interested reader can refer to the online documentation written by IBM. Figure 19. IoT reference architecture from IBM12. 12 IoT reference architecture from IBM: https://www.ibm.com/cloud/garage/architectures/iotArchitecture/reference-architecture https://developer.ibm.com/iotplatform/resources/iot-architecture/ 5.6. Open-source IoT initiatives There are many IoT reference architectures from commercial providers in the market today. However, the software components that enable those architectures in the cloud create monthly fees that can be expensive for farmers. For this reason, and considering that CIAT aims to explore open-source alternatives, this section provides a description of some projects that have developed the software services required for IoT implementations, which can be deployed either on-premises or in an arbitrary cloud using serverless technologies.
Mainflux13 is a modern, scalable, secure, open-source, and patent-free IoT cloud platform written in the Go programming language that can be deployed either on-premises or in the cloud. It accepts user, device, and application connections over various network protocols (i.e. HTTP, MQTT, WebSocket, CoAP), thus making a seamless bridge between them. Mainflux can be used as the IoT middleware for building complex IoT solutions, as shown in Figure 20. It is important to highlight that Mainflux is a member of the Linux Foundation and an active contributor to the EdgeX Foundry project. Mainflux provides the following features: multi-protocol connectivity and protocol bridging, device management and provisioning, fine-grained access control, storage support (Cassandra, InfluxDB, and MongoDB), platform logging and instrumentation support, event sourcing, container-based deployment using Docker and Kubernetes, LoRaWAN network integration, an SDK, a command-line interface (CLI), small memory footprint and fast execution, domain-driven design architecture, and high-quality code and test coverage. The Mainflux infrastructure and software stack, shown in Figure 21, are composed of a set of components and microservices necessary for IoT solutions. The Mainflux IoT platform provides a set of SDKs and client libraries for various hardware platforms in several programming languages (C/C++, JavaScript, Python), as well as a set of open APIs and development tools. In terms of deployment technology, Mainflux uses Docker containers, docker-compose, Kubernetes, NGINX load balancing, and Cisco MANTL, which is a runtime environment for microservices. Figure 20. Mainflux location within an IoT application13. 13 Mainflux. https://www.mainflux.com/index.html https://github.com/mainflux/mainflux Figure 21. Mainflux architecture for IoT. Top: infrastructure stack. Bottom: software stack13.
Thinger.io14 is an open-source platform that offers a ready-to-go and scalable cloud infrastructure for connecting millions of devices or things. The user can control these devices with an admin console, or integrate them into the business logic using REST APIs. The platform allows the user to install the server on any host with Ubuntu snap packages for different platforms such as x86, x64, arm64, or armhf. The user can also deploy a Thinger.io server on AWS infrastructure using a provided Amazon Machine Image (AMI). Interestingly, the platform also offers client libraries for connecting Arduino and ARM mbed devices. 14 Thinger.io platform. https://thinger.io http://docs.thinger.io https://github.com/thinger-io SiteWhere15 is an industrial-strength open-source application-enablement platform for the Internet of Things (IoT). It provides a multi-tenant, microservice-based infrastructure (see Figure 22) that includes the key features required to build and deploy IoT applications. SiteWhere separates the many aspects of IoT processing into microservices, each specializing in a specific task. These include functionality such as event ingestion, big-data event persistence, device state management, large-scale command delivery, integration of device data with external systems, and much more. Each microservice is a Spring Boot application wrapped as a Docker container. The microservices self-assemble into a SiteWhere instance, which is orchestrated as a highly available distributed system on top of the Kubernetes infrastructure. A Helm chart is used to configure the list of SiteWhere microservices and other components which need to be started in order to realize a given configuration. Once started, the microservices self-assemble and then make themselves available for processing tasks. Figure 22. SiteWhere 2.0 microservices15. DeviceHive16 is a microservice-based system, built with high scalability and availability in mind.
It is distributed under the Apache 2.0 license (free for use and change). It gives the user visibility into the platform’s architecture (shown in Figure 23) and implementation details. DeviceHive is commercially supported by DataArt’s Internet of Things practice professional consultants and engineers. As a platform, DeviceHive listens to hundreds of devices simultaneously and scales to the required number of instances in order to guarantee data safety and availability. 15 SiteWhere. https://sitewhere.io/en/ https://sitewhere.io/docs/2.0.0/platform/microservice-overview.html#microservice-structure https://github.com/sitewhere/sitewhere 16 DeviceHive. https://devicehive.com https://docs.devicehive.com/docs/devicehive-architecture https://github.com/devicehive Figure 23. DeviceHive architecture16. Finally, Zetta17 is an open-source software platform built on the programming language Node.js for creating Internet of Things servers that run across geo-distributed computers and the cloud. Zetta combines REST APIs, WebSockets, and reactive programming for assembling many devices into data-intensive, real-time applications. Zetta servers run in the cloud, on PCs, and on single-board computers. With Zetta, the user can link devices together with cloud platforms to create geo-distributed networks. Zetta can turn almost any device into an API. Zetta servers communicate with microcontrollers like Arduino, giving every device a REST API both locally and in the cloud. 5.7. IoT architectures in the literature In a recent paper by Guth et al. (2018)18, the researchers defined an IoT reference architecture based on existing platforms. The resulting architecture, shown in Figure 24, was kept abstract on purpose to make it applicable in a wide range of situations. The interesting part is that the researchers mapped their IoT reference architecture onto four open-source platforms (FIWARE, OpenMTC, SiteWhere, and Webinos) and four proprietary IoT solutions 17 Zetta.
http://www.zettajs.org https://github.com/zettajs/zetta/wiki 18 Jasmin Guth, Uwe Breitenbücher, Michael Falkenthal, Paul Fremantle, Oliver Kopp, Frank Leymann, and Lukas Reinfurt. A Detailed Analysis of IoT Platform Architectures: Concepts, Similarities, and Differences, pages 81-101. Springer, 2018. (AWS IoT, IBM’s Watson IoT Platform, Microsoft’s Azure IoT, and Samsung’s SmartThings). Then, they analyzed the components’ functionality and naming, and compared them with their proposal. The result of the comparison is documented in Table 1. Even though the mapping can be a bit coarse, the proposed architecture does abstract some of the common blocks that appear in the different IoT architectures reviewed so far, making it a suitable candidate for an application in agriculture. Figure 24. IoT reference architecture based on the Guth et al. paper18. Table 1. IoT platform comparison summary extracted from the Guth et al. paper18. 5.8. IoT architectures intended for agriculture Most of the architectures from technological leaders tend to focus on the software component of the IoT architecture. However, for agriculture and other applications where the environment can be very harsh, the physical components that are linked to hardware decisions are also very important. In this regard, Figure 25 shows an IoT architecture for agriculture proposed by Talavera et al. (2017)19 that has four main layers: physical, communication, service, and application. Figure 25. IoT architecture for agro-industrial and environmental applications from the Talavera et al. paper19. 19 Jesús Martín Talavera, Luis Eduardo Tobón, Jairo Alejandro Gómez, María Alejandra Culman, Juan Manuel Aranda, Diana Teresa Parra, Luis Alfredo Quiroz, Adolfo Hoyos, and Luis Ernesto Garreta. Review of IoT applications in agro-industrial and environmental fields. Computers and Electronics in Agriculture, 142, Part A:283-297, 2017. The physical layer handles perception and control from/to the environment.
For perception tasks, the main objective is to produce valuable data by sensing field variables using a sensor network, which nowadays can be wireless (WSN). Data produced are sent to the communication layer through field gateways. Devices in the perception layer can be powered by batteries for short-term deployments or by solar panels, because of their low-power consumption. In contrast, actuators in control tasks act as data sinks, receiving orders from a communication layer or reactive percepts from sensors in the simplest cases. Information received by the control layer can alter the state of field actuators, which are often powered from the electrical grid. Between the perception and control layers, there is a robot that can be either a rover or a drone, which can be used to sense or act when fixed devices are not the best option. In the communication layer, the objective is to move the information from the physical layer to the Internet, collecting data from IoT gateways based either on Ethernet or mobile networks (e.g: GPRS/3G/4G/LTE-M/NB-IoT and eventually 5G). This layer includes field gateways acting as interfaces between IoT gateways and transceivers using ZigBee, LoRa, Sigfox, WiFi, TV white space, etc. The service layer handles data ingestion from the communication layer, storage, analytics, visualization, security, and other services required for the operation, such as provisioning and management. Finally, the application layer consumes services from the previous layer in the architecture and allows the user to handle monitoring, global control, prediction, and logistics. Even though the architecture presented in Figure 25 is flexible, it can be further enhanced by giving some intelligence and autonomy to the field gateways and to the IoT gateways, depending on the power budget and computing capabilities of the devices. In a different approach, Microsoft researchers developed FarmBeats20, an end-to-end IoT platform for data-driven agriculture.
FarmBeats uses TV white space technology to connect field gateways (called IoT stations) with an intermediate, smart gateway on the farm based on a PC form-factor device that has an Internet connection and can send the processed data to the cloud. Interestingly, storage and data fusion are done in the gateway located in the farmer’s house. The gateway also posts data to a local server and supports a web server that can be accessed by the farmer when he/she is connected to the local network. Having a smart gateway in the architecture relieves the pressure on the Internet connection, which makes the solution autonomous and robust against poor Internet connections and outages, and also reduces the amount of data that has to be sent to the cloud. All field sensors in the solution, including soil temperature, soil pH, and soil moisture sensors, cameras to monitor the farm and capture IR images of crops, and drones, are connected to local gateways via WiFi or Ethernet. In the study, the authors deployed over 100 different sensors. The combination of WiFi and TV white space technology provides a high-bandwidth solution, but it drains a lot of power. For this reason, the solution is powered by batteries that get charged through solar panels. To ensure a continuous stream of power, the duty cycle of the system is adjusted dynamically. This adjustment is possible due to the integration of weather forecast data, measurements of the current charge of the batteries, knowledge about system-specific components in the network, and the behavior and requirements set by the farmer. In this solution, the telemetry is sent to the cloud, enabling the farmer to access it even when he/she is outside the farm network. The solution aims to support long-term applications (e.g., crop suggestions and cross-farm analytics) and it includes dashboards for visualization (e.g., a mobile application and a web page).
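The dynamic duty-cycle adjustment described above can be sketched as a simple energy-budget computation. The function, parameters, and 24-hour planning horizon below are illustrative assumptions, not FarmBeats’ actual algorithm: the idea is only that the on-time fraction follows the energy expected to be available.

```python
def adjust_duty_cycle(battery_charge_wh, forecast_solar_wh, load_wh_per_hour,
                      min_duty=0.05, max_duty=1.0):
    """Pick a duty cycle (fraction of time the radios/sensors are on)
    that avoids draining the battery before the next recharge.

    battery_charge_wh -- current usable energy stored in the battery (Wh)
    forecast_solar_wh -- solar energy expected over the planning horizon (Wh)
    load_wh_per_hour  -- energy the system draws per hour at 100% duty (Wh)
    """
    horizon_hours = 24  # plan one day ahead (illustrative choice)
    # Energy available over the horizon vs. energy needed at full duty.
    available = battery_charge_wh + forecast_solar_wh
    required_at_full_duty = load_wh_per_hour * horizon_hours
    duty = available / required_at_full_duty if required_at_full_duty > 0 else max_duty
    # Clamp so the node never goes fully silent nor exceeds 100% duty.
    return max(min_duty, min(max_duty, duty))
```

With a sunny forecast the node runs at full duty; on cloudy days with a low battery, the same computation throttles sensing and transmission down toward the floor value.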
20 Deepak Vasisht, Zerina Kapetanovic, JongHo Won, Xinxin Jin, Ranveer Chandra, Sudipta Sinha, and Ashish Kapoor. FarmBeats: An IoT platform for data-driven agriculture. In Networked Systems Design and Implementation (NSDI). USENIX, March 2017. So far, the solution has been deployed in two farms in the US (one in Washington state and the other in upstate New York), with areas of 5 acres and 100 acres respectively, over a period of six months, producing 10 million sensor measurements, 1 million camera images, and 100 drone videos. Figure 26 shows the system architecture of FarmBeats. For CIAT, this project is very relevant because it combines the temporal data from ground sensors with the spatial data from drones to construct an instantaneous precision map of the farm. Figure 26. FarmBeats system overview20. 6. Conclusions This report reviewed a few selected Internet of Things reference architectures designed by technology leaders to be as general as possible. In these architectures, there is a strong focus on the software components that are deployed in the cloud, but they can, to some extent, be replicated in smart gateways, providing more autonomy to the solutions. Some of these components allow the provisioning of IoT devices, enable their management, facilitate high-speed data ingestion, forward the data through dedicated data streamers to other services providing short- and long-term storage, and enable rule-based actions, analytics, visualization, and integration with third-party applications. Also, some of these architectures recognize the importance of providing closed-loop control mechanisms at different levels, and all of them consider security in their different layers, showing a greater awareness of the latent risks involved in IoT. Most of the reviewed architectures were designed for large enterprises, and with business users or developers in mind.
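The rule-based actions mentioned above can be pictured as a small table of predicate-action pairs evaluated against each telemetry message. The sketch below is a minimal illustration; the rule names, payload fields, and action strings are hypothetical and do not correspond to any vendor’s API.

```python
# Each rule: (name, predicate over a telemetry message, action to trigger).
rules = [
    ("low_soil_moisture",
     lambda m: m.get("soil_moisture", 1.0) < 0.20,
     lambda m: f"open_valve:{m['device_id']}"),
    ("high_temperature",
     lambda m: m.get("temp_c", 0) > 40,
     lambda m: f"alert_farmer:{m['device_id']}"),
]

def evaluate(message):
    """Return the list of actions triggered by one telemetry message."""
    return [action(message)
            for name, predicate, action in rules
            if predicate(message)]
```

A cloud rule engine does essentially this at scale, routing each matched message to a downstream service (a notification, a function invocation, a storage sink) instead of returning a string.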
Lastly, it is important to state that cloud providers are now offering mature and robust solutions, but there is a caveat: the connectivity of the devices to the Internet remains a challenge. The main reason is that the process of connecting devices to the Internet has not settled yet; in fact, it is still evolving along with the hardware platforms, communication technologies, and protocols. As part of the review, some open-source IoT-platform initiatives were analyzed, and there are good reasons to be optimistic about the benefits that they will bring to future solutions developed for agriculture. Most of these initiatives were conceived to work as serverless microservices, meaning that they can be easily deployed using on-premises infrastructure. This scenario enables the development of complete, low-cost proofs of concept using a local network, or their use as an intermediate fog layer in a fully connected cloud solution. Having a fog layer reduces the dependency on the availability and bandwidth of the Internet connection, because the data can be processed and distilled locally before they are sent to the cloud. The downside of these open-source initiatives is that they provide limited troubleshooting support, and a significant effort is required from the end-user to configure, develop, and maintain a solution. Two IoT architectures for agriculture were reviewed to better understand the challenges of deploying solutions in harsh environments, far away from IT infrastructure, and with limited connectivity. In these architectures, communications and power management (power sources, energy storage devices, and power consumption) receive much more attention because they limit what can be achieved in a real scenario and on a limited budget. We believe that IoT solutions conceived for agriculture require a different mindset, giving more importance to the extraction and movement of the data from the crops to the local gateway.
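The local processing and distillation that a fog layer performs can be sketched in a few lines. The aggregation scheme below (fixed-size windows summarized as min/mean/max) is an illustrative assumption, not taken from any of the reviewed platforms; the point is only that the gateway uploads compact summaries instead of every raw sample.

```python
from statistics import mean

def distill(readings, window=6):
    """Reduce a stream of raw sensor readings to per-window summaries,
    so only the distilled records are sent to the cloud.

    A gateway sampling every 10 minutes with window=6 uploads one
    record per hour instead of six raw samples."""
    summaries = []
    for i in range(0, len(readings) - window + 1, window):
        chunk = readings[i:i + window]
        summaries.append({"min": min(chunk),
                          "mean": mean(chunk),
                          "max": max(chunk)})
    return summaries
```

When the Internet connection drops, the gateway can keep accumulating summaries locally and flush them once connectivity returns, which is exactly the robustness argument made for the smart gateway above.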
In this regard, only a few long-term IoT projects in agriculture have openly shared insights from field deployments with the community. In particular, we found a void in the literature comparing competing communication technologies in agriculture, where crop height, crop density, crop area, terrain topography, weather, and range and bandwidth requirements can vary so much. Finally, the importance of IoT sensor networks in enabling data-driven agriculture and decision support systems cannot be overstated21. However, their design still represents a complex engineering challenge that must be tackled in incremental steps, with thorough field trials, and with an agile development mindset22. For small pilots, the architecture of Guth et al. (2018) can be a good starting point, but for larger and more complex deployments in agriculture, Intel’s layered architecture could be combined both with the specific modules of Talavera et al. (2017) and with FarmBeats’ emphasis on local processing. In either case, the software and hardware components provided by the open-source community will be instrumental in implementing an accessible and replicable pilot for CIAT and other CGIAR centers. 21Smart Agriculture Applications based on LoRa. https://www.semtech.com/lora/lora-applications/smart-agriculture 22Internet of Things for insights from connected devices - Fail fast and learn fast. https://www.ibm.com/cloud/garage/architectures/iotArchitecture/failing-fast 7. Glossary The following glossary was compiled from external sources for the reader’s convenience. Each entry has the corresponding hyperlink to the original source for a more thorough description. Reference architecture: in the field of software architecture or enterprise architecture, a reference architecture provides a template solution for an architecture for a particular domain. It also provides a common vocabulary with which to discuss implementations, often with the aim to stress commonality.
The reference architecture is often based on the generalization of a set of successful implementations. Reference architectures are instantiated for a particular domain or for specific projects. Big Data: is a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Telemetry: is an automated communications process by which measurements and other data are collected at remote or inaccessible points and transmitted to receiving equipment for monitoring. The word is derived from Greek roots: tele = remote, and metron = measure. Wireless Sensor Network (WSN): refers to a group of spatially dispersed and dedicated sensors for monitoring and recording the physical conditions of the environment and organizing the collected data at a central location. The more modern networks are bi-directional, also enabling control of sensor activity. A WSN is built of "nodes" – from a few to several hundreds or even thousands – where each node is connected to one (or sometimes several) sensors. Each such sensor network node typically has several parts: a radio transceiver with an internal antenna or a connection to an external antenna, a microcontroller, an electronic circuit for interfacing with the sensors, and an energy source, usually a battery or an embedded form of energy harvesting. Internet of Things (IoT): the Internet of Things is the extension of Internet connectivity into physical devices and everyday objects. Embedded with electronics, Internet connectivity, and other forms of hardware (such as sensors), these devices can communicate and interact with others over the Internet, and they can be remotely monitored and controlled. Local area network (LAN): is a computer network that interconnects computers within a limited area such as a residence, school, laboratory, university campus, or office building.
By contrast, a wide area network (WAN) not only covers a larger geographic distance, but also generally involves leased telecommunication circuits. Ethernet and Wi-Fi are the two most common technologies in use for local area networks. Wide area network (WAN): is any telecommunications network or computer network that extends over a large geographical distance. Wide-area networks are often established with leased telecommunication circuits. Microservice: is a software development technique, a variant of the service-oriented architecture (SOA) architectural style, that structures an application as a collection of loosely coupled services. In a microservices architecture, services are fine-grained and the protocols are lightweight. The benefit of decomposing an application into different smaller services is that it improves modularity. This makes the application easier to understand, develop, and test, and makes it more resilient to architecture erosion. It parallelizes development by enabling small autonomous teams to develop, deploy, and scale their respective services independently. It also allows the architecture of an individual service to emerge through continuous refactoring. Microservices-based architectures enable continuous delivery and deployment. Cloud computing: is the on-demand availability of computer system resources, especially data storage and computing power, without direct active management by the user. The term is generally used to describe data centers available to many users over the Internet. Large clouds, predominant today, often have functions distributed over multiple locations from central servers. If the connection to the user is relatively close, it may be designated an edge server. Clouds may be limited to a single organization (enterprise clouds), be available to many organizations (public cloud), or be a combination of both (hybrid cloud). Cloud computing relies on sharing of resources to achieve coherence and economies of scale.
Advocates of public and hybrid clouds note that cloud computing allows companies to avoid or minimize up-front IT infrastructure costs. Proponents also claim that cloud computing allows enterprises to get their applications up and running faster, with improved manageability and less maintenance, and that it enables IT teams to more rapidly adjust resources to meet fluctuating and unpredictable demand. Cloud providers typically use a "pay-as-you-go" model, which can lead to unexpected operating expenses if administrators are not familiar with cloud-pricing models. The availability of high-capacity networks, low-cost computers and storage devices, as well as the widespread adoption of hardware virtualization, service-oriented architecture, and autonomic and utility computing, has led to growth in cloud computing. Serverless: is a cloud-computing execution model in which the cloud provider runs the server and dynamically manages the allocation of machine resources. Pricing is based on the actual amount of resources consumed by an application, rather than on pre-purchased units of capacity. It can be a form of utility computing. Serverless computing can simplify the process of deploying code into production. Scaling, capacity planning, and maintenance operations may be hidden from the developer or operator. Serverless code can be used in conjunction with code deployed in traditional styles, such as microservices. Alternatively, applications can be written to be purely serverless and use no provisioned servers at all. Orchestration: is the automated configuration, coordination, and management of computer systems and software. A number of tools exist for the automation of server configuration and management, including Ansible, Puppet, Salt, Terraform, and AWS CloudFormation. For container orchestration, there are different solutions such as Kubernetes, or managed services such as AWS EKS, AWS ECS, or AWS Fargate.
Platform as a Service (PaaS): is a category of cloud computing services that provides a platform allowing customers to develop, run, and manage applications without the complexity of building and maintaining the infrastructure typically associated with developing and launching an app. Infrastructure as a Service (IaaS): refers to online services that provide high-level APIs used to abstract various low-level details of the underlying network infrastructure, such as physical computing resources, location, data partitioning, scaling, security, backup, etc. A hypervisor, such as Xen, Oracle VirtualBox, Oracle VM, KVM, VMware ESX/ESXi, Hyper-V, or LXD, runs the virtual machines as guests. Pools of hypervisors within the cloud operational system can support large numbers of virtual machines and the ability to scale services up and down according to customers' varying requirements. Docker: is a computer program that performs operating-system-level virtualization. It was first released in 2013 and is developed by Docker, Inc. Docker is used to run software packages called containers. Containers are isolated from each other and bundle their own application, tools, libraries, and configuration files; they can communicate with each other through well-defined channels. All containers are run by a single operating-system kernel and are thus more lightweight than virtual machines. Containers are created from images that specify their precise contents. Images are often created by combining and modifying standard images downloaded from public repositories. Kubernetes: is an open-source container orchestration system for automating application deployment, scaling, and management. It was originally designed by Google, and is now maintained by the Cloud Native Computing Foundation. It aims to provide a "platform for automating deployment, scaling, and operations of application containers across clusters of hosts". It works with a range of container tools, including Docker.
Many cloud services offer a Kubernetes-based platform or infrastructure as a service (PaaS or IaaS) on which Kubernetes can be deployed as a platform-providing service. Many vendors also provide their own branded Kubernetes distributions. Snap application packages: are self-contained and work across a range of Linux distributions. This is unlike traditional Linux package management approaches, such as APT or YUM, which require specifically adapted packages for each Linux distribution, therefore adding a delay between application development and its deployment for end-users. Snaps themselves have no dependency on any "app store", can be obtained from any source, and can therefore be used for upstream software deployment. When snaps are deployed on Ubuntu and other versions of Linux, the Ubuntu app store is used as the default back-end, but other stores can be enabled as well. Developers can use snaps to create command line tools and background services as well as desktop applications. With snap applications, upgrades via atomic operations or deltas are possible. Load balancing: in computing, load balancing improves the distribution of workloads across multiple computing resources, such as computers, a computer cluster, network links, central processing units, or disk drives. Load balancing aims to optimize resource use, maximize throughput, minimize response time, and avoid overload of any single resource. Using multiple components with load balancing instead of a single component may increase reliability and availability through redundancy. Load balancing usually involves dedicated software or hardware, such as a multilayer switch or a Domain Name System server process. Database schema: the schema of a database system is its structure described in a formal language supported by the database management system (DBMS).
The term "schema" refers to the organization of data as a blueprint of how the database is constructed (divided into database tables in the case of relational databases). The formal definition of a database schema is a set of formulas (sentences) called integrity constraints imposed on a database. In contrast, a schemaless database, such as MongoDB, does not have a fixed data structure; it uses a JSON-style data store, so the data structure can be changed as needed. Semantic data model (SDM): is a high-level semantics-based database description and structuring formalism (database model) for databases. This database model is designed to capture more of the meaning of an application environment than is possible with contemporary database models. An SDM specification describes a database in terms of the kinds of entities that exist in the application environment, the classifications and groupings of those entities, and the structural interconnections among them. SDM provides a collection of high-level modeling primitives to capture the semantics of an application environment. By accommodating derived information in a database structural specification, SDM allows the same information to be viewed in several ways; this makes it possible to directly accommodate the variety of needs and processing requirements typically present in database applications. The design of SDM is based on experience using a preliminary version of it. SDM is designed to enhance the effectiveness and usability of database systems. An SDM database description can serve as a formal specification and documentation tool for a database; it can provide a basis for supporting a variety of powerful user interface facilities; it can serve as a conceptual database model in the database design process; and it can be used as the database model for a new kind of database management system. Metadata: "data [information] that provides information about other data".
Many distinct types of metadata exist, among them descriptive metadata, structural metadata, administrative metadata, reference metadata, and statistical metadata. Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords. Structural metadata is metadata about containers of data and indicates how compound objects are put together, for example, how pages are ordered to form chapters. It describes the types, versions, relationships, and other characteristics of digital materials. Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. Reference metadata describes the contents and quality of statistical data. Data normalization: database normalization is the process of structuring a relational database in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity. It was first proposed by Edgar F. Codd as an integral part of his relational model. Normalization entails organizing the columns (attributes) and tables (relations) of a database to ensure that their dependencies are properly enforced by database integrity constraints. It is accomplished by applying some formal rules, either by a process of synthesis (creating a new database design) or decomposition (improving an existing database design). Sensors: in the broadest definition, a sensor is a device, module, or subsystem whose purpose is to detect events or changes in its environment and send the information to other electronics, frequently a computer processor. A sensor is always used with other electronics. Actuators: an actuator is the mechanism by which a control system acts upon an environment. The control system can be simple (a fixed mechanical or electronic system), software-based (e.g.,
a printer driver or robot control system), a human, or any other input. Gateway: is a piece of networking hardware used in telecommunications networks that allows data to flow from one discrete network to another. Gateways are distinct from routers or switches in that they communicate using more than one protocol and can operate at any of the seven layers of the Open Systems Interconnection (OSI) model. Edge computing: is a distributed computing paradigm that brings computer data storage closer to the location where it is needed. Computation is largely or completely performed on distributed device nodes. Edge computing pushes applications, data, and computing power (services) away from centralized points to locations closer to the user. The target of edge computing is any application or general functionality that needs to be closer to the source of the action, where distributed systems technology interacts with the physical world. Edge computing does not need contact with any centralized cloud, although it may interact with one. Fog computing: or fog networking, is an architecture that uses edge devices to carry out a substantial amount of computation, storage, and communication locally, routed over the Internet backbone. Analog-to-digital converter (ADC, A/D, or A-to-D): in electronics, an ADC is a system that converts an analog signal, such as a sound picked up by a microphone or light entering a digital camera, into a digital signal. An ADC may also provide an isolated measurement, such as an electronic device that converts an input analog voltage or current to a digital number representing the magnitude of the voltage or current. Digital-to-analog converter (DAC, D/A, D2A, or D-to-A): in electronics, a DAC is a system that converts a digital signal into an analog signal.
There are several DAC architectures; the suitability of a DAC for a particular application is determined by figures of merit including resolution, maximum sampling frequency, and others. Application Programming Interface (API): is a set of subroutine definitions, communication protocols, and tools for building software. In general terms, it is a set of clearly defined methods of communication among various components. A good API makes it easier to develop a computer program by providing all the building blocks, which are then put together by the programmer. An API may be for a web-based system, operating system, database system, computer hardware, or software library. An API specification can take many forms, but often includes specifications for routines, data structures, object classes, variables, or remote calls. POSIX, the Windows API, and ASPI are examples of different forms of APIs. Documentation for the API is usually provided to facilitate usage and implementation. AWS Lambda: is a compute service that lets you run code without provisioning or managing servers. AWS Lambda executes your code only when needed and scales automatically, from a few requests per day to thousands per second. You pay only for the compute time you consume; there is no charge when your code is not running. With AWS Lambda, you can run code for virtually any type of application or backend service, all with zero administration. AWS Lambda runs your code on a high-availability compute infrastructure and performs all of the administration of the compute resources, including server and operating system maintenance, capacity provisioning and automatic scaling, code monitoring, and logging. All you need to do is supply your code in one of the languages that AWS Lambda supports.
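To make the AWS Lambda entry concrete: in Python, a Lambda function is simply a module-level handler with the (event, context) signature shown below, where `event` carries the trigger payload and `context` carries runtime metadata. The telemetry fields and the irrigation decision are hypothetical, used only to illustrate the shape of a handler.

```python
import json

def lambda_handler(event, context):
    """Entry point that AWS Lambda invokes for each request.

    `event` is assumed here to be a JSON telemetry message routed from
    an IoT topic (hypothetical payload); `context` exposes runtime
    metadata such as the remaining execution time and is unused here.
    """
    reading = event.get("soil_moisture")
    # Illustrative decision: flag dry soil for irrigation.
    status = "irrigate" if reading is not None and reading < 0.2 else "ok"
    return {"statusCode": 200, "body": json.dumps({"status": status})}
```

Locally, the same function can be exercised by calling it directly, e.g. `lambda_handler({"soil_moisture": 0.1}, None)`, which is also how unit tests for Lambda code are typically written.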
Software development kit (SDK or devkit): is typically a set of software development tools that allows the creation of applications for a certain software package, software framework, hardware platform, computer system, video game console, operating system, or similar development platform. To enrich applications with advanced functionalities, advertisements, push notifications, and more, most app developers implement specific software development kits. Some SDKs are critical for developing a platform-specific app. Firmware: in electronic systems and computing, firmware is a specific class of computer software that provides the low-level control for a device's specific hardware. Firmware can either provide a standardized operating environment for the device's more complex software (allowing more hardware independence) or, for less complex devices, act as the device's complete operating system, performing all control, monitoring, and data manipulation functions. Typical examples of devices containing firmware are embedded systems, consumer appliances, computers, computer peripherals, and others. Almost all electronic devices beyond the simplest contain some firmware. Real-Time Operating System (RTOS): is any operating system (OS) intended to serve real-time applications that process data as it comes in, typically without buffer delays. Processing time requirements (including any OS delay) are measured in tenths of seconds or shorter increments of time. A real-time system is a time-bound system with well-defined, fixed time constraints. Processing must be done within the defined constraints or the system will fail. Machine to machine (commonly abbreviated as M2M): refers to direct communication between devices using any communications channel, including wired and wireless. Machine to machine communication can include industrial instrumentation, enabling a sensor or meter to communicate the data it records (such as temperature, inventory level, etc.)
to application software that can use it (for example, adjusting an industrial process based on temperature or placing orders to replenish inventory). Such communication was originally accomplished by having a remote network of machines relay information back to a central hub for analysis, which would then be rerouted into a system like a personal computer. More recent machine to machine communication has changed into a system of networks that transmits data to personal appliances. The expansion of IP networks around the world has made machine to machine communication quicker and easier while using less power. These networks also allow new business opportunities for consumers and suppliers. Device Shadow Service: maintains a JSON document (a shadow) used to store and retrieve current state information for each device you connect to AWS IoT. You can use the shadow to get and set the state of a device over MQTT or HTTP, regardless of whether the device is connected to the Internet. Each device's shadow is uniquely identified by the name of the corresponding thing. Device provisioning: is the act of supplying each of your connected products with the code and credentials/certificates it needs to uniquely and securely identify itself to your IoT network and operate smoothly on first-time use in a customer's hands. Secure provisioning is an important design consideration for connected product systems. Project Jupyter: is a nonprofit organization created to "develop open-source software, open standards, and services for interactive computing across dozens of programming languages". Spun off from IPython in 2014 by Fernando Pérez, Project Jupyter supports execution environments in several dozen languages. Project Jupyter's name is a reference to the three core programming languages supported by Jupyter, which are Julia, Python, and R.
Project Jupyter has developed and supported the interactive computing products Jupyter Notebook, JupyterHub, and JupyterLab, the next-generation version of Jupyter Notebook. IPv4: Internet Protocol version 4 (IPv4) is the fourth version of the Internet Protocol (IP). It is one of the core protocols of standards-based internetworking methods on the Internet, and was the first version deployed for production in the ARPANET in 1983. It still routes most Internet traffic today, despite the ongoing deployment of a successor protocol, IPv6. IPv4 is described in IETF publication RFC 791 (September 1981), replacing an earlier definition (RFC 760, January 1980). IPv4 is a connectionless protocol for use on packet-switched networks. It operates on a best-effort delivery model, in that it does not guarantee delivery, nor does it assure proper sequencing or avoidance of duplicate delivery. These aspects, including data integrity, are addressed by an upper-layer transport protocol, such as the Transmission Control Protocol (TCP). IPv6: Internet Protocol version 6 (IPv6) is the most recent version of the Internet Protocol (IP), the communications protocol that provides an identification and location system for computers on networks and routes traffic across the Internet. IPv6 was developed by the Internet Engineering Task Force (IETF) to deal with the long-anticipated problem of IPv4 address exhaustion. IPv6 is intended to replace IPv4. In December 1998, IPv6 became a Draft Standard for the IETF, which subsequently ratified it as an Internet Standard on 14 July 2017. Devices on the Internet are assigned a unique IP address for identification and location definition. With the rapid growth of the Internet after commercialization in the 1990s, it became evident that far more addresses would be needed to connect devices than the IPv4 address space had available. By 1998, the IETF had formalized the successor protocol.
IPv6 uses a 128-bit address, theoretically allowing 2^128, or approximately 3.4×10^38, addresses. The actual number is slightly smaller, as multiple ranges are reserved for special use or completely excluded from use. The total number of possible IPv6 addresses is more than 7.9×10^28 times as many as IPv4, which uses 32-bit addresses and provides approximately 4.3 billion addresses. The two protocols are not designed to be interoperable, complicating the transition to IPv6. However, several IPv6 transition mechanisms have been devised to permit communication between IPv4 and IPv6 hosts. TLS: Transport Layer Security, and its now-deprecated predecessor, Secure Sockets Layer (SSL), are cryptographic protocols designed to provide communications security over a computer network. Several versions of the protocols find widespread use in applications such as web browsing, email, instant messaging, and voice over IP (VoIP). Websites can use TLS to secure all communications between their servers and web browsers. The TLS protocol aims primarily to provide privacy and data integrity between two or more communicating computer applications. DNS: the Domain Name System is a hierarchical and decentralized naming system for computers, services, or other resources connected to the Internet or a private network. It associates various information with domain names assigned to each of the participating entities. Most prominently, it translates more readily memorized domain names to the numerical IP addresses needed for locating and identifying computer services and devices with the underlying network protocols. By providing a worldwide, distributed directory service, the Domain Name System has been an essential component of the functionality of the Internet since 1985. The Domain Name System delegates the responsibility of assigning domain names and mapping those names to Internet resources by designating authoritative name servers for each domain.
Network administrators may delegate authority over sub-domains of their allocated name space to other name servers. This mechanism provides distributed and fault-tolerant service and was designed to avoid a single large central database. HTTPS: Hypertext Transfer Protocol Secure is an extension of the Hypertext Transfer Protocol (HTTP). It is used for secure communication over a computer network, and is widely used on the Internet. In HTTPS, the communication protocol is encrypted using Transport Layer Security (TLS), or, formerly, its predecessor, Secure Sockets Layer (SSL). The protocol is therefore also often referred to as HTTP over TLS, or HTTP over SSL. The principal motivation for HTTPS is authentication of the accessed website and protection of the privacy and integrity of the exchanged data while in transit. It protects against man-in-the-middle attacks. The bidirectional encryption of communications between a client and server protects against eavesdropping and tampering of the communication. Get/Post methods: In computing, POST is a request method supported by HTTP used by the World Wide Web. By design, the POST request method requests that a web server accepts the data enclosed in the body of the request message, most likely for storing it. It is often used when uploading a file or when submitting a completed web form. In contrast, the HTTP GET request method retrieves information from the server. As part of a GET request, some data can be passed within the URL's query string, specifying (for example) search terms, date ranges, or other information that defines the query. As part of a POST request, an arbitrary amount of data of any type can be sent to the server in the body of the request message. A header field in the POST request usually indicates the message body's Internet media type. REST (Representational State Transfer): is a software architectural style that defines a set of constraints to be used for creating Web services. 
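The contrast between GET and POST described above can be made concrete by constructing the two raw HTTP/1.1 request messages. The sketch below is illustrative only: the host name and form fields are invented, and real clients would use a library rather than build messages by hand.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields for a query (illustrative values).
params = {"q": "soil moisture", "from": "2019-01-01"}

# GET: the data travels in the URL's query string.
query = urlencode(params)
get_request = (
    f"GET /search?{query} HTTP/1.1\r\n"
    "Host: example.org\r\n"
    "\r\n"
)

# POST: the data travels in the message body, with a header field
# declaring the body's Internet media type.
body = urlencode(params)
post_request = (
    "POST /search HTTP/1.1\r\n"
    "Host: example.org\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(body)}\r\n"
    "\r\n"
    + body
)

# Either way, the server can recover the same key-value pairs.
decoded = parse_qs(body)
```

Note that the GET request carries no body at all, while the POST request could carry a payload of any size and media type.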
Web services that conform to the REST architectural style, termed RESTful Web services (RWS), provide interoperability between computer systems on the Internet. RESTful Web services allow the requesting systems to access and manipulate textual representations of Web resources by using a uniform and predefined set of stateless operations. Other kinds of Web services, such as SOAP Web services, expose their own arbitrary sets of operations. WebSockets: WebSocket is a computer communications protocol, providing full-duplex communication channels over a single TCP connection. The WebSocket protocol was standardized by the IETF as RFC 6455 in 2011, and the WebSocket API in Web IDL is being standardized by the W3C. WebSocket is a different protocol from HTTP. Both protocols are located at layer 7 in the OSI model and depend on TCP at layer 4. Although they are different, RFC 6455 states that WebSocket "is designed to work over HTTP ports 80 and 443 as well as to support HTTP proxies and intermediaries" thus making it compatible with the HTTP protocol. To achieve compatibility, the WebSocket handshake uses the HTTP Upgrade header to change from the HTTP protocol to the WebSocket protocol. The WebSocket protocol enables interaction between a web browser (or other client application) and a web server with lower overheads, facilitating real-time data transfer from and to the server. This is made possible by providing a standardized way for the server to send content to the client without being first requested by the client, and allowing messages to be passed back and forth while keeping the connection open. In this way, a two-way ongoing conversation can take place between the client and the server. The communications are done over TCP port number 80 (or 443 in the case of TLS-encrypted connections), which is of benefit for those environments which block non-web Internet connections using a firewall. 
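As a concrete illustration of the Upgrade handshake mentioned above, the server proves it understood the WebSocket request by hashing the client's Sec-WebSocket-Key together with a fixed GUID defined in RFC 6455, and returning the result base64-encoded. The sketch below reproduces that computation using the sample key given in the RFC itself:

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the WebSocket handshake.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header value for a handshake."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")

# Sample Sec-WebSocket-Key from RFC 6455, section 1.3.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

The client checks the returned Sec-WebSocket-Accept value against its own computation before treating the connection as an open WebSocket.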
Most browsers support the protocol, including Google Chrome, Microsoft Edge, Internet Explorer, Firefox, Safari and Opera. XMPP: Extensible Messaging and Presence Protocol (XMPP) is a communication protocol for message-oriented middleware based on XML (Extensible Markup Language). It enables the near-real-time exchange of structured yet extensible data between any two or more network entities. Originally named Jabber, the protocol was developed by the eponymous open-source community in 1999 for near real-time instant messaging (IM), presence information, and contact list maintenance. Designed to be extensible, the protocol has also been used for publish-subscribe systems, signalling for VoIP, video, file transfer, gaming, Internet of Things (IoT) applications such as the smart grid, and social networking services. Unlike most instant messaging protocols, XMPP is defined in an open standard and uses an open systems approach of development and application, by which anyone may implement an XMPP service and interoperate with other organizations' implementations. Because XMPP is an open protocol, implementations can be developed using any software license, and many server, client, and library implementations are distributed as free and open-source software. Numerous freeware and commercial software implementations also exist. CoAP (Constrained Application Protocol): Constrained Application Protocol (CoAP) is a specialized Internet Application Protocol for constrained devices, as defined in RFC 7252. It enables those constrained devices, called "nodes", to communicate with the wider Internet using similar protocols. CoAP is designed for use between devices on the same constrained network (e.g., low-power, lossy networks), between devices and general nodes on the Internet, and between devices on different constrained networks both joined by an internet. CoAP is also being used via other mechanisms, such as SMS on mobile communication networks. 
CoAP is a service layer protocol that is intended for use in resource-constrained internet devices, such as wireless sensor network nodes. CoAP is designed to easily translate to HTTP for simplified integration with the web, while also meeting specialized requirements such as multicast support, very low overhead, and simplicity. Multicast, low overhead, and simplicity are extremely important for Internet of Things (IoT) and Machine-to-Machine (M2M) devices, which tend to be deeply embedded and have much less memory and power supply than traditional internet devices have. Therefore, efficiency is very important. CoAP can run on most devices that support UDP or a UDP analogue. The Internet Engineering Task Force (IETF) Constrained RESTful Environments Working Group (CoRE) has done the major standardization work for this protocol. In order to make the protocol suitable to IoT and M2M applications, various new functionalities have been added. The core of the protocol is specified in RFC 7252; important extensions are in various stages of the standardization process. MQTT (Message Queue Telemetry Transport): is an ISO standard (ISO/IEC PRF 20922) publish-subscribe-based messaging protocol. It works on top of the TCP/IP protocol. It is designed for connections with remote locations where a "small code footprint" is required or the network bandwidth is limited. The publish-subscribe messaging pattern requires a message broker. AMQP (Advanced Message Queuing Protocol): is an open standard application layer protocol for message-oriented middleware. The defining features of AMQP are message orientation, queuing, routing (including point-to-point and publish-and-subscribe), reliability and security. AMQP mandates the behavior of the messaging provider and client to the extent that implementations from different vendors are interoperable, in the same way as SMTP, HTTP, FTP, etc. have created interoperable systems. 
Previous standardizations of middleware have happened at the API level (e.g. JMS) and were focused on standardizing programmer interaction with different middleware implementations, rather than on providing interoperability between multiple implementations. Unlike JMS, which defines an API and a set of behaviors that a messaging implementation must provide, AMQP is a wire-level protocol. A wire-level protocol is a description of the format of the data that is sent across the network as a stream of bytes. Consequently, any tool that can create and interpret messages that conform to this data format can interoperate with any other compliant tool irrespective of implementation language. AMQP is a binary, application layer protocol, designed to efficiently support a wide variety of messaging applications and communication patterns. It provides flow-controlled, message-oriented communication with message-delivery guarantees such as at-most-once (where each message is delivered once or never), at-least-once (where each message is certain to be delivered, but may be delivered multiple times) and exactly-once (where the message will arrive exactly once), and authentication and/or encryption based on SASL and/or TLS. It assumes an underlying reliable transport layer protocol such as Transmission Control Protocol (TCP). OPC UA: OPC Unified Architecture is a machine to machine communication protocol for industrial automation developed by the OPC Foundation. Distinguishing characteristics are: a focus on communicating with industrial equipment and systems for data collection and control; openness (freely available and implementable under a GPL 2.0 license); cross-platform design, not tied to one operating system or programming language; a service-oriented architecture (SOA); inherent complexity; robust security; and an integral information model. 
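The publish-subscribe pattern that MQTT and AMQP build on can be sketched with a toy in-memory broker. This illustrates only the messaging pattern (the class, topic names, and payload are invented for the example), not either protocol's wire format or delivery guarantees:

```python
from collections import defaultdict

class Broker:
    """Toy in-memory broker illustrating the publish-subscribe pattern."""

    def __init__(self):
        self._subs = defaultdict(list)  # topic -> list of subscriber callbacks

    def subscribe(self, topic, callback):
        self._subs[topic].append(callback)

    def publish(self, topic, message):
        # Deliver to every subscriber of the topic; the publisher never
        # addresses receivers directly -- the broker decouples them.
        for callback in self._subs[topic]:
            callback(topic, message)

broker = Broker()
received = []
broker.subscribe("farm/field1/soil", lambda t, m: received.append((t, m)))
broker.publish("farm/field1/soil", {"moisture": 0.31})
```

The decoupling shown here is what makes the pattern attractive for constrained IoT devices: a sensor only needs to reach the broker, never its eventual consumers.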
OMA-DM: OMA Device Management is a device management protocol specified by the Open Mobile Alliance (OMA) Device Management (DM) Working Group and the Data Synchronization (DS) Working Group. Device Management refers to the management of Device configuration and other managed objects of Devices from the point of view of the Management Authorities. Device Management includes, but is not restricted to, setting initial configuration information in Devices, subsequent updates of persistent information in Devices, retrieval of management information from Devices, execution of primitives on Devices, and processing of events and alarms generated by Devices. Device Management allows network operators, service providers or corporate information management departments to carry out the procedures of configuring devices on behalf of the end user (customer). OMA DM Version 2.0 reuses the Management Objects which are designed for OMA DM Version 1.3 or earlier DM Protocols. OMA DM Version 2.0 introduces the new Client-Server DM protocol and a new user interaction method for Device Management using the Web Browser Component. White spaces: In telecommunications, white spaces refer to frequencies allocated to a broadcasting service but not used locally. National and international bodies assign different frequencies for specific uses, and in most cases license the rights to broadcast over these frequencies. This frequency allocation process creates a band plan, which for technical reasons assigns white space between used radio bands or channels to avoid interference. In this case, while the frequencies are unused, they have been specifically assigned for a purpose, such as a guard band. Most commonly, however, these white spaces exist naturally between used channels, since assigning nearby transmissions to immediately adjacent channels will cause destructive interference to both. 
In addition to white space assigned for technical reasons, there is also unused radio spectrum which has either never been used, or is becoming free as a result of technical changes. In particular, the switchover to digital television frees up large areas between about 50 MHz and 700 MHz. This is because digital transmissions can be packed into adjacent channels, while analog ones cannot. This means that the band can be "compressed" into fewer channels, while still allowing for more transmissions. In the United States, the abandoned television frequencies are primarily in the upper UHF "700-megahertz" band, covering TV channels 52 to 69 (698 to 806 MHz). U.S. television and its white spaces will continue to exist in UHF frequencies, as well as VHF frequencies for which mobile users and white-space devices require larger antennas. In the rest of the world, the abandoned television channels are VHF, and the resulting large VHF white spaces are being reallocated for the worldwide (except the U.S.) digital radio standard DAB and DAB+, and DMB. White Space Internet: White Space Internet uses a part of the radio spectrum known as white spaces. This frequency range is created when there are gaps between television channels. These spaces can provide broadband internet access that is similar to that of 4G mobile. WiFi: is a technology for radio wireless local area networking of devices based on the IEEE 802.11 standards. Wi-Fi is a trademark of the Wi-Fi Alliance. Wi-Fi compatible devices can connect to the Internet via a WLAN and a wireless access point. Such an access point (or hotspot) has a range of about 20 meters (66 feet) indoors and a greater range outdoors. Hotspot coverage can be as small as a single room with walls that block radio waves, or as large as many square kilometers, achieved by using multiple overlapping access points. Different versions of Wi-Fi exist, with different ranges, radio bands and speeds. 
Wi-Fi most commonly uses the 2.4 gigahertz UHF and 5 gigahertz SHF ISM radio bands; these bands are subdivided into multiple channels. Each channel can be time-shared by multiple networks. These wavelengths work best for line-of-sight. IEEE 802.11af: also referred to as White-Fi and Super Wi-Fi, is a wireless computer networking standard in the 802.11 family that allows wireless local area network (WLAN) operation in TV white space spectrum in the VHF and UHF bands between 54 and 790 MHz. The standard was approved in February 2014. Cognitive radio technology is used to transmit on unused portions of TV channel band allocations, with the standard taking measures to limit interference for primary users, such as analog TV, digital TV, and wireless microphones. IEEE 802.11ah: is a wireless networking protocol published in 2017 to be called Wi-Fi HaLow (pronounced "HEY-Low") as an amendment of the IEEE 802.11-2007 wireless networking standard. It uses 900 MHz license-exempt bands to provide extended range Wi-Fi networks, compared to conventional Wi-Fi networks operating in the 2.4 GHz and 5 GHz bands. It also benefits from lower energy consumption, allowing the creation of large groups of stations or sensors that cooperate to share signals, supporting the concept of the Internet of Things (IoT). The protocol's low power consumption competes with Bluetooth and has the added benefit of higher data rates and wider coverage range. IEEE 802.15.4: is a technical standard which defines the operation of low-rate wireless personal area networks (LR-WPANs). It specifies the physical layer and media access control for LR-WPANs, and is maintained by the IEEE 802.15 working group, which defined the standard in 2003. It is the basis for the Zigbee, ISA100.11a, WirelessHART, MiWi, 6LoWPAN, Thread and SNAP specifications, each of which further extends the standard by developing the upper layers which are not defined in IEEE 802.15.4. 
In particular, 6LoWPAN defines a binding for the IPv6 version of the Internet Protocol (IP) over WPANs, and is itself used by upper layers like Thread. 6LoWPAN: is an acronym of IPv6 over Low-Power Wireless Personal Area Networks. 6LoWPAN is the name of a concluded working group in the Internet area of the IETF. The 6LoWPAN concept originated from the idea that "the Internet Protocol could and should be applied even to the smallest devices," and that low-power devices with limited processing capabilities should be able to participate in the Internet of Things. The 6LoWPAN group has defined encapsulation and header compression mechanisms that allow IPv6 packets to be sent and received over IEEE 802.15.4-based networks. IPv4 and IPv6 are the workhorses for data delivery for local-area networks, metropolitan area networks, and wide-area networks such as the Internet. Likewise, IEEE 802.15.4 devices provide sensing and communication capability in the wireless domain. Thread: is an IPv6-based, low-power mesh networking technology for IoT products, intended to be secure and future-proof. The Thread protocol specification is available at no cost; however, this requires agreement and continued adherence to an EULA. In July 2014, the "Thread Group" alliance was announced, which is a working group with the companies Nest Labs (a subsidiary of Alphabet/Google), Samsung, ARM Holdings, Qualcomm, NXP Semiconductors/Freescale, Silicon Labs, Big Ass Solutions, Somfy, OSRAM, Tyco International, and the lock company Yale in an attempt to have Thread become the industry standard by providing Thread certification for products. Thread uses 6LoWPAN, which in turn uses the IEEE 802.15.4 wireless protocol with mesh communication, as does Zigbee and other systems. Thread however is IP-addressable, with cloud access and AES encryption. A BSD licensed open-source implementation of Thread (called "OpenThread") has also been released by Nest. 
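To make the 6LoWPAN idea concrete, the sketch below derives an IPv6 link-local address for an 802.15.4 node from its 64-bit hardware identifier (EUI-64), using the standard modified-EUI-64 rule of flipping the universal/local bit. The example EUI-64 value is invented, and real 6LoWPAN stacks additionally apply header compression not shown here.

```python
import ipaddress

def link_local_from_eui64(eui64: bytes) -> ipaddress.IPv6Address:
    """Build a link-local IPv6 address (fe80::/64) from an EUI-64."""
    assert len(eui64) == 8
    # Modified EUI-64: invert the universal/local bit of the first byte.
    iid = bytes([eui64[0] ^ 0x02]) + eui64[1:]
    # Prepend the fe80::/64 link-local prefix to the interface identifier.
    return ipaddress.IPv6Address((0xFE80 << 112) | int.from_bytes(iid, "big"))

# Invented EUI-64 of a hypothetical 802.15.4 sensor node.
addr = link_local_from_eui64(bytes.fromhex("00124b0001020304"))
```

Deriving the interface identifier from hardware this way is what lets constrained nodes configure themselves without a DHCP server.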
WiHART (WirelessHART): is a wireless sensor networking technology based on the Highway Addressable Remote Transducer Protocol (HART). Developed as a multi-vendor, interoperable wireless standard, WirelessHART was defined for the requirements of process field device networks. The protocol utilizes a time-synchronized, self-organizing, and self-healing mesh architecture. The protocol supports operation in the 2.4 GHz ISM band using IEEE 802.15.4 standard radios. The underlying wireless technology is based on the work of Dust Networks' TSMP technology. Zigbee: is an IEEE 802.15.4-based specification for a suite of high-level communication protocols used to create personal area networks with small, low-power digital radios, such as for home automation, medical device data collection, and other low-power low-bandwidth needs, designed for small-scale projects that need a wireless connection. Hence, Zigbee is a low-power, low data rate, and close proximity (i.e., personal area) wireless ad hoc network. The technology defined by the Zigbee specification is intended to be simpler and less expensive than other wireless personal area networks (WPANs), such as Bluetooth, or more general wireless networking such as Wi-Fi. Applications include wireless light switches, home energy monitors, traffic management systems, and other consumer and industrial equipment that requires short-range low-rate wireless data transfer. Its low power consumption limits transmission distances to 10–100 meters line-of-sight, depending on power output and environmental characteristics. Zigbee devices can transmit data over long distances by passing data through a mesh network of intermediate devices to reach more distant ones. Zigbee is typically used in low data rate applications that require long battery life and secure networking (Zigbee networks are secured by 128-bit symmetric encryption keys). 
Zigbee has a defined rate of 250 kbit/s, best suited for intermittent data transmissions from a sensor or input device. Zigbee was conceived in 1998, standardized in 2003, and revised in 2006. The name refers to the waggle dance of honey bees after their return to the beehive. Z-Wave: is a wireless communications protocol used primarily for home automation. It is a mesh network using low-energy radio waves to communicate from appliance to appliance, allowing for wireless control of residential appliances and other devices, such as lighting control, security systems, thermostats, windows, locks, swimming pools and garage door openers. Like other protocols and systems aimed at the home and office automation market, Z-Wave provides the application layer interoperability between home control systems of different manufacturers that are a part of its alliance. There are a growing number of interoperable Z-Wave products; over 1,700 in 2017, and over 2,600 by 2019. LoRa (Long Range): is a patented digital wireless data communication technology developed by Cycleo of Grenoble, France, and acquired by Semtech in 2012. LoRa is a long-range wireless communication protocol that competes against other low-power wide-area network (LPWAN) technologies such as narrowband IoT (NB-IoT) or LTE Cat M1. Compared to those, LoRa achieves its extremely long-range connectivity, possibly 10 km or more, by trading off data rate. Because its data rates are below 50 kbit/s and because LoRa is limited by duty cycle and other restrictions, it is suitable in practice for non-real time applications in which one can tolerate delays. LoRaWAN: Since LoRa defines the lower physical layer, the upper networking layers were lacking. LoRaWAN was developed to define the upper layers of the network. 
LoRaWAN is a media access control (MAC) layer protocol but acts mainly as a network layer protocol for managing communication between LPWAN gateways and end-node devices as a routing protocol, maintained by the LoRa Alliance. Version 1.0 of the LoRaWAN specification was released in June 2015. LoRaWAN defines the communication protocol and system architecture for the network, while the LoRa physical layer enables the long-range communication link. LoRaWAN is also responsible for managing the communication frequencies, data rate, and power for all devices. Devices in the network are asynchronous and transmit when they have data available to send. Data transmitted by an end-node device is received by multiple gateways, which forward the data packets to a centralized network server. The network server filters duplicate packets, performs security checks, and manages the network. Data are then forwarded to application servers. The technology shows high reliability under moderate load; however, it has some performance issues related to sending acknowledgements. Sigfox: is a French global network operator founded in 2009 that builds wireless networks to connect low-power objects such as electricity meters and smartwatches, which need to be continuously on and emitting small amounts of data. Sigfox employs differential binary phase-shift keying (DBPSK) and Gaussian frequency-shift keying (GFSK), enabling communication in the Industrial, Scientific and Medical (ISM) radio band, which uses 868 MHz in Europe and 902 MHz in the US. It utilizes a wide-reaching signal that passes freely through solid objects, called "Ultra Narrowband", and requires little energy, being termed a "Low-power Wide-area network (LPWAN)". The network is based on a one-hop star topology and requires a mobile operator to carry the generated traffic. The signal can also be used to easily cover large areas and to reach underground objects. 
The ISM radio bands support limited bidirectional communication. The existing standard for Sigfox communications supports up to 140 uplink messages a day, each of which can carry a payload of 12 octets at a data rate of up to 100 bits per second. Symphony Link: is a wireless solution for enterprise and industrial customers who need to securely connect their IoT devices to the cloud. It allows the network range to be expanded using power-efficient repeaters without impacting latency; it acknowledges all uplink and downlink messages to ensure successful transmission from devices; it manages frequencies, time slots, node privileges, and throughput to ensure QoS; and it economizes on resources by enabling security patches, new features, and bug fixes to be delivered without physical, human attention. Weightless: is a set of LPWAN open wireless technology standards for exchanging data between a base station and thousands of machines around it. These technologies allow developers to build Low-Power Wide-Area Networks (LPWAN). In an initiative reflecting the strong market traction of Weightless-P, the Weightless SIG has renamed the technology simply “Weightless” and has made it its core focus moving forward. Weightless hardware was first released by Ubiik Inc in July 2017 and since then the ecosystem has grown to over 100 companies spread over 40 countries. Originally, there were three published Weightless connectivity standards: Weightless-P, Weightless-N and Weightless-W. Weightless-N was an uplink-only LPWAN technology. Weightless-W was designed to operate in the TV white space. Weightless (formerly Weightless-P) prevailed, with its truly bidirectional, narrowband technology designed to operate in global licensed and unlicensed ISM frequencies. Weightless is managed by the Weightless SIG, or Special Interest Group. The intention is that devices must be qualified by the Weightless Special Interest Group to standards defined by the SIG. 
Patents would only be licensed to those qualifying devices; thus the protocol, whilst open, may be regarded as proprietary. General Packet Radio Service (GPRS): is a packet-oriented mobile data standard on the 2G and 3G cellular communication network's global system for mobile communications (GSM). GPRS was established by the European Telecommunications Standards Institute (ETSI) in response to the earlier CDPD and i-mode packet-switched cellular technologies. It is now maintained by the 3rd Generation Partnership Project (3GPP). GPRS is typically sold according to the total volume of data transferred during the billing cycle, in contrast with circuit switched data, which is usually billed per minute of connection time, or sometimes by one-third minute increments. Usage above the GPRS bundled data cap may be charged per MB of data, speed limited, or disallowed. GPRS is a best-effort service, implying variable throughput and latency that depend on the number of other users sharing the service concurrently, as opposed to circuit switching, where a certain quality of service (QoS) is guaranteed during the connection. Narrowband IoT (NB-IoT): is a Low Power Wide Area Network (LPWAN) radio technology standard developed by 3GPP to enable a wide range of cellular devices and services. The specification was frozen in 3GPP Release 13 (LTE Advanced Pro), in June 2016. Other 3GPP IoT technologies include eMTC (enhanced Machine-Type Communication) and EC-GSM-IoT. NB-IoT focuses specifically on indoor coverage, low cost, long battery life, and high connection density. NB-IoT uses a subset of the LTE standard, but limits the bandwidth to a single narrowband of 200 kHz. It uses OFDM modulation for downlink communication and SC-FDMA for uplink communications. In March 2019, the Global Mobile Suppliers Association announced that over 100 operators have deployed/launched either NB-IoT or LTE-M networks. 2G (or 2-G): is short for second-generation cellular technology. 
Second-generation 2G cellular networks were commercially launched on the GSM standard in Finland by Radiolinja (now part of Elisa Oyj) in 1991. Three primary benefits of 2G networks over their predecessors were that phone conversations were digitally encrypted; 2G systems were significantly more efficient on the spectrum, enabling far greater wireless penetration levels; and 2G introduced data services for mobile, starting with SMS text messages. 2G technologies enabled the various networks to provide services such as text messages, picture messages, and MMS (multimedia messages). All text messages sent over 2G are digitally encrypted, allowing the transfer of data in such a way that only the intended receiver can receive and read it. After 2G was launched, the previous mobile wireless network systems were retroactively dubbed 1G. While radio signals on 1G networks are analog, radio signals on 2G networks are digital. Both systems use digital signaling to connect the radio towers (which listen to the devices) to the rest of the mobile system. With General Packet Radio Service (GPRS), 2G offers a theoretical maximum transfer speed of 50 kbit/s (40 kbit/s in practice). With EDGE (Enhanced Data Rates for GSM Evolution), there is a theoretical maximum transfer speed of 1 Mbit/s (500 kbit/s in practice). 3G: short for third generation, is the third generation of wireless mobile telecommunications technology. It is the upgrade to 2G and 2.5G GPRS networks, offering faster internet speeds. It is based on a set of standards for mobile devices and mobile telecommunications services and networks that comply with the International Mobile Telecommunications-2000 (IMT-2000) specifications by the International Telecommunication Union. 3G finds application in wireless voice telephony, mobile Internet access, fixed wireless Internet access, video calls and mobile TV. 
3G telecommunication networks support services that provide an information transfer rate of at least 0.2 Mbit/s. Later 3G releases, often denoted 3.5G and 3.75G, also provide mobile broadband access of several Mbit/s to smartphones and mobile modems in laptop computers. This ensures it can be applied to wireless voice telephony, mobile Internet access, fixed wireless Internet access, video calls and mobile TV technologies. The first 3G networks were introduced in 1998. 4G: is the fourth generation of broadband cellular network technology, succeeding 3G. A 4G system must provide capabilities defined by the ITU in IMT Advanced. Potential and current applications include amended mobile web access, IP telephony, gaming services, high-definition mobile TV, video conferencing, and 3D television. The first-release Long Term Evolution (LTE) standard was commercially deployed in Oslo, Norway, and Stockholm, Sweden in 2009, and has since been deployed throughout most parts of the world. 5G: (short for 5th Generation) is a commonly used term for certain advanced wireless systems. Industry association 3GPP defines any system using "5G NR" (5G New Radio) software as "5G", a definition that came into general use by late 2018. Others may reserve the term for systems that meet the requirements of the ITU IMT-2020, which represents more nations. 3GPP will submit their 5G NR to the ITU. It follows 2G, 3G and 4G and their respective associated technologies (like GSM, UMTS, LTE, LTE Advanced Pro, etc.). The first fairly substantial deployments were in April 2019. In South Korea, SK Telecom claimed 38,000 base stations, KT Corporation 30,000 and LG U Plus 18,000. 85% are in six major cities. They are using 3.5 GHz (sub-6) spectrum and tested speeds were from 193 to 430 Mbit/s down. All carriers use Samsung base stations and equipment. 
Verizon opened service on a very limited number of base stations in the US cities of Chicago and Minneapolis using 400 MHz of 28 GHz millimeter wave spectrum. Download speeds in Chicago were from 80 to 634 Mbit/s. Upload speeds were from 12 to 57 Mbit/s. Ping was 25 milliseconds. There are only 5 companies in the world offering 5G radio hardware and complete systems: Huawei, ZTE, Nokia, Samsung, and Ericsson. LTE-M or LTE-MTC (Machine Type Communication): a type of low power wide area network (LPWAN) radio technology standard, which includes eMTC (enhanced Machine Type Communication), developed by 3GPP to enable a wide range of cellular devices and services (specifically, machine-to-machine and Internet of Things applications). The specification for eMTC (LTE Cat-M1) was frozen in 3GPP Release 13 (LTE Advanced Pro), in June 2016. The advantage of LTE-M over NB-IoT is its comparatively higher data rate, mobility, and voice over the network, but it requires more bandwidth, is more costly, and cannot yet be deployed in the guard band. Bluetooth: Bluetooth is a wireless technology standard for exchanging data between fixed and mobile devices over short distances using short-wavelength UHF radio waves in the industrial, scientific and medical radio bands, from 2.4 to 2.485 GHz, and building personal area networks (PANs). It was originally conceived as a wireless alternative to RS-232 data cables. Bluetooth is managed by the Bluetooth Special Interest Group (SIG), which has more than 30,000 member companies in the areas of telecommunication, computing, networking, and consumer electronics. The IEEE standardized Bluetooth as IEEE 802.15.1, but no longer maintains the standard. The Bluetooth SIG oversees development of the specification, manages the qualification program, and protects the trademarks. A manufacturer must meet Bluetooth SIG standards to market a device as a Bluetooth device. 
BTLE: Bluetooth Low Energy (Bluetooth LE, colloquially BLE, formerly marketed as Bluetooth Smart) is a wireless personal area network technology designed and marketed by the Bluetooth Special Interest Group (Bluetooth SIG), aimed at novel applications in the healthcare, fitness, beacons, security, and home entertainment industries. Compared to Classic Bluetooth, Bluetooth Low Energy is intended to provide considerably reduced power consumption and cost while maintaining a similar communication range. Mobile operating systems including iOS, Android, Windows Phone, and BlackBerry, as well as macOS, Linux, and Windows, natively support Bluetooth Low Energy.

Ethernet: is a family of computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 1983 as IEEE 802.3, and has since retained a good deal of backward compatibility while being refined to support higher bit rates and longer link distances. Over time, Ethernet has largely replaced competing wired LAN technologies such as Token Ring, FDDI and ARCNET. The original 10BASE5 Ethernet uses coaxial cable as a shared medium, while the newer Ethernet variants use twisted-pair and fiber-optic links in conjunction with switches. Over the course of its history, Ethernet data transfer rates have increased from the original 2.94 megabits per second (Mbit/s) to the latest 400 gigabits per second (Gbit/s). The Ethernet standards comprise several wiring and signaling variants of the OSI physical layer. Systems communicating over Ethernet divide a stream of data into shorter pieces called frames. Each frame contains source and destination addresses and error-checking data so that damaged frames can be detected and discarded; most often, higher-layer protocols trigger retransmission of lost frames.
As per the OSI model, Ethernet provides services up to and including the data link layer. Features such as the 48-bit MAC address and the Ethernet frame format have influenced other networking protocols, including Wi-Fi wireless networking technology. Ethernet is widely used in homes and industry. The Internet Protocol is commonly carried over Ethernet, so Ethernet is considered one of the key technologies that make up the Internet.

PoE: Power over Ethernet describes any of several standard or ad-hoc systems that pass electric power along with data over twisted-pair Ethernet cabling. This allows a single cable to provide both a data connection and electric power to devices such as wireless access points, IP cameras, and VoIP phones. There are several common techniques for transmitting power over Ethernet cabling; three of them have been standardized by IEEE 802.3 since 2003. These standards are known as Alternative A, Alternative B, and 4PPoE.

RFID: Radio-frequency identification uses electromagnetic fields to automatically identify and track tags attached to objects. The tags contain electronically stored information. Passive tags collect energy from a nearby RFID reader's interrogating radio waves, while active tags have a local power source (such as a battery) and may operate hundreds of meters from the RFID reader. Unlike a barcode, the tag need not be within the line of sight of the reader, so it may be embedded in the tracked object. RFID is one method of automatic identification and data capture (AIDC). RFID tags are used in many industries. For example, an RFID tag attached to an automobile during production can be used to track its progress through the assembly line; RFID-tagged pharmaceuticals can be tracked through warehouses; and implanting RFID microchips in livestock and pets enables positive identification of animals.
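To illustrate how the PoE variants above differ in practice, the following Python sketch encodes the commonly quoted maximum power budgets, both at the power sourcing equipment (PSE) and at the powered device (PD) after cable losses, and picks the least-capable variant that covers a given load. The wattage figures are indicative only; consult the relevant IEEE 802.3 clauses for exact per-class values.

```python
# Indicative power budgets (watts) for the IEEE 802.3 PoE variants.
# "pse" is power sourced at the equipment; "pd" is power guaranteed at
# the powered device after worst-case cable losses. Figures are the
# commonly quoted maxima, not authoritative per-class values.
POE_BUDGETS_W = {
    "802.3af (Type 1)":        {"pse": 15.4, "pd": 12.95},
    "802.3at (Type 2)":        {"pse": 30.0, "pd": 25.5},
    "802.3bt (Type 3)":        {"pse": 60.0, "pd": 51.0},
    "802.3bt (Type 4, 4PPoE)": {"pse": 90.0, "pd": 71.3},
}

def smallest_poe_variant(pd_load_w: float) -> str:
    """Return the least-capable PoE variant whose PD budget covers the load."""
    for name, budget in POE_BUDGETS_W.items():  # dicts preserve insertion order
        if pd_load_w <= budget["pd"]:
            return name
    raise ValueError(f"{pd_load_w} W exceeds every standard PoE budget")

print(smallest_poe_variant(10.0))  # a modest IP camera fits 802.3af (Type 1)
print(smallest_poe_variant(20.0))  # a heavier load needs 802.3at (Type 2)
```

The design choice here (sizing against the PD budget, not the PSE budget) matters in the field: the PSE figure includes power that is lost in the cable and never reaches the device.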
I2C: (Inter-Integrated Circuit), pronounced I-squared-C, is a synchronous, multi-master, multi-slave, packet-switched, single-ended, serial computer bus invented in 1982 by Philips Semiconductor (now NXP Semiconductors). It is widely used for attaching lower-speed peripheral ICs to processors and microcontrollers in short-distance, intra-board communication.

SPI: The Serial Peripheral Interface is a synchronous serial communication interface specification used for short-distance communication, primarily in embedded systems. The interface was developed by Motorola in the mid-1980s and has become a de facto standard. Typical applications include Secure Digital cards and liquid crystal displays. SPI devices communicate in full-duplex mode using a master-slave architecture with a single master. The master device originates the frame for reading and writing, and multiple slave devices are supported through selection with individual slave select (SS) lines, sometimes called chip select (CS) lines. SPI is sometimes called a four-wire serial bus, in contrast with three-, two-, and one-wire serial buses. SPI may be accurately described as a synchronous serial interface, but it is different from the Synchronous Serial Interface (SSI) protocol, which is also a four-wire synchronous serial communication protocol; the SSI protocol employs differential signaling and provides only a single simplex communication channel.

GPIO: A general-purpose input/output (GPIO) is an uncommitted digital signal pin on an integrated circuit or electronic circuit board whose behavior (including whether it acts as input or output) is controllable by the user at run time. GPIOs have no predefined purpose and are unused by default. If used, the purpose and behavior of a GPIO is defined and implemented by the designer of higher assembly-level circuitry: the circuit-board designer in the case of integrated-circuit GPIOs, or the system integrator in the case of board-level GPIOs.
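As a concrete illustration of I2C addressing, the sketch below shows how the first byte of every transaction is formed: the 7-bit slave address shifted left by one, with the least significant bit selecting the direction (0 for write, 1 for read). The device address 0x48 is only an example, typical of common temperature sensors; on real hardware a driver library such as smbus2 builds this byte for you.

```python
# Conceptual sketch of I2C address framing. After the START condition, the
# master transmits one byte: the 7-bit slave address in the upper 7 bits
# and the read/write flag in the least significant bit.

READ, WRITE = 1, 0  # direction bit values defined by the I2C bus protocol

def i2c_address_byte(addr7: int, direction: int) -> int:
    """Combine a 7-bit I2C address with the R/W bit into the wire byte."""
    if not 0 <= addr7 <= 0x7F:
        raise ValueError("I2C addresses are 7 bits (0x00..0x7F)")
    return (addr7 << 1) | (direction & 1)

print(hex(i2c_address_byte(0x48, WRITE)))  # 0x90: address 0x48, write
print(hex(i2c_address_byte(0x48, READ)))   # 0x91: address 0x48, read
```

This is why datasheets sometimes quote "0x90/0x91" for a part whose 7-bit address is 0x48: both notations describe the same device.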
ODBC: In computing, Open Database Connectivity (ODBC) is a standard application programming interface (API) for accessing database management systems (DBMS). The designers of ODBC aimed to make it independent of database systems and operating systems. An application written using ODBC can be ported to other platforms, both on the client and server side, with few changes to the data access code. ODBC accomplishes DBMS independence by using an ODBC driver as a translation layer between the application and the DBMS. The application uses ODBC functions through an ODBC driver manager with which it is linked, and the driver passes the query to the DBMS. An ODBC driver can be thought of as analogous to a printer driver or other driver, providing a standard set of functions for the application to use, and implementing DBMS-specific functionality.

JDBC: Java Database Connectivity is an application programming interface (API) for the programming language Java, which defines how a client may access a database. It is a Java-based data access technology used for Java database connectivity, and is part of the Java Standard Edition platform from Oracle Corporation. It provides methods to query and update data in a database, and is oriented towards relational databases. A JDBC-to-ODBC bridge enables connections to any ODBC-accessible data source in the Java virtual machine (JVM) host environment.

SQL: or Structured Query Language is a domain-specific language used in programming and designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). It is particularly useful in handling structured data where there are relations between different entities/variables of the data. SQL offers two main advantages over older read/write APIs like ISAM or VSAM.
First, it introduced the concept of accessing many records with a single command; second, it eliminated the need to specify how to reach a record, e.g., with or without an index.

JSON: In computing, JavaScript Object Notation (JSON) is an open-standard file format that uses human-readable text to transmit data objects consisting of attribute–value pairs and array data types (or any other serializable value). It is a very common data format used for asynchronous browser–server communication, including as a replacement for XML in some AJAX-style systems. JSON is a language-independent data format. It was derived from JavaScript, but as of 2017 many programming languages include code to generate and parse JSON-format data. The official Internet media type for JSON is application/json, and JSON filenames use the extension .json.

MongoDB: is a cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with schemata. MongoDB is developed by MongoDB Inc. and licensed under the Server Side Public License (SSPL).

Apache Hadoop: is a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. Originally designed for computer clusters built from commodity hardware (still the common use), it has also found use on clusters of higher-end hardware. All the modules in Hadoop are designed with the fundamental assumption that hardware failures are common occurrences and should be automatically handled by the framework. The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part based on the MapReduce programming model. Hadoop splits files into large blocks and distributes them across nodes in a cluster.
It then transfers packaged code to the nodes to process the data in parallel. This approach takes advantage of data locality, where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than in a more conventional supercomputer architecture that relies on a parallel file system, where computation and data are distributed via high-speed networking.

Apache Cassandra: is a free and open-source, distributed, wide-column store, NoSQL database management system designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure. Cassandra offers robust support for clusters spanning multiple datacenters, with asynchronous masterless replication allowing low-latency operations for all clients.

InfluxDB: is an open-source time series database (TSDB) developed by InfluxData. It is written in Go and optimized for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics. It also has support for processing data from Graphite.

Message broker: A message broker (also known as an integration broker or interface engine) is an intermediary computer program module that translates a message from the formal messaging protocol of the sender to the formal messaging protocol of the receiver. Message brokers are elements in telecommunication or computer networks where software applications communicate by exchanging formally defined messages.

Authentication: (from Greek: αὐθεντικός authentikos, "real, genuine", from αὐθέντης authentes, "author") is the act of confirming the truth of an attribute of a single piece of data claimed true by an entity.
In contrast with identification, which refers to the act of stating or otherwise indicating a claim purportedly attesting to a person or thing's identity, authentication is the process of actually confirming that identity. In other words, authentication often involves verifying the validity of at least one form of identification.

OAuth2 authorization protocol: OAuth is an open standard for access delegation, commonly used as a way for Internet users to grant websites or applications access to their information on other websites without giving them the passwords. This mechanism is used by companies such as Amazon, Google, Facebook, Microsoft and Twitter to permit users to share information about their accounts with third-party applications or websites. Generally, OAuth provides clients "secure delegated access" to server resources on behalf of a resource owner. It specifies a process for resource owners to authorize third-party access to their server resources without sharing their credentials. Designed specifically to work with the Hypertext Transfer Protocol (HTTP), OAuth essentially allows access tokens to be issued to third-party clients by an authorization server, with the approval of the resource owner. The third party then uses the access token to access the protected resources hosted by the resource server.

OpenID Connect authentication layer: OpenID Connect (OIDC) is an authentication layer on top of OAuth 2.0, an authorization framework. The standard is controlled by the OpenID Foundation.

Public key certificate: also known as a digital certificate or identity certificate, is an electronic document used to prove the ownership of a public key. The certificate includes information about the key, information about the identity of its owner (called the subject), and the digital signature of an entity that has verified the certificate's contents (called the issuer).
If the signature is valid, and the software examining the certificate trusts the issuer, then it can use that key to communicate securely with the certificate's subject. In email encryption, code signing, and e-signature systems, a certificate's subject is typically a person or organization. However, in Transport Layer Security (TLS) a certificate's subject is typically a computer or other device, though TLS certificates may identify organizations or individuals in addition to their core role in identifying devices. TLS is notable for being a part of HTTPS, a protocol for securely browsing the web. In a typical public-key infrastructure (PKI) scheme, the certificate issuer is a certificate authority (CA), usually a company that charges customers to issue certificates for them. By contrast, in a web-of-trust scheme, individuals sign each other's keys directly, in a format that performs a similar function to a public key certificate.

X.509: is a standard defining the format of public key certificates. X.509 certificates are used in many Internet protocols, including TLS/SSL, which is the basis for HTTPS, the secure protocol for browsing the web. They are also used in offline applications, like electronic signatures. An X.509 certificate contains a public key and an identity (a hostname, an organization, or an individual), and is either signed by a certificate authority or self-signed. When a certificate is signed by a trusted certificate authority, or validated by other means, someone holding that certificate can rely on the public key it contains to establish secure communications with another party, or validate documents digitally signed by the corresponding private key.
X.509 also defines certificate revocation lists, which are a means to distribute information about certificates that have been deemed invalid by a signing authority, as well as a certification path validation algorithm, which allows certificates to be signed by intermediate CA certificates, which are in turn signed by other certificates, eventually reaching a trust anchor. X.509 is defined by the International Telecommunication Union's Standardization sector (ITU-T) and is based on ASN.1, another ITU-T standard.

Enterprise Service Bus: An enterprise service bus (ESB) implements a communication system between mutually interacting software applications in a service-oriented architecture (SOA). As it implements a distributed computing architecture, it implements a special variant of the more general client-server model, wherein, in general, any application using the ESB can behave as server or client in turns. ESB promotes agility and flexibility with regard to high-level protocol communication between applications. The primary goal of this high-level protocol communication is enterprise application integration (EAI) of heterogeneous and complex service or application landscapes (a view from the network level).

Batch processing: is a general term for frequently used programs that are executed with minimal human interaction. Batch jobs can run without any end-user interaction or can be scheduled to start on their own as resources permit. A program that reads a large file and generates a report, for example, is considered a batch job. The term batch job originated in the days when punched cards contained the directions for a computer to follow when running one or more programs: multiple card decks representing multiple jobs would often be stacked on top of one another in the hopper of a card reader and run in batches.
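The report-generating batch job described above can be sketched in a few lines of Python. Everything here is illustrative (the record format and station names are hypothetical); the point is that the program runs to completion with no user interaction and could be launched by a scheduler such as cron.

```python
# Minimal sketch of a batch job: read a (hypothetical) sensor log of
# "station,temperature" records and produce a per-station average report,
# with no end-user interaction anywhere in the run.
import csv
import io

def summarize(log_lines):
    """Aggregate 'station,temperature' records into per-station averages."""
    totals = {}  # station -> (record count, running temperature sum)
    for row in csv.reader(log_lines):
        station, temp = row[0], float(row[1])
        count, total = totals.get(station, (0, 0.0))
        totals[station] = (count + 1, total + temp)
    return {s: total / count for s, (count, total) in totals.items()}

# In production this would read from a file on disk; here we feed the job
# an in-memory sample so the sketch is self-contained.
sample = io.StringIO("palmira,26.0\npalmira,28.0\ncali,30.0\n")
report = summarize(sample)
print(report)  # {'palmira': 27.0, 'cali': 30.0}
```

In an IoT sensor network such a job would typically run nightly over the day's accumulated readings, which is exactly the scheduled, unattended execution pattern the definition above describes.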
Extract-transform-load (ETL): In computing, ETL is the general procedure of copying data from one or more sources into a destination system that represents the data differently from the source(s). The ETL process became a popular concept in the 1970s and is often used in data warehousing. Data extraction involves extracting data from homogeneous or heterogeneous sources; data transformation processes the data by cleansing it and transforming it into a proper storage format/structure for the purposes of querying and analysis; finally, data loading describes the insertion of data into the final target database, such as an operational data store, a data mart, or a data warehouse. A properly designed ETL system extracts data from the source systems, enforces data quality and consistency standards, conforms data so that separate sources can be used together, and finally delivers data in a presentation-ready format so that application developers can build applications and end users can make decisions.

Kernel: is a computer program that is the core of a computer's operating system, with complete control over everything in the system. On most systems, it is one of the first programs loaded on start-up (after the bootloader). It handles the rest of start-up as well as input/output requests from software, translating them into data-processing instructions for the central processing unit. It handles memory and peripherals like keyboards, monitors, printers, and speakers. A kernel connects the application software to the hardware of a computer. The critical code of the kernel is usually loaded into a separate area of memory, which is protected from access by application programs or other, less critical parts of the operating system. The kernel performs its tasks, such as running processes, managing hardware devices such as the hard disk, and handling interrupts, in this protected kernel space.
In contrast, everything a user does is in user space: writing text in a text editor, running programs in a GUI, etc. This separation prevents user data and kernel data from interfering with each other and causing instability and slowness, and it prevents malfunctioning application programs from crashing the entire operating system.

CLI: A command-line interface or command language interpreter, also known as a command-line user interface, console user interface, or character user interface (CUI), is a means of interacting with a computer program where the user (or client) issues commands to the program in the form of successive lines of text (command lines). A program which handles the interface is called a command language interpreter or shell.

Concurrency: refers to the ability of different parts or units of a program, algorithm, or problem to be executed out of order or in partial order without affecting the final outcome. This allows for parallel execution of the concurrent units, which can significantly improve the overall speed of execution on multi-processor and multi-core systems. In more technical terms, concurrency refers to the decomposability of a program, algorithm, or problem into order-independent or partially ordered components or units.

Persistence: in computer science, persistence refers to the characteristic of state that outlives the process that created it. This is achieved in practice by storing the state as data in computer data storage. Programs have to transfer data to and from storage devices and have to provide mappings from the native programming-language data structures to the storage-device data structures. Picture editing programs or word processors, for example, achieve state persistence by saving their documents to files.

List of figure sources (report figure, page in the report, and whether permission is required for publishing):
Figure 1. Simplified representation of the pilot that will be developed by CIAT. Blocks and arrows in dashed lines represent non-essential components for the first pilot according to CIAT's requirements. Page 6. Permission required: No.
Figure 2. End-to-end IoT solution from things to network to cloud. Page 8.
Figure 3. Layer architecture for secure end-to-end solutions. Page 8.
Figure 4. Data flow for devices without native Internet connectivity. Page 10.
Figure 5. Software components and interfaces for Intel's IoT reference architecture. Page 11.
Figure 6. Detailed view of communications in Intel's IoT architecture. Page 12.
Figure 7. Data layer supports distributed analytics and control. Page 12.
Figure 8. The management layer supervises endpoint devices. Page 13.
Figure 9. AWS IoT service suite. Page 13.
Figure 10. Software components in AWS IoT architecture. Page 15.
Figure 11. Interactions between AWS IoT components. Page 16.
Figure 12. IoT example with AWS. Page 16.
Figure 13. Abstraction of Azure IoT architecture. Page 17.
Figure 14. Azure IoT architecture with core subsystems only. Page 18.
Figure 15. Azure IoT architecture with core and optional subsystems. Page 18.
Figure 16. Complete Azure IoT architecture. Page 19.
Figure 17. Conceptual representation of device connectivity in Azure IoT architecture. Page 20.
Figure 18. Google IoT reference architecture. Page 21.
Figure 19. IoT reference architecture from IBM. Page 23. Permission required: Yes.
Figure 20. Mainflux location within an IoT application. Page 24.
Figure 21. Mainflux architecture for IoT. Top: infrastructure stack. Bottom: software stack. Page 25.
Figure 22. SiteWhere 2.0 microservices. Page 26.
Figure 23. DeviceHive architecture. Page 27.
Figure 24. IoT reference architecture based on the Guth et al. paper. Page 28.
Table 1. IoT platform comparison summary from the Guth et al. paper. Page 28.
Figure 25. IoT architecture for agro-industrial and environmental applications from the Talavera et al. paper. Page 29.
Figure 26. FarmBeats system overview. Page 31. Permission required: Yes.
Owners of the material: Consultant, Google, IBM, Microsoft, Amazon, Intel, SiteWhere, DeviceHive, Elsevier, Usenix / Microsoft, Mainflux,
Springer URL N/A AWS IoT Services. Video (minute 1.23) https://aws.amazon.com/iot/ AWS IoT services for industrial, consumer, and commercial solutions. https://aws.amazon.com/iot/ How AWS IoT Works. https://docs.aws.amazon.com/iot/latest/developerguide/aws-iot-how-it-works.html AWS Webinar - Pushing Intelligence to the edge in industrial applications by Joseph Soricelli, Solutions Architect AWS. 16 Nov 2018. Google Cloud IoT. https://cloud.google.com/solutions/iot/?hl=en IoT reference architecture from IBM: https://www.ibm.com/cloud/garage/architectures/iotArchitecture/reference-architecture The Intel IoT Platform. Architecture White Paper Internet of Things (IoT). https://www.intel.la/content/www/xl/es/internet-of-things/white-papers/iot-platform-reference-architecture- paper.html Azure IoT Reference Architecture https://blogs.msdn.microsoft.com/wriju/2018/02/26/azure-iot-reference-architecture/ http://download.microsoft.com/download/A/4/D/A4DAD253-BC21-41D3-B9D9- 87D2AE6F0719/Microsoft_Azure_IoT_Reference_Architecture.pdf SiteWhere https://sitewhere.io/docs/2.0.0/platform/microservice-overview.html#microservice-structure DeviceHive. https://docs.devicehive.com/docs/devicehive-architecture Jesús Martín Talavera, Luis Eduardo Tobón, Jairo Alejandro Gómez, María Alejandra Culman, Juan Manuel Aranda, Diana Teresa Parra, Luis Alfredo Quiroz, Adolfo Hoyos, and Luis Ernesto Garreta. Review of IoT applications in agro-industrial and environmental fields. Computers and Electronics in Agriculture, 142, Part A:283-297, 2017. Deepak Vasisht, Zerina Kapetanovic, JongHoWon, Xinxin Jin, Ranveer Chandra, Sudipta Sinha, and Ashish Kapoor. Farmbeats: An iot platform for data-driven agriculture. In Networked Systems Design and Implementation (NSDI). USENIX, March 2017. https://www.usenix.org/system/files/conference/nsdi17/nsdi17-vasisht.pdf Mainflux. 
https://www.mainflux.com/index.html Jasmin Guth, Uwe Breitenbücher, Michael Falkenthal, Paul Fremantle, Oliver Kopp, Frank Leymann, and Lukas Reinfurt. A Detailed Analysis of IoT Platform Architectures: Concepts, Similarities, and Differences, pages 81-101. Springer, 2018. https://www.springerprofessional.de/en/a-detailed-analysis-of-iot-platform-architectures-concepts-