Information Storage and Management Version 3 Student Guide



Benzer belgeler
About QDenetim and QDestek (Sistem Danışmanlık)

Why SSD failed after abnormal power-off?

3 Important Pillars of a website : Sunita Network Pvt Ltd

Learn how to get started with Dropbox: Take your stuff anywhere. Send large files. Keep your files safe. Work on files together. Welcome to Dropbox!

Renice R-SATA Storage Solution for Aerospace and Defense Application

How to choose a reliable CF Card for VDR

Renice NFA100 NandFlash Test Platform

COIL SLITTING LINES COMBINATION OF CUT TO LENGTH & SLITTING LINE CUT TO LENGTH LINES

Fire Resistant Fiber Optic Cable, Fire proof fiber optic cable, Central Loose Tube Double LSZH Jacket Indoor/outdoor Type

Analysis of Energy Evaluation on Lighting Programs

ShenZhen Renice Technology Co., Ltd. PCIe M SSD. Datasheet V

Alternative Education Models By Sefa Sezer In English

Pre School Eduation Probems and School Literature Screening Main Lines - 1 İnönü Üniversitesi / Fırat Üniversitesi / Ardahan Üniversitesi / Siirt Üniversitesi by İngilizce Öğretmeni Sefa Sezer

. Condensing Water Heater Daxom. Industrial Type TS 615 EN 26

MACHMBR. Treatme nt Units for Sewage Water.

Get started with Google Drive

Background of Wear Leveling Algorithm

Erasmus+ KA2 (Sample Projects)

First Solar Series 4 PV Module ADVANCED THIN FILM SOLAR TECHNOLOGY

Renice Embedded Memory & Storage Solutions in Railway Application

Is the P/E cycle of MLC really only 3,000 times as its datasheet said?

MACHMEC. Mekanik Package Treatment Unit.

POWERSTRIPS WHERE DOES IT HURT? ANCIENT WISDOM MEETS NEW TECHNOLOGY

Aytok Automatic Self-Cleaning Filters

Within the largest 500 industrial companies of Turkey

MACHPRESS SCREWPRESS.

Interfacing a GPS Receiver with a Microprocessor via UART

Elektronik Mühendisliği ve Mekatronik Mühendisliği Projesi İngilizce Çeviri Projesi by İngilizce Öğretmeni Sefa Sezer

shown as follows: Fig. 2 Timing mode selection Fig. 4 Interrupt 1 Fig. 3 Interrupt 0

About the Author. He is married with four children. He is accessible through his personal website.

Management Science Letters

Evaluation of Energy Performance on Lighting by Using Dialux and Bep-Tr

ANALYSIS OF TURKISH PHARMACEUTICAL SECTOR. Author Mehmet ATASEVER

USING DAYLIGHT SYSTEMS TO REDUCE ENERGY CONSUMPTION DUE TO LIGHTING: A CASE STUDY OF YAŞAR UNIVERSITY CAFETERIA

MACHGRID GRID/SCREEN

Published by: PIONEER RESEARCH & DEVELOPMENT GROUP ( 33

EVALUATION OF TURKEY'S FOREIGN TRADE COMPOSITION AFTER 1980 ACCORDING TO THE COUNTRIES

VIRTUAL MUSEUM and REVIEW OF VIRTUAL MUSEUMS IN TURKEY. Research Assistant Arzu Cilasun, Yasar University, Department of Architecture

Remotus. Robust radio remote control for safety critical industrial applications ROBUST SWEDISH DESIGN SINCE 1918

Wall hung electric boiler DAXOM / Naviels. Electric Boiler. Comfort and Confidence Proper To Your Life

BC546/547/548/549/550

DETERMINATION OF SOME AGRICULTURAL PROPERTIES OF THE SOYBEAN LINES OBTAINED THE HYBRIDIZATION METHOD

12+ Interactive Sessions. 5+ Workshops. 5+ Keynote Lectures. 20+ Exhibitors. 50+ Plenary Lectures. 20 th Global Nephrologists Annual Meeting

Marketing and Financial Aspects With Micro Credit and Turkey Sample

Sepam IEC communication

General Terms and Conditions of Business of STRATO AG (Globe), Version 1.0

IMPACT OF SOCIAL MEDIA ON MARKETING:A VIEW ON SUCCESS STORIES Selçuk SERT & Utku KÖSE SOSYAL MEDYANIN PAZARLAMA ÜZERİNDEKİ ETKİSİ: BAŞARI HİKAYELERİNE BİR BAKIŞ

This article was checked by ithenticate. INFORMATION MANAGEMENT SYSTEM SERVICES FROM THE USER'S PERSPECTIVE: TARBIL

Concentrating Solar Power

The Role of Social Marketing in Creating Obesity Awareness and Its Effects on Life Quality

How to read the reliability of each MLC Nand chip from Page performance?

Graduate Certificate in Folklore Studies

Pre School Eduation Probems and School Literature Screening Main Lines - 2 İnönü Üniversitesi / Fırat Üniversitesi / Ardahan Üniversitesi / Siirt Üniversitesi by İngilizce Öğretmeni Sefa Sezer

SUN Filters,Kendi kendini temizleyen filtre,Otomatik geri yıkamalı filtre,Otomatik ters yıkamalı kum filtresi,

The Economy of Conflicts. Hasan ALPAGU. Keywords: Conflict, Economy, Integration, Evaluation, Security, Globalization, Change

The Conflict s Economy. Dr. Hasan ALPAGU. Keywords: Conflict, Economy, Integration, Evaluation, Security, Globalization, Change

Electric Boiler Daxom. Confort and Confidence Proper To Your Life

A Comparison of Performance Metrics of Turkish Twitter Messages Using Text Representations

Science and the Importance of the Scientific Approach

12+ Interactive Sessions. 5+ Workshops. 5+ Keynote Lectures. 20+ Exhibitors. Electrochemistry Plenary Lectures. Invitation.

12+ Interactive Sessions. 5+ Workshops. 5+ Keynote Lectures. 20+ Exhibitors. Electrochemistry Plenary Lectures. Invitation.

EFFECTS OF FERTILIZATION DIFFERENT LEVELS OF WATER SALINITY ON TOMATO PLANT GROWN

The Economy and the Power of Money

Daxom Plus / Daxom Hermetic Gas Water Heater

Fashion Phenomenon in Postmodern Marketing Applications and Effects on the Marketing Components

The aim of this thesis is to measure scientifically whether brand equity dimensions have any

İBRAHİM HAKKI ERZURUMİ. Published by the University of Michigan.

Just Enough ENGLISH GRAMMAR. Illustrated. Gabriele Stobbe

Energy Conversion and Management

TOWARD A NEW HORIZON IN DIGITAL MARKETING: SOCIAL MEDIA MARKETING - Naciye Güliz UĞUR - Merve TÜRKMEN BARUTÇU & DİJİTAL PAZARLAMADA YENİ BİR UFUK: SOSYAL MEDYA PAZARLAMA

Journal of Statistical Software

A Plan for Center City, Philadelphia. Daniel Schack, AICP, PTP

TABLE LIST 1. INTRODUCTION 1 2. CONCEPTUAL FRAMEWORK Foreign Language Teaching In Turkey 3

Katalog CLR-SWF-104 Fiber Switch

Improving Energy Efficiency

Fragility analysis of mid-rise R/C frame buildings

Performance Assessment of Solar-Based Hydrogen Production via H2SO4 Cycle

Clinical Nephrology. Theme: Exploring the new technologies in Clinical Nephrology. 20 th Edition of International Conference on

TRAINING PROBLEMS OF EDUCATION IN EDUCATION AND TEACHING LITERATURE SCREENING MAIN LINES. OTHER REASONS FOR PROBLEM (Impressions) Training Problems Of Education and Teaching Literature Screening Main Lines by English Teacher Sefa Sezer

CHAPTER 4 NICKEL-CADMIUM BATTERIES

Fiber Optic Connectivity. Copper Connectivity. Cable. Rack Cabinet

Wolf Safety Lamp Company. ATEX Explained. Ex Equipment

İDRİS YÜCEL AN OVERVIEW OF RELIGIOUS MEDICINE IN THE NEAR EAST: MISSION HOSPITALS OF THE AMERICAN BOARD IN ASIA MINOR ( )

Tourism, Cumalıkızık, rural tourism, tourist satisfaction, sustainability, one-way ANOVA

Evaluation of Istanbul Port in Cruise Tourism in Terms of Brand Value

International Journal of Health Sciences and Research ISSN:

THE AMERICAN SOCIETY OF MECHANICAL ENGINEERS 345 E. 47th St., New York, N.Y

English for Academic Research: Grammar Exercises

Composting with Worms Let Worms Eat Your Garbage

Security Walk Through Metal Detectors & Hand Held Metal Detectors by ThruScan - Elektral ECAC Comply manufacturer 40 years experience sqm factory area

Virtualmin'e Yeni Web Sitesi Host Etmek - Domain Eklemek

Proceedings ICABR 2015

SERVICE MANUAL. SC-T7000 series SC-T5000 series SC-T3000 series. Confidential. Large Format Color Inkjet Printer SEIJ12008

ERP 1100 Functional Emergency Stretcher

CASE OF ISTANBUL Yazar / Author:

The long and longing relations of the EU with its neighbors and the case of Turkey Why the collaboration works but integration does not work?

Teknoloji Servisleri; (Technology Services)

AB surecinde Turkiyede Ozel Guvenlik Hizmetleri Yapisi ve Uyum Sorunlari (Turkish Edition)

Silicon Heterojunction Technology Equipment. July 2016

Turks in Transoxiana. Richard N. Frye

Transkript:

Information Storage and Management Version 3 Student Guide Education Services August 2015

Welcome to Information Storage and Management Version 3. Copyright 2015 EMC Corporation. All Rights Reserved. Published in the USA. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED AS IS. EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. The trademarks, logos, and service marks (collectively "Trademarks") appearing in this publication are the property of EMC Corporation and other parties. Nothing contained in this publication should be construed as granting any license or right to use any Trademark without the prior written permission of the party that owns the Trademark. EMC, EMC² AccessAnywhere Access Logix, AdvantEdge, AlphaStor, AppSync ApplicationXtender, ArchiveXtender, Atmos, Authentica, Authentic Problems, Automated Resource Manager, AutoStart, AutoSwap, AVALONidm, Avamar, Bus-Tech, Captiva, Catalog Solution, C-Clip, Celerra, Celerra Replicator, Centera, CenterStage, CentraStar, EMC CertTracker. CIO Connect, ClaimPack, ClaimsEditor, Claralert,cLARiiON, ClientPak, CloudArray, Codebook Correlation Technology, Common Information Model, Compuset, Compute Anywhere, Configuration Intelligence, Configuresoft, Connectrix, Constellation Computing, EMC ControlCenter, CopyCross, CopyPoint, CX, DataBridge, Data Protection Suite. Data Protection Advisor, DBClassify, DD Boost, Dantz, DatabaseXtender, Data Domain, Direct Matrix Architecture, DiskXtender, DiskXtender 2000, DLS ECO, Document Sciences, Documentum, DR Anywhere, ECS, elnput, E-Lab, Elastic Cloud Storage, EmailXaminer, EmailXtender, EMC Centera, EMC ControlCenter, EMC LifeLine, EMCTV, Enginuity, EPFM. eroom, Event Explorer, FAST, FarPoint, FirstPass, FLARE, FormWare, Geosynchrony, Global File Virtualization, Graphic Visualization, Greenplum, HighRoad, HomeBase, Illuminator, InfoArchive, InfoMover, Infoscape, Infra, InputAccel, InputAccel Express, Invista, Ionix, ISIS,Kazeon, EMC LifeLine, Mainframe Appliance for Storage, Mainframe Data Library, Max Retriever, MCx, MediaStor, Metro, MetroPoint, MirrorView, Multi-Band Deduplication,Navisphere, Netstorage, NetWorker, nlayers, EMC OnCourse, OnAlert, OpenScale, Petrocloud, PixTools, Powerlink, PowerPath, PowerSnap, ProSphere, ProtectEverywhere, ProtectPoint, EMC Proven, EMC Proven Professional, QuickScan, RAPIDPath, EMC RecoverPoint, Rainfinity, RepliCare, RepliStor, ResourcePak, Retrospect, RSA, the RSA logo, SafeLine, SAN Advisor, SAN Copy, SAN Manager, ScaleIO Smarts, EMC Snap, SnapImage, SnapSure, SnapView, SourceOne, SRDF, EMC Storage Administrator, StorageScope, SupportMate, SymmAPI, SymmEnabler, Symmetrix, Symmetrix DMX, Symmetrix VMAX, TimeFinder, TwinStrata, UltraFlex, UltraPoint, UltraScale, Unisphere, Universal Data Consistency, Vblock, Velocity, Viewlets, ViPR, Virtual Matrix, Virtual Matrix Architecture, Virtual Provisioning, Virtualize Everything, Compromise Nothing, Virtuent, VMAX, VMAXe, VNX, VNXe, Voyence, VPLEX, VSAM- Assist, VSAM I/O PLUS, VSET, VSPEX, Watch4net, WebXtender, xpression, xpresso, Xtrem, XtremCache, XtremSF, XtremSW, XtremIO, YottaYotta, Zero-Friction Enterprise Storage. Revision Date: August 2015 Revision Number: MR-1VP-ISMv3-1504 Module: Course Introduction 1

Information Storage and Management (ISM) is a unique course that provides a comprehensive understanding of the various storage infrastructure components in data center environments. It enables participants to make informed decisions on storage-related technologies in an increasingly complex IT environment, which is fast changing with the adoption of software-defined infrastructure management and third platform technologies (cloud, Big Data, social, and mobile technologies). It provides a strong understanding of storage technologies and prepares participants for advanced concepts, technologies, and processes. Participants will learn the architectures, features, and benefits of intelligent storage systems including block-based, filebased, object-based, and unified storage; software-defined storage; storage networking technologies such as FC SAN, IP SAN, and FCoE SAN; business continuity solutions such as backup and replication; the highly-critical areas of information security; and storage infrastructure management. This course takes an open-approach to describe all the concepts and technologies, which are further illustrated and reinforced with EMC-related product examples. Module: Course Introduction 2

Module: Course Introduction 3

Module: Course Introduction 4

Module: Course Introduction 5

Module: Course Introduction 6

Module: Course Introduction 7

This module focuses on digital data, the types of digital data, and information. This module also focuses on data center and its key characteristics. Further this module focuses on the key data center management processes. Finally, this module focuses on the evolution of computing platforms. Module 1: Introduction to Information Storage 1

We live in a digital universe a world that is created and defined by software. A massive amount of digital data is continuously generated, collected, stored, and analyzed through software in the digital universe. According to the 2014 Digital Universe Study conducted by International Data Corporation (IDC), it is estimated that the digital universe produces approximately 4.4 trillion gigabytes (GB) of data annually, which is doubling every two years. By these estimates, it is projected that by the year 2020, the digital universe will expand to 44 trillion GB of data. The data in the digital universe comes from diverse sources, including individuals living and working online, organizations employing information technology (IT) to run their businesses, and from a variety of smart electronic devices connected to the Internet. In organizations, the volume and importance of information for business operations continue to grow at astounding rates. Individuals constantly generate and consume information through numerous activities, such as web searches, e-mails, uploading and downloading content and sharing media files. The rapid proliferation of online social networking and Internet-enabled smartphones and tablets has also contributed significantly to the growth of the digital universe. The advent of the Internet of Things (IoT) is also gradually adding to the growth of the digital universe. The IoT is a technology trend wherein smart devices with embedded electronics, software, and sensors exchange data with other devices over the Internet. Examples of such devices are wearable gadgets smartwatches and fitness activity trackers; electronic sensors temperature sensors and heart monitoring implants; and household appliances televisions, thermostats, and lighting. The IoT has vast applications and is driving the development of several innovative technology solutions. Some application areas include weather monitoring remote monitoring and analysis of temperature and atmospheric conditions; healthcare health monitoring devices can enable doctors to remotely monitor patients and be notified in case of emergencies; and infrastructure management technicians can remotely monitor equipment and proactively schedule repair activities for maintenance crews. Module 1: Introduction to Information Storage 2

Organizations have become increasingly information-dependent in the twenty-first century, and information must be available whenever and wherever it is required. It is critical for users and applications to have continuous, fast, reliable, and secure access to information for business operations to run as required. Some examples of such organizations and processes include banking and financial institutions, government departments, online retailers, airline reservations, billing and transaction processing, social networks, stock trading, scientific research, and healthcare. It is essential for organizations to store, protect, process, and manage information in an efficient and cost-effective manner. Legal, regulatory, and contractual obligations regarding the availability, retention, and protection of data further add to the challenges of storing and managing information. Organizations also face newer challenges in the form of requirement to extract value from the information generated in the digital universe. Information can be leveraged to identify opportunities to transform and enhance businesses and gain a competitive edge. For example, an online retailer may need to identify the preferred product types and brands of customers by analyzing their search, browsing, and purchase patterns. The retailer can then maintain a sufficient inventory of popular products, and also advertise relevant products to the existing and potential customers. Furthermore, the IoT is expected to lead to new consumer and business behavior in the coming years creating new business opportunities. To meet all these requirements and more, organizations are increasingly undertaking digital transformation initiatives to implement intelligent storage solutions. These solutions not only enable efficient and optimized storage and management of information, but also enable extraction of value from information to derive new business opportunities, gain a competitive advantage, and create new sources of revenue. Module 1: Introduction to Information Storage 3

A generic definition of data is that it is a collection of facts, typically collected for the purpose of analysis or reference. Data can exist in a variety of forms such as facts stored in a person's mind, photographs and drawings, alphanumeric text and images in a book, a bank ledger, and tabled results of a scientific survey. Originally, data is the plural form of datum. However, data is now generally treated as a singular or mass noun representing a collection of facts and figures. This is especially true when referring to digital data. In computing, digital data is a collection of facts that is transmitted and stored in electronic form, and processed through software. Digital data is generated by various devices, such as desktops, laptops, tablets, mobile phones, and electronic sensors. It is stored as strings of binary values (0s and 1s) on a storage medium that is either internal or external to the devices generating or accessing the data. The storage devices may be of different types, such as magnetic, optical, or solid state storage devices. Examples of digital data are electronic documents, text files, e-mails, e-books, digital images, digital audio, and digital video. Module 1: Introduction to Information Storage 4

Based on how it is stored and managed, digital data can be broadly classified as either structured data or unstructured data. Structured data is organized in fixed fields within a record or file. For data to be structured, a data model is required. A data model specifies the format for organizing data, and also specifies how different data elements are related to each other. For example, in a relational database, data is organized in rows and columns within named tables. Semi-structured data does not have a formal data model but has an apparent, self-describing pattern and structure that enable its analysis. Examples of semi-structured data include spreadsheets that have a row and column structure, and XML files that are defined by an XML schema. Quasistructured data consists of textual data with erratic data formats, and can be formatted with effort, software tools, and time. An example of quasi-structured data is a clickstream or clickpath that includes data about which webpages a user visited and in what order which is the result of the successive mouse clicks the user made. A clickstream shows when a user entered a website, the pages viewed, the time spent on each page, and when the user exited. Unstructured data does not have a data model and is not organized in any particular format. Some examples of unstructured data include text documents, PDF files, e-mails, presentations, images, and videos. As indicated by the figure on the slide, the majority, which is more than 90 percent, of the data generated in the digital universe today is non-structured data (semi-, quasi-, and unstructured). Although the figure shows four different and separate types of data, in reality a mixture of these is typically generated. For instance, in a call center for customer support of a software product, a classic relational database management system (RDBMS) may store call logs with structured data such as date/time stamps, machine types, and problem type entered by the support desk person. In addition, there may be unstructured or semi-structured data, such as an e-mail ticket of the problem, call log information, or the actual call recording. Module 1: Introduction to Information Storage 5

The terms data and information are closely related and it is common for the two to be used interchangeably. However, it is important to understand the difference between the two. Data, by itself, is simply a collection of facts that needs to be processed for it to be useful. For example a set of annual sales figures of an organization is data. When data is processed and presented in a specific context it can be interpreted in a useful manner. This processed and organized data is called information. For example, when the annual sales data is processed into a sales report, it provides useful information, such as the average sales for a product (indicating product demand and popularity), and a comparison of the actual sales to the projected sales. Information thus creates knowledge and enables decision-making. As discussed previously, processing and analyzing data is vital to any organization. It enables organizations to derive value from data, and create intelligence to enable decision-making and organizational effectiveness. It is easier to process structured data due to its organized form. On the other hand, processing non-structured data and extracting information from it using traditional applications is difficult, time-consuming, and requires considerable resources. New architectures, technologies, and techniques (described in Module 2, Third Platform Technologies ) have emerged that enable storing, managing, analyzing, and deriving value from unstructured data coming from numerous sources. Module 1: Introduction to Information Storage 6

In a computing environment, storage devices (or simply storage ) are devices consisting of nonvolatile recording media on which information can be persistently stored. Storage may be internal (for example, internal hard drive), removable (for example, memory cards), or external (for example, magnetic tape drive) to a compute system. Based on the nature of the storage media used, storage devices can be broadly classified as given below: Magnetic storage devices: For example, hard disk drive and magnetic tape drive. Optical storage devices: For example, Blu-ray, DVD, and CD. Flash-based storage devices: For example, solid state drive (SSD), memory card, and USB thumb drive (or pen drive). Storage is a core component in an organization s IT infrastructure. Various factors such as the media, architecture, capacity, addressing, reliability, and performance influence the choice and use of storage devices in an enterprise environment. For example, disk drives and SSDs are used for storing business-critical information that needs to be continuously accessible to applications; whereas, magnetic tapes and optical storage are typically used for backing up and archiving data. The different types of storage devices are covered in Module 3, Data Center Environment. In enterprise environments, information is typically stored on storage systems (or storage arrays ). A storage system is a hardware component that contains a group of homogeneous/heterogeneous storage devices assembled within a cabinet. These enterprise-class storage systems are designed for high capacity, scalability, performance, reliability, and security to meet business requirements. The compute systems that run business applications are provided storage capacity from storage systems. Storage systems are covered in Module 4, Intelligent Storage Systems (ISS). Organizations typically house their IT infrastructure, including compute systems, storage systems, and network equipment within a data center. Module 1: Introduction to Information Storage 7

A data center is a dedicated facility where an organization houses, operates, and maintains backend IT infrastructure including compute systems, storage systems, and network equipment along with other supporting infrastructure. A data center centralizes an organization s IT equipment and data-processing operations, and is vital for carrying out business operations. A data center typically comprises the following: Facility: It is the building and floor space where the data center is constructed. It typically has a raised floor with ducts underneath holding power and network cables. IT equipment: It includes equipment such as compute systems, storage systems, network equipment and cables, and cabinets for housing the IT equipment. Support infrastructure: It includes all the equipment necessary to securely sustain the functioning of the data center. Some key support equipment are power equipment including uninterruptible power sources, and power generators; environmental control equipment including fire and water detection systems, heating, ventilation, and air conditioning (HVAC) systems; and security systems including biometrics, keycard, and video surveillance systems. An organization may build a data center to provide open access to applications over the Internet, or for privately executing business applications within its operational environment. A data center may be constructed in-house and located in an organization s own facility, or it may be outsourced, with equipment being located at a third-party site. Large organizations often maintain multiple data centers to distribute data-processing workloads and for disaster recovery. Organizations are increasingly focusing on energy-efficient technologies and efficient management practices to reduce the energy consumption of data centers and lessen the impact on the environment. Such data centers are called as green data centers. Module 1: Introduction to Information Storage 8

Data centers are designed and built to fulfill the key characteristics shown in the figure on the slide. Although the characteristics are applicable to almost all data center components, the discussion here primarily focuses on storage systems. Availability: Availability of information as and when required should be ensured. Unavailability of information can severely affect business operations, lead to substantial financial losses, and damage the reputation of an organization. Security: Policies and procedures should be established, and control measures should be implemented to prevent unauthorized access to and alteration of information. Capacity: Data center operations require adequate resources to efficiently store and process large and increasing amounts of data. When capacity requirements increase, additional capacity should be provided either without interrupting the availability or with minimal disruption. Capacity may be managed by adding new resources or by reallocating existing resources. Scalability: Organizations may need to deploy additional resources such as compute systems, new applications, and databases to meet the growing requirements. Data center resources should scale to meet the changing requirements, without interrupting business operations. Performance: Data center components should provide optimal performance based on the required service levels. Data integrity: Data integrity refers to mechanisms, such as error correction codes or parity bits, which ensure that data is stored and retrieved exactly as it was received. Manageability: A data center should provide easy, flexible, and integrated management of all its components. Efficient manageability can be achieved through automation for reducing manual intervention in common, repeatable tasks. Module 1: Introduction to Information Storage 9

The activities carried out to ensure the efficient functioning of a data center can be broadly categorized under the following key management processes: Monitoring: It is a continuous process of gathering information on various resources in the data center. The process involves monitoring parameters such as configuration, availability, capacity, performance, and security of resources. Reporting: It is a process of collating and presenting the monitored parameters such as resource performance, capacity, and utilization of resources. Reporting enables data center managers to analyze and improve the utilization of data center resources and identify problems. It also helps in establishing business justifications and chargeback of costs associated with data center operations. Provisioning: It is the process of configuring and allocating the resources that are required to carry out business operations. For example, compute systems are provisioned to run applications and storage capacity is provisioned to a compute system. Provisioning primarily includes resource management activities to meet capacity, availability, performance, and security requirements. Planning: It is a process of estimating the amount of IT resources required to support business operations and meet the changing resource requirements. Planning leverages the data collected during monitoring and enables improving the overall utilization and performance of resources. It also enables estimation of future resource requirements. Data center managers also determine the impact of incidents and devise contingency plans to resolve them. Maintenance: It is a set of standard repeatable activities for operating the data center. It involves ensuring the proper functioning of resources and resolving incidents such as malfunctions, outages, and equipment loss. It also involves handling identified problems or issues within the data center and incorporating changes to prevent future problem occurrence. Module 1: Introduction to Information Storage 10

In general, the term platform refers to hardware and software that are associated with a particular computing architecture deployed in a data center. Computing platforms evolve and grow with advances and changes in technology. The figure on the slide displays the three computing platforms of IT growth as specified by IDC. The first platform (or Platform 1) dates back to the dawn of computing and was primarily based on mainframes and terminals. The second platform (or Platform 2) emerged with the birth of the personal computer (PC) in the 1980s and was defined by the client-server model, Ethernet, RDBMSs, and web applications. The third platform (or Platform 3) of today comprises cloud, Big Data, mobile, and social technologies. Each computing platform is defined not so much by the comprising technologies but by the scale of users and the scope of applications the technologies enable. The first platform supported millions of users, with applications and solutions in the low thousands. The second platform supported hundreds of millions of users and tens of thousands of applications. The third platform is already supporting a user base of billions and has millions of applications and solutions. This is evident from the fact that over 2.4 billion people (~36 percent of the world's population) are currently connected to the Internet (more than half of them through mobile devices), and that there are over one million applications available for ios and Android devices alone. Module 1: Introduction to Information Storage 11

Mainframes are compute systems with very large processing power, memory, and storage capacity and are primarily used for centrally hosting mission-critical applications and databases in an organization s data center. Multiple users simultaneously connect to mainframes through lesspowerful devices, such as workstations or terminals. All processing is performed on the mainframe, while the terminals only provide an interface to use the applications and view results. Although mainframes offer high reliability and security, there are several cost concerns associated with them. Mainframes have high acquisition costs, and considerable floor space and energy requirements. Deploying mainframes in a data center may involve substantial capital expense (CAPEX) and operating expense (OPEX). Historically, large organizations such as banks, insurance agencies, and government departments have used mainframes to run their business operations. Module 1: Introduction to Information Storage 12

The client-server model uses a distributed application architecture, in which a compute system called server runs a program that provides services over a network to other programs running on various end-point devices called clients. Server programs receive requests for resources from client programs and in response to the requests, the clients receive access to resources, such as e-mail applications, business applications, web applications, databases, files, and printers. Client devices can be desktops, laptops, and mobile devices. Clients typically communicate with servers over a LAN or WAN, with users making use of either a client application or a web interface on a browser. In the client-server model, both the clients and the servers may have distinct processing tasks that they routinely perform. For example, a client may run the business application while the server may run the database management system (DBMS) to manage storage and retrieval of information to and from a database. This is called a two-tier architecture. Alternatively, a client may use an application or web interface to accept information while the server runs another application that processes the information and sends the data to a second server that runs the DBMS. This is called the three-tier architecture. This distributed application architecture can be extended to any number of tiers (n-tier architecture). Because both client and server systems are intelligent devices, the client-server model is completely different from the mainframe model. The figure on the slide shows an example of the client-server model. In the example, clients interact with the web server using a web browser. The web server processes client requests through HTTP and delivers HTML pages. The application server hosts a business application and the database server hosts a DBMS. The clients interact with the application server through client software. The application server communicates with the database server to retrieve information and provide results to the clients. In some implementations, applications and databases may even be hosted on the same server. (Cont d) Module 1: Introduction to Information Storage 13

Some challenges with the client-server model are associated with creation of IT silos, maintenance overhead, and scalability issues. In organizations, it is common for business units/departments to have their own servers running business applications. This leads to the creation of application and information silos (individual, disparate systems). Silos make it difficult to efficiently utilize or share IT resources, and are challenging to manage and integrate. Though the cost of server hardware is considerably less than mainframes, there is still a significant OPEX involved in maintenance of multiple servers and clients, and the software running on them. Furthermore, in this model, it is challenging to meet today s rapid growth in users, information, and applications workloads. Adding more servers does not necessarily lead to better workload management. It is also necessary to optimally distribute processing and application logic across servers and application instances. Note: In general, a compute system is a device with an operating system (OS) that runs applications. Physical servers, hosts, desktops, laptops, and mobile devices are examples of compute systems. In this course, the term compute system or compute is used to refer to physical servers and hosts on which business applications of an organization are deployed. Module 1: Introduction to Information Storage 14

The term third platform was coined by IDC, and Gartner refers to the same as a nexus of forces. The third platform is built on a foundation of cloud, Big Data, mobile, and social technologies. These are the four major disruptive technologies that are significantly transforming businesses, economies, and lives globally. At its core, the third platform has the cloud that enables a consumer to provision IT resources as a service from a cloud provider. Big Data enables analytics that create deeper insights from data for improved decision-making. Mobile devices enable pervasive access to applications and information. Social technologies connect individuals, and enable collaboration and information exchange. Over the past three decades, it was essential for organizations to intelligently leverage the second platform for their businesses. According to IDC, over the next three decades, the third platform will represent the basis for solution development and business innovation. The third platform is being used for the digital transformation, evolution, and expansion of all industries and for developing major new sources of competitive advantage. Business strategists, IT leaders, and solution developers are already building disruptive new business models and consumer services around third platform technologies. Third platform technologies are an enhancement of second platform technologies rather than a substitution. A key aspect of third platform is that it is a convergence of cloud, Big Data, mobile, and social technologies and not just each technology taken in isolation. The real key is combining two or more of the technologies to create high-value industry solutions known as mashups. For example, some of the top drivers of cloud include social and mobile solutions. This means that organizations already see the greatest value in solutions that are mashups across all four technologies. The combinations of third platform technologies are already transforming organizations such as retail, financial services, government departments, telecommunications, and healthcare. (Cont d) Module 1: Introduction to Information Storage 15

According to IDC, it is estimated that currently over 80 percent of the infrastructure and applications in most data centers belong to the second platform. Second platform technologies also currently account for 74 percent of worldwide IT spending. This means that for organizations that have a significant investment in second platform technologies, an immediate and complete shift to the third platform may not be cost-effective and practical. This has led to an intermediate computing platform called Platform 2.5, between the second and third platforms. Platform 2.5 includes the solutions and technologies that enable organizations to bridge the gap between the second and third platforms. Platform 2.5 technologies enable organizations to use a combination of second and third platform technologies. Organizations would be able to deliver second platform applications and build third platform outcomes without duplicating and moving data. For example, platform 2.5 technologies would allow an organization to run second platform applications using traditional data structures and protocols, while enabling the same data to be leveraged for analytics using Big Data technologies. IDC predicts that future global IT spending will primarily focus on segments such as wireless data, smartphones and tablets, cloud services, Big Data analytics, and IoT. This spending is estimated to be in the hundreds of billions of dollars in each of the segments. This indicates the growing industry trend towards the large-scale adoption of third platform technologies. It is estimated that by 2020 third platform technologies would account for over 40 percent of IT spending. Module 2 covers third platform technologies. Module 1: Introduction to Information Storage 16

This module covered digital data, the types of digital data, and information. This module also covered data center and its key characteristics. Further, this module covered the key data center management processes. Finally, this module covered the evolution of computing platforms. Module 1: Introduction to Information Storage 17

Module 1: Introduction to Information Storage 18

This module focuses on the four technologies that make up the third platform, namely cloud, Big Data, social, and mobile technologies. This module also focuses on cloud computing and its essential characteristics. Then, this module focuses on cloud service models and cloud deployment models. Additionally, this module focuses on Big Data analytics. Further, this module focuses on social networking and mobile computing. Lastly, this module focuses on the key characteristics of third platform infrastructure and the key imperatives for transforming to the third platform. Module 2: Third Platform Technologies 1

This lesson covers the definition of cloud computing and the essential characteristics of cloud computing. This lesson also covers cloud service models and cloud deployment models. Module 2: Third Platform Technologies 2

The National Institute of Standards and Technology (NIST) a part of the U.S. Department of Commerce in its Special Publication 800-145 defines cloud computing as a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. The term cloud originates from the cloud-like bubble that is commonly used in technical architecture diagrams to represent a system, such as the Internet, a network, or a compute cluster. In cloud computing, a cloud is a collection of IT resources, including hardware and software resources that is deployed either in a single data center, or across multiple geographically-dispersed data centers that are connected over a network. A cloud infrastructure is built, operated, and managed by a cloud service provider. The cloud computing model enables consumers to hire IT resources as a service from a provider. A cloud service is a combination of hardware and software resources that are offered for consumption by a provider. The cloud infrastructure contains IT resource pools, from which resources are provisioned to consumers as services over a network, such as the Internet or an intranet. Resources return to the pool when released by consumers. Module 2: Third Platform Technologies 3

The cloud model is similar to utility services such as electricity, water, and telephone. When consumers use these utilities, they are typically unaware of how the utilities are generated or distributed. The consumers periodically pay for the utilities based on usage. Similarly, in cloud computing, the cloud is an abstraction of an IT infrastructure. Consumers simply hire IT resources as services from the cloud without the risks and costs associated with owning the resources. Cloud services are accessed from different types of client devices over wired and wireless network connections. Consumers pay only for the services that they use, either based on a subscription or based on resource consumption. When organizations use cloud services, their IT infrastructure management tasks are reduced to managing only those resources that are required to access the cloud services. The cloud infrastructure is managed by the provider, and tasks such as software updates and renewals are also handled by the provider. The figure on the slide illustrates a generic cloud computing environment. The figure on the slide illustrates a generic cloud computing environment. The cloud provides various types of hardware and software services that are accessed by consumers from different types of client devices over wired and wireless network connections. The figure includes some virtual components for relevance and accuracy. Virtualization will be introduced in Module 3, Data Center Environment and covered in detail in relevant sections of the later modules. Module 2: Third Platform Technologies 4

In SP 800-145, NIST specifies that a cloud infrastructure should have the five essential characteristics described below: On-demand self-service: A consumer can unilaterally provision computing capabilities, such as server time or networked storage, as needed automatically without requiring human interaction with each service provider. NIST Broad network access: Capabilities are available over the network and accessed through standard mechanisms that promote use by heterogeneous thin or thick client platforms (e.g., mobile phones, tablets, laptops, and workstations). NIST Resource pooling: The provider s computing resources are pooled to serve multiple consumers using a multi-tenant model, with different physical and virtual resources dynamically assigned and reassigned according to consumer demand. There is a sense of location independence in that the customer generally has no control or knowledge over the exact location of the provided resources but may be able to specify location at a higher level of abstraction (e.g., country, state, or datacenter). Examples of resources include storage, processing, memory, and network bandwidth. NIST Rapid elasticity: Capabilities can be rapidly and elastically provisioned, in some cases automatically, to scale rapidly outward and inward commensurate with demand. To the consumer, the capabilities available for provisioning often appear to be unlimited and can be appropriated in any quantity at any time. NIST Measured service: Cloud systems automatically control and optimize resource use by leveraging a metering capability at some level of abstraction appropriate to the type of service (e.g., storage, processing, bandwidth, and active user accounts). Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and consumer of the utilized service. NIST Module 2: Third Platform Technologies 5

A cloud service model specifies the services and the capabilities that are provided to consumers. In SP 800-145, NIST classifies cloud service offerings into the three primary models listed below: Infrastructure as a Service (IaaS) Platform as a Service (PaaS) Software as a Service (SaaS) Cloud administrators or architects assess and identify potential cloud service offerings. The assessment includes evaluating the services to be created and upgraded, the necessary feature set for each service, and the service level objectives (SLOs) of each service aligned to consumer needs and market conditions. SLOs are specific measurable characteristics such as availability, throughput, frequency, and response time. They provide a measurement of performance of the service provider. SLOs are key elements of a service level agreement (SLA), which is a legal document that describes items such as what service level will be provided, how it will be supported, service location, and the responsibilities of the consumer and the provider. Note: Many alternate cloud service models based on IaaS, PaaS, and SaaS are defined in various publications and by different industry groups. These service models are specific to the cloud services and capabilities that are provided. Examples of such service models are Backup as a Service (BaaS), Desktop as a Service (DaaS), Test Environment as a service (TEaaS), and Disaster Recovery as a Service (DRaaS). However, these models eventually belong to one of the three primary cloud service models. Module 2: Third Platform Technologies 6

Infrastructure as a Service: The capability provided to the consumer is to provision processing, storage, networks, and other fundamental computing resources where the consumer is able to deploy and run arbitrary software, which can include operating systems and applications. The consumer does not manage or control the underlying cloud infrastructure but has control over operating systems, storage, and deployed applications; and possibly limited control of select networking components (for example, host firewalls). NIST IaaS pricing may be subscription-based or based on resource usage. The provider pools the underlying IT resources and they are typically shared by multiple consumers through a multitenant model. IaaS can even be implemented internally by an organization, with internal IT managing the resources and services. Module 2: Third Platform Technologies 7

Platform as a Service: The capability provided to the consumer is to deploy onto the cloud infrastructure consumer-created or acquired applications created using programming languages, libraries, services, and tools supported by the provider. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, or storage, but has control over the deployed applications and possibly configuration settings for the application-hosting environment. NIST In the PaaS model, a cloud service includes compute, storage, and network resources along with platform software. Platform software includes software such as OS, database, programming frameworks, middleware, and tools to develop, test, deploy, and manage applications. Most PaaS offerings support multiple operating systems and programming frameworks for application development and deployment. PaaS usage fees are typically calculated based on factors, such as the number of consumers, the types of consumers (developer, tester, and so on), the time for which the platform is in use, and the compute, storage, or network resources consumed by the platform. Module 2: Third Platform Technologies 8

Software as a Service: The capability provided to the consumer is to use the provider s applications running on a cloud infrastructure. The applications are accessible from various client devices through either a thin client interface, such as a web browser (for example, web-based email), or a program interface. The consumer does not manage or control the underlying cloud infrastructure including network, servers, operating systems, storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings. NIST In the SaaS model, a provider offers a cloud-hosted application to multiple consumers as a service. The consumers do not own or manage any aspect of the cloud infrastructure. In SaaS, a given version of an application, with a specific configuration (hardware and software) typically provides service to multiple consumers by partitioning their individual sessions and data. SaaS applications execute in the cloud and usually do not need installation on end-point devices. This enables a consumer to access the application on demand from any location and use it through a web browser on a variety of end-point devices. Some SaaS applications may require a client interface to be locally installed on an end-point device. Customer Relationship Management (CRM), email, Enterprise Resource Planning (ERP), and office suites are examples of applications delivered through SaaS. Module 2: Third Platform Technologies 9

A cloud deployment model provides a basis for how cloud infrastructure is built, managed, and accessed. In SP 800-145, NIST specifies the four primary cloud deployment models listed below: Public cloud Private cloud Hybrid cloud Community cloud Each cloud deployment model may be used for any of the cloud service models: IaaS, PaaS, and SaaS. The different deployment models present a number of tradeoffs in terms of control, scale, cost, and availability of resources. Module 2: Third Platform Technologies 10

Public cloud: The cloud infrastructure is provisioned for open use by the general public. It may be owned, managed, and operated by a business, academic, or government organization, or some combination of them. It exists on the premises of the cloud provider. NIST Public cloud services may be free, subscription-based, or provided on a pay-per-use model. A public cloud provides the benefits of low up-front expenditure on IT resources and enormous scalability. However, some concerns for the consumers include network availability, risks associated with multi-tenancy, visibility and control over the cloud resources and data, and restrictive default service levels. Module 2: Third Platform Technologies 11

Private cloud: The cloud infrastructure is provisioned for exclusive use by a single organization comprising multiple consumers (for example, business units). It may be owned, managed, and operated by the organization, a third party, or some combination of them, and it may exist on or off premises. NIST Many organizations may not wish to adopt public clouds due to concerns related to privacy, external threats, and lack of control over the IT resources and data. When compared to a public cloud, a private cloud offers organizations a greater degree of privacy and control over the cloud infrastructure, applications, and data. There are two variants of private cloud: on-premise and externally-hosted, as shown in figure 1 and figure 2 respectively on the slide. The on-premise private cloud is deployed by an organization in its data center within its own premises. In the externally-hosted private cloud (or off-premise private cloud) model, an organization outsources the implementation of the private cloud to an external cloud service provider. The cloud infrastructure is hosted on the premises of the provider and may be shared by multiple tenants. However, the organization s private cloud resources are securely separated from other cloud tenants by access policies implemented by the provider. Module 2: Third Platform Technologies 12

Community cloud: The cloud infrastructure is provisioned for exclusive use by a specific community of consumers from organizations that have shared concerns (for example, mission, security requirements, policy, and compliance considerations). It may be owned, managed, and operated by one or more of the organizations in the community, a third party, or some combination of them, and it may exist on or off premises. NIST The organizations participating in the community cloud typically share the cost of deploying the cloud and offering cloud services. This enables them to lower their individual investments. Since the costs are shared by a fewer consumers than in a public cloud, this option may be more expensive. However, a community cloud may offer a higher level of control and protection than a public cloud. As with the private cloud, there are two variants of a community cloud: on-premise and externally-hosted. In an on-premise community cloud, one or more organizations provide cloud services that are consumed by the community. The cloud infrastructure is deployed on the premises of the organizations providing the cloud services. The organizations consuming the cloud services connect to the community cloud over a secure network. The figure on the slide illustrates an example of an on-premise community cloud. Module 2: Third Platform Technologies 13

In the externally-hosted community cloud model, the organizations of the community outsource the implementation of the community cloud to an external cloud service provider. The cloud infrastructure is hosted on the premises of the provider and not within the premises of any of the participant organizations. The provider manages the cloud infrastructure and facilitates an exclusive community cloud environment for the organizations. The IT infrastructure of each of the organizations connects to the externally-hosted community cloud over a secure network. The cloud infrastructure may be shared with multiple tenants. However, the community cloud resources are securely separated from other cloud tenants by access policies implemented by the provider. Module 2: Third Platform Technologies 14

Hybrid cloud: The cloud infrastructure is a composition of two or more distinct cloud infrastructures (private, community, or public) that remain unique entities, but are bound by standardized or proprietary technology that enables data and application portability (for example, cloud bursting for load balancing between clouds.) NIST The figure on the slide illustrates a hybrid cloud that is composed of an on-premise private cloud deployed by enterprise P, and a public cloud serving enterprise and individual consumers in addition to enterprise P. Module 2: Third Platform Technologies 15

The hybrid cloud has become the model of choice for many organizations. Some use cases of the hybrid cloud model are described below. Cloud bursting: Cloud bursting is a common usage scenario of a hybrid cloud. In cloud bursting, an organization uses a private cloud for normal workloads, but optionally accesses a public cloud to meet transient higher workload requirements. For example, an application can get additional resources from a public cloud for a limited time period to handle a transient surge in workload. Web application hosting: An organization may use the hybrid cloud model for web application hosting. The organization may host mission-critical applications on a private cloud, while less critical applications are hosted on a public cloud. By deploying less critical applications in the public cloud, an organization can leverage the scalability and cost benefits of the public cloud. For example, e-commerce applications use public-facing web assets outside the firewall and can be hosted in the public cloud. Packaged applications: An organization may also migrate standard packaged applications, such as email and collaboration software out of the private cloud to a public cloud. This frees up internal IT resources for higher value projects and applications. Application development and testing: An organization may also use the hybrid cloud model for application development and testing. An application can be tested for scalability and under heavy workload using public cloud resources, before incurring the capital expense associated with deploying it in a production environment. Once the organization establishes a steady-state workload pattern and the longevity of the application, it may choose to bring the application into the private cloud environment. Module 2: Third Platform Technologies 16