Why Big Data needs DevOps in 2018


Why Big Data needs DevOps

DevOps aims at shorter development cycles, more frequent deployments, and more dependable releases, in close alignment with business objectives.

For example, in the healthcare industry, many projects today require Big Data to be transformed quickly and published rapidly (in near real time) in a consumable form for interested parties.

What is Big Data?

Big Data consists of data sets that are so large and complex that traditional data-processing software is inadequate to deal with them.

Big Data challenges include data capture, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, and information privacy.

Big Data is often characterized by five dimensions: volume, variety, velocity, value, and veracity.

Data must be ingested from various sources, such as mainframes, relational database management systems (RDBMS), and flat files, into the landing zones of a Hadoop cluster (a collection of open-source software), where it is transformed and published. To increase the velocity at which this happens, it is necessary to adopt DevOps Continuous Integration and Continuous Delivery (CI/CD) practices. The right automation is needed so that data can be ingested and transformed quickly and reliably enough to deliver the expected business value.

So, how do you implement DevOps CI/CD practices?

When implementing CI/CD pipelines, there are three dimensions to consider (a minimal environment-configuration sketch follows this list):
  1. Application code that transforms the data must be promoted through the pipeline and must pass quality gates as it is deployed across three environments: DEVELOPMENT (DEV), PRE-PRODUCTION (PRE-PROD), and PRODUCTION (PROD).
  2. CI/CD pipelines give the team the ability to promote the code consistently and repeatably across DEV, PRE-PROD, and PROD.
  3. Automation covers not just the application code under test, but also the data, across DEV, PRE-PROD, and PROD.
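As a minimal sketch of the first point, promotion across the three environments can be driven by a single parameterized configuration rather than separate scripts per environment. The HDFS paths, Hive database names, and queue names below are illustrative assumptions, not values from the article.

```python
# Hypothetical per-environment settings used to parameterize one CI/CD pipeline.
# Paths, database names, and queues are illustrative assumptions only.
ENVIRONMENTS = {
    "dev": {
        "hdfs_raw_zone": "/data/dev/raw",
        "hive_db": "raw_dev",
        "yarn_queue": "dev",
    },
    "pre-prod": {
        "hdfs_raw_zone": "/data/preprod/raw",
        "hive_db": "raw_preprod",
        "yarn_queue": "preprod",
    },
    "prod": {
        "hdfs_raw_zone": "/data/prod/raw",
        "hive_db": "raw_prod",
        "yarn_queue": "prod",
    },
}


def config_for(env_name: str) -> dict:
    """Return the settings for one environment, failing fast on an unknown name."""
    try:
        return ENVIRONMENTS[env_name]
    except KeyError:
        raise ValueError(f"Unknown environment: {env_name!r}") from None
```

The same deployment job can then be promoted from DEV to PRE-PROD to PROD by changing only the environment name it is given.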

CI/CD reference pipeline

The following steps show how application code is promoted through the CI/CD pipeline and deployed to the DEV, PRE-PROD, and PROD environments. These steps are illustrated in Exhibit A, and a simplified sketch of the stage sequence appears below the exhibit.
  • The developer creates a feature branch for a JIRA story to start development.
  • The developer completes the code, has it reviewed, and creates a pull request to merge the code into the develop branch in Bitbucket.
  • Jenkins triggers the CI/CD pipeline.
  • Static code analysis is performed against the quality gates.
  • On success, the code is merged and is ready to be deployed.
  • The code is deployed to the DEV environment.
  • Automated unit tests are run in the DEV environment.
  • On success, the code is packaged as a release candidate artifact.
  • The artifact is deployed to the PRE-PROD environment.
  • Automated integration tests are run in the PRE-PROD environment.
  • On success, the artifact is tagged as a release.
  • The code is deployed to the PROD environment.

Exhibit A: CI/CD pipeline
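The pipeline above runs as a Jenkins job; as a language-neutral illustration, here is a minimal Python sketch of the same stage sequence. The deploy.sh script, the test directories, and the packaging step are hypothetical placeholders, not part of the original pipeline.

```python
import subprocess


def run(cmd: list[str]) -> None:
    """Run one pipeline step and stop the whole pipeline on the first failure."""
    print("-->", " ".join(cmd))
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


def pipeline(version: str) -> None:
    # Static code analysis / quality gate (assumes a SonarQube scanner is installed).
    run(["sonar-scanner", f"-Dsonar.projectVersion={version}"])
    # Deploy the transformation code to DEV (deploy.sh is a hypothetical script).
    run(["./deploy.sh", "dev", version])
    # Automated unit tests in the DEV environment.
    run(["pytest", "tests/unit"])
    # Package the code as a release candidate (assumes the PyPA 'build' tool).
    run(["python", "-m", "build"])
    # Deploy the candidate to PRE-PROD and run the integration tests there.
    run(["./deploy.sh", "pre-prod", version])
    run(["pytest", "tests/integration"])
    # Tag the release and deploy it to PROD.
    run(["git", "tag", f"release-{version}"])
    run(["./deploy.sh", "prod", version])


if __name__ == "__main__":
    pipeline("1.0.0")
```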

What about automated testing?

The following steps describe the automated tests that verify both code quality and data quality as code moves through the pipeline and across environments. These steps are illustrated in Exhibit B.
  
  • Unit tests: Developers write unit test scripts to verify that individual units of code work correctly. Bitbucket stores the test scripts alongside each piece of code.
  • Static analysis: A tool is used to provide continuous inspection of code quality. SonarQube performs static code analysis and ensures that the code meets the quality standards before it is merged.
  • Functional tests: Data validation ensures that the actual output matches the expected output data for each zone after each transformation step (a small validation sketch follows this list).
  • Integration and end-to-end tests: Automated end-to-end tests verify the movement of data from the source, through the landing zone, to the target location. This allows end-to-end testing to be repeated quickly and accurately whenever there is a code change, a configuration change, and so on.
  • Performance tests: Cloudera Distribution of Hadoop (CDH) cluster statistics are used to compare the average run time of jobs and their resource usage against targets.
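As a minimal sketch of the functional-test idea (compare the actual output of a transformation against the expected output for a zone), the PySpark test below could run in DEV or PRE-PROD. The fixture paths and the claims data set are hypothetical.

```python
from pyspark.sql import SparkSession


def test_claims_transformation():
    """Functional test: the transformed output must match the expected data set."""
    spark = (
        SparkSession.builder
        .master("local[2]")
        .appName("functional-test")
        .getOrCreate()
    )

    # Hypothetical fixtures: output produced by the job vs. the approved expected data.
    actual = spark.read.parquet("tests/fixtures/claims_transformed")
    expected = spark.read.parquet("tests/fixtures/claims_expected")

    # The schemas must match before the rows are compared.
    assert actual.schema == expected.schema

    # Rows present in one data set but not the other indicate a transformation defect.
    missing = expected.exceptAll(actual).count()
    extra = actual.exceptAll(expected).count()
    assert missing == 0, f"{missing} expected rows are missing from the output"
    assert extra == 0, f"{extra} unexpected rows were produced"

    spark.stop()
```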
Exhibit B

Examples of recommended data validation test cases (a short automation sketch follows this list):
  • Missing records
  • Partial or truncated data
  • Data type mismatches
  • Missing transformations
  • Incorrect transformations
  • Incorrect data
  • Duplicate or extra records
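A hedged sketch of how a few of these checks (missing records, duplicate records, data type mismatches) could be automated with PySpark; the key column and file paths are assumptions made for illustration only.

```python
from pyspark.sql import SparkSession, DataFrame
from pyspark.sql import functions as F


def data_quality_report(source: DataFrame, target: DataFrame, key: str) -> dict:
    """Compare a source extract with the ingested target table for common defects."""
    report = {}

    # Missing records: keys present in the source but absent from the target.
    report["missing_records"] = source.select(key).subtract(target.select(key)).count()

    # Duplicate or extra records: keys that appear more than once in the target.
    report["duplicate_records"] = (
        target.groupBy(key).count().filter(F.col("count") > 1).count()
    )

    # Data type mismatches: columns whose types differ between source and target.
    source_types = dict(source.dtypes)
    target_types = dict(target.dtypes)
    report["type_mismatches"] = [
        col for col, dtype in source_types.items()
        if col in target_types and target_types[col] != dtype
    ]
    return report


if __name__ == "__main__":
    spark = SparkSession.builder.master("local[2]").appName("dq-checks").getOrCreate()
    # Hypothetical source extract and ingested target table.
    src = spark.read.option("header", True).csv("data/source/patients.csv")
    tgt = spark.read.parquet("/data/raw/patients")
    print(data_quality_report(src, tgt, key="patient_id"))
    spark.stop()
```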

What about data ingestion?

The following steps show how data is ingested from various sources into a landing area called the "Raw Zone" in the Hadoop cluster. These steps are illustrated in Exhibit C.

Workflow management: all ingestion jobs that are not run manually are managed and scheduled by a workflow manager.
Ingestion options, including:
  1. File-based: files are ingested from an edge node or from HDFS.
  2. Sqoop: data is ingested from RDBMS sources using Sqoop, including incremental mode.
  3. Change Data Capture (CDC): changes in external source systems are captured and ingested as another way of loading data.
  4. Kafka queue: external applications can send data to a Kafka queue, which is then consumed into the data platform (see the sketch after this section).
Application code: the code that transforms and moves the data is written in HQL, Python & Spark (PySpark), and Java, and goes through the standard review process.
A structure of data zones, including:
  1. A landing zone where incoming data from source applications arrives before processing.
  2. A transient staging zone used only during processing (data is not retained there).
  3. An archive zone that keeps a copy of the data for 30 days to support the recovery process.
  4. The Raw Zone (RAWZ), the final central store of all ingested data.
  5. Hive SQL, which makes the RAW tables available for querying.
Note: only one use case is considered here. There are a number of other scenarios, such as data arriving from a landing folder or from an RDBMS, which are not discussed here.
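As an illustration of the Kafka ingestion path into the Raw Zone described above, here is a minimal PySpark Structured Streaming sketch. The broker address, topic name, and HDFS paths are assumptions, and the Kafka connector (spark-sql-kafka) is assumed to be available on the cluster.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-raw-zone").getOrCreate()

# Consume messages pushed by external applications to a (hypothetical) Kafka topic.
raw_stream = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "claims-events")
    .load()
    .selectExpr(
        "CAST(key AS STRING) AS record_key",
        "CAST(value AS STRING) AS payload",
        "timestamp",
    )
)

# Land the payloads as Parquet files in the Raw Zone on HDFS.
query = (
    raw_stream.writeStream
    .format("parquet")
    .option("path", "/data/raw/claims_events")
    .option("checkpointLocation", "/data/raw/_checkpoints/claims_events")
    .trigger(processingTime="1 minute")
    .start()
)

# A Hive external table (created once, outside this job) exposes the Raw Zone data:
#   CREATE EXTERNAL TABLE raw.claims_events (
#     record_key STRING, payload STRING, `timestamp` TIMESTAMP)
#   STORED AS PARQUET LOCATION '/data/raw/claims_events';

query.awaitTermination()
```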

Summary

In my experience, typical results of applying DevOps practices to Big Data ingestion include:
  • A framework-driven architecture reduces design and build effort by 80 percent.
  • Metadata-driven data management reduces development effort by 60 percent.
  • CI/CD and automated testing reduce deployment time and effort by 70 percent.

