BIG DATA: PROBLEMS AND SOLUTIONS FOR THE WORLD

Rishabh Jain
15 min read · Sep 17, 2020


Hey researchers! Here is a look at the world's core companies that are using Big Data technology to solve real-world use cases, or, we could say, at the techniques for handling huge amounts of data, whether characterized by the Volume and Velocity of the data or by its structure, which is also known as the Variety of the data.

But why Big Data? Why is it required in today’s world? Is Big Data a problem or a technique?

Big data is a term that describes the large volume of data — both structured and unstructured — that inundates a business on a day-to-day basis. But it’s not the amount of data that’s important. It’s what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves.

Big Data helps organizations create new growth opportunities, and it has enabled entirely new categories of companies that combine and analyze industry data. These companies have ample information about products and services, buyers and suppliers, and consumer preferences that can be captured and analyzed.

BIG DATA PROBLEM AND TECHNIQUE:

The big data problem means that data is growing at a much faster rate than computational speeds. It is the result of the fact that storage is getting cheaper day by day, so individuals as well as almost all business and scientific organizations are storing more and more data. Social activities, scientific experiments, and biological explorations, along with sensor devices, are major big data contributors.

Big data is beneficial to society and business, but at the same time it brings challenges to the scientific community. Existing traditional tools, machine learning algorithms, and techniques are not capable of handling, managing, and analyzing big data, although various scalable machine learning algorithms, techniques, and tools (e.g., the open-source Hadoop and Apache Spark platforms) are now prevalent.
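As a concrete taste of what these scalable platforms do, here is a minimal, hedged PySpark sketch that aggregates a large event log in parallel. The file name events.csv and its columns are hypothetical, and a local Spark installation is assumed:

```python
# Minimal PySpark sketch: counting events per user in a large log.
# "events.csv" and its columns are hypothetical stand-ins for any
# large dataset; a local Spark installation is assumed.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("big-data-demo").getOrCreate()

# Spark reads the file in parallel partitions, so the same code can
# scale from a laptop to a cluster of commodity machines.
events = spark.read.csv("events.csv", header=True, inferSchema=True)

counts = (events
          .groupBy("user_id")                      # shuffle by key
          .agg(F.count("*").alias("event_count"))  # aggregate per user
          .orderBy(F.desc("event_count")))

counts.show(10)
spark.stop()
```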

Volume literally means the quantity of something, and in Big Data it likewise refers to the amount of data being generated from each source.

Velocity in Big Data means the speed at which data is generated from its source, and, as everyone knows, today's data is generated at a very high pace.

The structure, or Variety, of the data is divided into three categories:

1. Unstructured Data => Unstructured data is data that does not follow a specified format. If 20 percent of the data available to enterprises is structured, the other 80 percent is unstructured. Unstructured data is, in fact, most of the data you will encounter. Until recently, however, the technology didn't really support doing much with it except storing it or analyzing it manually. (A toy sketch follows the source list below.)

Some Sources of Unstructured Data are:

  • Web pages
  • Images (JPEG, GIF, PNG, etc.)
  • Videos
  • Memos
  • Reports
  • Word documents and PowerPoint Presentations
  • Surveys
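As a tiny, hedged illustration of why unstructured data is hard: free text offers no schema to query, so analysis begins by imposing structure, for example by tokenizing and counting. The memo below is invented:

```python
# Toy sketch: free text carries no schema, so the first analysis step
# is usually to impose one, here by tokenizing and counting words.
from collections import Counter
import re

memo = """The quarterly report is attached. Sales grew in Q3,
but support tickets also grew. See the attached survey results."""

# No fields to query against: we extract tokens ourselves.
tokens = re.findall(r"[a-z']+", memo.lower())
print(Counter(tokens).most_common(5))
```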

2. Semi-Structured Data => This is data that does not conform to a data model but has some structure. It lacks a fixed or rigid schema. It does not reside in a relational database, but it has some organizational properties that make it easier to analyze, and with some processing we can store it in a relational database. (A small sketch follows the source list below.)

Some Sources of Semi-Structured Data are:

  • E-mails
  • XML and other markup languages
  • Binary executables
  • TCP/IP packets
  • Zipped files
  • Integration of data from different sources
  • Web pages
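To illustrate the point above about storing semi-structured data relationally, here is a hedged Python sketch that flattens hypothetical JSON e-mail records, where some structure exists but fields are optional, into fixed-column rows; all field names are invented:

```python
# Sketch: flattening semi-structured JSON (a hypothetical e-mail
# export) into fixed-column rows a relational table could hold.
import json

raw = '''[
  {"from": "alice@example.com", "subject": "Invoice",
   "headers": {"priority": "high"}},
  {"from": "bob@example.com", "subject": "Re: Invoice"}
]'''

rows = []
for msg in json.loads(raw):
    # Some structure exists (keys), but fields are optional, so we
    # normalize with defaults before storing relationally.
    rows.append((msg["from"],
                 msg["subject"],
                 msg.get("headers", {}).get("priority", "normal")))

print(rows)  # ready for INSERT INTO emails(sender, subject, priority)
```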

3. Structured Data => This is data that conforms to a fixed, predefined schema, so every record has the same well-defined fields. It typically resides in relational databases and spreadsheets, where the regular row-and-column layout makes it straightforward to search, aggregate, and analyze with standard tools such as SQL. (See the sketch after the source list below.)

Some Sources of Structured Data are:

  • SQL Databases
  • Spreadsheets such as Excel
  • OLTP Systems
  • Online forms
  • Sensors such as GPS or RFID tags
  • Network and Web server logs
  • Medical devices
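As a small, hedged illustration using Python's built-in sqlite3 module: because structured data has a fixed schema, standard SQL can query and aggregate it directly. The orders table is invented:

```python
# Sketch: structured data lives in tables with a fixed schema, so it
# can be queried directly with SQL. sqlite3 ships with Python.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER, amount REAL, region TEXT)")
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 120.0, "EU"), (2, 80.5, "US"), (3, 42.0, "EU")])

# The rigid schema is what makes aggregation trivial.
for region, total in con.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"):
    print(region, total)
```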

Before talking about the core companies, let us first understand the problems and challenges that companies faced, and why it was so necessary to adopt Big Data technology.

1. Lack of Understanding

Companies can leverage data to boost performance in many areas. Some of the best use cases for data are to: decrease expenses, create innovation, launch new products, grow the bottom line, and increase efficiency, to name a few. Despite the benefits, companies have been slow to adopt data technology or put a plan in place for how to create a data centered culture.

2. High Cost of Data Solutions

After understanding how your business will benefit most from implementing data solutions, you're likely to find that buying and maintaining the necessary components can be expensive. Along with hardware such as servers and storage, and the software that runs on them, there also comes the cost of human resources and time.

3. Too Many Choices

According to psychologist Barry Schwartz, less really can be more. In what he coined the "paradox of choice," Schwartz explains how option overload can cause inaction on the part of a buyer; limiting a consumer's choices instead can lessen anxiety and stress.

In the world of data and data tools, the options are almost as widespread as the data itself, so it is understandably overwhelming when deciding the solution that’s right for your business, especially when it will likely affect all departments and hopefully be a long-term strategy.

4. Using Data for Meaning

You may have the data. It’s clean, accurate and organised. But, how do you use it to provide valuable insights to improve your business? Many organisations are turning to robust data analysis tools that can help assess the big picture, as well as break down the data into meaningful bits of information that can then be transformed into actionable outcomes.

5. Complex Systems for Managing Data

Moving from a legacy data management system and integrating a new solution comes as a challenge in itself. Furthermore, with data coming from multiple sources, and IT teams creating their own data while managing data, systems can become complex quickly.

6. Compliance Hurdles

When collecting information, security and government regulations come into play. With the somewhat recent introduction of the General Data Protection Regulation (GDPR), it’s even more important to understand the necessary requirements for data collection and protection, as well as the implications of failing to adhere.

7. Pace of Technology

Inventor, author and futurist Ray Kurzweil put it best when he defined the accelerating rate of change of technology. Each subsequent technological advancement builds more quickly upon the last because they evolve at each step to become more efficient and therefore can better inform what comes next. For example, just consider how rapidly cloud computing and artificial intelligence are improving.

Many other problems existed as well, and they have now been solved by the arrival of Big Data, not as a problem but as a solution for the industry and the world.

Let us now see how companies use this technology to avoid many of the problems they would otherwise face, and how it has made users' day-to-day lives easier and more time-efficient.

Let us talk about Google first. The company needs no introduction; it is among the most advanced and most established companies in the world.

Google has emerged as a master of almost all the technologies in the world, and one of them is Big Data.

In fact, when it comes to Big Data, Google is the undisputed champion in the world. Some of the amazing facts below will make you realize the same.

1. Ranking and Prioritizing the Search Results

There are numerous factors that go into the ranking of your search results. Google examines the following features of a website's content when determining relevance, including:

  • Site structure relations
  • Page structure relations
  • External link relevance
  • Internal link relevance

It would not be wrong to state that Google knows everything about us, and all the credit goes to Big Data analytics. In fact, Google has mastered the domain of big data analytics and has developed several tools and techniques to capture user data, including preferences, likes, dislikes, areas of specialization, requirements, and so on. Google not only gathers this vital data but also processes it quickly and efficiently to deliver the required search results for any particular query.

2. Literal & Semantic search

The main aim of a literal search engine is to find the root of your search phrase by looking for a match for some of its words or for the entire phrase. The root of the phrase is then examined and expanded upon to display better search results. A semantic search engine, by contrast, tries to understand the context of the phrase by analyzing its terms and language against a knowledge graph database in order to answer a question directly with specific information. A toy contrast between the two is sketched below.
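Here is a hedged, purely illustrative Python sketch of that contrast. The "semantic" side is just a hand-written synonym table standing in for a real knowledge graph, and the documents and query are invented:

```python
# Toy contrast between literal and semantic matching. Real engines use
# a knowledge graph and learned representations, not a synonym dict.
docs = ["cheap flights to Delhi", "low-cost air tickets to Delhi"]
query = "cheap flights"

# Literal search: match the exact words of the phrase.
literal_hits = [d for d in docs
                if all(w in d.split() for w in query.split())]

# Semantic search: expand to terms the query *means*, not just says.
synonyms = {"cheap": {"cheap", "low-cost"},
            "flights": {"flights", "air", "tickets"}}

def semantic_match(doc):
    words = set(doc.split())
    return all(words & synonyms.get(w, {w}) for w in query.split())

semantic_hits = [d for d in docs if semantic_match(d)]
print(literal_hits)   # only the first document
print(semantic_hits)  # both documents
```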

3. Tracking Cookies

Google keeps track of users across the web by using cookies. If a user is signed into Google while simultaneously browsing other websites, Google can track the websites they visit. In this way, Google collects a great deal of data about users, such as their preferences, inclinations, favorites, and requirements. Whenever a user searches for anything on Google, it incorporates all that information before displaying the results in the proper rank. The basic mechanism is sketched below.
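A minimal, hedged sketch of that mechanism using only Python's standard http.cookies module; this shows the general idea, not Google's actual implementation:

```python
# Minimal mechanical sketch of a tracking cookie using only the
# standard library. This is the general mechanism, not Google's code.
from http import cookies
import uuid

# First visit: the server assigns a persistent identifier.
jar = cookies.SimpleCookie()
jar["uid"] = str(uuid.uuid4())
print("Server sends:", jar["uid"].OutputString())  # Set-Cookie payload

# Later visits: the browser echoes the identifier back in the
# Cookie: header, so the server can link the requests together.
incoming = cookies.SimpleCookie()
incoming.load("uid=" + jar["uid"].value)
print("Server recognizes visitor:", incoming["uid"].value)
```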

4. Indexed pages

Indexed pages are the collection of web pages stored to respond to search queries. Indexing is the process of adding web pages to Google's search index. It involves assigning keywords or phrases to web pages within a metadata tag (meta-tag) so that a page can be retrieved easily by a search engine tailored to search the keyword field. Once the meta-tag is created, Google will crawl and index your webpage. It generally takes 4 days to 4 weeks for a new website to be crawled and indexed by Google. A toy index is sketched below.
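At the heart of any such index is an inverted index: keywords mapped to the pages that contain them, so a query becomes a dictionary lookup instead of a scan of every page. A hedged Python sketch with invented pages:

```python
# Toy inverted index: the core structure behind "indexed pages".
from collections import defaultdict

pages = {
    "page1.html": "big data tools hadoop spark",
    "page2.html": "spark streaming tutorial",
}

index = defaultdict(set)
for url, text in pages.items():
    for word in text.split():
        index[word].add(url)

# Query: intersect the posting lists of each term.
query = ["spark", "tutorial"]
print(set.intersection(*(index[w] for w in query)))  # {'page2.html'}
```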

5. Real-time Data Feeds

Although it doesn’t promote itself as such, Google is actually a collection of data and a set of tools for working with it. It has progressed from an index of web pages to a central hub for real-time data feeds on just about anything that can be measured such as weather reports, travel reports, stock market and shares, shopping suggestions, travel suggestions, and several other things.

Big data helps in SEO

SEO practices depend on the use of large quantities of data, which is nothing but big data. As you know, search engine optimization is an endeavor to enhance the search result rankings of a website based on the availability of online data. The search engine giants like Google direct visitors to pages that appear to be relevant or authoritative. Whether a site is trustworthy or not is determined by the number and quality of links it gets from other websites.

The role of SEO in marketing campaigns is well-known to everyone, but its techniques continuously change due to the evolution of big data.

Deeper SEO Insights

Search engines convert website content into quantifiable data, and in the coming days they will be able to produce ever more accurate results, which marketers can use for insights. It is because of big data that SEO can apply different techniques, such as keywords, on-page optimization, and linking, to reach customers. Combining local SEO, content marketing, and mobile data will help generate accurate user insights, and this is possible only because of the contribution of big data.

Social Media

Social networking sites churn out a large amount of data, which search engine giants cannot afford to ignore, and the flow keeps growing as users continue to join these channels in large numbers. You get proof of this when you look at the user bases of social media sites like Twitter, Facebook, and so on. That's why enterprises are focused on improving their presence on these social platforms to improve their rankings in the search engines.

Above all, Big Data is used in many fields; it has secured its place in the industry and can be integrated with almost all the technologies in the world.

Now let us move to our second company, Facebook: another established company, one that is now in contact with almost all the people in the world.

How Facebook is Using Big Data

Have you ever seen one of the videos on Facebook that shows a “flashback” of posts, likes, or images — like the ones you might see on your birthday or on the anniversary of becoming friends with someone? If so, you have seen examples of how Facebook uses Big Data.

A report from McKinsey & Co. stated that by 2009, companies with more than 1,000 employees already had more than 200 terabytes of data about their customers' lives stored. Consider adding that startling amount of stored data to the rapid growth of data provided to social media platforms since then. There are trillions of tweets, billions of Facebook likes, and other social media sites like Snapchat, Instagram, and Pinterest are only adding to this social media data deluge.

Social media accelerates innovation, drives cost savings, and strengthens brands through mass collaboration. Across every industry, companies are using social media platforms to market and hype up their services and products, along with monitoring what the audience is saying about their brand.

The convergence of social media and big data gives birth to a whole new level of technology.

The Facebook Context

Arguably the world's most popular social media network, with more than two billion monthly active users worldwide, Facebook stores enormous amounts of user data, making it a massive data wonderland. It was estimated that there would be more than 183 million Facebook users in the United States alone by October 2019. Facebook is also among the top 100 public companies in the world, with a market value of approximately $475 billion.

Every day, we feed Facebook’s data beast with mounds of information. Every 60 seconds, 136,000 photos are uploaded, 510,000 comments are posted, and 293,000 status updates are posted. That is a LOT of data.

At first, this information may not seem to mean very much. But with data like this, Facebook knows who our friends are, what we look like, where we are, what we are doing, our likes, our dislikes, and so much more. Some researchers even say Facebook has enough data to know us better than our therapists!

Apart from Google, Facebook is probably the only company that possesses this high level of detailed customer information. The more users who use Facebook, the more information they amass. Heavily investing in its ability to collect, store, and analyze data, Facebook does not stop there. Apart from analyzing user data, Facebook has other ways of determining user behavior.

  1. Tracking cookies: Facebook tracks its users across the web by using tracking cookies. If a user is logged into Facebook and simultaneously browses other websites, Facebook can track the sites they are visiting.
  2. Facial recognition: One of Facebook’s latest investments has been in facial recognition and image processing capabilities. Facebook can track its users across the internet and other Facebook profiles with image data provided through user sharing.
  3. Tag suggestions: Facebook suggests who to tag in user photos through image processing and facial recognition.
  4. Analyzing the Likes: A recent study showed that it is viable to accurately predict a range of highly sensitive personal attributes just by analyzing a user's Facebook Likes. Work conducted by researchers at Cambridge University and Microsoft Research shows how the patterns of Facebook Likes can very accurately predict your sexual orientation, satisfaction with life, intelligence, emotional stability, religion, alcohol and drug use, relationship status, age, gender, race, and political views, among many others. (A toy sketch of the idea follows this list.)
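As a toy, hedged illustration of that idea (not the researchers' actual models), here is a pure-Python sketch that predicts an invented attribute from a binary vector of page Likes using nearest-neighbour overlap:

```python
# Toy illustration of the Likes study's idea: represent each user as a
# binary vector over pages they Liked, then let a simple model predict
# an attribute. All data, pages, and labels here are made up.
pages = ["page_a", "page_b", "page_c", "page_d"]

# (likes vector, known attribute) pairs for "training" users.
train = [
    ([1, 1, 0, 0], "group_x"),
    ([1, 0, 1, 0], "group_x"),
    ([0, 0, 1, 1], "group_y"),
    ([0, 1, 0, 1], "group_y"),
]

def predict(likes):
    # 1-nearest-neighbour on Like overlap: the simplest possible
    # stand-in for the regression models used in the actual research.
    def overlap(a, b):
        return sum(x == y for x, y in zip(a, b))
    return max(train, key=lambda t: overlap(t[0], likes))[1]

print(predict([1, 1, 1, 0]))  # -> "group_x"
```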

Facebook Inc. analytics Chief Ken Rudin says,

“Big Data is crucial to the company’s very being.”

He goes on to say that,

“Facebook relies on a massive installation of Hadoop, a highly scalable open-source framework that uses clusters of low-cost servers to solve problems. Facebook even designs its hardware for this purpose. Hadoop is just one of many Big Data technologies employed at Facebook.”

After learning about Google and Facebook, let us now move to another core company: IBM (International Business Machines). Though many think it is not as well known, it is older than Google and Facebook, and the world arguably uses its utilities and facilities even more than it uses Google's or Facebook's.

Some of the facilities or innovations given by IBM are:

  1. The ATM (automated teller machine)
  2. NEXA automation and Intelligence
  3. First Super Computer
  4. Magnetic tape recording
  5. Scanning Tunneling Microscopy

and many other products that IBM has given to the world. Most importantly, it has collaborative relationships with almost all companies and fields in the world.

It would not be wrong to say that:

“If you are using a product, IBM is right behind it”

Now let us see how IBM uses Big Data to avoid many day-to-day problems and to profit as a game changer.

1. Helping a mid-size company migrate to the cloud

A few months ago, the IBM Cloud Garage partnered with a mid-size company to assess and transform their entire application portfolio with the cloud. At the end of the initial assessment, the Garage provided a transformation vision and implementation plan based on the IBM Cloud Garage Method, along with a target cloud architecture, implemented squad models, and an actionable plan to divide projects into multiple Minimum Viable Products, using a strangler pattern to modernize and migrate the applications to the cloud.

2. Motivation to move Big Data stack to the cloud

While many companies today seek to harness Big Data to cultivate new business insights, this mid-size company's use of Big Data is integral to their core mission and is baked into many of their business decisions. Big Data powers their innovation in customer service by anticipating what customers like and how they will interact. They then learn from these interactions to improve future experiences. Their current Big Data platform was adequate but fairly expensive to maintain. It was also lagging behind current software and hardware technologies due to multiple acquisitions and several integrations.

3. Assessing the current application portfolio and drafting target deployment models

Before executing a cloud migration and embarking on a digital transformation, an organization must understand its long-term business goals, pain points, the architecture of its application and data infrastructure portfolio, and its own structure, operations, and processes. The IBM Cloud Garage performed a comprehensive review of the client's applications supporting the core business functions, grouping them into different categories to evaluate against various cloud deployment models. As previously discussed, this client's Big Data platform is one of the core components for all their applications.

4. Target cloud architecture for the Big Data platform

When IBM joined the client, they had already started building a target architecture model that applied leading open-source technologies. In doing so, however, the client planned on implementing a roll-your-own technology stack for a Big Data platform on the cloud, without leveraging any of the cloud-native services that allow for rapid provisioning (such as managed Hadoop and Spark clusters) or for flexibility for data at rest with object storage.

The proposed architecture was split into three major categories to address the data flows (a code sketch follows the list):

  1. Real-time
  2. Near real-time
  3. Batch job execution
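As a hedged illustration of how one piece of business logic can serve both the batch and the (near) real-time paths in such an architecture, here is a PySpark sketch; all paths, bucket names, column names, and the schema are hypothetical:

```python
# Sketch: one transformation, two execution modes, mirroring the
# real-time / near-real-time / batch split. All paths, bucket names,
# and the schema below are hypothetical.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import (StructType, StructField,
                               StringType, DoubleType)

spark = SparkSession.builder.appName("dataflow-sketch").getOrCreate()

def enrich(df):
    # Shared business logic for both paths.
    return df.withColumn("is_large", F.col("amount") > 1000)

# Batch path: process data at rest in object storage.
batch = enrich(spark.read.json("s3a://bucket/orders/2020-09-17/"))
batch.write.mode("overwrite").parquet("s3a://bucket/curated/orders/")

# Near-real-time path: the same logic over a continuous file stream.
# File-based streams need an explicit schema.
schema = StructType([StructField("order_id", StringType()),
                     StructField("amount", DoubleType())])
stream = enrich(spark.readStream.schema(schema)
                     .json("s3a://bucket/incoming/orders/"))
(stream.writeStream
       .format("parquet")
       .option("path", "s3a://bucket/curated/orders-live/")
       .option("checkpointLocation", "s3a://bucket/checkpoints/orders/")
       .start())
```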

5. The role of data governance

Implementing data governance was one of the client's key requirements in migrating their data platform to the cloud. The IBM team recommended IBM Data Catalog and Data Refinery, which are now part of the IBM Watson Knowledge Catalog. The Knowledge Catalog provides a broad range of capabilities and tools for data cataloging, discovery, findability, and governance.

IBM Watson Knowledge Catalog includes built-in data discovery algorithms that use machine learning to auto-classify the contents of each data set, as well as a governance policy manager and engine. When you add data to the enterprise catalog, for example, sensitive data will automatically be classified. (A toy illustration follows.)
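As a toy, hedged illustration of auto-classification (real catalogs such as the Watson Knowledge Catalog use machine-learning classifiers, not these regexes), here is a Python sketch that flags columns whose values look sensitive:

```python
# Toy version of auto-classifying column contents as sensitive.
# Real catalogs use ML classifiers; these regexes only show the idea.
import re

CLASSIFIERS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone": re.compile(r"^\+?[\d\-\s]{7,15}$"),
}

def classify_column(values):
    for label, pattern in CLASSIFIERS.items():
        if all(pattern.match(v) for v in values):
            return label  # flag for governance policies
    return "unclassified"

print(classify_column(["a@b.com", "c@d.org"]))  # email -> sensitive
print(classify_column(["call me", "maybe"]))    # unclassified
```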

These were some of the ways Big Data is used by the big companies of the industry.

THANKS A LOT !! HOPE YOU LEARNED AND ENJOYED.



Written by Rishabh Jain

I am a tech enthusiast, researcher and an integration seeker. I love to explore and learn about the right technology and right concepts from its foundation.
