Step 1: Install Java (Required for Hadoop)

!apt-get install openjdk-8-jdk-headless -qq > /dev/null

Step 2: Download and Extract Hadoop

!wget -q https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz

!tar -xzf hadoop-3.3.6.tar.gz

Step 3: Set JAVA_HOME environment:

import os

# Set JAVA_HOME properly
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"

# Add Java to PATH
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]

# Verify Java
!java -version

openjdk version "1.8.0_482"

OpenJDK Runtime Environment (build 1.8.0_482-8u482-ga-us1-0ubuntu1~22.04-b08)

OpenJDK 64-Bit Server VM (build 25.482-b08, mixed mode)

Step 4: Set Hadoop environment:

import os

os.environ["HADOOP_HOME"] = "/content/hadoop-3.3.6"

os.environ["PATH"] = os.environ["HADOOP_HOME"] + "/bin:" + os.environ["PATH"]
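After both PATH updates, it can help to confirm that the commands actually resolve before moving on. The following is a minimal sketch using only the standard library; the helper name `find_tools` is hypothetical and not part of the original notebook.

```python
import shutil

def find_tools(names):
    """Map each command name to its resolved path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}

# Commands this walkthrough expects once JAVA_HOME/bin and HADOOP_HOME/bin are on PATH.
for tool, path in find_tools(["java", "hadoop"]).items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If either command is reported as not found, the corresponding `os.environ["PATH"]` line is the first thing to recheck.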

Step 5: Install Apache Pig

!wget -q https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz

!tar -xzf pig-0.17.0.tar.gz

Step 6: Create dataset

Student Dataset

Student ID   Name    Course             Marks
101          Amit    Data Science       85
102          Neha    AI                 90
103          Rahul   Big Data           78
104          Priya   Machine Learning   88
105          Kiran   Data Analytics     92
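The walkthrough does not show how the CSV file itself is produced. One way to create it, sketched here under two assumptions: the file is named student_data.csv, and it has no header row (the Pig schema in the later step supplies the column names, and the pandas step reads with no header).

```python
import csv

# Rows copied from the Student Dataset table.
students = [
    (101, "Amit", "Data Science", 85),
    (102, "Neha", "AI", 90),
    (103, "Rahul", "Big Data", 78),
    (104, "Priya", "Machine Learning", 88),
    (105, "Kiran", "Data Analytics", 92),
]

# Use "\n" line endings: the csv module defaults to "\r\n", and a trailing
# carriage return could end up inside the last field when Pig splits each line.
with open("student_data.csv", "w", newline="") as f:
    csv.writer(f, lineterminator="\n").writerows(students)
```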

Step 7: Upload Dataset to HDFS (Simulation)

!mkdir -p input

!cp student_data.csv input/

!ls input

student_data.csv

Step 8: Create Pig Script

%%writefile student.pig
student_data = LOAD 'input/student_data.csv'
    USING PigStorage(',')
    AS (id:int, name:chararray, course:chararray, marks:int);

-- Display dataset
DUMP student_data;

-- Filter students with marks greater than 85
high_marks = FILTER student_data BY marks > 85;

DUMP high_marks;

Writing student.pig
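To sanity-check what the filter should return before running Pig, the same selection can be reproduced in plain Python. This is only an illustrative cross-check, not part of the Pig workflow; the function name `high_marks` and the inline `SAMPLE_CSV` string are assumptions made so the sketch is self-contained.

```python
import csv
import io

# Inline copy of the dataset so the sketch runs without the CSV file.
SAMPLE_CSV = (
    "101,Amit,Data Science,85\n"
    "102,Neha,AI,90\n"
    "103,Rahul,Big Data,78\n"
    "104,Priya,Machine Learning,88\n"
    "105,Kiran,Data Analytics,92\n"
)

def high_marks(lines, threshold=85):
    """Return rows whose marks are strictly greater than the threshold."""
    rows = []
    for sid, name, course, marks in csv.reader(lines):
        if int(marks) > threshold:
            rows.append((int(sid), name, course, int(marks)))
    return rows

for row in high_marks(io.StringIO(SAMPLE_CSV)):
    print(row)
```

Because the comparison is strict, the student with exactly 85 marks is excluded, which is the behavior to expect from the Pig FILTER as well.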

Step 9: Run Pig Script

!pig-0.17.0/bin/pig student.pig

Success!

Job Stats (time in seconds):

JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime

job_local195973622_0001 1

Input(s):

Successfully read 5 records from: "file:///content/input/student_data.csv"

Output(s):

Successfully stored 5 records in: "file:/tmp/temp-151465548/tmp-1626667077"

Counters:

Total records written

Total bytes written

Spillable Memory Manager spill count

Total bags proactively spilled

(101,Amit,Data Science,85)

(102,Neha,AI,90)

(104,Priya,Machine Learning,88)

(103,Rahul,Big Data,78)

(102,Neha,AI,90)

(104,Priya,Machine Learning,88)

(105,Kiran,Data Analytics,92)

2026-04-08 17:50:28,977 [main] INFO org.apache.pig.Main - Pig script completed in 16 seconds and 21 milliseconds (16021 ms)
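If the DUMP output needs to be pulled back into Python for checking, each tuple line can be parsed with a few string operations. This is a rough sketch that assumes the four-field schema used here and that no field contains a comma or parenthesis; `parse_pig_tuple` is a hypothetical helper, not a Pig API.

```python
def parse_pig_tuple(line):
    """Parse one DUMP line such as '(102,Neha,AI,90)' into typed fields."""
    sid, name, course, marks = line.strip().strip("()").split(",")
    return int(sid), name.strip(), course.strip(), int(marks)

print(parse_pig_tuple("(102,Neha,AI,90)"))
```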

Step 10: Simulated HBase Storage (Using Python Dictionary)

import pandas as pd

data = pd.read_csv("student_data.csv", header=None)
data.columns = ["ID", "Name", "Course", "Marks"]

# Simulated HBase table: row key -> {"family:qualifier": value}
hbase_table = {}

for index, row in data.iterrows():
    hbase_table[row["ID"]] = {
        "info:name": row["Name"],
        "info:course": row["Course"],
        "info:marks": row["Marks"]
    }

print(hbase_table)

{101: {'info:name': 'Amit', 'info:course': 'Data Science', 'info:marks': 85},
 102: {'info:name': 'Neha', 'info:course': 'AI', 'info:marks': 90},
 103: {'info:name': 'Rahul', 'info:course': 'Big Data', 'info:marks': 78},
 104: {'info:name': 'Priya', 'info:course': 'Machine Learning', 'info:marks': 88},
 105: {'info:name': 'Kiran', 'info:course': 'Data Analytics', 'info:marks': 92}}
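The dictionary mirrors HBase's layout of row key, then column (family:qualifier), then value. To make the analogy a little more concrete, the basic put, get and scan operations could be imitated as below. The helper names are hypothetical stand-ins written for this simulation and are not the HBase client API; the scan sketch assumes the usual HBase convention that rows come back in sorted key order, with the start row inclusive and the stop row exclusive.

```python
def put(table, row_key, column, value):
    """Store a value under a row key and a 'family:qualifier' column."""
    table.setdefault(row_key, {})[column] = value

def get(table, row_key, column=None):
    """Return one cell, or a copy of the whole row when no column is given."""
    row = table.get(row_key, {})
    return dict(row) if column is None else row.get(column)

def scan(table, start=None, stop=None):
    """Yield (row_key, row) in sorted key order; start inclusive, stop exclusive."""
    for key in sorted(table):
        if (start is None or key >= start) and (stop is None or key < stop):
            yield key, table[key]

table = {}
put(table, 101, "info:name", "Amit")
put(table, 101, "info:marks", 85)
put(table, 102, "info:name", "Neha")
put(table, 102, "info:marks", 90)

print(get(table, 101, "info:marks"))
print([key for key, _ in scan(table, start=102)])
```

A plain dictionary has no versions, timestamps or persistence, so this only illustrates the addressing scheme, not HBase's storage behavior.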