Step 1: Install Java (Required for Hadoop)
!apt-get install openjdk-8-jdk-headless -qq > /dev/null

Step 2: Download and Extract Hadoop
!wget -q https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
!tar -xzf hadoop-3.3.6.tar.gz

Step 3: Set the JAVA_HOME environment variable
import os
# Set JAVA_HOME properly
os.environ["JAVA_HOME"] = "/usr/lib/jvm/java-8-openjdk-amd64"
# Add Java to PATH
os.environ["PATH"] = os.environ["JAVA_HOME"] + "/bin:" + os.environ["PATH"]
# Verify Java
!java -version

openjdk version "1.8.0_482"
OpenJDK Runtime Environment (build 1.8.0_482-8u482-ga~us1-0ubuntu1~22.04-b08)
OpenJDK 64-Bit Server VM (build 25.482-b08, mixed mode)

Step 4: Set the Hadoop environment
import os
os.environ["HADOOP_HOME"] = "/content/hadoop-3.3.6"
os.environ["PATH"] = os.environ["HADOOP_HOME"] + "/bin:" + os.environ["PATH"]

Step 5: Install Apache Pig
!wget -q https://downloads.apache.org/pig/pig-0.17.0/pig-0.17.0.tar.gz
!tar -xzf pig-0.17.0.tar.gz

Step 6: Create the dataset
Student Dataset (student_data.csv):

Student ID   Name    Course             Marks
101          Amit    Data Science       85
102          Neha    AI                 90
103          Rahul   Big Data           78
104          Priya   Machine Learning   88
105          Kiran   Data Analytics     92

The file is comma-separated with no header row (e.g. 101,Amit,Data Science,85), which matches PigStorage(',') in the Pig script and header=None in the pandas code.

Step 7: Upload Dataset to HDFS (Simulation)
!mkdir -p input
!cp student_data.csv input/
!ls input

student_data.csv

Step 8: Create the Pig Script
%%writefile student.pig
-- Load the student dataset (Pig string literals use single quotes)
student_data = LOAD 'input/student_data.csv' USING PigStorage(',')
    AS (id:int, name:chararray, course:chararray, marks:int);
-- Display the dataset
DUMP student_data;
-- Filter students with marks greater than 85
high_marks = FILTER student_data BY marks > 85;
DUMP high_marks;

Writing student.pig

Step 9: Run the Pig Script
!pig-0.17.0/bin/pig student.pig

Success!
Job Stats (time in seconds):
JobId Maps Reduces MaxMapTime MinMapTime AvgMapTime MedianMapTime MaxReduceTime
job_local195973622_0001 1

Input(s):
Successfully read 5 records from: "file:///content/input/student_data.csv"

Output(s):
Successfully stored 5 records in: "file:/tmp/temp-151465548/tmp-1626667077"

Counters:
Total records written :
Total bytes written :
Spillable Memory Manager spill count :
Total bags proactively spilled:

(101,Amit,Data Science,85)
(102,Neha,AI,90)
(103,Rahul,Big Data,78)
(104,Priya,Machine Learning,88)
(105,Kiran,Data Analytics,92)

(102,Neha,AI,90)
(104,Priya,Machine Learning,88)
(105,Kiran,Data Analytics,92)

2026-04-08 17:50:28,977 [main] INFO org.apache.pig.Main - Pig script completed in 16 seconds and 21 milliseconds (16803 ms)

Step 10: Simulated HBase Storage (Using a Python Dictionary)
import pandas as pd

data = pd.read_csv("student_data.csv", header=None)
data.columns = ["ID", "Name", "Course", "Marks"]

# Simulated HBase table: row key = student ID, cells = "family:qualifier" -> value
hbase_table = {}
for index, row in data.iterrows():
    hbase_table[row["ID"]] = {
        "info:name": row["Name"],
        "info:course": row["Course"],
        "info:marks": row["Marks"]
    }

print(hbase_table)

{101: {'info:name': 'Amit', 'info:course': 'Data Science', 'info:marks': 85}, 102: {'info:name': 'Neha', 'info:course': 'AI', 'info:marks': 90}, 103: {'info:name': 'Rahul', 'info:course': 'Big Data', 'info:marks': 78}, 104: {'info:name': 'Priya', 'info:course': 'Machine Learning', 'info:marks': 88}, 105: {'info:name': 'Kiran', 'info:course': 'Data Analytics', 'info:marks': 92}}
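The dictionary above stands in for an HBase table. The row key is the student ID, and each cell is addressed as family:qualifier under a single "info" column family. The sketch below shows how reads against such a table could look. It assumes the same five rows as student_data.csv, inlined so it runs without the file.

build_table, get_row and scan are hypothetical helpers that only loosely mimic an HBase Get and a filtered Scan; they are not the HBase client API. A scan with a marks > 85 predicate should return the same students as the Pig FILTER.

```python
import csv
import io

# The same rows as student_data.csv (no header row), inlined so the sketch is self-contained.
STUDENT_CSV = """101,Amit,Data Science,85
102,Neha,AI,90
103,Rahul,Big Data,78
104,Priya,Machine Learning,88
105,Kiran,Data Analytics,92
"""


def build_table(csv_text):
    """Hypothetical helper: build {row_key: {"family:qualifier": value}} from CSV text."""
    table = {}
    for student_id, name, course, marks in csv.reader(io.StringIO(csv_text)):
        table[int(student_id)] = {
            "info:name": name,
            "info:course": course,
            "info:marks": int(marks),
        }
    return table


def get_row(table, row_key, columns=None):
    """Hypothetical helper: point lookup by row key, loosely like an HBase Get.

    If columns is given, only those family:qualifier cells are returned.
    A missing row key yields an empty dict.
    """
    row = table.get(row_key, {})
    if columns is None:
        return dict(row)
    return {col: row[col] for col in columns if col in row}


def scan(table, predicate=None):
    """Hypothetical helper: yield (row_key, row) in sorted key order, loosely like a filtered Scan."""
    for key in sorted(table):
        row = table[key]
        if predicate is None or predicate(row):
            yield key, row


table = build_table(STUDENT_CSV)
print(get_row(table, 102, ["info:name", "info:marks"]))  # {'info:name': 'Neha', 'info:marks': 90}

# Equivalent of the Pig FILTER student_data BY marks > 85
high = [key for key, _ in scan(table, lambda r: r["info:marks"] > 85)]
print(high)  # [102, 104, 105]
```

One caveat on the design: HBase keeps rows sorted by the raw bytes of the row key. If the IDs were stored as string row keys, "1000" would sort before "999". Here every ID has three digits, so numeric and lexicographic order happen to agree.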