In this blog we are going to look at the import and incremental import commands in Sqoop, along with a few related options.

Before moving on to the import commands, we need to know what Sqoop is. In simple words, it is an open-source command-line application that imports bulk data from relational databases and data warehouse systems into HDFS, and exports data from the Hadoop file system back to relational databases.

To learn more about Sqoop and some of its basic commands, check the link below.

Click here — that blog gives an overview and introduction to Apache Sqoop with some basic commands, and it is very useful for beginners.

Now, let us come to our main topic, Sqoop import and incremental import.

          


Sqoop is as simple as getting a scoop of ice cream 😜


Sqoop Import Operation:

  sqoop import --connect jdbc:mysql://[hostname]/[database name] --username [username] --password [password] --table [table name] --target-dir [hdfs target path] -m 1


Inside the [ ] brackets, you need to substitute the respective values as mentioned in the text.

Note:

-m 1 must be provided in the following situations (-m sets the number of parallel map tasks, so -m 1 forces a single mapper and hence a single output file):

    a. If the table does not have a primary key (without a split column, Sqoop cannot divide the work among multiple mappers).

    b. If we want only one output file.
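As an illustration, a table without a primary key can still be imported in parallel by naming a split column explicitly. The connection details, table, and column names below are placeholder values, not from a real cluster:

```shell
# Table without a primary key: fall back to a single mapper
# (hostname, database, credentials, and table name are hypothetical)
sqoop import --connect jdbc:mysql://localhost/company \
  --username root --password secret \
  --table employee_log --target-dir /user/hadoop/employee_log \
  -m 1

# Alternatively, keep parallelism by telling Sqoop which column to split on
sqoop import --connect jdbc:mysql://localhost/company \
  --username root --password secret \
  --table employee_log --target-dir /user/hadoop/employee_log \
  --split-by emp_id -m 4
```

With --split-by, Sqoop divides the range of emp_id across the four mappers, producing one output file per mapper (part-m-00000 through part-m-00003).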


In case the output columns are getting shuffled or rearranged, use --columns to fix their order:

  sqoop import --connect jdbc:mysql://[hostname]/[database name] --username [username] --password [password] --table [table name] --target-dir [hdfs target path] --columns "[column names separated by commas]" -m 1


Import a Subset of Data:

For example, suppose we only want the rows matched by this query:

SELECT *
FROM Employee
WHERE sal >= 5000;

In that situation, we can use the --where option in the Sqoop command below.

  sqoop import --connect jdbc:mysql://[hostname]/[database name] --username [username] --password [password] --table [table name] --target-dir [hdfs target path] --where "sal>=5000" -m 1


Import on the Basis of a SQL Query:

Here we need to remember two things:
  1.     There is no need to mention --table.
  2.     The WHERE clause of the query must contain \$CONDITIONS (Sqoop replaces this token with the split condition for each mapper).

  sqoop import --connect jdbc:mysql://[hostname]/[database name] --username [username] --password [password] --target-dir [hdfs target path] --query "select * from table where sal>=5000 and \$CONDITIONS" -m 1


Changing the Delimiter:

To change the field delimiter based on your preference, use --fields-terminated-by.

  sqoop import --connect jdbc:mysql://[hostname]/[database name] --username [username] --password [password] --table [table name] --target-dir [hdfs target path] -m 1 --fields-terminated-by '#'


For example, in place of [hdfs target path] you need to provide your own HDFS path, which is the destination directory for the import.

Sqoop Incremental Imports:

Sqoop gives us a special feature: incremental imports. We can append newly added rows to data already imported into HDFS, by telling Sqoop the last value of a specific column that is already present in HDFS.

In Sqoop incremental imports, we need to take care of three things:

  1. --incremental append :
         Append the new data to the existing folder.

  2. --check-column :
         Specify the column that the incremental import depends on (typically a primary key or another steadily increasing unique column).

  3. --last-value :
         Specify the last value of that column already imported into HDFS.

  sqoop import --connect jdbc:mysql://[hostname]/[database name] --username [username] --password [password] --table [table name] --target-dir [hdfs target path] --incremental append --check-column [column name] --last-value [the last value in the current table in HDFS] -m 1

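Putting it all together, here is how a full incremental workflow might look for a hypothetical employee table (the hostname, database, credentials, and last value below are placeholders):

```shell
# Step 1: initial full import of the employee table
sqoop import --connect jdbc:mysql://localhost/company \
  --username root --password secret \
  --table employee --target-dir /user/hadoop/employee \
  -m 1

# Step 2: later, after new rows (emp_id > 1003) have been inserted in MySQL,
# append only those new rows to the same HDFS directory
sqoop import --connect jdbc:mysql://localhost/company \
  --username root --password secret \
  --table employee --target-dir /user/hadoop/employee \
  --incremental append --check-column emp_id --last-value 1003 \
  -m 1
```

At the end of an incremental run, Sqoop logs the new --last-value to use for the next run, so it is worth saving it (or defining a saved Sqoop job, which tracks the last value automatically).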