Exercise Sqoop

Install crime-data-la database in MySQL

Copy datasets crime_data_la.csv and crime_data_area_name.csv in /user/ubuntu/crime_la in HDFS.

From the console, copy crime_data_la.csv and crime_data_area_name.csv to the local filesystem using:

hadoop fs -copyToLocal /user/ubuntu/crime_la/crime_data_la.csv
hadoop fs -copyToLocal /user/ubuntu/crime_la/crime_data_area_name.csv
ls -a

Create a new database crime_la in MySQL

mysql -u root -p
CREATE DATABASE crime_la;
USE crime_la;

Create table crime_data_la and area_lookup:

CREATE TABLE crime_data_la
(dr_number INT, date_reported VARCHAR(255), date_occured VARCHAR(255), 
tm_occured INT, area_id INT, reporting_district INT, crime_code INT, victime_age INT,
victim_sex VARCHAR(255), victim_descent VARCHAR(255), coord_lat  FLOAT, coord_long  FLOAT);

alter table crime_data_la add primary key (dr_number);

CREATE TABLE area_lookup (area_id INT, area_name VARCHAR(255));

alter table area_lookup add primary key (area_id);

Load data into crime_la database:

LOAD DATA LOCAL INFILE 'crime_data_la.csv'
INTO TABLE crime_data_la
FIELDS TERMINATED BY ','
  OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\r\n'
(dr_number, date_reported, date_occured,tm_occured,area_id, reporting_district, crime_code, victime_age,
victim_sex, victim_descent,    coord_lat, coord_long);
LOAD DATA LOCAL INFILE 'crime_data_area_name.csv'
INTO TABLE area_lookup
FIELDS TERMINATED BY ','
    ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(area_id, area_name);

Check the data has been correctly loaded:

SELECT * FROM crime_data_la LIMIT 10;

SELECT * FROM area_lookup;

Grant all privileges on crime_la

GRANT ALL PRIVILEGES
ON crime_la.*
TO 'ubuntu'@'%'
IDENTIFIED BY 'ubuntu';

Exercise 1

  • Create the crime_la database in MySQL using the course material.
  • Using sqoop, import crime_data_la table from MySQL into HDFS such that fields are separated by a ‘*' and lines are separated by '\n'.
  • Null values are represented as -1 for numbers and "NA" for strings.
  • Only records with date_reported between 2013-01-15 and 2014-05-27 should be imported.
  • Compare date strings using CAST('date_string' AS DATE) or str_to_date( 'date_string' , '%Y-%m-%d’). Either way works.
CAST(date_reported AS DATE)
CAST('2015-05-27' AS DATE)
str_to_date(date_reported,'%Y-%m-%d')
str_to_date('2015-05-27','%Y-%m-%d')
  • Use 3 mappers for importing.
  • The destination file should be stored as a text file to directory /user/ubuntu/crime_la/export.

Exercise 2

  • Create the crime_la database in Hive (i.e. in HUE).
  • Using sqoop, import area_lookup table from MySQL into crime_la Hive database such that fields are separated by a '|' and lines are separated by '\n'.

  • Only records with area_id between 5 and 20 should be imported.

  • Use 2 mappers for importing.

  • Overwrite the table if it already exists.

results matching ""

    No results matching ""