Highest scored 'impala' questions

58 votes

5 answers

37k views

How does impala provide faster query response compared to hive

I have recently started looking into querying large sets of CSV data lying on HDFS using Hive and Impala. As I was expecting, I get better response time with Impala compared to Hive for the queries I ...

techuser soma

4,836

asked May 26, 2013 at 2:07

43 votes

2 answers

35k views

Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)

I want to do some "near real-time" data analysis (OLAP-like) on the data in a HDFS. My research showed that the three mentioned frameworks report significant performance gains compared to Apache Hive. ...

user2306380

611

asked Jun 25, 2013 at 6:18

27 votes

4 answers

82k views

How to copy all hive table from one Database to other Database

I have a default database in Hive that contains 80 tables. I have created one more database and I want to copy all the tables from the default DB to the new database. Is there any way I can copy from one DB ...

Aman

3,251

asked Oct 29, 2014 at 17:21

21 votes

3 answers

24k views

Impala can't access all hive table

I am trying to query HBase data through Hive (I'm using Cloudera). I created a few Hive external tables pointing to HBase, but Cloudera's Impala doesn't have access to all of those tables. All hive ...

Nosk

753

asked Dec 10, 2013 at 16:44

16 votes

3 answers

43k views

Difference between invalidate metadata and refresh commands in Impala?

I saw at this link which affects Impala version 1.1: Since Impala 1.1, REFRESH statement only works for existing tables. For new tables you need to issue "INVALIDATE METADATA" statement. Does this ...

covfefe

2,575

asked Feb 15, 2017 at 1:24

14 votes

1 answer

22k views

How to calculate seconds between two timestamps in Impala?

I do not see an Impala function to subtract two datestamps and return seconds (or minutes) between the two. http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/...

ADJ

5,082

asked Mar 7, 2016 at 19:55

14 votes

2 answers

2k views

how to efficiently move data from Kafka to an Impala table?

Here are the steps to the current process: Flafka writes logs to a 'landing zone' on HDFS. A job, scheduled by Oozie, copies complete files from the landing zone to a staging area. The staging data ...

Alex Woolford

4,523

asked Jan 25, 2016 at 23:54

12 votes

1 answer

2k views

How to efficiently update Impala tables whose files are modified very frequently

We have a Hadoop-based solution (CDH 5.15) where we are getting new files in HDFS in some directories. On top of those directories we have 4-5 Impala (2.1) tables. The process writing those files in ...

Victor

2,490

asked Feb 6, 2020 at 8:24

12 votes

7 answers

20k views

RODBC ERROR: Could not SQLExecDirect in mysql

I have been trying to write an R script to query Impala database. Here is the query to the database: select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from ...

Gowtham Ganesh

340

asked May 11, 2015 at 12:46

11 votes

3 answers

73k views

Convert YYYYMMDD String to Date in Impala

I'm using SQL in Impala to write this query. I'm trying to convert a date string, stored in YYYYMMDD format, into a date format for the purposes of running a query like this: SELECT datadate, ...

nxl4

724

asked Oct 8, 2015 at 19:24

11 votes

3 answers

13k views

How does computing table stats in hive or impala speed up queries in Spark SQL?

For increasing performance (e.g. for joins) it is recommended to compute table statistics first. In Hive I can do: analyze table <table name> compute statistics; In Impala: compute stats <...

Raphael Roth

27.2k

asked Sep 22, 2016 at 7:23

11 votes

2 answers

15k views

Write pandas table to impala

Using the impyla module, I've downloaded the results of an impala query into a pandas dataframe, done analysis, and would now like to write the results back to a table on impala, or at least to an ...

SummerEla

1,932

asked Sep 1, 2015 at 17:52

11 votes

2 answers

1k views

Big data signal analysis: better way to store and query signal data

I am about to do some signal analysis with Hadoop/Spark and I need help on how to structure the whole process. Signals are now stored in a database, that we will read with Sqoop and will be ...

Ameba Spugnosa

1,224

asked Apr 24, 2016 at 10:24

11 votes

1 answer

1k views

Can ETL informatica Big Data edition (not the cloud version) connect to Cloudera Impala?

We are trying to do a proof of concept on Informatica Big Data edition (not the cloud version) and I have seen that we might be able to use HDFS, Hive as source and target. But my question is does ...

sun_dare

1,156

asked Dec 23, 2015 at 21:11

9 votes

1 answer

10k views

Impala command to know DB table size

Is there any way to check the DB table size and other properties? I tried COMPUTE STATS, but it gives the details of the table except the size. Any link to find information and other details are ...

Shantesh

1,520

asked Jan 16, 2018 at 5:38

9 votes

2 answers

12k views

Create table from CSV with values containing commas enclosed in quotes

I'm trying to create a table in Impala from a CSV that I've uploaded into an HDFS directory. The CSV contains values with commas enclosed inside quotes. Example: 1.66.96.0/19,"NTT Docomo,INC.","...

nxl4

724

asked Jun 7, 2016 at 19:57

9 votes

2 answers

458 views

Is there a way to turn off DESCRIBE in R dplyr sql

I'm using R shiny and dplyr to connect to a database and query the data in Impala. I do the following. con <- dbPool(odbc(), Driver = [DRIVER], Host = [HOST], Schema = [SCHEMA], Port = [PORT], UID =...

bink1time

141

asked Aug 12, 2019 at 19:03

8 votes

2 answers

10k views

How do I set a variable in an Impala query using HUE?

I need to add parameters in several locations in a long query. I want to use parameters because I need to run the query multiple times with different values substituted in. This is very cumbersome ...

OTM

188

asked Jun 8, 2020 at 20:27

8 votes

3 answers

41k views

Get sequential number of a row (rank) within a partition without using ROW_NUMBER() OVER function

I need to rank rows by partition (or group), i.e. if my source table is: NAME PRICE ---- ----- AAA 1.59 AAA 2.00 AAA 0.75 BBB 3.48 BBB 2.19 BBB 0.99 BBB 2.50 I would like to get target table: ...

Andrey Dmitriev

558

asked May 2, 2014 at 10:03

8 votes

1 answer

342 views

Running impala cluster from portable binaries

I'm evaluating multiple big data tools. One of them is of course Impala. I would like to start Impala cluster by manually starting processes on the cluster nodes. As I'm currently doing for Spark, H2O,...

jangorecki

16.5k

asked Aug 22, 2016 at 20:03

7 votes

2 answers

11k views

How to duplicate cloudera impala table with impala-shell or other means?

I see a table "test" in Impala when I do show tables; I want to make a copy of the "test" table so that it is an exact duplicate, but named "test_copy". Is there a impala query I can execute to do ...

Rolando

60.7k

asked Oct 29, 2014 at 17:02

7 votes

1 answer

35k views

Dropping multiple partitions in Impala/Hive

1- I'm trying to delete multiple partitions at once, but struggling to do it with either Impala or Hive. I tried the following query, with and without ': ALTER TABLE cz_prd_corrti_st....

k_mishap

451

asked Aug 7, 2017 at 9:18

7 votes

1 answer

23k views

Difference in days between two dates in Impala

I am trying to find a date difference in Impala. I have tried a few options. My most recent is below: ABS(dayofyear(CAST(firstdate AS TIMESTAMP)-dayofyear(CAST(seconddate AS TIMESTAMP) an example of ...

burnsa9

131

asked Dec 4, 2017 at 19:03

7 votes

4 answers

48k views

ROW_NUMBER( ) OVER in impala

I have a use case where I need to use ROW_NUMBER() over PARTITION: Something like: SELECT Column1 , Column 2 ROW_NUMBER() OVER ( PARTITION BY ACCOUNT_NUM ORDER BY FREQ, MAN, MODEL) as ...

user1189851

4,981

asked Oct 6, 2014 at 19:20

7 votes

5 answers

10k views

Will Spark SQL completely replace Apache Impala or Apache Hive? [closed]

I need to deploy a Big Data cluster on our servers, but I only know Apache Spark. Now I need to know whether Spark SQL can completely replace Apache Impala or Apache Hive. I need ...

Tim Koo

109

asked Oct 25, 2016 at 9:37

7 votes

2 answers

28k views

extract the date from a timestamp value variable in Impala

How can I extract the date from a timestamp value variable in Impala? E.g. time = 2018-04-11 16:05:19 should be 2018-04-11

Anna

444

asked Jun 24, 2018 at 20:19

7 votes

2 answers

14k views

Uploading CSV for Impala

I am trying to upload a CSV file to HDFS for Impala and it has failed many times. I'm not sure what is wrong here, as I have followed the guide, and the CSV is also on HDFS. CREATE EXTERNAL TABLE gc_imp ...

LonelySoul

1,212

asked Aug 23, 2013 at 4:45

7 votes

2 answers

41k views

Impala: Show tables like query

I am working with Impala and fetching the list of tables from the database with some pattern like below. Assume I have a database bank, and tables under this database are like below. cust_profile ...

Manindar

999

asked Mar 24, 2017 at 12:27

7 votes

3 answers

5k views

How to find the COMPRESSION_CODEC used on a Parquet file at the time of its generation?

Usually in Impala, we use the COMPRESSION_CODEC before inserting data into a table for which the underlying files are in Parquet format. Commands used to set COMPRESSION_CODEC: set ...

Gomz

850

asked Aug 20, 2019 at 12:16

7 votes

1 answer

4k views

Impala cannot find com.mysql.jdbc.Driver

I'm trying to set up Cloudera Impala with CDH4 in pseudo distributed mode on Red Hat 5. I have Hive using JDBC to connect to a MySQL metastore, but I'm having trouble setting up Impala with JDBC. I've ...

supermaria

121

asked Jun 18, 2013 at 15:51

6 votes

2 answers

8k views

Installing cloudera impala without cloudera manager

Please provide a link for installing Impala on Ubuntu without Cloudera Manager. I wasn't able to install it using the official link. Unable to locate package impala using these commands: sudo apt-...

Naresh

5,245

asked Jun 17, 2013 at 11:33

6 votes

1 answer

13k views

Calling JDBC to impala/hive from within a spark job and creating a table

I am trying to write a spark job in scala that would open a jdbc connection with Impala and let me create a table and perform other operations. How do I do this? Any example would be of great ...

user1189851

4,981

asked Oct 29, 2014 at 15:48

6 votes

1 answer

11k views

Impala - convert existing table to parquet format

I have a table that has partitions and I use avro files or text files to create and insert into a table. Once the table is done, is there a way to convert it into parquet? I mean I know we could have ...

user1189851

4,981

asked Oct 14, 2014 at 16:10

6 votes

3 answers

14k views

Save Impala Shell query results in CSV

How can I save my query results in a CSV file via the Impala Shell? My Code: impala-shell -q "use test; select * from teams; -- From this point I need to save the query results to /Desktop (for ...

user6203336

asked Apr 14, 2018 at 16:04

6 votes

4 answers

13k views

Comma delimited string to individual rows - Impala SQL

Let's suppose we have a table: Owner | Pets ------------------------------ Jack | "dog, cat, crocodile" Mary | "bear, pig" I want to get as a result: Owner | Pets ------------------------...

ifotopoulos

83

asked May 23, 2016 at 19:38

6 votes

2 answers

5k views

Performance of Apache Drill

Are there any performance benchmarks (genuine ones) that compare Stinger vs Impala vs Drill? Also, which is preferred - my use case will be mainly towards ad-hoc interactive queries on top of Hive. ...

Sai

127

asked Aug 22, 2015 at 6:44

6 votes

1 answer

21k views

Jdbc settings for connecting to Impala

What is the combination of driver and jdbc URL to use for CDH5 (I am on CDH5.3)? I have tried a few including: jdbc:hive2://myserver:21050/;auth=noSasl And with the following driver: org.apache....

WestCoastProjects

61.2k

asked Mar 6, 2015 at 4:40

6 votes

3 answers

8k views

Custom SerDe not supported by Impala, what's the best way to query files in CSV w/double quotes?

I have CSV data with each field surrounded by double quotes. When I created the Hive table I used the serde 'com.bizo.hive.serde.csv.CSVSerde'. When the above table is queried in Impala I am getting error SerDe ...

prasannads

639

asked Sep 3, 2014 at 10:56

6 votes

2 answers

469 views

Immediate evaluation of CTE

I am trying to optimize a very long and complex Impala query which contains multiple CTEs. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct ...

AYK

3,312

asked Nov 6, 2017 at 9:26

6 votes

1 answer

11k views

Impala/Hive to get list of tables along with its size

I have used a query in Oracle DB to produce the list of tables in a database along with their owner and respective table size. Here is the sample query I have shared. select owner, table_name, round((...

Manindar

999

asked Apr 20, 2017 at 9:22

6 votes

2 answers

34k views

How to set configuration in Hive-Site.xml file for hive metastore connection?

I want to connect to the MetaStore using Java code. I have no idea how to set the configuration in the Hive-Site.xml file or where to put the Hive-Site.xml file. Please help. import java.sql....

mohit sharma

259

asked Apr 7, 2015 at 6:25

6 votes

1 answer

11k views

Implement CREATE AS SELECT in Impala

Please help me with how to implement CREATE TABLE AS SELECT. For a simple create table t1 as select * from t2; I can implement it as create table t1 like t2; insert into t1 as select * from t2; But how to ...

on_the_shores_of_linux_sea

1,002

asked Oct 23, 2013 at 3:17

6 votes

0 answers

1k views

How to use Impala to read Hive view containing complex types?

I have some data that is processed and modeled based on case classes, and the classes can also have other case classes in them, so the final table has complex data, struct, array. Using the case class I ...

Shikkou

565

asked Mar 26, 2019 at 17:33

5 votes

4 answers

6k views

Presto vs Impala: architecture, performance, functionality

Could you highlight the major differences between the two in architecture & functionality in 2019? And how do those differences affect performance? For some reason this excellent question was tagged as ...

VB_

45.3k

asked Dec 10, 2019 at 21:38

5 votes

3 answers

12k views

Invalidate metadata/refresh Impala from spark code

I'm working on a NRT solution that requires me to frequently update the metadata on an Impala table. Currently this invalidation is done after my spark code has run. I would like to speed things up ...

Havnar

2,578

asked Jul 6, 2016 at 9:29

5 votes

3 answers

8k views

Cloudera Impala INVALIDATE METADATA

As has been discussed in Impala tutorials, Impala uses a Metastore shared by Hive, but it has been mentioned that if you create or make some edits to tables using Hive, you should execute INVALIDATE ...

masoumeh

478

asked Nov 24, 2015 at 7:54

5 votes

2 answers

12k views

Is there a way to show partitions on Cloudera impala?

Normally, I can do show partitions <table> in hive. But when it is a parquet table, hive does not understand it. I can go to hdfs and check the dir structure, but that is not ideal. Is there any ...

interskh

2,571

asked Aug 1, 2013 at 19:38

5 votes

2 answers

2k views

Hive/Impala performance with string partition key vs Integer partition key

Are numeric columns recommended for partition keys? Will there be any performance difference when we do a select query on numeric column partitions vs string column partitions?

rohit pothuri

53

asked Aug 29, 2018 at 16:24

5 votes

3 answers

4k views

Load large csv in hadoop via Hue would only store a 64MB block

I'm using the Cloudera quickstart vm 5.1.0-1. I'm trying to load my 3GB csv in Hadoop via Hue and what I tried so far is: - Load the csv into the HDFS and specifically into a folder called datasets ...

bobo32

992

asked Oct 16, 2014 at 21:46

5 votes

2 answers

2k views

Impala on Hadoop 2.2.0 without CDH?

I want to test and configure Impala with my Hadoop 2.2.0 distribution, not Cloudera's. I want to know if it's possible to use Impala without CDH, because I have only read that Impala is CDH dependent. ...

BAndrade

107

asked Dec 24, 2013 at 12:00
