Questions tagged [impala]

Apache Impala is the open source, native analytic database for Apache Hadoop. Impala is shipped by Cloudera, MapR, Oracle, and Amazon.

58 votes
5 answers
37k views

How does impala provide faster query response compared to hive

I have recently started looking into querying large sets of CSV data lying on HDFS using Hive and Impala. As I was expecting, I get better response time with Impala compared to Hive for the queries I ...
techuser soma's user avatar
43 votes
2 answers
35k views

Fast Hadoop Analytics (Cloudera Impala vs Spark/Shark vs Apache Drill)

I want to do some "near real-time" data analysis (OLAP-like) on the data in a HDFS. My research showed that the three mentioned frameworks report significant performance gains compared to Apache Hive. ...
user2306380's user avatar
27 votes
4 answers
82k views

How to copy all hive table from one Database to other Database

I have a default database in Hive that contains 80 tables. I have created one more database and I want to copy all the tables from the default DB to the new database. Is there any way I can copy from one DB ...
Aman's user avatar
  • 3,251
21 votes
3 answers
24k views

Impala can't access all hive table

I am trying to query HBase data through Hive (I'm using Cloudera). I created a few Hive external tables pointing to HBase, but Cloudera's Impala doesn't have access to all of those tables. All hive ...
Nosk's user avatar
  • 753
16 votes
3 answers
43k views

Difference between invalidate metadata and refresh commands in Impala?

I saw at this link which affects Impala version 1.1: Since Impala 1.1, REFRESH statement only works for existing tables. For new tables you need to issue "INVALIDATE METADATA" statement. Does this ...
covfefe's user avatar
  • 2,575
14 votes
1 answer
22k views

How to calculate seconds between two timestamps in Impala?

I do not see an Impala function to subtract two datestamps and return seconds (or minutes) between the two. http://www.cloudera.com/documentation/archive/impala/2-x/2-0-x/topics/...
ADJ's user avatar
  • 5,082
14 votes
2 answers
2k views

how to efficiently move data from Kafka to an Impala table?

Here are the steps to the current process: Flafka writes logs to a 'landing zone' on HDFS. A job, scheduled by Oozie, copies complete files from the landing zone to a staging area. The staging data ...
Alex Woolford's user avatar
12 votes
1 answer
2k views

How to efficiently update Impala tables whose files are modified very frequently

We have a Hadoop-based solution (CDH 5.15) where we are getting new files in HDFS in some directories. On top of those directories we have 4-5 Impala (2.1) tables. The process writing those files in ...
Victor's user avatar
  • 2,490
12 votes
7 answers
20k views

RODBC ERROR: Could not SQLExecDirect in mysql

I have been trying to write an R script to query Impala database. Here is the query to the database: select columnA, max(columnB) from databaseA.tableA where columnC in (select distinct(columnC) from ...
Gowtham Ganesh's user avatar
11 votes
3 answers
73k views

Convert YYYYMMDD String to Date in Impala

I'm using SQL in Impala to write this query. I'm trying to convert a date string, stored in YYYYMMDD format, into a date format for the purposes of running a query like this: SELECT datadate, ...
nxl4's user avatar
  • 724
11 votes
3 answers
13k views

How does computing table stats in hive or impala speed up queries in Spark SQL?

For increasing performance (e.g. for joins) it is recommended to compute table statistics first. In Hive I can do: analyze table <table name> compute statistics; In Impala: compute stats <...
Raphael Roth's user avatar
  • 27.2k
11 votes
2 answers
15k views

Write pandas table to impala

Using the impyla module, I've downloaded the results of an impala query into a pandas dataframe, done analysis, and would now like to write the results back to a table on impala, or at least to an ...
SummerEla's user avatar
  • 1,932
11 votes
2 answers
1k views

Big data signal analysis: better way to store and query signal data

I am about to do some signal analysis with Hadoop/Spark and I need help on how to structure the whole process. Signals are now stored in a database, which we will read with Sqoop and will be ...
Ameba Spugnosa's user avatar
11 votes
1 answer
1k views

Can ETL informatica Big Data edition (not the cloud version) connect to Cloudera Impala?

We are trying to do a proof of concept on Informatica Big Data edition (not the cloud version) and I have seen that we might be able to use HDFS, Hive as source and target. But my question is does ...
sun_dare's user avatar
  • 1,156
9 votes
1 answer
10k views

Impala command to know DB table size

Is there any way to check the DB table size and other properties? I tried COMPUTE STATS, but it gives the details of the table except the size. Any link to find information and other details are ...
Shantesh's user avatar
  • 1,520
9 votes
2 answers
12k views

Create table from CSV with values containing commas enclosed in quotes

I'm trying to create a table in Impala from a CSV that I've uploaded into an HDFS directory. The CSV contains values with commas enclosed inside quotes. Example: 1.66.96.0/19,"NTT Docomo,INC.","...
nxl4's user avatar
  • 724
9 votes
2 answers
458 views

Is there a way to turn off DESCRIBE in R dplyr sql

I'm using R shiny and dplyr to connect to a database and query the data in Impala. I do the following. con <- dbPool(odbc(), Driver = [DIVER], Host = [HOST], Schema = [SCHEMA], Port = [PORT], UID =...
bink1time's user avatar
  • 141
8 votes
2 answers
10k views

How do I set a variable in an Impala query using HUE?

I need to add parameters in several locations in a long query. I want to use parameters because I need to run the query multiple times with different values substituted in. This is very cumbersome ...
OTM's user avatar
  • 188
8 votes
3 answers
41k views

Get sequential number of a row (rank) within a partition without using ROW_NUMBER() OVER function

I need to rank rows by partition (or group), i.e. if my source table is: NAME PRICE ---- ----- AAA 1.59 AAA 2.00 AAA 0.75 BBB 3.48 BBB 2.19 BBB 0.99 BBB 2.50 I would like to get target table: ...
Andrey Dmitriev's user avatar
8 votes
1 answer
342 views

Running impala cluster from portable binaries

I'm evaluating multiple big data tools. One of them is of course Impala. I would like to start Impala cluster by manually starting processes on the cluster nodes. As I'm currently doing for Spark, H2O,...
jangorecki's user avatar
  • 16.5k
7 votes
2 answers
11k views

How to duplicate cloudera impala table with impala-shell or other means?

I see a table "test" in Impala when I do show tables; I want to make a copy of the "test" table so that it is an exact duplicate, but named "test_copy". Is there an Impala query I can execute to do ...
Rolando's user avatar
  • 60.7k
7 votes
1 answer
35k views

Dropping multiple partitions in Impala/Hive

1- I'm trying to delete multiple partitions at once, but struggling to do it with either Impala or Hive. I tried the following query, with and without ': ALTER TABLE cz_prd_corrti_st....
k_mishap's user avatar
  • 451
7 votes
1 answer
23k views

Difference in days between two dates in Impala

I am trying to find a date difference in Impala. I have tried a few options. My most recent is below: ABS(dayofyear(CAST(firstdate AS TIMESTAMP)-dayofyear(CAST(seconddate AS TIMESTAMP) an example of ...
burnsa9's user avatar
  • 131
7 votes
4 answers
48k views

ROW_NUMBER( ) OVER in impala

I have a use case where I need to use ROW_NUMBER() over PARTITION: Something like: SELECT Column1 , Column 2 ROW_NUMBER() OVER ( PARTITION BY ACCOUNT_NUM ORDER BY FREQ, MAN, MODEL) as ...
user1189851's user avatar
  • 4,981
7 votes
5 answers
10k views

Will Spark SQL completely replace Apache Impala or Apache Hive? [closed]

I need to deploy a big data cluster on our servers, but I only know Apache Spark. Now I need to know whether Spark SQL can completely replace Apache Impala or Apache Hive. I need ...
Tim Koo's user avatar
  • 109
7 votes
2 answers
28k views

extract the date from a timestamp value variable in Impala

How can I extract the date from a timestamp value variable in Impala? E.g. time = 2018-04-11 16:05:19 should be 2018-04-11
Anna 's user avatar
  • 444
7 votes
2 answers
14k views

Uploading CSV for Impala

I am trying to upload a CSV file to HDFS for Impala and have failed many times. Not sure what is wrong here, as I have followed the guide. The CSV is also on HDFS. CREATE EXTERNAL TABLE gc_imp ...
LonelySoul's user avatar
  • 1,212
7 votes
2 answers
41k views

Impala: Show tables like query

I am working with Impala and fetching the list of tables from the database with some pattern like below. Assume I have a database bank, and tables under this database are like below. cust_profile ...
Manindar's user avatar
  • 999
7 votes
3 answers
5k views

How to find the COMPRESSION_CODEC used on a Parquet file at the time of its generation?

Usually in Impala, we use the COMPRESSION_CODEC before inserting data into a table for which the underlying files are in Parquet format. Commands used to set COMPRESSION_CODEC: set ...
Gomz's user avatar
  • 850
7 votes
1 answer
4k views

Impala cannot find com.mysql.jdbc.Driver

I'm trying to set up Cloudera Impala with CDH4 in pseudo distributed mode on Red Hat 5. I have Hive using JDBC to connect to a MySQL metastore, but I'm having trouble setting up Impala with JDBC. I've ...
supermaria's user avatar
6 votes
2 answers
8k views

Installing cloudera impala without cloudera manager

Kindly provide a link for installing Impala on Ubuntu without Cloudera Manager. I wasn't able to install it using the official link. Unable to locate package impala using these commands: sudo apt-...
Naresh's user avatar
  • 5,245
6 votes
1 answer
13k views

Calling JDBC to impala/hive from within a spark job and creating a table

I am trying to write a spark job in scala that would open a jdbc connection with Impala and let me create a table and perform other operations. How do I do this? Any example would be of great ...
user1189851's user avatar
  • 4,981
6 votes
1 answer
11k views

Impala - convert existing table to parquet format

I have a table that has partitions and I use avro files or text files to create and insert into a table. Once the table is done, is there a way to convert into parquet. I mean I know we could have ...
user1189851's user avatar
  • 4,981
6 votes
3 answers
14k views

Save Impala Shell query results in CSV

How can I save my query results in a CSV file via the Impala Shell? My code: impala-shell -q "use test; select * from teams; -- From this point I need to save the query results to /Desktop (for ...
6 votes
4 answers
13k views

Comma delimited string to individual rows - Impala SQL

Let's suppose we have a table: Owner | Pets ------------------------------ Jack | "dog, cat, crocodile" Mary | "bear, pig" I want to get as a result: Owner | Pets ------------------------...
ifotopoulos's user avatar
6 votes
2 answers
5k views

Performance of Apache Drill

Are there any performance benchmarks (genuine ones) that compare Stinger vs Impala vs Drill? Also, which is preferred - my use case will be mainly towards ad-hoc interactive queries on top of Hive. ...
Sai's user avatar
  • 127
6 votes
1 answer
21k views

Jdbc settings for connecting to Impala

What is the combination of driver and jdbc URL to use for CDH5 (I am on CDH5.3)? I have tried a few including: jdbc:hive2://myserver:21050/;auth=noSasl And with the following driver: org.apache....
WestCoastProjects's user avatar
6 votes
3 answers
8k views

Custom SerDe not supported by Impala, what's the best way to query files in CSV w/double quotes?

I have CSV data with each field surrounded by double quotes. When I created the Hive table, I used the serde 'com.bizo.hive.serde.csv.CSVSerde'. When the above table is queried in Impala, I get the error SerDe ...
prasannads's user avatar
6 votes
2 answers
469 views

Immediate evaluation of CTE

I am trying to optimize a very long and complex Impala query which contains multiple CTEs. Each CTE is used multiple times. My expectation is that once a CTE is created, I should be able to direct ...
AYK's user avatar
  • 3,312
6 votes
1 answer
11k views

Impala/Hive to get list of tables along with its size

I have used a query in Oracle DB to produce the list of tables in a database along with their owner and respective table size. Here is the sample query I have shared. select owner, table_name, round((...
Manindar's user avatar
  • 999
6 votes
2 answers
34k views

How to set configuration in Hive-Site.xml file for hive metastore connection?

I want to connect to the metastore using Java code. I have no idea how to set the configuration settings in the Hive-Site.xml file or where to put the Hive-Site.xml file. Please help. import java.sql....
mohit sharma's user avatar
6 votes
1 answer
11k views

Implement CREATE AS SELECT in Impala

Please help me implement CREATE TABLE AS SELECT. For a simple create table t1 as select * from t2; I can implement it as Create table t1 like t2; insert into t1 as select * from t2; But how to ...
on_the_shores_of_linux_sea's user avatar
6 votes
0 answers
1k views

How to use Impala to read Hive view containing complex types?

I have some data that is processed and modeled based on case classes, and the classes can also have other case classes in them, so the final table has complex data, struct, array. Using the case class I ...
Shikkou's user avatar
  • 565
5 votes
4 answers
6k views

Presto vs Impala: architecture, performance, functionality

Could you highlight the major differences between the two in architecture & functionality in 2019? And how do those differences affect performance? For some reason this excellent question was tagged as ...
VB_'s user avatar
  • 45.3k
5 votes
3 answers
12k views

Invalidate metadata/refresh imapala from spark code

I'm working on a NRT solution that requires me to frequently update the metadata on an Impala table. Currently this invalidation is done after my spark code has run. I would like to speed things up ...
Havnar's user avatar
  • 2,578
5 votes
3 answers
8k views

Cloudera Impala INVALIDATE METADATA

As discussed in the Impala tutorials, Impala uses a metastore shared with Hive. But it has been mentioned that if you create or edit tables using Hive, you should execute INVALIDATE ...
masoumeh's user avatar
  • 478
5 votes
2 answers
12k views

Is there a way to show partitions on Cloudera impala?

Normally, I can do show partitions <table> in hive. But when it is a parquet table, hive does not understand it. I can go to hdfs and check the dir structure, but that is not ideal. Is there any ...
interskh's user avatar
  • 2,571
5 votes
2 answers
2k views

Hive/Impala performance with string partition key vs Integer partition key

Are numeric columns recommended for partition keys? Will there be any performance difference when we do a select query on numeric column partitions vs string column partitions?
rohit pothuri's user avatar
5 votes
3 answers
4k views

Load large csv in hadoop via Hue would only store a 64MB block

I'm using the Cloudera quickstart VM 5.1.0-1. I'm trying to load my 3GB CSV into Hadoop via Hue, and what I have tried so far is: - Load the csv into the HDFS and specifically into a folder called datasets ...
bobo32's user avatar
  • 992
5 votes
2 answers
2k views

Impala on Hadoop 2.2.0 without CDH?

I want to test and configure Impala with my Hadoop 2.2.0 distribution, not Cloudera's. I want to know if it's possible to use Impala without CDH, because I have only read that Impala is CDH dependent. ...
BAndrade's user avatar
  • 107
