answerstu

apache spark - Confused about getting Scala to run in IntelliJ? Import errors

I am trying to run a simple Scala program in IntelliJ. My build.sbt looks like this:

name := "UseThis"
version := "0.1"
scalaVersion := "2.12.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"

And my import code looks like this:

package WorkWork.FriendsByAge

import com.sun.glass.ui.Window.Level
import com.sun.istack.internal.logging.Logger
import org.apache.spark._
import org.apache.spark.SparkContext._
import org.apache.log4j._

I don't get why the import fails. It tells me the dependency failed to load or wasn't found, but I put the li...Read more
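The excerpt is cut off, but one well-known cause fits what is shown: spark-core 1.2.0 was never published for Scala 2.12, so with scalaVersion 2.12.4 the %% dependency cannot resolve and nothing under org.apache.spark is importable. A minimal build.sbt sketch that pairs the two consistently (Spark 2.4.x is the first release line with Scala 2.12 artifacts):

name := "UseThis"

version := "0.1"

scalaVersion := "2.12.4"

// Spark 2.4.x publishes _2.12 artifacts; alternatively keep Spark 1.2.0
// and drop scalaVersion to a 2.10/2.11 release instead
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.4.0"

The com.sun.glass and com.sun.istack imports look like accidental IDE auto-imports and are unrelated to Spark.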

opengl - LWJGL and Scala, drawing an object

I am working on a project that I am writing in Scala using LWJGL, and I am having some issues making an object appear on screen. I have used LWJGL with Java in the past, so I tried to write pretty much the same code but in Scala. Here is my main class that initializes the window:

import java.nio.FloatBuffer
import org.lwjgl.BufferUtils
import org.lwjgl.glfw.{GLFWErrorCallback, GLFWKeyCallback}
import org.lwjgl.glfw.GLFW._
import org.lwjgl.opengl.GL
import org.lwjgl.opengl.GL11._
import org.lwjgl.opengl.GL15._

object Main {
  def main(args: Array[St...Read more
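The excerpt stops before the rendering code, so as a point of comparison only, here is a minimal LWJGL 3 window-and-triangle skeleton in Scala; the detail that most often produces a blank window when porting from Java is forgetting GL.createCapabilities() on the context thread before the first GL call:

import org.lwjgl.glfw.GLFW._
import org.lwjgl.glfw.GLFWErrorCallback
import org.lwjgl.opengl.GL
import org.lwjgl.opengl.GL11._
import org.lwjgl.system.MemoryUtil.NULL

object MinimalTriangle {
  def main(args: Array[String]): Unit = {
    GLFWErrorCallback.createPrint(System.err).set()
    if (!glfwInit()) throw new IllegalStateException("Unable to initialize GLFW")
    val window = glfwCreateWindow(640, 480, "Triangle", NULL, NULL)
    if (window == NULL) throw new RuntimeException("Failed to create the GLFW window")
    glfwMakeContextCurrent(window)
    GL.createCapabilities()            // required before any GL11/GL15 call
    while (!glfwWindowShouldClose(window)) {
      glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT)
      glBegin(GL_TRIANGLES)            // immediate mode, assumes a compatibility context
      glVertex2f(-0.5f, -0.5f)
      glVertex2f(0.5f, -0.5f)
      glVertex2f(0.0f, 0.5f)
      glEnd()
      glfwSwapBuffers(window)
      glfwPollEvents()
    }
    glfwTerminate()
  }
}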

types - Inference of traits in scala

I'm having trouble with type inference in Scala. I'm using classes with higher-kinded types which extend some traits, but the Scala compiler can't resolve the types to the traits they extend. The minimal example is shown here:

trait traitA[X]
trait traitB[X]
class A[X] extends traitA[X] {}
class B extends traitB[C] {}
class C {}

val a = Seq[A[B]]()
val b: Seq[traitA[traitB[C]]] = a

Error: type mismatch;
 found   : Seq[A[B]]
 required: Seq[traitA[traitB[C]]]
       val b: Seq[traitA[traitB[C]]] = a

I can get a Seq[traitA[B]] but not a Seq[traitA[traitB[C]]]...Read more
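The error in the excerpt is what invariance produces: A[B] conforms to traitA[B], but traitA[B] only conforms to traitA[traitB[C]] if traitA is covariant in X. A sketch of the variance-annotated version, assuming the traits can be edited:

trait traitA[+X]
trait traitB[+X]
class A[X] extends traitA[X]
class B extends traitB[C]
class C

val a = Seq[A[B]]()
// A[B] <: traitA[B] <: traitA[traitB[C]] thanks to +X, and Seq is itself covariant,
// so the assignment that failed above now compiles
val b: Seq[traitA[traitB[C]]] = a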

oop - Scala: Trait initialisation code using hard-coded values

The following code:

trait A {
  val s: String = "A"
  println(s"A's initialiser run, s= $s")
}

object O1 extends A {
  override val s = "O1"
  println(s"Object's initialiser run, s= $s")
  def foo: Unit = println("I am just being called to instantiate the object! :| ")
}

object O2 extends AnyRef with A {
  override val s = "O2"
  println(s"Object's initialiser run, s= $s")
  def foo: Unit = println("I am just being called to instantiate the object! :| ")
}

println("////with inheritance:")
O1.foo
println("////with mix-in:")
O2.foo

prints:

////with inheritance:
A's in...Read more
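The output is cut off, but the classic behaviour here (stated as an assumption) is that A's initialiser runs before the overriding val is assigned, so it prints s = null. A sketch of the usual Scala 2 workaround, an early definition, so the override is in place before A's body runs:

trait A {
  val s: String = "A"
  println(s"A's initialiser run, s= $s")
}

// Scala 2 early definition: s is assigned before A's constructor body runs,
// so A's println sees "O1" instead of null (this syntax was removed in Scala 3)
object O1 extends {
  override val s = "O1"
} with A {
  def foo: Unit = println("instantiated")
}

Alternatively, declaring s as a lazy val (or a def) in the trait makes it be evaluated on first access rather than in constructor order.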

enums - Extending a set of named values via inheritance in scala or java

I know a lot of questions on Enumerations and traits have been asked, but I want to start from my requirement and discuss the most elegant way to meet it; I realize that neither Enumeration nor a trait will probably do the job. Consider first this simple example, which forms my requirement:

trait Sound {
  def makeSound(animal: Animal.value): String
}

The crucial part is that I want programmers to be able to check the list of allowed animals by typing in "Animal.". However, the list should be able to be extended by new implementations. For these reaso...Read more
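As a hedged sketch of one pattern that fits both constraints — discoverable by typing "Animal." yet still extensible from other modules — keep Animal an ordinary (non-sealed) trait and put the known values on its companion object; all names below are illustrative:

trait Animal { def name: String }

object Animal {
  // the values programmers discover via "Animal." completion
  case object Dog extends Animal { val name = "dog" }
  case object Cat extends Animal { val name = "cat" }
}

// a later module can still add values, which a sealed trait or Enumeration would not allow
object ExtraAnimals {
  case object Parrot extends Animal { val name = "parrot" }
}

trait Sound {
  def makeSound(animal: Animal): String   // takes the trait rather than Animal.value
}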

scala - Create column with nested list aggregation in Dataframe

I have a dataframe with this structure:

val df = Seq(
  ("john", "tomato", 1),
  ("john", "carrot", 4),
  ("bill", "apple", 1),
  ("john", "tomato", 2),
  ("bill", "taco", 2)
).toDF("name", "food", "price")

I need to aggregate it into nested lists, like this:

name | acc
-----+---------------------------
john | [(tomato, 3), (carrot, 4)]
bill | [(apple, 1), (taco, 2)]

I tried this way, but it is not right:

dff.groupBy($"name")
  .agg(collect_list(struct($"food", $"price")).as("foods"))
  .show(false)

+----+---------------------...Read more
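In the expected output, tomato appears once with price 3 (1 + 2), so the prices need to be summed per (name, food) before the structs are collected; a sketch using the df defined above:

import org.apache.spark.sql.functions._

val acc = df
  .groupBy(col("name"), col("food"))
  .agg(sum(col("price")).as("price"))        // (john, tomato) -> 3
  .groupBy(col("name"))
  .agg(collect_list(struct(col("food"), col("price"))).as("acc"))

acc.show(false)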

scala - Create new binary column based off of join in spark

My situation is that I have two Spark data frames, dfPopulation and dfSubpopulation. dfSubpopulation is just that, a subpopulation of dfPopulation. I would like a clean way to create a new column in dfPopulation that is a binary flag for whether the row's key appears among the dfSubpopulation keys. E.g. what I want is to create the new DataFrame dfPopulationNew:

dfPopulation =
X  Y  key
1  2  A
2  2  A
3  2  B
4  2  C
5  3  C

dfSubpopulation =
X  Y  key
1  2  A
...Read more
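A sketch of one clean way (not necessarily the accepted approach): left-join dfPopulation against the distinct subpopulation keys and turn the match into a 0/1 flag; the column names are illustrative:

import org.apache.spark.sql.functions._

val subKeys = dfSubpopulation.select("key").distinct().withColumn("flag", lit(1))

val dfPopulationNew = dfPopulation
  .join(subKeys, Seq("key"), "left")
  .withColumn("inSubpopulation", when(col("flag").isNotNull, 1).otherwise(0))  // illustrative name
  .drop("flag")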

scala - parallelism in spark sql when list of tables are need to be processed

Using Spark 1.6.0 on CDH 5.7.0. I have a CSV file with a list of tables to be processed, and I want to achieve parallelism in processing them. As of now I collect the list and process each table in turn; I have tried Scala Futures and also the approach in https://blog.knoldus.com/2015/10/21/demystifying-asynchronous-actions-in-spark/

val allTables = sc.textFile("hdfs://.......")
allTables.collect().foreach(table => {
  val processing = sqlContext.sql(s"select * from ${table} ")
  processing.saveAsParquetFile("hdfs://.......")
}
...Read more
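One common approach (a sketch, not necessarily what the linked post does) is to wrap each table's job in a Future so the driver submits them concurrently and Spark's scheduler overlaps them when executors have capacity; the global execution context and output-path handling are placeholders:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

val tables = sc.textFile("hdfs://.......").collect()

val jobs = tables.map { table =>
  Future {
    val processing = sqlContext.sql(s"select * from ${table}")
    // each table needs its own output directory; the path is elided as in the question
    processing.write.parquet("hdfs://.......")
  }
}

Await.result(Future.sequence(jobs.toSeq), Duration.Inf)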

scala - Spark Jdbc connection JDBCOptions

I am trying to use the saveTable method in Spark's JdbcUtils: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala

The definition of the method is the following; it accepts a JDBCOptions as one of its parameters:

def saveTable(
    df: DataFrame,
    tableSchema: Option[StructType],
    isCaseSensitive: Boolean,
    options: JDBCOptions)

The following is the JDBCOptions class: https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/...Read more
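JdbcUtils.saveTable and JDBCOptions sit under org.apache.spark.sql.execution.datasources.jdbc, which is internal and changes between versions, so unless there is a reason to call it directly, the public DataFrameWriter route is usually simpler; a sketch with illustrative connection values:

df.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://host:5432/db")   // illustrative URL
  .option("dbtable", "target_table")                  // illustrative table name
  .option("user", "user")
  .option("password", "password")
  .mode("append")
  .save()

If the internal call really is needed, JDBCOptions is constructed from the same url/dbtable key-value pairs; check the constructors in the JDBCOptions source for the exact Spark version in use, since the signature is not a stable API.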

scala - Spark 2 iterating over a partition to create a new partition

I have been scratching my head trying to come up with a way to reduce a dataframe in spark to a frame which records gaps in the dataframe, preferably without completely killing parallelism. Here is a much-simplified example (It's a bit lengthy because I wanted it to be able to run):

import org.apache.spark.sql.SparkSession

case class Record(typ: String, start: Int, end: Int);

object Sample {
  def main(argv: Array[String]): Unit = {
    val sparkSession = SparkSession.builder()
      .master("local")
      .getOrCreate();
    val...Read more
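The example is cut off, so the exact "gap" semantics are a guess; assuming a gap means a record's start lies strictly after the previous record's end within the same typ, a window-function sketch keeps the work partitioned by typ rather than iterating partitions by hand (recordsDf stands for the Dataset of Record rows built in the truncated main; gapStart/gapEnd are illustrative names):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val w = Window.partitionBy(col("typ")).orderBy(col("start"))

val gaps = recordsDf
  .withColumn("prevEnd", lag(col("end"), 1).over(w))
  .where(col("prevEnd").isNotNull && col("start") > col("prevEnd"))
  .select(col("typ"), col("prevEnd").as("gapStart"), col("start").as("gapEnd"))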

scala - Spark Best way groupByKey, orderBy and filter

I have 50 GB of data with the schema [ID, timestamp, countryId], and I would like to get each "change" of each person across all of their events, ordered by timestamp, using Spark 2.2.1. I mean, if I have these events:

1,20180101,2
1,20180102,3
1,20180105,3
2,20180105,3
1,20180108,4
1,20180109,3
2,20180108,3
2,20180109,6

I would like to obtain this:

1,20180101,2
1,20180102,3
1,20180108,4
1,20180109,3
2,20180105,3
2,20180109,6

For this I have developed this code:

val eventsOrdened = eventsDataFrame.orderBy("ID", "timestamp")
val grouped = eventsOrdened
  .rdd.map(x => ...Read more
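Not necessarily the best answer, but a sketch that avoids groupByKey entirely: a lag window per ID ordered by timestamp keeps a row only when the country changes (column names taken from the schema in the question):

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._

val w = Window.partitionBy(col("ID")).orderBy(col("timestamp"))

val changes = eventsDataFrame
  .withColumn("prevCountry", lag(col("countryId"), 1).over(w))
  .where(col("prevCountry").isNull || col("countryId") =!= col("prevCountry"))
  .drop("prevCountry")

Applied to the sample events, this keeps 2, 3, 4, 3 for ID 1 and 3, 6 for ID 2, matching the desired output.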

intellij idea - Play 2.3 Scala Anorm import: "not found: object anorm"

So I've been working on a web app for the last few days and got around to starting the database side of things. The problem I'm getting is:

not found: object anorm

for the line

import anorm._

I have "com.typesafe.play" %% "anorm" % "2.3.6" and "anorm" in my libraryDependencies in the build.sbt. I have done an "activator clean", "activator compile" and "activator run", along with resynchronizing the IntelliJ IDEA 14.1 project.

Using:
Play 2.3
Scala 2.11.1

Thanks for any help...Read more
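A hedged guess at the cause: in a Play 2.3 build the bare anorm value is the Play-provided module key, and listing it alongside the explicit Maven coordinate can leave conflicting or evicted versions on the classpath. A build.sbt sketch keeping just one of the two, after which an activator clean and an IntelliJ re-import usually pick it up:

libraryDependencies ++= Seq(
  jdbc,
  "com.typesafe.play" %% "anorm" % "2.3.6"   // keep either this coordinate or the Play `anorm` key, not both
)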

scala - value wholeTextFiles is not a member of org.apache.spark.SparkContext

I have Scala code like the below:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf
import org.apache.spark._

object RecipeIO {
  val sc = new SparkContext(new SparkConf().setAppName("Recipe_Extraction"))

  def read(INPUT_PATH: String): org.apache.spark.rdd.RDD[(String)] = {
    val data = sc.wholeTextFiles("INPUT_PATH")
    val files = data.map { case (filename, content) => filename }
    (files)
  }
}

When I compile this code using sbt it gives me the error: value wholeTextFiles is not a member of org.apa...Read more
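wholeTextFiles has been on SparkContext since Spark 1.0, so the method itself is not the problem; the usual culprit (an assumption, since the build.sbt is not shown) is an older spark-core on the sbt classpath than the one the code was written against. Separately, the path is passed as the literal string "INPUT_PATH" instead of the parameter. A sketch of the read method with both points addressed:

// build.sbt (illustrative): compile against a spark-core version that matches the cluster, e.g.
// libraryDependencies += "org.apache.spark" %% "spark-core" % "1.6.0"

def read(inputPath: String): org.apache.spark.rdd.RDD[String] = {
  val data = sc.wholeTextFiles(inputPath)             // use the parameter, not the literal "INPUT_PATH"
  data.map { case (filename, content) => filename }   // keep only the file names
}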