Get Raw Text From Html
I'm at quite a basic level of Android development. I would like to get the text from a page such as 'http://www.google.com'. (The page I will be using will only have text, so no picture
Solution 1:
In the sample code you gave, you never read the response from the request. I would fetch the HTML with the following code:
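import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

URL u = new URL("http://www.google.com");
URLConnection conn = u.openConnection();
BufferedReader in = new BufferedReader(
        new InputStreamReader(conn.getInputStream()));
StringBuilder buffer = new StringBuilder();
String inputLine;
// readLine() strips line terminators, so add one back after each line
while ((inputLine = in.readLine()) != null)
    buffer.append(inputLine).append('\n');
in.close();
System.out.println(buffer.toString());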
From there, you would need to pass the string to an HTML parser if you want only the text. I've heard JTidy is a good library for this, although I have never used any Java HTML parsing libraries myself.
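If the page really is plain text wrapped in a little markup, a crude tag stripper may be enough before reaching for a full parser. The sketch below is only that: `HtmlText` and `toPlainText` are hypothetical names, not part of JTidy or any other library, and the regex approach assumes simple, well-formed markup. It will mishandle comments containing `>`, most entities, and anything unusual, so treat it as a starting point rather than a parser.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not from any library): a crude tag stripper for simple, text-only pages.
class HtmlText {
    // (?is): case-insensitive, and '.' also matches line breaks
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    private static final Pattern TAG = Pattern.compile("<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String toPlainText(String html) {
        // Drop script/style blocks entirely, since their contents are not visible text
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        // Replace every remaining tag with a space so adjacent words do not merge
        text = TAG.matcher(text).replaceAll(" ");
        // Decode only a handful of common entities; &amp; goes last so "&amp;lt;" becomes "&lt;"
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&quot;", "\"")
                   .replace("&amp;", "&");
        // Collapse runs of whitespace (including line breaks) into single spaces
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }
}
```

You would call it on the downloaded string, for example `HtmlText.toPlainText(buffer.toString())`. For anything beyond trivial pages, a real HTML parser is the safer choice.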
Solution 2:
Do you want to extract text from an HTML file? You can use a specialized tool such as the Jericho HTML Parser library. I'm not sure whether it can be used directly in an Android app, since it is quite big, but it is open source, so you can take only the code you need for your task.
Solution 3:
Here is one way:
public String scrape(String urlString) throws Exception {
    URL url = new URL(urlString);
    URLConnection connection = url.openConnection();
    BufferedReader reader = new BufferedReader(new InputStreamReader(
            connection.getInputStream()));
    String line = null, data = "";
    while ((line = reader.readLine()) != null) {
        data += line + "\n";
    }
    return data;
}
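A slightly sturdier variant of the same idea, as a sketch: it closes the reader even when reading fails, builds the result with a `StringBuilder` instead of repeated string concatenation, and decodes with an explicit charset. `PageReader` and `readAll` are hypothetical names, and UTF-8 is an assumption about the page; the real encoding may differ.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, shown for illustration only.
class PageReader {
    // Reads the whole stream as UTF-8 text (an assumption), ending every line with '\n'.
    static String readAll(InputStream in) throws IOException {
        StringBuilder data = new StringBuilder();
        // try-with-resources closes the reader (and the underlying stream) on any exit path
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                data.append(line).append('\n');
            }
        }
        return data.toString();
    }

    // Opens the URL and returns the response body as text.
    static String scrape(String urlString) throws IOException {
        return readAll(new URL(urlString).openStream());
    }
}
```

On Android, keep in mind that the app needs the INTERNET permission and that network calls made on the main thread fail with a NetworkOnMainThreadException, so `scrape` should be called from a background thread.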