Fires of Heaven Guild Message Board  

12-15-2007, 10:59 AM   #1
Big W Powah!
You pussies can -interwebs better than that.
 
Join Date: Jan 2003
Location: Earth
Posts: 2,080
-54 Internets
Major Undertaking--Need A LOT OF Help

I've been volunteered to develop a VB (or whatever else is needed) application that pulls data from NFL.com (game stats, etc.), parses it into a VB form, and sends it to a database on a local machine for use in our Database Development class.


If anybody is willing to help (this is for no credit), it'd be greatly appreciated. Right now we're at the stage of figuring out how to pull the data from the website (without viewing the source). This is difficult, as you can imagine.

We need something roughly working by next week, unfortunately.

If anybody is able and willing, please post ideas/suggestions/code snippets here. Thank you.
__________________
WTB - INTARWEBS
Some clarification on my previous signature:
I WANT NEGATIVE INTARWEBS YOU FUCKHEADS

12-15-2007, 11:25 AM   #2
Hachima
Registered User
 
Join Date: Oct 2004
Posts: 1,695
If you just have to pull the data once, all you have to do is copy the table from their site with the headers and paste it into Excel. Then arrange the columns to match the columns in your table in MSSQL (if that's what you're using) and paste. You could have all the data populated in 30 minutes that way.
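If copying and pasting gets tedious across many pages, the same "table into Excel" step can be scripted. A minimal sketch using only Python's standard library, assuming the stats sit in ordinary <table>/<tr>/<td> markup; the names TableToRows and table_to_csv are hypothetical. It produces CSV text that Excel or Access can import.

```python
import csv
import io
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace/newlines inside the cell
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def table_to_csv(html):
    """Return the table rows in `html` as CSV text (header row included)."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(parser.rows)
    return out.getvalue()
```

Saving the result to a .csv file gives Excel or Access something to import without hand-arranging columns.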

12-15-2007, 02:37 PM   #3
Big W Powah!
You pussies can -interwebs better than that.
 
Join Date: Jan 2003
Location: Earth
Posts: 2,080
-54 Internets
Quote:
Originally Posted by Hachima
If you just have to pull the data once, all you have to do is copy the table from their site with the headers and paste it into Excel. Then arrange the columns to match the columns in your table in MSSQL (if that's what you're using) and paste. You could have all the data populated in 30 minutes that way.
We're using Access, actually.

And my bad on the original post: we're getting it from ESPN, I think. I'll have to double-check.

We need to pull data, yes, but it's more than just tables. We need basically everything having to do with the game (game temperature, umpire, etc.).

I'll look into that option though.

Just a side note: I know absolutely dick about sports. We were just given printouts of games (which I don't have at the moment) to do data entry with, to build data for the rest of the class. The teacher said fuck this and assigned a few people to automate it. I was one of them; I just need to figure this out.

Thanks so much for your help, sir.

Last edited by Big W Powah! : 12-15-2007 at 02:42 PM.

12-15-2007, 03:14 PM   #4
Phelps McManus
I'm dangerous!
 
Join Date: Jan 2002
Location: Atlanta
Posts: 853
-4 Internets
I hope you can use .NET. I haven't tried what you want to do; I am only providing possible starting points of what sounds like an arduous journey.

Do a search on MSDN for "HTML Parsing by ASP.NET XML Web Services" as well as the "HttpGetClientProtocol" and "MatchAttribute" classes (found in the System.Web.Services.Protocols namespace).
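As I understand it, MatchAttribute works by attaching a regular expression to each field the proxy returns, and the captured group becomes that field's value. The same idea in Python, as a rough sketch; the field names and patterns below are illustrative guesses, not ESPN's actual markup, and regexes over HTML break easily when the page changes.

```python
import re

# Hypothetical patterns in the spirit of MatchAttribute: one regex per field,
# with the wanted value captured in group 1. Not ESPN's real markup.
FIELD_PATTERNS = {
    "team": r'<div class="teams">\s*<a[^>]*>([^<]+)</a>',
    "score": r'<div class="tscore">\s*(\d+)',
}


def match_fields(html, patterns=FIELD_PATTERNS):
    """Return {field: first captured group, or None if the pattern misses}."""
    result = {}
    for field, pattern in patterns.items():
        m = re.search(pattern, html, re.IGNORECASE)
        result[field] = m.group(1).strip() if m else None
    return result
```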

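If those don't pan out, look at HttpWebRequest (or WebClient) in the System.Net namespace. System.Web's HttpRequest describes requests coming in to an ASP.NET page, not requests your program sends out.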

You can also do Google searches for people who actually used these classes.

Good luck.

12-15-2007, 04:56 PM   #5
Hachima
Registered User
 
Join Date: Oct 2004
Posts: 1,695
Well, it sounds like you will have to parse the data out manually unless there are XML versions of the data already out there on the net, which there often are. I would try to find that first.
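If such an XML feed turns up, reading it is much less work than scraping. A sketch with Python's xml.etree.ElementTree; the <game>/<team> layout is invented for illustration, since any real feed would define its own element names.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed layout, made up for this sketch
SAMPLE = """<games>
  <game id="1" week="1" season="2005">
    <team side="away" name="Team A" score="10"/>
    <team side="home" name="Team B" score="14"/>
  </game>
</games>"""


def read_games(xml_text):
    """Turn each <game> element into a flat dict, one key per DB column."""
    games = []
    for game in ET.fromstring(xml_text).iter("game"):
        row = {"id": game.get("id"),
               "week": int(game.get("week")),
               "season": int(game.get("season"))}
        for team in game.findall("team"):
            side = team.get("side")  # "away" or "home"
            row[side + "_team"] = team.get("name")
            row[side + "_score"] = int(team.get("score"))
        games.append(row)
    return games
```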

12-15-2007, 08:43 PM   #6
Tannybaker
Banned
 
Join Date: Feb 2005
Posts: 28
+0 Internets
Your idea is interesting but it will never work

12-16-2007, 11:14 AM   #7
Phelps McManus
I'm dangerous!
 
Join Date: Jan 2002
Location: Atlanta
Posts: 853
-4 Internets
Quote:
Originally Posted by Tannybaker
Your idea is interesting but it will never work
Why wouldn't it work? Looking at the source for NFL scores on ESPN.com, you can run right down to the 'div class="teams"' tag and rip out the team names. Go down 5 lines to 'div class="tscore"' tag and rip off the scores. You can even look for a winner arrow to determine if the game is over.
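A sketch of that approach in Python, matching on the div classes named above ("teams" and "tscore") rather than counting lines, which tends to survive small layout changes better. Those class names come from ESPN's 2007 markup and have probably changed since; DivTextByClass and div_text are hypothetical names.

```python
from html.parser import HTMLParser


class DivTextByClass(HTMLParser):
    """Collect the text inside <div>s whose class list contains a wanted name.

    A depth counter tracks nested <div>s so a match ends at its own </div>.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {name: [] for name in self.wanted}
        self._current = None  # class name of the div being captured
        self._depth = 0       # nested divs open inside it
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._current is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        hit = next((c for c in classes if c in self.wanted), None)
        if hit:
            self._current, self._depth, self._buf = hit, 0, []

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        if self._depth:
            self._depth -= 1
            return
        text = " ".join("".join(self._buf).split())
        self.found[self._current].append(text)
        self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._buf.append(data)


def div_text(html, classes):
    """Return {class: [text of each matching div, in page order]}."""
    parser = DivTextByClass(classes)
    parser.feed(html)
    parser.close()
    return parser.found
```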

If you want, you could probably even use XML to record the structure of their divs and parse all kinds of shit.
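One way to read that suggestion: keep the page layout in a small XML file that maps each database column to a div class, so a markup change means editing the file rather than the code. The layout format below is made up for this sketch, and the regex extraction assumes the target divs contain no nested divs.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical layout description: column name -> div class to read
LAYOUT = """<layout page="scoreboard">
  <field column="teams" div-class="teams"/>
  <field column="score" div-class="tscore"/>
</layout>"""


def load_layout(xml_text):
    """Return {column: div class} from the layout description."""
    root = ET.fromstring(xml_text)
    return {f.get("column"): f.get("div-class") for f in root.iter("field")}


def extract(html, layout):
    """For each column, collect the tag-stripped text of every matching div.

    The non-greedy match stops at the first </div>, so nested divs are
    not supported.
    """
    out = {}
    for column, css_class in layout.items():
        pattern = r'<div class="%s">(.*?)</div>' % re.escape(css_class)
        out[column] = [" ".join(re.sub(r"<[^>]+>", " ", m).split())
                       for m in re.findall(pattern, html, re.S)]
    return out
```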

Be careful how often you run this. If ESPN thinks you are a web crawler bot (which you are), they could ban your IP. I recommend only running it manually after games or on a schedule (like Sunday night and Tuesday morning).
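Even when it runs on a schedule, it helps to space out the requests. A minimal sketch; the 5-second default is an arbitrary choice, not a known ESPN limit, and `fetch` is whatever download function you end up using (clock and sleep are injectable so the pacing can be tested offline).

```python
import time


def fetch_all(urls, fetch, min_interval=5.0,
              clock=time.monotonic, sleep=time.sleep):
    """Call fetch(url) for each URL, keeping at least `min_interval` seconds
    between the starts of consecutive requests. Returns {url: page}."""
    pages = {}
    last = None
    for url in urls:
        if last is not None:
            wait = min_interval - (clock() - last)
            if wait > 0:
                sleep(wait)
        last = clock()
        pages[url] = fetch(url)
    return pages
```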

12-16-2007, 11:19 AM   #8
Big W Powah!
You pussies can -interwebs better than that.
 
Join Date: Jan 2003
Location: Earth
Posts: 2,080
-54 Internets
Quote:
Originally Posted by Phelps McManus
[...]
Be careful how often you run this. If ESPN thinks you are a web crawler bot (which you are), they could ban your IP. I recommend only running it manually after games or on a schedule (like Sunday night and Tuesday morning).
Good point. We'll just have to hope that doesn't happen, heh. Right now we're looking at archived scores from 2005 and 2006. 2007 will come if and when those first two seasons are done.

Thanks for the help, guys. And yes, we have access to full versions of Visual Studio .NET 2003. Six days left to get a semi-working prototype put together. Wish me luck.

Last edited by Big W Powah! : 12-16-2007 at 11:29 AM.

12-17-2007, 09:24 AM   #9
Zippygoose
Math Enthusiast/Badass MC
 
Join Date: Jun 2002
Location: Seattle
Posts: 614
+0 Internets
While I also know nothing about sports, I'd first try to find a web service out there that you can hook into that pulls this information for you before you go down the rabbit hole of scraping a site.

12-17-2007, 11:08 AM   #10
Niceshot23
Registered User
 
Join Date: Oct 2004
Posts: 107
+0 Internets
Seeing that this is a development assignment, I'm not sure your teacher would cut you slack for grabbing that data from somewhere other than parsing it yourself. Anyway, the code below is just a bird's-eye view of how easy the project is. The only hard thing about it is that it's tedious, not that it's difficult. Oh, and unless I'm missing something, I don't see anywhere ESPN stores umpire or weather data, so that's not a possibility. You can only get what you can.



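' Historical data: walk every week of the 2005 and 2006 seasons
Private Sub StartParse()

    Dim intYear As Integer   ' first season to fetch
    Dim intWeek As Integer   ' first week to fetch
    Dim iYear As Integer
    Dim iWeek As Integer

    intYear = 2005
    intWeek = 1

    For iYear = intYear To 2006
        For iWeek = intWeek To 17
            ' Build the URL from the loop counters (iWeek, iYear), not the
            ' start values, or every pass fetches week 1 of 2005
            Call GetGames("http://scores.espn.go.com/nfl/scoreboard?weekNumber=" & iWeek & "&seasonYear=" & iYear & "&seasonType=2")
        Next iWeek
    Next iYear

End Sub

Private Sub GetGames(strPage As String)

    Dim strHTML As String
    Dim strGameID As String
    Dim strGameURL As String

    Dim intStart As Long   ' position just after a boxscore link prefix
    Dim intEnd As Long     ' position of the first non-digit after the ID

    strGameURL = "http://scores.espn.go.com/nfl/boxscore?gameId="

    ' GetHTML is a hypothetical helper that returns the page's HTML,
    ' e.g. read from the WebBrowser control after it navigates to strPage
    strHTML = GetHTML(strPage)

    ' On this page, find every "http://scores.espn.go.com/nfl/boxscore?gameId=",
    ' read the digits after it into strGameID, and parse that game
    intStart = InStr(1, strHTML, strGameURL)
    Do While intStart > 0
        intStart = intStart + Len(strGameURL)
        intEnd = intStart
        Do While Mid$(strHTML, intEnd, 1) Like "#"
            intEnd = intEnd + 1
        Loop
        strGameID = Mid$(strHTML, intStart, intEnd - intStart)
        If Len(strGameID) > 0 Then Call ParseGame(strGameURL & strGameID)
        intStart = InStr(intEnd, strHTML, strGameURL)
    Loop

End Sub

Private Sub ParseGame(strGameURL As String)

    ' Parse the game. The WebBrowser control gives you the HTML of the whole
    ' page; that's what you need to parse. Everything on this page follows
    ' patterns, so parsing it is not hard.

End Sub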
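For comparison, a Python sketch of the same two steps: building one scoreboard URL per week from the loop variables, and pulling the gameId values out of a scoreboard page. The URLs are the ones from the post above; ESPN has very likely changed them since, so treat them as placeholders. The function names are hypothetical.

```python
import re

SCOREBOARD = ("http://scores.espn.go.com/nfl/scoreboard"
              "?weekNumber={week}&seasonYear={year}&seasonType=2")
GAME_ID = re.compile(r"boxscore\?gameId=(\d+)")


def scoreboard_urls(first_year=2005, last_year=2006, weeks=17):
    """One scoreboard URL per week, built from the loop variables."""
    return [SCOREBOARD.format(week=w, year=y)
            for y in range(first_year, last_year + 1)
            for w in range(1, weeks + 1)]


def game_ids(scoreboard_html):
    """Unique gameId values linked from a scoreboard page, in page order."""
    seen = []
    for gid in GAME_ID.findall(scoreboard_html):
        if gid not in seen:
            seen.append(gid)
    return seen
```

Unlike the VB loop, game_ids also skips an ID that is linked more than once on the same page.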

Last edited by Niceshot23 : 12-17-2007 at 11:13 AM.

12-22-2007, 06:54 AM   #11
Big W Powah!
You pussies can -interwebs better than that.
 
Join Date: Jan 2003
Location: Earth
Posts: 2,080
-54 Internets
Quote:
Originally Posted by Niceshot23
Seeing that this is a development assignment, I'm not sure your teacher would cut you slack for grabbing that data from somewhere other than parsing it yourself. [...]
Sorry about the lack of updates; thanks, guys, this has been helpful as hell. I've been working like a busy bee, and by around noon, thanks to you guys, I should have a working prototype.

Also: this is a development project for a non-development class. We just need a way to automate entering 3 seasons' worth of football into an Access database, which will then be migrated to an MS SQL Server (yes, we HAVE to migrate it; it's part of the class) and then developed into a school-wide fantasy football league (or 3 or 4). This will be developed partially by me and partially by the rest of the extracurricular development crew at our school.
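Whatever the parser produces eventually has to land in a table. A sketch of that last step, using Python's built-in sqlite3 purely as a stand-in for Access so it runs anywhere; the game table and its columns are hypothetical. INSERT OR REPLACE is SQLite syntax, so the Access and SQL Server versions would need their own way of avoiding duplicate rows.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for Access in this sketch
SCHEMA = """
CREATE TABLE IF NOT EXISTS game (
    game_id    TEXT PRIMARY KEY,
    season     INTEGER NOT NULL,
    week       INTEGER NOT NULL,
    away_team  TEXT NOT NULL,
    home_team  TEXT NOT NULL,
    away_score INTEGER,
    home_score INTEGER
)"""


def save_games(conn, games):
    """Insert or update parsed games; re-running a scrape won't duplicate rows."""
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT OR REPLACE INTO game VALUES (:game_id, :season, :week, "
        ":away_team, :home_team, :away_score, :home_score)",
        games)
    conn.commit()
```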


Honestly, I'm doing all this to be able to add "project resolution and management" or some shit like that to my resume (this is an entirely non-credited assignment, but it needs to get done).

12-25-2007, 03:05 PM   #12
Sabolin
Registered User
 
Join Date: May 2003
Posts: 264
+5 Internets
If you haven't already figured out a way to pull down the HTML page to parse, I'd just like to recommend cURL as a great command-line tool for downloading web pages to parse. I've used it many times in conjunction w/ VB6 for parsing meteorological data out of national weather service forecast pages automatically.
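The main advantage of downloading pages to disk first is that the parser can be rerun as often as needed without requesting the pages again. A rough Python equivalent of that workflow; cached_fetch and the "pages" directory name are assumptions for the sketch, and `fetch` is any function that takes a URL and returns the HTML.

```python
import hashlib
import os


def cached_fetch(url, fetch, cache_dir="pages"):
    """Return the HTML for `url`, downloading it only if no saved copy exists.

    Each page is stored under a file name derived from a hash of its URL.
    """
    os.makedirs(cache_dir, exist_ok=True)
    name = hashlib.sha1(url.encode("utf-8")).hexdigest() + ".html"
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    html = fetch(url)
    with open(path, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```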

edit- Ah I see "working prototype" in the post above mine. Oh well, better late than never

12-26-2007, 08:46 AM   #13
Niceshot23
Registered User
 
Join Date: Oct 2004
Posts: 107
+0 Internets
Quote:
Originally Posted by Sabolin
If you haven't already figured out a way to pull down the HTML page to parse, I'd just like to recommend cURL as a great command-line tool for downloading web pages to parse. I've used it many times in conjunction w/ VB6 for parsing meteorological data out of national weather service forecast pages automatically.

edit- Ah I see "working prototype" in the post above mine. Oh well, better late than never
It's not necessary to bring in a third-party tool when a control is already available on any PC with Internet Explorer. In VB6 it comes from the "Microsoft Internet Controls" component (shdocvw.dll) and shows up as the WebBrowser control. I'm sure they found it already, or else they'd be way behind. To go to a URL it's "WebBrowser1.Navigate [url]", and once the page has finished loading (the DocumentComplete event) you can read the HTML with something like "strHTML = WebBrowser1.Document.documentElement.outerHTML". It's an easy project to undertake; it should only take a few hours to have a stable prototype.
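For reference, the equivalent "navigate, then read the HTML" step in Python needs no browser control at all. A minimal sketch with urllib.request; the User-Agent string is an arbitrary placeholder, and falling back to UTF-8 when the server sends no charset is an assumption.

```python
import urllib.request


def get_html(url, timeout=30):
    """Download a page and return its HTML as text."""
    req = urllib.request.Request(
        url, headers={"User-Agent": "class-project-scraper/0.1"})  # placeholder UA
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # Use the charset from the Content-Type header, else assume UTF-8
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Unlike the WebBrowser control, this returns the HTML as the server sent it, without running any of the page's scripts.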

02-10-2008, 05:25 PM   #14
Big W Powah!
You pussies can -interwebs better than that.
 
Join Date: Jan 2003
Location: Earth
Posts: 2,080
-54 Internets
Quote:
Originally Posted by Niceshot23
It's not necessary to bring in a third-party tool when a control is already available on any PC with Internet Explorer. [...]
The project ended up falling apart due to a lack of assistance, the switch to reading .pdf files, and the lack of consistency in the files once converted to .txt or .xls.

All times are GMT -7.