Fires of Heaven Guild Message Board  

10-20-2007, 01:32 PM   #1
PigBenis
duh
 
Join Date: May 2002
Location: Boiler Up
Posts: 601
-24 Internets
Compare two MP3s to see if they are the same

Hey guys, I'm about to start working on a project here soon. It's going to involve a lot of MP3 files from a lot of different people, and I want to cut down on duplicate MP3s as much as possible. Obviously, file name is an easy way to get rid of dupes. I've found that MD5 works for catching tracks that have different file names but are the exact same file. However, once the ID3 tag is changed, the MD5 no longer matches.
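A minimal sketch of the whole-file MD5 approach described above, in Java (the language used later in this thread). It assumes each file fits in memory, and the names `WholeFileMd5`, `md5Hex`, and `groupByMd5` are made up for illustration. Because the hash covers the whole file, tags included, any ID3 edit produces a different hash.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class WholeFileMd5 {
    // MD5 of a byte array as 32 lowercase hex characters.
    static String md5Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(data);
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Group files by the MD5 of their entire contents (tags included).
    // Any group with more than one path is a set of byte-identical files.
    static Map<String, List<Path>> groupByMd5(List<Path> files) throws IOException {
        Map<String, List<Path>> groups = new HashMap<>();
        for (Path p : files) {
            String key = md5Hex(Files.readAllBytes(p));
            groups.computeIfAbsent(key, k -> new ArrayList<>()).add(p);
        }
        return groups;
    }

    // Usage: java WholeFileMd5 a.mp3 b.mp3 c.mp3
    public static void main(String[] args) throws IOException {
        List<Path> files = new ArrayList<>();
        for (String arg : args) {
            files.add(Paths.get(arg));
        }
        for (Map.Entry<String, List<Path>> e : groupByMd5(files).entrySet()) {
            if (e.getValue().size() > 1) {
                System.out.println("duplicates " + e.getKey() + ": " + e.getValue());
            }
        }
    }
}
```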

I have a feeling MD5 is probably gonna be it, but can anyone think of more ways to cut down on dupes even further, including MP3s with different sizes? I was thinking about comparing song duration along with part of the file name and some of the ID3 info, blah blah blah.
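A rough sketch of that kind of heuristic, assuming artist, title, and duration have already been read out of each file by some tag/frame reader that isn't shown. `TrackInfo`, `normalize`, and the tolerance value are hypothetical choices, not part of any library.

```java
import java.text.Normalizer;
import java.util.Locale;

class HeuristicDupes {
    // Hypothetical holder for metadata read from the ID3 tag and the audio stream.
    static final class TrackInfo {
        final String artist;
        final String title;
        final double durationSeconds;

        TrackInfo(String artist, String title, double durationSeconds) {
            this.artist = artist;
            this.title = title;
            this.durationSeconds = durationSeconds;
        }
    }

    // Lowercase, strip accents, and keep only ASCII letters and digits,
    // so "The Beatles" and "the beatles!" compare equal.
    // Note: non-Latin text is dropped entirely by this rule.
    static String normalize(String s) {
        if (s == null) {
            return "";
        }
        String noAccents = Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
        return noAccents.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Probable duplicate: same normalized title and artist, lengths within the tolerance.
    static boolean probableDuplicate(TrackInfo a, TrackInfo b, double toleranceSeconds) {
        String titleA = normalize(a.title);
        if (titleA.isEmpty()) {
            return false; // no usable title, so don't guess
        }
        return titleA.equals(normalize(b.title))
                && normalize(a.artist).equals(normalize(b.artist))
                && Math.abs(a.durationSeconds - b.durationSeconds) <= toleranceSeconds;
    }
}
```

With a tolerance of 2.0, two files count as likely dupes when the cleaned-up artist and title match and the lengths are within two seconds of each other. Live versions, remixes, and mistagged files can slip through or collide, so treat a hit as a candidate to check rather than proof.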
10-20-2007, 09:29 PM   #2
niteflyx
Fuckin' 07er
 
Join Date: May 2007
Location: Philly
Posts: 337
+0 Internets
Are these amongst the same 'songs' from different users, or just yourself?

So many people download music now, and they encode at different bitrates. Even with the same ID3 tags and bitrates, there will most likely be differences between one user's file and another's, for example if one downloaded it and the other ripped it from CD.
10-20-2007, 09:35 PM   #3
Fog
Registered User
 
Join Date: Feb 2006
Posts: 1,634
+7 Internets
Even if you wrote a program to do an MD5 of just the audio data, ignoring the ID3/APE tags, you would still run into the problem of a million different people using a million different MP3 encoders at a million different bitrates. As such, the only plausible way of detecting duplicates would be some kind of acoustic fingerprinting; I don't know if there are any free software solutions for doing this (and unless you are looking for a summer project, it would probably be a Difficult Problem to write a good homegrown solution).
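A sketch of the "MD5 of just the audio data" idea: skip a leading ID3v2 tag and a trailing 128-byte ID3v1 tag, then hash whatever is left. Assumptions: APE, Lyrics3, and other tag formats are not handled, and the class and method names are hypothetical. As noted above, this only catches byte-identical audio; two different encodes of the same song will still hash differently.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class AudioOnlyMd5 {
    // Length of a leading ID3v2 tag (10-byte header + body + optional 10-byte footer), or 0.
    static int id3v2Length(byte[] d) {
        if (d.length < 10 || d[0] != 'I' || d[1] != 'D' || d[2] != '3') {
            return 0;
        }
        // The tag size is stored as a 28-bit "syncsafe" integer: 4 bytes, 7 bits each.
        int size = ((d[6] & 0x7F) << 21) | ((d[7] & 0x7F) << 14)
                | ((d[8] & 0x7F) << 7) | (d[9] & 0x7F);
        int total = 10 + size;
        if ((d[5] & 0x10) != 0) {
            total += 10; // flag bit 4 (ID3v2.4) means a footer follows the tag body
        }
        return Math.min(total, d.length);
    }

    // True if the file ends with a 128-byte ID3v1 tag (which starts with "TAG").
    static boolean endsWithId3v1(byte[] d, int audioStart) {
        int p = d.length - 128;
        return p >= audioStart && d[p] == 'T' && d[p + 1] == 'A' && d[p + 2] == 'G';
    }

    // MD5 (hex) over the bytes between the ID3v2 tag and the ID3v1 tag.
    static String audioMd5(byte[] d) {
        int start = id3v2Length(d);
        int end = endsWithId3v1(d, start) ? d.length - 128 : d.length;
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(d, start, end - start);
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest()) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```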

EDIT: I asked a friend and he recommended this. It includes source and a good explanation of the process.
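To give a feel for how matching can work once fingerprints exist: a common approach reduces audio to a sequence of 32-bit "sub-fingerprints" (one per short frame) and calls two clips the same recording when the fraction of differing bits, the bit error rate, is low at the best alignment. This is a generic sketch of that comparison step only, not foosic's algorithm; the `int[]` fingerprint format, the minimum-overlap rule, and the 0.35 threshold used in the example are assumptions you would need to tune.

```java
class FingerprintCompare {
    // Fraction of differing bits between a[i] and b[i + offset] over the overlapping frames.
    // Returns 1.0 (no match) when the overlap is too small to be meaningful.
    static double bitErrorRate(int[] a, int[] b, int offset) {
        int overlap = 0;
        long differingBits = 0;
        for (int i = 0; i < a.length; i++) {
            int j = i + offset;
            if (j < 0 || j >= b.length) {
                continue;
            }
            differingBits += Integer.bitCount(a[i] ^ b[j]);
            overlap++;
        }
        // Assumption: require at least half of the shorter fingerprint to overlap.
        int minOverlap = Math.max(1, Math.min(a.length, b.length) / 2);
        if (overlap < minOverlap) {
            return 1.0;
        }
        return differingBits / (32.0 * overlap);
    }

    // Lowest bit error rate over offsets -maxShift..maxShift.
    static double bestBitErrorRate(int[] a, int[] b, int maxShift) {
        double best = 1.0;
        for (int off = -maxShift; off <= maxShift; off++) {
            best = Math.min(best, bitErrorRate(a, b, off));
        }
        return best;
    }

    // Hypothetical decision rule: same recording if the best BER is under the threshold.
    static boolean sameRecording(int[] a, int[] b, int maxShift, double threshold) {
        return bestBitErrorRate(a, b, maxShift) < threshold;
    }
}
```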

Last edited by Fog : 10-20-2007 at 09:39 PM.
10-21-2007, 12:02 AM   #4
PigBenis
duh
 
Join Date: May 2002
Location: Boiler Up
Posts: 601
-24 Internets
wow I'm gonna look into that foosic, looks really interesting.
11-08-2007, 04:17 PM   #5
AladainAF
Registered User
 
Join Date: Aug 2002
Location: Texas
Posts: 2,050
edit: nm, fog handled it.
11-08-2007, 05:27 PM   #6
Isuldor
Registered User
 
Join Date: Mar 2002
Location: Orange County, California
Posts: 34
+1 Internets
Shareaza added support for hashing only the audio data 4 years ago, so you could download MP3s from multiple sources regardless of any ID3 modifications. Too bad distributed/decentralized p2p is basically dead or otherwise useless.
11-08-2007, 07:19 PM   #7
Fog
Registered User
 
Join Date: Feb 2006
Posts: 1,634
+7 Internets
Quote:
Originally Posted by Isuldor
Too bad distributed/decentralized p2p is basically dead or otherwise useless.
Yeah, because nobody uses torrents.
11-08-2007, 09:38 PM   #8
PigBenis
duh
 
Join Date: May 2002
Location: Boiler Up
Posts: 601
-24 Internets
Someone should try this foosic. Quite easy to use, but sadly it doesn't work all that well.

Code:
FingerPrint blah = new FingerPrint();
blah.readFrom("C:\\1.mp3");
FingerPrint blah2 = new FingerPrint();
blah2.readFrom("C:\\n2.mp3");
// Print both fingerprints
blah.displayFull();
blah2.displayFull();
// Print the results of the full and the quick comparison
System.out.println(blah.fullMatch(blah2));
System.out.println(blah.quickMatch(blah2));
Unless I'm missing something here.
11-08-2007, 11:12 PM   #9
Isuldor
Registered User
 
Join Date: Mar 2002
Location: Orange County, California
Posts: 34
+1 Internets
Quote:
Originally Posted by Fog
Yeah, because nobody uses torrents.
Really? Because I wasn't even talking about BitTorrent. Fucking dipshit.

Yeah, decentralized/distributed searchable networks are basically dead or useless.
11-09-2007, 02:59 AM   #10
Fog
Registered User
 
Join Date: Feb 2006
Posts: 1,634
+7 Internets
Quote:
Originally Posted by Isuldor
Yeah, decentralized/distributed searchable networks are basically dead or useless.
Right, because there aren't any good torrent search engines.
11-09-2007, 10:07 AM   #11
Isuldor
Registered User
 
Join Date: Mar 2002
Location: Orange County, California
Posts: 34
+1 Internets
Quote:
Originally Posted by Fog
Right, because there aren't any good torrent search engines.
Are you really that clueless? Feel free to explain to me how a website with a database constitutes a p2p network. I'm waiting.
11-09-2007, 05:57 PM   #12
Ham n Cheese
You can betray me
 
Join Date: Dec 2002
Location: Houston
Posts: 8,675
+20 Internets
Sorry, semi-related derail inc.

Does anyone know how I can find out whether an MP3 was encoded with LAME's V0 or V2 setting (-V 0 / -V 2)? All I know how to do is look at a file's basic bitrate.

Anyone able to tell me what this is, too?
Encoder : EAC (Secure mode) / LAME 3.92
Codec : LAME 3.97
Bitrate : VBR ~251K/s 44100Hz Joint Stereo
ID3-Tag : ID3v2.3
11-09-2007, 07:42 PM   #13
Fog
Registered User
 
Join Date: Feb 2006
Posts: 1,634
+7 Internets
Quote:
Originally Posted by Ham n Cheese
Sorry semi related derail inc.

Does anyone know how I can find out if an mp3 is v0/v2? All I know how to do is look at the basic bitrate of something.
I'm not sure what switches and settings, etc., the -V presets enable nowadays, but I don't think there is anything extremely important that they set beyond just bitrate. I know that LAME embeds further encoding data into the MP3 in some cases, which can be read to get information about the LAME settings that were used; foobar2000, for example, reads this data from files. The best documentation I could find on the structure of that information is this page, but at the top it gives a LAME 3.88 tag example, so it might be outdated.
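A sketch of pulling the encoder string out of that embedded data, assuming the usual layout: LAME writes a "Xing" (VBR) or "Info" (CBR) header into the first audio frame, followed by a 4-byte flags field, the optional frame-count, byte-count, 100-byte seek table, and quality fields, and then the LAME tag, which starts with a 9-byte encoder string (something like "LAME3.97"). It does a naive byte scan, so callers should pass the offset just past any ID3v2 tag as `from`. Names are hypothetical, and it does not work out the -V level itself; check that against the tag documentation.

```java
import java.nio.charset.StandardCharsets;

class LameTagReader {
    // Naive scan for the "Xing" or "Info" marker, starting at `from`, within `limit` bytes.
    static int findXingHeader(byte[] d, int from, int limit) {
        int last = Math.min(d.length - 8, from + limit); // need marker + 4-byte flags
        for (int i = Math.max(0, from); i <= last; i++) {
            boolean xing = d[i] == 'X' && d[i + 1] == 'i' && d[i + 2] == 'n' && d[i + 3] == 'g';
            boolean info = d[i] == 'I' && d[i + 1] == 'n' && d[i + 2] == 'f' && d[i + 3] == 'o';
            if (xing || info) {
                return i;
            }
        }
        return -1;
    }

    static int readBigEndianInt(byte[] d, int p) {
        return ((d[p] & 0xFF) << 24) | ((d[p + 1] & 0xFF) << 16)
                | ((d[p + 2] & 0xFF) << 8) | (d[p + 3] & 0xFF);
    }

    // Returns the 9-byte encoder string from the LAME tag, or null if none is found.
    static String lameEncoderString(byte[] d, int from) {
        int x = findXingHeader(d, from, 4096);
        if (x < 0) {
            return null;
        }
        int flags = readBigEndianInt(d, x + 4);
        int p = x + 8;
        if ((flags & 0x1) != 0) p += 4;   // frame count present
        if ((flags & 0x2) != 0) p += 4;   // byte count present
        if ((flags & 0x4) != 0) p += 100; // seek table present
        if ((flags & 0x8) != 0) p += 4;   // quality indicator present
        if (p + 9 > d.length) {
            return null;
        }
        String enc = new String(d, p, 9, StandardCharsets.US_ASCII);
        return enc.startsWith("LAME") ? enc : null;
    }
}
```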

Quote:
anyone able to tell me what this is too?
Encoder : EAC (Secure mode) / LAME 3.92
Codec : LAME 3.97
Bitrate : VBR ~251K/s 44100Hz Joint Stereo
ID3-Tag : ID3v2.3
What what is? I don't understand.
11-09-2007, 07:44 PM   #14
Fog
Registered User
 
Join Date: Feb 2006
Posts: 1,634
+7 Internets
Quote:
Originally Posted by Isuldor
Are you really that clueless? Feel free to explain to me how a website with a database constitutes a p2p network. I'm waiting.
It sounds like it does to me. I don't understand the fundamental difference between people sharing a set of files on a searchable P2P network and someone downloading portions of a file from multiple users sharing it, and people seeding a set of torrents in a searchable torrent database and someone downloading portions of a file from multiple users seeding it.
11-15-2007, 01:20 PM   #15
Luthair
Registered User
 
Join Date: May 2002
Posts: 914
-4 Internets
Quote:
Originally Posted by Fog
Yeah, because nobody uses torrents.
Torrents aren't really decentralized; they (mostly) still require a tracker.

All times are GMT -7.