|
|
Or, use your gamerDNA username: (more...)
| ||||||
| |
![]() |
| | LinkBack | Thread Tools | Search this Thread | Rate Thread | Display Modes |
| | #1 (permalink) |
| hildog Join Date: Aug 2002
Posts: 860
| database question.. Right, I have a large database with a lot of repeating data. The repeating data will be sorted but the database will still be extremely large - at the moment it's about 1.8m rows, but I guess that isn't that large. one of the fields is called 'key' and is a repeating field that coudl be one of say.. 50,000 things. 'company' is a field that states the name of a company. I run the following sql query: SELECT distinct key from adata where company like "%MBNA%" It then runs through the database of 1.8m entries and finds how many individual keys that the company MBNA is uniquely associated with. The problem is this is for a website back end, where the response has to be quick. Currently this takes about 3 minutes to complete the query.. I assume it has to scan the entire 1.8m row database. Is there anyway to speed this up? The MySQL database does absolutely TONS of hard disk read/writes during the query. Ugh it's so annoying, I need to to be so much faster. Maybe if the whole database is in ram? The database is a dedicated windows XP machine w/1 gb ram and a 2.8ghz processor. Would it be significantly faster if I installed CentOS? Please help ![]() |
| | |
| | #2 (permalink) |
| Registered User Join Date: Dec 2004
Posts: 65
| Create a table named `companies` and a column in `adata` named `company_id` that references the companies in `adata` as a foreign key. Then make the comparison on `companies` (which will have less records than `adata` and therefore fasten up the lookup). SELECT DISTINCT `adata`.`key` FROM `adata` JOIN `companies` ON `adata`.`company_id` = `companies`.`id` WHERE `companies`.`name` LIKE '%MBNA%' |
| | |
| | #3 (permalink) |
| hildog Join Date: Aug 2002
Posts: 860
| Problem is, company's will be added in constantly. About 10 rows are being added to the adata table each second (every second of the day.. everyday etc) Adata table is constantly getting bigger and bigger. Forever. New data being constantly added. The php front end is analyzing this data. Alot of it is repeating.. Sound bad? I looked at the computer while ruinning a query.. 0% cpu use.. all hard disk swapping. |
| | |
| | #4 (permalink) |
| I'm dangerous! Join Date: Jan 2002 Location: Atlanta
Posts: 891
| I am not familiar with MySQL. Is there a field for each column of the table that lets you specify whether that column is indexed? A search on a non-indexed column, would be pretty slow. Also, if that company column is indexed (it needs to be), then the database spends all that time indexing your string. Maio's solution would help a lot. You should almost always use string tables (two columns: primary key and the string) for repeating data. I don't know how arbitrarily you came up with that 50,000 number for the data. It implies that they repeat an average of 30-35 times over 1.8m rows. If this is the case then you should really turn this into a relational database and create a 3rd table called 'CompanyData'. This table would have a foreign key for the 'Company' and 'adata' tables (so two columns total). Unless a new company or adata item is created, all updates will be to this CompanyData table. Since it is just 2 integers, it should be fairly fast.
__________________ If God didn't want us to eat animals, why did He make them out of meat? |
| | |
| | #5 (permalink) |
| hildog Join Date: Aug 2002
Posts: 860
| I don't know anything about indexing (or databases really!!), but it's clear I need to entirely redesign the database. Anyone fancy talking to me about it on some kind of messenger program? :P On the 30x repeat.. true! :/ Or I could keep asking questions in this thread.. i'll reply more concisely after Charlie Wilsons War finishes ![]() Thanks for the help so far.. |
| | |
| | #6 (permalink) |
| Registered User Join Date: Feb 2006
Posts: 1,657
+5 Internets | This is a rude thing to ask, but I have to admit I'm curious as to how you got in charge of designing and administering this big corporate database of something if you don't know how to design databases. I suggest you just head out to the bookstore and get a couple good database books if you want to learn this stuff. It's not a very good idea to design large and interconnected systems like database schemas with the mindset of hacking something together and then solving the problems that pop up one at a time. |
| | |
| | #7 (permalink) |
| hildog Join Date: Aug 2002
Posts: 860
| Hehe, it's not a big corporate database.. I don't know where you got that idea. It's a personal project but it must be made right as it could turn into something more. The data comes from backend system coded in Visual Studio (something that I have a lot more experience in than this). As for the book - I bought a SQL book today but I wanted some real human input. |
| | |
| | #8 (permalink) |
| I'm dangerous! Join Date: Jan 2002 Location: Atlanta
Posts: 891
| This scenario is pretty common. I am an engineer, not a programmer, but I got stuck with our billing database at my last company (~50 employees, $11mil/yr revenue). It was written in Access, bloated, and slow as molasses. The company I work at now has ~1000 employees, ~500mil/yr revenue, and uses PeopleSoft (on MSSQL server). They must have a team of at least 10 contractors who work on it. Costs a goddamn fortune (millions per year) and is less user friendly than the Access database. We need cheat sheets to fill out expenses. My neighbor is a DBA for the cable company here (Time Warner). Based on discussions with him and my past 2 jobs, I don't think most people in charge of their company's databases understand normalization and proper database design. There is just not enough talent to meet the demand, so a lot of people just get thrown at it because they are good with computers. Anyway, shoot your questions on here and I can try my best to answer them. I am no DBA but I have a decent grasp of design.
__________________ If God didn't want us to eat animals, why did He make them out of meat? Last edited by Phelps McManus : 04-04-2008 at 08:41 PM. |
| | |
| | #9 (permalink) |
| hildog Join Date: Aug 2002
Posts: 860
| OK, I appreciate this.. here goes. I think I need a complete redesign. Currently I have two tables in a database, one called kwlist. The other one called adata. Kwlist has two fields one is a list of unique words or phrases, the other is a primary key, incrementing integer. Adata has multiple fields, one being adataid an incrementing primary key (integer). Then 'kw' which is a plain text version of a word from the kwlist. Would it make my database more efficient if I replaced the plaintext kw in adata with it's related ID in the KWlist table? After this we have company name (often repeating), text data (repeating), more text data (repeating), Date added, Time added, a true/false field. Problem with company name is that they are constantly being added so i'm not sure about the link table thing. I have no idea how to make it so repeating data isn't added, perhaps add a field called 'last scanned' and if it's a duplicated - update 'last scanned' rather than add the new field? Problem is, the thing adding all the data is a Visual Studio back end that simply pulls the data and pushes it into a MySQL query. It cannot make intelligent decisions unless I add another mySQL query to check a situation for example.. I hope this all made sense ![]() At this moment i'd be pretty willing to pay someone for a database design because I think it's a real skill that probably takes a long time to develop. |
| | |
| | #10 (permalink) |
| Registered User Join Date: Jan 2006
Posts: 69
+4 Internets | It seems that you are overlooking some fundamental aspects of how a database should be structured. Your database should be broken up into multiple tables depending on expected usage. Any columns that you intend to search or sort by must be indexed for good read performance. Matching text as a means to look up data should be avoided and used only when necessary. Whenever possible instead use a primary key or unique id. In the case of your statement in the OP I believe that the company column will need to have a fulltext index to perform your query fastest. It seems to me that you should have separate company table. Primary key being the company's unique id, and your key as one of its columns. When your program interacts with the database it would use the company's unique id to reference information. Using a primary key is the fastest possible way to get information from the database. (Your primary key should be some kind of integer) Could be something like this: [KEY_WORDS] id, word [COMPANIES] id, some_key, key_word_id, name [LOG] id, company_id, note1, note2, date_added, time_added, hidden Assuming your program can know the company's ID when referencing it your statement could be this "SELECT some_key FROM companies WHERE id = '2'" Or if you are associating multiple key words with each company: [KEY_WORDS] id, word [COMPANY_WORDS] id, key_word_id, company_id [COMPANIES] id, some_key, name [LOG] id, company_id, note1, note2, date_added, time_added, hidden I'm guessing as to what you are intending to do so this may be totally off track. Last edited by mylek : 04-05-2008 at 05:28 AM. |
| | |
| | #11 (permalink) |
| hildog Join Date: Aug 2002
Posts: 860
| Ok, so I construct the database like this: [company list] - {company id} - {company name in text} [kw list] - {kw id} - {word/phriase in text} [log] - {kw id, instead of kw list from text} - {company id, instead of company in text} - {various data} - {various data} Now say I want to pull a list of phrases that the company is associated with in the database in the log. Would this be faster than my previous set up? i.e. this time i'm doing a : search log for ID etc etc join the tables, or something? ![]() |
| | |
| | #12 (permalink) | |||||||
| I'm dangerous! Join Date: Jan 2002 Location: Atlanta
Posts: 891
| Quote:
Quote:
Quote:
Quote:
Quote:
This should eliminate redundant data without the need of a 'last scanned' field or anything like that. Quote:
Quote:
__________________ If God didn't want us to eat animals, why did He make them out of meat? | |||||||
| | |
| | #13 (permalink) | |
| I'm dangerous! Join Date: Jan 2002 Location: Atlanta
Posts: 891
| Quote:
SELECT Company.Name, KwList.Name, Log.Data1, Log.Data2, Log.Timestamp FROM (Company JOIN Log ON (Company.ID = Log.CompanyID)) LEFT JOIN KwList ON (Log.KwListID = KwList.ID) WHERE Company.Name LIKE "%MBNA%" or maybe SELECT Company.Name, KwList.Name, Log.Data1, Log.Data2, Log.Timestamp FROM (Select * FROM Company WHERE Company.Name LIKE "%MBNA%") JOIN Log ON (Company.ID = Log.CompanyID)) LEFT JOIN KwList ON (Log.KwListID = KwList.ID)
__________________ If God didn't want us to eat animals, why did He make them out of meat? | |
| | |
| | #15 (permalink) |
| Registered User Join Date: Oct 2004
Posts: 112
| select * from kwlist where kw in (select kw from adata where company like "%MBNA%") since they are both not primary or foreign keys, then do not join the tables ... that will make it do a table scan which is the slowest querying you can do in sql ... this should retrieve the data you are looking for without having to do a redesign of the db .... and im not too sure if mysql uses it ... but so that you dont keep inserting the same data, then use "IF NOT EXISTS(select field1, field2 from table where field a = 'random' and b = 'random2') insert into table select 'random1', 'random2'" |
| | |
![]() |
| Thread Tools | Search this Thread |
| Display Modes | Rate This Thread |
| |